A Large-Scale Study of the Evolution of Web Pages - Microsoft

A Large-Scale Study of the Evolution of Web Pages
Dennis Fetterly
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
dennis.fetterly@hp.com

Mark Manasse
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
manasse@microsoft.com

Marc Najork
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
najork@microsoft.com

Janet Wiener
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
janet.wiener@hp.com

ABSTRACT

1. INTRODUCTION

How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated to any other property of the page? All of these questions are of interest to those who mine the web, including all the popular search engines, but few studies have been performed to date to answer them.
One notable exception is a study by Cho and Garcia-Molina, who crawled a set of 720,000 pages on a daily basis over four months, and counted pages as having changed if their MD5 checksum changed. They found that 40% of all web pages in their set changed within a week, and 23% of those pages that fell into the .com domain changed daily.
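As a minimal sketch of the change criterion attributed to Cho and Garcia-Molina above, the following Python computes an MD5 checksum for each download and counts a URL as changed if any two consecutive checksums differ. The function names and the in-memory `snapshots` mapping are hypothetical illustrations, not code from either study.

```python
import hashlib


def md5_digest(body: bytes) -> str:
    """Return the hex MD5 checksum of one downloaded page body."""
    return hashlib.md5(body).hexdigest()


def changed_fraction(snapshots: dict[str, list[bytes]]) -> float:
    """Fraction of URLs whose checksum differs between at least one
    pair of consecutive downloads (hypothetical helper)."""
    if not snapshots:
        return 0.0
    changed = 0
    for bodies in snapshots.values():
        digests = [md5_digest(b) for b in bodies]
        # Compare each download's checksum with the next one.
        if any(a != b for a, b in zip(digests, digests[1:])):
            changed += 1
    return changed / len(snapshots)


if __name__ == "__main__":
    crawl = {
        "http://a.example/": [b"<html>v1</html>", b"<html>v1</html>"],
        "http://b.example/": [b"<html>v1</html>", b"<html>v2</html>"],
    }
    print(changed_fraction(crawl))  # 0.5
```

Because any single-byte difference alters the digest, this criterion treats a changed timestamp the same as a completely rewritten page. It records only whether a page changed, not how much, which is the limited sensitivity to change that this paper sets out to improve on.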
This paper expands on Cho and Garcia-Molina's study, both in terms of coverage and in terms of sensitivity to change. We crawled a set of 150,836,209 HTML pages once every week over a span of 11 weeks. For each page, we recorded a checksum of the page, a feature vector of the words on the page, and various other data such as the page length and the HTTP status code. Moreover, we pseudo-randomly selected 0.1% of all of our URLs and saved the full text of each download of the corresponding pages.
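The per-download bookkeeping described above can be sketched as follows. Several assumptions apply here. The paper does not say how its 0.1% sample was drawn, so this sketch hashes each URL with SHA-1 (a hypothetical choice) and keeps those that fall into 1 of 1,000 buckets. The "feature vector of the words" is approximated by a bag-of-words `Counter` over lowercase alphanumeric tokens after crudely stripping tags. `DownloadRecord`, `record_download`, and `in_full_text_sample` are hypothetical names.

```python
import hashlib
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical: keep URLs whose hash lands in 1 of 1000 buckets (0.1%).
SAMPLE_BUCKETS = 1000
SAMPLE_KEEP = 1


def in_full_text_sample(url: str) -> bool:
    """Deterministic pseudo-random selection of roughly 0.1% of URLs.

    Assumption: hash-based selection; the paper does not say how its
    sample was drawn. The same URL yields the same answer every week.
    """
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % SAMPLE_BUCKETS
    return bucket < SAMPLE_KEEP


@dataclass
class DownloadRecord:
    url: str
    status: int                  # HTTP status code
    length: int                  # page length in bytes
    checksum: str                # MD5 of the raw body
    word_vector: Counter         # word -> occurrence count (assumed form)
    full_text: Optional[bytes]   # stored only for sampled URLs


def record_download(url: str, status: int, body: bytes) -> DownloadRecord:
    """Build the record kept for one download of one page."""
    text = body.decode("utf-8", errors="replace").lower()
    text = re.sub(r"<[^>]*>", " ", text)  # crude tag stripping
    words = Counter(re.findall(r"[a-z0-9]+", text))
    return DownloadRecord(
        url=url,
        status=status,
        length=len(body),
        checksum=hashlib.md5(body).hexdigest(),
        word_vector=words,
        full_text=body if in_full_text_sample(url) else None,
    )
```

Selecting by URL hash instead of calling a random number generator on each download keeps the same subset of URLs in the sample across all weekly crawls. Without that, a page's full-text history would have gaps.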
After completion of the crawl, we analyzed the degree of change of each page, and investigated which factors are correlated with