A Large-Scale Study of the Evolution of Web Pages - Microsoft

Excerpt from the file (text format):

A Large-Scale Study of the Evolution of Web Pages
Dennis Fetterly
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
dennis.fetterly@hp.com

Mark Manasse
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
manasse@microsoft.com

Marc Najork
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
najork@microsoft.com

Janet Wiener
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
janet.wiener@hp.com

ABSTRACT

1. INTRODUCTION

How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated to any other property of the page? All of these questions are of interest to those who mine the web, including all the popular search engines, but few studies have been performed to date to answer them.
One notable exception is a study by Cho and Garcia-Molina, who crawled a set of 720,000 pages on a daily basis over four months, and counted pages as having changed if their MD5 checksum changed. They found that 40% of all web pages in their set changed within a week, and 23% of those pages that fell into the .com domain changed daily.
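The checksum criterion described above can be illustrated with a minimal sketch. It assumes the page body is available as raw bytes; the function names `md5_checksum` and `has_changed` are hypothetical and not taken from either study. Note that under this criterion any byte-level difference, however small, counts as a change.

```python
import hashlib


def md5_checksum(content: bytes) -> str:
    """Return the hexadecimal MD5 digest of a page body."""
    return hashlib.md5(content).hexdigest()


def has_changed(previous: bytes, current: bytes) -> bool:
    """Count a page as changed if its MD5 checksum differs between two downloads."""
    return md5_checksum(previous) != md5_checksum(current)


# Hypothetical page bodies from two successive downloads.
old_page = b"<html><body>Hello, web.</body></html>"
new_page = b"<html><body>Hello, web!</body></html>"
changed = has_changed(old_page, new_page)
```

A checksum comparison is binary: it cannot distinguish a one-character edit from a complete rewrite, which is the sensitivity limitation that a finer-grained measure would have to address.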
This paper expands on Cho and Garcia-Molina's study, both in terms of coverage and in terms of sensitivity to change. We crawled a set of 150,836,209 HTML pages once every week, over a span of 11 weeks. For each page, we recorded a checksum of the page and a feature vector of the words on the page, plus various other data such as the page length and the HTTP status code. Moreover, we pseudo-randomly selected 0.1% of all of our URLs, and saved the full text of each download of the corresponding pages.
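The text does not say how the 0.1% of URLs were selected. One common way to obtain a stable pseudo-random sample is to hash each URL and keep those whose hash falls into one bucket out of 1,000. The sketch below assumes that approach; the hash function, the constant `SAMPLE_MODULUS`, and the function `in_sample` are illustrative choices, not the authors' method.

```python
import hashlib

# Assumption: keep 1 URL in 1000, i.e. roughly 0.1% of the crawl.
SAMPLE_MODULUS = 1000


def in_sample(url: str) -> bool:
    """Decide deterministically whether a URL belongs to the sample.

    The decision depends only on the URL, so the same pages are
    selected on every weekly crawl without storing a list of them.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % SAMPLE_MODULUS
    return bucket == 0


# Hypothetical URLs, used only to show the sampling rate.
urls = [f"http://example.com/page/{i}" for i in range(200_000)]
sampled = [u for u in urls if in_sample(u)]
sample_rate = len(sampled) / len(urls)
```

Because membership is a pure function of the URL, the sample stays consistent across repeated downloads, which is what saving the full text of each download of the same pages requires.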
After completion of the crawl, we analyzed the degree of change of each page, and investigated which factors are correlated with
