A Large-Scale Study of the Evolution of Web Pages - Microsoft

Excerpt from the file (in text format):

A Large-Scale Study of the Evolution of Web Pages
Dennis Fetterly
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
dennis.fetterly@hp.com

Mark Manasse
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
manasse@microsoft.com

Marc Najork
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
najork@microsoft.com

Janet Wiener
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
janet.wiener@hp.com

ABSTRACT

1. INTRODUCTION

How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated to any other property of the page? All of these questions are of interest to those who mine the web, including all the popular search engines, but few studies have been performed to date to answer them.
One notable exception is a study by Cho and Garcia-Molina, who crawled a set of 720,000 pages on a daily basis over four months, and counted pages as having changed if their MD5 checksum changed. They found that 40% of all web pages in their set changed within a week, and 23% of those pages that fell into the .com domain changed daily.
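The checksum-based notion of change described above can be sketched as follows. This is a minimal illustration, not the code used in either study: the function names and the snapshot layout (a mapping from URL to raw page bytes) are assumptions made for the example.

```python
import hashlib


def md5_checksum(page_bytes: bytes) -> str:
    """Return the hexadecimal MD5 digest of a page's raw bytes."""
    return hashlib.md5(page_bytes).hexdigest()


def count_changed(previous: dict, current: dict) -> int:
    """Count URLs present in both snapshots whose MD5 checksum differs.

    `previous` and `current` map URL -> page bytes (hypothetical layout).
    URLs that appear in only one snapshot are ignored.
    """
    shared_urls = previous.keys() & current.keys()
    return sum(
        1
        for url in shared_urls
        if md5_checksum(previous[url]) != md5_checksum(current[url])
    )


# Two small, made-up snapshots: only "b" differs among the shared URLs.
day_one = {"a": b"<html>same</html>", "b": b"<html>old</html>"}
day_two = {"a": b"<html>same</html>", "b": b"<html>new</html>", "c": b"<html/>"}
changed = count_changed(day_one, day_two)
```

Under this definition a page counts as changed even if a single byte differs, so the measure says whether a page changed but not by how much.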
This paper expands on Cho and Garcia-Molina's study, both in terms of coverage and in terms of sensitivity to change. We crawled a set of 150,836,209 HTML pages once every week, over a span of 11 weeks. For each page, we recorded a checksum of the page and a feature vector of the words on the page, plus various other data such as the page length and the HTTP status code. Moreover, we pseudo-randomly selected 0.1% of all of our URLs and saved the full text of each download of the corresponding pages.
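One way to picture the per-page record and the 0.1% sample is the sketch below. The excerpt does not say how the pseudo-random selection was made, so hashing the URL and keeping one bucket out of 1,000 is an assumption, as are the names `in_sample`, `PageRecord`, and `record_download`. The feature vector is left out because the excerpt does not describe how it is computed.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

SAMPLE_MODULUS = 1000  # keeping 1 bucket in 1000 gives roughly 0.1% of URLs


def in_sample(url: str) -> bool:
    """Decide sample membership from a hash of the URL (assumed scheme).

    The decision depends only on the URL, so the same pages are selected
    on every weekly crawl.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SAMPLE_MODULUS == 0


@dataclass
class PageRecord:
    url: str
    status: int                  # HTTP status code
    length: int                  # page length in bytes
    checksum: str                # MD5 here; the excerpt does not name the function
    full_text: Optional[bytes]   # kept only for sampled URLs


def record_download(url: str, status: int, body: bytes) -> PageRecord:
    """Build the record stored for one download of one page."""
    return PageRecord(
        url=url,
        status=status,
        length=len(body),
        checksum=hashlib.md5(body).hexdigest(),
        full_text=body if in_sample(url) else None,
    )


record = record_download("http://example.com/", 200, b"<html>hello</html>")
```

Deriving membership from the URL rather than from a random number generator means no list of sampled URLs has to be stored between crawls.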
After completion of the crawl, we analyzed the degree of change of each page, and investigated which factors are correlated with
