A large-scale study of the world wide web: network correlation functions with scale-invariant boundaries
Institute for Theoretical Physics, Goethe University Frankfurt, 60054 Frankfurt, Germany
Received: 13 December 2012
Received in final form: 11 June 2013
Published online: 5 August 2013
We performed a large-scale crawl of the world wide web, covering 6.9 million domains and 57 million subdomains, including all high-traffic sites of the internet. We present a study of the correlations found between quantities measuring the structural relevance of each node in the network (the in- and out-degree, the local clustering coefficient, the first-neighbor in-degree and the Alexa rank). We find that some of these properties show strong correlation effects and that the dependencies arising from these correlations follow power laws not only for the averages, but also for the boundaries of the respective density distributions. In addition, these scale-free limits do not follow the same exponents as the corresponding averages. In our study we retain the directionality of the hyperlinks and develop a statistical estimate for the clustering coefficient of directed graphs. We include in our study the correlations between the in-degree and the Alexa traffic rank, a popular index for the traffic volume, finding non-trivial power-law correlations. We find that sites with more than about 10^3 links from different domains have remarkably different statistical properties from sites with fewer such links, for all correlation functions studied, pointing to an underlying hierarchical structure of the world wide web.
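As an illustration of what separate power-law fits for the average and for the boundary of a density distribution could look like, the following is a minimal sketch and not the procedure used in the study. The function names `fit_power_law` and `log_binned_stats`, the logarithmic binning, the use of the per-bin maximum as the upper boundary, and the synthetic data are all assumptions made for this example.

```python
import math
from collections import defaultdict


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space; returns (b, a)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return b, a


def log_binned_stats(pairs, bins_per_decade=4):
    """Group (x, y) pairs into logarithmic x-bins.

    Returns a list of (bin center, mean y, max y), sorted by x.
    The per-bin maximum is used here as a crude upper boundary (assumption).
    """
    bins = defaultdict(list)
    for x, y in pairs:
        bins[math.floor(math.log10(x) * bins_per_decade)].append(y)
    stats = []
    for k in sorted(bins):
        center = 10 ** ((k + 0.5) / bins_per_decade)
        ys = bins[k]
        stats.append((center, sum(ys) / len(ys), max(ys)))
    return stats


# Synthetic (x, y) pairs standing in for, e.g., (in-degree, second quantity).
# Purely illustrative: nine points near x**0.5 and one point at x per x value.
pairs = []
for x in (3, 30, 300, 3000):
    pairs.extend((x, x ** 0.5) for _ in range(9))
    pairs.append((x, float(x)))

stats = log_binned_stats(pairs, bins_per_decade=1)
centers = [s[0] for s in stats]
mean_exponent, _ = fit_power_law(centers, [s[1] for s in stats])
bound_exponent, _ = fit_power_law(centers, [s[2] for s in stats])
print("exponent of the average: ", mean_exponent)
print("exponent of the boundary:", bound_exponent)
```

In this toy data set the two fitted exponents need not coincide, which mirrors the qualitative statement that the boundaries and the averages can scale differently. On real data a quantile may be a more robust boundary estimate than the raw maximum.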
Key words: Statistical and Nonlinear Physics
© EDP Sciences, Società Italiana di Fisica and Springer-Verlag, 2013