An Analysis of Internet Content Delivery Systems
Saroiu, Gummadi, Dunn, Gribble, Levy
content-delivery cdn dissemination internet p2p usage web monitoring
@inproceedings{saroiu:osdi-2002,
title={An Analysis of Internet Content Delivery Systems},
author={Saroiu, S. and Gummadi, K.P. and Dunn, R.J.
and Gribble, S.D. and Levy, H.M.},
booktitle={Symposium on Operating Systems Design and Implementation},
pages={315--328},
month={December},
year={2002},
location={Boston, MA}
}
Substantial amount of data collected from the University of Washington (UW) network
- Looking at traffic and usage patterns
- HTTP Web Traffic
- Akamai
- Kazaa
- Gnutella
- E.g., average Kazaa user consumes 90 times more bandwidth than the average web user
- Most Web objects are small (5--10 KB), but the size distribution is heavy-tailed, so large objects exist
- Web objects & servers accessed with Zipf popularity distribution
Web caching helps alleviate network and server loads
- Cache hit rates of 40--50% on Web traffic may be achieved
- Hit rate increases only logarithmically with user growth
- Constrained by dynamic content
- Web content delivery networks work mostly through DNS interposition or URL rewriting
- Do reduce average download times, but DNS redirection adds latency
- It is possible that they merely avoid the worst service, rather than select the optimal one
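
The hit-rate notes above (Zipf popularity, hit rate growing only logarithmically with the user population) can be illustrated with a small sketch. It is not the paper's methodology: it assumes independent requests drawn from a fixed Zipf distribution, an infinite shared cache, and no dynamic or uncacheable content. The object count, the exponent, and the function names are all hypothetical.

```python
def zipf_probabilities(num_objects, alpha=1.0):
    """Request probability of each object under Zipf popularity (rank 1 = most popular)."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_hit_rate(probabilities, prior_requests):
    """Probability that the next request hits an infinite shared cache.

    Assumes each of the `prior_requests` earlier requests was drawn
    independently from `probabilities`; an object is cached once it has
    been requested at least once.
    """
    return sum(p * (1.0 - (1.0 - p) ** prior_requests) for p in probabilities)


# Hypothetical universe of 10,000 objects; more users means more prior requests.
probs = zipf_probabilities(10_000)
request_counts = [0, 100, 1_000, 10_000, 100_000]
hit_rates = [expected_hit_rate(probs, n) for n in request_counts]

for n, h in zip(request_counts, hit_rates):
    print(f"{n:>7} prior requests -> expected hit rate {h:.3f}")
```

Under these assumptions each tenfold increase in requests adds a roughly similar increment to the hit rate, which is the diminishing-returns behavior the notes describe.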
P2P usage patterns are very different from Web use
- Tend more toward non-interactive batch downloads, larger object transfers---three orders of magnitude larger than Web objects
- Most providers are end users with low availability and limited network resources
Several common P2P search structures: Centralized, (overlay) broadcast, super-peers (hybrid)
- Most download directly from provider; some download fragments in parallel from multiple providers (BitTorrent, Kazaa)
- Request rate is low compared to the Web, but transfers last roughly 1000x longer, so many more P2P connections are live at any point in time
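
The relationship between request rate, transfer duration, and live connections follows Little's law (average concurrency = arrival rate x average duration, in steady state). A minimal sketch is below. Only the 1000x duration ratio comes from the notes; the request rates and the 1-second Web transfer time are made-up illustrative values.

```python
def concurrent_connections(requests_per_second, mean_duration_seconds):
    """Little's law: average number of in-flight transfers in steady state."""
    return requests_per_second * mean_duration_seconds


# Hypothetical workload: Web requests arrive 100x more often than P2P requests,
# but each P2P transfer lasts 1000x longer (the ratio reported in the notes).
web_rate, web_duration = 100.0, 1.0              # assumed values
p2p_rate, p2p_duration = 1.0, 1000.0 * web_duration

web_live = concurrent_connections(web_rate, web_duration)
p2p_live = concurrent_connections(p2p_rate, p2p_duration)

print(f"Web: {web_live:.0f} live connections on average")
print(f"P2P: {p2p_live:.0f} live connections on average")
```

With these assumed numbers, P2P has ten times as many live connections as the Web despite a request rate one hundred times lower.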
Gnutella network does not restructure according to network topology, causing many queries to go outside the immediate area
In theory, Web traffic hits many popular sites hard, while P2P traffic has fewer hotspots and a wider distribution of load
- This is not true in practice
Based on this study at UW
- P2P traffic outweighs Web traffic by a factor of three
- P2P nodes consume bandwidth in both directions
- Not obvious that P2P traffic scales well at all
- Its use of the network is so intense that it can easily overwhelm available resources
- A small number of objects account for a disproportionate amount of the bandwidth used
- Placing P2P caches at gateways could provide substantial savings on inbound/outbound bandwidth consumption
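
To make the caching argument concrete, here is a rough sketch of how the byte savings of an ideal gateway cache could be estimated from a trace. It assumes an unlimited cache that fetches each object once and serves every repeat locally. The object sizes and request counts are invented for illustration and are not taken from the UW trace.

```python
def ideal_cache_savings(objects):
    """Return (total_bytes, saved_bytes) for an ideal, unlimited gateway cache.

    `objects` is a list of (size_in_bytes, request_count) pairs. Every
    request after the first for a given object is assumed to be a cache hit.
    """
    total_bytes = sum(size * count for size, count in objects)
    saved_bytes = sum(size * (count - 1) for size, count in objects if count > 0)
    return total_bytes, saved_bytes


# Hypothetical trace: a few large, popular objects dominate the bytes moved.
trace = [
    (700_000_000, 50),    # large video, requested 50 times
    (4_000_000, 200),     # audio file, requested 200 times
    (5_000, 1000),        # small Web-sized object, requested 1000 times
    (300_000_000, 1),     # large object requested once (never a hit)
]

total_bytes, saved_bytes = ideal_cache_savings(trace)
byte_hit_rate = saved_bytes / total_bytes
print(f"total {total_bytes} B, saved {saved_bytes} B, byte hit rate {byte_hit_rate:.3f}")
```

In this made-up trace the single popular large object accounts for nearly all of the savings, which is the kind of concentration that makes a gateway cache attractive.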