Papers /

Cuenca Acuna-HPDC 2003

Reading

Outdoors

Games

Hobbies

LEGO

Food

Code

Events

Nook

sidebar

Cuenca Acuna-HPDC 2003

PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities

Cuenca-Acuna, Peery, Martin, Nguyen

peer-to-peer gossip content addressable publish/subscribe

@inproceedings{cuenca-acuna:HPDC-2003,
  author={Francisco Matias Cuenca-Acuna and Christopher Peery 
          and Richard P. Martin and Thu D. Nguyen},
  title={{PlanetP}: Using Gossiping to Build Content Addressable Peer-to-Peer
	 Information Sharing Communities},
  booktitle={Twelfth IEEE International Symposium on High
	     Performance Distributed Computing ({HPDC}-12)},
  publisher={IEEE Press},
  year=2003,
  month=June,
  pages={236--246}
}

PlanetP: Content addressable publish/subscribe for unstructured P2P

  • Gossip layer globally replicates membership directory and content index
    • Focus on loose consistency, rather than precise consistency
  • Search algorithm helps users find content
  • Peers only exchange periodic, randomized, point-to-point messages w/ peers

Tries to hit slightly different target usage than other P2P schemes

  • Focus on unstructured communities because structure, e.g. DHT, requires much effort to stabilize
  • Focus on thousands of users because both smaller and larger group sizes are well studied
    • Enables some things, such as content ranking, which are more difficult at larger scale

Key metrics/questions:

  • How effective is the search and ranking algorithm?
  • Can a usable level of consistency be maintained given the random gossip process and variable network size
  • Bandwidth and storage consumption?

Gossip protocol extends Demers algorithm w/ partial entropy algorithm

  • Core goal is synchronizing globally replicated data structure
    • If x learns of a change, it pushes it out every T_g seconds to a randomly chosen peer
      • Global member directory is itself replicated globally by gossiping
    • If y has not seen the rumor it alters its data and similarly begins to push the change
    • Pushing stops after n peers have been contacted that have already received the change
    • Each node also pulls data every T_r rounds from a random peer, to prevent data from not being spread to every node
  • PlanetP extends this to include a digest of a small number m of local objects when a node receives a push and replies back whether they have received the update or not
    • If a node is not actively pushing an update, it slowly increases its intervals up, then resets them when an update is received

Content is published via XML documents which point to the actual files (unless they're very small and embedded)

  • Large files are served up by a simple web server on each host
  • Inverted index is maintained for local documents
  • Searcher matches query against global index, contacts k peers according to their rank, retrieves all matching results from those peers

An approximation of TFxIDF is used to supply weights to a vector space content similarity/ranking scheme

  • Combines the frequency of a term in a document with the frequency of the term globally
  • Removing stop words and using stems ("run" instead of "running") also helps
  • Approximated by maintaining over peers and peer count rather than documents

Uses Bloom filter to summarize set of terms in local index

  • Can be compressed, versioned
  • Tradeoff accuracy for storage on memory constrained devices by combining filters

Notes that documents may not be shared evenly---on many P2P networks, a small number of peers provide most of the documents

Node peering is handled in a straight forward way

  • When a node joins, the update to the global index is gossiped out
  • When a node leaves, nodes that realize this mark it as down in their index
    • However, they do not push that information out
    • Once it has not been re-pushed as back online for some time interval, marked down entries are removed from the local indexes

After a point, new content will likely not introduce new terms

Worst case for PlanetP is when a bunch of new peers join a group, e.g. the community becomes popular

Must take into account bandwidth resources of the nodes

  • Proxying: Enable peers with better resources to conduct searches for lesser resources
  • Modifying gossip protocol to bias toward peers of similar connectivity
    • When pushing update, fast peers choose slow peers only a small percentage of the time
    • Slow nodes always choose slow peers unless it is the source of the update, in which case it chooses a fast peer as the initial target

Related work

  • Very hard to do efficient querying over DHTs due to key/value relationship

Discusses extended to larger number of nodes by having groups operate as described here, then share attenuated Bloom filters between groups

  • Gossip occassionally spills over into other groups
  • When searching, may contact a peer in a group with matching terms in the attenuated filter and have it conduct a search, returning a list of matching documents

Of note:

  • References several text collections used in precision/recall tests
Recent Changes (All) | Edit SideBar Page last modified on February 03, 2009, at 01:10 PM Edit Page | Page History