Demers-PDC 1987

Epidemic Algorithms for Replicated Database Maintenance

Demers, Greene, Hauser, Irish, Larson, Shenker, Sturgis, Swinehart, and Terry

epidemic gossip database consistency replication

@inproceedings{demers:pdc-1987,
  title={Epidemic Algorithms for Replicated Database Maintenance},
  author={Alan Demers and Dan Greene and Carl Hauser and Wes Irish and
          John Larson and Scott Shenker and Howard Sturgis and
          Dan Swinehart and Doug Terry},
  booktitle={Sixth Annual {ACM} Symposium on
             Principles of Distributed Computing},
  pages={1--12},
  year={1987},
  publisher={ACM},
  address={New York, NY, USA}
}

Database replicated at many sites, wish to maintain mutual consistency

  • Network of hundreds or thousands of sites
  • Heterogeneous, slightly unreliable, slowly changing network
  • Updates injected at a single site; propagated or supplanted
  • Relaxed form of consistency: Fully consistent only when updates stop and sites become quiescent; in the meantime, most data is up to date
  • Minimal requirements placed on network layer

Application: Distributed, locally-usable, replicated global DNS

Metrics

  • Time required for updates to reach all sites
  • Volume of traffic generated in propagating update; ideally proportional to size of the update times the number of servers
  • Residue: How many sites never receive the update?
  • Traffic: Messages sent from site to site
  • Delay (average): Time before the update arrives at a site, averaged over all sites
  • Delay (last): Time until the last site that will ever receive the update receives it (computing these is sketched below)
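
A minimal sketch in Python of how these metrics might be computed from a simulated run; the Trace shape and all names here are mine, not the paper's:

from dataclasses import dataclass, field

@dataclass
class Trace:
    n_sites: int                                 # total number of sites
    messages: int                                # traffic: site-to-site messages sent
    arrival: dict = field(default_factory=dict)  # site id -> time update arrived

def residue(t: Trace) -> int:
    """Sites that never receive the update."""
    return t.n_sites - len(t.arrival)

def t_avg(t: Trace) -> float:
    """Average arrival time, over the sites that received the update."""
    return sum(t.arrival.values()) / len(t.arrival)

def t_last(t: Trace) -> float:
    """Time at which the last site that ever gets the update got it."""
    return max(t.arrival.values())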

Approaches investigated:

  • Direct mail: Updates sent directly to all other sites; timely and not too inefficient, but every site must know every other site, and lost updates are not addressed
  • Anti-entropy: Sites regularly choose other sites at random and synchronize databases; very reliable, but too communication-costly to run frequently (sketched below)
  • Rumor mongering: Sites periodically pick other sites and check whether they have recent updates; a site stops trying to spread an update after some fraction of contacted peers already have it; under a strict stopping scheme, there is some chance not every site gets the update
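
A minimal sketch of one anti-entropy cycle, assuming each replica is a dict mapping keys to (value, timestamp) pairs and that newer timestamps win; all names are illustrative, not the paper's:

import random

def resolve(db_a: dict, db_b: dict) -> None:
    """Push-pull merge: afterwards both replicas agree on every key."""
    for key in db_a.keys() | db_b.keys():
        copies = [c for c in (db_a.get(key), db_b.get(key)) if c is not None]
        newest = max(copies, key=lambda c: c[1])   # compare timestamps
        db_a[key] = db_b[key] = newest

def anti_entropy_cycle(sites: list) -> None:
    """Each site synchronizes with one uniformly random partner."""
    for i, db in enumerate(sites):
        j = random.choice([k for k in range(len(sites)) if k != i])
        resolve(db, sites[j])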

Terminology adopted from epidemiology

  • Infective: Peer has update to share
  • Susceptible: Peer out of date
  • Removed: Has update, not sharing

Uniform random partner selection results in high traffic; choose w/ bias for nearby servers?

Anti-entropy alone still created too much traffic

  • Remedies explored: Spatial distributions and rumor mongering

Other approach: Hierarchical scheme, a la DNS

  • Peers push updates down the hierarchy, each taking responsibility for its own subtree

Direct mail generates n-1 messages per update, where n is the number of sites

  • Traffic generated is proportional to the sum of the path lengths to the n nodes

Epidemic theory shows that a simple epidemic starting from one infective site spreads to the entire population in time proportional to the log of the population size

  • Simple epidemic: Sites are always either susceptible or infective
  • Constant factors depend on whether updates are pushed, pulled, or push-pulled
    • Push: If s has newer copy than s', push to chosen site
    • Pull: If s has older copy than s', pull from chosen site
    • Push-Pull: If s has newer copy than s', push to chosen site; if s has older copy than s', pull from chosen site; otherwise, nothing
    • The difference here is simple but can have a large impact (see the sketch after this list)
    • E.g., pull and push-pull are preferred if most sites receive updates through some other mechanism, e.g. broadcast
    • If updates are plentiful, a pull is likely to hit a peer that has new updates
    • If the network is quiet, push generates no further traffic once an update has fully propagated
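
A minimal sketch of the three exchange styles for a single item, plus a toy push epidemic illustrating the logarithmic-time claim; the (value, timestamp) representation and all names are assumptions of mine:

import random

def push(s: dict, s2: dict) -> None:
    if s["ts"] > s2["ts"]:            # s has the newer copy: send it
        s2.update(s)

def pull(s: dict, s2: dict) -> None:
    if s["ts"] < s2["ts"]:            # s is out of date: fetch s2's copy
        s.update(s2)

def push_pull(s: dict, s2: dict) -> None:
    if s["ts"] > s2["ts"]:
        s2.update(s)
    elif s["ts"] < s2["ts"]:
        s.update(s2)                  # equal timestamps: do nothing

def rounds_to_infect_all(n: int) -> int:
    """Simple push epidemic from one infective site."""
    infective = {0}
    rounds = 0
    while len(infective) < n:
        for _ in list(infective):
            infective.add(random.randrange(n))   # push to a random partner
        rounds += 1
    return rounds

E.g., rounds_to_infect_all(1000) typically lands in the mid-teens, consistent with growth like log n rather than n.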

Naive checksumming/hashing of DB as a quick consistency check is impractical

  • Sites may be only slightly out of date, yet their checksums will mismatch even though a full DB comparison shouldn't be needed
  • Instead, peers should exchange recent update lists (sketched below)
    • Peers update their DBs according to the list, compare checksums, then figure out what else is missing only if the checksums still disagree
    • The update-list window should be long enough for an update to reach every site
  • Alternatively, if an inverse index keyed by timestamp is maintained, peers may compare databases in reverse timestamp order
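
A minimal sketch of the cheap check, assuming each site keeps a log of (key, value, timestamp) updates alongside its database; the digest choice and field names are mine:

import hashlib

def checksum(db: dict) -> str:
    """Digest over the whole database in a canonical order."""
    return hashlib.sha256(repr(sorted(db.items())).encode()).hexdigest()

def apply_update(db: dict, key, value, ts) -> None:
    if key not in db or db[key][1] < ts:
        db[key] = (value, ts)

def exchange(a: dict, b: dict, now: float, window: float) -> bool:
    """Apply each other's recent updates, then compare checksums.
    Returns True if the cheap check suffices; False means the peers
    must still fall back to a full database comparison."""
    for dst, src in ((a, b), (b, a)):
        for key, value, ts in src["log"]:
            if ts >= now - window:        # only "recent" updates
                apply_update(dst["db"], key, value, ts)
    return checksum(a["db"]) == checksum(b["db"])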

Rumor mongering

  • Feedback (normal): When a peer contacts a peer that already has the update, it stops spreading that update with probability 1/k (variants sketched after this list)
  • Blind: A peer stops spreading the update with probability 1/k after every contact, regardless of feedback
  • Counter: Stop spreading the update after contacting k peers that already have it
  • Experimentally, these variations differ only in delay, not coverage; counters and feedback (rather than blind) reduce delay
  • Minimization: Peers exchange counters; if both already have the update, only the peer with the smaller counter increments (both if equal); produces the best coverage
  • Connection limit: Limit how many sites may contact a peer in a given cycle
    • Push is much better in this case, as a site won't receive the same update redundantly within the same cycle
    • Even so, it's probably worth limiting to 1 connection regardless of provisioning?
  • Hunting: After being rejected, a peer continues searching for a connectable site
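
A minimal sketch of the three stopping rules for a single rumor; the per-site fields (has_update, infective, hits) are my own naming:

import random

def contact(sender: dict, receiver: dict, k: int, variant: str) -> None:
    already_knew = receiver["has_update"]
    receiver["has_update"] = True          # the rumor always transfers
    if variant == "blind":
        # Lose interest with probability 1/k after every contact.
        if random.random() < 1 / k:
            sender["infective"] = False
    elif variant == "feedback":
        # Lose interest with probability 1/k only on a redundant contact.
        if already_knew and random.random() < 1 / k:
            sender["infective"] = False
    elif variant == "counter":
        # Stop only after k redundant contacts.
        if already_knew:
            sender["hits"] += 1
            if sender["hits"] >= k:
                sender["infective"] = False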

Cannot delete an item simply by removing it; other sites will propagate it right back, treating the absent item as one they had never heard of

  • Deletions managed by spreading death certificates
  • But this raises the problem of when to delete the death certificates themselves
    • A distributed snapshot, ensuring all sites have seen the deletion, is one mechanism, but can be costly
    • Easier to simply set a very long threshold beyond which death certificates are culled (sketched below)
      • Some sites could hold onto "dormant" death certificates long past the threshold, and repropagate them if they ever see the supposedly deleted item circulating again
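
A minimal sketch of deletion via death certificates, with plain timestamps standing in for the paper's clocks; field names and the threshold are illustrative:

def delete(db: dict, certs: dict, key, now: float) -> None:
    """Remove the item but leave a death certificate in its place."""
    db.pop(key, None)
    certs[key] = now

def apply_update(db: dict, certs: dict, key, value, ts: float) -> None:
    """An incoming update loses to a newer death certificate."""
    if key in certs and certs[key] >= ts:
        return                            # item is dead; don't resurrect it
    if key not in db or db[key][1] < ts:
        db[key] = (value, ts)

def cull(certs: dict, now: float, threshold: float) -> None:
    """Drop certificates old enough that the deletion has surely spread."""
    for key in [k for k, t in certs.items() if now - t > threshold]:
        del certs[key]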

Accounting for spatial distribution significantly reduces traffic across critical links

  • This matters much more than average traffic across all links
  • Let each site build a list of the other sites sorted by distance, and bias the probability of choosing a partner by a function of that distance (sketched below)
    • Choose uniformly within groups of equidistant peers
  • But k needs to be adjusted upward to ensure enough distinct peers are being checked
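
A minimal sketch of distance-biased partner selection, using a d**(-a) falloff as one plausible spatial distribution; the exponent and all names are illustrative:

import random

def choose_partner(me: int, n: int, distance, a: float = 2.0) -> int:
    """Pick a partner with probability falling off as distance**(-a);
    equidistant sites get equal weight, i.e. uniform within each group."""
    others = [s for s in range(n) if s != me]
    weights = [distance(me, s) ** -a for s in others]   # nearer = likelier
    return random.choices(others, weights=weights, k=1)[0]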

Of note:

  • Boris Pittel. On Spreading a Rumor. SIAM Journal on Applied Mathematics, 47(1):213--223, 1987.
  • J. C. Frauenthal. Mathematical Modeling in Epidemiology, pages 12--24. Springer-Verlag, 1980.
  • M. Ben-Or. Fast Asynchronous Byzantine Agreement. Fourth Annual ACM Symposium on Principles of Distributed Computing, pages 149--151, 1985.
  • Birrell, Levin, Needham, and Schroeder. Grapevine: An Exercise in Distributed Computing. CACM, 25(4):260--274, 1982.