Epidemic Algorithms for Replicated Database Maintenance
Demers, Greene, Hauser, Irish, Larson, Shenker, Sturgis, Swinehart, and Terry
epidemic gossip database consistency replication
@inproceedings{demers:pdc-1987,
title={Epidemic Algorithms for Replicated Database Maintenance},
author={Alan Demers and Dan Greene and Carl Hauser and Wes Irish and
John Larson and Scott Shenker and Howard Sturgis and
Dan Swinehart and Doug Terry},
booktitle={Sixth Annual {ACM} Symposium on
Principles of Distributed Computing},
pages={1--12},
year={1987},
publisher={ACM},
address={New York, NY, USA}
}
Database replicated at many sites, wish to maintain mutual consistency
- Network of hundreds or thousands of sites
- Heterogeneous, slightly unreliable, slowly changing network
- Updates injected at a single site must be propagated to all other sites or supplanted by later updates
- Relaxed form of consistency: Fully consistent only when updates stop and sites become quiescent; otherwise, most data is up to date at most sites
- Minimal requirements placed on network layer
Application: Distributed, locally-usable, replicated global name service (the Xerox Clearinghouse, a DNS-like directory)
Metrics
- Time required for updates to reach all sites
- Volume of traffic generated in propagating update; ideally proportional to size of the update times the number of servers
- Residue: Fraction of sites that remain susceptible (never receive the update) when the epidemic finishes
- Traffic: Messages sent from site to site
- Delay (t_avg): Average time before the update has arrived at a site, averaged over all sites
- Delay (t_last): Time until the last site that will receive the update receives it
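A minimal sketch of computing these metrics from a finished run; the trace format (site id -> cycle of arrival) is invented for illustration, not from the paper:

    def residue(n_sites, arrivals):
        # Fraction of sites the update never reached
        return (n_sites - len(arrivals)) / n_sites

    def t_avg(arrivals):
        # Average arrival time over the sites that did receive the update
        return sum(arrivals.values()) / len(arrivals)

    def t_last(arrivals):
        # Arrival time at the last site that received the update
        return max(arrivals.values())

    arrivals = {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}   # site id -> cycle
    print(residue(6, arrivals), t_avg(arrivals), t_last(arrivals))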
Approaches investigated:
- Direct mail: Updates sent directly to all other sites; timely and not too inefficient, but every site must know all other sites, and it doesn't address lost updates
- Anti-entropy: Sites regularly choose other sites at random and synchronize full databases; very reliable, but cannot be used too frequently as it's communication-costly
- Rumor mongering: Sites periodically pick other sites and check whether they have recent updates; a site stops trying to spread an update after some fraction of checked peers already have it; under a strict stopping scheme, there is some chance not all sites will get the update
Terminology adopted from epidemiology
- Infective: Peer has update to share
- Susceptible: Peer out of date
- Removed: Has update, not sharing
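The terminology maps directly into code; a trivial, purely illustrative sketch:

    from enum import Enum

    class State(Enum):
        SUSCEPTIBLE = "susceptible"  # has not yet received the update
        INFECTIVE = "infective"      # holds the update and is actively sharing it
        REMOVED = "removed"          # holds the update but has stopped sharing it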
Uniform random partner selection results in high traffic; choose w/ bias for nearby servers?
Anti-entropy alone still created too much traffic
- Remedies explored: spatial distributions and rumor mongering
Other approach: Hierarchical scheme, a la DNS
- Peers push updates down the hierarchy, each node taking responsibility for propagating to its own subtree
Direct mail generates n - 1 messages per update, where n is the number of sites
- Traffic generated is proportional to the sum of the network path lengths from the source to the other n - 1 sites
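A minimal direct-mail sketch; the send callback and site list are placeholders:

    def direct_mail(update, sites, origin, send):
        # Mail the update straight to every other site: n - 1 messages.
        # If the site list is stale or a send is lost, some sites never
        # hear the update; nothing here retries or verifies delivery.
        for site in sites:
            if site != origin:
                send(site, update)

    # e.g. direct_mail("new-value", range(5), 0, lambda s, u: print(s, u))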
Epidemiology shows that a simple epidemic starting from one infective site spreads to the entire population in expected time proportional to the log of the population size
- Simple epidemic: Sites are always either susceptible or infective
- Constant factors depend on whether updates are pushed, pulled, or push-pulled
- Push: If s has a newer copy than chosen site s', push it to s'
- Pull: If s has an older copy than chosen site s', pull the newer copy from s'
- Push-pull: If s has a newer copy than s', push it; if s has an older copy, pull; otherwise, do nothing
- The difference here is simple but clearly has large impacts (see the sketch after this list)
- E.g., pull and push-pull are preferred if most sites get updates through some other mechanism, e.g. broadcast: the few remaining susceptible sites quickly find an up-to-date partner
- If updates are plentiful, pull is likely to hit a peer that has new updates
- If the network is quiet, push generates almost no traffic once an update has propagated, whereas pull keeps polling
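A runnable sketch of the three styles; the data layout (each database a dict of key -> (value, timestamp)) is invented. Per the paper, once most sites are infective, pull leaves p_{i+1} = p_i^2 of sites susceptible after each cycle, versus roughly p_i/e for push, which is why pull and push-pull finish the tail faster:

    import random

    def anti_entropy_cycle(dbs, mode):
        # One cycle: every site resolves differences with a random partner
        n = len(dbs)
        for s in range(n):
            t = random.randrange(n - 1)
            if t >= s:
                t += 1                      # any partner except s itself
            for key in set(dbs[s]) | set(dbs[t]):
                mine = dbs[s].get(key, (None, -1))
                theirs = dbs[t].get(key, (None, -1))
                if mode in ("push", "push-pull") and mine[1] > theirs[1]:
                    dbs[t][key] = mine      # s has the newer copy: push it
                if mode in ("pull", "push-pull") and mine[1] < theirs[1]:
                    dbs[s][key] = theirs    # s has the older copy: pull it

    def cycles_until_consistent(n, mode, seed=1):
        random.seed(seed)
        dbs = [dict() for _ in range(n)]
        dbs[0]["x"] = ("v1", 1)             # update injected at one site
        cycles = 0
        while any("x" not in db for db in dbs):
            anti_entropy_cycle(dbs, mode)
            cycles += 1
        return cycles

    for mode in ("push", "pull", "push-pull"):
        print(mode, cycles_until_consistent(1000, mode))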
Naive checksumming/hashing of DB as a quick consistency check is impractical
- Nodes may be just slightly out of date, making their checksums mismatch and forcing a full DB comparison that shouldn't be necessary
- Instead, peers should exchange recent update lists
- Peers update their DBs according to the list, compare checksums, then figure out what else is missing only if necessary (sketched below)
- Update list window should be long enough for an update to hit every host
- Alternatively, if maintaining an inverted index keyed by timestamp, peers may compare databases in reverse timestamp order until checksums agree
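A sketch of the cheap-comparison protocol, with invented structures (entries are (value, timestamp) pairs): exchange recent updates first, then compare checksums, and only walk the full databases on a mismatch:

    import hashlib

    def checksum(db):
        # Order-independent digest of the full database (illustrative)
        h = hashlib.sha256()
        for key in sorted(db):
            h.update(repr((key, db[key])).encode())
        return h.digest()

    def recent_updates(db, now, window):
        # Updates newer than the window; the window must exceed the
        # expected time for an update to reach every site
        return {k: v for k, v in db.items() if v[1] >= now - window}

    def apply_updates(db, updates):
        for k, v in updates.items():
            if k not in db or db[k][1] < v[1]:      # newer timestamp wins
                db[k] = v

    def exchange(db_a, db_b, now, window):
        # 1. Swap and apply recent-update lists (cheap)
        apply_updates(db_a, recent_updates(db_b, now, window))
        apply_updates(db_b, recent_updates(db_a, now, window))
        # 2. Compare checksums; fall back to a full comparison
        #    (a brute-force merge here, for brevity) only on a mismatch
        if checksum(db_a) != checksum(db_b):
            apply_updates(db_a, dict(db_b))
            apply_updates(db_b, dict(db_a))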
Rumor mongering variations
- Feedback ("normal"): When a peer talks to a peer that already has an update, with probability 1/k it stops spreading the update
- Blind: With probability 1/k the peer stops spreading the update after each contact, regardless of the recipient's state
- Counter: Stop spreading an update after contacting k peers that already have it
- Experimentally, these variations differ mainly in delay, not coverage; counters and feedback (rather than blind coins) reduce delay (coin variant sketched after this list)
- Minimization: Peers exchange counters; if both already have update, only the peer with smaller counter increments (both if equal); produces best coverage
- Connection limit: Limit how many nodes may contact a peer in a given cycle
- Push is much better in this case, as a site won't receive the same update redundantly within the same cycle
- Even so, it's probably worth limiting to 1 connection per cycle regardless of provisioning?
- Hunting: After being rejected, peer continues to search for updateable hosts
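A sketch of push rumor mongering with the feedback + coin rule (lose interest with probability 1/k on a redundant contact); the counter variant would instead deactivate after k redundant contacts. All structure is invented for illustration:

    import random

    def rumor_monger(n, k, seed=1):
        # Returns (residue, cycles until no site is actively spreading)
        random.seed(seed)
        has_update = [False] * n
        active = [False] * n          # infective vs. removed
        has_update[0] = active[0] = True
        cycles = 0
        while any(active):
            cycles += 1
            # Snapshot the infective sites so new ones wait a cycle
            for s in [i for i in range(n) if active[i]]:
                t = random.randrange(n - 1)
                if t >= s:
                    t += 1
                if has_update[t]:
                    # Redundant contact: lose interest w.p. 1/k
                    if random.random() < 1.0 / k:
                        active[s] = False
                else:
                    has_update[t] = True    # t becomes infective
                    active[t] = True
        return has_update.count(False) / n, cycles

    print(rumor_monger(1000, k=2))   # higher k: lower residue, more traffic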
Cannot delete an item simply by removing it---other sites will resurrect it through anti-entropy, since an absent item is indistinguishable from one never heard of
- Deletions are instead managed by spreading death certificates (sketched after this list)
- But this raises the problem of when the death certificates themselves can be deleted
- A distributed snapshot, ensuring all sites have seen the certificate, is one mechanism, but can be costly
- Easier to simply set a very long threshold beyond which death certificates are culled
- Some sites could hold dormant death certificates long past the threshold and repropagate them if they ever see anyone propagating the supposedly deleted item
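A sketch of deletion via death certificates; the representation (a tombstone sentinel value plus timestamp) and the threshold are invented:

    TOMBSTONE = "<deleted>"        # illustrative sentinel value
    CULL_AFTER = 30 * 24 * 3600    # illustrative threshold, e.g. 30 days

    def delete(db, key, now):
        # Write a death certificate rather than removing the item; a bare
        # removal would be undone by anti-entropy with any peer that
        # still holds the old item.
        db[key] = (TOMBSTONE, now)

    def merge(db, other):
        # Anti-entropy merge: newer timestamp wins, certificates included,
        # so a certificate beats any older write of the same item
        for k, (val, ts) in other.items():
            if k not in db or db[k][1] < ts:
                db[k] = (val, ts)

    def cull(db, now):
        # Drop certificates older than the threshold; the paper's dormant
        # death certificates would instead survive at a few sites, to be
        # reactivated if the deleted item ever reappears
        stale = [k for k, (val, ts) in db.items()
                 if val == TOMBSTONE and now - ts > CULL_AFTER]
        for k in stale:
            del db[k]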
Accounting for spatial distribution significantly reduces traffic across critical links
- This matters much more than average traffic across all links
- Let each site build a list of other sites with distances; bias the probability of choosing a partner by a function of distance, e.g. proportional to 1/d^a (sketched below)
- Choose uniformly within groups of equidistant peers
- But k needs to be adjusted upward to ensure enough distinct peers are being checked
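A sketch of distance-biased partner selection with probability proportional to 1/d^a, as the paper studies; the site ids and distance table are invented:

    import random

    def pick_partner(site, distances, a=2.0):
        # Choose a gossip partner with probability proportional to 1/d^a,
        # where d = distances[site][other] and d >= 1. a = 0 recovers
        # uniform selection; larger a favors nearby sites.
        others = [s for s in distances[site] if s != site]
        weights = [distances[site][s] ** -a for s in others]
        return random.choices(others, weights=weights)[0]

    # Tiny example: three sites on a line, distances 1 and 2 from site 0
    distances = {
        0: {0: 0, 1: 1, 2: 2},
        1: {0: 1, 1: 0, 2: 1},
        2: {0: 2, 1: 1, 2: 0},
    }
    print(pick_partner(0, distances))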
Of note:
- Boris Pittel. On Spreading a Rumor. SIAM Journal on Applied Mathematics, 47(1):213--223, 1987.
- J. C. Frauenthal. Mathematical Modeling in Epidemiology. Pages 12--24. Springer-Verlag, 1980.
- M. Ben-Or. Fast Asynchronous Byzantine Agreement. 4th ACM Symposium on Principles of Distributed Computing, pages 149--151, 1985.
- A. Birrell, R. Levin, R. Needham, and M. Schroeder. Grapevine: An Exercise in Distributed Computing. CACM, 25(4):260--274, 1982.