Space/Time Trade-Offs in Hash Coding with Allowable Errors
Bloom
bloom filters dictionaries hash coding set membership
@article{bloom:cacm-1970,
author={Bloom, B.H.},
title={Space/Time Trade-Offs in Hash Coding with Allowable Errors},
journal={Communications of the ACM},
volume={13},
number={7},
year={1970},
pages={422--426},
publisher={ACM},
address={New York, NY, USA}
}
Basic problem: Determining membership of a series of objects in a set of objects
- Trying to make tradeoffs to facillitate performance, e.g. reducing required data structure size so it may remain solely in core memory
Makes tradeoffs among several aspects:
- Space: Size of the hash area
- Time: How long to reject an object as not in the set
- Error rate: Target fraction of objects that may be misidentified as members of the set
- This is the key---if non-zero error rate is permissible then Bloom filters permit space savings
- Can additionally augment with a secondary, more time consuming check to manage cases where an object is incorrectly accepted