The Many Faces of Publish/Subscribe
Eugster, Felber, Guerraoui, and Kermarrec
networking pub/sub publish subscribe dissemination internet
@article{eugster:acm2003,
author = {P. Th. Eugster and P. A. Felber and R. Guerraoui and A.-M. Kermarrec},
title = {The Many Faces of Publish/Subscribe},
journal = {ACM Computing Surveys},
year = {2003},
volume = {35},
pages = {114--131}
}
Pub/sub comes in many variants, adapted to different applications and networks
- Basic: Subscribers register interest in an event or pattern of events and are asynchronously notified of matching events, which publishers generate and a message bus disseminates
- Many event managers provide an advertise() function, in addition to subscribe(), unsubscribe(), and publish().
- advertise() lets the event manager adjust to expected future event flows and lets subscribers learn when new types of information become available
- Some variants: Topic based, content based, type based
- All variants vary in three decoupling dimensions: Time, space, synchronization
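The four operations listed above can be sketched as a toy in-process event manager. This is only an illustration under stated assumptions: the class name EventManager, the advertised_topics() helper, and the use of plain topic strings are hypothetical and not taken from the survey. Delivery here is a direct function call, so the sketch shows space decoupling (publishers never name recipients) but not time or synchronization decoupling.

```python
from collections import defaultdict


class EventManager:
    """Toy event manager (hypothetical names; not an API from the survey)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handlers
        self._advertised = set()               # topics publishers have announced

    def advertise(self, topic):
        # A publisher announces that it may publish on this topic later.
        self._advertised.add(topic)

    def advertised_topics(self):
        # Lets subscribers discover which kinds of events are on offer.
        return set(self._advertised)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def unsubscribe(self, topic, handler):
        if handler in self._subscribers[topic]:
            self._subscribers[topic].remove(handler)

    def publish(self, topic, event):
        # The publisher never names a recipient; the manager fans out.
        for handler in list(self._subscribers[topic]):
            handler(topic, event)
```

A subscriber would typically call advertised_topics() to see what is available, then subscribe() with a callback; a real system would queue events and deliver them from another thread or process rather than calling handlers inline.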
Several schemes offer functionality similar to pub/sub but do not decouple the participating entities as fully
- Message passing is fairly low level, requiring both data marshaling and physical addressing; producer and consumer must be active at the same time and the recipient must be known to the sender (time and space coupling); async at producer, sync at consumer
- RPC/remote invocation: Have to deal with comms failures; usually sync at producer, async at subscriber, requires strong time/space coordination
- Fire-and-forget variants provide weak reliability guarantees
- Wait-by-necessity variant allows a query to be sent and the process to continue until the data is needed, blocking only if it has not yet arrived
- Observer design pattern: Consumers register interest with publishers; often used in web caches; entities fairly coupled, burden on publisher
- Shared spaces/distributed shared memory: Tuple producers and consumers remain anonymous; consumer mechanism of pulling tuples produces some synchronization though modern systems have async notification features
- Rendezvous/Internet Indirection Infrastructure (I3): Decouples sending from receiving
- Message Queueing: Provide transactional, timing, and ordering guarantees not given by tuple spaces; do not provide synchronization decoupling, which is hard to do while maintaining ordering and other constraints
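The wait-by-necessity variant of remote invocation noted in the list above can be approximated with futures from Python's standard concurrent.futures module. This is a minimal sketch: remote_query() is a hypothetical stand-in that only sleeps, and a local thread pool plays the role of the remote side.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_query(x):
    # Stand-in for a remote invocation (hypothetical; it only sleeps).
    time.sleep(0.05)
    return x * 2


def caller():
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(remote_query, 21)  # returns immediately
        local = sum(range(10))                  # caller keeps working
        return local + future.result()          # blocks only if not ready yet
```

The caller is decoupled in synchronization only up to the point where it needs the reply; it must still know the invokee and both sides must be active at the same time, which is the coupling the notes attribute to RPC.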
Within pub/sub systems, how to register for events is a prime differentiation
- Topic-based: Subscribers join groups/channels identified by keywords; publishers push messages to that group, effectively broadcasting them to members
- Hierarchical topics are a common extension of this scheme; many allow wildcards in topic names
- This scheme can be very efficient in terms of compute time, memory
- Content-based: Register by properties of the events themselves or their metadata; a filter language is usually provided for specifying matches on those properties (e.g., =, <, >, and, or)
- Some provide for correlation, only notifying subscribers when a specified combo of events has occurred
- An alternative filter language is template matching; executable code has also been used, though it is difficult to optimize
- Content filters reduce bandwidth usage further without a huge explosion in the topic tree, but impose higher runtime load and a more complex implementation
- Can combine the two, e.g. applying filters to subscriptions within particular topics
- Type-based: Similar to template based?
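The difference between topic-based and content-based matching can be sketched as two small predicates. The wildcard convention ('*' for one level, trailing '#' for any remaining levels), the '/' separator, and the (attribute, operator, value) constraint format are assumptions chosen for illustration, not a syntax defined in the survey.

```python
import operator

# Comparison operators allowed in the toy filter language.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def topic_matches(pattern, topic):
    """Match a '/'-separated hierarchical topic against a pattern.

    Assumed convention: '*' matches exactly one level; a trailing '#'
    matches any remaining levels (including none).
    """
    p_parts = pattern.split("/")
    t_parts = topic.split("/")
    for i, p in enumerate(p_parts):
        if p == "#":
            return i == len(p_parts) - 1  # '#' is only valid as the last level
        if i >= len(t_parts):
            return False
        if p != "*" and p != t_parts[i]:
            return False
    return len(p_parts) == len(t_parts)


def content_matches(constraints, event):
    """True if the event satisfies every (attribute, op, value) constraint.

    The constraints are implicitly joined with 'and'.
    """
    return all(
        attr in event and OPS[op](event[attr], value)
        for attr, op, value in constraints
    )
```

Topic matching only inspects the topic name, which is why it can be cheap; the content filter has to evaluate every constraint against every event. The two can be combined by first checking topic_matches() and then applying content_matches() to the events in that topic.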
Architectures
- Centralized server solutions support reliability, data consistency, transactional support; limited throughput
- More likely to utilize point-to-point networking
- Distributed approaches may support fast data delivery
- Topic-based systems can easily use standard multicast approaches
- Content-based schemes do not map as readily onto standard multicast
- Middle ground: network of servers
- Obvious tree, propagating subscriptions upward, can overload root servers
- Can create an overlay of message brokers to match up interests
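The tree arrangement mentioned above, with subscriptions propagating upward, can be sketched as follows. The Broker class, its fields, and the route-everything-through-the-root policy are simplifying assumptions made for illustration; real broker networks use more refined routing.

```python
from collections import defaultdict


class Broker:
    """Node in a hypothetical broker tree (illustrative only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.local = defaultdict(list)          # topic -> local handlers
        self.child_interest = defaultdict(set)  # topic -> interested children
        self.events_seen = 0                    # how many events this broker handled

    def subscribe(self, topic, handler):
        self.local[topic].append(handler)
        # Propagate the interest upward so every ancestor knows which
        # child leads toward a subscriber.
        child, node = self, self.parent
        while node is not None:
            node.child_interest[topic].add(child)
            child, node = node, node.parent

    def publish(self, topic, event):
        # Simplification: every event is handed to the root first.
        root = self
        while root.parent is not None:
            root = root.parent
        root._deliver_down(topic, event)

    def _deliver_down(self, topic, event):
        self.events_seen += 1
        for handler in self.local[topic]:
            handler(event)
        # Forward only along branches that registered interest.
        for child in self.child_interest[topic]:
            child._deliver_down(topic, event)
```

In this sketch the root records every subscription and handles every published event, even events nobody subscribed to, which illustrates why root servers in such a tree can become overloaded.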
Qualities of service
- Persistence: Generally only offered by centralized solutions
- Priorities
- Transactions: For example, ensure that either a whole block of messages is received or none of them is
- Reliability: Relatively straightforward in centralized solutions using point-to-point, reliable networking and persistent storage
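The all-or-nothing behavior described under Transactions above can be sketched by buffering a block of messages and releasing it only on commit. The Transaction class and its deliver callback are hypothetical; the sketch shows only the buffering idea and ignores failures that occur during delivery itself.

```python
class Transaction:
    """Buffers published messages; delivers all on success, none on error."""

    def __init__(self, deliver):
        self._deliver = deliver  # callback that hands one message to subscribers
        self._buffer = []

    def publish(self, message):
        # Nothing reaches subscribers until the block commits.
        self._buffer.append(message)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Commit: release the whole block in publication order.
            for message in self._buffer:
                self._deliver(message)
        # On an exception the buffer is simply discarded (abort).
        self._buffer.clear()
        return False  # do not swallow the exception
```

Used as a context manager, a block such as `with Transaction(inbox.append) as tx:` delivers every message published inside it if the block completes, and none if it raises.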