Cayuga: A General Purpose Event Monitoring System

Demers, Gehrke, Pand, Riedewald, Sharma, White

event correlation databases in-network continuous query stream processing

@conference{demers:cidr-2007,
  title={Cayuga: A General Purpose Event Monitoring System},
  author={Demers, A. and Gehrke, J. and Panda, B. and
          Riedewald, M. and Sharma, V. and White, W.M.},
  booktitle={Conference on Innovative Data Systems Research},
  year={2007}
}

Event monitoring applications differ from previous stream processing

Many monitoring queries must find correlation between events, patterns in the stream
- One important class of patterns is "safety conditions" which stipulate events that must not happen between other events
- Such queries in previous stream languages are hard to write, hard to optimize
Larger number of concurrent queries

Major additions to query language are NEXT and FOLD operators

NEXT operator permits the next tuple matching a given filter that occur after a given event
- Example: "Suppose we want to match pairs of stock quotes with identical prices, and return the stock name of the second quote of each pair. We can formulate this query in CEL as follows."
```
SELECT Name
FROM (SELECT Price FROM Stock)
NEXT{$1.Price = $2.Price} (Stock)
```
FOLD operator permits tuples matching a given filter until a tuple matching a stop condition is received, with aggregation permitted along the way
- Example: "Suppose we want to find a monotonically increasing run of prices for a single company, where the run lasts for at least 10 stock quotes, and the first quote has a volume greater than 10000. We can formulate this query in CEL as follows."
```
SELECT *
FROM FILTER{cnt > 10} (
(SELECT *, 1 AS cnt FROM
FILTER{Volume > 10000}(Stock))
FOLD{$2.Name = $.Name, $2.Price > $.Price,
$.cnt+1 AS cnt} Stock)
```

Cayuga queries expressible as slightly generalized NFAs

Input is arbitrary relational streams
State transitions controlled using predicates
Data may be stored and selection may compare to previous events
Internal query processor works off detailed, explicit XML encoding of the automatons
- This makes it easier to plug other interfaces on top of the engine, e.g. a new high level query interface
- Predicates are compiled into bytecodes executed by an interpreter

Time synchronization is a big deal as events must be processed in order

It must correct for clock skew between data sources, network delay & reordering, etc
Everything gets thrown into a priority queue, ordering items to be processed even if earlier events come in slightly earlier than other events
With clock skew bounded by T, events are buffered for T before being processed
- Note that other events, received later, may have taken priority by this point
- Seems like a bad actor could cause this to halt permanently, by constantly sending falsely timestamped tuples just before the interval expires
- High T affects latency but not throughput
- Earlier events received after an event has been processed (T is too small) are discarded

Related work

Pub/sub has limited expressiveness, don't span multiple events
Stream databases are awkward for relating multiple events, and may have scalability issues
Complex event systems are similar, but not as general or well defined as Cayuga

Future work

Functions
Hierarchy of servers performing event processing at various levels of abstraction

Of note:

Makes heavy use of garbage collection, shallow copies to manage the large amounts of objects instantiated and killed throughout the process
- Interesting bimodal distribution of objects: Some die very fast (filtered tuples), others last for quite some time, e.g. operator objects
- Copying collection?
Uses an internal canonical string storage pool for strings
- Reduces memory footprint, speeds checks after canonicalized (can simply compare index/pointer)
- Note that this is not as straightforward as simply a hash table, as the objects need to be dereference so that they get garbage collected
  - Uses "weak references," which do not block garbage collection
Demers et al, Towards expressive publish/subscribe systems. EDBT, 2006
Fabret et al, Filtering algorithms and implementation for very fast publish/subscribe. SIGMOD, 2001

tjkopena's wiki
	Papers / Demers-CIDR 2007	Edit Page \| Page History \| Wiki Help Print \| Search:

Demers-CIDR 2007

Cayuga: A General Purpose Event Monitoring System

Demers, Gehrke, Pand, Riedewald, Sharma, White