Papers /

Demers-CIDR 2007

Reading

Outdoors

Games

Hobbies

LEGO

Food

Code

Nook

sidebar

Demers-CIDR 2007

Cayuga: A General Purpose Event Monitoring System

Demers, Gehrke, Pand, Riedewald, Sharma, White

event correlation databases in-network continuous query stream processing

@conference{demers:cidr-2007,
  title={Cayuga: A General Purpose Event Monitoring System},
  author={Demers, A. and Gehrke, J. and Panda, B. and
          Riedewald, M. and Sharma, V. and White, W.M.},
  booktitle={Conference on Innovative Data Systems Research},
  year={2007}
}

Event monitoring applications differ from previous stream processing

  • Many monitoring queries must find correlation between events, patterns in the stream
    • One important class of patterns is "safety conditions" which stipulate events that must not happen between other events
    • Such queries in previous stream languages are hard to write, hard to optimize
  • Larger number of concurrent queries

Major additions to query language are NEXT and FOLD operators

  • NEXT operator permits the next tuple matching a given filter that occur after a given event
    • Example: "Suppose we want to match pairs of stock quotes with identical prices, and return the stock name of the second quote of each pair. We can formulate this query in CEL as follows."
      SELECT Name
      FROM (SELECT Price FROM Stock)
      NEXT{$1.Price = $2.Price} (Stock)
      
  • FOLD operator permits tuples matching a given filter until a tuple matching a stop condition is received, with aggregation permitted along the way
    • Example: "Suppose we want to find a monotonically increasing run of prices for a single company, where the run lasts for at least 10 stock quotes, and the first quote has a volume greater than 10000. We can formulate this query in CEL as follows."
      SELECT *
      FROM FILTER{cnt > 10} (
      (SELECT *, 1 AS cnt FROM
      FILTER{Volume > 10000}(Stock))
      FOLD{$2.Name = $.Name, $2.Price > $.Price,
      $.cnt+1 AS cnt} Stock)
      

Cayuga queries expressible as slightly generalized NFAs

  • Input is arbitrary relational streams
  • State transitions controlled using predicates
  • Data may be stored and selection may compare to previous events
  • Internal query processor works off detailed, explicit XML encoding of the automatons
    • This makes it easier to plug other interfaces on top of the engine, e.g. a new high level query interface
    • Predicates are compiled into bytecodes executed by an interpreter

Time synchronization is a big deal as events must be processed in order

  • It must correct for clock skew between data sources, network delay & reordering, etc
  • Everything gets thrown into a priority queue, ordering items to be processed even if earlier events come in slightly earlier than other events
  • With clock skew bounded by T, events are buffered for T before being processed
    • Note that other events, received later, may have taken priority by this point
    • Seems like a bad actor could cause this to halt permanently, by constantly sending falsely timestamped tuples just before the interval expires
    • High T affects latency but not throughput
    • Earlier events received after an event has been processed (T is too small) are discarded

Related work

  • Pub/sub has limited expressiveness, don't span multiple events
  • Stream databases are awkward for relating multiple events, and may have scalability issues
  • Complex event systems are similar, but not as general or well defined as Cayuga

Future work

  • Functions
  • Hierarchy of servers performing event processing at various levels of abstraction

Of note:

  • Makes heavy use of garbage collection, shallow copies to manage the large amounts of objects instantiated and killed throughout the process
    • Interesting bimodal distribution of objects: Some die very fast (filtered tuples), others last for quite some time, e.g. operator objects
    • Copying collection?
  • Uses an internal canonical string storage pool for strings
    • Reduces memory footprint, speeds checks after canonicalized (can simply compare index/pointer)
    • Note that this is not as straightforward as simply a hash table, as the objects need to be dereference so that they get garbage collected
      • Uses "weak references," which do not block garbage collection
  • Demers et al, Towards expressive publish/subscribe systems. EDBT, 2006
  • Fabret et al, Filtering algorithms and implementation for very fast publish/subscribe. SIGMOD, 2001
Recent Changes (All) | Edit SideBar Page last modified on December 17, 2008, at 02:41 PM Edit Page | Page History