Seltzer-CACM 2008

Beyond Relational Databases

Seltzer

sql relational databases modular query languages
expressiveness complexity hierarchical applications storage 
@article{seltzer:cacm2008,
  title="Beyond Relational Databases",
  author="Margo Seltzer",
  journal="Communications of the ACM",
  volume="51",
  number="7",
  month="July",
  year="2008",
  pages="52--58"
}

Relational databases have become the de facto mechanism for data management

  • Hide the underlying organization and optimization from application developers
  • Amortize those development costs
  • Provide an easy-to-use, well-defined mechanism for defining and working with data

Over time, RDBs have stretched to cover a large variety of functionality

  • Vendors want to provide differentiation
  • Users want lots of different functionality for various applications
    • However, very few want all of that functionality

Many applications, however, work or should work in very different ways

  • Data Warehousing: Periodic bulk updates add data, but existing data is very rarely revised. Queries are varied and ad hoc; the data basically consists of one table, with only a few columns used at any one time.
  • Directory Services: Infrequent writes, generally single-row retrieval, hierarchical structures and multivalued attributes.
  • Web Search: Semistructured data, keyword lookups, read-mostly with bulk updates and non-traditional indexing
  • Mobile device caching: Data is generally read-mostly and completely transitory, and can be regenerated or refetched if necessary.
    • Interesting point: Possible to view phonebook as a cache of a global database, which might offer additional capabilities and a streamlined framework to manage adding new callers, etc.
  • XML Management: Many apps today spend a lot of effort encoding and decoding XML documents, often in cases where it's very inefficient to do so. Native XML storage and querying is the next big evolution. Interestingly, the big chunks of content (the documents) are generally static, but documents may be added to and deleted from the repository frequently.
  • Stream processing: Despite this being about filtering rather than management, and not having persistent storage, people want to use something like SQL for it. It doesn't need storage or even complex queries, but it needs to be fast. It also focuses on dynamic data and static queries, rather than static data and dynamic queries as in typical DB uses.
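The stream-processing inversion in the last item (static queries, dynamic data) can be sketched as below. This is only an illustration, not anything from the article: the `StandingQueries` class, the event fields, and the query names are all hypothetical. The queries are registered once, and each event is examined as it arrives and never stored.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical event shape and names, chosen for illustration only.
Event = Dict[str, Any]
Predicate = Callable[[Event], bool]


class StandingQueries:
    """A fixed set of named filters applied to each arriving event."""

    def __init__(self) -> None:
        self._queries: Dict[str, Predicate] = {}

    def register(self, name: str, predicate: Predicate) -> None:
        # Queries are static: registered up front, before data flows.
        self._queries[name] = predicate

    def process(self, stream: Iterable[Event]) -> Iterator[Tuple[str, Event]]:
        # Data is dynamic: each event is seen once and not persisted.
        for event in stream:
            for name, predicate in self._queries.items():
                if predicate(event):
                    yield name, event


queries = StandingQueries()
queries.register("large_trades", lambda e: e["qty"] >= 1000)
queries.register("acme_only", lambda e: e["symbol"] == "ACME")

events = [
    {"symbol": "ACME", "qty": 50},
    {"symbol": "INIT", "qty": 5000},
    {"symbol": "ACME", "qty": 2000},
]
matches = list(queries.process(events))
```

There is no storage layer and no query planner here, which is the point of the note: a full RDB would add both without helping this workload.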

Traditional databases focus on ad hoc queries, significant write traffic, and strong transactional and integrity guarantees

  • None of the applications above feature all or, in some cases, any of those criteria
  • Much of the data isn't even naturally relational

To address this, data management needs to shift toward highly modular, highly configurable data engines

  • Must be modular to enable a wide variety of implementations and features
  • Must be configurable to match and tune application needs, environment capabilities (e.g. memory & CPU resources), and functionality
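One way to picture "modular and configurable" is a configuration object that selects which modules an engine instance includes. The sketch below is an assumption-laden illustration: `EngineConfig`, `Concurrency`, the module names, and the default values are hypothetical and do not come from the article or from any real product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Concurrency(Enum):
    # Hypothetical levels, for illustration only.
    SINGLE_THREADED = "single_threaded"
    TABLE_LOCKS = "table_locks"
    FINE_GRAINED = "fine_grained"


@dataclass(frozen=True)
class EngineConfig:
    """Hypothetical per-application configuration of a data engine."""

    cache_bytes: int = 1 << 20          # tune to the environment's memory
    concurrency: Concurrency = Concurrency.SINGLE_THREADED
    transactions: bool = False
    logging: bool = False
    replication: bool = False

    def enabled_modules(self) -> List[str]:
        # Only the modules an application asks for are included.
        modules = ["storage"]
        if self.concurrency is not Concurrency.SINGLE_THREADED:
            modules.append("locking")
        if self.transactions:
            modules.append("transactions")
        if self.logging:
            modules.append("logging")
        if self.replication:
            modules.append("replication")
        return modules


# A small read-mostly cache versus a fully featured server deployment.
mobile_cache = EngineConfig(cache_bytes=256 * 1024)
server = EngineConfig(
    cache_bytes=1 << 30,
    concurrency=Concurrency.FINE_GRAINED,
    transactions=True,
    logging=True,
    replication=True,
)
```

Under these assumptions the mobile cache carries only the storage module, while the server configuration pulls in everything.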

One can see several hierarchies of modular functional components

  • Query: Single table w/ B+ tree driven simple indexing, updating, and selection; transactions; select-project-join; aggregates
  • Concurrency: Single threaded; minimal concurrency with table- or API-level locks; high concurrency w/ fine-grained locking and isolation
  • Transactions: basic transactions, savepoints, two-phase commits, nested transactions, compensating transactions
  • Logging: Simple, backup, rollback capable, etc
  • Replication and high reliability services
  • Control interface: Single threaded, cooperating processes, threads, event-based
  • Interestingly, open interfaces between all these modules enable functionality such as transactions over non-DB operations, e.g. "power up the backup network interface card"
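The last point, transactions spanning non-DB operations, might look roughly like the sketch below. It assumes a simplified two-phase protocol in which every participant exposes the same open interface. `Participant`, `KeyValueUpdate`, `BackupNic`, and `run_transaction` are hypothetical names invented for this illustration, and the network card is simulated by a flag rather than real hardware control.

```python
from typing import Dict, List, Protocol


class Participant(Protocol):
    """Hypothetical open interface any transactional resource implements."""

    def prepare(self) -> bool: ...
    def commit(self) -> None: ...
    def rollback(self) -> None: ...


class KeyValueUpdate:
    """Database-like participant: the write is applied only on commit."""

    def __init__(self, store: Dict[str, str], key: str, value: str) -> None:
        self.store, self.key, self.value = store, key, value

    def prepare(self) -> bool:
        return True

    def commit(self) -> None:
        self.store[self.key] = self.value

    def rollback(self) -> None:
        pass  # nothing was applied before commit


class BackupNic:
    """Non-database participant: a simulated backup network interface card."""

    def __init__(self, healthy: bool = True) -> None:
        self.healthy = healthy
        self.powered = False

    def prepare(self) -> bool:
        return self.healthy  # vote "no" if the card cannot be powered up

    def commit(self) -> None:
        self.powered = True

    def rollback(self) -> None:
        self.powered = False


def run_transaction(participants: List[Participant]) -> bool:
    # Phase 1: collect votes. Phase 2: commit everywhere or roll back everywhere.
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


store: Dict[str, str] = {}
nic = BackupNic()
ok = run_transaction([KeyValueUpdate(store, "route", "backup"), nic])
```

Because the coordinator sees only the shared interface, the routing update and the card power-up either both take effect or neither does.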

"The database must also avoid making decisions about network protocols."

  • E.g., some people will be using it over WANs, others over backplanes
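A minimal sketch of what keeping protocol decisions out of the engine could mean: the engine-side component depends only on an abstract transport, and the deployment chooses the implementation. All names here (`Transport`, `InMemoryTransport`, `BatchingTransport`, `LogShipper`) are hypothetical, and the batching behavior is just one assumed example of a choice that suits a slow link but not a fast one.

```python
from typing import List, Protocol


class Transport(Protocol):
    """Hypothetical interface; the engine knows nothing beyond send()."""

    def send(self, payload: bytes) -> None: ...


class InMemoryTransport:
    """Stand-in for a fast local link such as a backplane."""

    def __init__(self) -> None:
        self.delivered: List[bytes] = []

    def send(self, payload: bytes) -> None:
        self.delivered.append(payload)


class BatchingTransport:
    """Stand-in for a slow link such as a WAN: ships records in batches."""

    def __init__(self, inner: Transport, batch_size: int) -> None:
        self.inner = inner
        self.batch_size = batch_size
        self._buffer: List[bytes] = []

    def send(self, payload: bytes) -> None:
        self._buffer.append(payload)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self.inner.send(b"".join(self._buffer))
            self._buffer.clear()


class LogShipper:
    """Engine-side component that forwards log records to a replica."""

    def __init__(self, transport: Transport) -> None:
        self.transport = transport

    def ship(self, record: bytes) -> None:
        self.transport.send(record)


local_link = InMemoryTransport()
LogShipper(local_link).ship(b"a")

wan_sink = InMemoryTransport()
wan_shipper = LogShipper(BatchingTransport(wan_sink, batch_size=2))
wan_shipper.ship(b"a")
wan_shipper.ship(b"b")
```

The same `LogShipper` works unchanged in both deployments; only the transport handed to it differs.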