Column-Store Support for RDF Data Management: Not All Swans are White

Sidirourgos, Goncalves, Kersten, Nes, Manegold

rdf triple store vertical partition column store semantic web

Revisits the paper by Abadi et al. on vertically partitioned triple stores

Could not substantiate the claim that vertical partitioning outperforms a triple store in a row-oriented DB

Vertical partitioning is well suited to column store DBs

  • Generally dominant performance

Vertical partitioning has potential scalability problems when the number of properties is high

Long tail requires efficient handling of small sets

  • Overhead of having a table for each property
  • Leads to huge queries, quantifying over predicates
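
To illustrate why quantifying over predicates leads to huge queries, here is a minimal sketch that builds the SQL needed to fetch every (property, object) pair for one subject when each property lives in its own table. The table names (prop_0, prop_1, ...), the column names, and the function unbound_property_query are hypothetical and not taken from the paper; the point is only that the query text grows linearly with the number of properties.

```python
def unbound_property_query(properties, subject):
    """Build one SELECT per property table and glue them with UNION ALL.

    Assumes (hypothetically) one table per property, named after the
    property, with columns (subject, object). String formatting is used
    for illustration only; real code should use bound parameters.
    """
    branches = [
        f"SELECT '{p}' AS property, object FROM {p} WHERE subject = '{subject}'"
        for p in properties
    ]
    return "\nUNION ALL\n".join(branches)


# Five made-up property tables already need five SELECT branches.
props = [f"prop_{i}" for i in range(5)]
sql = unbound_property_query(props, "s1")
print(sql)
```

With thousands of properties the same question becomes a query with thousands of branches, whereas a plain triple store answers it with a single selection on the subject column.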

Loading, clustering, and index construction kept outside benchmark as the RDF data is assumed to be largely read-only

Cold runs vs hot runs to get at cache effects

Real time vs user time
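
A rough sketch of how the two distinctions above could be measured from a driver script. Here the first call stands in for the cold run and later calls for hot runs; this is an assumption, since a genuine cold run would also require restarting the server and flushing OS caches, which the sketch does not do. The helper names measure and cold_and_hot are hypothetical.

```python
import os
import time


def measure(workload):
    """Return (real_seconds, user_seconds) for one call of workload()."""
    real_start = time.perf_counter()   # wall-clock (real) time
    user_start = os.times().user       # user-mode CPU time of this process
    workload()
    real = time.perf_counter() - real_start
    user = os.times().user - user_start
    return real, user


def cold_and_hot(workload, hot_runs=3):
    """Time the first call separately from the repeated calls."""
    cold = measure(workload)                       # caches assumed empty
    hot = [measure(workload) for _ in range(hot_runs)]
    return cold, hot


# Stand-in workload; a real benchmark would issue a query here.
cold, hot = cold_and_hot(lambda: sum(range(100_000)))
print("cold:", cold)
print("hot: ", hot)
```

A large gap between real and user time suggests the run is waiting on I/O rather than computing, which is what a cold run is meant to expose.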

RDF triple-storage in MonetDB/SQL and DBX

Triple store made of three tables

  • Clustered on (subject, property, object)
  • Unclustered on (property, object, subject)
  • Unclustered on (object, subject, property)
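
The three orderings above can be sketched in memory as sorted copies of the same triples, each answering lookups on a different leading column by prefix scan. This is a simplification: all three copies are fully sorted here, so the clustered versus unclustered distinction is not modelled. The sample triples and the helper prefix_scan are made up for illustration.

```python
from bisect import bisect_left

# Made-up sample data in (subject, property, object) form.
triples = [
    ("s2", "type", "Text"),
    ("s1", "type", "Text"),
    ("s1", "language", "fre"),
]

spo = sorted(triples)
pos = sorted((p, o, s) for s, p, o in triples)
osp = sorted((o, s, p) for s, p, o in triples)


def prefix_scan(index, prefix):
    """Return all rows of a sorted index that start with the given prefix."""
    start = bisect_left(index, prefix)   # first row >= prefix
    rows = []
    for row in index[start:]:
        if row[:len(prefix)] != prefix:
            break
        rows.append(row)
    return rows


# All subjects with property "type" and object "Text", via the (p, o, s) order.
matches = prefix_scan(pos, ("type", "Text"))
print(matches)
```

Which ordering helps depends on which columns a query binds: (s, p, o) serves subject lookups, (p, o, s) serves property/object lookups, and (o, s, p) serves object lookups.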

Vertical partitioning creates table for each property, clusters by subject
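
A minimal in-memory sketch of that layout, assuming triples arrive as (subject, property, object) tuples. Sorting each table by subject stands in for clustering; the function name vertically_partition and the sample data are hypothetical.

```python
from collections import defaultdict


def vertically_partition(triples):
    """Split triples into one (subject, object) table per property,
    with each table sorted by subject."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}


# Made-up sample data.
triples = [
    ("s2", "type", "Text"),
    ("s1", "type", "Text"),
    ("s1", "language", "fre"),
]
tables = vertically_partition(triples)
print(tables)
```

The property column disappears from the stored rows because it is implied by the table name, and subject-sorted tables allow merge joins between properties on subject.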

Have to look at how vertical partitioning scales with number of properties in query

  • Test this by taking the data set and arbitrarily dividing properties into multiple new properties
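
One way such a split could be done is sketched below. These notes do not record the exact procedure used in the paper, so the round-robin assignment, the naming scheme p_0, p_1, ..., and the function split_properties are all assumptions made for illustration.

```python
def split_properties(triples, factor):
    """Rewrite each property p into one of `factor` new properties
    p_0 .. p_{factor-1}, assigned round-robin by triple position."""
    return [
        (s, f"{p}_{i % factor}", o)
        for i, (s, p, o) in enumerate(triples)
    ]


# Made-up sample data with a single property.
triples = [
    ("s1", "type", "Text"),
    ("s2", "type", "Text"),
    ("s3", "type", "Date"),
    ("s4", "type", "Text"),
]
split = split_properties(triples, factor=2)
print(split)
```

The data volume stays the same while the number of property tables grows, so a query over the original property now has to touch every new table derived from it.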

Also implements a triple store ordered on (p, s, o) in the column store

Barton libraries data set

