Column-Store Support for RDF Data Management: Not All Swans are White
Sidirourgos, Goncalves, Kersten, Nes, Manegold
rdf triple store vertical partition column store semantic web
@article{sidirourgos:vldb-2008,
title={Column-Store Support for {RDF} Data Management:
Not All Swans are White},
author={Sidirourgos, L. and Goncalves, R. and Kersten, M.
and Nes, N. and Manegold, S.},
journal={Proceedings of the {VLDB} Endowment},
volume={1},
number={2},
pages={1553--1563},
year={2008},
publisher={{VLDB} Endowment}
}
Revisits the paper by Abadi et al. on vertically partitioned triple stores
Could not substantiate the claim that vertical partitioning outperforms a triple store in a row-oriented DB
Vertical partitioning is well suited to column store DBs
- Generally dominant performance
Vertical partitioning has potential scalability problems when the number of properties is high
Long tail requires efficient handling of small sets
- Overhead of having a table for each
- Leads to huge queries, quantifying over predicates
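A minimal sketch of why an unbound predicate leads to huge queries under vertical partitioning: with one table per property, asking "all properties of subject X" needs one branch per property table. The `prop_<name>` naming scheme, the property names, and the use of SQLite are assumptions for illustration, not the paper's setup.

```python
import sqlite3

def build_unbound_property_query(properties):
    """Build one SELECT per property table and glue them with UNION ALL."""
    parts = [
        f"SELECT '{p}' AS property, subject, object FROM prop_{p} WHERE subject = ?"
        for p in properties
    ]
    return "\nUNION ALL\n".join(parts)

# Hypothetical property names; a real data set has a long tail of them.
properties = ["type", "title", "creator"]

conn = sqlite3.connect(":memory:")
for p in properties:
    conn.execute(f"CREATE TABLE prop_{p} (subject TEXT, object TEXT)")
conn.execute("INSERT INTO prop_type VALUES ('s1', 'Book')")
conn.execute("INSERT INTO prop_title VALUES ('s1', 'Swans')")
conn.execute("INSERT INTO prop_creator VALUES ('s2', 'Someone')")

sql = build_unbound_property_query(properties)
# One bound parameter per branch, so the query text and the
# parameter list both grow linearly with the number of properties.
rows = conn.execute(sql, ["s1"] * len(properties)).fetchall()
```

With thousands of properties the generated statement has thousands of branches, most of them touching very small tables.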
Loading, clustering, and index construction kept outside benchmark as the RDF data is assumed to be largely read-only
Cold runs vs hot runs to get at cache effects
Real time vs user time
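A rough sketch of how the two measurement distinctions above could be captured in a harness. The helper names `timed` and `benchmark` are hypothetical. The first call only stands in for a cold run: a genuinely cold run would also need the OS and database caches flushed (for example by restarting the server), which this sketch does not do.

```python
import os
import time

def timed(fn):
    """Run fn once and return (real_seconds, user_cpu_seconds)."""
    wall_start = time.perf_counter()
    user_start = os.times().user
    fn()
    real = time.perf_counter() - wall_start
    user = os.times().user - user_start
    return real, user

def benchmark(fn, hot_runs=3):
    """Treat the first call as the cold run and the repeats as hot runs."""
    cold = timed(fn)
    hot = [timed(fn) for _ in range(hot_runs)]
    return cold, hot

# Placeholder workload; in practice fn would execute a benchmark query.
cold, hot = benchmark(lambda: sum(range(100_000)))
```

A large gap between real and user time on the cold run points at time spent waiting on I/O, which is the cache effect the cold/hot split is meant to expose.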
RDF triple-storage in MonetDB/SQL and DBX
Triple store made of three tables
- Clustered on (subject, property, object)
- Unclustered on (property, object, subject)
- Unclustered on (object, subject, property)
Vertical partitioning creates table for each property, clusters by subject
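The two layouts can be sketched in plain Python, with sorted lists standing in for clustered storage. This is an approximation for illustration only: the sample triples and the function name `vertically_partition` are assumptions, and a real system would use clustered tables and indexes, not in-memory lists.

```python
from collections import defaultdict

def vertically_partition(triples):
    """Build one (subject, object) table per property, ordered by subject."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}

# Hypothetical sample data.
triples = [
    ("s2", "type", "Book"),
    ("s1", "title", "Swans"),
    ("s1", "type", "Book"),
]

# Triple-store layout: the same triples in three sort orders.
spo = sorted(triples)
pos = sorted((p, o, s) for s, p, o in triples)
osp = sorted((o, s, p) for s, p, o in triples)

# Vertically partitioned layout: the property becomes the table name.
vp = vertically_partition(triples)
```

In the partitioned layout the property column disappears from the rows, which is what makes each table narrow and a natural fit for a column store.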
Have to look at how vertical partitioning scales with number of properties in query
- Test this by taking the data set and arbitrarily dividing properties into multiple new properties
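One possible way to do the splitting, as a sketch: each triple is reassigned at random to one of `factor` sub-properties of its original property. The notes do not record the exact procedure used, so the random assignment, the `_<n>` suffix naming, and the function name `split_properties` are all assumptions.

```python
import random

def split_properties(triples, factor, seed=0):
    """Rewrite each property p as one of p_0 .. p_{factor-1}, chosen at random."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    return [(s, f"{p}_{rng.randrange(factor)}", o) for s, p, o in triples]

# Hypothetical sample data: 100 triples over a single property.
triples = [(f"s{i}", "type", "Book") for i in range(100)]
split = split_properties(triples, factor=4)
new_properties = {p for _, p, _ in split}
```

The data volume stays the same while the number of property tables grows by up to `factor`, so any slowdown can be attributed to the property count.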
Also implement (p, s, o) in column store
Barton libraries data set