Column-Stores vs. Row-Stores: How Different Are They Really?

Reference: Abadi, Daniel J., Samuel R. Madden, and Nabil Hachem. "Column-stores vs. row-stores: How different are they really?" Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 2008.

This is one of several papers in the suggested readings on Storage Models & Data Layout for CMU 15-721: Database Systems.
There has been a significant amount of excitement and recent work on column-oriented database systems (“column-stores”). These database systems have been shown to perform more than an order of magnitude better than traditional row-oriented database systems (“row-stores”) on analytical workloads such as those found in data warehouses, decision support, and business intelligence applications. The elevator pitch behind this performance difference is straightforward: column-stores are more I/O efficient for read-only queries since they only have to read from disk (or from memory) those attributes accessed by a query.
This simplistic view leads to the assumption that one can obtain the performance benefits of a column-store using a row-store: either by vertically partitioning the schema, or by indexing every column so that columns can be accessed independently. In this paper, we demonstrate that this assumption is false. We compare the performance of a commercial row-store under a variety of different configurations with a column-store and show that the row-store performance is significantly slower on a recently proposed data warehouse benchmark. We then analyze the performance difference and show that there are some important differences between the two systems at the query executor level (in addition to the obvious differences at the storage layer level). Using the column-store, we then tease apart these differences, demonstrating the impact on performance of a variety of column-oriented query execution techniques, including vectorized query processing, compression, and a new join algorithm we introduce in this paper. We conclude that while it is not impossible for a row-store to achieve some of the performance advantages of a column-store, changes must be made to both the storage layer and the query executor to fully obtain the benefits of a column-oriented approach.
- First question: Are these performance gains due to something fundamental about the way column-oriented DBMSs are internally architected, or would such gains also be possible in a conventional system that used a more column-oriented physical design?
- Second question: Which of the many column-database-specific optimizations proposed in the literature are most responsible for the significant performance advantage of column-stores over row-stores on warehouse workloads?
- We show that trying to emulate a column-store in a row-store does not yield good performance, and that a variety of techniques typically seen as “good” for warehouse performance (index-only plans, bitmap indices, etc.) do little to improve the situation.
- We propose a new technique for improving join performance in column-stores called the invisible join.
- We break down the sources of column-database performance on warehouse workloads, exploring the contribution of late materialization, compression, block iteration, and the invisible join to overall system performance.
2. Background and Prior Work
- MonetDB and MonetDB/X100 systems pioneered the design of modern column-oriented database systems and vectorized query execution, and they show that column-oriented designs – due to superior CPU and cache performance (in addition to reduced I/O) – can dramatically outperform commercial and open source databases on benchmarks like TPC-H.
- Fractured mirrors approach proposed a hybrid row/column approach.
- C-Store includes many of the same features as MonetDB/X100, as well as optimizations for direct operation on compressed data.
3. Star Schema Benchmark
- The Star Schema Benchmark (SSBM) is a data warehousing benchmark derived from TPC-H. Unlike TPC-H, it uses a pure textbook star-schema. It also consists of fewer queries than TPC-H and has less stringent requirements on what forms of tuning are and are not allowed.
4. Row-oriented Execution
- In a fully vertically partitioned approach, some mechanism is needed to connect fields from the same row together.
- One approach creates one physical table for each column in the logical schema; the i-th table has two columns, one with values from column i of the logical schema and one with the corresponding position. Using a position column is often preferable to using the primary key, because primary keys can be large and sometimes composite.
- Two problems of vertical partitioning: (1) it requires the position attribute to be stored in every column, which wastes space and disk bandwidth; (2) most row-stores store a relatively large header on each tuple, which further wastes space, while column stores typically store headers in separate columns to avoid these overheads.
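The vertical-partitioning emulation above can be sketched in a few lines; this is an illustrative toy (table contents and helper names are made up, not from the paper's benchmark), showing each logical column stored as a (position, value) table and rows reconstructed by joining the partitions on position:

```python
# Hypothetical sketch: each logical column becomes a two-column physical
# table (position, value); rows are rebuilt by equi-joining on position.

logical_rows = [("apple", 3, "red"), ("pear", 5, "green"), ("plum", 2, "purple")]

# One (position, value) "table" per column of the logical schema.
partitions = []
for col_idx in range(3):
    partitions.append([(pos, row[col_idx]) for pos, row in enumerate(logical_rows)])

def reconstruct(parts):
    # Join all partitions on the shared position column.
    by_pos = {}
    for part in parts:
        for pos, value in part:
            by_pos.setdefault(pos, []).append(value)
    return [tuple(vals) for pos, vals in sorted(by_pos.items())]

print(reconstruct(partitions))  # the original three rows
```

Note how the position column is repeated in every partition, which is exactly the space and bandwidth overhead the paper points out.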
- Index-only plan: base relations are stored using a standard, row-oriented design, but an additional unclustered B+-tree index is added on every column of every table. Such plans never access the actual tuples on disk.
- One problem with the index-only plan is that if a column has no predicate on it, the index-only approach requires the index to be scanned to extract needed values, which can be slower than scanning a heap file. One optimization is to create indices with composite keys, where the secondary keys are from predicate-less columns. We implemented this optimization by storing the primary key of each dimension table as a secondary sort attribute on the indices over the attributes of that dimension table.
- We create an optimal set of materialized views for every query flight in the workload, where the optimal view for a given flight has only the columns needed to answer queries in that flight. We do not pre-join columns from different tables in these views.
- This approach avoids the overheads of explicitly storing record-ids or positions, and stores tuple headers just once per tuple. However, it does require the query workload to be known in advance, making it practical only in limited situations.
5. Column-oriented Execution
- Compressing data using column-oriented compression algorithms and keeping data in this compressed format as it is operated upon has been shown to improve query performance by up to an order of magnitude.
- Intuitively, data stored in columns is more compressible than data stored in rows. Compression algorithms perform better on data with low information entropy.
- Prior work concludes that the biggest difference between compression in a row-store and compression in a column-store are the cases where a column is sorted (or secondarily sorted) and there are consecutive repeats of the same value in a column.
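The sorted-column case described above is where run-length encoding shines; here is a minimal sketch (column contents are invented for illustration) of RLE on a sorted column, with a predicate evaluated directly on the compressed runs rather than on individual values:

```python
# Minimal run-length-encoding sketch for a sorted column: consecutive
# repeats collapse to (value, run_length) pairs.

def rle_encode(column):
    runs = []
    for v in column:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

def count_matching(runs, predicate):
    # Operate directly on the compressed form: one predicate test per run,
    # not one per value.
    return sum(n for v, n in runs if predicate(v))

sorted_col = ["CA"] * 4 + ["NY"] * 2 + ["TX"] * 3
runs = rle_encode(sorted_col)
print(runs)                                       # [('CA', 4), ('NY', 2), ('TX', 3)]
print(count_matching(runs, lambda v: v == "TX"))  # 3
```

In a row-store the same values would be interleaved with other attributes, so runs of identical values rarely occur and this scheme loses most of its benefit.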
5.2 Late Materialization
- Since most queries access more than one attribute from a particular entity, and most database output standards (e.g., ODBC and JDBC) access database results entity-at-a-time, at some point in most query plans, data from multiple columns must be combined together into “rows” of information about an entity. Consequently, this join-like materialization of tuples (also called “tuple construction”) is an extremely common operation in a column store.
- Advantages: (1) selection and aggregation operators tend to render the construction of some tuples unnecessary, so constructing them early wastes work; (2) if data is compressed using a column-oriented compression method, it must be decompressed before being combined with values from other columns, so delaying materialization preserves the advantages of operating directly on compressed data described above; (3) cache performance is improved when operating directly on column data, since a given cache line is not polluted with surrounding irrelevant attributes for a given operation; (4) block iteration has a higher impact on performance for fixed-length attributes, since in a late-materialized column-store, fixed-width columns can be operated on separately.
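The mechanics of late materialization can be sketched as follows (column names and data are illustrative, not from the SSBM): predicates run column-at-a-time and produce position lists, and values from other columns are fetched only for the surviving positions, so tuples are constructed as late as possible:

```python
# Hedged sketch of late materialization: evaluate predicates per column,
# intersect position lists, then materialize only the needed values.

price   = [9, 42, 7, 55, 13]
region  = ["EU", "US", "EU", "US", "EU"]
product = ["a", "b", "c", "d", "e"]

# Each predicate yields a set of satisfying positions.
pos_price  = {i for i, p in enumerate(price) if p > 10}
pos_region = {i for i, r in enumerate(region) if r == "US"}

# Intersect positions, then construct tuples only for survivors.
positions = sorted(pos_price & pos_region)
result = [(product[i], price[i]) for i in positions]
print(result)  # [('b', 42), ('d', 55)]
```

An early-materializing plan would have built all five three-attribute tuples up front, only to discard three of them.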
5.3 Block Iteration
- In all column-stores, blocks of values from the same column are sent to an operator in a single function call, no attribute extraction is needed, and if the column is fixed-width, the values can be iterated through directly as an array. Operating on data as an array not only minimizes per-tuple overhead, but also exploits the potential for parallelism on modern CPUs, since loop-pipelining techniques can be used.
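The contrast can be made concrete with a toy sketch (function names are my own): a tuple-at-a-time consumer pays one function call and one attribute extraction per row, while a block operator receives a whole array of fixed-width values in one call and loops over it directly:

```python
# Sketch of tuple-at-a-time vs. block iteration over a fixed-width column.
from array import array

def sum_tuple_at_a_time(next_tuple):
    total = 0
    while True:
        t = next_tuple()      # one function call per tuple
        if t is None:
            break
        total += t[0]         # attribute extraction from the tuple
    return total

def sum_block(block):
    total = 0
    for v in block:           # tight loop over a fixed-width array
        total += v
    return total

values = array("i", range(1000))
tuples = iter([(v,) for v in values])
assert sum_tuple_at_a_time(lambda: next(tuples, None)) == sum_block(values)
```

In a compiled engine the block loop is additionally amenable to loop pipelining and SIMD, which is where most of the CPU advantage comes from.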
5.4 Invisible Join
- Two “old” ways: (1) a traditional plan that pipelines joins in order of predicate selectivity; constructing tuples before the join precludes all of the late-materialization benefits; (2) a late-materialization join, in which values from dimension-table group-by columns must be extracted in out-of-position order, which can have significant cost.
- Invisible join: by rewriting the joins into predicates on the foreign-key columns in the fact table, they can be executed at the same time as the other selection predicates being applied to the fact table. By waiting until all predicates have been applied before extracting dimension values, the number of out-of-order extractions is minimized.
Three Phases of Join
- Each predicate is applied to the appropriate dimension table to extract a list of dimension table keys that satisfy the predicate. These keys are used to build a hash table that can be used to test whether a particular key value satisfies the predicate.
- Each hash table is used to extract the positions of records in the fact table that satisfy the corresponding predicate. This is done by probing into the hash table with each value in the foreign key column of the fact table, creating a list of all the positions in the foreign key column that satisfy the predicate. Then, the position lists from all of the predicates are intersected to generate a list of satisfying positions P in the fact table.
- For each column C in the fact table containing a foreign key reference to a dimension table that is needed to answer the query, foreign key values from C are extracted using P and are looked up in the corresponding dimension table.
- The predicate can be rewritten from a hash-lookup predicate on the fact table to a “between” predicate where the foreign key falls between two ends of the key range.
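The three phases can be sketched on a toy star schema (table and column names here are illustrative, not the paper's SSBM schema); with a single predicate the position-list intersection of phase 2 is trivial, but the structure is the same:

```python
# Sketch of the invisible join's three phases on a toy star schema.

# Dimension table: key -> attribute needed by the query.
dim_region = {1: "Asia", 2: "Europe", 3: "America"}
# Fact table stored column-wise; region_fk holds foreign keys into dim_region.
fact_region_fk = [1, 3, 2, 1, 3, 3]
fact_revenue   = [10, 20, 30, 40, 50, 60]

# Phase 1: apply the predicate to the dimension table and build a hash
# table of satisfying keys.
satisfying_keys = {k for k, name in dim_region.items() if name == "America"}

# Phase 2: probe the fact table's foreign-key column to get satisfying
# positions (with several predicates, the position lists would be intersected).
positions = [i for i, fk in enumerate(fact_region_fk) if fk in satisfying_keys]

# Phase 3: extract fact-table values and dimension attributes at those
# positions only (the out-of-position dimension lookups happen here, last).
result = [(dim_region[fact_region_fk[i]], fact_revenue[i]) for i in positions]
print(result)  # [('America', 20), ('America', 50), ('America', 60)]
```

When the satisfying keys form a contiguous range, the phase-2 hash probe degenerates into the "between" predicate mentioned above, a simple range comparison on the foreign-key column.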
- System X: a commercial row-oriented DBMS
6.1 Motivation for Experimental Setup
- C-Store outperforms System X by a factor of six in the base case, and by a factor of three when System X uses materialized views, while System X outperforms C-Store's row-store emulation by more than a factor of two (mainly because of partitioning and multi-threading).
6.2 Column-Store Simulation in a Row-Store
- Materialized views perform best in all cases, because they read the minimal amount of data required to process a query. After materialized views, the traditional approach (with or without bitmap indexing) is usually the best choice.
- Two high-level issues limit the columnar approaches: (1) tuple overheads (for vertical partitioning); (2) column joins (merging two columns from the same table requires a join operation).
- Detailed row-store performance breakdown on query 2.1
- The previous results show that none of our attempts to emulate a column-store in a row-store are particularly effective.
- Vertical partitioning can provide performance competitive with or slightly better than a row-store when selecting just a few columns. However, it requires relatively expensive hash joins to combine columns of the fact table.
- Index-only plans have a lower per-record overhead, but introduce another problem: the system is forced to join columns of the fact table together using expensive hash joins before filtering the fact table using the dimension columns.
- With respect to the traditional plan, materialized views are an obvious win, and bitmap indices sometimes help, especially when the selectivity of queries is low.
6.3 Column-Store Performance
Tuple Overhead and Join Cost
- Modern column-stores do not explicitly store the record-id (or primary key) needed to join together columns from the same table. Rather, they use implicit column positions to reconstruct columns (the $i$th value from each column belongs to the $i$th tuple in the table). Further, tuple headers are stored in their own separate columns and so they can be accessed separately from the actual column values.
- In a column-store, heap files are stored in position order, whereas the order of heap files in many row-stores, even on a clustered attribute, is only guaranteed through an index. This makes a merge join (without a sort) the obvious choice for tuple reconstruction in a column-store.
- However, these optimizations can also be implemented in row-stores.
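Because every column in a column-store is already in position order, "joining" two columns of the same table back together is just a linear positional merge; a minimal sketch (data invented for illustration):

```python
# Tuple reconstruction in a column-store: no record-ids, no sort needed.
# The i-th value of each column belongs to the i-th tuple, so a merge
# join degenerates into a positional zip.

col_a = ["x", "y", "z"]
col_b = [1, 2, 3]

rows = list(zip(col_a, col_b))
print(rows)  # [('x', 1), ('y', 2), ('z', 3)]
```

This is the contrast with the row-store emulations above, where the same reconstruction required hash joins on explicit position or key columns.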
Breakdown of Column-Store Advantages
- Block processing can improve performance anywhere from 5% to 50%, depending on whether compression has already been removed (when compression is removed, the CPU benefits of block processing are less significant, since I/O becomes a factor).
- The invisible join improves performance by 50-75%.
- Compression improves performance by almost a factor of two on average, or an order-of-magnitude on queries that access sorted data.
- Late materialization results in almost a factor of three performance improvement.
- Once all of these optimizations are removed, the column-store performs similarly to the row-store with materialized view since the I/O requirements and the query processing are similar, while the only difference is the necessary tuple-construction at the beginning of the query plans for the column store. However, tuple construction can be very expensive (it adds almost a factor of 2).
Implications of Join Performance
- Denormalization is actually not very useful in column-stores (at least for star schemas). This is because the invisible join performs so well that reducing the number of joins via denormalization provides an insignificant benefit.
- In fact, denormalization only appears to be useful when the dimension-table attributes included in the fact table are sorted (or secondarily sorted) or are otherwise highly compressible.