OLTP Through the Looking Glass, and What We Found ThereReference: Harizopoulos, Stavros, et al. "OLTP through the looking glass, and what we found there." Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008.
This is one of the several papers belong to suggested readings for In-Memory Databases of CMU 15-721: Database Systems.
Online Transaction Processing (OLTP) databases include a suite of features — disk-resident B-trees and heap files, locking-based concurrency control, support for multi-threading — that were optimized for computer technology of the late 1970’s. Advances in modern processors, memories, and networks mean that today’s computers are vastly different from those of 30 years ago, such that many OLTP databases will now fit in main memory, and most OLTP transactions can be processed in milliseconds or less. Yet database architecture has changed little.
Based on this observation, we look at some interesting variants of conventional database systems that one might build that exploit recent hardware trends, and speculate on their performance through a detailed instruction-level breakdown of the major components involved in a transaction processing database system (Shore) running a subset of TPC-C. Rather than simply profiling Shore, we progressively modified it so that after every feature removal or optimization, we had a (faster) working system that fully ran our workload. Overall, we identify overheads and opti- mizations that explain a total difference of about a factor of 20x in raw performance. We also show that there is no single “high pole in the tent” in modern (memory resident) database systems, but that substantial time is spent in logging, latching, locking, B-tree, and buffer management operations.
- Modern general purpose OLTP database systems: a collection of on-disk data structures for table storage, including heap files and B-trees, support for multiple concurrent queries via locking-based concurrency control, log-based recovery, and an efficient buffer manager.
- Situation is different today: (1) dramatic performance improvement and cost reduction in hardware; (2) rising demand for database-like services along with the rise of the Internet.
1.1 Alternative DBMS Architectures
- Log-less databases
- Single threaded databases
- Transaction-less databases
1.2 Measuring the Overheads of OLTP
- Four major components whose removal substantially improved the throughput of the system: (1) Logging; (2) Locking; (3) Latching; (4) Buffer management.
1.4 Contributions and Paper Organization
- Dissect where time goes inside of a modern database system
- Measure the performance of variants of a modern database system
- Use these measurements to speculate on the performance of different database management systems
2. Trends in OLTP
- Cluster Computing
- Memory Resident Databases
- Single Threading in OLTP Systems
- High Availability vs. Logging
- Transaction Variants
3.1 Shore Architecture
- Thread support
- Locking and logging: two-phase locking, ACID, WAL, log sequence number(LSN)
- Buffer Manager
3.2 Removing Shore Components
- Remove logging: (1) avoid generating I/O requests along with the time associated to perform these requests by allowing group commit and increasing the log buffer size; (2) remove all functions for preparing and writing log records; (3) avoid processing Log Sequence Numbers.
- Remove Locking
- Remove Latching
- Remove buffer manager calls
- Miscellaneous optimizations: (1) accelerating the B-tree code by hand-coding node searches to optimize for the common case that keys are uncompressed integers; (2) accelerating directory lookups by using a single cache for all transactions; (3) increasing page size to reduce the number of levels in a B-tree and frequency of page allocations for newly created records; (4) removing the overhead of setting up and terminating a session for each transaction, by consolidating transactions into a single session.
4. Performance Study
4.1 OLTP Workload
- TPC-C models a wholesale parts supplier operating out of a number of warehouses and their associated sales districts. TPC-C is designed to represent any industry that must manage, sell, or distribute a product or service. It is designed to scale as the supplier expands and new warehouses are created. The database size for one warehouse is approximately 100 MB.
- TPC-C involves a mix of five concurrent transactions of different types and complexity. These transactions include entering orders(the New Order transaction), recording payments (Payment), delivering orders, checking the status of orders, and monitoring the level of stock at the warehouses. TPC-C also specifies that about 90% of the time the first two transactions are executed.
4.2 Setup and Measurement Methodology
- Measurement is based on CPU instruction counts(as measured through the CPU performance counters using PAPI library) and not wall clock time, because instruction counts are representative of the total run-time code path length, and they are deterministic.
- Equal instruction counts among different components can result in different wall clock execution times(CPU cycles), because of different micro-architectural behavior(cache misses, TLB misses, etc.).
4.3 Experimental Results
5. Implications for Future OLTP Engines
- Stripping out any one of the components of the system has a relatively small benefit on overall performance.
- The most significant gains are to be had when multiple optimizations are applied.
5.1 Concurrency Control
- some sort of optimistic concurrency would be the prevailing operation
5.2 Multi-core Support
- Run multiple transactions concurrently on separate cores;
- Use virtualization;
- Exploit intra-query parallelism, but the amount of intra-query parallelism available in a typical OLTP transactions is likely to be limited.
5.3 Replication Management
The main reason that active-passive replication with log shipping has been used in the past is that the cost of rolling the log forward has been assumed to be far lower than the cost of performing the transaction logic on the replica.
Limitation of active-passive replication with log shipping
- The remote copies are not transactionally consistent with the primary, unless a form of two-phase commit is used;
- Failover is not instantaneous;
- Requires the availability of a log.
In a main memory DBMS, the cost of a transaction is typically less than 1 msec, in this case, an alternate active-active architecture makes sense. In such a scenario, two-phase committing will introduce substantial additional latency, perhaps by performing transactions in timestamp order.
5.4 Weak Consistency
- Eventual Consistency: sensitive to workloads
5.5 Cache-conscious B-trees
It appears to be more important to optimize other components, such as concurrency control and recovery, than to optimize data structures.