TicToc: Time Traveling Optimistic Concurrency Control

Reference: Yu, Xiangyao, et al. "Tictoc: Time traveling optimistic concurrency control." Proceedings of the 2016 International Conference on Management of Data. ACM, 2016.

This is one of several papers in the suggested readings on Optimistic Concurrency Control for CMU 15-721: Database Systems.

0. Abstract

Concurrency control for on-line transaction processing (OLTP) database management systems (DBMSs) is a nasty game. Achieving higher performance on emerging many-core systems is difficult. Previous research has shown that timestamp management is the key scalability bottleneck in concurrency control algorithms. This prevents the system from scaling to large numbers of cores.

In this paper we present TicToc, a new optimistic concurrency control algorithm that avoids the scalability and concurrency bottlenecks of prior T/O schemes. TicToc relies on a novel and provably correct data-driven timestamp management protocol. Instead of assigning timestamps to transactions, this protocol assigns read and write timestamps to data items and uses them to lazily compute a valid commit timestamp for each transaction. TicToc removes the need for centralized timestamp allocation, and commits transactions that would be aborted by conventional T/O schemes. We implemented TicToc along with four other concurrency control algorithms in an in-memory, shared-everything OLTP DBMS and compared their performance on different workloads. Our results show that TicToc achieves up to 92% better throughput while reducing the abort rate by 3.3× over these previous algorithms.

1. Introduction

  • T/O algorithms are popular because they allow significant concurrency, but suffer from a fundamental scalability bottleneck: timestamp allocation.
  • Prior work has proposed hardware and software techniques to increase timestamp allocation throughput, but both approaches have serious limitations. Hardware: (1) centralized asynchronous counters, (2) remote atomic memory operations, (3) fully-synchronized clocks. Software: coarse-grained timestamp epochs with group commit.
  • The key contribution of TicToc is a technique that we call data-driven timestamp management: instead of assigning timestamps to each transaction independently of the data it accesses, TicToc embeds the necessary timestamp information in each tuple to enable each transaction to compute a valid commit timestamp after it has run, right before it commits.

2. Background

2.1 Timestamp Allocation

  • A common way to implement the timestamp allocator is through an atomic add instruction that increments a global counter for each new transaction. This approach, however, can generate fewer than 5 million timestamps per second on a modern multi-core system due to the long latency incurred by the CPU’s cache coherence protocol.
  • The hardware solutions either do not exist yet or exist only in a subset of CPUs today, making them impractical.

2.2 Optimistic Concurrency Control

  • The original OCC algorithm was proposed in 1981, but it has only recently been adopted in high-performance OLTP DBMSs. By contrast, MVCC, the other T/O-based algorithm, has been used in DBMSs for several decades (e.g., Oracle, Postgres).
  • Dynamic timestamp allocation (DTA)
  • Silo

3. The TicToc Algorithm

  • A transaction’s timestamp is calculated lazily at its commit time in a distributed manner based on the tuples it accesses. Its advantages: (1) its distributed nature avoids all of the bottlenecks inherent in timestamp allocation, making it highly scalable; (2) laziness makes it possible for the DBMS to exploit more parallelism in the workload, thereby reducing aborts and improving performance.

3.1 Lazy Timestamp Management

  • Each data version in TicToc has a valid range of timestamps bounded by the write timestamp (wts) and the read timestamp (rts). Specifically, a particular version is created at timestamp wts and is valid until timestamp rts.
Commit Timestamp

  • A transaction’s commit timestamp must be no less than the wts of every version it reads and greater than the rts of every tuple it overwrites; TicToc picks the smallest timestamp satisfying these constraints at commit time.
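
The commit-timestamp rule can be sketched as follows (a minimal illustration; `Entry` and `compute_commit_ts` are hypothetical names, not the paper’s actual code):

```python
# Illustrative sketch of TicToc's lazy commit-timestamp computation.
from dataclasses import dataclass

@dataclass
class Entry:
    wts: int  # write timestamp copied when the tuple was accessed
    rts: int  # read timestamp copied when the tuple was accessed

def compute_commit_ts(read_set, write_set):
    # A read is valid at any timestamp in [wts, rts], so commit_ts >= wts
    # of every read version. A write creates a new version, so commit_ts
    # must exceed the rts of every tuple being overwritten.
    commit_ts = 0
    for e in read_set:
        commit_ts = max(commit_ts, e.wts)
    for e in write_set:
        commit_ts = max(commit_ts, e.rts + 1)
    return commit_ts

# One read of a version valid over [3, 10] and one write to a tuple whose
# current rts is 7 gives commit_ts = max(3, 7 + 1) = 8.
print(compute_commit_ts([Entry(3, 10)], [Entry(5, 7)]))  # prints 8
```

Because the computation uses only timestamps copied from the accessed tuples, no global counter is ever touched.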

3.2 Protocol Specification

Read Phase

  • Each entry in the read or write set is encoded as {tuple, data, wts, rts}, where tuple is a pointer to the tuple in the database, data is the data value of the tuple, and wts and rts are the timestamps copied from the tuple when it was accessed by the transaction.
  • Note that the value and timestamps must be loaded atomically to guarantee that the value matches the timestamps.

Validation Phase

  • Validation proceeds in three steps: lock all tuples in the write set, compute the commit timestamp, and validate the read set (extending a tuple’s rts when possible).
  • In TicToc, there is no centralized contention point during transaction execution. The locks and atomic sections protect only the tuples that a transaction touches.
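
The read-set check can be sketched as below (a hedged sketch; `TupleTS` and `validate_read` are assumed names, and the paper’s algorithm additionally distinguishes tuples locked by the validating transaction itself):

```python
# Sketch of TicToc read-set validation with rts extension.
from dataclasses import dataclass

@dataclass
class TupleTS:
    wts: int
    rts: int
    locked: bool = False

def validate_read(local_wts, local_rts, latest: TupleTS, commit_ts):
    if local_rts >= commit_ts:
        return True                   # copied rts already covers commit_ts
    if local_wts != latest.wts:
        return False                  # version was overwritten since the read
    if latest.locked:
        return False                  # another writer holds the tuple
    latest.rts = max(latest.rts, commit_ts)  # extend the version's validity
    return True
```

Extending rts in place is what lets TicToc commit reads "in the past" that a conventional T/O scheme would abort.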

Write Phase

  • In the write phase, the transaction copies its updates into the database, sets both the wts and rts of each modified tuple to its commit timestamp, and releases the locks.

3.3 Example

3.4 Spurious Aborts

  • In TicToc a transaction aborts only if a tuple it reads or writes is overwritten by another transaction that enters the validation phase first.

3.5 Discussion

  • TicToc is not order-preserving serializable, since the serial order may not be the commit order.
  • Logical timestamps grow more slowly than the number of committed transactions. Moreover, the rate at which the logical timestamp advances is an indicator of the contention level in the workload.

3.6 Implementation

  • To atomically read or write tuples’ timestamps, TicToc adopts an optimization from Silo to encode a lock bit and a tuple’s wts and rts into a single 64-bit word (TS_word). Reading TS_word is a single atomic 64-bit load, and updates are installed with an atomic compare-and-swap.
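
The encoding can be sketched as below. The bit widths follow the paper’s layout (1 lock bit, a 15-bit delta, and a 48-bit wts, with rts reconstructed as wts + delta); saturating delta on overflow is a simplification here, whereas the paper shifts wts forward when delta would overflow:

```python
# Sketch of packing (lock, wts, rts) into a single 64-bit TS_word.
LOCK_BIT  = 1 << 63
WTS_BITS  = 48
WTS_MASK  = (1 << WTS_BITS) - 1
DELTA_MAX = (1 << 15) - 1

def pack(locked, wts, rts):
    # delta stores rts relative to wts in 15 bits (saturating: simplification)
    delta = min(rts - wts, DELTA_MAX)
    return (LOCK_BIT if locked else 0) | (delta << WTS_BITS) | (wts & WTS_MASK)

def unpack(ts_word):
    locked = bool(ts_word & LOCK_BIT)
    wts = ts_word & WTS_MASK
    delta = (ts_word >> WTS_BITS) & DELTA_MAX
    return locked, wts, wts + delta

print(unpack(pack(False, 100, 130)))  # prints (False, 100, 130)
```

Packing everything into one word is what makes the single-load read and single-CAS update possible.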
  • Standard techniques for preventing “phantom reads” include locking in indexes or rescanning the tuples during the validation phase.

3.7 Logging and Durability

  • The DBMS can use the canonical ARIES approach if there is only a single log file, but a single log cannot provide the bandwidth required by today’s multi-core systems.
  • The basic idea for parallel logging is to perform logging in batches. All transactions in a previous batch must be ordered before any transaction in a later batch, but the relative ordering among transactions within the same batch can be arbitrary. For each batch, logs are written to multiple files in parallel. A batch is considered durable only after all the logs within that batch have been written to files.
  • In TicToc, the batching scheme requires that transactions in a later batch must have commit timestamps greater than transactions in a previous batch. This can be achieved by setting a minimum commit timestamp for transactions belonging to the new batch.
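
A toy sketch of the batch-ordering idea (function and variable names are hypothetical; in the real protocol the batch minimum would feed into the commit-timestamp computation before validation rather than being applied afterward):

```python
# Sketch: order log batches by forcing a minimum commit timestamp per batch.
def finalize_batch(commit_timestamps, batch_min_ts):
    """Raise each commit_ts to at least batch_min_ts so every transaction
    in this batch orders after all transactions in the previous batch."""
    return [max(ts, batch_min_ts) for ts in commit_timestamps]

batch1 = finalize_batch([3, 7, 5], 0)                # [3, 7, 5]
batch2 = finalize_batch([2, 9, 4], max(batch1) + 1)  # [8, 9, 8]
print(batch2)
```

Within a batch the relative order stays arbitrary, so the per-file log writers never need to coordinate.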

4. Proof of Correctness

4.1 Proof Idea

  • Any schedule permitted by TicToc is equivalent to the serial schedule in which transactions execute in commit-timestamp order.

4.2 Formal Proofs

  • Transactions writing to the same tuple must have different commit timestamps.
  • Transactions that commit at the same timestamp and physical time do not conflict with each other.
  • A read operation from a committed transaction returns the value of the latest write to the tuple in the serial schedule.

5. Optimizations

5.1 No-Wait Locking in Validation Phase

  • Thrashing occurs because a transaction waits to acquire new locks while holding other locks, which causes other transactions to block and form a convoy.
  • With no-wait, if a transaction fails to acquire a lock for a tuple in its write set during the validation phase, the validation is immediately aborted (releasing any locks) and then TicToc restarts the validation phase. The transaction sleeps for a short period (1 μs) before retrying to reduce restarts.
  • The no-wait optimization minimizes the blocking and allows more transactions to validate simultaneously.
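
The no-wait policy can be sketched as below (a simplified illustration using Python locks; `try_lock_write_set` is an assumed name):

```python
# Sketch of no-wait write-set locking: never block on a held lock.
import threading
import time

def try_lock_write_set(locks, backoff_s=1e-6):
    """Try each lock without blocking; on any failure release everything,
    back off briefly (~1 microsecond), and report failure so the caller
    can restart the validation phase."""
    acquired = []
    for lk in locks:
        if lk.acquire(blocking=False):
            acquired.append(lk)
        else:
            for held in acquired:
                held.release()
            time.sleep(backoff_s)
            return False
    return True

locks = [threading.Lock(), threading.Lock()]
print(try_lock_write_set(locks))  # prints True; both locks now held
```

Because no transaction ever waits while holding locks, the convoy described above cannot form.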

5.2 Preemptive Aborts

  • TicToc computes an approximate commit timestamp before locking the write-set tuples and uses it to decide whether the transaction can be preemptively aborted, avoiding useless lock acquisitions.

5.3 Timestamp History

  • We can extend TicToc to maintain a history of each tuple’s wts rather than just one scalar value. When a new version is created for a tuple, the wts of the old version is stored in a history buffer.
  • The value of the old version does not need to be stored since transactions in TicToc always read the latest data version. Therefore, the storage overhead of this optimization in TicToc is smaller than that of MVCC.
  • During a transaction’s validation phase, if a read tuple’s local rts is less than commit_ts and the wts does not match the latest wts, then the DBMS checks if the wts matches any version in the tuple’s history buffer. If so, the valid range of that version is from the local wts to the next wts in the history buffer. If commit_ts falls within that range, the tuple can still be validated.
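
The history check can be sketched as follows (an illustrative sketch under an assumed data layout: `history` holds older wts values, most recent first):

```python
# Sketch of validating a stale read against the tuple's wts history buffer.
def validate_with_history(local_wts, commit_ts, latest_wts, history):
    """A version created at wts w is valid until the wts of the next write.
    Check whether the version read at local_wts is still valid at commit_ts."""
    versions = [latest_wts] + list(history)  # newest first
    for newer, older in zip(versions, versions[1:]):
        if local_wts == older:
            # this version was valid over [older, newer)
            return older <= commit_ts < newer
    return False

# A tuple written at timestamps 3, 6, then 10: a read that saw wts=6
# validates at commit_ts=8 (inside [6, 10)) but not at commit_ts=11.
print(validate_with_history(6, 8, 10, [6, 3]))  # prints True
```

Only timestamps are kept, not old values, which is why this costs less than full MVCC versioning.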

5.4 Lower Isolation Levels

  • Snapshot Isolation: this level mandates that all of a transaction’s reads see a consistent snapshot of the database, and that the transaction will commit only if it does not conflict with any concurrent updates made since that snapshot. To support this, two commit timestamps are used, one for reads (commit_rts) and one for writes (commit_wts), and the algorithm verifies that all reads are valid at commit_rts and all writes are valid at commit_wts.
  • Repeatable Reads: with this weaker isolation level, a transaction’s reads do not need to happen at the same timestamp even though writes should still have the same commit timestamp. This means there is no need to verify the read set in the validation phase.

6. Experimental Evaluation

  • Transaction statistics, such as throughput and abort rates, are collected after the system reaches a steady state following a warm-up period.
  • To minimize memory latency, we use numactl to ensure each thread allocates memory from its own socket.

6.1 Workloads

  • TPC-C
  • YCSB

6.2 TPC-C Results

  • Varying the number of threads and the number of warehouses (the main source of parallelism in TPC-C)
  • TicToc outperforms Silo because of TicToc’s ability to achieve better parallelism by dynamically selecting each transaction’s commit timestamp

6.3 YCSB Results

  • Workloads: Read-Only, Medium Contention, High Contention
  • Read-Only: the baseline for each concurrency control scheme
  • Medium Contention: Silo and TicToc both scale well and achieve similar throughput. But TicToc has a ~3.3x lower abort rate than Silo because of its data-driven timestamp management
  • High Contention: the performance difference between TicToc and Silo is more prominent

6.4 TicToc Optimizations

  • The timestamp history optimization does not provide any measurable performance gain in either workload.

6.5 Logical Time Analysis

6.6 Isolation Levels

7. Conclusion

Written on July 20, 2017