Feral Concurrency Control: An Empirical Investigation of Modern Application Integrity

Reference: Bailis, Peter, et al. "Feral concurrency control: An empirical investigation of modern application integrity." Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015.

0. Abstract

The rise of data-intensive “Web 2.0” Internet services has led to a range of popular new programming frameworks that collectively embody the latest incarnation of the vision of Object-Relational Mapping (ORM) systems, albeit at unprecedented scale. In this work, we empirically investigate modern ORM-backed applications’ use and disuse of database concurrency control mechanisms. Specifically, we focus our study on the common use of feral, or application-level, mechanisms for maintaining database integrity, which, across a range of ORM systems, often take the form of declarative correctness criteria, or invariants. We quantitatively analyze the use of these mechanisms in a range of open source applications written using the Ruby on Rails ORM and find that feral invariants are the most popular means of ensuring integrity (and, by usage, are over 37 times more popular than transactions). We evaluate which of these feral invariants actually ensure integrity (by usage, up to 86.9%) and which—due to concurrency errors and lack of database support—may lead to data corruption (the remainder), which we experimentally quantify. In light of these findings, we present recommendations for database system designers for better supporting these modern ORM programming patterns, thus eliminating their adverse effects on application integrity.

1. Introduction

  • Ruby on Rails was chosen because (1) it has a popular, active open source community and user base; (2) it is “opinionated software” that bears an actively antagonistic relationship to database management systems.
  • By shunning decades of work on native database concurrency control solutions, Rails has developed a set of primitives for handling application integrity in the application tier—building, from the underlying database system’s perspective, a feral concurrency control system.
  • Goal: understanding how this growing class of applications currently interacts with database systems and how we can positively engage with these criticisms to better serve the needs of these developers.
  • The authors apply invariant confluence analysis and show that up to 86.9% of Rails validation usage by volume is actually safe under concurrent execution.

2. Background

2.1 Rails Tenets and MVC

  • MVC: Model, View and Controller
  • Model Layer: Rails’s Active Record, which is an object that wraps a row in a database or view, encapsulates the database access, and adds domain logic on that data.

2.2 Databases and Deployment

  • In a Rails application, the only coordination between individual application requests occurs within the database system, which may give serious pause to disciples of transaction processing systems.
  • For maintaining consistency of application data, Rails focuses on keeping application logic within Rails, and not in the database.

3. Feral Mechanisms in Rails

  • To maintain application integrity for concurrent operations, Rails has developed a range of concurrency control strategies, two of which operate external to the database, at the application level, which the authors term feral concurrency control mechanisms.

3.1 Rails Concurrency Control Mechanisms

  1. Transactions: backed by an actual database transaction, whose isolation level can be configured on a per-transaction basis.
  2. Optimistic and pessimistic per-record locking: optimistic locking is implemented by maintaining a special lock_version field in an Active Record model; pessimistic locking is implemented by issuing a SELECT FOR UPDATE statement in the database.
  3. Application-level validations: the framework runs each declared validation sequentially and, if all succeed, updates the model state in the database; this happens within a database-backed transaction (partially).
  4. Application-level associations: an association acts like a foreign key. However, in Rails 4.2, declaring an association does not declare a corresponding foreign key constraint, and vice versa.

In effect, from the database’s perspective, these validations and associations exist external to the system and are feral concurrency control mechanisms.
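
The optimistic locking scheme in item 2 can be sketched as a version-checked UPDATE. This is a minimal illustration, not Rails code: it uses Python's standard sqlite3 module in place of Active Record, and the accounts table, the helper functions, and the StaleObjectError class are hypothetical names chosen for the example.

```python
import sqlite3


class StaleObjectError(Exception):
    """Hypothetical error: the row changed after this writer read it."""


def read_account(conn, account_id):
    # Each reader remembers the lock_version it saw.
    return conn.execute(
        "SELECT id, balance, lock_version FROM accounts WHERE id = ?",
        (account_id,),
    ).fetchone()


def save_balance(conn, row, new_balance):
    account_id, _, seen_version = row
    # The write only applies if nobody bumped lock_version in the meantime.
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, lock_version = lock_version + 1 "
        "WHERE id = ? AND lock_version = ?",
        (new_balance, account_id, seen_version),
    )
    conn.commit()
    if cur.rowcount == 0:
        raise StaleObjectError(f"account {account_id} was modified concurrently")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, balance INTEGER, "
    "lock_version INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
conn.commit()

# Two requests read the same row, then both try to write.
first = read_account(conn, 1)
second = read_account(conn, 1)
save_balance(conn, first, 150)
try:
    save_balance(conn, second, 90)
    conflict_detected = False
except StaleObjectError:
    conflict_detected = True

final_row = read_account(conn, 1)
```

The second writer's update matches zero rows because the version it read is out of date, so the lost update is detected in the application tier rather than prevented by the database.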

3.2 Adoption in Practice

  • Application corpus
  • Mechanism usage: validations and associations are overwhelmingly the most popular forms of concurrency control.
  • Case study: Spree
  • Additional metrics: (1) they analyzed the number of models, transactions, validations and associations over each project’s lifetime; (2) they analyzed the distribution of authors to commits compared to the distribution of authors to validations and associations authored.

3.3 Summary and Discussion

4. Isolation and Integrity

The validation’s intended integrity will be preserved provided that the database is using serializable isolation since each sequence of validations is wrapped within a database-backed transaction. However, validations are likely to run at default database isolation in production environments.

4.1 Understanding Validation Behavior

  • Invariant confluence (I-confluence): provides a necessary and sufficient condition for whether or not invariants can be preserved under coordination-free, concurrent execution of transactions.

4.2 Built-In Validations

  • Under insertions, 86.9% of built-in validation occurrences are I-confluent. Under deletions, only 36.6% of occurrences are I-confluent.
  • Associations and multi-record uniqueness are (depending on the workload) not I-confluent and are therefore likely to cause problems.
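
The difference between an I-confluent and a non-I-confluent validation can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the paper's formalism: a database state is a set of (id, username) records, two replicas diverge from a common base by inserting independently, and merging is set union. All names are hypothetical, and a single example like this demonstrates a violation but does not prove I-confluence, which must hold for every pair of reachable states.

```python
def merge(a, b):
    """Merge two divergent states (sets of records) by set union."""
    return a | b


def usernames_present(state):
    # Per-record invariant: every record has a non-empty username.
    return all(name for (_id, name) in state)


def usernames_unique(state):
    # Multi-record invariant: no two records share a username.
    names = [name for (_id, name) in state]
    return len(names) == len(set(names))


base = frozenset({(1, "alice")})

# Two coordination-free insertions, each valid against its own local state.
replica_a = base | {(2, "bob")}
replica_b = base | {(3, "bob")}
merged = merge(replica_a, replica_b)

# Presence holds before and after the merge.
presence_holds = all(usernames_present(s) for s in (replica_a, replica_b, merged))

# Uniqueness holds on each replica but is violated once they are merged.
uniqueness_before = usernames_unique(replica_a) and usernames_unique(replica_b)
uniqueness_after = usernames_unique(merged)
```

Presence is checked one record at a time, so merging valid states cannot break it. Uniqueness depends on the whole set of records, so two individually valid insertions can conflict.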

4.3 Custom Validations

  • Similar to built-in validations.

5. Quantifying Feral Anomalies

5.1 Uniqueness Constraints and Isolation

  • ActiveRecord enforces uniqueness by issuing a SELECT query in SQL and, if no matching record is found, updating the instance state in the database. However, this behavior is guaranteed to be correct only under serializable isolation.
  • Thus, if the application permits P concurrent validations, the uniqueness validation can admit up to P duplicate records for each unique value in the database table.
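
The SELECT-then-write race described above can be reproduced deterministically. The sketch below is an illustration only: it uses Python's sqlite3 module rather than Rails, forces the worst-case interleaving by hand (all P requests validate before any of them writes), and uses hypothetical table and function names. It also shows the same interleaving against a table with a database-level unique index.

```python
import sqlite3


def validate_unique(conn, email):
    """Feral check: look for an existing row, as the validation does."""
    row = conn.execute(
        "SELECT 1 FROM users WHERE email = ? LIMIT 1", (email,)
    ).fetchone()
    return row is None


def insert_user(conn, email):
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    conn.commit()


def run_interleaved(conn, email, p):
    # Worst case: all P requests pass validation before any write lands.
    passed = [validate_unique(conn, email) for _ in range(p)]
    inserted = 0
    for ok in passed:
        if ok:
            try:
                insert_user(conn, email)
                inserted += 1
            except sqlite3.IntegrityError:
                # Only reachable when the database enforces uniqueness.
                conn.rollback()
    return inserted


# Feral validation only: no constraint in the database.
feral = sqlite3.connect(":memory:")
feral.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
feral_rows = run_interleaved(feral, "a@example.com", 3)

# Same interleaving, but with a unique index in the database.
indexed = sqlite3.connect(":memory:")
indexed.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
indexed.execute("CREATE UNIQUE INDEX users_email ON users (email)")
indexed_rows = run_interleaved(indexed, "a@example.com", 3)
```

With P = 3, the feral-only table ends up with three rows for the same email, matching the bound of P duplicates per value, while the indexed table keeps exactly one.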

5.2 Quantifying Uniqueness Anomalies

5.3 Association Validations and Isolation

  • ActiveRecord accomplishes association by issuing a SELECT WHERE query in SQL to find an associated record and, if a matching association is found, Rails updates the instance state in the database. However, this behavior is assured correct only under serializable isolation.
  • In Rails 4.1, there is no way to natively declare a foreign key constraint; it must be done via a third-party library.
  • Thus any number of concurrent insertions may occur during validation, leading to unbounded numbers of dangling records.
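
The association race can be sketched the same way. This is a hedged illustration using Python's sqlite3 module, not Rails: one request validates that a department exists, a second request deletes that department, and the first request then inserts a user pointing at it. The schema and function names are hypothetical, and SQLite's foreign_keys pragma stands in for a real foreign key constraint in a production database.

```python
import sqlite3


def make_db(enforce_fk):
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces REFERENCES clauses when this pragma is on.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_fk else 'OFF'}")
    conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, "
        "department_id INTEGER REFERENCES departments(id))"
    )
    conn.execute("INSERT INTO departments (id) VALUES (1)")
    conn.commit()
    return conn


def department_exists(conn, department_id):
    """Feral check: SELECT the associated record before saving."""
    row = conn.execute(
        "SELECT 1 FROM departments WHERE id = ?", (department_id,)
    ).fetchone()
    return row is not None


def race(conn):
    validated = department_exists(conn, 1)                # request A validates
    conn.execute("DELETE FROM departments WHERE id = 1")  # request B deletes
    conn.commit()
    if validated:                                         # request A writes
        try:
            conn.execute("INSERT INTO users (department_id) VALUES (1)")
            conn.commit()
        except sqlite3.IntegrityError:
            conn.rollback()
    # Count users whose department no longer exists.
    return conn.execute(
        "SELECT COUNT(*) FROM users "
        "WHERE department_id NOT IN (SELECT id FROM departments)"
    ).fetchone()[0]


dangling_feral = race(make_db(enforce_fk=False))
dangling_fk = race(make_db(enforce_fk=True))
```

Without the constraint, the insert succeeds and leaves one dangling user. With it, the database rejects the insert and no dangling record is created. Unlike the uniqueness case, nothing bounds how many such inserts can slip through while the validation result is stale.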

5.4 Quantifying Association Anomalies

5.5 Takeaways and Discussion

  • Why still use validations? (1) They correctly guard against non-concurrency-related anomalies such as data entry or input errors; (2) validations do reduce the incidence of inconsistency.
  • Nevertheless, Rails’s feral mechanisms are a poor substitute for their respective database counterparts.

6. Other Frameworks

  • Java Persistence API
  • Hibernate
  • CakePHP
  • Laravel
  • Django
  • Waterline

7. Implications for Databases

7.1 Summary: Shortcomings Today

  • Currently, although databases provide transactions for maintaining integrity, there is no clean, idiomatic means of expressing correctness criteria in domain logic.
  • The authors believe there is an opportunity and pressing need to build systems that provide all three criteria: performance, correctness, and programmability.

7.2 Domesticating Feral Mechanisms

Requirements of a new database interface for application users and framework authors

  • Express correctness criteria in the language of their domain model, with minimal friction, while permitting their automatic enforcement;
  • Only pay the price of coordination when necessary;
  • Easily deploy to multiple database backends.

8. Related Work

  • ORMs
  • Applications and weak isolation
  • Quantifying anomalies
  • Empirical software analysis

Written on April 25, 2017