The Volcano Optimizer Generator: Extensibility and Efficient Search

Reference: Graefe, Goetz, and William J. McKenna. "The Volcano optimizer generator: Extensibility and efficient search." Data Engineering, 1993. Proceedings. Ninth International Conference on. IEEE, 1993.

This is one of the several papers belong to suggested readings for Optimizer Implementation (Part I) of CMU 15-721: Database Systems.

0. Abstract

Emerging database application domains demand not only new functionality but also high performance. To satisfy these two requirements, the Volcano project provides efficient, extensible tools for query and request processing, particularly for object-oriented and scientific database systems. One of these tools is a new optimizer generator. Data model, logical algebra, physical algebra, and optimization rules are translated by the optimizer generator into optimizer source code. Compared with our earlier EXODUS optimizer generator prototype, the search engine is more extensible and powerful; it provides effective support for non-trivial cost models and for physical properties such as sort order. At the same time, it is much more efficient as it combines dynamic programming, which until now had been used only for relational select-project-join optimization, with goal-directed search and branch-and-bound pruning. Compared with other rule-based optimization systems, it provides complete data model independence and more natural extensibility.

1. Introduction

  • Extensibility and Performance
  • Five requirements

2. The Outside View of the Volcano Optimizer Generator

2.1 Design Principles

  • The Volcano optimizer generator uses two algebras, called the logical and the physical algebras, and generates optimizers that map an expression of the logical algebra (a query) into an expression of the physical algebras (a query evaluation plan consisting of algorithms). To do so, it uses transformations within the logical algebra and cost-based mapping of logical operators to algorithms.
  • In our design, rules are translated independently from one another and are combine only by the search engine when optimizing a query. Modularization is an advantage in itself both for initial construction of an optimizer and for its maintenance.
  • Optimizer choices are represented in the Volcano optimizer generator’s input file as algebraic equivalences, and the optimizer generator’s search engine applies them in a suitable manner.
  • In general, interpretation can be made more flexible (in particular the rule set can be augmented at run-time), while compiled rule sets typically execute faster. Since query optimization is very CPU-intensive, we decided on rule compilation. In general, the issue of flexibility in the search engine and the choice between interpretation vs. compilation are orthogonal.
  • The search engine used by optimizers generated with the Volcano optimizer generator is based on dynamic programming.

2.2 Optimizer Generator Input and Optimizer Operation

  • The user queries to be optimized by a generated optimizer are specified as an algebra expression (tree) of logical operators.
  • Optimization consists of mapping a logical algebra expression into the optimal equivalent physical algebra expression. In other words, a generated optimizer reorders operators and selects implementation algorithms.
  • The algebraic rules of expression equivalence, e.g., commutativity or associativity, are specified using transformation rules. The possible mappings of operators to algorithms are specified using implementation rules.

  • The results of expressions are described using properties, e.g., logical properties (schema, expected size) and physical properties (sort order, partitioning). Logical properties are attached to equivalence classes - sets of equivalent logical expressions and plans - whereas physical properties are attached to specific plans and algorithms choices.
  • The types and semantics of physical properties can be designed by the optimizer implementor.

  • Each optimization goal (and subgoal) is a pair of a logical expression and a physical property vector. In order to decide whether or not an algorithm or enforcer can be used to execute the root node of a logical expression, a generated optimizer matches the implementation rule, executes the condition code associated with the rule, and then invokes an applicability function that determines whether or not the algorithm or enforcer can deliver the logical expression with physical properties that satisfy the physical property vector.
  • The applicability functions also determine the physical property vectors that the algorithm’s input must satisfy.

  • After the optimizer decides to explore using an algorithm or enforcer, it invokes the algorithm’s cost function to estimate its cost.
  • Cost is an abstract data type for the optimize generator; therefore, the optimizer implementor can choose cost to be a number (e.g., estimated elapsed time), a record (e.g., estimated CPU time and I/O count), or any other type.

  • For each logical and physical algebra expression, logical and physical properties are derived using property functions. The logical properties are determined based on the logical expression, while physical properties can only be determined after an execution plan has been chosen.

Summary

The optimizer implementor provides:

  1. a set of logical operators
  2. algebraic transformation rules, possibly with condition code
  3. a set of algorithms and enforcers
  4. implementation rules, possibly with condition code
  5. an ADT “cost” with functions for basic arithmetic and comparison
  6. an ADT “logical properties”
  7. an ADT “physical property vector” including comparisons functions (equality and cover)
  8. an applicability function for each algorithm and enforcer
  9. a cost function for each algorithm and enforcer
  10. a property function for each operator, algorithm, and enforcer

3. The Search Engine

  • The Volcano optimizer generator provides a search engine to be used all created optimizers.
  • Dynamic programming is used in optimizers created with the Volcano optimizer generator by retaining a large set of partial optimization results and using these earlier results in later optimization decisions.
  • In order to prevent redundant optimization effort by detecting redundant (i.e., multiple equivalent) derivations of the same logical expressions and plans during optimization, expression and plans are captured in a hash table of expressions and equivalence classes.
  • The search algorithm employed by optimizers created with the Volcano optimizer generators uses dynamic programming by storing all optimal subplans as well as optimization failures until a query is completely optimized.

4. Comparison with the EXODUS Optimizer Generator

4.1 Functionality and Extensibility

  • Volcano makes a distinction between logical expressions and physical expressions
  • Physical properties were handled rather haphazardly in EXODUS.
  • Volcano algorithm is driven top-down; subexpressions are optimized only if warranted.
  • Cost is defined in much more general terms in Volcano than in the EXODUS optimizer generator.
  • The Volcano optimizer generator is more extensible than the EXODUS prototype, in particular with respect to the search strategy.

4.2 Search Efficiency and Effectiveness

  • the Volcano optimizer generator is not only more extensible, it is also much more efficient and effective than the earlier EXODUS prototype.
  • Starburst
Written on October 28, 2017