The Volcano Optimizer Generator: Extensibility and Efficient Search
Reference: Graefe, Goetz, and William J. McKenna. "The Volcano optimizer generator: Extensibility and efficient search." Data Engineering, 1993. Proceedings. Ninth International Conference on. IEEE, 1993.This is one of the several papers belong to suggested readings for Optimizer Implementation (Part I) of CMU 15721: Database Systems.
0. Abstract
Emerging database application domains demand not only new functionality but also high performance. To satisfy these two requirements, the Volcano project provides efficient, extensible tools for query and request processing, particularly for objectoriented and scientific database systems. One of these tools is a new optimizer generator. Data model, logical algebra, physical algebra, and optimization rules are translated by the optimizer generator into optimizer source code. Compared with our earlier EXODUS optimizer generator prototype, the search engine is more extensible and powerful; it provides effective support for nontrivial cost models and for physical properties such as sort order. At the same time, it is much more efficient as it combines dynamic programming, which until now had been used only for relational selectprojectjoin optimization, with goaldirected search and branchandbound pruning. Compared with other rulebased optimization systems, it provides complete data model independence and more natural extensibility.
1. Introduction
 Extensibility and Performance
 Five requirements
2. The Outside View of the Volcano Optimizer Generator
2.1 Design Principles
 The Volcano optimizer generator uses two algebras, called the logical and the physical algebras, and generates optimizers that map an expression of the logical algebra (a query) into an expression of the physical algebras (a query evaluation plan consisting of algorithms). To do so, it uses transformations within the logical algebra and costbased mapping of logical operators to algorithms.
 In our design, rules are translated independently from one another and are combine only by the search engine when optimizing a query. Modularization is an advantage in itself both for initial construction of an optimizer and for its maintenance.
 Optimizer choices are represented in the Volcano optimizer generator’s input file as algebraic equivalences, and the optimizer generator’s search engine applies them in a suitable manner.
 In general, interpretation can be made more flexible (in particular the rule set can be augmented at runtime), while compiled rule sets typically execute faster. Since query optimization is very CPUintensive, we decided on rule compilation. In general, the issue of flexibility in the search engine and the choice between interpretation vs. compilation are orthogonal.
 The search engine used by optimizers generated with the Volcano optimizer generator is based on dynamic programming.
2.2 Optimizer Generator Input and Optimizer Operation
 The user queries to be optimized by a generated optimizer are specified as an algebra expression (tree) of logical operators.
 Optimization consists of mapping a logical algebra expression into the optimal equivalent physical algebra expression. In other words, a generated optimizer reorders operators and selects implementation algorithms.

The algebraic rules of expression equivalence, e.g., commutativity or associativity, are specified using transformation rules. The possible mappings of operators to algorithms are specified using implementation rules.
 The results of expressions are described using properties, e.g., logical properties (schema, expected size) and physical properties (sort order, partitioning). Logical properties are attached to equivalence classes  sets of equivalent logical expressions and plans  whereas physical properties are attached to specific plans and algorithms choices.

The types and semantics of physical properties can be designed by the optimizer implementor.
 Each optimization goal (and subgoal) is a pair of a logical expression and a physical property vector. In order to decide whether or not an algorithm or enforcer can be used to execute the root node of a logical expression, a generated optimizer matches the implementation rule, executes the condition code associated with the rule, and then invokes an applicability function that determines whether or not the algorithm or enforcer can deliver the logical expression with physical properties that satisfy the physical property vector.

The applicability functions also determine the physical property vectors that the algorithm’s input must satisfy.
 After the optimizer decides to explore using an algorithm or enforcer, it invokes the algorithm’s cost function to estimate its cost.

Cost is an abstract data type for the optimize generator; therefore, the optimizer implementor can choose cost to be a number (e.g., estimated elapsed time), a record (e.g., estimated CPU time and I/O count), or any other type.
 For each logical and physical algebra expression, logical and physical properties are derived using property functions. The logical properties are determined based on the logical expression, while physical properties can only be determined after an execution plan has been chosen.
Summary
The optimizer implementor provides:
 a set of logical operators
 algebraic transformation rules, possibly with condition code
 a set of algorithms and enforcers
 implementation rules, possibly with condition code
 an ADT “cost” with functions for basic arithmetic and comparison
 an ADT “logical properties”
 an ADT “physical property vector” including comparisons functions (equality and cover)
 an applicability function for each algorithm and enforcer
 a cost function for each algorithm and enforcer
 a property function for each operator, algorithm, and enforcer
3. The Search Engine
 The Volcano optimizer generator provides a search engine to be used all created optimizers.
 Dynamic programming is used in optimizers created with the Volcano optimizer generator by retaining a large set of partial optimization results and using these earlier results in later optimization decisions.
 In order to prevent redundant optimization effort by detecting redundant (i.e., multiple equivalent) derivations of the same logical expressions and plans during optimization, expression and plans are captured in a hash table of expressions and equivalence classes.
 The search algorithm employed by optimizers created with the Volcano optimizer generators uses dynamic programming by storing all optimal subplans as well as optimization failures until a query is completely optimized.
4. Comparison with the EXODUS Optimizer Generator
4.1 Functionality and Extensibility
 Volcano makes a distinction between logical expressions and physical expressions
 Physical properties were handled rather haphazardly in EXODUS.
 Volcano algorithm is driven topdown; subexpressions are optimized only if warranted.
 Cost is defined in much more general terms in Volcano than in the EXODUS optimizer generator.
 The Volcano optimizer generator is more extensible than the EXODUS prototype, in particular with respect to the search strategy.
4.2 Search Efficiency and Effectiveness
 the Volcano optimizer generator is not only more extensible, it is also much more efficient and effective than the earlier EXODUS prototype.
5. Other Related Work
 Starburst