Efficient and Robust Automated Machine Learning

Reference: Feurer, Matthias, et al. "Efficient and robust automated machine learning." Advances in Neural Information Processing Systems. 2015.

0. Abstract

The success of machine learning in a broad range of applications has led to an ever-growing demand for machine learning systems that can be used off the shelf by non-experts. To be effective in practice, such systems need to automatically choose a good algorithm and feature preprocessing steps for a new dataset at hand, and also set their respective hyperparameters. Recent work has started to tackle this automated machine learning (AutoML) problem with the help of efficient Bayesian optimization methods. Building on this, we introduce a robust new AutoML system based on scikit-learn (using 15 classifiers, 14 feature preprocessing methods, and 4 data preprocessing methods, giving rise to a structured hypothesis space with 110 hyperparameters). This system, which we dub AUTO-SKLEARN, improves on existing AutoML methods by automatically taking into account past performance on similar datasets, and by constructing ensembles from the models evaluated during the optimization. Our system won the first phase of the ongoing ChaLearn AutoML challenge, and our comprehensive analysis on over 100 diverse datasets shows that it substantially outperforms the previous state of the art in AutoML. We also demonstrate the performance gains due to each of our contributions and derive insights into the effectiveness of the individual components of AUTO-SKLEARN.

1. Introduction

  • We define AutoML as the problem of automatically (without human input) producing test set predictions for a new dataset within a fixed computational budget.
  • In practice, the budget would comprise computational resources, such as CPU and/or wallclock time and memory usage.

2. AutoML as a CASH problem

CASH stands for the Combined Algorithm Selection and Hyperparameter optimization problem: jointly choosing a learning algorithm and setting its hyperparameters.
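Formally, with algorithms $\mathcal{A} = \{A^{(1)}, \ldots, A^{(R)}\}$, associated hyperparameter spaces $\Lambda^{(1)}, \ldots, \Lambda^{(R)}$, and $K$ cross-validation folds, the paper defines CASH as finding

$$
A^\star, \lambda_\star \in \operatorname*{argmin}_{A^{(j)} \in \mathcal{A},\ \lambda \in \Lambda^{(j)}} \frac{1}{K} \sum_{i=1}^{K} \mathcal{L}\left(A^{(j)}_{\lambda},\ D_{\mathrm{train}}^{(i)},\ D_{\mathrm{valid}}^{(i)}\right)
$$

where $\mathcal{L}$ is the loss of algorithm $A^{(j)}$ with hyperparameters $\lambda$ when trained on fold $D_{\mathrm{train}}^{(i)}$ and evaluated on $D_{\mathrm{valid}}^{(i)}$.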

  • Bayesian optimization fits a probabilistic model to capture the relationship between hyperparameter settings and their measured performance; it then uses this model to select the most promising hyperparameter setting (trading off exploration of new parts of the space against exploitation of known good regions), evaluates that setting, updates the model with the result, and iterates (see the sketch after this list).
  • While Bayesian optimization based on Gaussian process models performs best in low-dimensional problems with numerical hyperparameters, tree-based models have been shown to be more successful in high-dimensional, structured, and partly discrete problems, such as the CASH problem.
  • Among the tree-based Bayesian optimization methods, the random-forest-based SMAC outperforms the tree Parzen estimator TPE.
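To make the loop concrete, here is a minimal SMBO-style sketch in Python. Everything in it (the 1-nearest-neighbor surrogate, the pure-argmin candidate selection, the helper names) is a simplified stand-in, not SMAC: SMAC uses a random-forest surrogate and an acquisition function such as expected improvement to balance exploration and exploitation.

```python
import random

def l1_distance(a, b):
    # Naive distance between two flat numeric configurations.
    return sum(abs(a[k] - b[k]) for k in a)

def smbo(objective, sample_config, n_init=5, n_iter=20, n_candidates=100):
    """Minimal sequential model-based optimization loop (a sketch, not SMAC).

    objective:     maps a configuration (dict) to a validation loss.
    sample_config: draws one random configuration from the search space.
    """
    # Bootstrap the model with a few random evaluations.
    history = [(c, objective(c)) for c in (sample_config() for _ in range(n_init))]

    for _ in range(n_iter):
        # 1. "Fit" a cheap surrogate of loss vs. configuration
        #    (here: 1-nearest neighbor over the evaluated history).
        def predicted_loss(cfg):
            _, loss = min(history, key=lambda h: l1_distance(h[0], cfg))
            return loss

        # 2. Select the most promising candidate under the surrogate.
        #    A real acquisition function (e.g. expected improvement) would
        #    also reward uncertain regions; plain argmin only exploits.
        candidates = [sample_config() for _ in range(n_candidates)]
        promising = min(candidates, key=predicted_loss)

        # 3. Evaluate it for real and update the surrogate's data.
        history.append((promising, objective(promising)))

    return min(history, key=lambda h: h[1])

# Toy usage: tune a single numeric hyperparameter of a quadratic "loss".
best_cfg, best_loss = smbo(
    objective=lambda c: (c["x"] - 3.0) ** 2,
    sample_config=lambda: {"x": random.uniform(-10.0, 10.0)},
)
```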

3. New Methods for Increasing Efficiency and Robustness of AutoML

3.1 Meta-Learning for Finding Good Instantiations of Machine Learning Frameworks

  1. In an offline phase, for each machine learning dataset in a dataset repository (in our case 140 datasets from the OpenML repository), we evaluated a set of meta-features and used Bayesian optimization to determine and store an instantiation of the given ML framework with strong empirical performance for that dataset.
  2. Given a new dataset D_new, we compute its meta-features, rank all repository datasets by their L1 distance to D_new in meta-feature space, and select the stored ML framework instantiations for the k = 25 nearest datasets, evaluating them first before starting Bayesian optimization with their results.
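A sketch of the online warmstart step, assuming the offline phase has already produced (meta-features, best configuration) pairs for each repository dataset. The names are hypothetical, not auto-sklearn's internal API, and the meta-feature normalization the paper applies before computing distances is omitted:

```python
import numpy as np

def warmstart_configs(new_metafeatures, repository, k=25):
    """Return stored configurations of the k nearest repository datasets.

    new_metafeatures: 1-D array of meta-features of the new dataset D_new.
    repository:       list of (metafeatures, best_config) pairs, one per
                      dataset, precomputed offline with Bayesian optimization.
    """
    # Rank all repository datasets by L1 distance to D_new
    # in meta-feature space.
    distances = [np.abs(mf - new_metafeatures).sum() for mf, _ in repository]
    nearest = np.argsort(distances)[:k]

    # These configurations are evaluated first; their results then seed
    # the Bayesian optimizer's model.
    return [repository[i][1] for i in nearest]
```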

3.2 Automated Ensemble Construction of Models Evaluated During Optimization

  • We store the good models found during optimization and use an efficient post-processing method to construct an ensemble out of them. This automatic ensemble construction avoids committing to a single hyperparameter setting and is thus more robust (and less prone to overfitting) than the point estimate that standard hyperparameter optimization yields.
  • It is well known that ensembles often outperform individual models, and that effective ensembles can be created from a library of models. Ensembles perform particularly well if the models they are based on (1) are individually strong and (2) make uncorrelated errors.
  • We experimented with different approaches to optimize the weights of the models in the ensemble: stacking, gradient-free numerical optimization, and ensemble selection. Both numerical optimization and stacking overfit the validation set and were computationally costly; ensemble selection was fast and robust.
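Ensemble selection (Caruana et al.) starts from an empty ensemble and greedily adds, with replacement, whichever stored model most reduces the validation loss of the uniformly averaged predictions; the final weights are the normalized selection counts. A minimal sketch, with illustrative names and interfaces:

```python
import numpy as np

def ensemble_selection(val_preds, y_val, loss, ensemble_size=50):
    """Greedy forward selection with replacement.

    val_preds:     list of arrays; model m's predicted class probabilities
                   on the validation set.
    y_val:         validation labels.
    loss:          function(avg_probs, y_val) -> scalar to minimize.
    ensemble_size: number of (non-unique) members to select.
    """
    counts = np.zeros(len(val_preds))
    running_sum = np.zeros_like(val_preds[0])

    for step in range(1, ensemble_size + 1):
        # Try appending each model; keep the one with the lowest loss.
        scores = [loss((running_sum + p) / step, y_val) for p in val_preds]
        best = int(np.argmin(scores))
        running_sum += val_preds[best]
        counts[best] += 1

    # Normalized selection counts are the ensemble weights.
    return counts / counts.sum()
```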

4. A Practical Automated Machine Learning System

  • auto-sklearn uses SMAC for Bayesian optimization of all hyperparameters, since its random-forest model copes well with the high-dimensional, partly discrete, conditional space (see Section 2).
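The 110 hyperparameters form a structured, conditional space: a component's hyperparameters are only active when that component is selected. A small illustration of such a space, assuming the ConfigSpace library that SMAC consumes (only two classifiers shown; the names and ranges are illustrative, and the API is that of older ConfigSpace releases):

```python
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import (
    CategoricalHyperparameter,
    UniformFloatHyperparameter,
    UniformIntegerHyperparameter,
)
from ConfigSpace.conditions import EqualsCondition

cs = ConfigurationSpace()

# Top-level choice of classifier.
clf = CategoricalHyperparameter("classifier", ["random_forest", "libsvm_svc"])
cs.add_hyperparameter(clf)

# SVM hyperparameter, active only when the SVM is chosen.
svm_c = UniformFloatHyperparameter("svm_C", 2**-5, 2**15, log=True)
cs.add_hyperparameter(svm_c)
cs.add_condition(EqualsCondition(svm_c, clf, "libsvm_svc"))

# Random-forest hyperparameter, active only for the forest.
rf_trees = UniformIntegerHyperparameter("rf_n_estimators", 10, 500)
cs.add_hyperparameter(rf_trees)
cs.add_condition(EqualsCondition(rf_trees, clf, "random_forest"))

print(cs.sample_configuration())  # samples only the active hyperparameters
```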

5. Comparing auto-sklearn to Auto-WEKA and hyperopt-sklearn

6. Evaluation of the Proposed AutoML Improvements

  • Since the class distribution in many of these datasets is quite imbalanced, we evaluated all AutoML methods using the balanced classification error rate (BER): the average of the class-wise error rates, so each class contributes equally regardless of its size.
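Because BER averages per-class error rates, a classifier cannot look good simply by predicting the majority class. A small sketch in plain NumPy (the function name is ours):

```python
import numpy as np

def balanced_error_rate(y_true, y_pred):
    """Mean over classes of that class's misclassification rate."""
    classes = np.unique(y_true)
    per_class_err = [
        np.mean(y_pred[y_true == c] != c)  # fraction of class c misclassified
        for c in classes
    ]
    return float(np.mean(per_class_err))

# Example: a majority-class predictor looks good on accuracy but bad on BER.
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)           # always predict class 0
print(balanced_error_rate(y_true, y_pred))  # 0.5, despite 90% accuracy
```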

7. Detailed Analysis of auto-sklearn Components

  • Overall, as expected, random forests, extremely randomized trees, AdaBoost, and gradient boosting showed the most robust performance, while SVMs showed strong peak performance on some datasets.
  • Decision trees, passive aggressive, kNN, Gaussian naive Bayes, LDA, and QDA were statistically significantly inferior to the best classifier on most datasets.
  • The table indicates that no single method was the best choice for all datasets.
Written on February 12, 2018