Learning to learn by gradient descent by gradient descent

Reference: Andrychowicz, Marcin, et al. "Learning to learn by gradient descent by gradient descent." Advances in Neural Information Processing Systems. 2016.

0. Abstract

The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure. We demonstrate this on a number of tasks, including simple convex problems, training neural networks, and styling images with neural art.

1. Introduction

In this work we take a different tack and instead propose to replace hand-designed update rules with a learned update rule, which we call the optimizer g, specified by its own set of parameters φ.
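The paper's equation (1) makes this concrete: where SGD would apply θ_{t+1} = θ_t − α_t ∇f(θ_t), the learned rule instead adds an update produced by g:

```latex
\theta_{t+1} = \theta_t + g_t\big(\nabla f(\theta_t), \phi\big)
```

Here g is the optimizer network (an RNN in the paper), so the update at step t can depend on the whole history of gradients through the network's state.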

1.1 Transfer learning and generalization

The goal of this work is to develop a procedure for constructing a learning algorithm which performs well on a particular class of optimization problems.
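Formally, the paper frames this as minimizing an expected loss over a distribution of objective functions f. With m denoting the RNN (state h_t) that computes the update g_t, the training objective over a horizon T weights the intermediate iterates (the paper uses w_t = 1 for every t):

```latex
\mathcal{L}(\phi) = \mathbb{E}_f\left[\sum_{t=1}^{T} w_t\, f(\theta_t)\right],
\qquad
\theta_{t+1} = \theta_t + g_t,
\qquad
\begin{bmatrix} g_t \\ h_{t+1} \end{bmatrix} = m(\nabla_t, h_t, \phi)
```

L(φ) is then minimized with gradient descent on φ, i.e. the optimizer itself is trained by gradient descent, hence the paper's title.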

2. Learning to learn with recurrent neural networks

2.1 Coordinatewise LSTM optimizer

  • Optimizing at this scale with a fully connected RNN is not feasible, as it would require a huge hidden state and an enormous number of parameters. To avoid this difficulty we use an optimizer m that operates coordinatewise on the parameters of the objective function, sharing one small network (with separate hidden state per coordinate) across all coordinates, similar to other common update rules like RMSprop and ADAM.
  • In practice, rescaling the inputs and outputs of an LSTM optimizer using suitable constants (shared across all timesteps and functions f) is sufficient to avoid the problem of gradients and updates living on very different scales across problems.
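To make the coordinatewise idea concrete, here is a minimal NumPy sketch: a single LSTM cell whose weights are shared across every coordinate of θ, with a per-coordinate hidden state. The hidden size, output projection, and single-layer cell are illustrative simplifications, not the paper's exact two-layer, 20-unit architecture, and the weights here are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CoordinatewiseLSTMOptimizer:
    """One LSTM cell shared across all coordinates of theta (toy sketch)."""

    def __init__(self, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden = hidden
        # Input to the cell is one scalar (the coordinate's gradient),
        # concatenated with the previous hidden state; 4 gates stacked.
        self.W = rng.normal(0.0, 0.1, (4 * hidden, 1 + hidden))
        self.b = np.zeros(4 * hidden)
        # Projects the hidden state to one scalar update per coordinate.
        self.w_out = rng.normal(0.0, 0.1, hidden)

    def init_state(self, n_coords):
        # Separate (h, c) state per coordinate, shared weights.
        return (np.zeros((n_coords, self.hidden)),
                np.zeros((n_coords, self.hidden)))

    def step(self, grad, state):
        h, c = state
        x = np.concatenate([grad[:, None], h], axis=1)   # (n, 1 + hidden)
        z = x @ self.W.T + self.b                        # (n, 4 * hidden)
        i, f, o, g = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        update = h @ self.w_out                          # one scalar per coord
        return update, (h, c)

# Usage on f(theta) = ||theta||^2; with untrained weights the updates are
# arbitrary, but the shapes and the update rule theta_{t+1} = theta_t + g_t
# mirror the paper's setup.
opt = CoordinatewiseLSTMOptimizer()
theta = np.ones(5)
state = opt.init_state(theta.size)
for _ in range(10):
    grad = 2.0 * theta                  # gradient of ||theta||^2
    update, state = opt.step(grad, state)
    theta = theta + update
```

Because the network only ever sees one coordinate at a time, its parameter count is independent of the dimensionality of θ, which is what makes the approach scale.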

3. Experiments

3.1 Quadratic functions
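The paper's first experiment trains the optimizer on a family of 10-dimensional quadratics f(θ) = ||Wθ − y||², with W and y sampled from a Gaussian. The sketch below generates that problem family and runs plain gradient descent as a hand-designed baseline; in the paper the update line is what the learned LSTM optimizer replaces. The step size and iteration count are my choices, not the paper's.

```python
import numpy as np

def sample_quadratic(rng, n=10):
    """Sample one problem from the paper's family: f(theta) = ||W theta - y||^2."""
    W = rng.normal(size=(n, n))
    y = rng.normal(size=n)
    f = lambda theta: float(np.sum((W @ theta - y) ** 2))
    grad = lambda theta: 2.0 * W.T @ (W @ theta - y)
    return f, grad

rng = np.random.default_rng(0)
f, grad = sample_quadratic(rng)

theta = np.zeros(10)
lr = 0.005  # small enough to be stable for a 10x10 Gaussian W
for _ in range(100):
    theta = theta - lr * grad(theta)  # hand-designed baseline update
```

A learned optimizer is trained on many draws from `sample_quadratic` and evaluated on fresh draws, which is what the paper means by exploiting structure in a class of problems rather than a single instance.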

3.2 Training a small neural network on MNIST

3.3 Training a convolutional network on CIFAR-10

3.4 Neural Art

Written on October 26, 2017