Evolving Reinforcement Learning Algorithms

What?

Genetic algorithms for learning RL algorithms.

Why?

Researchers designing algorithms is not cool. Researchers designing algorithms that design algorithms is much cooler.

How?

source: original paper

TLDR: Solve a two-level optimisation problem. The outer loop optimises the algorithm, represented as a computation graph that computes the agent's loss $L$. The inner loop trains a value-based RL agent with that loss. The agent's training performance, summed over a set of training environments $\mathcal{E}$, is the evaluation (fitness) function for the genetic algorithm:

$$ L^* = \arg\max_L [\sum_{\mathcal{E}}\text{Eval}(L, \mathcal{E})] $$

There are three node types in the computation graph:

- Input nodes:
  - Typical transition info $(s_t, a_t, s_{t+1}, r_t)$
  - Constants, e.g. $\gamma$
- Parameter nodes:
  - NN weights, e.g. $Q$-function network
- Operation nodes, i.e. nodes that take the outputs of other nodes and apply:
  - parameter nodes
  - basic math operators (e.g. division)
  - probability and statistics (e.g. calculate entropy)
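
To make the node types concrete, here is a minimal sketch of how such a graph could be represented and evaluated. It is not the paper's implementation: the `Node` class, the `evaluate` function and the helper names are made up for illustration, and a lookup table stands in for the $Q$-network (a real system would need a differentiable network). The example graph encodes the familiar squared TD error $(r_t + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t))^2$.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    kind: str                        # "input", "param" or "op" (hypothetical labels)
    name: str
    fn: Optional[Callable] = None    # only used by "op" nodes
    parents: List[int] = field(default_factory=list)  # indices of earlier nodes


def q_value(Q, s, a):
    # Apply the parameter node: look up Q(s, a) in the table.
    return Q[(s, a)]


def max_q(Q, s):
    # Apply the parameter node: max over actions of Q(s, .).
    return max(v for (state, _), v in Q.items() if state == s)


def evaluate(graph, inputs, params):
    # Nodes are stored in topological order, so one forward pass suffices.
    values = []
    for node in graph:
        if node.kind == "input":
            values.append(inputs[node.name])
        elif node.kind == "param":
            values.append(params[node.name])
        else:
            values.append(node.fn(*(values[i] for i in node.parents)))
    return values[-1]  # assumption: the last node is the scalar loss


dqn_loss_graph = [
    Node("input", "s"),                               # 0
    Node("input", "a"),                               # 1
    Node("input", "r"),                               # 2
    Node("input", "s_next"),                          # 3
    Node("input", "gamma"),                           # 4
    Node("param", "Q"),                               # 5
    Node("op", "Q(s, a)", q_value, [5, 0, 1]),        # 6
    Node("op", "max Q(s', .)", max_q, [5, 3]),        # 7
    Node("op", "mul", operator.mul, [4, 7]),          # 8
    Node("op", "add", operator.add, [2, 8]),          # 9: TD target
    Node("op", "sub", operator.sub, [9, 6]),          # 10: TD error
    Node("op", "square", lambda x: x * x, [10]),      # 11: loss
]

Q_table = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 5.0}
transition = {"s": 0, "a": 1, "r": 1.0, "s_next": 1, "gamma": 0.5}
loss = evaluate(dqn_loss_graph, transition, {"Q": Q_table})
```

With this representation, a mutation amounts to swapping the `fn` or the `parents` of one node, which is what makes the graph convenient to evolve.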

In short, the RL training run only serves to evaluate a proposed computation graph; the genetic algorithm does the actual search. The authors use regularized evolution, which in every iteration:

- picks $T$ algorithms from the population at random;
- selects the best of them;
- mutates the selected algorithm to produce a child and adds the child to the population;
- removes the oldest algorithm from the population.
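
The loop above can be sketched in a few lines of Python. This is a generic illustration of regularized evolution rather than the authors' code: `regularized_evolution`, `toy_eval` and `toy_mutate` are hypothetical names, and integers with a toy fitness stand in for computation graphs and RL training runs.

```python
import random
from collections import deque


def regularized_evolution(init_population, evaluate, mutate,
                          tournament_size, n_iterations, rng):
    # The population is a FIFO queue of (program, score) pairs, oldest first.
    population = deque((p, evaluate(p)) for p in init_population)
    best = max(population, key=lambda pair: pair[1])
    for _ in range(n_iterations):
        # Pick T members at random and keep the best as the parent.
        contenders = rng.sample(list(population), tournament_size)
        parent, _parent_score = max(contenders, key=lambda pair: pair[1])
        # Mutate the parent, score the child, and add it to the population.
        child = mutate(parent, rng)
        scored_child = (child, evaluate(child))
        population.append(scored_child)
        # Remove the oldest member, regardless of its score.
        population.popleft()
        if scored_child[1] > best[1]:
            best = scored_child
    return best, population


# Toy stand-ins: a "program" is an integer, fitness peaks at 10.
def toy_eval(x):
    return -abs(x - 10)


def toy_mutate(x, rng):
    return x + rng.choice([-1, 1])


rng = random.Random(0)
best, population = regularized_evolution(
    [0, 1, 2, 3], toy_eval, toy_mutate,
    tournament_size=2, n_iterations=200, rng=rng,
)
```

Removing the oldest member rather than the worst one is what distinguishes this from plain tournament selection: a program survives only if its descendants keep winning tournaments.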

Everything that runs an RL algorithm in an inner loop is incredibly slow. To speed up the computation, the authors do the following:

- Hash programs by functional equivalence, so that functionally identical programs are not evaluated twice.
- Use the mountain car as a quick test of the algorithm. If the agent does not learn from the very beginning, the evaluation is terminated.
- Drop invalid programs (e.g. the program should be differentiable, i.e. there should be a path in the graph between the output and the policy parameters).
- Bootstrap the search using Q-learning as the graph initialisation (plus some random nodes).
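
As a rough illustration of the first two tricks, the sketch below hashes a program by its outputs on a fixed set of random probe inputs and wraps the expensive evaluation in a cache with an early hurdle. Everything here is an assumption made for the example: the names `functional_hash` and `CachedEvaluator`, the probe count, the rounding precision, the zero score for a failed hurdle, and the toy two-argument functions standing in for loss graphs and RL training runs.

```python
import hashlib
import random


def functional_hash(program, n_probes=16, seed=0):
    # Probe the program on the same pseudo-random inputs every time;
    # programs that agree on all probes get the same hash.
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_probes):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        outputs.append(round(program(x, y), 6))
    return hashlib.sha256(repr(outputs).encode()).hexdigest()


class CachedEvaluator:
    def __init__(self, hurdle_eval, full_eval, hurdle_threshold):
        self.hurdle_eval = hurdle_eval          # cheap test (the quick environment)
        self.full_eval = full_eval              # expensive evaluation
        self.hurdle_threshold = hurdle_threshold
        self.cache = {}
        self.full_evaluations = 0

    def __call__(self, program):
        key = functional_hash(program)
        if key not in self.cache:
            self.cache[key] = self._score(program)
        return self.cache[key]

    def _score(self, program):
        # Stop early if the program fails the cheap hurdle.
        if self.hurdle_eval(program) < self.hurdle_threshold:
            return 0.0
        self.full_evaluations += 1
        return self.full_eval(program)


# Toy programs: two are functionally identical, one is different.
def add_xy(x, y):
    return x + y


def add_yx(x, y):
    return y + x


def mul_xy(x, y):
    return x * y


evaluator = CachedEvaluator(
    hurdle_eval=lambda p: p(1.0, 1.0),   # toy stand-in for the quick test
    full_eval=lambda p: p(2.0, 3.0),     # toy stand-in for full training
    hurdle_threshold=1.5,
)
score_a = evaluator(add_xy)   # runs the full evaluation
score_b = evaluator(add_yx)   # same hash, served from the cache
score_c = evaluator(mul_xy)   # fails the hurdle, never fully evaluated
```

Probing on random inputs only gives approximate equivalence: two different programs could in principle agree on every probe, so the probe count trades off safety against cost.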