Genetic algorithms for learning RL algorithms.
Researchers designing algorithms is not cool. Researchers who design algorithms to design algorithms is much cooler.
source: original paper
TLDR: Run a two-level optimisation problem, with the algorithm (computation graph) optimised in the outer loop, and a value-based RL agent trained in the inner loop using the computation graph trained above, i.e. RL algorithm is the evaluation function for the genetic algorithm:
$$ L^* = \arg\max_L [\sum_{\mathcal{E}}\text{Eval}(L, \mathcal{E})] $$
There are three node types in the computation graph:
So, in all above, the RL algorithm is simply the evaluation of the proposed computation graph. The genetic algorithm is the main thing here. The authours used regularized evolution here that in every iteration:
Everything that runs an RL algorithm in an inner loop is incredibly slow. To speed up the computation, the authors do the following: