Torby's Blog: 11/01/2016

This is a quick summary of current work on hierarchical reinforcement learning (RL) aimed at students choosing to do hierarchical RL projects under my supervision.

The most common formalisation of hierarchical RL, in terms of semi-MDPs, was given by Sutton, Precup and Singh:

Richard S. Sutton, Doina Precup and Satinder Singh (1999) Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. In Artificial Intelligence, 112:181–211.
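For students new to this framework, it may help to see its central object in code. Sutton, Precup and Singh define an option as a triple: an initiation set, a policy, and a termination condition. The following is only a minimal Python sketch of that idea, not code from the paper. The names `Option`, `run_option`, `corridor_step` and `go_to_end`, and the toy corridor environment, are my own illustrative assumptions.

```python
import random
from dataclasses import dataclass
from typing import Callable, Set, Tuple


@dataclass
class Option:
    """An option as a triple (I, pi, beta). Field names are illustrative."""
    initiation_set: Set[int]              # I: states in which the option may start
    policy: Callable[[int], int]          # pi: maps a state to a primitive action
    termination: Callable[[int], float]   # beta: probability of stopping in a state


def run_option(option: Option, state: int,
               step: Callable[[int, int], Tuple[int, float]],
               gamma: float = 0.9, rng: random.Random = None):
    """Follow the option's policy until it terminates.

    Returns (final_state, discounted_reward, duration), which is the
    information a semi-MDP style update over options would need.
    """
    if state not in option.initiation_set:
        raise ValueError("option cannot be initiated in this state")
    rng = rng or random.Random(0)
    total, discount, duration = 0.0, 1.0, 0
    while True:
        action = option.policy(state)
        state, reward = step(state, action)
        total += discount * reward
        discount *= gamma
        duration += 1
        # Sample the termination condition in the state just reached.
        if rng.random() < option.termination(state):
            return state, total, duration


# Hypothetical toy environment: a corridor of states 0..5, reward -1 per step.
def corridor_step(state: int, action: int) -> Tuple[int, float]:
    return min(state + action, 5), -1.0


# An option that walks right and stops only at the end of the corridor.
go_to_end = Option(
    initiation_set={0, 1, 2, 3, 4},
    policy=lambda s: 1,
    termination=lambda s: 1.0 if s == 5 else 0.0,
)
```

From the agent's point of view, calling `run_option(go_to_end, 2, corridor_step)` behaves like a single temporally extended action: it reports where the option ended, the discounted reward collected along the way, and how many primitive steps it took.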

There is also a summary of this area by Barto, Konidaris and Vigorito:

Andrew G. Barto, George Konidaris and Christopher Vigorito (2013) Behavioral Hierarchy: Exploration and Representation. In Computational and Robotic Models of the Hierarchical Organization of Behavior, pp. 13–46, Springer.

In 2015, Pierre-Luc Bacon, Jean Harb and Doina Precup published an article entitled 'The Option-Critic Architecture', describing an algorithm for automatically sub-dividing and solving an RL problem.


Wednesday, November 23, 2016

Hierarchical Reinforcement Learning: A Literature Summary