The most common formalisation of hierarchical RL in terms of semi-MDPs was given by Sutton, Precup and Singh:
- Richard S. Sutton, Doina Precup and Satinder Singh (1999) Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. In Artificial Intelligence, 112:181–211.
There is also a summary of this area by Barto, Konidaris and Vigorito:
- Andrew G. Barto, George Konidaris and Christopher Vigorito (2013) Behavioral Hierarchy: Exploration and Representation. In Computational and Robotic Models of the Hierarchical Organization of Behavior, pp. 13–46, Springer.
In 2015, Pierre-Luc Bacon, Jean Harb and Doina Precup published an article entitled 'The Option-Critic Architecture', describing an algorithm for automatically subdividing and solving an RL problem.