Torby's Blog

Wednesday, November 23, 2016

Hierarchical Reinforcement Learning: A Literature Summary

This is a quick summary of current work on hierarchical reinforcement learning (RL) aimed at students choosing to do hierarchical RL projects under my supervision.

The most common formalisation of hierarchical RL in terms of semi-MDPs was given by Sutton, Precup and Singh:

Richard S. Sutton, Doina Precup and Satinder Singh (1999) Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. In Artificial Intelligence, 112:181–211.
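In this framework an option is a temporally extended action, and learning over options becomes a semi-MDP problem. A minimal tabular sketch of the SMDP Q-learning update from the paper might look like the following. The function name, the dictionary-based Q-table and the assumption that every option is available in every state are my own simplifications.

```python
from collections import defaultdict


def smdp_q_update(Q, s, o, rewards, s_next, options, alpha=0.1, gamma=0.9):
    """One SMDP Q-learning update after option o, started in state s,
    ran for k = len(rewards) primitive steps and terminated in s_next.

    Q must return 0.0 for unseen (state, option) keys, e.g. a defaultdict.
    Assumes every option in `options` can be started in s_next.
    """
    k = len(rewards)
    # Discounted reward accumulated while the option was executing.
    r = sum(gamma ** i * rew for i, rew in enumerate(rewards))
    # Bootstrap from the best option in the termination state, discounted
    # by gamma**k to account for the option's duration.
    target = r + gamma ** k * max(Q[(s_next, o2)] for o2 in options)
    Q[(s, o)] += alpha * (target - Q[(s, o)])
    return Q[(s, o)]


# Hypothetical usage: a 3-step option that earns a reward of 1 on its last step.
Q = defaultdict(float)  # unseen (state, option) pairs start at 0
smdp_q_update(Q, "hall", "go_to_door", [0.0, 0.0, 1.0], "room2",
              ["go_to_door", "go_to_goal"], alpha=0.5, gamma=0.9)
```

The only difference from one-step Q-learning is the discount of gamma to the power k, which is what lets options of different lengths be compared on the same value scale.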

There is also a summary of this area by:

Andrew G. Barto, George Konidaris and Christopher Vigorito (2013) Behavioral Hierarchy: Exploration and Representation, in Computational and Robotic Models of the Hierarchical Organization of Behavior, pp13-46, Springer.

In 2015, Pierre-Luc Bacon, Jean Harb and Doina Precup published an article entitled 'The Option-Critic Architecture', describing an algorithm for automatically sub-dividing and solving an RL problem.
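A key idea in option-critic is that each option's termination function is learned by gradient descent: the probability of terminating in a state goes up when continuing with the current option looks worse than the value of that state over all options. A rough tabular sketch of just that termination update follows, assuming a sigmoid termination function with one logit per (state, option) pair and value estimates supplied from elsewhere. All names are hypothetical, and the intra-option policy and critic updates are omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def termination_update(theta, s_next, option, Q_omega, V_omega, lr=0.1, xi=0.0):
    """Gradient step on a tabular sigmoid termination function beta(s, option).

    theta[(s, option)]   -- logit of the termination probability
    Q_omega[(s, option)] -- estimated value of continuing with the option in s
    V_omega[s]           -- estimated value of s under the policy over options
    xi                   -- small regulariser added to the advantage (0 here)
    """
    key = (s_next, option)
    beta = sigmoid(theta[key])
    # Advantage of sticking with the current option in s_next.
    advantage = Q_omega[key] - V_omega[s_next] + xi
    # d(beta)/d(theta) for a sigmoid is beta * (1 - beta). A negative
    # advantage raises theta, making termination more likely.
    theta[key] -= lr * beta * (1.0 - beta) * advantage
    return sigmoid(theta[key])
```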

Wednesday, October 26, 2016

Spatio-Temporal Data from Reinforcement Learning

Applying RL algorithms in spatial POMDP domains produces spatio-temporal data that must be analysed and organised in order to produce effective control policies.

There has recently been a great deal of progress in analysing neural representations of space and time in terms of place cells and grid cells. This work has the potential to inform RL research on the efficient encoding and reuse of spatial data.
The overlap between RL and the neuroscience of mapping local space is particularly interesting as RL can produce raw spatio-temporal data from local sensors. This provides us with an opportunity to analyse, explore and identify the computational and behavioural principles that enable efficient learning of spatial behaviours.

Neuroscience - Mapping local space

A great introduction to this work is available through the Nobel lectures of the three 2014 Nobel Prize winners in this area:

John O'Keefe's Nobel lecture on Spatial Cells in the Hippocampal Formation
May-Britt Moser's Nobel lecture on Grid Cells, Place Cells and Memory
Edvard I. Moser's Nobel lecture on Grid Cells and the Entorhinal Map of Space

There is also a TED talk from 2011 on this subject by Neil Burgess from UCL (in O'Keefe's group) entitled 'How your brain tells you where you are'. Burgess has a range of more general papers on spatial cognition, including:

How environment and self-motion combine in neural representations of space, Journal of Physiology, 594(22):6535-6546, 2016
The 2014 Nobel Prize in Physiology or Medicine: A Spatial Model for Cognitive Neuroscience, Neuron 84:1120-1125, 2014

A brief colloquial presentation of this research entitled 'Discovering grid cells' is available from the Kavli Institute for Systems Neuroscience's Centre for Neural Computation.

There was also a nice review article in the Annual Review of Neuroscience entitled 'Place Cells, Grid Cells, and the Brain's Spatial Representation System', Vol. 31:69-89, 2008, by Edvard I. Moser, Emilio Kropff and May-Britt Moser.

There was also a special issue of Hippocampus on grid cells in 2008, edited by Michael E. Hasselmo, Edvard I. Moser and May-Britt Moser.

Recently there was another summary article in Nature Reviews Neuroscience entitled 'Grid cells and cortical representation', Vol. 15:466–481, 2014, by Edvard I. Moser, Yasser Roudi, Menno P. Witter, Clifford Kentros, Tobias Bonhoeffer and May-Britt Moser.

Further relevant work has recently been presented in an article entitled 'Grid Cells and Place Cells: An Integrated View of their Navigational and Memory Function' in Trends in Neurosciences, Vol. 38(12):763–775, 2015, by Honi Sanders, César Rennó-Costa, Marco Idiart and John Lisman.

A more general introduction to

Computational Approaches

There is a review article on computational approaches to these issues entitled 'Place Cells, Grid Cells, Attractors, and Remapping' in Neural Plasticity, Vol. 2011, 2011 by Kathryn J. Jeffery.

Other relevant articles:

'Impact of temporal coding of presynaptic entorhinal cortex grid cells on the formation of hippocampal place fields' in Neural Networks, 21(2-3):303-310, 2008, by Colin Molter and Yoko Yamaguchi.
'An integrated model of autonomous topological spatial cognition' in Autonomous Robots, 40(8):1379–1402, 2016, by Hakan Karaoğuz and Işıl Bozma.
In 2003, in a paper entitled 'Subsymbolic action planning for mobile robots: Do plans need to be precise?', John Pisokas and Ulrich Nehmzow used the topology-preserving properties of self-organising maps to create spatial proto-maps that supported sub-symbolic action planning in a mobile robot.
A paper entitled 'Emergence of multimodal action representations from neural network self-organization' by German I. Parisi, Jun Tani, Cornelius Weber and Stefan Wermter includes an interesting section called 'A self-organizing spatiotemporal hierarchy', which addresses the automated structuring of spatio-temporal data.
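For students new to SOMs, the topology-preserving property used by Pisokas and Nehmzow comes from the neighbourhood term in the standard Kohonen update. A minimal sketch of one online update step for a one-dimensional map is given below. It is the generic algorithm, not the specific architecture from either paper, and the learning rate and neighbourhood radius are left to the caller.

```python
import math


def som_step(weights, x, lr, radius):
    """One online update of a 1-D self-organising map, in place.

    weights -- list of unit weight vectors (lists), arranged on a line
    x       -- input vector, e.g. a sensor reading or a 2-D position
    Returns the index of the best-matching unit.
    """
    # Best-matching unit: the unit whose weights are closest to x.
    dists = [sum((w - xi) ** 2 for w, xi in zip(unit, x)) for unit in weights]
    bmu = dists.index(min(dists))
    for k, unit in enumerate(weights):
        # Gaussian neighbourhood measured on the map, not in input space:
        # units near the BMU on the map are pulled towards x as well, so
        # neighbouring units come to represent similar inputs.
        h = math.exp(-((k - bmu) ** 2) / (2 * radius ** 2))
        for d in range(len(unit)):
            unit[d] += lr * h * (x[d] - unit[d])
    return bmu
```

In practice both `lr` and `radius` are usually decreased over training, so the map first organises globally and then fine-tunes locally.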

Monday, April 11, 2016

A short bibliography on BBAI and hierarchical RL with SOMs

This bibliography is meant for anyone who joins my research group to work on hierarchical reinforcement learning algorithms or related areas.

My publications

Georgios Pierris and Torbjørn S. Dahl, Learning Robot Control based on a Computational Model of Infant Cognition. In the IEEE Transactions on Cognitive and Developmental Systems, accepted for publication, 2016.
Georgios Pierris and Torbjørn S. Dahl, Humanoid Tactile Gesture Production using a Hierarchical SOM-based Encoding. In the IEEE Transactions on Autonomous Mental Development, 6(2):153-167, 2014.
Georgios Pierris and Torbjørn S. Dahl, A Developmental Perspective on Humanoid Skill Learning using a Hierarchical SOM-based Encoding. In the Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN'14), pp708-715, Beijing, China, July 6-11, 2014.
Torbjørn S. Dahl, Hierarchical Traces for Reduced NSM Memory Requirements. In the Proceedings of the BCS SGAI International Conference on Artificial Intelligence, pp165-178, Cambridge, UK, December 14-16, 2010.

Relevant papers

Daan Wierstra, Alexander Förster, Jan Peters and Jürgen Schmidhuber, Recurrent Policy Gradients. In Logic Journal of IGPL, 18:620-634, 2010. [pdf from IDSIA]
Andrew G. Barto and Sridhar Mahadevan, Recent advances in hierarchical reinforcement learning, Discrete Event Dynamic Systems, 13(4):341-379, 2003. [pdf from Citeseer]
Harold H. Chaput, Benjamin Kuipers and Risto Miikkulainen, Constructivist learning: A neural implementation of the schema mechanism. In the Proceedings of the Workshop on Self-Organizing Maps (WSOM03), Kitakyushu, Japan, 2003. [pdf from Citeseer]
Leslie B. Cohen, Harold H. Chaput and Cara H. Cashon, A constructivist model of infant cognition, Cognitive Development, 17:1323–1343, 2002 [pdf from ResearchGate]
Richard S. Sutton, Doina Precup and Satinder Singh, Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, In Artificial Intelligence, 112:181–211, 1999. [pdf from the University of Alberta]
Pattie Maes, How to do the right thing, Connection Science Journal, 1:291-323, 1989. [pdf from Citeseer]
Rodney A. Brooks, A Robust Layered Control System for a Mobile Robot, IEEE Journal of Robotics and Automation, 2(1):14-23, 1986. [pdf of MIT AI Memo 864]

Books

Joaquin M. Fuster, Cortex and mind: Unifying cognition, Oxford University Press, 2003. [pdf from ResearchGate]
Richard S. Sutton and Andrew G. Barto, Reinforcement learning: An introduction, MIT Press, 1998. [pdf of unfinished 2nd edition]
G. L. Drescher, Made-up minds, MIT Press, 1991 [pdf of MIT dissertation] - An actual constructivist architecture.

Wednesday, January 06, 2016

Sequence Similarity for Hidden State Estimation

Little work has been done on comparing long- and short-term memory (LTM and STM) traces in the context of hidden state estimation in POMDPs. Belief-state algorithms use a probabilistic step-by-step approach which should be optimal, but they do not scale well and have an unrealistic requirement for knowledge of the state space underlying the observations.

The instance-based Nearest Sequence Memory (NSM) algorithm performs remarkably well without any knowledge of the underlying state space. Instead, it compares previously observed sequences of observations and actions in LTM with a recently observed sequence in STM to estimate the underlying state. The NSM algorithm uses a count of matching observation-action records as a metric for sequence proximity.
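The match count can be sketched as follows, assuming each record in the history is a hashable (action, observation) tuple and that the sequence ending at the most recent record plays the role of the STM. This is a simplified illustration, not the full NSM algorithm: it omits rewards, Q-values and the per-action voting over the k nearest neighbours.

```python
def match_length(history, i, j):
    """Number of consecutive matching records ending at indices i and j,
    counting backwards through the history until the first mismatch."""
    n = 0
    while i - n >= 0 and j - n >= 0 and history[i - n] == history[j - n]:
        n += 1
    return n


def nearest_sequences(history, k):
    """Indices of the k past time steps whose preceding sequences best
    match the sequence ending at the most recent step (ties favour the
    more recent step)."""
    now = len(history) - 1
    scores = [(match_length(history, t, now), t) for t in range(now)]
    scores.sort(reverse=True)
    return [t for _, t in scores[:k]]
```

Every matching record adds exactly one to the score, which is the property the next paragraph takes issue with.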

In problems where certain observation-action records are particularly salient, e.g., passing through a 'door' in Sutton's world, or picking up a passenger in the Taxi problem, a simple match count is not a particularly good sequence proximity metric. As a result, I have recently been casting around for other work on such metrics.
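One obvious variation, sketched here purely as a hypothetical illustration, is to let each matching record contribute a salience weight instead of 1, so that a match on a rare, significant event outweighs a run of routine steps. The `salience` table is an assumption: in practice it would have to be hand-specified or learned.

```python
def weighted_match(history, i, j, salience, default=1.0):
    """Backward match between the sequences ending at i and j, where each
    matching record adds its weight from `salience` (a hypothetical
    record -> weight table) instead of a flat count of 1."""
    score, n = 0.0, 0
    while i - n >= 0 and j - n >= 0 and history[i - n] == history[j - n]:
        score += salience.get(history[i - n], default)
        n += 1
    return score
```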


I have come across some interesting work on sequence comparison by Z. Meral Ozsoyoglu, Cenk Sahinalp and Piotr Indyk, entitled 'Improving Proximity Search for Sequences'. This work, done in the context of genome sequencing, could be an interesting starting point for going beyond the simple match count metric used by NSM.
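As a point of comparison, a standard alternative proximity metric is edit (Levenshtein) distance, which, unlike a strict backward match count, tolerates a missing or extra step in the middle of a sequence. This is not the technique from the Ozsoyoglu, Sahinalp and Indyk work, just a generic sketch over sequences of hashable records:

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences: the minimum number of
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute x -> y
        prev = curr
    return prev[-1]
```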

Friday, October 30, 2015

Robagogy - Imitating and exploring

Just saw this new article, entitled 'Imitation and Innovation: The Dual Engines of Cultural Learning', on how human children mix imitation and exploration. It seems that their reliance on imitation is reliably reduced as their performance increases. This should inform our work on robot learning.
This and similar insights from psychology make me think we should define a new field of robagogy, meaning 'how to teach robots'. This area should be informed by pedagogy, 'how to teach children', and the less popular andragogy, 'how to teach adults'.

Wednesday, February 25, 2015

Beyond Random Motor Babbling: How to explore when learning behaviors

Humans can learn or improve skills by practicing, e.g., tennis strokes improve as you spend time whacking balls against a wall. This process is likely to involve exploration of a problem space, in the case of tennis, trying different actuator values to develop appropriate responses to a range of different ball trajectories and speeds.

In robot skill learning, an important question is 'how do we do this exploration?' Many papers use random motor babbling [refs], typically within a safe envelope of values and most commonly with a flat probability distribution within each range [refs]. It is highly unlikely that this is how humans and animals explore.
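For reference, the flat babbling scheme described above amounts to something like this minimal sketch, where `joint_limits` (a list of (low, high) pairs defining the safe envelope) is an assumed representation:

```python
import random


def motor_babble(joint_limits, n_samples, rng=None):
    """Flat random motor babbling: each joint command is drawn uniformly
    from its (low, high) safe envelope, independently of past commands
    and of how well the robot is currently performing."""
    rng = rng or random.Random()
    return [[rng.uniform(lo, hi) for lo, hi in joint_limits]
            for _ in range(n_samples)]
```

Note that nothing in this sampler depends on the learner's state, which is exactly what makes it an implausible model of human exploration.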

It has been recognised for some time [ref] that system noise can provide sufficient variability to support exploration leading to the identification of effective solutions. Pinheiro et al. [1] just showed that such variability is also a sufficient requirement for learning new skills in humans.
Think more about using this to inspire new, biologically more plausible, exploration strategies for robot learning.
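A simple, hypothetical contrast to flat babbling is to explore around the current behaviour, perturbing the best-known motor command with Gaussian noise and clipping the result to the safe envelope. This is only a sketch of noise-driven variability under those assumptions, not a model taken from Pinheiro et al.:

```python
import random


def noisy_command(current, joint_limits, sigma, rng=None):
    """Perturb the current (best-known) motor command with Gaussian noise
    of standard deviation sigma, clipping each joint value to its
    (low, high) safe envelope."""
    rng = rng or random.Random()
    out = []
    for value, (lo, hi) in zip(current, joint_limits):
        out.append(min(hi, max(lo, rng.gauss(value, sigma))))
    return out
```

Here the noise level `sigma` is the single knob controlling how much variability the learner is exposed to.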

[1] João de Paula Pinheiro, Pricila Garcia Marques, Go Tani and Umberto Cesar Corrêa (2015) Diversification of motor skills rely upon an optimal amount of variability of perceptive and motor task demands. In Adaptive Behavior, (only online so far http://adb.sagepub.com/content/early/2015/02/23/1059712315571369?papetoc ).

Thursday, May 23, 2013

Other health care projects

There are a number of healthcare research projects that are highly relevant to tele-care. Below are the ones I know about so far:

Sensor Platform for HEalthcare in a Residential Environment (SPHERE), EPSRC