Torby's Blog: 02/01/2012

Learning from demonstration typically means learning from training data in the form of a relatively small number of complex sequences of observations and, potentially, actions. The strength of this learning paradigm is that the data provided relate to the crucial areas of the problem space. In the case of reinforcement learning, these would be the key reward states and effective paths to these states from relevant starting states. However, because only a restricted part of the problem space can be covered by this form of learning, it typically leads to brittle behaviours that cannot compensate for perturbations that place the robot outside the known area. One solution to this problem is to use the training data to learn a policy that generalises across large areas of the problem space, such as the nonlinear dynamical systems presented by Ijspeert et al. [3]. Another approach is to hard-code a mechanism for returning to the known area, such as the extension to Gaussian Mixture Models presented by Calinon [2]. Abbeel and Ng [1] argued, from their experience in the domain of autonomous helicopter control, that an explicit exploration policy is not required in order to improve performance up to or beyond that of the teacher. Instead, the natural perturbations would provide sufficient exploration.

Bill Smart and Leslie Kaelbling [5] developed the JAQL (Joystick and Q-learning?) algorithm to overcome this problem. The JAQL algorithm has two learning phases. In the first phase, the robot is driven through the "interesting" parts of the problem space by a hand-coded controller or by a human using a joystick. In the second phase, the learned policy is in control and responsible for further exploration, running in a more standard reinforcement learning mode. The JAQL algorithm has an explicit exploration policy designed to work with policies learnt from demonstration.
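To make the two-phase idea concrete, here is a minimal sketch on a toy problem. It is not the JAQL algorithm itself: the six-cell corridor, the tabular Q-learning update, the always-move-right "supplied" controller, the epsilon-random exploration in phase two, and every name and constant below are assumptions chosen purely for illustration.

```python
import random

N_STATES = 6          # hypothetical corridor: cells 0..5, cell 5 is the goal
ACTIONS = (-1, +1)    # step left or step right
ALPHA, GAMMA = 0.5, 0.9


def step(state, action):
    """Move along the corridor; reward 1.0 only on reaching the goal cell."""
    nxt = max(0, min(N_STATES - 1, state + action))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state):
    """Action with the highest current Q-value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def run_episode(q, choose_action, max_steps=50):
    """Run one episode from cell 0, applying a Q-learning update at each step.

    The update is the same whoever chooses the action, so the learner can
    learn passively while another controller drives.
    """
    state = 0
    for _ in range(max_steps):
        action = choose_action(q, state)
        nxt, reward, done = step(state, action)
        best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
        target = reward + GAMMA * best_next
        q[(state, action)] += ALPHA * (target - q[(state, action)])
        state = nxt
        if done:
            break


def train(demo_episodes=5, learn_episodes=20, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    # Phase 1: a supplied controller (standing in for the hand-coded
    # controller or joystick) chooses every action.
    def supplied(q, state):
        return +1

    for _ in range(demo_episodes):
        run_episode(q, supplied)

    # Phase 2: the learned policy is in control, with occasional random
    # actions as a stand-in exploration policy (an assumption of this sketch).
    def learned(q, state):
        if rng.random() < epsilon:
            return rng.choice(ACTIONS)
        return greedy(q, state)

    for _ in range(learn_episodes):
        run_episode(q, learned)
    return q


q = train()
print([greedy(q, s) for s in range(N_STATES - 1)])
```

The point of the sketch is only that phase one seeds the value estimates along the demonstrated path, so that when the learned policy takes over it already has somewhere sensible to go.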

The JAQL exploration policy creates slight deviations from the greedy action by adding a small amount of Gaussian noise [4]. This policy creates actions that are "similar to, but different from", the greedy action.
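As a rough illustration of this kind of exploration, the sketch below perturbs a greedy action with zero-mean Gaussian noise. It assumes a single scalar, continuous action; the function name, the noise scale `sigma`, and the clipping bounds `low` and `high` are all hypothetical and are not taken from [4].

```python
import random


def explore_action(greedy_action, sigma=0.05, low=-1.0, high=1.0, rng=random):
    """Return the greedy action plus a small Gaussian perturbation.

    sigma, low and high are illustrative assumptions: sigma sets how far
    the explored action strays from the greedy one, and the result is
    clipped so that it remains a valid action.
    """
    noisy = greedy_action + rng.gauss(0.0, sigma)
    return max(low, min(high, noisy))


# Example: a handful of exploratory actions around a greedy action of 0.5.
rng = random.Random(0)
print([round(explore_action(0.5, rng=rng), 3) for _ in range(5)])
```

A small `sigma` keeps the explored actions close to the demonstrated behaviour, which is what makes them "similar to, but different from" the greedy action.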

Our RLSOM algorithm has so far been applied only to learning by demonstration, but it should be capable of handling learning from exploration with no modifications other than the addition of a reasonable exploration policy. This is one of the most exciting directions in which to take our research.

[1] Pieter Abbeel and Andrew Y. Ng, Exploration and apprenticeship learning in reinforcement learning. In the Proceedings of the 22nd International Conference on Machine Learning (ICML'05), pp1-8, August 7-11, Bonn, Germany, 2005.

[2] Sylvain Calinon, Robot Programming by Demonstration: A Probabilistic Approach. EPFL/CRC Press, 2009.

[3] Auke J. Ijspeert, Jun Nakanishi and Stefan Schaal, Movement imitation with nonlinear dynamical systems in humanoid robots. In the Proceedings of the International Conference on Robotics and Automation (ICRA'02), pp1398-1403, May 11 - 15, Washington, DC, 2002.

[4] William D. Smart, Making Reinforcement Learning Work on Real Robots. Ph.D. thesis, Department of Computer Science, Brown University, 2002.

[5] William D. Smart and Leslie Pack Kaelbling, Reinforcement Learning for Robot Control. In Mobile Robots XVI (Proceedings of the SPIE 4573), pp92-103, Douglas W. Gage and Howie M. Choset (eds.), Boston, Massachusetts, 2001.

The article below builds on ideas discussed in my ECAL 2001 paper, Evolution, Adaptation, and Behavioural Holism in Artificial Intelligence, and also included in my PhD dissertation, Behaviour-Based Learning: Evolution-Inspired Development of Adaptive Robot Behaviours. They were originally a comment on behavior-based robotics as presented in Rodney Brooks's papers A Robust Layered Control System for a Mobile Robot and Elephants Don't Play Chess, and in Ronald C. Arkin's book Behavior-based Robotics.

These ideas gained new relevance recently when I participated in the Challenges for Artificial Cognitive Systems II workshop arranged by the European Network for the Advancement of Artificial Cognitive Systems, Interaction and Robotics (EUCogII). Some ideas from that workshop were captured in the workshop wiki. Below I summarize what I took from the workshop.

There is a history, in the science of AI and its related fields, of looking at problems in isolation and simplifying them to the extent that the solutions do not contribute significantly to our knowledge of cognition. Three examples are symbolic problem solving, computer vision and speech recognition. While each of these areas has produced valuable technologies in its own right, they have also, by narrowing their focus, removed themselves so far from the problems faced by animals, including humans, that their solutions have not, to any great extent, helped our understanding of cognitive systems. One cannot criticize this work for simplifying and narrowing its studies, as this is a necessary part of developing working solutions to given problems. The specificity of the solutions, however, raises the question of whether it is possible to develop systems that both solve a specific problem and provide key insights into cognition. Cognition is, arguably, the ability to apply a general understanding of the world to new problems and situations, and, if so, cognition is generalization rather than specialization.

A generalist approach to modelling cognition raises two fundamental questions:

Generalize across what?
What kind of problems can provide tractable challenges for generalist cognitive systems?

By a challenge being tractable we mean that it is possible to imagine solutions based on the incremental development or integration of existing technologies. There are many examples of interesting but intractable challenges for generalist cognitive systems. Most challenges requiring human-like cognitive abilities, such as autonomous robot workers or companions, are clearly still intractable, but many challenges that have, arguably, lower cognitive requirements, e.g., robotic sheep dogs, guard dogs, steeds or even pack animals, also look intractable with respect to many of the sub-problems they contain.

Beyond finding tractable challenges it is also interesting to consider whether we can identify a sequence of challenges that could form milestones along a path of increasingly high levels of generalist cognitive abilities. Following from the second question, we can also ask whether any such problems would be practical, by which we mean that solving them would provide a technology that could be useful to society.

Physiology

Recognizing that cognition is dependent on physiology introduces a number of physiological dimensions of cognition:

Sensors: vision (stereo, colour), audition (stereo), proprioception, touch
Actuators: muscles, legs, arms, hands, thumbs
Nervous system: spine, limbic system, cortical architecture

Neil R. Carlson's Physiology of Behavior is a good introduction to sensors, actuators and related physiology. The architecture of the complete brain is described in Larry W. Swanson's book Brain Architecture: Understanding the Basic Plan. The cortical architecture is discussed in Joaquin M. Fuster's book Cortex and Mind: Unifying Cognition.

Environment

The environment is also a crucial actor in enabling or prohibiting intelligent behaviour. I have divided this into:

Physical environment
Social environment

A book that discusses some issues in this respect is John Alcock's Animal Behavior: An Evolutionary Approach.

Cognition

Annette Karmiloff-Smith, in her book Beyond Modularity: A Developmental Perspective on Cognitive Science, builds on traditional developmental approaches such as those of Piaget and Fodor to suggest that a child's cognitive development takes place within five different domains before the process of representational redescription produces domain-generic knowledge from the previous domain-specific knowledge. Karmiloff-Smith considers the following domains:

The child as a linguist
The child as a physicist
The child as a mathematician
The child as a psychologist
The child as a notator

Looking for dimensions of cognition, Steven Mithen, in his book The Prehistory of the Mind (1996), suggests a number of specialized intelligences that act as a foundation, or 'chapels', around an initial general intelligence. On top of these chapels, the 'superchapel' of meta-representation is then built to provide the cognitive capabilities of modern humans. The specialized intelligences suggested by Mithen are:

Technical intelligence
Natural history intelligence
Social intelligence
Linguistic intelligence

Finally, in the book A Roadmap for Cognitive Development in Humanoid Robots, David Vernon, Claes von Hofsten and Luciano Fadiga have used knowledge from human cognitive development to define a road map for cognitive development in humanoid robots. This work is very much in the spirit of what I suggest here, but I would like to consider a wider scientific context that will give us a better chance of identifying realistic milestones.

Examples

The Kismet robot was developed at MIT and used in a wide range of research activities.

The iCub robot is a popular humanoid research robot that has also been used for a wide range of research activities. As a result, it also has a well-developed cognitive architecture.

Wednesday, February 15, 2012

Incremental exploration

Sunday, February 05, 2012

Dimensions of cognition: A Generalist Manifesto