<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-8155311428649606619</id><updated>2012-02-17T01:00:06.080-08:00</updated><title type='text'>Torby's Blog</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://torbydahl.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8155311428649606619/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://torbydahl.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Torby</name><uri>http://www.blogger.com/profile/06885720710016578176</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://staff.newport.ac.uk/tdahl01/Images/tdahl.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>7</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-8155311428649606619.post-4602023107979672696</id><published>2012-02-15T01:51:00.001-08:00</published><updated>2012-02-17T00:54:55.834-08:00</updated><title type='text'>Incremental exploration</title><content type='html'>&lt;div&gt;&lt;div&gt;&lt;p&gt;Learning from demonstration typically means learning from training data in the form of a relatively small number of complex sequences of observations and, potentially, actions.  
The strength of this learning paradigm is that the data provided is related to the crucial areas of the problem space.  In the case of reinforcement learning, this would involve the key reward states and effective paths to these states from relevant starting states.  However, due to the restricted part of a problem space that can be covered using this form of learning, it typically leads to brittle behaviours that are not able to compensate for perturbations that place the robot outside the known area.  One solution to this problem is to use the training data to learn a policy that generalises across large areas of the problem space, such as the Nonlinear Dynamical Systems presented by Ijspeert &lt;i&gt;et al.&lt;/i&gt; [3].  Another approach is to hard code a mechanism for returning to the known area, such as the extension to Gaussian Mixture Models presented by Calinon [2].  Abbeel and Ng [1] argued, from their experience in the domain of autonomous helicopter control, that an explicit exploration policy is not required in order to improve performance up to or beyond that of the teacher.  Instead, the natural perturbations would provide sufficient exploration.&lt;/p&gt;&lt;p&gt;Bill Smart and Leslie Kaelbling [5] developed the JAQL (Joystick and Q-learning?) algorithm to overcome this problem.  The JAQL algorithm has two different learning phases.  In the first phase, the robot is driven through the "interesting" parts of the problem space by a hand-coded controller or by a human controller using a joystick.  In the second phase, the learned policy is in control and responsible for further exploration, running in a more standard reinforcement learning mode.  The JAQL algorithm has an explicit exploration policy designed to work with policies learnt from demonstration.&lt;/p&gt;&lt;p&gt;The JAQL exploration policy creates &lt;i&gt;slight &lt;/i&gt;deviations from the greedy action by adding a small amount of Gaussian noise [4].  
This policy creates actions that are "&lt;i&gt;similar to, but different from&lt;/i&gt;", the greedy action.&lt;/p&gt;&lt;p&gt;Our RLSOM algorithm has so far been applied only to learning by demonstration, but should be capable of handling learning from exploration without modifications other than a reasonable exploration policy.  This is one of the most exciting directions in which to take our research.  &lt;/p&gt;&lt;p&gt;[1] Pieter Abbeel and Andrew Y. Ng, Exploration and apprenticeship learning in reinforcement learning.  In the &lt;i&gt;Proceedings of the 22nd International Conference on Machine Learning (ICML'05)&lt;/i&gt;, pp1-8, August 7-11, Bonn, Germany, 2005.&lt;/p&gt;&lt;p&gt;[2] Sylvain Calinon, Robot Programming by Demonstration: A Probabilistic Approach. EPFL/CRC Press, 2009.&lt;/p&gt;&lt;p&gt;[3] Auke J. Ijspeert, Jun Nakanishi and Stefan Schaal, Movement imitation with nonlinear dynamical systems in humanoid robots.  In the Proceedings of the International Conference on Robotics and Automation (ICRA'02), pp1398-1403, May 11 - 15, Washington, DC, 2002.&lt;/p&gt;&lt;p&gt;[4] William D. Smart, Making Reinforcement Learning Work on Real Robots.  Ph.D. thesis, Department of Computer Science, Brown University, 2002.&lt;/p&gt;&lt;p&gt;[5] William D. Smart and Leslie Pack Kaelbling,  Reinforcement Learning for Robot Control.  In &lt;i&gt;Mobile Robots XVI (Proceedings of the SPIE 4573)&lt;/i&gt;, pp92-103, Douglas W. Gage and Howie M. 
Choset (eds.), Boston, Massachusetts, 2001.&lt;/p&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8155311428649606619-4602023107979672696?l=torbydahl.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://torbydahl.blogspot.com/feeds/4602023107979672696/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8155311428649606619&amp;postID=4602023107979672696' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8155311428649606619/posts/default/4602023107979672696'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8155311428649606619/posts/default/4602023107979672696'/><link rel='alternate' type='text/html' href='http://torbydahl.blogspot.com/2012/02/exploration.html' title='Incremental exploration'/><author><name>Torby</name><uri>http://www.blogger.com/profile/06885720710016578176</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://staff.newport.ac.uk/tdahl01/Images/tdahl.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8155311428649606619.post-4458217538630447038</id><published>2012-02-05T05:39:00.001-08:00</published><updated>2012-02-05T09:27:04.910-08:00</updated><title type='text'>Dimensions of cognition</title><content type='html'>There is a history, in the science of AI and its related fields, of simplifying problems to the extent that the solutions no longer contribute to our knowledge of cognition.&lt;div&gt;&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="text-align:justify"&gt;Three examples are symbolic problem solving, Computer Vision and Speech Recognition.  
While each of these areas has produced valuable technologies in its own right, they have also, by narrowing their focus, removed themselves so far from the problems faced by animals, including humans, that their solutions have not provided major breakthroughs in our understanding of cognitive systems.  One cannot criticize this work for simplifying and narrowing its scope, as this is a necessary part of developing working solutions to given problems.  The specificity of their solutions, however, raises the question of whether it is possible to develop systems that both solve a specific problem and also provide key insights into cognition.  Cognition is, arguably, the ability to apply a general understanding of the world to new problems and situations, and, if so, cognition is the ability to generalize rather than specialize.&lt;o:p&gt;&lt;/o:p&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;A generalist approach to modelling cognition raises two fundamental questions:&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="text-align: justify; text-indent: -21.6pt; "&gt;Generalize across what?&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="text-align: justify; text-indent: -21.6pt; "&gt;What kind of problems can provide tractable challenges for generalist cognitive systems?&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoListParagraphCxSpLast" style="margin-left:39.6pt;mso-add-space:auto; text-align:justify;text-indent:-21.6pt;mso-list:l0 level2 lfo1"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="text-align:justify"&gt;By challenges being tractable we mean that it is possible to imagine solutions based on the incremental development or integration of existing technologies.  There are many examples of interesting but intractable challenges for generalist cognitive systems.  
Most challenges requiring human-like cognitive abilities, such as autonomous robot workers or companions, are clearly still intractable, but many challenges which have, arguably, lower cognitive requirements, e.g., robotic sheepdogs, guard dogs, steeds or even pack animals, also look intractable w.r.t. many of the sub-problems they contain.&lt;o:p&gt;&lt;/o:p&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="text-align:justify"&gt;Beyond finding tractable challenges it is also interesting to consider whether we can identify a sequence of challenges that could form milestones along a path of increasingly high levels of generalist cognitive abilities.  Following from the second question, we can also ask whether any such problems would be practical, by which we mean that solving them would provide a technology that could be useful to society.&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;Looking for dimensions of cognition, Steven Mithen, in his book &lt;a href="http://www.amazon.co.uk/Prehistory-Mind-Origins-Religion-Science/dp/075380204X"&gt;The Prehistory of the Mind&lt;/a&gt;, 1996, suggests a number of &lt;i&gt;specialized&lt;/i&gt; intelligences that act as a foundation, or 'chapels' around an initial &lt;i&gt;general&lt;/i&gt; intelligence.  On top of these chapels, the 'superchapel' of meta-representation is then built, to provide the cognitive capabilities of modern humans.  
The specialized intelligences suggested by Steven Mithen are:&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;ol&gt;&lt;li&gt;Technical intelligence&lt;/li&gt;&lt;li&gt;Natural history intelligence&lt;/li&gt;&lt;li&gt;Social intelligence&lt;/li&gt;&lt;li&gt;Linguistic intelligence &lt;/li&gt;&lt;/ol&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;Before Steven Mithen, Annette Karmiloff-Smith, in her book &lt;a href="http://www.amazon.co.uk/Beyond-Modularity-Developmental-Perspective-Development/dp/0262611147/ref=ntt_at_ep_dpt_1"&gt;Beyond Modularity: A developmental perspective on cognitive science&lt;/a&gt; built on Piaget and Fodor to suggest that a child's cognitive development takes place within five different &lt;i&gt;domains&lt;/i&gt; before the process of &lt;i&gt;representational redescription&lt;/i&gt; produces &lt;i&gt;domain generic&lt;/i&gt; knowledge from the previous &lt;i&gt;domain specific&lt;/i&gt; knowledge.  Annette Karmiloff-Smith considers the following domains:&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;ol&gt;&lt;li&gt;The child as a linguist&lt;/li&gt;&lt;li&gt;The child as a physicist&lt;/li&gt;&lt;li&gt;The child as a mathematician&lt;/li&gt;&lt;li&gt;The child as a psychologist&lt;/li&gt;&lt;li&gt;The child as a notator&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;These and other proposals of cognitive dimensions, including the traditional four F's (feeding, fighting, fleeing and mating) and theories from neuroscience, e.g., Joaquin M Fuster's book &lt;a href="http://www.amazon.co.uk/Cortex-Mind-Cognition-Joaquin-Fuster/dp/0195147529"&gt;Cortex and Mind: Unifying Cognition&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8155311428649606619-4458217538630447038?l=torbydahl.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://torbydahl.blogspot.com/feeds/4458217538630447038/comments/default' title='Post Comments'/><link 
rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8155311428649606619&amp;postID=4458217538630447038' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8155311428649606619/posts/default/4458217538630447038'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8155311428649606619/posts/default/4458217538630447038'/><link rel='alternate' type='text/html' href='http://torbydahl.blogspot.com/2012/02/dimensions-of-cognition.html' title='Dimensions of cognition'/><author><name>Torby</name><uri>http://www.blogger.com/profile/06885720710016578176</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://staff.newport.ac.uk/tdahl01/Images/tdahl.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8155311428649606619.post-7000814216704454629</id><published>2012-01-23T13:44:00.001-08:00</published><updated>2012-01-23T13:50:10.705-08:00</updated><title type='text'>Prioritized sweeping</title><content type='html'>&lt;div&gt;In considering how we can make our RLSOM algorithm more efficient, the idea of focusing the activation spreading around the area of maximum activity was raised (by my PhD student Georgios Pierris).  This concept was formalised as 'prioritized sweeping' by Moore and Atkeson in 1993.&lt;/div&gt;&lt;div&gt;Like growing SOMs, this is a concept I would like to explore further.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span &gt;Reference:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;Moore, A.W., Atkeson, C.G.: Prioritized sweeping: Reinforcement learning with less data and less time. 
&lt;i&gt;Machine Learning,&lt;/i&gt; 13:103–130, 1993.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8155311428649606619-7000814216704454629?l=torbydahl.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://torbydahl.blogspot.com/feeds/7000814216704454629/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8155311428649606619&amp;postID=7000814216704454629' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8155311428649606619/posts/default/7000814216704454629'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8155311428649606619/posts/default/7000814216704454629'/><link rel='alternate' type='text/html' href='http://torbydahl.blogspot.com/2012/01/prioritized-sweeping.html' title='Prioritized sweeping'/><author><name>Torby</name><uri>http://www.blogger.com/profile/06885720710016578176</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://staff.newport.ac.uk/tdahl01/Images/tdahl.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8155311428649606619.post-2616692866030368411</id><published>2011-10-31T07:57:00.000-07:00</published><updated>2011-10-31T08:58:48.231-07:00</updated><title type='text'>Matching landscapes</title><content type='html'>One of my PhD students had a very good idea today.   In combining SOMs and RL we are defining two &lt;i&gt;landscapes&lt;/i&gt;, the SOM landscape and the reward landscape.  It would be very interesting to see if it would be possible to use the SOM landscape to approximate the RL landscape and, thus, calculate utilities for unexplored areas of the RL landscape.  It brings to mind Ulrich Nehmzow's work on sub-symbolic planning. 
&lt;div&gt;&lt;ul&gt;&lt;li&gt;John Pisokas and Ulrich Nehmzow, Performance Comparison of Three Subsymbolic Action Planners for Mobile Robots, Robotics and Autonomous Systems, 51(1):55-67, 2005&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Must develop this idea further...&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8155311428649606619-2616692866030368411?l=torbydahl.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://torbydahl.blogspot.com/feeds/2616692866030368411/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8155311428649606619&amp;postID=2616692866030368411' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8155311428649606619/posts/default/2616692866030368411'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8155311428649606619/posts/default/2616692866030368411'/><link rel='alternate' type='text/html' href='http://torbydahl.blogspot.com/2011/10/matching-landscapes.html' title='Matching landscapes'/><author><name>Torby</name><uri>http://www.blogger.com/profile/06885720710016578176</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://staff.newport.ac.uk/tdahl01/Images/tdahl.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8155311428649606619.post-6782895624739822531</id><published>2011-10-17T01:17:00.000-07:00</published><updated>2011-10-17T01:49:25.632-07:00</updated><title type='text'>Dimensions of RLSOM</title><content type='html'>At the Cognitive Robotics Research Centre at the University of Wales, Newport, we have been working on a reinforcement learning self-organizing map (RLSOM).&lt;br /&gt;Without going into the details of the RLSOM algorithm, 
I'd like to list some potential extensions:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;From fixed length memory to primitives&lt;/li&gt;&lt;li&gt;STM Length and Learning Frequency&lt;/li&gt;&lt;li&gt;Representing space using decaying activation&lt;/li&gt;&lt;li&gt;Sparse representation of connections&lt;/li&gt;&lt;li&gt;One-shot and incremental learning&lt;/li&gt;&lt;li&gt;Pre-structuring hierarchies&lt;/li&gt;&lt;li&gt;Random connections across hierarchical levels&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8155311428649606619-6782895624739822531?l=torbydahl.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://torbydahl.blogspot.com/feeds/6782895624739822531/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8155311428649606619&amp;postID=6782895624739822531' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8155311428649606619/posts/default/6782895624739822531'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8155311428649606619/posts/default/6782895624739822531'/><link rel='alternate' type='text/html' href='http://torbydahl.blogspot.com/2011/10/dimensions-of-rlsom.html' title='Dimensions of RLSOM'/><author><name>Torby</name><uri>http://www.blogger.com/profile/06885720710016578176</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://staff.newport.ac.uk/tdahl01/Images/tdahl.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8155311428649606619.post-1366351937204662387</id><published>2010-09-27T05:53:00.000-07:00</published><updated>2011-09-23T07:23:06.295-07:00</updated><title type='text'>Stability</title><content type='html'>I have remembered another feature that 
is required of efficient robot learning. This is &lt;em&gt;stability&lt;/em&gt;, i.e., the ability to correct a perturbed trajectory. Bugmann &lt;em&gt;et al.&lt;/em&gt; &lt;a href="http://www.tech.plym.ac.uk/soc/staff/GuidBugm/pub/09_Bugmann.pdf"&gt;(2006)&lt;/a&gt; demonstrate this property in an autonomous wheelchair. When perturbations occur, blind, i.e., open loop, reproduction of a learned trajectory will not lead the robot to a target. In order to reach the goal, the robot must be able to receive feedback and to correct the learned trajectory so that the final target is reached.&lt;br /&gt;&lt;br /&gt;This feature has been described by Ijspeert &lt;em&gt;et al.&lt;/em&gt; &lt;a href="http://books.nips.cc/papers/files/nips15/CN01.pdf"&gt;(2002)&lt;/a&gt; as an &lt;em&gt;attractor landscape&lt;/em&gt; where the learned policy will produce trajectories that move towards the ideal trajectory.  This can be achieved through designing reward functions that promote the ideal trajectory.  
Algorithms that explore the landscape around this will produce an attractor landscape through the discounted rewards.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8155311428649606619-1366351937204662387?l=torbydahl.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://torbydahl.blogspot.com/feeds/1366351937204662387/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8155311428649606619&amp;postID=1366351937204662387' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8155311428649606619/posts/default/1366351937204662387'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8155311428649606619/posts/default/1366351937204662387'/><link rel='alternate' type='text/html' href='http://torbydahl.blogspot.com/2010/09/stability.html' title='Stability'/><author><name>Torby</name><uri>http://www.blogger.com/profile/06885720710016578176</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://staff.newport.ac.uk/tdahl01/Images/tdahl.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8155311428649606619.post-6056663259091891765</id><published>2010-09-23T03:43:00.000-07:00</published><updated>2011-09-23T07:15:27.800-07:00</updated><title type='text'>Features Necessary for Efficient Robot Learning</title><content type='html'>I have been collecting features I think are necessary for efficient robot learning algorithms. Much of my research lately has been related to integrating these features into a single system.&lt;br /&gt;&lt;br /&gt;The features so far are:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;strong&gt;Sequential&lt;/strong&gt; - Sequences of events in time. 
This is something artificial neural networks (ANNs) are typically not very good at.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;strong&gt;Hierarchical&lt;/strong&gt; - Hierarchical structures for long term memory (LTM). This enables reuse of low level behaviours and efficient encoding of observations. It is a well studied problem in reinforcement learning (RL) and Barto and Mahadevan &lt;a href="http://rlai.cs.ualberta.ca/papers/barto03recent.pdf"&gt;(2003)&lt;/a&gt; have published a good review paper.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;strong&gt;Incremental&lt;/strong&gt; - Observations are processed in the sequence they are made without storing and revisiting old observations, improving the existing system before further observations are made. Reinke and Michalski &lt;a href="http://www.mli.gmu.edu/papers/86-90/88-24.pdf"&gt;(1988)&lt;/a&gt; first introduced this concept w.r.t. their incremental AQ algorithm for learning concept descriptions. Many ANNs are &lt;em&gt;statistic&lt;/em&gt;, i.e., they need to repeatedly pass through batches of observations.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;strong&gt;One-shot&lt;/strong&gt; - Able to learn from a single, or a small number of, observations. Bayesian approaches to this problem have been published by Fei-Fei &lt;em&gt;et al. &lt;/em&gt;(2003) and Maas and Kemp &lt;a href="http://www.psy.cmu.edu/~ckemp/papers/maask09.pdf"&gt;(2009)&lt;/a&gt;. Instance-based learning algorithms such as the Nearest Sequence Memory (NSM) algorithm presented by McCallum &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.6696"&gt;(1996)&lt;/a&gt; are an extreme form of one-shot learning. 
Wu and Demiris &lt;a href="http://www.iis.ee.ic.ac.uk/yiannis/wu2010icarcv.pdf"&gt;(2010)&lt;/a&gt; recently published an algorithm that is both hierarchical and one-shot.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;strong&gt;Auto-associative&lt;/strong&gt; - Retrieving memory content using the content itself as a reference, e.g., retrieving a stored image of a person's face using another image or a partially obscured image of that face. This is what neural networks are good at and it makes them able to handle noisy real-world data. This is something RL algorithms are typically not so good at. &lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;strong&gt;Future reward prediction&lt;/strong&gt; - Predicting what actions will optimise future rewards in what world states. This is a core feature of RL algorithms &lt;a href="http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html"&gt;(Sutton &amp;amp; Barto 1998)&lt;/a&gt;.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;strong&gt;Hidden state identification&lt;/strong&gt; - Different world states can produce identical observations, e.g., two different corridors in a building can look exactly the same. The only way to tell these states apart is to remember observations from the past, e.g., you know what corridor you're in because you remember what floor you took the elevator to. 
Some, but not all, RL algorithms have this feature.&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Cohen &lt;em&gt;et al.&lt;/em&gt; &lt;a href="http://ril.newport.ac.uk/pubs/Cohen02.pdf"&gt;(2002)&lt;/a&gt; have presented the Constructivist Learning Architecture (CLA) that was sequential, hierarchical and auto-associative, using decaying node activities in a hierarchical self-organising map (SOM) or Kohonen network &lt;a href="http://books.google.co.uk/books?id=e4igHzyfO78C&amp;amp;dq=self-organizing+maps+kohonen&amp;amp;printsec=frontcover&amp;amp;source=bn&amp;amp;hl=en&amp;amp;ei=CVmbTKOcNZm8jAee-MjUCQ&amp;amp;sa=X&amp;amp;oi=book_result&amp;amp;ct=result&amp;amp;resnum=4&amp;amp;ved=0CC0Q6AEwAw#v=onepage&amp;amp;q&amp;amp;f=false"&gt;(Kohonen 2001)&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;My paper (&lt;a href="http://staff.newport.ac.uk/tdahl01/pubs/tdahl-ai10.pdf"&gt;2010&lt;/a&gt;) presented a hierarchical extension of McCallum's NSM algorithm that was sequential, one-shot and hierarchical, with future reward prediction and hidden state identification capabilities.&lt;/p&gt;&lt;p&gt;Pierris and I &lt;a href="http://staff.newport.ac.uk/tdahl01/pubs/Pierris-roman10.pdf"&gt;(2010)&lt;/a&gt; published an algorithm that used a version of Cohen's CLA algorithm, the Compressed Sparse Code (CoSCo) SOM, to reproduce a humanoid robot motion demonstrated by a human teacher. This work went beyond the original work of Cohen &lt;em&gt;et al.&lt;/em&gt; in that it repeatedly used the learned SOM for action selection in order to reproduce the motion. &lt;/p&gt;&lt;p&gt;In the future I aim to use the mechanisms I developed for the hierarchical NSM algorithm to extend Chaput's decaying activity hierarchical SOMs so that they can support future reward discounting and hidden state identification. 
This will create an algorithm suitable for RL problems such as reaching.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8155311428649606619-6056663259091891765?l=torbydahl.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://torbydahl.blogspot.com/feeds/6056663259091891765/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8155311428649606619&amp;postID=6056663259091891765' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8155311428649606619/posts/default/6056663259091891765'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8155311428649606619/posts/default/6056663259091891765'/><link rel='alternate' type='text/html' href='http://torbydahl.blogspot.com/2010/09/necessary-features-of-robot-learning.html' title='Features Necessary for Efficient Robot Learning'/><author><name>Torby</name><uri>http://www.blogger.com/profile/06885720710016578176</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://staff.newport.ac.uk/tdahl01/Images/tdahl.jpg'/></author><thr:total>1</thr:total></entry></feed>
