Another feature required for efficient robot learning is stability, i.e., the ability to correct a perturbed trajectory. Bugmann et al. (2006) demonstrate this property in an autonomous wheelchair. When perturbations occur, blind (i.e., open-loop) reproduction of a learned trajectory will not lead the robot to its target. To reach the goal, the robot must receive feedback and correct the learned trajectory so that the final target is still reached.
Ijspeert et al. (2002) describe this property as an attractor landscape, in which the learned policy produces trajectories that converge towards the ideal trajectory. Such a landscape can be achieved by designing reward functions that promote the ideal trajectory; algorithms that explore the region around it then shape an attractor landscape through the discounted rewards.
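The attractor idea can be illustrated with a minimal sketch. The following is not Ijspeert et al.'s full formulation but a simple point-attractor dynamical system of the kind their approach builds on: a critically damped second-order system pulls the state towards the goal, so a mid-trajectory perturbation is absorbed and the goal is still reached. All parameter values here (gains, perturbation size, step count) are illustrative assumptions.

```python
def simulate(goal=1.0, perturb_at=150, perturb=0.5, steps=400, dt=0.01):
    """Integrate a critically damped point attractor with Euler steps,
    injecting a state perturbation partway through the trajectory."""
    alpha = 25.0          # assumed gain value
    beta = alpha / 4.0    # critical damping for this gain
    x, v = 0.0, 0.0       # position and velocity
    trajectory = []
    for t in range(steps):
        if t == perturb_at:
            x += perturb                       # external perturbation of the state
        a = alpha * (beta * (goal - x) - v)    # attractor dynamics: pull towards goal
        v += a * dt
        x += v * dt
        trajectory.append(x)
    return trajectory

trajectory = simulate()
print(abs(trajectory[-1] - 1.0) < 1e-3)        # state returns to the goal
```

Because the dynamics define an attractor around the goal rather than a fixed open-loop motion, the perturbed state is continuously corrected, which is exactly the stability property discussed above.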