Nature Machine Intelligence, March 2019
Ali Marjaninejad1,2, Darío Urbina-Meléndez1, Brian A. Cohn4, Francisco J. Valero-Cuevas1,2,3,4,5,*
*Corresponding author’s email: [email protected]
Departments of Biomedical1, Electrical (Systems)2, and Aerospace & Mechanical Engineering3, Department of Computer Science4, and Division of Biokinesiology & Physical Therapy5, University of Southern California, Los Angeles, CA, USA.
Robots will become ubiquitously useful only when they require just a few attempts to teach themselves to perform different tasks, even with complex bodies and in dynamic environments. Vertebrates use sparse trial and error to learn multiple tasks, despite their intricate tendon-driven anatomies, which are particularly hard to control because they are simultaneously nonlinear, under-determined and over-determined. We demonstrate, in simulation and hardware, how a model-free, open-loop approach allows few-shot autonomous learning to produce effective movements in a three-tendon two-joint limb. We use a short period of motor babbling (to create an initial inverse map), followed by building functional habits through reinforcement of high-reward behaviour and refinement of the inverse map in a movement’s neighbourhood. This biologically plausible algorithm, which we call G2P (general to particular), can potentially enable quick, robust and versatile adaptation in robots as well as shed light on the foundations of the enviable functional versatility of organisms.
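The two-phase idea summarised above (babble first, then refine the inverse map around the attempted movement) can be illustrated with a minimal sketch. This is not the authors' implementation: the plant `toy_plant`, the matrix `MOMENT_ARMS`, the linear least-squares inverse map, and all function and parameter names are assumptions made for illustration, and the reward-driven exploration step is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy plant (NOT the paper's limb): maps 3 tendon activations
# to 2 joint angles through a fixed, mildly nonlinear function.
MOMENT_ARMS = np.array([[1.0, -0.6, 0.3],
                        [0.2, 0.8, -1.0]])

def toy_plant(activations):
    """Return (n, 2) joint angles for an (n, 3) array of activations."""
    return np.tanh(activations @ MOMENT_ARMS.T)

def fit_inverse_map(angles, activations):
    """Fit a linear inverse map (angles -> activations) with a bias term.

    Assumption: least squares stands in for a learned inverse map.
    """
    features = np.hstack([angles, np.ones((len(angles), 1))])
    weights, *_ = np.linalg.lstsq(features, activations, rcond=None)
    return weights

def apply_inverse_map(weights, angles):
    """Predict activations for desired angles, clipped to [0, 1]."""
    features = np.hstack([angles, np.ones((len(angles), 1))])
    return np.clip(features @ weights, 0.0, 1.0)

def babble_then_refine(target_angles, n_babble=200, n_attempts=10):
    """Return the mean squared tracking error of each attempt."""
    # Phase 1 ("general"): motor babbling with random activations.
    acts = rng.uniform(0.0, 1.0, size=(n_babble, 3))
    angs = toy_plant(acts)
    weights = fit_inverse_map(angs, acts)

    errors = []
    # Phase 2 ("particular"): attempt the task, then refit the map using the
    # data observed in the neighbourhood of the attempted movement.
    for _ in range(n_attempts):
        attempt_acts = apply_inverse_map(weights, target_angles)
        attempt_angs = toy_plant(attempt_acts)
        errors.append(float(np.mean((attempt_angs - target_angles) ** 2)))
        acts = np.vstack([acts, attempt_acts])
        angs = np.vstack([angs, attempt_angs])
        weights = fit_inverse_map(angs, acts)
    return errors

# Usage: a reachable cyclical target generated from a smooth activation pattern.
t = np.linspace(0.0, 1.0, 50)[:, None]
pattern = 0.5 + 0.3 * np.hstack([np.sin(2 * np.pi * t),
                                 np.cos(2 * np.pi * t),
                                 np.sin(4 * np.pi * t)])
target = toy_plant(pattern)
errors = babble_then_refine(target)
```

The list `errors` holds one tracking error per attempt. Whether it decreases depends on the plant and on how well the chosen map can fit it, so the sketch makes no claim about convergence.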