Computational neuroscience for learning from a small sample
Abstract
Deep neural networks have been remarkably useful for image
classification and phoneme recognition. Combined with reinforcement
learning algorithms, deep neural networks have outperformed human
experts in simulated video games and the game "Go". To achieve such
successes, millions of images, hundreds of millions of phonemes, and
tens of millions of games have been used as training data in
supervised learning or as training trials in reinforcement
learning. Meanwhile, in the 2015 DARPA Robotics Challenge Finals,
many humanoid robots
fell while walking on sand, going up stairs, turning valves, or getting
out of a car. A small number of humanoids completed all the tasks, but
they were far slower than humans. By age 5, human children are able
to execute all of the above tasks more quickly and reliably than
humanoid robots developed by the world's premier researchers. What could
be the reasons for this dramatic contrast between success and failure
in simulated versus real-world tasks by artificial intelligence? In the
simulated video games and "Go", the controlled system had relatively few
degrees of freedom, there were no hidden variables, and state
transitions were deterministic, noise-free, and perfectly described by
simple rules. Thus, the computer simulations were exact. Because of
this last property, tens of millions of simulated games could be
generated by software players and used efficiently for deep Q-learning
(a Q-learning algorithm of reinforcement learning combined with deep
neural network learning). In contrast, a
humanoid robot in the real world is a complicated nonlinear dynamical
system with a huge number of degrees of freedom. Indeed, hidden states
can be situated far above measured sensory signals and far below issued
motor commands. Many physical processes, including contact and friction,
are difficult to model. Mainly for this last reason, quantitatively
reliable simulations of humanoid robots in real-world environments are
extremely difficult, if not impossible. Thus, reinforcement learning in
humanoids designed to operate in the real world has typically been
conducted using real experimental trials. However, when humanoids fall,
they are often damaged such that no further trials can be accumulated
before painful, expensive and laborious repairs are made. In artificial
intelligence, or more precisely, in neural network learning and machine
learning, it is well established that a learning system with a fixed
number of degrees of freedom n requires approximately 10n training
samples. If it is possible to conduct tens of millions of
learning trials, a large learning system, such as a deep neural network,
can be used. However, if only 100 trials can be accumulated, only
very simple learning systems with about ten degrees of freedom should be
used, to avoid over-fitting. I postulate that
these differences in the number of training samples, and the resulting
differences in the allowable degrees of freedom of the control systems,
readily explain the dramatic contrast between the success of simulated
learning and the failure of real-world learning mentioned above.
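
The 10n relation quoted above can be written out as a minimal sketch. The
helper names below are hypothetical and the factor of 10 is only the rule
of thumb stated in the text, not a precise bound.

```python
# Rule-of-thumb factor from the text: about 10 training samples per
# degree of freedom. This is an approximation, not an exact requirement.
SAMPLES_PER_DOF = 10


def required_samples(dof: int) -> int:
    """Approximate number of training samples needed for `dof` degrees of freedom."""
    return SAMPLES_PER_DOF * dof


def max_dof(samples: int) -> int:
    """Largest number of degrees of freedom that `samples` trials can support."""
    return samples // SAMPLES_PER_DOF


# The two regimes contrasted in the text.
print(max_dof(100))         # real-world robot with 100 trials -> 10
print(max_dof(10_000_000))  # simulated games with ten million trials -> 1000000
```

Under this rule, 100 real-world trials support only about ten degrees of
freedom, whereas ten million simulated trials support about a million.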
Animal brains are confronted with sensorimotor problems that are
much more challenging than those faced by humanoid robots. Animal bodies
are flexible and possess an enormous number of muscles, sensors, and
motor neurons. Neurons are slow-computing devices with a significant
degree of noise. Thus, physical modeling of animal movements is very
difficult, as there are many degrees of freedom, hidden variables, a
high noise level, and a risk of injury or death in the case of failure.
The human brain contains 10^11 neurons and 10^14
synapses. As a learning control system, it has an enormous number of
degrees of freedom. If we assume that the number of synapses corresponds
to the degrees of freedom of the learning system, and that a single
reinforcement learning trial can be obtained within 10 seconds, then it
follows from the 10n rule that an animal brain will need 10^15 training
trials, and thus 10^16 seconds of learning time, to avoid over-fitting.
This period is much longer than an animal's lifetime. In contrast to this
estimate, humans learn motor control very quickly. For example, humans
can adapt to a new dynamic environment within a few trials. Human infants
learn to walk after only several thousand falls. Through computational
neuroscience research on sensorimotor learning, I hope to understand a
mystery that defies common sense in artificial intelligence: how a
learning system with 10^11 degrees of freedom can learn to control an
extremely complicated nonlinear dynamical system after only 1,000
failures. Kawato and Samejima (2007) reviewed several computational
schemes for enabling efficient reinforcement learning from a small
number of training samples. They include internal models, sparse
estimation algorithms, multiple paired forward and inverse models, and
hierarchical reinforcement learning algorithms. Attention,
consciousness, metacognition, and episodic memory are important research
topics in cognitive neuroscience and have recently attracted the
interest of artificial intelligence researchers, who hope that they
could provide computational mechanisms to reduce the high dimensionality
of data in learning. They may play essential roles in constructing
abstract concepts, dimensions and attributes that are high-level
representations necessary in the upper layers of hierarchical
reinforcement learning. With respect to reducing the dimensionality of
high-dimensional data, electrical synapses that transmit information via
gap junctions are attractive elements in neuronal circuits because they
tend to synchronize neurons and effectively reduce the degrees of
freedom of the circuit.
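
The learning-time estimate given earlier in this paragraph can be checked
with a few lines of arithmetic. This is a sketch under the text's own
assumptions (one degree of freedom per synapse, 10 samples per degree of
freedom, 10 seconds per trial); the function name and the conversion to
years are illustrative additions.

```python
SAMPLES_PER_DOF = 10                   # rule-of-thumb factor from the text
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, about 3.16e7 seconds


def learning_time_seconds(dof: int, seconds_per_trial: float) -> float:
    """Time needed to collect 10 * dof trials at the given pace."""
    trials = SAMPLES_PER_DOF * dof
    return trials * seconds_per_trial


synapses = 10**14  # degrees of freedom assumed equal to the synapse count
seconds = learning_time_seconds(synapses, seconds_per_trial=10)
years = seconds / SECONDS_PER_YEAR

print(f"{seconds:.0e} s")      # 1e+16 s
print(f"{years:.1e} years")    # 3.2e+08 years
```

Expressed in years, the estimate is on the order of hundreds of millions
of years, which makes the gap to an animal's lifetime explicit.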
The cerebellum is important for motor control and motor learning and
plays very important roles in multi-joint movements such as walking.
The inferior olivary (IO) nucleus sends climbing fiber inputs to
Purkinje cells (PCs), the sole output neurons of the
cerebellar cortex, and possesses the highest density of gap junctions in
the mammalian brain. I focus on the olivo-cerebellar system as a good
candidate for a neuronal system that plays a central role in motor
learning and that may be useful in investigating the above-mentioned
disparity between the large number of degrees of freedom of learning
systems and conditions where only a small number of training trials are
available.
Of special interest is the network of IO neurons, which may control the
degrees of freedom by adjusting their synchronous/asynchronous firing
activities to provide an adaptive framework for the learning machinery.
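
The idea that coupling strength can tune synchrony, and with it the
effective degrees of freedom, can be illustrated with a toy simulation.
The sketch below uses the Kuramoto phase-oscillator model as a stand-in;
it is not a model of IO neurons or of gap junctions, and the coupling
parameter, the frequency distribution, and all names are assumptions
chosen only for illustration.

```python
import cmath
import math
import random


def mean_synchrony(coupling, n=50, steps=4000, dt=0.01, seed=0):
    """Simulate n Kuramoto oscillators with forward Euler steps.

    Returns the order parameter r (0 = incoherent, 1 = fully synchronous)
    averaged over the second half of the run.
    """
    rng = random.Random(seed)
    # Heterogeneous natural frequencies and random initial phases (assumed values).
    omega = [rng.gauss(1.0, 0.5) for _ in range(n)]
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]

    r_sum, count = 0.0, 0
    for step in range(steps):
        # Complex order parameter: r is its magnitude, psi its mean phase.
        z = sum(cmath.exp(1j * t) for t in theta) / n
        r, psi = abs(z), cmath.phase(z)
        if step >= steps // 2:
            r_sum += r
            count += 1
        # Mean-field form of the Kuramoto equation:
        # d(theta_i)/dt = omega_i + K * r * sin(psi - theta_i)
        theta = [t + dt * (w + coupling * r * math.sin(psi - t))
                 for t, w in zip(theta, omega)]
    return r_sum / count


weak = mean_synchrony(coupling=0.0)    # uncoupled: oscillators drift independently
strong = mean_synchrony(coupling=5.0)  # strongly coupled: oscillators lock together
print(f"r (no coupling)     = {weak:.2f}")
print(f"r (strong coupling) = {strong:.2f}")
```

With no coupling, each oscillator evolves independently, so the population
has as many degrees of freedom as it has units. With strong coupling, the
order parameter approaches 1 and the population behaves much like a single
oscillator, which is the sense in which synchrony reduces the effective
degrees of freedom.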
In cerebellar motor learning, IO neurons are known to
transmit error signals to PCs, inducing plasticity at the parallel
fiber-PC synapses. Recent investigations have also revealed multiple
plasticity mechanisms, as well as evidence that parallel fiber-evoked
simple spikes in PCs contribute to cerebellum-dependent learning to some
extent. One dominant view over the last several decades holds that
complex spikes evoked by climbing fiber inputs provide
instructive signals to the PCs that drive learning. Computational
modeling has been one of the promising driving forces for examining the
functions of the IO. As the carrier of the teaching signals, the IO
has been modeled as the source of climbing fiber inputs in simulation
studies of cerebellar learning. To explore the IO dynamics in
detail, a class of simplified conductance-based models has been
developed to reproduce experimental observations of sub-threshold
oscillations. Further details of the electrophysiological properties of
IO neurons have been described by multi-compartment models, which
have been applied to elucidate experimental observations of
sub-threshold activities, to examine the capacity of IO neurons for
information transmission, and to estimate conductance levels of the IO
network from experimental data. Owing to advances in experimental
methods and the rapid growth in computing power, computational models
are now used for quantitative understanding of
experimentally measured IO dynamics and, furthermore, for testing
hypotheses regarding IO functions. Here, I review recent advances in the
computational modeling of the olivo-cerebellar system.