Statistical Learning Overview
Examples of statistical learning applied at the Media Lab:
- User interfaces
modeling subjectivity and affect, intelligent agents, transduction
(input from camera, microphone, or fish sensor)
- Recovering visual models
- face recognition, model-based video, avatars
- Dynamical systems
- speech recognition, visual tracking, gesture
recognition, virtual instruments
- Probabilistic modeling
- image compression, low bandwidth teleconferencing, texture synthesis
All can be placed somewhere in this taxonomy, ordered from specific to
general:
- Function learning
- Learn a mathematical function from (x, y) pairs,
e.g. the best fit line. Also known as regression.
Classification is the special case where y is nominal (the
values have no intrinsic meaning or ordering).
Training may be based on reinforcement, as is common in real life, rather
than explicit examples. However, reinforcement can be reduced to the case
of explicit examples using dynamic programming.
Regression can be reduced to surface learning or conditional probability
learning, which are therefore harder.
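As a concrete sketch of function learning, here is an ordinary least-squares line fit from (x, y) pairs. The data points are made up for illustration; noise-free points on y = 2x + 1 recover the line exactly.

```python
# Least-squares line fit: a minimal sketch of regression from (x, y) pairs.

def fit_line(pairs):
    """Return slope a and intercept b minimizing sum of (y - (a*x + b))^2."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Noise-free data on the line y = 2x + 1 (illustrative):
a, b = fit_line([(0, 1), (1, 3), (2, 5), (3, 7)])
# a == 2, b == 1
```

Classification would replace the squared-error criterion with one suited to nominal y values, but the supervised (x, y) setup is the same.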
- Surface learning
- Learn a mathematical relation from (x, y) pairs,
e.g. the best fit circle.
An x may be associated with zero, one, or many y's.
Can be reduced to unconditional probability learning, which is therefore
harder.
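The best-fit circle can be sketched with a simple algebraic (Kåsa-style) fit: the circle x^2 + y^2 + D*x + E*y + F = 0 is linear in D, E, F, so the fit reduces to a small linear least-squares problem. The sample points are made up for illustration.

```python
# Algebraic (Kasa-style) circle fit: a sketch of surface learning.
# The normal equations are solved by plain Gaussian elimination.

def solve3(A, b):
    """Solve a 3x3 linear system A z = b by Gaussian elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    z = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        z[r] = (M[r][3] - sum(M[r][c] * z[c] for c in range(r + 1, 3))) / M[r][r]
    return z

def fit_circle(pts):
    """Return center (cx, cy) and radius of the best algebraic-fit circle."""
    # Minimize sum of (x^2 + y^2 + D*x + E*y + F)^2, linear in D, E, F.
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for x, y in pts:
        row = (x, y, 1.0)
        t = -(x * x + y * y)
        for i in range(3):
            for j in range(3):
                A[i][j] += row[i] * row[j]
            b[i] += row[i] * t
    D, E, F = solve3(A, b)
    cx, cy = -D / 2, -E / 2
    return cx, cy, (cx * cx + cy * cy - F) ** 0.5

# Four points on the unit circle centered at (1, 2) (illustrative):
cx, cy, r = fit_circle([(2, 2), (0, 2), (1, 3), (1, 1)])
```

Note that the fitted relation is not a function: most x values on this circle are associated with two y's.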
- Probability learning
- Learn a conditional or unconditional PMF or PDF from occurrences,
e.g. guess the bias of a coin given the outcome of flips. Also known as
unsupervised learning. Conditional PDFs can be represented by
unconditional PDFs and vice versa. Common techniques are to use a mixture
(sum) of simple PDFs of the same type, to multiply simple PDFs, or to
convolve simple PDFs (add the corresponding random variables).
This is the hardest of all. Consider that you never directly observe a
probability. Probability learning is also the most lucrative, since it
subsumes the rest. The learning process itself can be modeled with
probabilities (so-called Bayesian learning).
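The coin example can be made concrete. A minimal sketch, comparing the maximum-likelihood estimate (head frequency) with a Bayesian posterior mean under a uniform Beta(1, 1) prior; the flip data are made up for illustration.

```python
# Guessing the bias of a coin from flips: a minimal sketch of probability
# learning, including the Bayesian view where the learning process itself
# is modeled with probabilities.

def coin_bias(flips, prior_heads=1.0, prior_tails=1.0):
    """Posterior mean of the heads probability under a Beta prior."""
    heads = sum(flips)  # flips is a sequence of 0s and 1s
    return (heads + prior_heads) / (len(flips) + prior_heads + prior_tails)

# 7 heads in 10 flips (illustrative data):
flips = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
ml = sum(flips) / len(flips)   # maximum likelihood: 0.7
bayes = coin_bias(flips)       # posterior mean: 8/12, pulled toward 1/2
```

The prior is exactly the kind of domain knowledge discussed under the fundamental questions below: it shrinks the estimate toward fairness, and its influence fades as flips accumulate.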
Two fundamental questions in statistical learning:
- Incorporating prior knowledge
- For example, how can a learning algorithm take advantage of the fact
that the functions in the domain are periodic? quasi-periodic? smooth?
How can we represent and visualize such knowledge?
- Recovering prior knowledge
- Can learning algorithms help design or improve themselves? That is,
can they recover the important qualities of the domain and change
themselves to exploit these qualities? How can they evaluate competing
domain models? Do we need to answer the previous question first?
Note the similarity with compression of information. The first issue is
how to design an encoder/decoder pair given knowledge of the information's
distribution. This was solved by Huffman and arithmetic coding. The
second issue is how to discover this distribution automatically. This is
still unsolved but Lempel-Ziv does a good job. Many coding algorithms,
especially image coders, unfortunately do not distinguish these two issues;
the same is true for many learning algorithms. I think learning research
should strive to recover the distinction and embrace probabilistic models
as the representation of a domain.
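The first issue, designing a coder given the distribution, can be sketched with a minimal Huffman code builder. The symbol probabilities are made up for illustration and chosen as powers of 1/2 so the code lengths exactly match -log2(p).

```python
# Building a Huffman code from a known symbol distribution: a sketch of
# designing an encoder given the information's distribution.
import heapq

def huffman_code(probs):
    """Map each symbol to a bit string; rarer symbols get longer codes."""
    # Heap entries are (probability, tiebreak id, partial code table).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in c1.items()}
        merged.update({s: "1" + bits for s, bits in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

code = huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
# Code lengths match -log2(p): 1, 2, 3, and 3 bits.
```

The second issue, discovering the distribution, is exactly what a learner must do before (or while) exploiting it.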
How the reading group papers fit in:
- Incomplete data
- How to deal with incomplete examples during supervised learning.
This arises in meta-learning and vision (occlusion).
An example of reducing regression to conditional density estimation, and
that in turn to unconditional density estimation.
- Model-based interpolation
- Applying a surface or probability density
once you've estimated it.
- Family learning
- An example of meta-learning. Meta-learning as a density estimation
problem. Meta-learning via regularization.
Reducing a surface to a probability density.
- Canonical Distortion Measure
- Analyzing the relationship between the parameters of a learning
algorithm and the assumed distribution of functions (fundamental issue #1).
How a distance metric can encode such a distribution. How to learn a
distance metric directly, without a density estimation step (fundamental
issue #2). How the hierarchical structure and sharing used in
meta-learning is useful in base-level learning.
- Style and Content
- Applying a simple, general-purpose probabilistic model.
Taxonomy of meta-learning tasks. Meta-learning by linear combination.
Applying incomplete data algorithms to meta-learning. How to relax the
linearity assumption.
- Factorial learning
- Different ways to do unsupervised learning. Why unsupervised learning
is unconditional density estimation with latent variables, via a
compression argument. The autoencoder approach to recovering latent
variables. Representations of unconditional densities: undirected
(Boltzmann machine) and directed Bayesian networks.
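Density estimation with latent variables can be sketched with EM for a mixture of two 1-D Gaussians (equal weights, unit variance): the latent variable is which component generated each point, and EM alternates soft assignment with mean re-estimation. The data and starting means are made up for illustration.

```python
# Unconditional density estimation with a latent variable: minimal EM for
# a mixture of two unit-variance 1-D Gaussians with equal weights.
import math

def em_two_gaussians(xs, m1, m2, steps=50):
    """Return the two component means after EM updates."""
    for _ in range(steps):
        # E-step: posterior responsibility of component 1 for each point.
        r = []
        for x in xs:
            p1 = math.exp(-0.5 * (x - m1) ** 2)
            p2 = math.exp(-0.5 * (x - m2) ** 2)
            r.append(p1 / (p1 + p2))
        # M-step: responsibility-weighted means.
        m1 = sum(ri * x for ri, x in zip(r, xs)) / sum(r)
        m2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / sum(1 - ri for ri in r)
    return m1, m2

# Two well-separated clumps around 0 and 10 (illustrative data):
xs = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]
m1, m2 = em_two_gaussians(xs, m1=1.0, m2=9.0)
# The means converge to roughly 0 and 10.
```

No y's are given: the component labels are never observed, which is what makes this unsupervised, consistent with the compression argument above.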
Last modified: Fri Oct 3 12:42:39 EDT 1997