Statistical Approaches to Learning and Discovery


Statistical inference and exact models

1/16, 1/18: Sufficient statistics, Cramér-Rao bounds, maximum likelihood, exponential families.
  Supplementary reading: Tanner, Chapter 1.

1/23, 1/25: Bayesian inference, conjugate priors, and decision theory.
  Supplementary reading: Tanner, Chapter 1; Pathologies of Orthodox Statistics; Bayesian inference of a uniform distribution.

1/30: Predictive distributions. Noninformative priors. Bayesian inference of a Gaussian distribution.

2/1: Scoring functions for model selection. Bayesian linear regression.
  Supplementary reading: Bayesian model selection overview; Bayesian linear regression.

2/6, 2/8: Bayesian inference and model selection for the multinomial distribution.
  Supplementary reading: Bayesian inference, entropy, and the multinomial distribution.

Numerical methods and approximations

2/13, 2/15, 2/20, 2/22: Normal approximations to likelihoods and posteriors. Laplace's method for integration.
  Supplementary reading: Tanner, Chapter 2.

2/27: Numerical integration by interpolation (quadrature). Gradient descent, Newton's method, and the EM algorithm.
  Supplementary reading: Beyond Newton's method; Expectation-Maximization as lower bound maximization; Tanner, Chapter 4.

3/1: EM using hidden variables vs. EM using lower bounds. EM for mixture weights and alternative lower bounds.
  Supplementary reading: Faster EM methods in High-Dimensional Finite Mixtures; Variational bounds via reversing EM.

3/6: EM for mixtures of Gaussians. Model selection for mixtures of Gaussians. Using lower bounds to approximate integrals.
  Supplementary reading: Variational bounds via reversing EM; Using lower bounds to approximate integrals.

3/13: Monte Carlo integration. How to sample from a distribution.
  Supplementary reading: Tanner, sec. 3.3; Monte Carlo integration (thesis chapter); Bibliography for above; Luc Devroye, Non-Uniform Random Variate Generation, 1986 (ENGR&SCI 519 D51N).

3/15: Monte Carlo with correlated samples. Markov chain sampling. Metropolis and Hastings algorithms.
  Supplementary reading: Tanner, Chapter 6; "Markov Chain Monte Carlo and Related Topics"; "Lattice methods for multivariate integration", Sloan & Joe, 1994 (ENGR&SCI 515 S63L).

3/20: Metropolis sampling example. Estimating the Monte Carlo error when samples are correlated. Gibbs sampling. Gibbs sampling with hidden variables. Mixture of Gaussians example.
  Supplementary reading: Tanner, Chapter 6.

3/22: Hammersley-Clifford-Besag theorem. Changepoint analysis via Gibbs sampling.
  Supplementary reading: Tanner, sec. 6.2.3.

Advanced models

4/3, 4/5: Estimating the size of a population, i.e. filling in missing values in a contingency table. Capture-recapture estimates. Models which capture dependence and heterogeneity. Item response model in educational testing.
  Supplementary reading: Estimating the size of a population; Tanner, sec. 6.2.5.

4/10: Bayesian analysis of the threshold classifier. Noise models: logistic regression, probit regression, uniform label error.
  Supplementary reading: Bayesian analysis of a threshold classifier; Tanner, pp. 28, 41, 50, 57.

4/12: Linear classifiers. Logistic regression.
  Supplementary reading: Logistic regression examples; Algorithms for maximum-likelihood logistic regression; Logistic regression overview.

4/17: Discriminative (conditional) models vs. generative (joint) models for classification. Logistic regression can achieve the same set of decision boundaries as any exponential-family generative model. Nonparametric generative models can achieve the benefits of both approaches, but present significant computational and theoretical challenges.
  Supplementary reading: Discriminative vs Informative Learning; Why the logistic function?

4/19, 4/24: Log-linear models for multi-way contingency tables. Special cases: hierarchical, graphical, decomposable. Markov random fields. These exploit the assumption that high-order interactions are rare.

4/26: Learning the structure of a decomposable log-linear model (i.e. a Bayesian network).
  Supplementary reading: D. Madigan and A. Raftery, "Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam's Window", JASA 89:1535-1546, Dec. 1994.

5/1: Advanced regression models: feedforward networks, decision trees, splines. Gaussian processes as a unifying framework.
  Supplementary reading:
  - Gaussian process examples
  - Deriving quadrature rules from Gaussian processes (includes an overview of Gaussian processes)
  - Gaussian processes: A replacement for supervised neural networks?
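To make the Gaussian-process framework concrete, here is a minimal sketch of GP posterior-mean prediction in one dimension. It is illustrative only, not course code: the squared-exponential kernel, length scale, and noise level are assumptions, not values from the lectures.

```python
import math

def k(a, b, ell=1.0, sf=1.0):
    """Squared-exponential covariance (assumed kernel and hyperparameters)."""
    return sf**2 * math.exp(-(a - b)**2 / (2 * ell**2))

def gp_predict(xs, ys, x_star, noise=1e-2):
    """GP regression posterior mean at x_star:
    m(x*) = k(x*, X) (K + sigma^2 I)^{-1} y, computed via Cholesky."""
    n = len(xs)
    K = [[k(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    # Cholesky factorization K = L L^T
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    # solve L z = y (forward), then L^T alpha = z (backward)
    z = [0.0] * n
    for i in range(n):
        z[i] = (ys[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i]
    alpha = [0.0] * n
    for i in range(n - 1, -1, -1):
        alpha[i] = (z[i] - sum(L[m][i] * alpha[m] for m in range(i + 1, n))) / L[i][i]
    return sum(k(x_star, xi) * ai for xi, ai in zip(xs, alpha))

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [math.sin(x) for x in xs]
pred = gp_predict(xs, ys, 2.5)   # should be close to sin(2.5)
```

Solving against the Cholesky factor is the usual way to apply (K + sigma^2 I)^{-1} without forming an explicit inverse.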
5/3: Gaussian processes in practice: predicting water arsenic levels. Formulating the model, performing computations, and model validation.
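As a worked illustration of Laplace's method for integration (2/13-2/27), the sketch below approximates the Gamma integral by a Gaussian integral around the mode of the integrand; the example and tolerances are my own, not from the course. For the integrand written as exp(-h(theta)), the approximation is exp(-h(theta_hat)) * sqrt(2*pi / h''(theta_hat)), which here reproduces Stirling's formula.

```python
import math

def laplace_gamma(n):
    """Laplace approximation to Gamma(n+1) = integral of theta^n e^{-theta}.
    Write the integrand as exp(-h(theta)) with h(theta) = theta - n log(theta).
    The mode is theta_hat = n and h''(theta_hat) = n / theta_hat^2 = 1/n,
    so the approximation is n^n e^{-n} sqrt(2*pi*n), i.e. Stirling's formula."""
    theta_hat = n
    h = theta_hat - n * math.log(theta_hat)
    h2 = n / theta_hat**2
    return math.exp(-h) * math.sqrt(2 * math.pi / h2)

exact = math.factorial(50)
approx = laplace_gamma(50)
rel_err = abs(approx - exact) / exact   # about 0.17% for n = 50
```

The relative error shrinks like 1/(12n), which is why Laplace approximations to likelihoods and posteriors improve as the sample size grows.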
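The EM lectures (3/1, 3/6) culminate in EM for mixtures of Gaussians. A minimal one-dimensional, two-component sketch follows; the initialization and synthetic data are illustrative assumptions, not course material.

```python
import math
import random

def em_gmm_1d(xs, iters=200):
    """EM for a two-component 1-D Gaussian mixture: the E-step computes
    responsibilities, the M-step re-estimates weight, means, and variances."""
    # crude initialization from the data range (illustrative only)
    m1, m2 = min(xs), max(xs)
    v1 = v2 = (m2 - m1)**2 / 4 + 1e-6
    w = 0.5
    def pdf(x, m, v):
        return math.exp(-(x - m)**2 / (2 * v)) / math.sqrt(2 * math.pi * v)
    for _ in range(iters):
        # E-step: posterior probability that each point came from component 1
        r = [w * pdf(x, m1, v1) / (w * pdf(x, m1, v1) + (1 - w) * pdf(x, m2, v2))
             for x in xs]
        # M-step: responsibility-weighted maximum-likelihood updates
        n1 = sum(r); n2 = len(xs) - n1
        m1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        m2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        v1 = sum(ri * (x - m1)**2 for ri, x in zip(r, xs)) / n1 + 1e-6
        v2 = sum((1 - ri) * (x - m2)**2 for ri, x in zip(r, xs)) / n2 + 1e-6
        w = n1 / len(xs)
    return w, (m1, v1), (m2, v2)

random.seed(0)
data = [random.gauss(-3, 1) for _ in range(300)] + \
       [random.gauss(4, 1) for _ in range(300)]
w, c1, c2 = em_gmm_1d(data)   # components should recover means near -3 and 4
```

Each iteration maximizes a lower bound on the log-likelihood, which is the "EM as lower bound maximization" view from the readings.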
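For the Markov chain sampling lectures (3/15, 3/20), here is a minimal random-walk Metropolis sampler. The standard-normal target, proposal scale, and burn-in length are my own illustrative choices.

```python
import math
import random

def metropolis(log_p, x0, steps, scale=1.0, seed=0):
    """Random-walk Metropolis: propose x' = x + N(0, scale^2) and accept
    with probability min(1, p(x')/p(x)). Only an unnormalized log-density
    is needed. Returns the full chain of correlated samples."""
    rng = random.Random(seed)
    x, lp = x0, log_p(x0)
    chain = []
    for _ in range(steps):
        x_new = x + rng.gauss(0, scale)
        lp_new = log_p(x_new)
        if math.log(rng.random()) < lp_new - lp:   # Metropolis acceptance
            x, lp = x_new, lp_new
        chain.append(x)   # on rejection, the current state is repeated
    return chain

# target: standard normal (unnormalized log-density is enough)
chain = metropolis(lambda x: -0.5 * x * x, x0=3.0, steps=50000)
burned = chain[5000:]   # discard burn-in from the deliberately bad start
mean = sum(burned) / len(burned)
var = sum((x - mean)**2 for x in burned) / len(burned)
```

Because consecutive samples are correlated, the effective sample size is smaller than len(burned), which is exactly the Monte Carlo error issue discussed on 3/20.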
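The logistic regression lectures (4/12) include algorithms for maximum likelihood; the standard one is Newton's method (iteratively reweighted least squares). A dependency-free sketch under assumed synthetic data:

```python
import math
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting for the small Newton system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def fit_logistic_newton(X, y, iters=25):
    """Maximum-likelihood logistic regression by Newton's method (IRLS).
    Model: P(y=1|x) = sigmoid(w.x). Gradient = X^T (y - p);
    negative Hessian = X^T W X with W = diag(p(1-p))."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        p = [1 / (1 + math.exp(-sum(wj * xj for wj, xj in zip(w, x)))) for x in X]
        grad = [sum((yi - pi) * x[j] for yi, pi, x in zip(y, p, X))
                for j in range(d)]
        H = [[sum(pi * (1 - pi) * x[j] * x[kk] for pi, x in zip(p, X))
              + (1e-8 if j == kk else 0.0)            # tiny ridge for stability
              for kk in range(d)] for j in range(d)]
        step = solve(H, grad)                          # Newton step: H dw = grad
        w = [wj + sj for wj, sj in zip(w, step)]
    return w

# synthetic, overlapping classes generated from w_true = (0.5, 1.5)
random.seed(1)
xs = [random.uniform(-3, 3) for _ in range(200)]
y = [1 if random.random() < 1 / (1 + math.exp(-(0.5 + 1.5 * x))) else 0
     for x in xs]
X = [[1.0, x] for x in xs]          # first column is the intercept
w = fit_logistic_newton(X, y)       # should land near (0.5, 1.5)
```

Newton's method converges in a handful of iterations here; the same update is the workhorse behind the generalized-linear-model fits mentioned throughout the later lectures.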