Day 3 - Using probabilistic models
View the class slides [PDF]
Synopsis
Classification and anomaly detection using histograms as probability
models. An overview of text classification, image classification, and
fraud detection.
Probability distributions are powerful tools, useful for many applications.
Thus it is important that we study how to estimate and represent them well.
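As a rough illustration of the synopsis, here is a minimal sketch of classification with histograms as probability models. It is not taken from the class slides. The function names, the toy word lists, and the choice of equal class priors are all assumptions made for this example. One is added to every count so that a word unseen in training still gets non-zero probability.

```python
import math
from collections import Counter


def fit_histogram(tokens, vocabulary):
    """Turn raw counts into a probability for every symbol in the vocabulary.

    One is added to each count (an assumption of this sketch) so that a
    symbol never seen in training still gets non-zero probability.
    """
    counts = Counter(t for t in tokens if t in vocabulary)
    total = sum(counts.values()) + len(vocabulary)
    return {symbol: (counts[symbol] + 1) / total for symbol in vocabulary}


def log_likelihood(tokens, model):
    """Sum of log-probabilities of the tokens; unknown tokens are skipped."""
    return sum(math.log(model[t]) for t in tokens if t in model)


def classify(tokens, models):
    """Pick the class whose histogram assigns the tokens the highest
    likelihood. Equal class priors are assumed, so no prior term appears."""
    return max(models, key=lambda label: log_likelihood(tokens, models[label]))


# Toy training data, made up purely for illustration.
training = {
    "sports": "game team score win team play".split(),
    "finance": "bank stock price market bank trade".split(),
}
vocabulary = {w for words in training.values() for w in words}
models = {label: fit_histogram(words, vocabulary)
          for label, words in training.items()}

label = classify("team win game".split(), models)
```

The same log-likelihood can serve as a crude anomaly score: under the "sports" histogram, "bank stock" scores lower than "team play", so an unusually low score relative to typical data is a hint that something is out of the ordinary.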
Optional reading
The "corrected estimate" for a population probability is a basic result in
Bayesian statistics called "Laplace's law of succession" or simply "Laplace
smoothing". It is not covered in introductory statistics, but it is widely
used in artificial intelligence, including in the papers below.
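For reference, the rule is usually stated as below. The symbols k, n, n_i, and K are notation chosen for this note and may differ from the notation in the slides or the tutorial.

```latex
% Two outcomes: k occurrences of the event in n trials.
\hat{p} = \frac{k + 1}{n + 2}

% K outcomes: n_i occurrences of outcome i among n observations in total.
\hat{p}_i = \frac{n_i + 1}{n + K}
```

For example, an event seen 0 times in 10 trials has raw frequency 0/10 = 0, while the corrected estimate is (0 + 1)/(10 + 2) = 1/12, about 0.083, so the event is treated as rare rather than impossible.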
For a quick introduction, take a look at these slides.
There is a mathematical derivation in my tutorial "Bayesian inference,
entropy, and the multinomial distribution". It also discusses the
connection to the chi-square test.
A nice discussion of Laplace smoothing and other methods for handling zeros
can be found in the paper "A Natural Law of Succession", by Eric Ristad.
Text classification with histograms
Image classification with histograms
Fraud detection
FCC page on cellular phone cloning
Tom Minka
Last modified: Tue Apr 25 09:53:23 GMT 2006