Day 4 - Abstraction hierarchies

View the class slides [PDF]

R commands needed for homework

Synopsis

The first step in any data mining analysis is to represent your variables at an appropriate level of abstraction. Statistics students rarely get a chance to practice this step, even though it is crucial. By constructing an abstraction hierarchy, you define a set of possible representations. Next time we will talk about methods for choosing among these possibilities.
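
As a rough illustration of how an abstraction hierarchy defines a set of possibilities, here is a minimal Python sketch. It is not taken from the class slides or the homework commands. The city/state/region hierarchy, the `PARENT` table, and the helper names `abstract` and `representations` are all hypothetical, chosen only to show one categorical variable re-expressed at several levels.

```python
from collections import Counter

# Hypothetical hierarchy for a categorical variable:
# each value maps to its parent one level up (city -> state -> region).
PARENT = {
    "Pittsburgh": "Pennsylvania",
    "Philadelphia": "Pennsylvania",
    "Boston": "Massachusetts",
    "Pennsylvania": "Northeast",
    "Massachusetts": "Northeast",
}


def abstract(value, levels):
    """Climb `levels` steps up the hierarchy, stopping early at the root."""
    for _ in range(levels):
        if value not in PARENT:
            break
        value = PARENT[value]
    return value


def representations(values, max_levels):
    """Return the same variable re-expressed at levels 0..max_levels."""
    return {k: [abstract(v, k) for v in values] for k in range(max_levels + 1)}


# Made-up observations of the variable at its most detailed level.
cities = ["Pittsburgh", "Boston", "Philadelphia", "Pittsburgh"]
reps = representations(cities, 2)

# Each level is one candidate representation; print its category counts.
for level, column in reps.items():
    print(level, dict(Counter(column)))
```

Each entry of `reps` is one candidate representation of the same variable. The sketch only enumerates the candidates and makes no attempt to choose among them.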

Optional reading

This subject goes by the names "data modeling" and "data preprocessing" in the literature. It is hard to find a unified treatment of the problem. Here are some books that discuss it:
Tom Minka
Last modified: Mon Aug 26 15:15:16 EDT 2002