Day 4 - Abstraction hierarchies
View the class slides
[PDF]
R commands needed for homework
Synopsis
The first step in any data mining analysis is to represent your variables
at an appropriate level of abstraction. Statistics students rarely get a
chance to practice this step, even though it is crucial.
By constructing an abstraction hierarchy, you define a set of possible
representations for a variable. Next time we will talk about methods for
choosing among these possibilities.
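As a rough illustration of the idea, here is a minimal sketch in Python (not the R commands used for the homework). The city/state/region hierarchy, the PARENT table, and the abstract function are all hypothetical, made up for this example. It also simplifies by treating each whole level of the hierarchy as one candidate representation.

```python
# Hypothetical abstraction hierarchy for a "city" variable.
# Each value maps to its parent, one level more abstract.
PARENT = {
    "Pittsburgh": "Pennsylvania",
    "Philadelphia": "Pennsylvania",
    "Boston": "Massachusetts",
    "Austin": "Texas",
    "Pennsylvania": "Northeast",
    "Massachusetts": "Northeast",
    "Texas": "South",
}


def abstract(value, steps):
    """Move value up the hierarchy by at most `steps` levels, stopping at the top."""
    for _ in range(steps):
        if value not in PARENT:
            break  # already at the most abstract level
        value = PARENT[value]
    return value


cities = ["Pittsburgh", "Boston", "Austin", "Philadelphia"]

# Level 0 keeps the raw cities, level 1 gives states, level 2 gives regions.
representations = {
    level: [abstract(city, level) for city in cities] for level in range(3)
}

for level, values in representations.items():
    print(level, values)
```

The same four observations can be coded as four cities, three states, or two regions. The hierarchy only lays out these options. It does not say which one suits a given analysis.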
Optional reading
This subject goes by the names "data modeling" or "data preprocessing"
in the literature, and a unified treatment of the problem is hard to find.
Here are some books that discuss it:
- Data Preparation for Data Mining, by Dorian Pyle
- Data Mining: Concepts and Techniques, Chapter 3, by Han and Kamber
- Data Mining Solutions, Chapter 2, by Westphal and Blaxton
Tom Minka
Last modified: Mon Aug 26 15:15:16 EDT 2002