Unfortunately, the loose structure of data mining goals has often been accompanied by a lack of structure and precision in data mining methodology, with the result that only the most glaring features of the data are found.
In this course we will follow a middle ground, emphasizing visualization and nonparametric techniques while also employing parametric models as tools to aid visualization (e.g. by removing dominant effects). The alternation between nonparametric and parametric techniques is a key component of our strategy.
Frequently the emphasis in statistics and machine learning is on finding the simplest, most automated tool in our bag of tricks to achieve the goal (classify objects correctly, show that a difference is significant or that two variables are dependent, etc.). In this process, one ignores certain variables and outlying or unusual cases as necessary. We will also learn to do all of those things, but only to temporarily defocus some aspects of the data so that we may focus on others. In data mining, the outlying cases may be exactly the cases we are interested in.
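
As a rough illustration of alternating between parametric and nonparametric techniques, the sketch below fits a straight line to remove a dominant trend, then smooths the residuals with a running median to see what structure remains and which case stands out. Everything here is an assumption made for illustration: the synthetic data, the planted unusual case at index 120, the window size, and the helper names `remove_linear_trend` and `running_median` are hypothetical and not part of the course material.

```python
import numpy as np


def remove_linear_trend(x, y):
    """Parametric step: least-squares fit of y ~ intercept + slope * x.

    Returns the fitted (intercept, slope) and the residuals.
    """
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)
    return (intercept, slope), residuals


def running_median(values, window=9):
    """Nonparametric step: median over a centered window (shorter at the edges)."""
    half = window // 2
    n = len(values)
    return np.array([
        np.median(values[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ])


# Synthetic data (an assumption): a dominant linear trend, a weaker
# oscillation hidden underneath it, noise, and one planted unusual case.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 3.0 * x + 0.5 * np.sin(2.0 * x) + rng.normal(scale=0.1, size=x.size)
y[120] += 4.0

# Remove the dominant effect so the weaker structure becomes visible.
(intercept, slope), residuals = remove_linear_trend(x, y)

# Smooth the residuals without assuming any functional form.
smooth = running_median(residuals, window=9)

# The case farthest from the smoothed curve is a candidate for a closer look,
# not something to discard.
deviation = np.abs(residuals - smooth)
most_unusual = int(np.argmax(deviation))
```

In practice one would plot `residuals` and `smooth` against `x` rather than inspect the arrays directly. The linear fit serves only to take the trend out of the picture so that the oscillation and the unusual case can be seen.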