36-350 Data Mining, Fall 2002

www.stat.cmu.edu/~minka/courses/36-350/

Instructor:
Tom Minka, Statistics Dept, Baker Hall 228D, minka@stat.cmu.edu

Teaching Assistant:
Fang Chen, Baker Hall A60D, fangc@stat.cmu.edu

Lectures: Monday and Wednesday, 10:30-11:20, CFA 211
Computer labs: Friday, 10:30-11:20, Baker 140F

Overview

Data mining is the conversion of data into knowledge. Advances in computing technology have led to great opportunities in collecting and analyzing data, which is now a major part of science, medicine, business, and government. The purpose of this course is to help you take advantage of these opportunities.

Data mining has significant overlap with statistics and machine learning, but differs in its procedure. Statistics and machine learning methods provide powerful microscopes for examining specific phenomena. Data mining is the systematic use of these microscopes to find "nuggets" of value in a mountain of data.

Course Objectives

The aim of the course is to provide you with a comprehensive introduction to contemporary data mining practice and principles. You will learn to:
  1. Determine which method to apply in a given situation,
  2. Program statistical software to carry out the method, and
  3. Communicate the results in terms relevant to science, business, etc.

Schedule

  1. Searching for similar objects
    Searching the web, document collections, and image databases
  2. Visualizing and exploring data
    Multivariate geometry, projection, parallel plots
  3. Clustering and segmentation
    Customer profiling, market segmentation, changepoints in time
  4. Predictive modeling
    Modeling prices, predicting sales
  5. Characterizing subgroups
    Profitable customers, fraudulent activity, junk e-mail
  6. Finding patterns and rules
    Market basket analysis, demographic associations

Format

Grading

Final grade breakdown:

Late homework will not be accepted without a written medical excuse. Each homework assignment will be worth 100 points. These points will be divided approximately equally among the parts of the assignment.

The lowest homework grade will be dropped, unless it is the last assignment of the semester, which is mandatory. The remaining homework grades will be used to compute the homework average. The same procedure is used for computer lab grades.
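
As an illustration only, the drop-lowest procedure can be sketched in a few lines of Python. This is not an official grading script. The function name average_with_drop is hypothetical, and the sketch assumes one literal reading of the rule: if the lowest grade belongs to the last assignment, nothing is dropped. When an earlier assignment ties the last one for lowest, the earlier one is dropped.

```python
def average_with_drop(grades):
    """Average a list of grades given in assignment order.

    Assumed reading of the policy: the single lowest grade is dropped,
    unless that grade belongs to the last (mandatory) assignment, in
    which case every grade is kept.
    """
    grades = list(grades)
    if not grades:
        raise ValueError("need at least one grade")
    lowest = min(grades)
    # index() returns the first occurrence, so a tie between an earlier
    # assignment and the last one drops the earlier assignment.
    drop_index = grades.index(lowest)
    if drop_index == len(grades) - 1:
        kept = grades  # lowest is the last assignment: keep everything
    else:
        kept = grades[:drop_index] + grades[drop_index + 1:]
    return sum(kept) / len(kept)
```

For example, with grades of 90, 60, 80, and 100 (in assignment order), the 60 is dropped and the average is taken over the remaining three. With 90, 80, 100, and 60, the lowest grade is the last assignment, so all four count.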

All work and computer code must be your own. Sharing code or results will earn zero credit and a letter to your dean. See the CMU Student Handbook on Cheating and Plagiarism.