Day 2 - Summarizing a batch

View the class slides [PDF]

Code for the spray example

Synopsis

When a data set is large, containing a huge number of factors influencing a response, we end up with a lot of batches of numbers. It is important that we simplify these batches as much as possible without hiding important details. The levels of simplification we can apply, from least to most simplified, are: histogram, boxplot, center and spread, center only.

There are various obstacles to reaching a high level of simplification, which we would like to eliminate. If the batches have different spreads, we cannot summarize them by center only. If the batches are skewed, we cannot go beyond a boxplot. If there are outside points, we cannot include them in our summary. Transformation can often eliminate these problems.
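As a rough illustration of these obstacles, the sketch below computes boxplot-style summaries for two batches and shows a log transformation removing both the unequal spread and the outside point. The numbers are made up for illustration (they are not the spray data), the helper names are hypothetical, and the 1.5 x IQR rule for flagging outside points is an assumed convention rather than something stated in this synopsis.

```python
import math
import statistics

def five_number_summary(batch):
    """Return (min, lower quartile, median, upper quartile, max)."""
    q1, med, q3 = statistics.quantiles(batch, n=4, method="inclusive")
    return min(batch), q1, med, q3, max(batch)

def iqr(batch):
    """Interquartile range, used here as the measure of spread."""
    _, q1, _, q3, _ = five_number_summary(batch)
    return q3 - q1

def outside_points(batch):
    """Points more than 1.5 * IQR beyond the quartiles (assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(batch)
    step = 1.5 * (q3 - q1)
    return [x for x in batch if x < q1 - step or x > q3 + step]

# Made-up batches: right-skewed, and the second is 10 times the first,
# so both its center and its spread are 10 times larger.
low = [1, 2, 2, 3, 4, 5, 8, 20]
high = [10 * x for x in low]

# The same batches on a log scale.
log_low = [math.log10(x) for x in low]
log_high = [math.log10(x) for x in high]

print("raw spreads:", iqr(low), iqr(high))          # unequal spread
print("log spreads:", iqr(log_low), iqr(log_high))  # roughly equal spread
print("raw outside points:", outside_points(low))
print("log outside points:", outside_points(log_low))
```

On the raw scale the second batch has ten times the spread of the first, and the value 20 is flagged as outside. On the log scale the two spreads agree and nothing is flagged, so the batches differ only in their centers.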

If the distribution has multiple peaks, then transformation doesn't work and we are prevented from going beyond a histogram. We will discuss how to handle this case later.

You may worry that transformation hides the absolute scale of differences. After you've found an interesting "nugget", you can transform back to report results. But transformation helps you find the nugget in the first place.
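Transforming back can be sketched as follows, assuming a log transformation and the median as the measure of center. The two batches are made-up numbers for illustration only. A difference in centers on the log scale turns into a ratio on the original scale.

```python
import math
import statistics

# Made-up batches for illustration only.
a = [2, 3, 4, 6, 9]
b = [8, 12, 16, 24, 36]

# Work on the log scale, where the batches have similar spread.
log_a = [math.log(x) for x in a]
log_b = [math.log(x) for x in b]

# The "nugget": a difference in centers on the log scale.
log_diff = statistics.median(log_b) - statistics.median(log_a)

# Transform back to report the result in the original units.
center_a = math.exp(statistics.median(log_a))  # center of a, original scale
ratio = math.exp(log_diff)                     # b is typically this many times a

print("center of a:", center_a)
print("b / a ratio:", ratio)  # approximately 4
```

Reporting "b is typically about 4 times a" keeps the result interpretable in the original units, even though the comparison itself was made on the transformed scale.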

The lecture also described some basic principles of plotting, and we will add to these as the course goes on.

Optional reading

Measures of center and spread, boxplots
Transformations for symmetry

Tom Minka
Last modified: Mon Aug 26 15:14:35 EDT 2002