Ward's method works by making each point its own cluster, then repeatedly merging the pair of clusters that least increases the sum of squares:

SS = sum_{clusters j} sum_{points i in j} ||x_i - m_j||^2

The merging cost that results from the sum of squares criterion is

cost(A,B) = n_A n_B / (n_A + n_B) * ||m_A - m_B||^2

where n_A and n_B are the sizes of clusters A and B, and m_A and m_B are their means.

hc <- hclust(dist(x)^2,method="ward")
plot.hclust.trace(hc)

The merging trace shows that 4 is an interesting number of clusters. Given the desired number of clusters, the function `cutree` assigns each point to a cluster:

x$cluster <- factor(cutree(hc,k=4))
cplot(x)

The sum of squares criterion desires **separation**
and **balance**. The merging cost prefers to merge small clusters
rather than large clusters, given the same amount of separation. In the
multivariate case, sum of squares also wants the clusters to be spherical;
it does not want highly elongated clusters. This is because distance is
measured equally in all directions. Counterintuitive
clusterings can result from this property.
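The preference for merging small clusters follows directly from Ward's merging cost, which for clusters A and B is n_A n_B / (n_A + n_B) * ||m_A - m_B||^2. Here is a quick numeric check (the cluster sizes and separation are made up for illustration):

```r
# Ward's merging cost: (nA*nB/(nA+nB)) * squared distance between centroids
ward.cost <- function(nA, nB, d2) (nA * nB / (nA + nB)) * d2

# Same squared centroid separation (d2 = 4) in both cases:
ward.cost(2, 2, 4)    # two small clusters: cost 4
ward.cost(50, 50, 4)  # two large clusters: cost 100
```

With equal separation, merging the small clusters is far cheaper, which is what drives the balance property.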
Consider this two-dimensional dataset:

The data falls into six vertical strips.
This can easily happen when the clusters are compact in one
dimension (horizontal) and highly variable in another (vertical).
Here is the result of Ward's method with 6 clusters:

The clusters are chosen to be small and round, in violation of the
structure of the data.
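The strips example can be reproduced with synthetic data in base R (the data and parameters below are made up; recent versions of R call Ward's method `"ward.D"` rather than `"ward"`, and plain `plot` stands in for the course's `cplot`):

```r
set.seed(1)
# Six vertical strips: compact horizontally, highly variable vertically
x <- data.frame(h = rep(1:6, each = 50) + rnorm(300, sd = 0.05),
                v = rnorm(300, sd = 3))

hc <- hclust(dist(x)^2, method = "ward.D")  # "ward" in older versions of R
x$cluster <- factor(cutree(hc, k = 6))

# Compare clusters to the true strips; Ward tends to chop the long
# strips into small round pieces rather than recover them
table(strip = rep(1:6, each = 50), cluster = x$cluster)
plot(x$h, x$v, col = x$cluster, pch = 16)
```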

You can switch to single-link clustering by telling
`hclust` to use `method="single"`.
Here is the result on the strips data:

hc <- hclust(dist(x)^2,method="single")
x$cluster <- factor(cutree(hc,k=6))
cplot(x)

It divides the data into the six strips.

However, the blind pursuit of separation can also lead to counterintuitive
clusterings.
Here is the result on the first example above, showing the
`k=5` solution:

Four outlying data-points have been assigned their own cluster, and the
rest of the data in the middle, which is very dense, has been lumped into
the remaining cluster. Single-link can do this because it doesn't care
about balance.
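Single-link's indifference to balance is easy to demonstrate on synthetic data (made up here): a dense blob plus four distant outliers.

```r
set.seed(1)
# 100 points in a dense blob, plus four far-away outliers
blob     <- matrix(rnorm(200, sd = 0.5), ncol = 2)
outliers <- matrix(c(10, 10, -10, 10, 10, -10, -10, -10),
                   ncol = 2, byrow = TRUE)
x <- rbind(blob, outliers)

hc <- hclust(dist(x)^2, method = "single")
cluster <- cutree(hc, k = 5)

# Each outlier becomes its own cluster; the dense blob is one big cluster
table(cluster)
```

Because single-link measures the gap between the closest pair of points, the huge gaps around the outliers dominate, no matter how unbalanced the resulting clusters are.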

Here are some more examples:

Ward's | single-link
---|---

sx <- scale(x)
w <- pca(sx,2)
hc <- hclust(dist(sx)^2,method="ward")
sx$cluster <- factor(cutree(hc,k=5))
cplot(project(sx,w))
plot.axes(w)

The clusters follow the PCA projection pretty closely, even though the clusters were computed in the full 10-dimensional space, not the 2-dimensional projection. The only difference from our analysis on day36 is that the cars with high MPG have been divided into two groups, corresponding to compact and midsize cars.

Another way to view the result of clustering is to treat the clusters
as classes and make a **discriminative projection**:

x$cluster <- factor(cutree(hc,k=5))
w2 <- projection(x,2)
cplot(project(x,w2))
plot.axes(w2)

The projection is a little odd in that the seemingly most important variables like

sx <- scale(x)
w <- pca(sx,2)
hc <- hclust(dist(sx)^2,method="ward")
sx$cluster <- factor(cutree(hc,k=4))
cplot(project(sx,w))
plot.axes(w)

In this case, the clustering does not completely agree with PCA. The green class is split up, and one red point is far from its cluster. Here is the discriminative projection:

x$cluster <- factor(cutree(hc,k=4))
w2 <- projection(x,2,type="m")
cplot(project(x,w2))
plot.axes(w2)

Only the three variables
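`projection` and `cplot` are course-provided functions. The same idea can be sketched in base R with `MASS::lda` (shipped with R): treat the cluster labels as classes and project onto the top two discriminant directions. The data below is synthetic, standing in for the course dataset:

```r
library(MASS)  # for lda()

set.seed(1)
# Synthetic stand-in: three well-separated clusters in 5 dimensions
x <- as.data.frame(matrix(rnorm(300 * 5), ncol = 5))
x[1:100, 1]   <- x[1:100, 1] + 4
x[101:200, 2] <- x[101:200, 2] + 4

hc <- hclust(dist(x)^2, method = "ward.D")  # "ward" in older versions of R
cluster <- factor(cutree(hc, k = 3))

# Discriminative projection: directions that best separate the clusters
fit  <- lda(x, grouping = cluster)
proj <- as.matrix(x) %*% fit$scaling[, 1:2]
plot(proj, col = cluster, pch = 16)
```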

Functions introduced in this lecture:

`hclust`, `plot.hclust.trace`, `cutree`

Tom Minka Last modified: Mon Aug 22 16:41:25 GMT 2005