Day 27 - Bilinear extension

Additivity is a strong and powerful assumption, and is great if you can find it. Some tables deviate from additivity in isolated places, while others deviate systematically. For these latter tables, we need to go beyond additivity and make a more general assumption. One way to make the additive model more flexible is to add a bilinear extension.

Additivity says that the difference between two rows is the same in all columns, and vice versa. The rows are parallel. Bilinearity says that the difference between two rows is linear in the column, and vice versa. The rows are converging or diverging lines. Additivity is a special case of bilinearity, when the slope is zero. To get bilinearity, we add a multiplicative term to our model. Instead of xij = m + ai + bj, we will have

xij = m + ai + bj + ui*vj
Each row and column now has two effects: an additive effect and a multiplicative effect. The additive effects are (ai,bj), and the multiplicative effects are (ui,vj). To see that this agrees with the above notion of bilinearity, take the difference between two rows i and k:
 
(m + ai + bj + ui*vj) - (m + ak + bj + uk*vj) = (ai-ak) + (ui-uk)*vj 
This is a linear function of the column effect vj. Note that bilinearity is a useful extension only on tables that are large enough for "linear" to mean something. In particular, the table must have more than two rows and more than two columns.

Bilinearity has a characteristic crossing pattern, as shown in the following plot of z = x*y:

This surface looks curved, but if you look at any slice along x or along y, the function is linear. Because of the properties of multiplication, the sign is the same in opposite corners.

In some statistics programs, they will use the notation A*B to represent a non-additive interaction term between A and B. Unfortunately, this is a misleading notation, because it doesn't refer to a bilinear or multiplicative term, but the full table of additive residuals.

Simple bilinear table

Here is an example of a table which is perfectly bilinear:
response 
    Var2
Var1      1    2     3     4     5   6
   1 -0.042 1.46  0.76 -0.58  1.26 1.7
   2 -0.300 1.53  0.30 -2.25 -0.21 1.1
   3 -0.920 0.57 -0.11 -1.40  0.43 0.8
   4  0.728 1.80  1.80  2.07  3.65 2.8
   5 -0.776 0.28  0.31  0.66  2.22 1.4
A raw profile plot shows that it is non-additive:

To see bilinearity, we need to make a standardized profile plot:

In this plot, the additive column effects have been subtracted, and the columns have been sorted and spaced according to the estimated multiplicative effect vj. Thanks to this arrangement, the rows of a bilinear table show up as non-parallel lines.

Mathematically, here is what is shown on a raw vs. standardized profile plot:

Estimating multiplicative effects

How does profile.plot estimate the multiplicative effects? The algorithm for estimating multiplicative effects is complex, so we will discuss only the case when all cell counts are constant: nij = const. As before, the task is to minimize the sum of squares between actual and expected responses. To remove ambiguities in the effects, we require that both the additive and multiplicative effects have zero sum. When you set derivatives to zero, you find that the estimation equations for the additive effects are the same as before. So the only new thing is to minimize the sum of squares between the additive residuals and the bilinear term:
rij = xij - m - ai - bj
minimize sum_ij (rij - ui*vj)^2
This problem is known as matrix factorization: representing a matrix as the product of simpler matrices, in this case vectors. The algorithm to do this is called singular value decomposition or SVD. It is built into R so one function call to svd gives us the ui and vj. We will see more applications of matrix factorization later in the course.

Climate data

For a realistic example of bilinearity, consider this table of average monthly temperature in various US cities:
Temp 
     Place
Month Laredo Baton.Rouge Atlanta Washington Boston Portland Caribou
  Jan   57.6        52.4    44.6       36.2   31.3     20.7     8.7
  Feb   61.9        55.6    46.7       37.1   29.2     21.5     9.8
  Mar   68.4        60.3    52.7       45.3   37.6     31.9    21.6
  Apr   75.9        66.8    61.7       54.4   47.2     41.9    34.7
  May   81.2        73.6    70.0       63.9   57.8     52.3    48.5
  Jun   85.8        79.6    77.7       73.4   67.2     61.8    58.4
  Jul   87.7        81.1    79.5       77.3   72.2     67.8    64.0
  Aug   87.8        80.7    78.6       75.4   71.5     66.4    61.8
  Sep   83.5        77.5    74.4       69.6   64.3     58.4    53.2
  Oct   76.5        69.6    63.4       58.2   55.0     48.4    42.1
  Nov   59.3        58.4    51.2       47.0   43.7     36.6    28.5
  Dec   59.5        53.6    45.2       38.0   32.8     24.9    14.6
A profile plot (with months ordered) shows a rise in the summer and fall in the winter:

But the temperature curves are not just shifts of each other, as shown in the standardized profile plot:

The temperature difference across the US is greater in the winter than in the summer. With the months ordered in this way, we can't see the bilinearity. Try plotting month effect as a function of place:

Now we see that the temperatures are bilinear: for a given month, temperature is a linear function of location. And if we reordered the months as above (Jul,Jun,Sep,May,...), temperature in a given location would be a linear function of month. Laredo (Texas) has the least temperature variation and Caribou (Maine) the most.

Numbers of phones

Finally, consider this AT&T dataset recording the number of phones (in thousands) in different continents around the globe:
Phones 
          Year
Continent   1951  1956  1957  1958  1959  1960  1961
  N.Amer   45939 60423 64721 68484 71799 76036 79831
  Europe   21574 29990 32510 35218 37598 40341 43173
  Asia      2876  4708  5230  6662  6856  8220  9053
  S.Amer    1815  2568  2695  2845  3000  3145  3338
  Oceania   1646  2366  2526  2691  2868  3054  3224
  Africa     895  1411  1546  1663  1769  1905  2005
  Mid.Amer   555   733   773   836   911  1008  1076
Based on our previous experience with modeling expenditures over time, we will start by taking the logarithm of the table. Here is the resulting standardized profile plot:

The lines are nearly horizontal, so the variables are nearly additive. This means that the growth rate in the number of phones for any particular year is nearly the same over all continents, or equivalently that the ratio of the number of phones in two continents is nearly the same every year.

To zoom in on the deviations from additivity, we take residuals:

fit <- aov.rtable(z)
res <- z - rtable(fit)
profile.plot(res,standard=T)

This plot is the same as above, but with the mean of each row removed. The plot shows a clear bilinearity. How do we interpret bilinearity in the log-number of phones? The model says
phones(cnt,yr) = exp(m + a(cnt) + b(yr) + u(cnt)*v(yr))
The ratio of phones in two years yr1 and yr2 is therefore
phones(cnt,yr2)/phones(cnt,yr1) = 
   exp(b(yr2)-b(yr1))*exp(u(cnt)*(v(yr2)-v(yr1)))
In the above profile plot, v(yr) is position of yr on the x-axis. So v(yr2)-v(yr1) is the span in years. For a one year span, we get
phones(cnt,yr2)/phones(cnt,yr1) = 
   exp(b(yr2)-b(yr1))*exp(u(cnt))
If we call exp(b(yr2)-b(yr1)) the worldwide growth rate, then exp(u(cnt)) is a growth rate modifier for continent cnt. In the plot, u(cnt) is the slope of the line for continent cnt.

So the interpretation of bilinearity is that different continents, Asia especially, scale the worldwide growth rate by a fixed amount. So if the worldwide growth rate is 5% in one year, Asia would scale this by 1.1, to get 5.5%. It is not simply a matter of each country having a separate fixed growth rate.

Median polish

If there is a very strong outlier, standardization can give a misleading plot. Suppose we change the count for Africa in 1951 to be 89 instead of 895. Here are the raw and standardized profile plots, after taking logarithms:

The standardized plot makes it look like all countries except Africa had an unusually high number of phones in 1951. This is because standardization subtracts the mean, and the mean has been substantially affected by Africa's value. Ideally, we would want the opposite picture: that Africa's value is unusually low, and the others are normal.

A simple solution to the outlier problem is to standardize by median instead of mean. You can tell profile.plot to do this by saying med=T:

profile.plot(z,standard=T,med=T)

This trick can be incorporated into the analysis of variance procedure as a whole, giving a process known as median polish. The effects are estimated by the median of the row/column residuals instead of the mean. This makes the effect estimates more resistant to outliers. In R, the eda library has a function called medpolish which is an alternative to aov.


Tom Minka
Last modified: Wed Nov 07 16:50:02 Eastern Standard Time 2001