Rate
Gender
Age Female Male
50-54 8.55 13.55
55-59 12.65 21.20
60-64 19.80 31.95
65-69 33.00 47.80
70-74 52.15 68.55
The conclusion from profile plots was that the table is nearly additive,
except that the Gender effect gets larger with Age. But there is something
suspicious about that conclusion. Looking at the table, we see that the
Gender effect for Ages 70-74 is (68.55-52.15 = 16.4) while the Gender
effect for Ages 50-54 is (13.55-8.55 = 5). This seems to support the
conclusion. But does it make sense to compare differences of rates?
As discussed on day14, rate differences can be misleading. It is better to compare them using ratios. For Ages 70-74, the risk ratio for Males to Females is (68.55/52.15 = 1.3) and for Ages 50-54 it is (13.55/8.55 = 1.6). So according to risk ratio, the Gender effect decreases with Age. An additive model does not make sense on rates.
However, an additive model does make sense on log-rates, because a difference of log-rates is equivalent to taking a ratio. So we should have begun by taking the logarithm of the original response table:
> z <- log(y)
> z
Rate
Gender
Age Female Male
50-54 2.145931 2.606387
55-59 2.537657 3.054001
60-64 2.985682 3.464172
65-69 3.496508 3.867026
70-74 3.954124 4.227563
Here are the profiles:


The moral of the story is that you may need to transform your data before an additive model makes sense. This wasn't a big issue when we were doing trees, since trees only try to stratify the dataset when there are different responses. Now transformation is an issue.
Expenditure
Year
Segment 1940 1945 1950 1955 1960
Food and Tobacco 22.200 44.500 59.60 73.2 86.80
Household Operation 10.500 15.500 29.00 36.5 46.20
Medical and Health 3.530 5.760 9.71 14.0 21.10
Personal Care 1.040 1.980 2.45 3.4 5.40
Private Education 0.341 0.974 1.80 2.6 3.64
(Use dget("Expenditure.dat") to read the table into R.)
A profile plot shows that the predictors are not very additive:
z <- log(y) profile.plot(z)
To get a better look at the deviations from
additivity, we standardize the plot:
On a standardized plot, the rows of an additive table should be horizontal.
The biggest deviation from additivity is the expenditure on Private
Education in 1940. The expenditure on Food and Tobacco in
1960 also seems low. To get a quantitative idea of the size of these
deviations, we make an additive fit and look at the residuals.
Since we don't have the original data frame, we run aov directly
on the response table:
> fit <- aov.rtable(z)
> e <- rtable(fit)
> res <- z-e
> res
Expenditure
Year
Segment 1940 1945 1950 1955 1960
Food and Tobacco 0.131 0.173 0.012 -0.0820 -0.235
Household Operation 0.153 -0.111 0.062 -0.0078 -0.095
Medical and Health 0.046 -0.118 -0.049 0.0172 0.104
Personal Care 0.113 0.103 -0.137 -0.1091 0.030
Private Education -0.443 -0.047 0.113 0.1817 0.195
> sort.cells(res)
Segment Year Expenditure
5 Private Education 1940 -0.4431
21 Food and Tobacco 1960 -0.2346
14 Personal Care 1950 -0.1374
...
The function sort.cells that we used for contingency tables is
also useful here.
Cars93 <- read.table("Cars93.dat")
# remove dominant effects via tree
tr <- tree(Price~Horsepower+Weight,Cars93)
x <- Cars93
x$Price <- residuals(tr)
fit <- aov(Price ~ AirBags + Type + Origin, x)
effects.plot(fit)
y <- rtable(Price ~ AirBags + Origin, x) profile.plot(y)
District Group Age Holders Claims 1 1 <1l <25 197 38 2 1 <1l 25-29 264 35 3 1 <1l 30-35 246 20 4 1 <1l >35 1680 156 5 1 1-1.5l <25 284 63 ...Here Group means the size of the car's engine, and Age is the age of the policy holder. We want to model claim rate, which is the ratio of the number of claims to the number of holders. We compute that from the ratio of two response tables:
> claims <- rtable(Claims ~ Age + Group, Insurance, sum)
> holders <- rtable(Holders ~ Age + Group, Insurance, sum)
> y <- claims/holders
> y
Claims
Group
Age <1l 1-1.5l 1.5-2l >2l
<25 0.199 0.20 0.19 0.26
25-29 0.138 0.16 0.21 0.25
30-35 0.106 0.13 0.20 0.20
>35 0.097 0.12 0.14 0.18
Notice the extra argument sum given to rtable.
This makes each cell of the rtable equal to the total as opposed
to the mean. Here are the raw and standardized profile plots: