Gabriel Singer
21.1.2025
Many parametric tests assume normally distributed data, which makes it necessary to check for normal distribution:
Statisticians tend to rely on graphical means!
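For example, a quick graphical check in R (a minimal sketch using simulated, right-skewed data):

x<-rlnorm(50) # hypothetical skewed sample
layout(matrix(1:2,1,2))
hist(x,freq=FALSE) # skewed histogram
qqnorm(x); qqline(x) # points bend away from the line under non-normality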
Consider data transformation to achieve normally distributed data: applying the same mathematical operation to all data points. A familiar example is \(pH=-\log_{10}[H_3O^+]\), itself a log transformation. A log-normal distribution is very common in ecological/environmental data!
For final graphs and tables, data may be back-transformed into its original units, e.g. \(x=e^y\) for \(y=\ln(x)\).
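A minimal sketch of this round trip (note that back-transforming the mean of the logs yields the geometric, not the arithmetic, mean):

x<-c(2,10,50,250) # hypothetical skewed data
y<-log(x) # transform
exp(mean(y)) # back-transformed mean = geometric mean (~22.4)
mean(x) # arithmetic mean (78) for comparison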
What happens if we have more than two groups and all we know is the t-test?
Pairwise testing is possible: For example groups A, B, C might be tested A vs. B, B vs. C, A vs. C.
Two major problems:
What is the real “experimentwise error rate” when doing two tests in sequence?
A conservative estimate for the probability to make any failure (total error) in multiple test procedures is:
\[ \alpha_t=1-(1-\alpha)^k \] … where \(\alpha\) and \(\alpha_t\) are alpha errors for any single test and the sequence of \(k\) tests.
Dunn-Šidák correction (closely related to, but not identical with, the simpler Bonferroni correction \(\alpha_{adj}=\alpha_t/k\)):
\[ \alpha_{adj}=1-(1-\alpha_t)^{\frac{1}{k}} \] with \(\alpha_t\) set to a desired level (0.05) this results in an adjusted (lower) \(\alpha_{adj}\) to be used for any single test.
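A quick numeric check of both formulas in R (assuming \(\alpha=0.05\) and k = 3 pairwise tests, as for groups A, B, C):

alpha<-0.05; k<-3
1-(1-alpha)^k # experimentwise error rate: ~0.143
1-(1-alpha)^(1/k) # Dunn-Sidak adjusted alpha: ~0.0170
alpha/k # simpler Bonferroni adjustment: ~0.0167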
ANOVA (analysis of variance): a parametric test procedure for testing significant differences among the means of more than two independent samples from normally distributed populations.
Continuous response variable = dependent.
Categorical group coding variable = independent = factor (groups = factor levels).
Different types of ANOVA exist; below we cover one-way ANOVA, two-factorial ANOVA, and the related ANCOVA.
When the assumption of normal distribution is violated: use a rank-based alternative (Kruskal-Wallis H-test, kruskal.test).
When the assumption of variance homogeneity (tested with bartlett.test) is violated: use the Welch variant oneway.test.
data<-read.table(file="data/MaraRiver.txt",header=TRUE) # landuse and water chemistry of 54 streams
data$landuse<-factor(data$landuse) # define factor
levels(data$landuse)
## [1] "A" "F" "M"
data$landuse<-factor(data$landuse,levels=c("A","M","F"))
# First check ND and variance homogeneity?
boxplot(TDN~landuse,data=data)
shapiro.test(data$TDN[data$landuse=="M"]) # also needed for A, F
##
## Shapiro-Wilk normality test
##
## data: data$TDN[data$landuse == "M"]
## W = 0.86275, p-value = 0.02644
bartlett.test(TDN~landuse,data=data)
##
## Bartlett test of homogeneity of variances
##
## data: TDN by landuse
## Bartlett's K-squared = 16.736, df = 2, p-value = 0.0002321
# -> variances differ significantly among landuses !
# Get out of here ASAP ;-)
oneway.test(TDN~landuse,data=data,var.equal=FALSE)
##
## One-way analysis of means (not assuming equal variances)
##
## data: TDN and landuse
## F = 18.843, num df = 2.000, denom df = 27.735, p-value = 6.79e-06
# ... or try again with log-transformed TDN!
boxplot(log(TDN)~landuse,data=data)
(Illustration assumes 3 groups and equal group sizes n.)
\(H_0\): The three groups are not different, i.e. they come from the same population: \(\mu_1=\mu_2=\mu_3\).
\(H_A\): At least one group differs from at least one other, i.e. at least one comes from a different population: \(\mu_1\neq\mu_2=\mu_3\), \(\mu_1=\mu_2\neq\mu_3\), \(\mu_2\neq\mu_1=\mu_3\), or \(\mu_1\neq\mu_2\neq\mu_3\).
These hypotheses are here formulated for means (ANOVA), but are similarly formulated for variances (Bartlett-test).
We will use variances to test these hypotheses!
Recall that the F-distribution is built by taking two (!) samples from one population (ND with \(\mu,\sigma^2\)) and computing: \[ F=\frac{s_1^2}{s_2^2}\approx1 \]
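A small simulation makes this tangible (a sketch, assuming an arbitrary ND population):

set.seed(1)
# draw many pairs of samples from ONE population and form the variance ratio
F.sim<-replicate(10000,var(rnorm(10,5,2))/var(rnorm(10,5,2)))
median(F.sim) # close to 1, as expected under a true H0
hist(F.sim,breaks=50) # right-skewed, F-shaped distribution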
Remember \(H_0\) is true :-). Task: estimate the variance of the ONE underlying population, in three ways: (1) from the total variation of all data, (2) from the variation within the groups, (3) from the variation between the group means.
For step 2: Calculation of sum of squares within the groups, i.e. the sum of the squared differences between each data point and the respective group mean.
For step 3: Calculation of sums of squares between the groups, i.e. the sum of the squared differences between the group means and the grand mean.
\[ S.E.M.=\sigma_{\bar x}=\frac{\sigma}{\sqrt n} \]
\[ \sigma ^2=n \cdot \sigma_{\bar x}^2 \]
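This relation can also be checked by simulation (a sketch; ND population with \(\sigma=2\), groups of n = 10):

set.seed(2)
n<-10
group.means<-replicate(10000,mean(rnorm(n,mean=5,sd=2)))
n*var(group.means) # ~4, i.e. recovers sigma^2 from the variance of the means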
For each strategy to estimate variance an expression for sums of squares is needed:
\[ SS_{t(total)}=\sum_{z=1}^Z\sum_{i=1}^n(x_{iz}-\bar{x})^2 \]
\[ SS_{w(within)}=\sum_{z=1}^Z\sum_{i=1}^n(x_{iz}-\bar{x}_z)^2 \]
\[ SS_{b(between)}=n\cdot\sum_{z=1}^Z(\bar{x}_z-\bar{x})^2 \]
Sums of squares are additive! The variation of the whole dataset is now partitioned into two parts depending on origin!
\[ SS_t=SS_b+SS_w \]
This can be translated into percentages of variation due to treatment and due to random error!
The process is also called Partitioning the total sum of squares (SS) (“Splitting of variance”)
Sums of squares are turned into mean squared deviations by dividing through the appropriate degrees of freedom:
\[ MS_t=\frac{SS_t}{Z\cdot{n}-1} \] \[ MS_w=\frac{SS_w}{Z\cdot{(n-1)}} \] \[ MS_b=\frac{SS_b}{Z-1} \]
All mean squares are variance estimates!
Recall that \(SEM^2=\frac{s^2}{n}\). As in \(SS_b\) the sum of squared group-mean deviations from the grand mean was already multiplied by \(n\), \(MS_b\) estimates \(\sigma^2\) rather than \(\sigma_{\bar x}^2\).
Under a true \(H_0\) all mean squares are good estimates for the same population variance \(\sigma^2\).
Under a true \(H_A\): \(MS_b\) additionally contains variation due to the group effect and becomes larger than \(MS_w\), while \(MS_w\) still estimates \(\sigma^2\).
Calculation of the test statistic F:
\[ TS=F_{emp}=\frac{MS_b}{MS_w} \]
\(F\approx1\) under a true \(H_0\), but \(F\gg1\) when \(MS_b\) includes a group effect. A P-value can be computed from the F-distribution with \(df_1=Z-1\) and \(df_2=Z(n-1)\).
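To make the recipe concrete, a sketch computing a one-way ANOVA by hand for simulated balanced data (Z = 3 hypothetical groups, n = 10) and checking the result against aov():

set.seed(3)
n<-10; Z<-3
g<-gl(Z,n,labels=c("A","B","C")) # grouping factor
x<-rnorm(Z*n,mean=rep(c(5,5,7),each=n)) # group C differs
gm<-mean(x) # grand mean
mz<-tapply(x,g,mean) # group means
SSb<-n*sum((mz-gm)^2)
SSw<-sum((x-mz[g])^2)
SSt<-sum((x-gm)^2)
all.equal(SSt,SSb+SSw) # SS are additive
MSb<-SSb/(Z-1); MSw<-SSw/(Z*(n-1))
F.emp<-MSb/MSw
pf(F.emp,df1=Z-1,df2=Z*(n-1),lower.tail=FALSE) # P-value
summary(aov(x~g)) # same F and P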
Post-hoc tests: Pairwise comparisons with adjusted P-values after all!
# using the Mara river data
fit<-aov(log(TDN)~landuse,data=data)
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## landuse 2 20.93 10.467 22.87 8.14e-08 ***
## Residuals 51 23.34 0.458
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fit2<-lm(log(TDN)~landuse,data=data)
anova(fit2)
## Analysis of Variance Table
##
## Response: log(TDN)
## Df Sum Sq Mean Sq F value Pr(>F)
## landuse 2 20.934 10.4668 22.867 8.141e-08 ***
## Residuals 51 23.343 0.4577
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# -> yes, there is a significant landuse effect on log(TDN)
TukeyHSD(fit) # post-hoc tests; requires the aov() fit from above
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = log(TDN) ~ landuse, data = data)
##
## $landuse
## diff lwr upr p adj
## M-A -0.4678338 -1.009851 0.07418378 0.1033445
## F-A -1.4818796 -2.013546 -0.95021281 0.0000000
## F-M -1.0140458 -1.601003 -0.42708868 0.0003427
# check last column for adjusted p-values
pairwise.t.test(log(data$TDN),data$landuse,p.adj="bonferroni",pool.sd=TRUE)
##
## Pairwise comparisons using t tests with pooled SD
##
## data: log(data$TDN) and data$landuse
##
## A M
## M 0.12668 -
## F 4.4e-08 0.00035
##
## P value adjustment method: bonferroni
# -> F differs from M and A, F has lowest TDN (i.e. "some" agriculture already increases TDN)
# in case of unequal variances: Welch-test variant for >2 groups
# here applied on the non-transformed TDN data
oneway.test(TDN~landuse,data=data,var.equal=FALSE)
##
## One-way analysis of means (not assuming equal variances)
##
## data: TDN and landuse
## F = 18.843, num df = 2.000, denom df = 27.735, p-value = 6.79e-06
# should then be followed up by appropriate post-hoc tests
pairwise.t.test(data$TDN,data$landuse,p.adj="bonferroni",pool.sd=FALSE)
##
## Pairwise comparisons using t tests with non-pooled SD
##
## data: data$TDN and data$landuse
##
## A M
## M 0.312 -
## F 1.2e-05 0.016
##
## P value adjustment method: bonferroni
# -> similar result, but less powerful (P-values are closer to significance threshold)
# in case of no ND and/or unequal variances: Kruskal-Wallis H-test
# based on ranks, so it can be applied to the non-transformed TDN data
kruskal.test(TDN~landuse,data=data)
##
## Kruskal-Wallis rank sum test
##
## data: TDN by landuse
## Kruskal-Wallis chi-squared = 24.443, df = 2, p-value = 4.923e-06
# appropriate post-hoc tests by multiple U-tests (done manually here; see also pairwise.wilcox.test below)
wilcox.test(TDN~landuse,data=data[data$landuse!="A",]) # this is F vs. M
##
## Wilcoxon rank sum exact test
##
## data: TDN by landuse
## W = 214, p-value = 7.337e-05
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(TDN~landuse,data=data[data$landuse!="M",]) # this is A vs. F
##
## Wilcoxon rank sum exact test
##
## data: TDN by landuse
## W = 342, p-value = 6.068e-07
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(TDN~landuse,data=data[data$landuse!="F",]) # this is A vs. M
##
## Wilcoxon rank sum exact test
##
## data: TDN by landuse
## W = 228, p-value = 0.1009
## alternative hypothesis: true location shift is not equal to 0
# get the P-values of these tests
P.values<-c(FM=0,AF=0,AM=0)
for (i in 1:3) {
P.values[i]<-wilcox.test(TDN~landuse,data=data[data$landuse!=c("A","M","F")[i],])$p.value
}
# and adjust them using Bonferroni
p.adjust(P.values,method="bonferroni")
## FM AF AM
## 2.201037e-04 1.820305e-06 3.028057e-01
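For reference, the same comparisons can be obtained in one call with the base-R convenience function pairwise.wilcox.test (here with Bonferroni adjustment; it should reproduce the adjusted P-values above):

pairwise.wilcox.test(data$TDN,data$landuse,p.adjust.method="bonferroni")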
A two-factorial ANOVA considers two factors simultaneously; a multifactorial ANOVA considers more than two.
The simplest example has 2 factors with 2 levels each; the data are then stratified into 4 groups.
A random sample has to be collected for each factor level combination.
In such an experimental design each group forms a cell with several replicates.
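A minimal sketch of such a layout in R (hypothetical factors sex and food, n = 5 replicates per cell):

design<-expand.grid(rep=1:5,sex=c("female","male"),food=c("1","2"))
table(design$sex,design$food) # 4 cells with 5 replicates each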
A hypothetical example: food consumption of rats, with the factors sex (female/male) and food type (1/2).
Possible outcomes of this experiment: (1) an effect of sex, (2) an effect of food type, (3) an effect of one factor that depends on the level of the other.
Options 1. and 2. are main effects.
Option 3. is interaction: dependence of effect of one factor on level of other factor, can be inhibition or synergism.
3 sets of hypotheses, the null hypotheses are:
\(H_0\): no difference between sexes
\(H_0\): no difference between food types
\(H_0\): no interaction
\(SS_t\) is again the expression for total variation. \[ SS_{t}=\sum_{s=1}^S\sum_{f=1}^F\sum_{i=1}^n(x_{isf}-\bar{\bar{x}})^2 \]
\(SS_{sex}\) is computed from means of sexes (food types pooled). \[ SS_{sex}=F\cdot{n}\cdot\sum_{s=1}^S(\bar{x}_{s}-\bar{\bar{x}})^2 \]
\(SS_{food}\) is computed from means of food types (sexes pooled). \[ SS_{food}=S\cdot{n}\cdot\sum_{f=1}^F(\bar{x}_{f}-\bar{\bar{x}})^2 \]
\(SS_w\) is still an expression for within-group variation, also called not explained SS (due to random error).
\[ SS_{w}=\sum_{s=1}^S\sum_{f=1}^F\sum_{i=1}^n(x_{isf}-\bar{x}_{sf})^2 \]
Sums of squares are additive!
\(SS_{interaction}\) can be computed by difference from \(SS_t\):
\[ SS_{interaction}=SS_t-(SS_{sex}+SS_{food}+SS_w) \]
The variation of the whole dataset is partitioned into 4 (!) components depending on origin! All MS except \(MS_w\) are expressions for group-to-group variation (and thus explained SS, i.e. due to treatment).
Sums of squares are turned into mean squared deviations by dividing through the appropriate degrees of freedom:
\[ MS_t=\frac{SS_t}{S\cdot F\cdot n-1} \]
\[ MS_w=\frac{SS_w}{S\cdot F\cdot(n-1)} \]
\[ MS_{sex}=\frac{SS_{sex}}{S-1} \]
\[ MS_{food}=\frac{SS_{food}}{F-1} \]
\[ MS_{interaction}=\frac{SS_{interaction}}{(S-1)\cdot(F-1)} \]
All mean squares are again variance estimates!
Under the respective true \(H_0\) (one of 3) the respective mean square is an equally good estimate for \(\sigma^2\) as \(MS_w\):
\[ MS_w\approx{MS}_{sex}\approx{MS}_{food}\approx{MS}_{interaction} \]
Under the respective true \(H_A\) (one of 3) the respective mean square is larger than \(MS_w\).
Calculation of the F test statistics:
\[ F_{sex}=\frac{MS_{sex}}{MS_w} \] \[ F_{food}=\frac{MS_{food}}{MS_w} \] \[ F_{interaction}=\frac{MS_{interaction}}{MS_w} \]
An effect of sex, food or interaction adds variation to the corresponding MS, and the respective \(F\gg1\).
P-values can be computed from F-distributions with the appropriate df.
Post-hoc tests: Pairwise comparisons with adjusted P-values often still of interest after all!
In the rats x food illustration we only have 2 factor levels for each factor; post-hoc tests are thus only interesting for specific interaction patterns.
A significant interaction effect is often the more interesting effect ;-).
A significant interaction effect also means main effects need to be carefully evaluated :-(. Main effects may be made worthless by a strong interaction.
Fig (a) shows a parallel response without interaction, while (b) and (c) show two (of many possible) types of interaction.
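Since the figure is not reproduced here, a sketch generating qualitatively similar patterns from made-up cell means (two levels of factor A on the x-axis; lines = levels of the second factor):

layout(matrix(1:3,1,3))
matplot(rbind(c(2,4),c(3,5)),type="b",pch=1:2,xlab="factor A",ylab="response",main="(a) no interaction") # parallel lines
matplot(rbind(c(2,4),c(3,3)),type="b",pch=1:2,xlab="factor A",ylab="response",main="(b) interaction") # effect of A depends on B
matplot(rbind(c(2,4),c(4,2)),type="b",pch=1:2,xlab="factor A",ylab="response",main="(c) crossing interaction") # effect of A reverses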
data<-read.table("data/RatsFood.txt",header=TRUE)
tapply(data$cons,list(data$food,data$sex),mean)
## female male
## 1 20.8635 20.010
## 2 19.8795 21.239
tapply(data$cons,list(data$food,data$sex),sd)
## female male
## 1 1.2889909 1.182353
## 2 0.8464195 1.142693
tapply(data$cons,list(data$food,data$sex),length) # check if design is balanced
## female male
## 1 20 20
## 2 20 20
# Make sure factors are recognized properly
data$food<-factor(data$food)
data$sex<-factor(data$sex)
# could check ND tediously in each cell of the design, e.g.:
f1s1<-data$cons[data$food=="1" & data$sex=="male"]
# shapiro.test(f1s1) # ...and likewise for the other three cells
# or after standardisation within each cell
combfac<-factor(paste(data$food,data$sex,sep="_"))
z <- unlist(tapply(data$cons, INDEX=combfac, scale))
shapiro.test(z)
##
## Shapiro-Wilk normality test
##
## data: z
## W = 0.98458, p-value = 0.4505
layout(matrix(c(1:4),2,2,byrow = TRUE))
hist(z,freq=FALSE); lines(density(z))
boxplot(cons~food,data=data,main="Differences by food")
boxplot(cons~sex,data=data,main="Differences by sex")
boxplot(cons~food*sex,data=data,main="Differences by sex and food")
bartlett.test(cons~combfac,data=data) # testing variance homogeneity
##
## Bartlett test of homogeneity of variances
##
## data: cons by combfac
## Bartlett's K-squared = 3.3712, df = 3, p-value = 0.3379
# two-way ANOVA table - BETTER ONLY USE THIS FOR BALANCED DESIGNS !
fit<-aov(cons~food*sex,data=data)
fit
## Call:
## aov(formula = cons ~ food * sex, data = data)
##
## Terms:
## food sex food:sex Residuals
## Sum of Squares 0.30013 1.28018 24.48685 96.55093
## Deg. of Freedom 1 1 1 76
##
## Residual standard error: 1.127123
## Estimated effects may be unbalanced
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## food 1 0.30 0.30 0.236 0.628
## sex 1 1.28 1.28 1.008 0.319
## food:sex 1 24.49 24.49 19.275 3.61e-05 ***
## Residuals 76 96.55 1.27
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm(cons~food*sex,data=data))
## Analysis of Variance Table
##
## Response: cons
## Df Sum Sq Mean Sq F value Pr(>F)
## food 1 0.300 0.3001 0.2362 0.6283
## sex 1 1.280 1.2802 1.0077 0.3186
## food:sex 1 24.487 24.4868 19.2748 3.607e-05 ***
## Residuals 76 96.551 1.2704
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
layout(matrix(c(1:2),1,2,byrow = TRUE))
interaction.plot(data$food,data$sex,data$cons)
interaction.plot(data$sex,data$food,data$cons)
#######################################
# Post-hoc comparisons between groups #
TukeyHSD(fit)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = cons ~ food * sex, data = data)
##
## $food
## diff lwr upr p adj
## 2-1 0.1225 -0.3794661 0.6244661 0.6283313
##
## $sex
## diff lwr upr p adj
## male-female 0.253 -0.2489661 0.7549661 0.3186417
##
## $`food:sex`
## diff lwr upr p adj
## 2:female-1:female -0.9840 -1.9202631 -0.04773692 0.0356193
## 1:male-1:female -0.8535 -1.7897631 0.08276308 0.0867495
## 2:male-1:female 0.3755 -0.5607631 1.31176308 0.7185775
## 1:male-2:female 0.1305 -0.8057631 1.06676308 0.9830986
## 2:male-2:female 1.3595 0.4232369 2.29576308 0.0015454
## 2:male-1:male 1.2290 0.2927369 2.16526308 0.0050128
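To connect this output back to the formulas, a sketch recomputing the two-way sums of squares by hand (the design is balanced, so the formulas above apply directly; the results should match the SS column of the ANOVA table):

gm<-mean(data$cons) # grand mean
n<-20; S<-2; F.lev<-2 # cell size and numbers of factor levels
m.sex<-tapply(data$cons,data$sex,mean)
m.food<-tapply(data$cons,data$food,mean)
m.cell<-tapply(data$cons,combfac,mean) # cell means, combfac from above
SS.sex<-F.lev*n*sum((m.sex-gm)^2) # ~1.28
SS.food<-S*n*sum((m.food-gm)^2) # ~0.30
SS.w<-sum((data$cons-m.cell[combfac])^2) # ~96.55
SS.t<-sum((data$cons-gm)^2)
SS.int<-SS.t-(SS.sex+SS.food+SS.w) # ~24.49 = the interaction SS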
ANCOVA (analysis of covariance) is a combination of ANOVA and regression: it can be regarded as an ANOVA that additionally accounts for a continuous covariate, or as a regression allowing group-specific intercepts.
Simplest case: One metric continuous response Y, one factor A (at least 2 levels) and one metric continuous variable X as predictors. In the simplest case we do not assume interaction between A and X.
\[ Y=\beta_0+\beta_1 \cdot X+\alpha_j+\epsilon \]
where \(\beta_0\) is the overall intercept, \(\beta_1\) the (common) slope for the covariate X, \(\alpha_j\) the effect of the j-th level of factor A, and \(\epsilon\) the random error.
Adjusted means account for the effect of the covariate: they represent the group means evaluated at the overall grand mean of X. They are classically computed for a nuisance covariate, when the ANOVA component of the model is of main interest and the ranges of X covered by the groups differ.
\[ \bar y_{j(adj.)}=\bar y_j-b_1(\bar x_j-\bar x) \]
Assume the simplest case of 1 factor and 1 covariate, with no interaction included. There are 2 hypothesis pairs:
\[ Y=\beta_0+\beta_1 \cdot X+\alpha_j+\epsilon \]
H0: All adjusted means are identical, no treatment effects (ANOVA hypothesis but considering simultaneous adjustment with the covariate). Given the assumption of slope homogeneity, this is identical to a test of equality of group intercepts!
H0: No effect of the covariate (as in simple linear regression).
Adding an interaction term:
\[ Y=\beta_0+\beta_1 \cdot X+\beta_2 \cdot A+\beta_3 \cdot X \cdot A \]
H0: No interaction. Equal slopes between groups. This is the easiest way to test for homogeneity of slopes!
Assumptions and diagnostics as in ANOVA and linear regression:
# continuing analysis of log(TDN) in the Mara River
plot(log(data$TDN)~log(data$Q),col=data$landuse) # effect of catchment size?
summary(lm(log(TDN)~log(Q),data=data))
##
## Call:
## lm(formula = log(TDN) ~ log(Q), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5015 -0.7248 0.1641 0.6646 1.5661
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.96812 0.15466 -57.987 <2e-16 ***
## log(Q) -0.14861 0.05942 -2.501 0.0156 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8718 on 52 degrees of freedom
## Multiple R-squared: 0.1074, Adjusted R-squared: 0.09021
## F-statistic: 6.255 on 1 and 52 DF, p-value: 0.01557
m1<-lm(log(TDN)~log(Q)+landuse,data=data)
anova(m1)
## Analysis of Variance Table
##
## Response: log(TDN)
## Df Sum Sq Mean Sq F value Pr(>F)
## log(Q) 1 4.7544 4.7544 10.333 0.00229 **
## landuse 2 16.5167 8.2584 17.948 1.333e-06 ***
## Residuals 50 23.0059 0.4601
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(m1)
##
## Call:
## lm(formula = log(TDN) ~ log(Q) + landuse, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.65826 -0.29533 -0.05748 0.30829 1.67994
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.61085 0.14768 -58.307 < 2e-16 ***
## log(Q) -0.04275 0.04991 -0.857 0.396
## landuseM -0.42214 0.23136 -1.825 0.074 .
## landuseF -1.40661 0.23767 -5.918 2.93e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6783 on 50 degrees of freedom
## Multiple R-squared: 0.4804, Adjusted R-squared: 0.4492
## F-statistic: 15.41 on 3 and 50 DF, p-value: 3.155e-07
# with interaction: the effect of catchment size
# could differ between landuses
m2<-lm(log(TDN)~log(Q)*landuse,data=data)
anova(m2) # no interaction
## Analysis of Variance Table
##
## Response: log(TDN)
## Df Sum Sq Mean Sq F value Pr(>F)
## log(Q) 1 4.7544 4.7544 10.006 0.002706 **
## landuse 2 16.5167 8.2584 17.380 2.1e-06 ***
## log(Q):landuse 2 0.1977 0.0988 0.208 0.812949
## Residuals 48 22.8083 0.4752
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# adjusted means only make sense when slopes are homogeneous!
common.slope<-coefficients(m1)[2]
adj.m<-tapply(log(data$TDN),INDEX=data$landuse,FUN=mean)-
common.slope*
(tapply(log(data$Q),INDEX=data$landuse,FUN=mean)-mean(log(data$Q)))
adj.m
## A M F
## -8.682227 -9.104369 -10.088833
adj.m-adj.m[1] # these are the differences to reference level=A
## A M F
## 0.0000000 -0.4221428 -1.4066067
# same as in the summary(m1) output
# compare with unadjusted means
m<-tapply(X=log(data$TDN),INDEX=data$landuse,FUN=mean)
m-m[1] # these are the differences of unadjusted means to reference level=A
## A M F
## 0.0000000 -0.4678338 -1.4818796
# --> the inclusion of the covariate has reduced
# the "apparent" differences a bit
# checking assumptions
shapiro.test(m1$res)
##
## Shapiro-Wilk normality test
##
## data: m1$res
## W = 0.98259, p-value = 0.6172
bartlett.test(m1$res~data$landuse)
##
## Bartlett test of homogeneity of variances
##
## data: m1$res by data$landuse
## Bartlett's K-squared = 3.1891, df = 2, p-value = 0.203
layout(matrix(1:4,2,2))
hist(m1$res)
qqnorm(m1$res); qqline(m1$res)
plot(m1$res~m1$fitted) # unexplained relationship/pattern left?
plot(m1$res^2~m1$fitted) # variances homogeneous? should not show any trend