方差分析的多重比较

用到[multcomp] (http://cran.r-project.org/web/packages/multcomp/index.html)包,并非方差分析完全教程

安装&加载

install.packages("multcomp")
library(multcomp)

数据

data(warpbreaks)
str(warpbreaks)
## 'data.frame':	54 obs. of  3 variables:
##  $ breaks : num  26 30 54 25 70 52 51 26 67 18 ...
##  $ wool   : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
##  $ tension: Factor w/ 3 levels "L","M","H": 1 1 1 1 1 1 1 1 1 2 ...

单因素方差分析

单因素(tension)三水平(L,M,H)方差分析

数据分布

#### Shapiro-Wilk 正态性检验

with(warpbreaks, tapply(breaks, tension, shapiro.test))
## $L
## 
## 	Shapiro-Wilk normality test
## 
## data:  X[[1L]]
## W = 0.9058, p-value = 0.07265
## 
## 
## $M
## 
## 	Shapiro-Wilk normality test
## 
## data:  X[[2L]]
## W = 0.9428, p-value = 0.3232
## 
## 
## $H
## 
## 	Shapiro-Wilk normality test
## 
## data:  X[[3L]]
## W = 0.9191, p-value = 0.1247

条形图

with(warpbreaks, tapply(breaks, tension, function(x) hist(x, freq = F)))

center center center

方差分析表

对比双因素结果

library(xtable)
amod <- aov(breaks ~ tension, data = warpbreaks)
anov.tb <- xtable(summary(amod), caption = "方差分析表")
print(anov.tb, type = "html", caption.placement = getOption("xtable.caption.placement", 
    "top"))

<TABLE border=1>

方差分析表 Df Sum Sq Mean Sq F value Pr(&gt F) tension 2 2034.26 1017.13 7.21 0.0018 Residuals 51 7198.56 141.15

</TABLE>

多重比较

Tukey’s multiple comparisons tests: The “Honestly Significantly Different” (HSD) test proposed by the statistician John Tukey is based on what is called the “studentized range distribution.”
M与L差异显著,H与L差异极显著,H与M差异不显著

multcomp包

tuk <- glht(amod, linfct = mcp(tension = "Tukey"))
summary(tuk)
## 
## 	 Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: aov(formula = breaks ~ tension, data = warpbreaks)
## 
## Linear Hypotheses:
##            Estimate Std. Error t value Pr(>|t|)   
## M - L == 0   -10.00       3.96   -2.53   0.0384 * 
## H - L == 0   -14.72       3.96   -3.72   0.0014 **
## H - M == 0    -4.72       3.96   -1.19   0.4631   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

基础包(stats)

(tuk.base <- TukeyHSD(amod, "tension"))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = breaks ~ tension, data = warpbreaks)
## 
## $tension
##        diff    lwr     upr  p adj
## M-L -10.000 -19.56 -0.4402 0.0385
## H-L -14.722 -24.28 -5.1624 0.0014
## H-M  -4.722 -14.28  4.8376 0.4631

置信区间图

multcomp包

plot(print(confint(tuk)))

center

基础包(stats)

plot(tuk.base)

center

残差分布

残差 vs 拟合值可以检查残差是否有趋势或残差是否满足一致性;正态Q-Q图检查残差是否满足正态性。

plot(amod, which = 1:2)

center center

双因素方差分析

双因素(tension和wool)多水平(t:L,M,H; w:A,B)方差分析

数据分布

多因素方差分析中,数据要求满足因素水平交叉内的正态性,然而水平交叉内的自由度往往较小,直接的正态性检查不可操作,因此一般由残差的正态性来反映,见残差分布

条形图

with(warpbreaks, for (i in c("L", "H", "M")) {
    hist(warpbreaks[wool == "A" & tension == i, "breaks"], xlab = paste0("breaks A", 
        i), main = NULL)
})

center center center

with(warpbreaks, for (i in c("L", "H", "M")) {
    hist(warpbreaks[wool == "B" & tension == i, "breaks"], xlab = paste0("breaks B", 
        i), main = NULL)
})

center center center

方差分析表

比较单因素,双因素中tension的p值小一点点,wool不显著。

library(xtable)
amod2 <- aov(breaks ~ tension + wool, data = warpbreaks)
anov.tb <- xtable(summary(amod2), caption = "方差分析表")
print(anov.tb, type = "html", caption.placement = getOption("xtable.caption.placement", 
    "top"))

<TABLE border=1>

方差分析表 Df Sum Sq Mean Sq F value Pr(&gt F) tension 2 2034.26 1017.13 7.54 0.0014 wool 1 450.67 450.67 3.34 0.0736 Residuals 50 6747.89 134.96

</TABLE>

多重比较

tension同样是M与L差异显著,H与L差异极显著,H与M差异不显著

multcomp包

tuk2 <- glht(amod2, linfct = mcp(tension = "Tukey"))
summary(tuk2)
## 
## 	 Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: aov(formula = breaks ~ tension + wool, data = warpbreaks)
## 
## Linear Hypotheses:
##            Estimate Std. Error t value Pr(>|t|)   
## M - L == 0   -10.00       3.87   -2.58   0.0335 * 
## H - L == 0   -14.72       3.87   -3.80   0.0011 **
## H - M == 0    -4.72       3.87   -1.22   0.4474   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

基础包(stats)

(tuk.base2 <- TukeyHSD(amod2, "tension"))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = breaks ~ tension + wool, data = warpbreaks)
## 
## $tension
##        diff    lwr     upr  p adj
## M-L -10.000 -19.35 -0.6466 0.0336
## H-L -14.722 -24.08 -5.3688 0.0011
## H-M  -4.722 -14.08  4.6312 0.4474

置信区间图

multcomp包

plot(print(confint(tuk2)))

center

基础包(stats)

plot(tuk.base2)

center

残差分布

残差 vs 拟合值可以检查残差是否有趋势或残差是否满足一致性;正态Q-Q图检查残差是否满足正态性。

plot(amod2, which = 1:2)

center center