用到[multcomp] (http://cran.r-project.org/web/packages/multcomp/index.html)包,并非方差分析完全教程
安装&加载
install.packages("multcomp")
library(multcomp)
数据
data(warpbreaks)
str(warpbreaks)
## 'data.frame': 54 obs. of 3 variables:
## $ breaks : num 26 30 54 25 70 52 51 26 67 18 ...
## $ wool : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
## $ tension: Factor w/ 3 levels "L","M","H": 1 1 1 1 1 1 1 1 1 2 ...
单因素方差分析
单因素(tension)三水平(L,M,H)方差分析
数据分布
#### Shapiro-Wilk 正态性检验
with(warpbreaks, tapply(breaks, tension, shapiro.test))
## $L
##
## Shapiro-Wilk normality test
##
## data: X[[1L]]
## W = 0.9058, p-value = 0.07265
##
##
## $M
##
## Shapiro-Wilk normality test
##
## data: X[[2L]]
## W = 0.9428, p-value = 0.3232
##
##
## $H
##
## Shapiro-Wilk normality test
##
## data: X[[3L]]
## W = 0.9191, p-value = 0.1247
条形图
with(warpbreaks, tapply(breaks, tension, function(x) hist(x, freq = F)))
方差分析表
library(xtable)
amod <- aov(breaks ~ tension, data = warpbreaks)
anov.tb <- xtable(summary(amod), caption = "方差分析表")
print(anov.tb, type = "html", caption.placement = getOption("xtable.caption.placement",
"top"))
<TABLE border=1>
</TABLE>
多重比较
Tukey’s multiple comparisons tests: The “Honestly Significantly Different” (HSD) test proposed by the statistician John Tukey is based on what is called the “studentized range distribution.”
M与L差异显著,H与L差异极显著,H与M差异不显著
multcomp包
tuk <- glht(amod, linfct = mcp(tension = "Tukey"))
summary(tuk)
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = breaks ~ tension, data = warpbreaks)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## M - L == 0 -10.00 3.96 -2.53 0.0384 *
## H - L == 0 -14.72 3.96 -3.72 0.0014 **
## H - M == 0 -4.72 3.96 -1.19 0.4631
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
基础包(stats)
(tuk.base <- TukeyHSD(amod, "tension"))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = breaks ~ tension, data = warpbreaks)
##
## $tension
## diff lwr upr p adj
## M-L -10.000 -19.56 -0.4402 0.0385
## H-L -14.722 -24.28 -5.1624 0.0014
## H-M -4.722 -14.28 4.8376 0.4631
置信区间图
multcomp包
plot(print(confint(tuk)))
基础包(stats)
plot(tuk.base)
残差分布
残差 vs 拟合值可以检查残差是否有趋势或残差是否满足一致性;正态Q-Q图检查残差是否满足正态性。
plot(amod, which = 1:2)
双因素方差分析
双因素(tension和wool)多水平(t:L,M,H; w:A,B)方差分析
数据分布
多因素方差分析中,数据要求满足因素水平交叉内的正态性,然而水平交叉内的自由度往往较小,直接的正态性检查不可操作,因此一般由残差的正态性来反映,见残差分布。
条形图
with(warpbreaks, for (i in c("L", "H", "M")) {
hist(warpbreaks[wool == "A" & tension == i, "breaks"], xlab = paste0("breaks A",
i), main = NULL)
})
with(warpbreaks, for (i in c("L", "H", "M")) {
hist(warpbreaks[wool == "B" & tension == i, "breaks"], xlab = paste0("breaks B",
i), main = NULL)
})
方差分析表
比较单因素,双因素中tension的p值小一点点,wool不显著。
library(xtable)
amod2 <- aov(breaks ~ tension + wool, data = warpbreaks)
anov.tb <- xtable(summary(amod2), caption = "方差分析表")
print(anov.tb, type = "html", caption.placement = getOption("xtable.caption.placement",
"top"))
<TABLE border=1>
</TABLE>
多重比较
tension同样是M与L差异显著,H与L差异极显著,H与M差异不显著
multcomp包
tuk2 <- glht(amod2, linfct = mcp(tension = "Tukey"))
summary(tuk2)
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = breaks ~ tension + wool, data = warpbreaks)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## M - L == 0 -10.00 3.87 -2.58 0.0335 *
## H - L == 0 -14.72 3.87 -3.80 0.0011 **
## H - M == 0 -4.72 3.87 -1.22 0.4474
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
基础包(stats)
(tuk.base2 <- TukeyHSD(amod2, "tension"))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = breaks ~ tension + wool, data = warpbreaks)
##
## $tension
## diff lwr upr p adj
## M-L -10.000 -19.35 -0.6466 0.0336
## H-L -14.722 -24.08 -5.3688 0.0011
## H-M -4.722 -14.08 4.6312 0.4474
置信区间图
multcomp包
plot(print(confint(tuk2)))
基础包(stats)
plot(tuk.base2)
残差分布
残差 vs 拟合值可以检查残差是否有趋势或残差是否满足一致性;正态Q-Q图检查残差是否满足正态性。
plot(amod2, which = 1:2)