Kickstarting R

Kickstarting R - T tests

Caveats first. Using appropriate statistics is not always easy. Please do not blame me when a reviewer caustically refers to the 56 uncorrected t-tests that you performed on your data as the work of a moron. The techniques explained here will probably be adequate for univariate experiments and confirmatory tests of group equality. This is not a statistics book.
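If you really do need to run a batch of t-tests, base R's p.adjust() will correct the resulting p-values for multiple comparisons. Here is a minimal sketch. The p-values in p_raw are made up purely for illustration and do not come from any data set used here.

```r
# Hypothetical raw p-values from five separate t-tests (made-up numbers).
p_raw <- c(0.004, 0.020, 0.030, 0.040, 0.300)

# Holm's step-down correction: controls the family-wise error rate and is
# never less powerful than Bonferroni.
p_holm <- p.adjust(p_raw, method = "holm")

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
p_bonf <- p.adjust(p_raw, method = "bonferroni")

print(rbind(raw = p_raw, holm = p_holm, bonferroni = p_bonf))
```

Which correction is appropriate depends on the design, so treat this as a starting point rather than a recipe.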

Tests of between-group means

Again using the infert data set and the brkdn() function, let's look at the means of age for cases and non-cases.

> brkdn(age~case,infert)
                 0        1
Mean      31.49091 31.53012
Variance  27.60510 27.86189
n        165.00000 83.00000
attr(,"class")
[1] "dstat"
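
If brkdn() is not loaded, something close to the table above can be put together from base R. This is only a rough stand-in and not the source of brkdn(); the names group_summary and age_by_case are made up for the sketch.

```r
# infert ships with base R's datasets package.
data(infert)

# Hypothetical stand-in for brkdn(): mean, variance and n of x within each group.
group_summary <- function(x, group) {
  sapply(split(x, group),
         function(v) c(Mean = mean(v), Variance = var(v), n = length(v)))
}

# One column per level of case (0 and 1), one row per statistic.
age_by_case <- group_summary(infert$age, infert$case)
print(age_by_case)
```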

It looks as though the two groups have been age-matched. Try a t-test to see if there is a difference.

> t.test(subset(infert$age,infert$case == 0),
+ subset(infert$age,infert$case == 1))
 
         Welch Two Sample t-test
 
data:  subset(infert$age, infert$case == 0) and
 subset(infert$age, infert$case == 1)
t = -0.0553, df = 163.766, p-value = 0.956
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.439600  1.361177
sample estimates:
mean of x mean of y
 31.49091  31.53012
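
With a p-value of about 0.96, there is no sign of an age difference between cases and non-cases. For a simple two-group comparison like this one, t.test() also accepts a formula, which avoids the subset() calls. A minimal sketch using only base R (res is just an illustrative name); it should reproduce the Welch test above.

```r
# infert ships with base R's datasets package, so no extra packages are needed.
data(infert)

# Formula interface: age is the outcome, case (0/1) is the grouping variable.
# This runs the same Welch two-sample test as the two-vector call.
res <- t.test(age ~ case, data = infert)
print(res)

# The pieces of the result can be pulled out by name.
res$statistic   # t value
res$p.value     # two-sided p-value
res$conf.int    # 95 percent confidence interval by default
```

The formula form only works directly when the grouping variable has exactly two levels, which is why a variable like education needs a little more work.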

Bronwyn wants to know if those women in the sample who completed high school were significantly younger than those who did not.

> brkdn(age~education,infert)
           0-5yrs   6-11yrs   12+ yrs
Mean     35.25000  32.85000  29.72414
Variance 40.02273  28.66639  19.19280
n        12.00000 120.00000 116.00000
attr(,"class")
[1] "dstat"

She may have a case here, but first let's do something about that painful typing in of every subsetting operation. Have a look at the function group.t.test(). By calling this function as follows:

> group.t.test(infert$age,infert$education,"12+ yrs")

we can specify a grouping factor of high school completion versus everything else. This function also allows us to test two specified groups against one another.
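
The source of group.t.test() is not reproduced here, so the following is only a guess at the general idea, written in base R. The function name one_vs_rest_t and its arguments are hypothetical. It collapses the grouping factor to the chosen level versus "other" and hands that, plus anything in ..., to t.test(). It does not attempt the second mode of comparing two specified groups.

```r
# Hypothetical helper, NOT the author's group.t.test(): compares one level of a
# grouping factor against all other levels pooled together.
one_vs_rest_t <- function(x, group, level, ...) {
  if (!level %in% group) stop("level '", level, "' not found in group")
  # Collapse the factor to two levels: the chosen one and "other".
  two_groups <- factor(ifelse(group == level, level, "other"),
                       levels = c(level, "other"))
  # Extra arguments in ... are passed straight on to t.test().
  t.test(x ~ two_groups, ...)
}

data(infert)
# Note the space in the level name: infert uses "12+ yrs".
res <- one_vs_rest_t(infert$age, infert$education, "12+ yrs")
print(res)
```

Because the chosen level is listed first here, the sign of t is the reverse of a comparison that lists the pooled group first. The size of the statistic and the p-value are unaffected.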

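Before we leave the t-test, note that the ellipsis (...) at the end of group.t.test()'s arguments means that you can pass additional arguments on to t.test(). For example, you might want the 99% confidence interval displayed rather than the default 95% one. In t.test() this is controlled by the conf.level argument, so asking for conf.level=0.99 gives: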

        Welch Two Sample t-test

data:  age by as.factor(ifelse(infert$education != "12+ yrs", "<12 yrs", "12+ yrs"))
t = 5.3423, df = 243.997, p-value = 2.103e-07
alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
 1.718973 4.969115
sample estimates:
mean in group <12 yrs mean in group 12+ yrs
             33.06818              29.72414

Looks like Bronwyn was right.
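
For anyone without group.t.test() to hand, the 99% interval can be checked with plain t.test(), whose conf.level argument defaults to 0.95. A sketch, assuming the same split of education that appears in the output above (high_school and res99 are illustrative names).

```r
data(infert)

# Two-level factor: finished high school ("12+ yrs") versus everything else.
# The levels are given explicitly so the group order does not depend on locale.
high_school <- factor(ifelse(infert$education != "12+ yrs", "<12 yrs", "12+ yrs"),
                      levels = c("<12 yrs", "12+ yrs"))

# conf.level is a documented t.test() argument; 0.95 is its default.
res99 <- t.test(infert$age ~ high_school, conf.level = 0.99)
print(res99)

# Just the interval for the difference in mean age (<12 yrs minus 12+ yrs).
res99$conf.int
```

Changing conf.level alters only the width of the interval. The t statistic, degrees of freedom and p-value stay the same.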

For a much more detailed treatment of ANOVAs and other methods, get the VR package or "Notes on the use of R..." on the Contributed documentation page.
