Confidence intervals with proportions

Probably the most useful tools for data analysis is a plot with suitable error bars (Cousineau, Goulet, & Harding, 2021). In this vignette, we show how to make confidence intervals for proportions.

Theory behind Confidence intervals for proportions

For proportions, ANOPA is based on the Anscombe transform . This measure has a known theoretical standard error which depends only on sampe size \(n\):

\[SE_{A}(n) = 1/\sqrt{4(n+1/2)}.\]

Consequently, when the groups’ sizes are similar, homogeneity of variances holds.

From this, we can decomposed the total test statistic \(F\) into a component for each cell of the design. We thus get

\[\left[ A + z_{0.5-\gamma/2} \times SE_{A}(n), \; A + z_{0.5+\gamma/2} \times SE_{A}(n) \right]\]

in which \(SE_{A}(n)\) is the theoretical standard error based only on \(n\), and \(\gamma\) is the desired confidence level (often .95).

This technique returns stand-alone confidence intervals, that is, intervals which can be used to compare the proportion to a fixed point. However, such stand-alone intervals cannot be used to compare one proportion to another proportion (Cousineau et al., 2021). To compare an observed proportion to another observed proportion, it is necessary to adjust them for pair-wise differences (Baguley, 2012). This is achieved by increasing the wide of the intervals by \(\sqrt{2}\).

Also, in repeated measure designs, the correlation is beneficial to improve estimates. As such, the interval wide can be reduced when correlation is positive by multiplying its length by \(\sqrt{1-\alpha_1}\), where \(\alpha_1\) is a measure of correlation in a matrix containing repeated measures (based on the unitary alpha measure).

Finally, the above returns confidence intervals for the transformed scores. However, when used in a plot, it is typically more convenient to plot proportions (from 0 to 1) rather than Anscombe-scores (from 0 to \(\pi/2 \approx\) 1.57). Thus, it is possible to rescale the vertical axis using the inverse Anscombe transform and be shown proportions.

This is it.

Complicated?

Well, not really:

library(ANOPA)
w <- anopa( {success;total} ~ Class * Difficulty, twoWayExample)
anopaPlot(w) 
Figure 1. The proportions as a function of class and Difficulty. Error bars show difference-adjusted 95% confidence intervals.
Figure 1. The proportions as a function of class and Difficulty. Error bars show difference-adjusted 95% confidence intervals.

Because the analyses summary(w) suggests that only the factor Difficulty has a significant effect, you may select only that factors for plotting, with e.g.,

anopaPlot(w, ~ Difficulty ) 
Figure 2. The proportions as a function of Difficulty only. Error bars show difference-adjusted 95% confidence intervals.
Figure 2. The proportions as a function of Difficulty only. Error bars show difference-adjusted 95% confidence intervals.

As is the case with any ggplot2 figure, you can customize it at will. For example,

library(ggplot2)
anopaPlot(w, ~ Difficulty) + 
            theme_bw() +  # change theme
            scale_x_discrete(limits = c("Easy", "Moderate", "Difficult")) #change order
Figure 3. Same as Figure 2 with some visual improvements.
Figure 3. Same as Figure 2 with some visual improvements.

As you can see from this plot, Difficulty is very significant, and the most different conditions are Easy vs. Difficult.

Here you go.

References

Baguley, T. (2012). Calculating and graphing within-subject confidence intervals for ANOVA. Behavior Research Methods, 44, 158–175. https://doi.org/10.3758/s13428-011-0123-7
Cousineau, D., Goulet, M.-A., & Harding, B. (2021). Summary plots with adjusted error bars: The superb framework with an implementation in R. Advances in Methods and Practices in Psychological Science, 4, 1–18. https://doi.org/10.1177/25152459211035109