This article is a brief illustration of how to use
power4test()
from the package power4mome to do power
analysis of mediation, moderation, and moderated mediation in a model to
be fitted by structural equation modeling using lavaan
.
Basic knowledge about fitting models by lavaan
is
required. Readers are also expected to have basic knowledge of
mediation, moderation, and/or moderated mediation.
This is a brief illustration. More complicated scenarios and other
features of power4mome
will be described in other
vignettes.
Two functions are sufficient for estimating power given a model, population values, sample size, and the test to be used. This is the workflow:
Specify the model syntax for the population model, in
lavaan
style, and set the population values of the model
parameters.
Call power4test()
to examine the setup and the
datasets generated. Repeat this and previous steps until the model is
specified correctly.
Call power4test()
again, with the test to do
specified.
Call rejection_rates()
to compute the
power.
Let’s consider a simple mediation model. We would like to estimate the power of testing a mediation effect by Monte Carlo confidence interval.
This is the model syntax
Note that, even if we are going to test mediation, moderation, or
moderated mediation effects, we do not need to add any labels to this
model. This will be taken care of by the test functions, through the use
of the package manymome
(Cheung & Cheung, 2024).
There are two approaches to do this:
Using named vectors or lists.
Using a multiline string similar to lavaan
model
syntax.
The second approach is demonstrated below.
Suppose we want to estimate the power when:
The path from x
to m
are “large” in
strength.
The path from m
to y
are “medium” in
strength.
The path from x
to m
are “small” in
strength.
By default, power4mome
uses this convention for
regression path and correlation:1
Small: .10 (or -.10)
Medium: .30 (or -.30)
Large: .50 (or -.50)
For a product term in moderation, this is the convention:
Small: .05 (or -.05)
Medium: .10 (or -.10)
Large: .15 (or -.15)
All these values are for the standardized solution (the correlations and so-called “betas”).
The following string denotes the desired values:
Each line starts with a tag, which is the parameter
presented in lavaan
syntax. The tag ends with a colon,
:
.
After the colon is population value, which can be:
A string denoting the value. By default:
s
: Small. (-s
for small and
negative.)
m
: Medium. (-m
for medium and
negative.)
l
: Large. (-l
for large and
negative.)
nil
: Zero.
All other regression coefficients and covariances, if not specified in this string, are set to zero.
power4test()
to Check the ModelThese are the arguments used:
nrep
: The number of replications. In this stage, a
small number can be used. It is more important to have a large sample
size than to have many replications.
model
: The model syntax.
pop_es
: The string setting the population
values.
n
: The sample size in each replications. In this
stage, just for checking the model and the data generation, this number
can be set to a large value unless the model is slow to fit when the
sample size is large.
iseed
: If supplied, it is used to set the seed for
the random number generator. It is advised to always set this to an
arbitrary integer, to make the results reproducible.2
The population values can be shown by printing this object:
out
#>
#> ====================== Model Information ======================
#>
#> == Model on Factors/Variables ==
#>
#> m ~ x
#> y ~ m + x
#>
#> == Model on Variables/Indicators ==
#>
#> m ~ x
#> y ~ m + x
#>
#> ====== Population Values ======
#>
#> Regressions:
#> Population
#> m ~
#> x 0.500
#> y ~
#> m 0.300
#> x 0.100
#>
#> Variances:
#> Population
#> .m 0.750
#> .y 0.870
#> x 1.000
#>
#> (Computing indirect effects for 2 paths ...)
#>
#> == Population Conditional/Indirect Effect(s) ==
#>
#> == Indirect Effect(s) ==
#>
#> ind
#> x -> m -> y 0.150
#> x -> y 0.100
#>
#> - The 'ind' column shows the indirect effect(s).
#>
#> ======================= Data Information =======================
#>
#> Number of Replications: 2
#> Sample Sizes: 50000
#>
#> Call print with 'data_long = TRUE' for further information.
#>
#> ==================== Extra Element(s) Found ====================
#>
#> - fit
#>
#> === Element(s) of the First Dataset ===
#>
#> ============ <fit> ============
#>
#> lavaan 0.6-19 ended normally after 1 iteration
#>
#> Estimator ML
#> Optimization method NLMINB
#> Number of model parameters 5
#>
#> Number of observations 50000
#>
#> Model Test User Model:
#>
#> Test statistic 0.000
#> Degrees of freedom 0
By default, the population model will be fitted to each dataset,
hence the section <fit>
. Perfect fit is expected if
the population model is a saturated model.
We check the section Population Values
to see whether
the values are what we expected.
If they are different from what we expect, check the string for
pop_es
to see whether we set the population values
correctly.
If necessary, we can check the data generation by adding
data_long = TRUE
when printing the output:
print(out,
data_long = TRUE)
#>
#> ====================== Model Information ======================
#>
#> == Model on Factors/Variables ==
#>
#> m ~ x
#> y ~ m + x
#>
#> == Model on Variables/Indicators ==
#>
#> m ~ x
#> y ~ m + x
#>
#> ====== Population Values ======
#>
#> Regressions:
#> Population
#> m ~
#> x 0.500
#> y ~
#> m 0.300
#> x 0.100
#>
#> Variances:
#> Population
#> .m 0.750
#> .y 0.870
#> x 1.000
#>
#> (Computing indirect effects for 2 paths ...)
#>
#> == Population Conditional/Indirect Effect(s) ==
#>
#> == Indirect Effect(s) ==
#>
#> ind
#> x -> m -> y 0.150
#> x -> y 0.100
#>
#> - The 'ind' column shows the indirect effect(s).
#>
#> ======================= Data Information =======================
#>
#> Number of Replications: 2
#> Sample Sizes: 50000
#>
#> ==== Descriptive Statistics ====
#>
#> vars n mean sd skew kurtosis se
#> m 1 1e+05 0.00 1 0.01 0.03 0
#> y 2 1e+05 0.01 1 0.01 0.00 0
#> x 3 1e+05 0.00 1 0.01 0.01 0
#>
#> ===== Parameter Estimates Based on All 2 Samples Combined =====
#>
#> Total Sample Size: 100000
#>
#> ==== Standardized Estimates ====
#>
#> Variances and error variances omitted.
#>
#> Regressions:
#> est.std
#> m ~
#> x 0.500
#> y ~
#> m 0.295
#> x 0.102
#>
#>
#> ==================== Extra Element(s) Found ====================
#>
#> - fit
#>
#> === Element(s) of the First Dataset ===
#>
#> ============ <fit> ============
#>
#> lavaan 0.6-19 ended normally after 1 iteration
#>
#> Estimator ML
#> Optimization method NLMINB
#> Number of model parameters 5
#>
#> Number of observations 50000
#>
#> Model Test User Model:
#>
#> Test statistic 0.000
#> Degrees of freedom 0
The section Descriptive Statistics
, generated by
psych::describe()
, shows basic descriptive statistics for
the observed variables. As expected, they have means close to zero and
standard deviations close to one, because the datasets were generated
using the standardized model.
The section Parameter Estimates Based on
shows the
parameter estimates when the population model is fitted to
all the datasets combined. When the total sample size is large,
these estimates should be close to the population values.
The results show that we have specified the population model correctly. We can proceed to specify the test and estimate the power.
power4test()
to Do the Target TestWe can now do the simulation to estimate power. A large number of datasets (e.g., 500) of the target sample size are to be generated, and then the target test will be conducted in each of these datasets.
Suppose we would like to estimate the power of using Monte Carlo
confidence intervals to test the indirect effect from x
to
y
through m
, when sample size is 50. This is
the call:
out <- power4test(nrep = 400,
model = mod,
pop_es = mod_es,
n = 50,
R = 2000,
ci_type = "mc",
test_fun = test_indirect_effect,
test_args = list(x = "x",
m = "m",
y = "y",
mc_ci = TRUE),
iseed = 1234,
parallel = TRUE)
These are the new arguments used:
R
: The number of replications used to generate the
Monte Carlo simulated estimates, 2000 in this example. In real studies,
this number should be 10000 or even 20000 for Monte Carlo confidence
intervals. However, 2000 is sufficient because the goal is to estimate
power by generating many intervals, rather than to have one single
stable interval.
ci_type
: The method used to generate estimates.
Support both Monte Carlo ("mc"
) and nonparametric
bootstrapping ("boot"
).3 Although bootstrapping is usually used to
test an indirect effect, it is very slow to do R
bootstrapping in nrep
datasets (the model will be fitted
R * nrep
times). Therefore, it is preferable to use Monte
Carlo confidence intervals to do the initial estimation.
test_fun
: The function to be used to do the test for
each replication. Any function following a specific requirement can be
used, and power4mome
comes with several built-in functions
for some common tests. The function test_indirect_effect()
is used to test an indirect effect in the model.
test_args
: A named list of arguments to be supplied
to test_fun
. For test_indirect_effect()
, it is
a named list specifying the predictor (x
), the mediator(s)
(m
), and the outcome (y
). A path with any
number of mediators can be supported. Please refer to the help page of
test_indirect_effect()
.4
parallel
: If the test to be conducted is slow, which
is the case for tests done by Monte Carlo or nonparametric bootstrapping
confidence intervals, it is advised to enable parallel processing by
setting parallel
to TRUE
.5
For nrep = 400
, the 95% confidence limits for a power of
.80 are about .04 below and above .80. This should be precise enough for
determining whether a sample size has sufficient power.
This is the default printout:
out
#>
#> ====================== Model Information ======================
#>
#> == Model on Factors/Variables ==
#>
#> m ~ x
#> y ~ m + x
#>
#> == Model on Variables/Indicators ==
#>
#> m ~ x
#> y ~ m + x
#>
#> ====== Population Values ======
#>
#> Regressions:
#> Population
#> m ~
#> x 0.500
#> y ~
#> m 0.300
#> x 0.100
#>
#> Variances:
#> Population
#> .m 0.750
#> .y 0.870
#> x 1.000
#>
#> (Computing indirect effects for 2 paths ...)
#>
#> == Population Conditional/Indirect Effect(s) ==
#>
#> == Indirect Effect(s) ==
#>
#> ind
#> x -> m -> y 0.150
#> x -> y 0.100
#>
#> - The 'ind' column shows the indirect effect(s).
#>
#> ======================= Data Information =======================
#>
#> Number of Replications: 400
#> Sample Sizes: 50
#>
#> Call print with 'data_long = TRUE' for further information.
#>
#> ==================== Extra Element(s) Found ====================
#>
#> - fit
#> - mc_out
#>
#> === Element(s) of the First Dataset ===
#>
#> ============ <fit> ============
#>
#> lavaan 0.6-19 ended normally after 1 iteration
#>
#> Estimator ML
#> Optimization method NLMINB
#> Number of model parameters 5
#>
#> Number of observations 50
#>
#> Model Test User Model:
#>
#> Test statistic 0.000
#> Degrees of freedom 0
#>
#> =========== <mc_out> ===========
#>
#>
#> == A 'mc_out' class object ==
#>
#> Number of Monte Carlo replications: 2000
#>
#>
#> ====================== Test(s) Conducted ======================
#>
#> - test_indirect: x->m->y
#>
#> Call print() and set 'test_long = TRUE' for a detailed report.
If test_long = TRUE
is added when printing the output by
print()
, a summary of the test will also be printed.
The summary of the test:
#>
#> Test statistic 0.000
#> Degrees of freedom 0
#>
#> =========== <mc_out> ===========
#>
#>
#> == A 'mc_out' class object ==
#>
#> Number of Monte Carlo replications: 2000
The mean of the estimates across all the replications is 0.152, close to the population value.
The power estimate is simply the proportion of significant results,
the rejection rate, because the null hypothesis is
false. In addition to using test_long = TRUE
in
print()
, the rejection rate can also be retrieved by
rejection_rates()
.
out_power <- rejection_rates(out)
out_power
#> [test]: test_indirect: x->m->y
#> [test_label]: Test
#> est p.v reject r.cilo r.cihi
#> 1 0.152 1.000 0.468 0.419 0.516
#> Notes:
#> - p.v: The proportion of valid replications.
#> - est: The mean of the estimates in a test across replications.
#> - reject: The proportion of 'significant' replications, that is, the
#> rejection rate. If the null hypothesis is true, this is the Type I
#> error rate. If the null hypothesis is false, this is the power.
#> - r.cilo,r.cihi: The confidence interval of the rejection rate, based
#> on normal approximation.
#> - Refer to the tests for the meanings of other columns.
In the example above, the estimated power of the test of the indirect
effect, conducted by Monte Carlo confidence intervals, is 0.468, under
the column reject
.
p.v
is the proportion of valid results across
replications. 1.000
means that the test conducted normally
in all replications.
By default, the 95% confidence interval of the rejection rate (power)
based on normal approximation is also printed, under the column
r.cilo
and r.cihi
. In this example, the 95%
confidence interval is [0.419; 0.516].
Let’s consider a moderation model, with some control variables.
This model has only moderation, with the predictor x
and
the moderator w
. The product term is included in the
lavaan
style, x:w
.
It is unrealistic to specific the population values for all control
variables. Therefore, we can just add a proxy,
control
to represent the set of control variables
that may be included.
This is the syntax for the population values:
This example introduces one useful tag, .beta.
For a
model with many paths, it is inconvenient to specify all of them
manually. The tag .beta.
specifies the default
value for all regression paths not specified explicitly,
which is small (.10) in this example. If a path is explicitly included
(such as y ~ control
and y ~ x:w
), the
manually specified value will be used instead of
.beta.
.
This example also illustrates that we can set the population values
for correlations (covariances in the standardized solution).
Control variables are included usually because they may correlate with
the predictors. Therefore, in this example, it is hypothesized that
there is a small correlation between x
and the proxy
control variable (x ~~ control: s
).
Last, recall from this section that the
convention for product term values is different: l
denotes
.15 for product terms.
power4test()
to Check the ModelWe check the model first:
print(out2,
data_long = TRUE)
#>
#> ====================== Model Information ======================
#>
#> == Model on Factors/Variables ==
#>
#> y ~ x + w + x:w + control
#>
#> == Model on Variables/Indicators ==
#>
#> y ~ x + w + x:w + control
#>
#> ====== Population Values ======
#>
#> Regressions:
#> Population
#> y ~
#> x 0.100
#> w 0.100
#> x:w 0.150
#> control 0.100
#>
#> Covariances:
#> Population
#> x ~~
#> w 0.000
#> x:w 0.000
#> control 0.100
#> w ~~
#> x:w 0.000
#> control 0.000
#> x:w ~~
#> control 0.000
#>
#> Variances:
#> Population
#> .y 0.945
#> x 1.000
#> w 1.000
#> x:w 1.000
#> control 1.000
#>
#> (Computing indirect effects for 1 paths ...)
#>
#> (Computing conditional effects for 2 paths ...)
#>
#> == Population Conditional/Indirect Effect(s) ==
#>
#> == Effect(s) ==
#>
#> ind
#> control -> y 0.100
#>
#> - The 'ind' column shows the effect(s).
#>
#> == Conditional effects ==
#>
#> Path: x -> y
#> Conditional on moderator(s): w
#> Moderator(s) represented by: w
#>
#> [w] (w) ind
#> 1 M+1.0SD 1 0.250
#> 2 Mean 0 0.100
#> 3 M-1.0SD -1 -0.050
#>
#> - The 'ind' column shows the conditional effects.
#>
#>
#> == Conditional effects ==
#>
#> Path: w -> y
#> Conditional on moderator(s): x
#> Moderator(s) represented by: x
#>
#> [x] (x) ind
#> 1 M+1.0SD 1 0.250
#> 2 Mean 0 0.100
#> 3 M-1.0SD -1 -0.050
#>
#> - The 'ind' column shows the conditional effects.
#>
#>
#> ======================= Data Information =======================
#>
#> Number of Replications: 2
#> Sample Sizes: 50000
#>
#> ==== Descriptive Statistics ====
#>
#> vars n mean sd skew kurtosis se
#> y 1 1e+05 0.00 1 0.02 0.01 0
#> x 2 1e+05 0.00 1 0.01 0.01 0
#> w 3 1e+05 0.00 1 0.00 -0.02 0
#> x:w 4 1e+05 0.00 1 -0.03 6.01 0
#> control 5 1e+05 0.01 1 0.00 0.00 0
#>
#> ===== Parameter Estimates Based on All 2 Samples Combined =====
#>
#> Total Sample Size: 100000
#>
#> ==== Standardized Estimates ====
#>
#> Variances and error variances omitted.
#>
#> Regressions:
#> est.std
#> y ~
#> x 0.097
#> w 0.104
#> x:w 0.152
#> control 0.099
#>
#> Covariances:
#> est.std
#> x ~~
#> w -0.003
#> x:w 0.002
#> control 0.101
#> w ~~
#> x:w -0.004
#> control 0.001
#> x:w ~~
#> control 0.004
#>
#>
#> ==================== Extra Element(s) Found ====================
#>
#> - fit
#>
#> === Element(s) of the First Dataset ===
#>
#> ============ <fit> ============
#>
#> lavaan 0.6-19 ended normally after 1 iteration
#>
#> Estimator ML
#> Optimization method NLMINB
#> Number of model parameters 5
#>
#> Number of observations 50000
#>
#> Model Test User Model:
#>
#> Test statistic 0.000
#> Degrees of freedom 0
The population values for the regression paths are what we specified, and the estimates based on 5 × 104 by 2 or 100000 support that the dataset were generated correctly.
NOTE: If a product term is involved, and the component terms
(x
and w
in this example) are correlated, the
population standard deviation of this product term may not be equal to
one (Bohrnstedt & Goldberger, 1969).
Therefore, the model can be specified correctly even if the standard
deviations of product terms in the section
Descriptive Statistics
are not close to one.
power4test()
to Test The Moderation EffectWe can now do the simulation to estimate power. In this simple model,
the test is just a test of the product term, x:w
. This
model can be fitted by linear regression using lm()
. Let’s
estimate the power when the sample size is 50 and the model is fitted by
lm()
:
out2 <- power4test(nrep = 400,
model = mod2,
pop_es = mod2_es,
n = 100,
fit_model_args = list(fit_function = "lm"),
test_fun = test_moderation,
iseed = 1234,
parallel = TRUE)
These are the new arguments used:
fit_model_args
: This named list stores additional
arguments for fit_model()
. By default,
lavaan::sem()
is used. To fit the model by linear
regression using lm()
, add fit_function = "lm"
to the list.6
test_fun
: It is set to test_moderation
,
provided by power4mome
. This function automatically
identifies all product terms in a model and test them. The test used
depends on method used to fit the model. If lm()
is used,
then the usual t test is used.7
We can ues rejection_rates()
again to estimate the
power:
out2_power <- rejection_rates(out2)
out2_power
#> [test]: test_moderation: CIs
#> [test_label]: y~x:w
#> est p.v reject r.cilo r.cihi
#> 1 0.158 1.000 0.347 0.301 0.394
#> Notes:
#> - p.v: The proportion of valid replications.
#> - est: The mean of the estimates in a test across replications.
#> - reject: The proportion of 'significant' replications, that is, the
#> rejection rate. If the null hypothesis is true, this is the Type I
#> error rate. If the null hypothesis is false, this is the power.
#> - r.cilo,r.cihi: The confidence interval of the rejection rate, based
#> on normal approximation.
#> - Refer to the tests for the meanings of other columns.
The estimated power of the test of the product term,
x:w
, is 0.347, with 95% confidence interval [0.301;
0.394].
Let’s consider a moderated mediation model.
This model is a mediation model with the a-path,
m ~ x
, moderated by w
. As explained before,
there is no need to use any label nor define and parameters. This will
be handled by the test function to be used.
This is the syntax for the population values:
Please refer to the previous section on setting up this syntax.
power4test()
to Check the ModelWe check the model first:
print(out3,
data_long = TRUE)
#>
#> ====================== Model Information ======================
#>
#> == Model on Factors/Variables ==
#>
#> m ~ x + w + x:w
#> y ~ m + x
#>
#> == Model on Variables/Indicators ==
#>
#> m ~ x + w + x:w
#> y ~ m + x
#>
#> ====== Population Values ======
#>
#> Regressions:
#> Population
#> m ~
#> x 0.300
#> w 0.100
#> x:w 0.050
#> y ~
#> m 0.300
#> x 0.100
#>
#> Covariances:
#> Population
#> x ~~
#> w 0.000
#> x:w 0.000
#> w ~~
#> x:w 0.000
#>
#> Variances:
#> Population
#> .m 0.898
#> .y 0.881
#> x 1.000
#> w 1.000
#> x:w 1.000
#>
#> (Computing indirect effects for 1 paths ...)
#>
#> (Computing conditional effects for 2 paths ...)
#>
#> == Population Conditional/Indirect Effect(s) ==
#>
#> == Effect(s) ==
#>
#> ind
#> x -> y 0.100
#>
#> - The 'ind' column shows the effect(s).
#>
#> == Conditional indirect effects ==
#>
#> Path: x -> m -> y
#> Conditional on moderator(s): w
#> Moderator(s) represented by: w
#>
#> [w] (w) ind m~x y~m
#> 1 M+1.0SD 1 0.105 0.350 0.300
#> 2 Mean 0 0.090 0.300 0.300
#> 3 M-1.0SD -1 0.075 0.250 0.300
#>
#> - The 'ind' column shows the conditional indirect effects.
#> - 'm~x','y~m' is/are the path coefficient(s) along the path conditional
#> on the moderator(s).
#>
#>
#> == Conditional indirect effects ==
#>
#> Path: w -> m -> y
#> Conditional on moderator(s): x
#> Moderator(s) represented by: x
#>
#> [x] (x) ind m~w y~m
#> 1 M+1.0SD 1 0.045 0.150 0.300
#> 2 Mean 0 0.030 0.100 0.300
#> 3 M-1.0SD -1 0.015 0.050 0.300
#>
#> - The 'ind' column shows the conditional indirect effects.
#> - 'm~w','y~m' is/are the path coefficient(s) along the path conditional
#> on the moderator(s).
#>
#>
#> ======================= Data Information =======================
#>
#> Number of Replications: 2
#> Sample Sizes: 50000
#>
#> ==== Descriptive Statistics ====
#>
#> vars n mean sd skew kurtosis se
#> m 1 1e+05 0 1 0.03 0.03 0
#> y 2 1e+05 0 1 0.01 -0.01 0
#> x 3 1e+05 0 1 0.00 -0.02 0
#> w 4 1e+05 0 1 0.00 0.01 0
#> x:w 5 1e+05 0 1 0.04 5.92 0
#>
#> ===== Parameter Estimates Based on All 2 Samples Combined =====
#>
#> Total Sample Size: 100000
#>
#> ==== Standardized Estimates ====
#>
#> Variances and error variances omitted.
#>
#> Regressions:
#> est.std
#> m ~
#> x 0.303
#> w 0.099
#> x:w 0.052
#> y ~
#> m 0.299
#> x 0.098
#>
#> Covariances:
#> est.std
#> x ~~
#> w 0.003
#> x:w -0.001
#> w ~~
#> x:w 0.008
#>
#>
#> ==================== Extra Element(s) Found ====================
#>
#> - fit
#>
#> === Element(s) of the First Dataset ===
#>
#> ============ <fit> ============
#>
#> lavaan 0.6-19 ended normally after 1 iteration
#>
#> Estimator ML
#> Optimization method NLMINB
#> Number of model parameters 7
#>
#> Number of observations 50000
#>
#> Model Test User Model:
#>
#> Test statistic 0.007
#> Degrees of freedom 2
#> P-value (Chi-square) 0.997
The population values and the estimates based on 5 × 104 by 2 or 100000 are what we expect.
power4test()
to Test The Moderated Mediation
EffectTo estimate the power of a moderated mediation effect, we can test
the index of moderated mediation (Hayes,
2015). In this example, it is the product of the coefficient
m ~ x:w
and the coefficient y ~ m
. This can be
done by the test function test_index_of_mome()
, provided by
power4mome
. Again, Monte Carlo confidence interval is
used.
Let’s estimate the power when sample size is 100.
out3 <- power4test(nrep = 400,
model = mod3,
pop_es = mod3_es,
n = 100,
R = 2000,
ci_type = "mc",
test_fun = test_index_of_mome,
test_args = list(x = "x",
m = "m",
y = "y",
w = "w",
mc_ci = TRUE),
iseed = 1234,
parallel = TRUE)
The call is similar to the one used in testing mediation.
This is the new argument used:
test_fun
: It is set to
test_index_of_mome()
in this example. This function is
similar to test_indirect_effect()
, with one more argument,
w
, for the moderator. Although this example has only one
mediator, it support any number of mediators along a path.8We can ues rejection_rates()
again to estimate the
power:
out3_power <- rejection_rates(out3)
out3_power
#> [test]: test_index_of_mome: x->m->y, moderated by w
#> [test_label]: Test
#> est p.v reject r.cilo r.cihi
#> 1 0.016 1.000 0.055 0.033 0.077
#> Notes:
#> - p.v: The proportion of valid replications.
#> - est: The mean of the estimates in a test across replications.
#> - reject: The proportion of 'significant' replications, that is, the
#> rejection rate. If the null hypothesis is true, this is the Type I
#> error rate. If the null hypothesis is false, this is the power.
#> - r.cilo,r.cihi: The confidence interval of the rejection rate, based
#> on normal approximation.
#> - Refer to the tests for the meanings of other columns.
The estimated power of the test of moderated mediation effect, conducted by a test of the index of moderated mediation, is 0.055, 95% confidence interval [0.033; 0.077].
Unlike the previous example on moderation tested by regression, estimating the power of Monte Carlo confidence intervals is substantially slower. However, this is necessary because Monte Carlo or nonparametric bootstrapping confidence interval is the test usually used in moderated mediation (and mediation).
The function power4test()
also supports redoing
an analysis using a new value for the sample size (or population effect
sizes set to pop_es
). Simply
set the output of power4test
as the first argument,
and
set the new value for n
.
For example, we can repeat the simulation for the test of moderation
above, but for a sample size of 200. We simply
call power4test()
again, set the previous output
(out2
in the example for moderation) as the first argument,
and set n
to a new value (200 in this example):
This is the estimated power when the sample size is 200.
out2_new_n_reject <- rejection_rates(out2_new_n)
out2_new_n_reject
#> [test]: test_moderation: CIs
#> [test_label]: y~x:w
#> est p.v reject r.cilo r.cihi
#> 1 0.148 1.000 0.527 0.479 0.576
#> Notes:
#> - p.v: The proportion of valid replications.
#> - est: The mean of the estimates in a test across replications.
#> - reject: The proportion of 'significant' replications, that is, the
#> rejection rate. If the null hypothesis is true, this is the Type I
#> error rate. If the null hypothesis is false, this is the power.
#> - r.cilo,r.cihi: The confidence interval of the rejection rate, based
#> on normal approximation.
#> - Refer to the tests for the meanings of other columns.
The estimated power is 0.527, 95% confidence interval [0.479; 0.576], when the sample size is 200.
This technique can be repeated to find the required sample size for a target power, and can be used for all the other scenarios covered above, such as mediation and moderated mediation.
There are several more efficient ways to find the sample size with the desired power.
n_region_from_power()
The function n_region_from_power()
can be used to find
the region of sample sizes likely to have the desired power. If
the default settings are to be used, then it can be called directly on
the output of power4test()
:
This is the recommended way for sample size planning, when there is no predetermined range of sample sizes.
See the
templates for examples on using n_region_from_power()
for common models.
power4test_by_n()
The function power4test_by_n()
can be used To estimate
the power for a sequence of sample sizes. For example, we can estimate
the power in the moderation model above for these sample sizes: 250,
300, 350, 400.
The first argument is the output of power4test()
for an
arbitrary sample size.
The argument n
is a numeric vector of sample sizes to
examine.
The argument by_seed
, if set to an integer, try to make
the results reproducible.
The call will take some times to run because it is equivalent to
calling power4test()
once for each sample size.
The rejection rates for each sample size can be retrieved by
rejection_rates()
too:
rejection_rates(out2_several_ns)
#> [test]: test_moderation: CIs
#> [test_label]: y~x:w
#> n est p.v reject r.cilo r.cihi
#> 1 250 0.149 1.000 0.660 0.614 0.706
#> 2 300 0.150 1.000 0.733 0.689 0.776
#> 3 350 0.151 1.000 0.810 0.772 0.848
#> 4 400 0.150 1.000 0.850 0.815 0.885
#> Notes:
#> - n: The sample size in a trial.
#> - p.v: The proportion of valid replications.
#> - est: The mean of the estimates in a test across replications.
#> - reject: The proportion of 'significant' replications, that is, the
#> rejection rate. If the null hypothesis is true, this is the Type I
#> error rate. If the null hypothesis is false, this is the power.
#> - r.cilo,r.cihi: The confidence interval of the rejection rate, based
#> on normal approximation.
#> - Refer to the tests for the meanings of other columns.
The results show that, to have a power of about .800 to detect the moderation effect, a sample size of about 350 is needed.
This approach is used when the range of sample sizes has already been decided and the levels of power are needed to determine the final sample size.
Please refer to the help page of power4test_by_n()
for
other examples.
x_from_power()
The function x_from_power()
can be used to
systematically search within an interval the sample size with the target
power. This takes longer to run but, instead of manually trying
different sample size, this function do the search automatically.
This approach can be used when the goal is to find the probable
minimum or maximum sample size with the desired level of power. The
first approach, using n_region_from_power()
, simply uses
this approach twice to find the region of sample sizes.
See this
article for an illustration of how to use
x_from_power()
.
This brief illustration only covers the basic features of
power4mome
. These are other advanced features to be covered
in other articles:
There is no inherent restriction on the form of the model.
Typical models that can be specified in lavaan
model syntax
can be the population model, although there may be special models in
which power4test
does not yet support.
The population model can be a model with latent factors and
indicators. Nevertheless, users can specify only the relation among the
factors. There is no need to include indicators in the model syntax, and
also no need to manually specify the factor loadings. The number of
indicators for each factor and the factor loadings are set by the
argument number_of_indicators
and reliability
(see the help page of sim_data()
on how to set them). The
model syntax used to fit to the data will automatically include the
indicators. An introduction can be found in
vignette("power4test_latent_mediation")
. Examples can be
found in templates for
models with latent variables.
Though not illustrated above, estimating the power of tests conducted by nonparametric bootstrapping is supported, although it will take longer to run.
Although this package focuses on moderation, mediation, and
moderated mediation, in principle, the power of any test can be
estimated, as long as a test function for test_fun
is
available. Some other functions are provided with
power4mome
(e.g., test_parameters()
for
testing all free model parameters). See the help page of
do_test()
on how to write a function to do a test not
available in power4mome
.
When estimating power, usually the population model is fitted to
the data. However, it is possible to fit any other model to the
generated data. This can be done by using the argument
fit_model_args
to set the argument model
of
fit_model()
.
Preliminary support for multigroup model is available. See the
help pages of ptable_pop()
and pop_es_yaml()
on how to specify the population value syntax. Functions will be added
for tests relevant to multigroup models (e.g., testing the between-group
difference in an indirect effect).
Although we illustrated only rerunning an analysis with a new
sample size (n
), it is also possible to rerun an analysis
using a new population value for a parameter. This can be done by using
the previous output of power4test()
as the first argument,
and setting only pop_es
to a named vector:
x_fun
of power4test()
for details.lm()
(regression). To estimate power of testing
mediation or moderated mediation effects in models fitted by
lm()
, ci_type = "boot"
is needed.Users can specify the values directly if necessary.↩︎
The functions used are
parallel::clusterSetRNGStream()
for parallel processing,
and set.seed()
for serial processing.↩︎
They are implemented by manymome::do_mc()
and manymome::do_boot()
, respectively.↩︎
The test is implemented by
manymome::indirect()
.↩︎
The number of cores is determined automatically but can
be set directly by the ncores
argument.↩︎
See the help page of fit_model()
on other
available arguments.↩︎
The test name has CIs
in it but this is
equivalent to using the t test when the model is fitted by
lm()
.↩︎
The test is implemented by
manymome::test_index_of_mome()
.↩︎