In an effort to make `TOSTER` more informative and easier to use, I created the functions `t_TOST` and `simple_htest`. These functions operate very similarly to base R’s `t.test` function, with a few exceptions. First, `t_TOST` performs 3 t-tests (one two-tailed and two one-tailed tests). Second, `simple_htest` allows you to run equivalence testing or minimal effects testing with a t-test or a Wilcoxon-Mann-Whitney test via the `alternative` argument, and the output is the same as that of `t.test` or `wilcox.test` (in that the object is of the class `htest`). In addition, these functions have a generic method where two vectors can be supplied or a formula can be given (e.g., `y ~ group`). These functions make it easier to switch between types of t-tests. All three types (two sample, one sample, and paired samples) can be performed from the same function. Moreover, the summary information and visualizations have been upgraded. This should make the decisions derived from the function more informative and user-friendly.

These functions are not limited to equivalence tests. Minimal effects
testing (MET) is possible. MET is useful for situations where the
hypothesis is about a minimal effect and the null hypothesis *is*
equivalence.

In the general introduction to this package, we detailed how to look at *old* results and how to apply TOST to interpreting those results. However, in many cases, users may have new data that needs to be analyzed. Therefore, `t_TOST` and `simple_htest` can be applied to new data. This vignette will use the `iris` and the `sleep` data.

```
data('sleep')
data('iris')
```

For this example, we will use the sleep data. In this data there is a `group` variable and an outcome `extra`.

```
head(sleep)
#> extra group ID
#> 1 0.7 1 1
#> 2 -1.6 1 2
#> 3 -0.2 1 3
#> 4 -1.2 1 4
#> 5 -0.1 1 5
#> 6 3.4 1 6
```

We will assume the data are independent, and that we have equivalence bounds of +/- 0.5 raw units. All we need to do is provide the `formula`, `data`, and `eqb` arguments for the function to run appropriately. In addition, we can set the `var.equal` argument (to assume equal variance), and the `paired` argument (sets whether the data are paired or not). Both are logical indicators that can be set to TRUE or FALSE. The `alpha` is automatically set to 0.05 but this can also be adjusted by the user. The Hedges correction is also automatically calculated, but this can be overridden with the `bias_correction` argument. The `hypothesis` is automatically set to “EQU” for equivalence but if a minimal effect is of interest then “MET” can be supplied. Note: for this example, we will set `smd_ci` to “goulet” since it will reduce the time to produce plots.

```
# Formula interface
res1 = t_TOST(formula = extra ~ group,
              data = sleep,
              eqb = .5,
              smd_ci = "goulet")

# Equivalent call supplying the two groups as vectors
res1a = t_TOST(x = subset(sleep,group==1)$extra,
               y = subset(sleep,group==2)$extra,
               eqb = .5)
```

We can also use the “simpler” approach with `simple_htest`.

```
# Simple htest
res1b = simple_htest(formula = extra ~ group,
                     data = sleep,
                     mu = .5, # set equivalence bound
                     alternative = "e")
```

Once the function has run, we can print the results with the `print` command. This provides a verbose summary of the results.

```
# t_TOST
print(res1)
#>
#> Welch Two Sample t-test
#>
#> The equivalence test was non-significant, t(17.78) = -1.3, p = 0.89
#> The null hypothesis test was non-significant, t(17.78) = -1.86p = 0.08
#> NHST: don't reject null significance hypothesis that the effect is equal to zero
#> TOST: don't reject null equivalence hypothesis
#>
#> TOST Results
#> t df p.value
#> t-test -1.861 17.78 0.079
#> TOST Lower -1.272 17.78 0.890
#> TOST Upper -2.450 17.78 0.012
#>
#> Effect Sizes
#> Estimate SE C.I. Conf. Level
#> Raw -1.5800 0.8491 [-3.0534, -0.1066] 0.9
#> Hedges's g(av) -0.7965 0.4976 [-1.6843, -0.0615] 0.9
#> Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").
# htest
print(res1b)
#>
#> Welch Two Sample t-test
#>
#> data: extra by group
#> t = -1.2719, df = 17.776, p-value = 0.8901
#> alternative hypothesis: equivalence
#> null values:
#> difference in means difference in means
#> -0.5 0.5
#> 90 percent confidence interval:
#> -3.0533815 -0.1066185
#> sample estimates:
#> mean of x mean of y
#> 0.75 2.33
```
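The rows of the TOST results table can be cross-checked with base R alone. The following is a minimal sketch, assuming the Welch test and the +/- 0.5 bounds used above. The object names (`nhst`, `tost_lower`, `tost_upper`, `p_tost`) are illustrative only, and this is not meant to represent how the package computes its results internally.

```r
data("sleep")

# Two-tailed NHST against a difference of zero (Welch test is the default)
nhst <- t.test(extra ~ group, data = sleep)

# Lower bound: the null is "difference <= -0.5", so the alternative is "greater"
tost_lower <- t.test(extra ~ group, data = sleep,
                     mu = -0.5, alternative = "greater")

# Upper bound: the null is "difference >= 0.5", so the alternative is "less"
tost_upper <- t.test(extra ~ group, data = sleep,
                     mu = 0.5, alternative = "less")

# Equivalence is only claimed if BOTH one-sided tests reject,
# so the overall TOST p-value is the larger of the two
p_tost <- max(tost_lower$p.value, tost_upper$p.value)

round(c(t_nhst  = unname(nhst$statistic),
        t_lower = unname(tost_lower$statistic),
        t_upper = unname(tost_upper$statistic),
        p_tost  = p_tost), 3)
```

The three t-statistics should line up with the `t-test`, `TOST Lower`, and `TOST Upper` rows printed above, and `p_tost` with the p-value reported by `simple_htest`.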

Another nice feature is the generic `plot` method that can provide a visual summary of the results (only available for `t_TOST`). All of the plots in this package were inspired by the concurve R package. There are two types of plots that can be produced. The first, and default, is the consonance density plot (`type = "cd"`).

`plot(res1, type = "cd")`

The shading pattern can be modified with the `ci_shades` argument.

```
# Set to shade only the 90% and 95% CI areas
plot(res1, type = "cd",
ci_shades = c(.9,.95))
```

Consonance plots, where all confidence intervals can be plotted simultaneously, can also be produced. The advantage here is that multiple confidence interval lines can be plotted at once.

```
plot(res1, type = "c",
ci_lines = c(.9,.95))
```
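The intervals that `ci_lines` marks on the plot are ordinary confidence intervals at different levels. As a rough base R sketch (the names `conf_levels` and `ci_table` are illustrative, and the Welch test on the `sleep` data is assumed), they can be tabulated like this:

```r
data("sleep")

# Confidence levels to tabulate (same values passed to ci_lines above)
conf_levels <- c(0.90, 0.95)

# One row per confidence level: level, lower limit, upper limit
ci_table <- t(sapply(conf_levels, function(cl) {
  ci <- t.test(extra ~ group, data = sleep, conf.level = cl)$conf.int
  c(conf.level = cl, lower = ci[1], upper = ci[2])
}))

ci_table
```

The 90% row should match the interval in the printed results above; the 95% row is wider, which is what the nested lines in the consonance plot convey.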

A description of the results can also be produced with the `describe` method or the `describe_htest` function, respectively.

```
describe(res1)
describe_htest(res1b)
```

Using the Welch Two Sample t-test, a null hypothesis significance test (NHST), and a equivalence test, via two one-sided tests (TOST), were performed with an alpha-level of 0.05. These tested the null hypotheses that true mean difference is equal to 0 (NHST), and true mean difference is more extreme than -0.5 and 0.5 (TOST). Both the equivalence test (p = 0.89), and the NHST (p = 0.079) were not significant (mean difference = -1.58 90% C.I.[-3.05, -0.107]; Hedges’s g(av) = -0.796 90% C.I.[-1.68, -0.0615]). Therefore, the results are inconclusive: neither null hypothesis can be rejected.

The Welch Two Sample t-test is not statistically significant (t(17.776) = -1.27, p = 0.89, mean of x = 0.75, mean of y = 2.33, 90% C.I.[-3.05, -0.107]) at a 0.05 alpha-level. The null hypothesis cannot be rejected. At the desired error rate, it cannot be stated that the true difference in means is between -0.5 and 0.5.

To perform a paired samples TOST, the process does not change much. We could run the test the same way by providing a formula. All we would then need to do is set `paired` to TRUE.

```
res2 = t_TOST(formula = extra ~ group,
              data = sleep,
              paired = TRUE,
              eqb = .5)
res2
#>
#> Paired t-test
#>
#> The equivalence test was non-significant, t(9) = -2.8, p = 0.99
#> The null hypothesis test was significant, t(9) = -4.06p < 0.01
#> NHST: reject null significance hypothesis that the effect is equal to zero
#> TOST: don't reject null equivalence hypothesis
#>
#> TOST Results
#> t df p.value
#> t-test -4.062 9 0.003
#> TOST Lower -2.777 9 0.989
#> TOST Upper -5.348 9 < 0.001
#>
#> Effect Sizes
#> Estimate SE C.I. Conf. Level
#> Raw -1.580 0.389 [-2.293, -0.867] 0.9
#> Hedges's g(z) -1.174 0.411 [-1.8046, -0.4977] 0.9
#> Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").
res2b = simple_htest(
  formula = extra ~ group,
  data = sleep,
  paired = TRUE,
  mu = .5,
  alternative = "e")
res2b
#>
#> Paired t-test
#>
#> data: extra by group
#> t = -2.7766, df = 9, p-value = 0.9892
#> alternative hypothesis: equivalence
#> null values:
#> mean difference mean difference
#> -0.5 0.5
#> 90 percent confidence interval:
#> -2.2930053 -0.8669947
#> sample estimates:
#> mean difference
#> -1.58
```
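A paired TOST is equivalent to a one-sample TOST on the within-pair differences, and the decision can also be read from the 90% (that is, 1 - 2 * alpha) confidence interval. The sketch below uses base R only; it assumes the rows of `sleep` are ordered by `ID` within each group (as in the built-in data), and the names `d`, `eqb`, `ci90`, and `equivalent` are illustrative.

```r
data("sleep")

# Within-pair differences (group 1 minus group 2)
d <- with(sleep, extra[group == 1] - extra[group == 2])
eqb <- 0.5

# Two one-sided tests on the differences
paired_lower <- t.test(d, mu = -eqb, alternative = "greater")
paired_upper <- t.test(d, mu =  eqb, alternative = "less")

# 90% confidence interval for the mean difference
ci90 <- t.test(d, conf.level = 0.90)$conf.int

# Equivalence is supported only if the whole interval lies inside the bounds
equivalent <- ci90[1] > -eqb && ci90[2] < eqb

round(c(t_lower = unname(paired_lower$statistic),
        t_upper = unname(paired_upper$statistic)), 3)
ci90
equivalent
```

Here the interval lies entirely below the lower bound, which is consistent with the non-significant equivalence test reported above.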

However, we may have two vectors of data that are paired. So we may want to just provide those separately rather than using a data set and setting the formula. This can be demonstrated with the “iris” data.

```
res3 = t_TOST(x = iris$Sepal.Length,
              y = iris$Sepal.Width,
              paired = TRUE,
              eqb = 1)
res3
#>
#> Paired t-test
#>
#> The equivalence test was non-significant, t(149) = 22.32, p = 1
#> The null hypothesis test was significant, t(149) = 34.815p < 0.01
#> NHST: reject null significance hypothesis that the effect is equal to zero
#> TOST: don't reject null equivalence hypothesis
#>
#> TOST Results
#> t df p.value
#> t-test 34.82 149 < 0.001
#> TOST Lower 47.31 149 < 0.001
#> TOST Upper 22.32 149 1
#>
#> Effect Sizes
#> Estimate SE C.I. Conf. Level
#> Raw 2.786 0.08002 [2.6536, 2.9184] 0.9
#> Hedges's g(z) 2.828 0.18257 [2.5252, 3.1244] 0.9
#> Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").
res3a = simple_htest(
  x = iris$Sepal.Length,
  y = iris$Sepal.Width,
  paired = TRUE,
  mu = 1,
  alternative = "e"
)
res3a
#>
#> Paired t-test
#>
#> data: x and y
#> t = 22.319, df = 149, p-value = 1
#> alternative hypothesis: equivalence
#> null values:
#> mean difference mean difference
#> -1 1
#> 90 percent confidence interval:
#> 2.653551 2.918449
#> sample estimates:
#> mean difference
#> 2.786
```

We may want to perform a Minimal Effect Test with the `hypothesis` argument set to “MET”.

```
res_met = t_TOST(x = iris$Sepal.Length,
                 y = iris$Sepal.Width,
                 paired = TRUE,
                 hypothesis = "MET",
                 eqb = 1,
                 smd_ci = "goulet")
res_met
#>
#> Paired t-test
#>
#> The minimal effect test was significant, t(149) = 47.31, p < 0.01
#> The null hypothesis test was significant, t(149) = 34.815p < 0.01
#> NHST: reject null significance hypothesis that the effect is equal to zero
#> TOST: reject null MET hypothesis
#>
#> TOST Results
#> t df p.value
#> t-test 34.82 149 < 0.001
#> TOST Lower 47.31 149 1
#> TOST Upper 22.32 149 < 0.001
#>
#> Effect Sizes
#> Estimate SE C.I. Conf. Level
#> Raw 2.786 0.08002 [2.6536, 2.9184] 0.9
#> Hedges's g(z) 2.835 0.25311 [2.5719, 3.1284] 0.9
#> Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").
res_metb = simple_htest(x = iris$Sepal.Length,
                        y = iris$Sepal.Width,
                        paired = TRUE,
                        mu = 1,
                        alternative = "minimal.effect")
res_metb
#>
#> Paired t-test
#>
#> data: x and y
#> t = 22.319, df = 149, p-value < 2.2e-16
#> alternative hypothesis: minimal.effect
#> null values:
#> mean difference mean difference
#> -1 1
#> 90 percent confidence interval:
#> 2.653551 2.918449
#> sample estimates:
#> mean difference
#> 2.786
```
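For a minimal effect test the direction of each one-sided test is flipped relative to the equivalence test, and rejecting *either* one is enough. A minimal base R sketch of that logic on the paired `iris` measurements follows; the names `d`, `eqb`, `met_lower`, `met_upper`, and `p_met` are illustrative, and this is a cross-check under those assumptions rather than the package's own code.

```r
data("iris")

# Within-pair differences
d <- iris$Sepal.Length - iris$Sepal.Width
eqb <- 1

# Is the mean difference below the lower bound?
met_lower <- t.test(d, mu = -eqb, alternative = "less")

# Is the mean difference above the upper bound?
met_upper <- t.test(d, mu = eqb, alternative = "greater")

# A minimal effect is claimed if EITHER test rejects,
# so the overall p-value is the smaller of the two
p_met <- min(met_lower$p.value, met_upper$p.value)

round(c(t_lower = unname(met_lower$statistic),
        t_upper = unname(met_upper$statistic)), 3)
p_met
```

The two t-statistics should match the `TOST Lower` and `TOST Upper` rows of the `t_TOST` output above, and the smaller p-value corresponds to the test that `simple_htest` reports.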

A description of the results can also be produced with the `describe` method or the `describe_htest` function, respectively.

```
describe(res_met)
describe_htest(res_metb)
```

Using the Paired t-test, a null hypothesis significance test (NHST), and a minimal effect test, via two one-sided tests (TOST), were performed with an alpha-level of 0.05. These tested the null hypotheses that true mean difference is equal to 0 (NHST), and true mean difference is greater than -1 or less than 1 (TOST). The minimal effect test was not significant (p = 1). The NHST was significant, t(149) = 34.815, p < 0.001 (mean difference = 2.786 90% C.I.[2.654, 2.918]; Hedges’s g(z) = 2.835 90% C.I.[2.572, 3.128]). At the desired error rate, it can be stated that the true mean difference is not equal to 0 (i.e., no minimal effect).

The Paired t-test is statistically significant (t(149) = 22.319, p < 0.001, mean difference = 2.786, 90% C.I.[2.654, 2.918]) at a 0.05 alpha-level. The null hypothesis can be rejected. At the desired error rate, it can be stated that the true mean difference is less than -1 or greater than 1.

In other cases we may just have a one sample test. If that is the case, all we have to do is supply the `x` argument for the data. For this test we may hypothesize that the mean of Sepal.Length is greater than 5.5 and less than 8.5.

```
res4 = t_TOST(x = iris$Sepal.Length,
              hypothesis = "EQU",
              eqb = c(5.5,8.5),
              smd_ci = "goulet")
res4
#>
#> One Sample t-test
#>
#> The equivalence test was significant, t(149) = 5.08, p < 0.01
#> The null hypothesis test was significant, t(149) = 86.425p < 0.01
#> NHST: reject null significance hypothesis that the effect is equal to zero
#> TOST: reject null equivalence hypothesis
#>
#> TOST Results
#> t df p.value
#> t-test 86.425 149 < 0.001
#> TOST Lower 5.078 149 < 0.001
#> TOST Upper -39.293 149 < 0.001
#>
#> Effect Sizes
#> Estimate SE C.I. Conf. Level
#> Raw 5.843 0.06761 [5.7314, 5.9552] 0.9
#> Hedges's g 7.021 0.42002 [6.4067, 7.7882] 0.9
#> Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").
```

In some cases you may only have access to the summary statistics. Therefore, we created a function, `tsum_TOST`, to perform the same tests just based on the summary statistics. This involves providing the function with a number of different arguments.

- `n1 & n2`: the sample sizes (only n1 needs to be provided for the one sample case)
- `m1 & m2`: the sample means
- `sd1 & sd2`: the sample standard deviations
- `r12`: the correlation between the paired samples; only needed if `paired` is set to TRUE

The results from above can be replicated with the `tsum_TOST` function.

```
res_tsum = tsum_TOST(
  m1 = mean(iris$Sepal.Length, na.rm=TRUE),
  sd1 = sd(iris$Sepal.Length, na.rm=TRUE),
  n1 = length(na.omit(iris$Sepal.Length)),
  hypothesis = "EQU",
  eqb = c(5.5,8.5)
)
res_tsum
#>
#> One-sample t-Test
#>
#> The equivalence test was significant, t(149) = 5.078, p = 5.62e-07
#> The null hypothesis test was significant, t(149) = 86.425, p = 3.33e-129
#> NHST: reject null significance hypothesis that the effect is equal to zero
#> TOST: reject null equivalence hypothesis
#>
#> TOST Results
#> t df p.value
#> t-test 86.425 149 < 0.001
#> TOST Lower 5.078 149 < 0.001
#> TOST Upper -39.293 149 < 0.001
#>
#> Effect Sizes
#> Estimate SE C.I. Conf. Level
#> Raw 5.843 0.06761 [5.7314, 5.9552] 0.9
#> Hedges's g 7.021 0.41350 [6.327, 7.6914] 0.9
#> Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").
```
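To make the role of each summary statistic explicit, the one-sample TOST statistics can be computed by hand from the mean, standard deviation, and sample size. This is a minimal sketch in base R; the names `low_eqb`, `high_eqb`, `se`, `t_lower`, and `t_upper` are illustrative and not arguments of `tsum_TOST`.

```r
data("iris")

# Summary statistics (the same inputs passed to tsum_TOST above)
m1  <- mean(iris$Sepal.Length)
sd1 <- sd(iris$Sepal.Length)
n1  <- length(iris$Sepal.Length)

# Equivalence bounds
low_eqb  <- 5.5
high_eqb <- 8.5

# Standard error of the mean and degrees of freedom
se <- sd1 / sqrt(n1)
df <- n1 - 1

# t-statistics against each bound
t_lower <- (m1 - low_eqb) / se
t_upper <- (m1 - high_eqb) / se

# One-sided p-values: "greater" for the lower bound, "less" for the upper bound
p_lower <- pt(t_lower, df, lower.tail = FALSE)
p_upper <- pt(t_upper, df, lower.tail = TRUE)

round(c(se = se, t_lower = t_lower, t_upper = t_upper), 3)
max(p_lower, p_upper)
```

The standard error and both t-statistics should agree with the `Raw` effect size row and the `TOST Lower` and `TOST Upper` rows in the output above.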

`plot(res_tsum)`

```
describe(res_tsum)
#> [1] "Using the One-sample t-Test, a null hypothesis significance test (NHST), and a equivalence test, via two one-sided tests (TOST), were performed with an alpha-level of 0.05. These tested the null hypotheses that true mean is equal to 0 (NHST), and true mean is more extreme than 5.5 and 8.5 (TOST). The equivalence test was significant, t(149) = 5.078, p < 0.001 (mean = 5.843 90% C.I.[5.731, 5.955]; Hedges's g = 7.021 90% C.I.[6.327, 7.691]). At the desired error rate, it can be stated that the true mean is between 5.5 and 8.5."
```

We also created `power_t_TOST` to allow for power calculations for TOST analyses that utilize t-tests. This function uses a more accurate method than the older functions in TOSTER and matches the results of the commercially available PASS software. The exact calculations of power are based on Owen’s Q-function or on direct integration of the bivariate non-central t-distribution^{1}. Approximate power is implemented via the non-central t-distribution or the ‘shifted’ central t-distribution (Diletti, Hauschke, and Steinijans 1992). The function is limited to power analyses involving one sample, two sample, and paired sample cases. More options are available in the `PowerTOST` R package.

The interface for this function is quite simple and was intended to mimic the base R function `power.t.test`. The user must specify the 2 equivalence bounds, and leave only one of the other options blank (`alpha`, `power`, or `n`). The “true difference” can be set with `delta` and the standard deviation (default is 1) can be set with the `sd` argument. Once everything is set and the function is run, an object of the `power.htest` class will be returned.

As an example, let’s say we are looking at an equivalence study where
we assume the *true* difference is *at least* 1 unit, the
standard deviation is 2.5, and we set the equivalence bounds to 2.5
units as well. If we want to find the sample size adequate to have 95%
power at an alpha of 0.025 we enter the following:

```
power_t_TOST(n = NULL,
delta = 1,
sd = 2.5,
eqb = 2.5,
alpha = .025,
power = .95,
type = "two.sample")
#>
#> Two-sample TOST power calculation
#>
#> power = 0.95
#> beta = 0.05
#> alpha = 0.025
#> n = 73.16747
#> delta = 1
#> sd = 2.5
#> bounds = -2.5, 2.5
#>
#> NOTE: n is number in *each* group
```

From the analysis above we would conclude that adequate power is achieved with 74 participants per group and 148 participants in total.
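As a rough sanity check on the rounding step, power at whole-number sample sizes can be approximated in base R with the non-central t-distribution. The helper `approx_power_tost` below is a hypothetical function written for this illustration. It implements the non-central t approximation for the two-sample case and is not the exact method used by `power_t_TOST`, so small discrepancies are expected.

```r
# Approximate two-sample TOST power via the non-central t-distribution.
# n is the sample size per group; eqb is a symmetric bound (+/- eqb).
approx_power_tost <- function(n, delta, sd, eqb, alpha) {
  se    <- sd * sqrt(2 / n)        # standard error of the mean difference
  df    <- 2 * n - 2               # pooled degrees of freedom
  tcrit <- qt(1 - alpha, df)       # one-sided critical value

  ncp_lower <- (delta + eqb) / se  # non-centrality for the lower-bound test
  ncp_upper <- (delta - eqb) / se  # non-centrality for the upper-bound test

  # P(upper test rejects) + P(lower test rejects) - 1
  power <- pt(-tcrit, df, ncp = ncp_upper) - pt(tcrit, df, ncp = ncp_lower)
  max(power, 0)
}

# The solved sample size is fractional, so round up to whole participants
n_per_group <- ceiling(73.16747)
n_total     <- 2 * n_per_group

pow_73 <- approx_power_tost(73, delta = 1, sd = 2.5, eqb = 2.5, alpha = .025)
pow_74 <- approx_power_tost(74, delta = 1, sd = 2.5, eqb = 2.5, alpha = .025)

c(n_per_group = n_per_group, n_total = n_total)
c(pow_73 = pow_73, pow_74 = pow_74)
```

Both approximate power values should land close to the 0.95 target, with the larger sample size giving the higher power.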

Diletti, E, D Hauschke, and VW Steinijans. 1992. “Sample Size
Determination for Bioequivalence Assessment by Means of Confidence
Intervals.” *International Journal of Clinical Pharmacology,
Therapy, and Toxicology* 30 Suppl 1: S51–8. http://europepmc.org/abstract/MED/1601532.

Labes, Detlew, Helmut Schütz, and Benjamin Lang. 2021. *PowerTOST:
Power and Sample Size for (Bio)equivalence Studies*. https://CRAN.R-project.org/package=PowerTOST.

Phillips, Kem F. 1990. “Power of the Two One-Sided Tests Procedure
in Bioequivalence.” *Journal of Pharmacokinetics and
Biopharmaceutics* 18 (2): 137–44. https://doi.org/10.1007/bf01063556.

1. Inspired by Labes, Schütz, and Lang (2021) in the `PowerTOST` R package. Please see this package for more options.