--- title: "Basic Examples" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Basic Examples} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} bibliography: references.bib editor_options: chunk_output_type: console --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(xactonomial) ``` We consider the following problem. Let $T_j$ be distributed $\mbox{Multinomial}(n_j, d_j, \theta_j)$ for $j = 1, \ldots, k$, where $n_j$ is the number of trials, $d_j$ the number of mutually exclusive categories, and $\theta_j$ the vector of $d_j$ probabilities. Denote $\boldsymbol{T} = (T_1, \ldots, T_k)$ and $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$. Suppose one is interested in the parameter $\psi \equiv \tau(\boldsymbol{\theta}) \in \Psi$ where $\Psi$ is a subset of the real line $\mathbb{R}$. Suppose we observe a sample $\boldsymbol{X} = (X_1, \ldots, X_k)$ which is a vector of counts corresponding to the random variable $\boldsymbol{T}$. Let $G(\boldsymbol{X})$ denote a real-valued statistic that defines an ordering of the sample space. A sensible default statistic in this setting is to apply the function defining the parameter of interest to the sample proportions. If the function $\tau()$ is complex, or the variance of $\psi(\hat{\boldsymbol{\theta}})$ is otherwise difficult to calculate, then one may consider using the nonparametric bootstrap for inference [@efron1979bootstrap]. However, the bootstrap may perform poorly in small samples, e.g., not having the correct coverage or type I error [@bickel1981some]. Instead we propose a computational approach for computing p values and confidence intervals in this setting. Here is a simple illustration of the problem. Let \[ \psi = \tau(\theta_1, \theta_2) = \sum \sqrt(\theta_1, \theta_2), \] which is similar to the Bhattacharyya coefficient [@sankhya1946measure]. Suppose we have two samples $T_1$ and $T_2$ each from independent multinomials with dimension 4. The following R code allows us to sample the data and compute $\psi$. ```{r} true_theta <- c(.45, .15, .3, .1, .05, .15, .4, .4) sample_data <- function(n) { T1 <- rmultinom(1, n[1], prob = true_theta[1:4]) T2 <- rmultinom(1, n[2], prob = true_theta[5:8]) list(T1 = c(T1), T2 = c(T2)) } psi_bc <- function(theta) { theta1 <- theta[1:4] theta2 <- theta[5:8] sum(sqrt(theta1 * theta2)) } psi_bc(true_theta) ``` We can do the bootstrap, but it actually performs quite poorly in this situation. We are going to define the same psi function but vectorized this time. This improves the speed of xactonomial. ```{r} psi_bc_v <- function(theta) { theta1 <- theta[,1:4, drop = FALSE] theta2 <- theta[,5:8, drop = FALSE] rowSums(sqrt(theta1 * theta2)) } ``` ```{r} data <- sample_data(n = c(10, 12)) data results <- xactonomial(data, psi_bc_v, psi_limits = c(0,1), conf_int = TRUE, psi0 = .5, maxit = 50, chunksize = 100, ga_restart_every = 10, itp_eps = .01) results ``` ## Maximum of parameters In this example, $\tau(\theta) = \max(\theta)$. When two of the parameters are equal, the function is nondifferentiable, so bootstrap and other asymptotic methods will fail while our method still works. ```{r} psi_causal <- function(pp) { max(pp) } true_psi_causal <- psi_causal(c(.4, .4, .2)) sample_data2 <- function(n) { list(rmultinom(1, n, prob = c(.4, .4, .2))[, 1]) } set.seed(421) data <- sample_data2(n = c(60)) data results <- xactonomial(data, psi_causal, psi_limits = c(1/3,1), psi0 = .55, maxit = 100, chunksize = 1000, itp_eps = .01) results ``` In this example, we see that the lower confidence limit it close to the boundary of the psi space. It is possible that the lower limit should be equal to the boundary, but the root finding algorithm was too eager. We can improve performance by providing the argument `p_value_limits` which should be the vector of 2 p-values, the first being for the test of the null at the lower boundary of psi and the second at the upper boundary. If those p-values are greater than $\alpha / 2$, then no root exists and the confidence limit should instead be the boundary. First we calculate the exact p-value for the test of `psi0 <= 1/3` by providing `theta_null_points`. The upper p-value is clearly less than $\alpha / 2$. ```{r} p.ll <- xactonomial(data, psi_causal, psi_limits = c(1/3,1), psi0 = 1/3, conf_int = FALSE, itp_eps = .01, theta_null_points = t(rep(1/3, 3)), alternative = "greater")$p.value xactonomial(data, psi_causal, psi_limits = c(1/3,1), psi0 = 1, conf_int = FALSE, itp_eps = .01, theta_null_points = rbind(c(1, 0, 0), c(0,1,0), c(0,0,1)), alternative = "less")$p.value ``` Then we provide the vector of those 2 p-values to the function. They do not need to be precise, the important thing is when one of them is greater than $\alpha / 2$. ```{r} xactonomial(data, psi_causal, psi_limits = c(1/3,1), p_value_limits = c(p.ll, 1e-8), maxit = 100, chunksize = 1000, itp_eps = .01) ``` Now you can see we get exactly 1/3 for the lower confidence limit. ## References