To demonstrate how to use efficiency augmentation and the propensity
score utilities available in the `personalized` package, we
simulate data with two treatments. The treatment assignments are based
on covariates and hence mimic an observational setting with no
unmeasured confounders.


In this simulation, the treatment assignment depends on covariates, so we must model the propensity score \(\pi(x) = Pr(T = 1 | X = x)\). We will assume that larger values of the outcome are better.

```
library(personalized)
set.seed(1)
n.obs  <- 500
n.vars <- 10
x <- matrix(rnorm(n.obs * n.vars, sd = 3), n.obs, n.vars)

# simulate non-randomized treatment
xbetat   <- 0.5 + 0.25 * x[,9] - 0.25 * x[,1]
trt.prob <- exp(xbetat) / (1 + exp(xbetat))
trt      <- rbinom(n.obs, 1, prob = trt.prob)

# simulate delta
delta <- (0.5 + x[,2] - 0.5 * x[,3] - 1 * x[,1] + 1 * x[,1] * x[,4])

# simulate main effects g(X)
xbeta <- 2 * x[,1] + 3 * x[,4] - 0.25 * x[,2]^2 + 2 * x[,3] + 0.25 * x[,5]^2
xbeta <- xbeta + delta * (2 * trt - 1)

# simulate continuous outcomes
y <- drop(xbeta) + rnorm(n.obs, sd = 3)
```
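
As a quick check that the simulated confounding behaves as intended, we can fit a plain logistic regression of `trt` on the two covariates that actually drive treatment assignment; the coefficients should be near the true values used above (intercept 0.5, `x[,9]` coefficient 0.25, `x[,1]` coefficient -0.25), and the fitted probabilities should stay strictly inside (0, 1), i.e., overlap holds. This base-R sketch is purely illustrative and not part of the package workflow; it regenerates the data so it is self-contained:

```
# sanity check: recover the true propensity model with a logistic regression
# (true coefficients: intercept 0.5, x9 coefficient 0.25, x1 coefficient -0.25)
set.seed(1)
n.obs  <- 500
n.vars <- 10
x <- matrix(rnorm(n.obs * n.vars, sd = 3), n.obs, n.vars)
xbetat   <- 0.5 + 0.25 * x[,9] - 0.25 * x[,1]
trt.prob <- exp(xbetat) / (1 + exp(xbetat))
trt      <- rbinom(n.obs, 1, prob = trt.prob)

pfit <- glm(trt ~ x[,1] + x[,9], family = binomial())
round(coef(pfit), 2)

# fitted probabilities should lie strictly inside (0, 1) (overlap)
range(fitted(pfit))
```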

Estimation of the propensity score is a fundamental aspect of the
estimation of individualized treatment rules (ITRs). The
`personalized` package offers support tools for constructing
the propensity score function used by the `fit.subgroup()`
function, via the `create.propensity.function()`
function. This tool allows for estimation of the propensity score
in high dimensions via `glmnet`. In high dimensions it can be important to account
for regularization bias via cross-fitting (https://arxiv.org/abs/1608.00060);
`create.propensity.function()` offers a cross-fitting
approach for high-dimensional propensity score estimation. A basic usage
of this function with cross-fitting (with 4 folds; normally we would set
this larger, but we have reduced it to shorten computation time) is as
follows:

```
# arguments can be passed to cv.glmnet via `cv.glmnet.args`
prop.func <- create.propensity.function(crossfit = TRUE,
                                        nfolds.crossfit = 4,
                                        cv.glmnet.args = list(type.measure = "auc", nfolds = 3))
```
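
For comparison, `propensity.func` can also be supplied as a hand-written function: `fit.subgroup()` calls it with the covariate matrix and treatment vector and expects back the estimated \(Pr(T = 1 | X = x)\) for each observation. The sketch below (the name `prop.func.glm` and the truncation bounds 0.025/0.975 are our own illustrative choices) uses an unpenalized `glm`, so it omits the regularization and cross-fitting that `create.propensity.function()` provides:

```
# illustrative hand-written propensity function (no penalization or cross-fitting)
prop.func.glm <- function(x, trt)
{
    propens.model <- glm(trt ~ x, family = binomial())
    pi.x <- unname(fitted(propens.model))
    # truncate away from 0 and 1 to keep inverse-probability weights stable
    pmin(pmax(pi.x, 0.025), 0.975)
}

# tiny demo on random data
set.seed(123)
x.demo   <- matrix(rnorm(200 * 3), 200, 3)
trt.demo <- rbinom(200, 1, 0.5)
summary(prop.func.glm(x.demo, trt.demo))
```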

`prop.func` can then be passed to `fit.subgroup()` as follows.
We have set `nfolds` to 3 for computational reasons; it
should generally be higher, such as 10.

```
subgrp.model <- fit.subgroup(x = x, y = y,
                             trt = trt,
                             propensity.func = prop.func,
                             loss = "sq_loss_lasso",
                             nfolds = 3)  # option for cv.glmnet (for ITR estimation)
summary(subgrp.model)
```

```
## family: gaussian
## loss: sq_loss_lasso
## method: weighting
## cutpoint: 0
## propensity
## function: propensity.func
##
## benefit score: f(x),
## Trt recom = 1*I(f(x)>c)+0*I(f(x)<=c) where c is 'cutpoint'
##
## Average Outcomes:
## Recommended 0 Recommended 1
## Received 0 8.2176 (n = 136) -12.7821 (n = 69)
## Received 1 -1.7832 (n = 143) -0.4186 (n = 152)
##
## Treatment effects conditional on subgroups:
## Est of E[Y|T=0,Recom=0]-E[Y|T=/=0,Recom=0]
## 10.0008 (n = 279)
## Est of E[Y|T=1,Recom=1]-E[Y|T=/=1,Recom=1]
## 12.3635 (n = 221)
##
## NOTE: The above average outcomes are biased estimates of
## the expected outcomes conditional on subgroups.
## Use 'validate.subgroup()' to obtain unbiased estimates.
##
## ---------------------------------------------------
##
## Benefit score quantiles (f(X) for 1 vs 0):
## 0% 25% 50% 75% 100%
## -14.3195 -3.7348 -0.6998 2.0439 13.0446
##
## ---------------------------------------------------
##
## Summary of individual treatment effects:
## E[Y|T=1, X] - E[Y|T=0, X]
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -28.639 -7.470 -1.400 -1.717 4.088 26.089
##
## ---------------------------------------------------
##
## 4 out of 10 interactions selected in total by the lasso (cross validation criterion).
##
## The first estimate is the treatment main effect, which is always selected.
## Any other variables selected represent treatment-covariate interactions.
##
## Trt1 V1 V2 V3 V8
## Estimate -0.651 -1.0653 0.834 -0.4833 0.1437
```

Efficiency in estimating ITRs can be improved by including an
augmentation term. The optimal augmentation term is generally a function
of the main effects of the full outcome regression model, marginalized
over the treatment. Especially in high dimensions, regularization bias
can have a negative impact on performance, and cross-fitting is
again a reasonable way to circumvent this issue.
Augmentation functions can be constructed (with cross-fitting as an
option) via the `create.augmentation.function()` function,
which works similarly to `create.propensity.function()`.
The basic usage is as follows:

```
aug.func <- create.augmentation.function(family = "gaussian",
                                         crossfit = TRUE,
                                         nfolds.crossfit = 4,
                                         cv.glmnet.args = list(type.measure = "mae", nfolds = 3))
```
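
As with the propensity score, `augment.func` can also be written by hand: `fit.subgroup()` calls it with the covariates and outcome and uses the returned fitted values as the augmentation term. A minimal unpenalized sketch follows (the name `aug.func.lm` is our own; a real analysis would typically prefer the cross-fitted `glmnet` version above, especially in high dimensions):

```
# illustrative hand-written augmentation function: returns fitted main effects
aug.func.lm <- function(x, y)
{
    main.model <- lm(y ~ x)
    unname(fitted(main.model))
}

# tiny demo on random data
set.seed(123)
x.demo <- matrix(rnorm(200 * 3), 200, 3)
y.demo <- drop(x.demo %*% c(1, -1, 0.5)) + rnorm(200)
summary(aug.func.lm(x.demo, y.demo))
```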

We have set `nfolds` to 3 for computational reasons; it
should generally be higher, such as 10.

`aug.func` can be used for augmentation by passing it to
`fit.subgroup()` as follows:

```
subgrp.model.aug <- fit.subgroup(x = x, y = y,
                                 trt = trt,
                                 propensity.func = prop.func,
                                 augment.func = aug.func,
                                 loss = "sq_loss_lasso",
                                 nfolds = 3)  # option for cv.glmnet (for ITR estimation)
summary(subgrp.model.aug)
```

```
## family: gaussian
## loss: sq_loss_lasso
## method: weighting
## cutpoint: 0
## augmentation
## function: augment.func
## propensity
## function: propensity.func
##
## benefit score: f(x),
## Trt recom = 1*I(f(x)>c)+0*I(f(x)<=c) where c is 'cutpoint'
##
## Average Outcomes:
## Recommended 0 Recommended 1
## Received 0 9.571 (n = 103) -7.8823 (n = 102)
## Received 1 -2.2994 (n = 120) 0.008 (n = 175)
##
## Treatment effects conditional on subgroups:
## Est of E[Y|T=0,Recom=0]-E[Y|T=/=0,Recom=0]
## 11.8704 (n = 223)
## Est of E[Y|T=1,Recom=1]-E[Y|T=/=1,Recom=1]
## 7.8903 (n = 277)
##
## NOTE: The above average outcomes are biased estimates of
## the expected outcomes conditional on subgroups.
## Use 'validate.subgroup()' to obtain unbiased estimates.
##
## ---------------------------------------------------
##
## Benefit score quantiles (f(X) for 1 vs 0):
## 0% 25% 50% 75% 100%
## -13.5302 -2.1872 0.6803 3.6881 13.3778
##
## ---------------------------------------------------
##
## Summary of individual treatment effects:
## E[Y|T=1, X] - E[Y|T=0, X]
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -27.060 -4.374 1.361 1.381 7.376 26.756
##
## ---------------------------------------------------
##
## 6 out of 10 interactions selected in total by the lasso (cross validation criterion).
##
## The first estimate is the treatment main effect, which is always selected.
## Any other variables selected represent treatment-covariate interactions.
##
## Trt1 V1 V2 V3 V5 V6 V8
## Estimate 0.9222 -0.9353 1.0194 -0.4164 -0.0945 -0.1404 0.118
```

We first run the training/testing procedure to assess the performance of the non-augmented estimator:

```
valmod <- validate.subgroup(subgrp.model, B = 3,
                            method = "training_test",
                            train.fraction = 0.75)
valmod
```

```
## family: gaussian
## loss: sq_loss_lasso
## method: weighting
##
## validation method: training_test_replication
## cutpoint: 0
## replications: 3
##
## benefit score: f(x),
## Trt recom = 1*I(f(x)>c)+0*I(f(x)<=c) where c is 'cutpoint'
##
## Average Test Set Outcomes:
## Recommended 0 Recommended 1
## Received 0 7.0026 (SE = 3.0607, n = 31) -13.9625 (SE = 6.6671, n = 15.6667)
## Received 1 -3.2555 (SE = 0.8747, n = 37) -0.9539 (SE = 0.5936, n = 41.3333)
##
## Treatment effects conditional on subgroups:
## Est of E[Y|T=0,Recom=0]-E[Y|T=/=0,Recom=0]
## 10.258 (SE = 3.5586, n = 68)
## Est of E[Y|T=1,Recom=1]-E[Y|T=/=1,Recom=1]
## 13.0086 (SE = 7.1043, n = 57)
##
## Est of
## E[Y|Trt received = Trt recom] - E[Y|Trt received =/= Trt recom]:
## 9.5398 (SE = 2.0051)
```

Then we compare with the augmented estimator. Although this comparison is based on only 3 replications, the augmented estimator appears better at discriminating between benefitting and non-benefitting patients: the estimated effect of receiving the recommended treatment is larger (11.0473 vs. 9.5398) and its standard error smaller (1.7645 vs. 2.0051) than for the non-augmented estimator above.

```
valmod.aug <- validate.subgroup(subgrp.model.aug, B = 3,
                                method = "training_test",
                                train.fraction = 0.75)
valmod.aug
```

```
## family: gaussian
## loss: sq_loss_lasso
## method: weighting
##
## validation method: training_test_replication
## cutpoint: 0
## replications: 3
##
## benefit score: f(x),
## Trt recom = 1*I(f(x)>c)+0*I(f(x)<=c) where c is 'cutpoint'
##
## Average Test Set Outcomes:
##                                  Recommended 0                        Recommended 1
## Received 0  10.5794 (SE = 2.6567, n = 23.6667)  -10.6693 (SE = 5.1586, n = 25.6667)
## Received 1  -4.9438 (SE = 4.4187, n = 31)        -1.0748 (SE = 1.9236, n = 44.6667)
##
## Treatment effects conditional on subgroups:
## Est of E[Y|T=0,Recom=0]-E[Y|T=/=0,Recom=0]
## 15.5232 (SE = 1.8056, n = 54.6667)
## Est of E[Y|T=1,Recom=1]-E[Y|T=/=1,Recom=1]
## 9.5945 (SE = 5.7758, n = 70.3333)
##
## Est of
## E[Y|Trt received = Trt recom] - E[Y|Trt received =/= Trt recom]:
## 11.0473 (SE = 1.7645)
```