vathy.kamulete@rbc.com

Please see the paper as cited
below (Kamulete 2022) for details. We
denote the R package as `dsos`

, to avoid confusion with
`D-SOS`

, the method.

We show how easy it is to implement `D-SOS`

for a
particular notion of outlyingness. Suppose we want to test for no
adverse shift based on isolation scores in the context of two-sample
comparison. To do so, we need two main ingredients: a scoring function
and a method to compute the \(p-\)value.

First, the scores are obtained using predictions from isolation
forest with the `isotree`

package (Cortes 2020). Isolation forest detects
*isolated* points, instances that are typically
out-of-distribution relative to the high-density regions of the data
distribution. Naturally, any performant method for density-based
out-of-distribution detection can effectively be used to achieve the
same goal. The function `score_od`

shows the implementation
of one such scoring function in the `dsos`

package.

```
dsos::score_od
## function (x_train, x_test, n_trees = 500L, threshold = 0.6)
## {
## if (!requireNamespace("isotree", quietly = TRUE)) {
## stop("Package \"isotree\" must be installed to use this function.",
## call. = FALSE)
## }
## iso_fit <- isotree::isolation.forest(data = x_train, ntrees = n_trees)
## os_train <- predict(iso_fit, newdata = x_train)
## os_test <- predict(iso_fit, newdata = x_test)
## os_train[os_train < threshold] <- threshold
## os_test[os_test < threshold] <- threshold
## return(list(test = os_test, train = os_train))
## }
## <bytecode: 0x0000013a865e5a30>
## <environment: namespace:dsos>
```

Second, we estimate the empirical null distribution for the \(p-\)value via permutations. For speed, this
is implemented as a sequential Monte Carlo test with the
`simctest`

package (Gandy
2009). The function `pt_refit`

in the
`dsos`

package combines scoring with inference. The prefix
*pt* stands for permutation test. The code for `_pt_`

is relatively straightforward.

`dsos`

may use sample splitting and out-of-bag variants as
alternatives to permutations for \(p-\)value calculation. Both sample
splitting and out-of-bag variants use the asymptotic null distribution
for the test statistic. As a result, they can be appreciably faster than
inference based on permutations.

```
dsos::pt_refit
## function (x_train, x_test, scorer, n_pt = 2000)
## {
## result <- exchangeable_null(x_train, x_test, scorer = scorer,
## n_pt = n_pt, is_oob = FALSE)
## return(result)
## }
## <bytecode: 0x0000013a8679efa8>
## <environment: namespace:dsos>
```

Take the `iris`

dataset for example. When the training set only consists of
`setosa`

(flower species) and the test set, only of
`versicolor`

, the data is incompatible with the null of no
adverse shift. In other words, we have strong evidence that the test
contains a disproportionate number of outliers, if the training set is
the reference distribution.

```
set.seed(12345)
data(iris)
x_train <- iris[1:50,1:4] # Training sample: Species == 'setosa'
x_test <- iris[51:100,1:4] # Test sample: Species == 'versicolor'
iris_test <- pt_refit(x_train, x_test, score = dsos::score_od)
plot(iris_test)
```

You can plug in your own scores in this framework. Those already
implemented in `dsos`

can be useful but they are by means the
only ones. If you favour a different method for out-of-distribution
(outlier) detection, want to tune the hyperparameters, or choose a
different notion of outlyingness altogether, `dsos`

provides
the building blocks to build your own. The workhorse function, powering
the approach behind the scenes, is a way to calculate the test statistic
from outlier scores (see `wauc_from_os`

).

Cortes, David. 2020. *Isotree: Isolation-Based Outlier
Detection*. https://CRAN.R-project.org/package=isotree.

Gandy, Axel. 2009. “Sequential Implementation of Monte Carlo Tests
with Uniformly Bounded Resampling Risk.” *Journal of the
American Statistical Association* 104 (488): 1504–11.

Kamulete, Vathy M. 2022. “Test for Non-Negligible Adverse
Shifts.” In *The 38th Conference on Uncertainty in Artificial
Intelligence*. https://openreview.net/forum?id=S5UG2BLi9xc.