---
title: "Custom CI tests for causal discovery algorithms"
vignette: >
  %\VignetteIndexEntry{Custom CI tests into causal discovery algorithms}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
knitr:
  opts_chunk:
    collapse: true
    comment: "#>"
    fig.align: "center"
---

```{r}
#| label: setup
library(causalDisco)
```

Please note, custom tests are experimental and subject to change to improve
usability and consistency across engines. We welcome feedback on the current
interface and suggestions for improvement.

This vignette illustrates how to use custom conditional independence tests and
custom scores with the `disco()` function in causalDisco. We show how to define
user-specified tests and scores, and how to run causal discovery using these
custom functions.

Note that Tetrad-based algorithms do not currently support custom CI tests, so
this vignette will only cover the engines bnlearn, causalDisco, and pcalg.

# Custom Tests

The interface for custom tests depends slightly on the engine being used.
causalDisco and pcalg use the exact same interface, while bnlearn is slightly
different for now. We plan to unify the interface across engines in a future
release.

## causalDisco and pcalg engines

A custom tests takes the signature
`function(x, y, conditioning_set, suff_stat, args)`, (`args` is an optional).
`x` and `y` are the variables being tested for independence, `conditioning_set`
is the conditioning set, and `suff_stat` is a list of sufficient statistics
needed for the test (e.g. correlation matrix and sample size). The function
should return a p-value indicating whether `x` and `y` are conditionally
independent given `conditioning_set`.

Of course the `suff_stat` argument could just be the raw data, but for
efficiency reasons, it is often better to precompute some sufficient statistics.

An example of a custom CI test is a partial correlation test based on the Fisher
Z-transform, which can be implemented as follows:

```{r}
#| label: my_custom_test
my_test <- function(x, y, conditioning_set, suff_stat) {
  C <- suff_stat$C
  n <- suff_stat$n

  vars <- c(x, y, conditioning_set)
  C_sub <- C[vars, vars, drop = FALSE]
  K <- solve(C_sub)
  r <- -K[1, 2] / sqrt(K[1, 1] * K[2, 2])
  z <- 0.5 * log((1 + r) / (1 - r))

  stat <- sqrt(n - length(conditioning_set) - 3) * abs(z)

  pval <- 2 * (1 - pnorm(stat))

  pval
}

my_suff_stat <- function(data) {
  list(
    C = cor(data),
    n = nrow(data)
  )
}
```

This can then be used with the causalDisco engine as follows:

```{r}
#| label: using_my_custom_test_causalDisco
data(num_data)

my_tpc <- tpc(
  engine = "causalDisco",
  test = my_test,
  alpha = 0.05,
  suff_stat_fun = my_suff_stat
)
result <- disco(data = num_data, method = my_tpc)
plot(result)
```

or the pcalg engine as follows:

```{r}
#| label: using_my_custom_test_pcalg
my_pc <- pc(
  engine = "pcalg",
  test = my_test,
  alpha = 0.05,
  suff_stat_fun = my_suff_stat
)
result <- disco(data = num_data, method = my_pc)
plot(result)
```

If the custom test requires additional arguments, then the function signature
should include an `args` parameter, i.e. be
`function(x, y, conditioning_set, suff_stat, args)`, and the additional
arguments can be passed as a list to the `args` parameter when defining the
method.

## bnlearn

For bnlearn the signature should be the same, but you must pass `suff_stat/data`
directly in the function, since bnlearn does not have a way to precompute
sufficient statistics and pass them to the test function.

When defining a custom test for `bnlearn`, you can use either `conditioning_set`
and `suff_stat` **or** `z` and `data`, which are the argument names bnlearn
expects.

The test function should return either:

- A single numeric value (the p-value), **or**
- A numeric vector of length 2, where the **second element is the p-value**.

bnlearn requires the first element (the test statistic), but it is not used. If
you return only the p-value, a dummy statistic will be automatically injected.

```{r}
#| label: my_custom_test_bnlearn
my_test_bnlearn <- function(x, y, conditioning_set, suff_stat, args = list()) {
  not_used <- args$not_used
  C <- cor(suff_stat)
  n <- nrow(suff_stat)

  vars <- c(x, y, conditioning_set)
  C_sub <- C[vars, vars, drop = FALSE]
  K <- solve(C_sub)
  r <- -K[1, 2] / sqrt(K[1, 1] * K[2, 2])
  z <- 0.5 * log((1 + r) / (1 - r))

  stat <- sqrt(n - length(conditioning_set) - 3) * abs(z)

  pval <- 2 * (1 - pnorm(stat))

  pval
}

my_pc <- pc(
  engine = "bnlearn",
  test = my_test_bnlearn,
  alpha = 0.05,
  args = list(not_used = "Example of passing additional arguments")
)
result <- disco(data = num_data, method = my_pc)
plot(result)
```

# Custom Scores

Not yet implemented, but will be added in a future release. Stay tuned for
updates on this feature!