Getting Started with gptools in R

gptoolsStan is a minimal package to publish Stan code for efficient Gaussian process inference. The package can be used with the cmdstanr interface for Stan in R. Unfortunately, Rstan is not supported because it does not provide an option to specify include paths. If you’re already familiar with cmdstanr, dive in below. If not, have a look at the getting started guide for the cmdstanr interface.

This vignette demonstrates the package by sampling from a simple Gaussian process using Fourier methods (see the accompanying publication “Scalable Gaussian Process Inference with Stan” for background on the approach). Here is the model definition in Stan.

Here, we assume that cmdstanr is installed and that the cmdstan compiler is available. See here if you need help getting set up with cmdstanr. Let’s compile the model.

library(cmdstanr)
#> Warning: package 'cmdstanr' was built under R version 4.3.3
#> This is cmdstanr version 0.8.1
#> - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
#> - CmdStan path: /Users/till/.cmdstan/cmdstan-2.36.0
#> - CmdStan version: 2.36.0
library(gptoolsStan)

model <- cmdstan_model(
  stan_file="getting_started.stan",
  include_paths=gptools_include_path(),
)
#> Warning in '/private/var/folders/l5/cw82lksx6sd80pskw1npx_kh0000gn/T/RtmpwNXJb1/Rinst11ac0790a98/gptoolsStan/extdata/gptools/fft2.stan', line 174, column 5, included from
#> '/private/var/folders/l5/cw82lksx6sd80pskw1npx_kh0000gn/T/RtmpwNXJb1/Rinst11ac0790a98/gptoolsStan/extdata/gptools/fft.stan', line 2, column 0, included from
#> '/var/folders/l5/cw82lksx6sd80pskw1npx_kh0000gn/T/RtmpVY7hJ3/model-11af2635cfd1e.stan', line 5, column 4: Functions
#>     that end in _jacobian will change meaning in Stan 2.39. They will be used
#>     for the encapsulating usages of 'jacobian +=', and therefore not
#>     available to be called in all the same places as this function is now. To
#>     avoid any issues, please rename this function to not end in _jacobian.
#> Warning in '/private/var/folders/l5/cw82lksx6sd80pskw1npx_kh0000gn/T/RtmpwNXJb1/Rinst11ac0790a98/gptoolsStan/extdata/gptools/fft1.stan', line 83, column 5, included from
#> '/private/var/folders/l5/cw82lksx6sd80pskw1npx_kh0000gn/T/RtmpwNXJb1/Rinst11ac0790a98/gptoolsStan/extdata/gptools/fft.stan', line 1, column 0, included from
#> '/var/folders/l5/cw82lksx6sd80pskw1npx_kh0000gn/T/RtmpVY7hJ3/model-11af2635cfd1e.stan', line 5, column 4: Functions
#>     that end in _jacobian will change meaning in Stan 2.39. They will be used
#>     for the encapsulating usages of 'jacobian +=', and therefore not
#>     available to be called in all the same places as this function is now. To
#>     avoid any issues, please rename this function to not end in _jacobian.

As indicated in the data section of the Stan program, we need to define the number of elements n of the Gaussian process and the real fast Fourier transform (RFFT) of the covariance kernel cov_rfft. We’ll use 100 elements and a squared exponential covariance kernel which allows us to evaluate the RFFT directly.

n <- 100
length_scale <- n / 10
freq <- 1:(n %/% 2 + 1) - 1
# See appendix B of https://arxiv.org/abs/2301.08836 for details on the expression.
cov_rfft <- exp(- 2 * (pi * freq * length_scale / n) ^ 2) + 1e-9

fit <- model$sample(
  data=list(n=n, cov_rfft=cov_rfft),
  seed=123,
  chains=1,
  iter_warmup=1000,
  iter_sampling=5,
)
#> Running MCMC with 1 chain...
#> 
#> Chain 1 Iteration:    1 / 1005 [  0%]  (Warmup) 
#> Chain 1 Iteration:  100 / 1005 [  9%]  (Warmup) 
#> Chain 1 Iteration:  200 / 1005 [ 19%]  (Warmup) 
#> Chain 1 Iteration:  300 / 1005 [ 29%]  (Warmup) 
#> Chain 1 Iteration:  400 / 1005 [ 39%]  (Warmup) 
#> Chain 1 Iteration:  500 / 1005 [ 49%]  (Warmup) 
#> Chain 1 Iteration:  600 / 1005 [ 59%]  (Warmup) 
#> Chain 1 Iteration:  700 / 1005 [ 69%]  (Warmup) 
#> Chain 1 Iteration:  800 / 1005 [ 79%]  (Warmup) 
#> Chain 1 Iteration:  900 / 1005 [ 89%]  (Warmup) 
#> Chain 1 Iteration: 1000 / 1005 [ 99%]  (Warmup) 
#> Chain 1 Iteration: 1001 / 1005 [ 99%]  (Sampling) 
#> Chain 1 Iteration: 1005 / 1005 [100%]  (Sampling) 
#> Chain 1 finished in 7.9 seconds.
#> Warning: 5 of 5 (100.0%) transitions hit the maximum treedepth limit of 10.
#> See https://mc-stan.org/misc/warnings for details.

We extract the draws from the fit object and plot a realization of the process.

f <- fit$draws("f", format="draws_matrix")
plot(1:n, f[1,], type="l", xlab="covariate x", ylab="Gaussian process f(x)")

This vignette illustrates the use of gptools with an elementary example. Further details can be found in the more comprehensive documentation although using the cmdstanpy interface.