1 Summary

The aim of this vignette is to present the features of the EasyABC package. Section Overview of the package EasyABC describes the different algorithms available in the package. Section Installation and requirements details how to install the package and the formatting requirements. Sections The toy model and The trait model present two detailed worked examples.

2 Overview of the package EasyABC

Four main types of ABC schemes are available in EasyABC: the standard rejection algorithm of Pritchard et al. (1999), sequential schemes first proposed by Sisson et al. (2007), schemes coupled to MCMC first proposed by Marjoram et al. (2003), and a Simulated Annealing algorithm (SABC) suggested in Albert et al. (2014). Four different sequential algorithms are available: the ones of Beaumont et al. (2009), Drovandi and Pettitt (2011), Del Moral et al. (2012) and Lenormand et al. (2012). Three different MCMC schemes are available: the ones of Marjoram et al. (2003) and Wegmann et al. (2009a), as well as a modification of Marjoram et al. (2003)’s algorithm in which the tolerance and proposal range are determined by the algorithm, following the modifications of Wegmann et al. (2009a). Details on how to implement these various algorithms with EasyABC are given in the manual pages of each function, and two examples are detailed in Sections The toy model and The trait model. We provide below a short presentation of each implemented algorithm.

2.1 The standard rejection algorithm of Pritchard et al. (1999)

This sampling scheme consists of drawing the model parameters from the prior distributions, using these parameter values to launch a simulation, and repeating this two-step procedure nb_simul times. At the end of the nb_simul simulations, the simulations closest to the target (or at a distance smaller than a tolerance threshold) in the space of the summary statistics are retained to form an approximate posterior distribution of the model parameters.
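
To fix ideas, here is a minimal illustrative sketch of this scheme in R (an illustration only, not the package's internal code), assuming a model with two parameters drawn from uniform priors on [0;1]:

# Illustrative sketch of rejection ABC (not EasyABC's internal code); 'model'
# maps a parameter vector to a vector of summary statistics.
rejection_abc <- function(model, target, nb_simul, tol) {
    theta <- cbind(runif(nb_simul), runif(nb_simul))    # draw parameters from the priors
    stats <- t(apply(theta, 1, model))                  # one simulation per parameter draw
    dist <- apply(stats, 1, function(s) sqrt(sum((s - target)^2)))  # distance to the target
    theta[order(dist)[1:ceiling(tol * nb_simul)], , drop=FALSE]     # keep the closest simulations
}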

2.2 Sequential algorithms

Sequential algorithms for ABC were first proposed by Sisson et al. (2007). These algorithms aim at reducing the number of simulations required to reach a given quality of the posterior approximation. The underlying idea is to spend more time in the areas of the parameter space where simulations are frequently close to the target. Sequential algorithms consist of a first step of standard rejection ABC, followed by a number of steps where the sampling of the parameter space is no longer performed according to the prior distributions of parameter values. Various ways to perform this biased sampling have been proposed, and four of them are implemented in the package EasyABC.

2.3 ABC algorithms coupled to MCMC

The idea of the ABC-MCMC algorithm proposed by Marjoram et al. (2003) is to perform a Metropolis-Hastings exploration of the parameter space in which the likelihood-ratio computation is replaced by simulations of the model. The original algorithm of Marjoram et al. (2003) is implemented in the method “Marjoram_original” in EasyABC. Wegmann et al. (2009a) later proposed a number of improvements to this original scheme: a calibration step through which the algorithm automatically determines the tolerance threshold, the scaling of the summary statistics and the scaling of the jumps in the parameter space during the MCMC. These improvements are implemented in the method “Marjoram”. Wegmann et al. (2009a) also proposed additional modifications, among which a PLS transformation of the summary statistics. The complete algorithm of Wegmann et al. (2009a) is implemented in the method “Wegmann”.
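
To fix ideas, a single step of this scheme can be sketched as follows (an illustration only, with a symmetric Gaussian proposal; d_prior denotes the prior density, and this is not the package's internal code):

# One likelihood-free Metropolis-Hastings step (sketch of the idea in
# Marjoram et al. 2003): the likelihood ratio is replaced by requiring that
# a fresh simulation falls within a tolerance eps of the target statistics.
abc_mcmc_step <- function(model, d_prior, theta, target, eps, step) {
    proposal <- theta + rnorm(length(theta), 0, step)   # symmetric random-walk jump
    stats <- model(proposal)                            # simulate instead of evaluating a likelihood
    accept <- sqrt(sum((stats - target)^2)) <= eps &&   # simulation close enough to the data...
        runif(1) < d_prior(proposal) / d_prior(theta)   # ...and Metropolis ratio on the prior
    if (accept) proposal else theta
}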

2.4 Simulated annealing

Inspired by Simulated Annealing algorithms used for optimization, the SABC algorithm from Albert et al. (2014) propagates an ensemble of particles in the product space of parameters and model outputs and continuously lowers the tolerance between model outputs and the data so that the parameter marginal converges to the posterior. The tolerance is lowered adaptively so as to minimize entropy production, which serves as a measure for computational waste. In EasyABC, SABC is implemented in the function SABC.

3 Installation and requirements

3.1 Installing the package

A version of R greater than or equal to 2.15.0 is required. The package has been tested on Windows (32-bit) and Linux, but not on Mac. To install the EasyABC package from R, simply type:

install.packages("EasyABC")

Once the package is installed, it needs to be loaded in the current R session to be used:
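
library(EasyABC)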

For online help on the package content, simply type:

help(package="EasyABC")

For online help on a particular command (such as the function ABC_sequential), simply type:

help(ABC_sequential)

3.2 The simulation code - for use on a single core

Users need to develop a simulation code with minimal compatibility constraints. The code can either be an R function or a binary executable file.

If the code is an R function, its argument must be a vector of parameter values and it must return a vector of summary statistics. If the option use_seed=TRUE is chosen, the first value passed to the simulation code corresponds to the seed value to be used by the simulation code to initialize its pseudo-random number generator; the following values are the model parameters.
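
For instance, a variant of the toy model of Section A first worked example that is compatible with use_seed=TRUE reads the seed in x[1] and the model parameters afterwards (this function reappears in Section Using multiple cores):

toy_model_with_seed<-function(x){
    set.seed(x[1]) # x[1] is the seed value provided by EasyABC
    c( x[2] + x[3] + rnorm(1,0,0.1) , x[2] * x[3] + rnorm(1,0,0.1) )
}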

If the code is a binary executable file, it needs to read the parameter values from a file named ‘input’ in which each line contains one parameter value, and to write the summary statistics to a file named ‘output’ in which the summary statistics must be separated by a space or a tabulation.

If the code is a binary executable file, a wrapper R function named ‘binary_model’ is available to interface the executable file with the R functions of the EasyABC package (see section The toy model below).

Alternatively, users may prefer building an R function that calls their binary executable file, as sketched below for a C/C++ program.
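
For instance, assuming a compiled program named ./my_c_model that follows the file conventions described above (the executable name is hypothetical):

# Minimal sketch of an R wrapper around a compiled C/C++ program: the wrapper
# writes the parameters to 'input', runs the program, and reads 'output' back.
my_model <- function(x) {
    write(x, file="input", ncolumns=1)   # one parameter value per line
    system("./my_c_model")               # run the compiled program
    scan("output", quiet=TRUE)           # return the summary statistics
}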

NB: Currently, SABC ignores the use_seed option and requires an R function whose first argument is the parameter vector.

3.3 The simulation code - for use with multiple cores

Users need to develop a simulation code with minimal compatibility constraints. The code can either be an R function or a binary executable file.

If the code is an R function, its argument must be a vector of parameter values and it must return a vector of summary statistics. The first value passed to the simulation code corresponds to the seed value to be used by the simulation code to initialize its pseudo-random number generator; the following values are the model parameters. This means that the option use_seed must be set to TRUE when using EasyABC with multiple cores.

If the code is a binary executable file, it needs to take as its single argument a positive integer k. It has to read the parameter values from a file named ‘inputk’ (where k is the integer passed as argument to the binary code: ‘input1’, ‘input2’…) in which each line contains one parameter value, and to write the summary statistics to a file named ‘outputk’ (‘output1’, ‘output2’…) in which the summary statistics must be separated by a space or a tabulation. This construction prevents multiple cores from reading and writing the same files. If the code is a binary executable file, a wrapper R function named ‘binary_model_cluster’ is available to interface the executable file with the R functions of the EasyABC package (see section The trait model below).

Alternatively, users may prefer building an R function that calls their binary executable file, following the approach sketched in section The simulation code - for use on a single core.

NB: SABC does not currently support the use of multiple cores.

3.4 Management of pseudo-random number generators

To ensure that stochastic simulations are independent, the simulation code must either possess an internal way of initializing the seeds of its pseudo-random number generators each time it is launched (for instance by initializing the seed to the clock value), or make use of a seed value provided by the package.

It is often desirable to be able to re-run some analyses with identical seed values. EasyABC offers this possibility through the option use_seed=TRUE, seed_count=0, where seed_count can be any integer number.

If this option is chosen, a seed value is provided in the input file as a first (additional) parameter, and incremented by 1 at each call of the simulation code.

This means that the simulation code must be designed so that the first parameter is a seed initializing value. In the worked example (Section The trait model), the simulation code trait_model makes use of this package option, and in the first example (Section The toy model), the way this option can be used with a simple R function is demonstrated.

NB: Note that when using multiple cores with the package functions (n_cluster=x with x larger than 1), the option use_seed=TRUE is forced, since the seed value is also used to distribute the tasks to each core.

3.5 Encoding the prior distributions

A list encoding the prior distributions used for each model parameter must be supplied by the user. Each element of the list corresponds to a model parameter and can be defined in two ways:

  1. By using predefined prior distributions. In this case, the list element must be a vector whose first element determines the type of prior distribution, followed by the arguments of the distribution. Possible values are:
    • “unif” for a uniform distribution on a segment, followed by two numbers: the minimum and maximum values of the uniform distribution
    • “normal” for a normal distribution, followed by two numbers: the mean and standard deviation of the normal distribution
    • “lognormal” for a lognormal distribution, followed by two numbers: the mean and standard deviation on the log scale of the lognormal distribution
    • “exponential” for an exponential distribution, followed by one number: the rate of the exponential distribution
For instance, the following prior specifies a uniform distribution on [0;1] for the first parameter and a normal distribution with mean 1 and standard deviation 2 for the second:
my_prior=list(c("unif",0,1),c("normal",1,2))

NB: Note that a fixed variable can be passed to the simulation code by choosing for this fixed variable a uniform prior distribution with a trivial range (equal lower and upper bounds), as in the example below. The EasyABC methods will not work properly if these fixed variables are passed with other types of prior distributions (like a normal distribution with a standard deviation equal to zero).
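
For instance, to pass the fixed value 500 as a second parameter:
my_prior=list(c("unif",0,1),c("unif",500,500))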

  2. By providing user-defined sampling and density functions. In this case, each list element must itself be a list of two elements: the sampling function and the density function. For example, a uniform distribution can be defined with this approach using the following code (equivalent to my_prior=list(c("unif",0,1))):
my_prior=list(list(c("runif",1,0,1), c("dunif",0,1)))

NB: SABC requires the prior to be specified as a sampler and as a density (see the examples below).

3.6 Adding constraints to prior distributions

To add constraints to prior distributions (for instance, parameter 1 < parameter 2), users need to use the parameter prior_test of the ABC functions of the package (see their online documentation). This parameter prior_test is evaluated as a logical expression; all the logical operators, including "<", ">", …, can be used to define whether a parameter set respects the constraint. Each parameter should be designated by "X1", "X2", … in the same order as in the prior definition.

Here is an example where the second parameter should be greater than the first one:

prior = list(c("unif",0,1),c("unif",0,10))
ABC_rejection(model=a_model,prior=prior,nb_simul=3, prior_test="X2 > X1")

3.7 The target summary statistics

A vector containing the summary statistics of the data must be supplied. The statistics must be in the same order as in the simulation outputs. The function SABC allows for a semi-automatic generation of summary statistics according to Fearnhead et al. (2012).

3.8 The option verbose

Intermediary results can be written in output files in the working directory. Users only need to choose the option verbose=TRUE when launching the EasyABC functions (otherwise, the default value for verbose is FALSE). Intermediary results consist of the progressive writing of simulation outputs for the functions ABC_rejection and ABC_mcmc, and of the writing of intermediary results at the end of each step for the function ABC_sequential. Additional details are provided in the help files of the functions.

3.9 Example of integration of an external program: fastsimcoal

This example was provided by an EasyABC user, Albert Min-Shan Ko (currently at the Department of Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany). The purpose is to plug a third-party population genetics program into the EasyABC workflow. This program needs input data in a given format, so the idea is to wrap the call to the fastsimcoal software in a script that links EasyABC to fastsimcoal.

Here are the scripts, provided courtesy of Albert Min-Shan Ko.

  • First, an R script reformats the parameters to be used by fastsimcoal (here named mod.input.r).
# mod.input.r: read the parameter values written by EasyABC in the file
# 'input' and rewrite them in the .est format expected by fastsimcoal
r<-read.table('input',head=F)
sink('mod.input')
cat(paste('1','p1','unif',round(r[1,],0),round(r[1,],0),sep='\t'))
cat('\n')
cat(paste('1','p2','unif',round(r[2,],0),round(r[2,],0),sep='\t'))
cat('\n')
cat(paste('1','p3','unif',round(r[3,],0),round(r[3,],0),sep='\t'))
sink()
  • Second, a GNU Bash script (here named run_sim.sh) invokes this R script and builds a parameter file for fastsimcoal (sim.est), runs fastsimcoal and computes some summary statistics with the arlequin program.
#!/bin/bash
rm -fr sim
Rscript mod.input.r
cat <(sed -n '1p' template.est) <(sed -n '1,3p' mod.input) \
<(sed -n '5,$p' template.est) > sim.est
until [ -f sim/arl_output ]; do
./fastsimcoal -t sim.tpl -e sim.est -E1 -n1 -q
./arlsumstat sim/sim_1_1.arp sim/arl_output 1 0 run_silent
done
cat sim/arl_output > output

Then, the user can invoke EasyABC like this:

prior=list(c("unif",500,1000),c("unif",100,500),c("unif",50,200))
ABC_sim<-ABC_rejection(model=binary_model('./run_sim.sh'),prior=prior,nb_simul=3)

3.10 Example of integration of a Java model

If your model runs on a Java Virtual Machine (it can be written in Java, Scala, Groovy, …), you can of course use the binary_model wrapper to launch a JVM running your model. But you can achieve a tighter integration that will simplify the process and save computing time. This section proposes to use the R package rJava. Let’s consider the toy model written in Java (in a file named Model.java):

public class Model {
    public static double[] run(double[] x) {
        double[] result = new double[2];
        result[0] = x[0] + x[1];
        result[1] = x[0] * x[1];
        return result;
    }
}

We can compile it with the command: javac Model.java and then define our wrapper in R:

mymodel <- function(x) {
    library("rJava")
    .jinit(classpath=".")  # start the JVM, with the compiled Model.class in the classpath
    result = .jcall(J("Model"), "[D", "run", .jarray(x))  # call the static method Model.run()
    result
}

Then, the user can invoke EasyABC like this:

prior=list(c("unif",0,1),c("normal",1,2))
ABC_sim<-ABC_rejection(model=mymodel,prior=prior,nb_simul=3)

4 A first worked example

4.1 The toy model

We here consider a very simple stochastic model coded in the R language:

toy_model<-function(x){
    c( x[1] + x[2] + rnorm(1,0,0.1) , x[1] * x[2] + rnorm(1,0,0.1) ) 
}

We will use two different types of prior distribution for the two model parameters (\(x[1]\) and \(x[2]\)): a uniform distribution between 0 and 1 and a normal distribution with mean 1 and standard deviation 2.

toy_prior=list(c("unif",0,1),c("normal",1,2))

And we will consider an imaginary dataset of two summary statistics that the toy_model is aiming at fitting:

sum_stat_obs=c(1.5,0.5)

4.2 Performing a standard ABC-rejection procedure

A standard ABC-rejection procedure can simply be performed with the function ABC_rejection, by specifying the number \(n\) of simulations to be performed and the proportion \(p\) of simulations to be retained:

set.seed(1)
n=10
p=0.2
ABC_rej<-ABC_rejection(model=toy_model, prior=toy_prior, nb_simul=n,
summary_stat_target=sum_stat_obs, tol=p)
ABC_rej
#> $param
#>            [,1]      [,2]
#> param 0.6927316 0.8877425
#> param 0.3162717 1.0934523
#> 
#> $stats
#>          [,1]      [,2]
#> [1,] 1.564895 0.4678920
#> [2,] 1.386153 0.2915392
#> 
#> $weights
#> [1] 0.5 0.5
#> 
#> $stats_normalization
#> [1] 0.7266951 0.5603033
#> 
#> $nsim
#> [1] 10
#> 
#> $nrec
#> [1] 2
#> 
#> $computime
#> [1] 0.01295018

Alternatively, ABC_rejection can be used solely to launch the simulations and store the simulation outputs without performing the rejection step. This option enables the user to make use of the R package abc (Csilléry et al. 2012), which offers an array of more sophisticated post-processing treatments than the simple rejection procedure:

# Run the ABC rejection on the model
set.seed(1)
n=10
ABC_rej<-ABC_rejection(model=toy_model, prior=toy_prior, nb_simul=n)
ABC_rej
#> $param
#>            [,1]      [,2]
#> param 0.2655087 0.3475333
#> param 0.6607978 1.6590155
#> param 0.7698414 0.9884657
#> param 0.2121425 1.7796865
#> param 0.8696908 0.1769783
#> param 0.6684667 2.6424424
#> param 0.7829328 1.2666727
#> param 0.6927316 0.8877425
#> param 0.3162717 1.0934523
#> param 0.3323947 1.7753432
#> 
#> $stats
#>            [,1]       [,2]
#>  [1,] 0.7460219 0.21951603
#>  [2,] 2.2377665 1.14501671
#>  [3,] 1.9987724 0.83732115
#>  [4,] 1.9297049 0.15607719
#>  [5,] 1.0718915 0.06472432
#>  [6,] 3.3702993 1.85828258
#>  [7,] 2.1300244 0.98600890
#>  [8,] 1.5648945 0.46789202
#>  [9,] 1.3861534 0.29153921
#> [10,] 2.1023574 0.45240868
#> 
#> $weights
#>  [1] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
#> 
#> $stats_normalization
#> [1] 0.7266951 0.5603033
#> 
#> $nsim
#> [1] 10
#> 
#> $computime
#> [1] 0.0005409718
# Install if needed the "abc" package
install.packages("abc")
# Post-process the simulations outputs
library(abc)
rej<-abc(sum_stat_obs, ABC_rej$param, ABC_rej$stats, tol=0.2, method="rejection")
#> Warning in abc(sum_stat_obs, ABC_rej$param, ABC_rej$stats, tol = 0.2, method =
#> "rejection"): No parameter names are given, using P1, P2, ...
#> Warning in abc(sum_stat_obs, ABC_rej$param, ABC_rej$stats, tol = 0.2, method =
#> "rejection"): No summary statistics names are given, using S1, S2, ...
# simulations selected:
rej$unadj.values
#>            [,1]      [,2]
#> param 0.6927316 0.8877425
#> param 0.3162717 1.0934523
# their associated summary statistics:
rej$ss
#>          [,1]      [,2]
#> [1,] 1.564895 0.4678920
#> [2,] 1.386153 0.2915392
# their normalized euclidean distance to the data summary statistics:
rej$dist
#>  [1] 1.6103923 1.9542368 1.2025193 1.0981535 1.2163667 4.6145013 1.5879393
#>  [8] 0.1448057 0.4716867 1.2112846

4.3 Performing a sequential ABC scheme

Other functions of the EasyABC package are used in a very similar manner. To perform the algorithm of Beaumont et al. (2009), one needs to specify the sequence of tolerance levels \(tolerance\_tab\) and the number \(nb\_simul\) of simulations to be obtained below the tolerance level at each iteration:

n=10
tolerance=c(1.25,0.75)
ABC_Beaumont<-ABC_sequential(method="Beaumont", model=toy_model,
prior=toy_prior, nb_simul=n, summary_stat_target=sum_stat_obs,
tolerance_tab=tolerance)
ABC_Beaumont
#> $param
#>            [,1]      [,2]
#>  [1,] 0.7800180 0.4830061
#>  [2,] 0.3181763 2.3673583
#>  [3,] 0.1811065 2.6700808
#>  [4,] 0.8456229 0.2467030
#>  [5,] 0.1418500 1.9097104
#>  [6,] 0.3282295 2.8256064
#>  [7,] 0.1976839 1.2863106
#>  [8,] 0.3323167 2.3794734
#>  [9,] 0.2252921 1.5824478
#> [10,] 0.6077307 0.4225812
#> 
#> $stats
#>            [,1]       [,2]
#>  [1,] 1.3838109 0.49279371
#>  [2,] 2.6442826 0.65600856
#>  [3,] 2.8649926 0.47168981
#>  [4,] 1.0743703 0.19859866
#>  [5,] 2.0575765 0.21200292
#>  [6,] 3.0582520 0.80427662
#>  [7,] 1.4783048 0.06284696
#>  [8,] 2.6188539 0.64199282
#>  [9,] 1.7743399 0.35304040
#> [10,] 0.8912953 0.24977382
#> 
#> $weights
#>  [1] 0.09597405 0.09105900 0.10164109 0.10169384 0.12145842 0.08442337
#>  [7] 0.11758458 0.08987372 0.11027429 0.08601765
#> 
#> $stats_normalization
#> [1] 2.290391 1.433019
#> 
#> $epsilon
#> [1] 0.5079519
#> 
#> $nsim
#> [1] 31
#> 
#> $computime
#> [1] 0.004883766

To perform the algorithm of Drovandi and Pettitt (2011), one needs to specify four arguments: the initial number of simulations \(nb\_simul\), the final tolerance level \(tolerance\_tab\), the proportion \(\alpha\) of best-fit simulations to update the tolerance level at each step, and the target proportion \(c\) of unmoved particles during the MCMC jump. Note that default values \(\alpha=0.5\) and \(c=0.01\) are used if not specified, following Drovandi and Pettitt (2011).

n=10
tolerance=0.75
c_drov=0.7
ABC_Drovandi<-ABC_sequential(method="Drovandi", model=toy_model,
prior=toy_prior, nb_simul=n, summary_stat_target=sum_stat_obs,
tolerance_tab=tolerance, c=c_drov)
ABC_Drovandi
#> $param
#>            [,1]      [,2]
#>  [1,] 0.6988245 0.8190878
#>  [2,] 0.6860284 0.1430693
#>  [3,] 0.3080524 2.0202168
#>  [4,] 0.4009210 0.8702183
#>  [5,] 0.4295584 0.4770350
#>  [6,] 0.6366469 0.8473816
#>  [7,] 0.9931522 0.2698854
#>  [8,] 0.3444874 0.5230873
#>  [9,] 0.5484915 1.4070225
#> [10,] 0.3991026 0.7969931
#> 
#> $stats
#>            [,1]      [,2]
#>  [1,] 1.5335613 0.4986674
#>  [2,] 0.9236163 0.1415199
#>  [3,] 2.3118317 0.6644022
#>  [4,] 1.3034399 0.4532500
#>  [5,] 0.8410152 0.2013222
#>  [6,] 1.5010774 0.4530792
#>  [7,] 1.3363126 0.3626958
#>  [8,] 1.0170970 0.2974128
#>  [9,] 1.9362341 0.9295290
#> [10,] 1.2003607 0.1584102
#> 
#> $weights
#>  [1] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
#> 
#> $stats_normalization
#> [1] 1.953418 1.214883
#> 
#> $epsilon
#> [1] 0.1910321
#> 
#> $nsim
#> [1] 45
#> 
#> $computime
#> [1] 0.005883932

To perform the algorithm of Del Moral et al. (2012), one needs to specify five arguments: the initial number of simulations \(nb\_simul\), the number \(\alpha\) controlling the decrease in effective sample size of the particle set at each step, the number \(M\) of simulations performed for each particle, the minimal effective sample size \(nb\_threshold\) below which a resampling of particles is performed, and the final tolerance level \(tolerance\_target\). Note that default values \(\alpha=0.5\), \(M=1\) and \(nb\_threshold=nb\_simul/2\) are used if not specified.

n=10
alpha_delmo=0.5
tolerance=0.75
ABC_Delmoral<-ABC_sequential(method="Delmoral", model=toy_model,
prior=toy_prior, nb_simul=n, summary_stat_target=sum_stat_obs,
alpha=alpha_delmo, tolerance_target=tolerance)
ABC_Delmoral
#> $param
#>             [,1]      [,2]
#>  [1,] 0.20305029 1.5083895
#>  [2,] 0.36154201 2.4887627
#>  [3,] 0.07295993 2.5253925
#>  [4,] 0.34874081 1.2126412
#>  [5,] 0.35066476 1.4262540
#>  [6,] 0.31188902 0.8378975
#>  [7,] 0.74261427 0.6621932
#>  [8,] 0.17663039 1.4794365
#>  [9,] 0.28307740 0.4146026
#> [10,] 0.30339615 1.2495415
#> 
#> $stats
#>            [,1]       [,2]
#>  [1,] 1.6007504 0.34104430
#>  [2,] 2.8064978 0.74901158
#>  [3,] 2.3885641 0.08381627
#>  [4,] 1.5005152 0.39277779
#>  [5,] 1.7063282 0.56293875
#>  [6,] 1.2908525 0.23887366
#>  [7,] 1.4279571 0.44951689
#>  [8,] 1.6061781 0.41628474
#>  [9,] 0.5003865 0.16883178
#> [10,] 1.6210776 0.39343083
#> 
#> $weights
#>  [1] 0.1428571 0.0000000 0.0000000 0.1428571 0.1428571 0.1428571 0.1428571
#>  [8] 0.1428571 0.0000000 0.1428571
#> 
#> $stats_normalization
#> [1] 1.8358809 0.9232688
#> 
#> $epsilon
#> [1] 0.579182
#> 
#> $nsim
#> [1] 33
#> 
#> $computime
#> [1] 0.007021904

To perform the algorithm of Lenormand et al. (2012), one needs to specify three arguments: the initial number of simulations \(nb\_simul\), the proportion \(\alpha\) of best-fit simulations to update the tolerance level at each step, and the stopping criterion \(p\_acc\_min\). Note that default values \(\alpha=0.5\) and \(p\_acc\_min=0.05\) are used if not specified, following Lenormand et al. (2012). Also note that the method “Lenormand” is only supported with uniform prior distributions (since it performs a Latin Hypercube sampling at the beginning). Here, we therefore need to alter the prior distribution of the second model parameter:

toy_prior2=list(c("unif",0,1),c("unif",0.5,1.5))
n=10
pacc=0.4
ABC_Lenormand<-ABC_sequential(method="Lenormand", model=toy_model,
prior=toy_prior2, nb_simul=10, summary_stat_target=sum_stat_obs,
p_acc_min=pacc)
ABC_Lenormand
#> $param
#>           [,1]      [,2]
#> [1,] 0.4551314 1.0652461
#> [2,] 0.4266732 1.0758588
#> [3,] 0.3618080 1.3258408
#> [4,] 0.5621324 0.6516160
#> [5,] 0.7931661 0.8690392
#> 
#> $stats
#>          [,1]      [,2]
#> [1,] 1.496612 0.3921411
#> [2,] 1.468425 0.6092825
#> [3,] 1.619925 0.4034825
#> [4,] 1.365678 0.4030354
#> [5,] 1.703329 0.6694059
#> 
#> $weights
#> [1] 0.2048501 0.2048501 0.2293697 0.2586764 0.1022538
#> 
#> $stats_normalization
#> [1] 0.5378742 0.4894343
#> 
#> $epsilon
#> [1] 0.2627045
#> 
#> $nsim
#> [1] 20
#> 
#> $computime
#> [1] 0.006312132

4.4 Performing an ABC-MCMC scheme

To perform the algorithm of Marjoram et al. (2003), one needs to specify five arguments: the number of sampled points \(n\_rec\) in the Markov chain, the number of chain points between two sampled points \(n\_between\_sampling\), the maximal distance accepted between simulations and data \(dist\_max\), a vector \(tab\_normalization\) specifying the scale of each summary statistic, and a vector \(proposal\_range\) specifying the maximal distance in each dimension of the parameter space for a jump of the MCMC. All these arguments have default values (see the package help for the function ABC_mcmc), so that ABC_mcmc will work without user-defined values.

n=10
ABC_Marjoram_original<-ABC_mcmc(method="Marjoram_original", model=toy_model,
prior=toy_prior, summary_stat_target=sum_stat_obs, n_rec=n)
#> [1] "Warning: summary statistics are normalized by default through a division by the target summary statistics - it may not be appropriate to your case."
#> [1] "Consider providing normalization constants for each summary statistics in the option 'tab_normalization' or using the method 'Marjoram' which automatically determines these constants."
#> [1] "Warning: default values for proposal distributions are used - they may not be appropriate to your case."
#> [1] "Consider providing proposal range constants for each parameter in the option 'proposal_range' or using the method 'Marjoram' which automatically determines these constants."
#> [1] "Warning: a default value for the tolerance has been computed - it may not be appropriate to your case."
#> [1] "Consider providing a tolerance value in the option 'dist_max' or using the method 'Marjoram' which automatically determines this value."
ABC_Marjoram_original
#> $param
#>            [,1]     [,2]
#>  [1,] 0.2398792 1.278654
#>  [2,] 0.2321289 1.575667
#>  [3,] 0.2672405 1.020733
#>  [4,] 0.2672405 1.020733
#>  [5,] 0.2846436 1.128116
#>  [6,] 0.2455528 1.366674
#>  [7,] 0.2223720 1.069631
#>  [8,] 0.2223720 1.069631
#>  [9,] 0.2223720 1.069631
#> [10,] 0.2878243 1.252653
#> 
#> $stats
#>           [,1]      [,2]
#>  [1,] 1.479906 0.4190810
#>  [2,] 1.799097 0.5039863
#>  [3,] 1.404520 0.4964135
#>  [4,] 1.404520 0.4964135
#>  [5,] 1.317378 0.4855191
#>  [6,] 1.806580 0.4093119
#>  [7,] 1.339412 0.3572251
#>  [8,] 1.339412 0.3572251
#>  [9,] 1.339412 0.3572251
#> [10,] 1.628605 0.4347523
#> 
#> $dist
#>  [1] 0.026370981 0.039823243 0.004103213 0.004103213 0.015661370 0.074671273
#>  [7] 0.093000212 0.093000212 0.093000212 0.024379895
#> 
#> $stats_normalization
#> [1] 1.5 0.5
#> 
#> $epsilon
#> [1] 0.09300021
#> 
#> $nsim
#> [1] 92
#> 
#> $n_between_sampling
#> [1] 10
#> 
#> $computime
#> [1] 0.008288145

To perform the algorithm of Marjoram et al. (2003) in which some of the arguments (\(dist\_max\), \(tab\_normalization\) and \(proposal\_range\)) are automatically determined by the algorithm via an initial calibration step, one needs to specify three arguments: the number \(n\_calibration\) of simulations to perform at the calibration step, the tolerance quantile \(tolerance\_quantile\) to be used for the determination of \(dist\_max\), and the scale factor \(proposal\_phi\) to determine the proposal range. These modifications are drawn from the algorithm of Wegmann et al. (2009a), without relying on PLS regressions. The arguments are set by default to: \(n\_calibration=10000\), \(tolerance\_quantile=0.01\) and \(proposal\_phi=1\). This automatic determination of \(dist\_max\), \(tab\_normalization\) and \(proposal\_range\) is strongly recommended over the crude automatic determination used in the method Marjoram_original.

n=10
ABC_Marjoram<-ABC_mcmc(method="Marjoram", model=toy_model,
prior=toy_prior, summary_stat_target=sum_stat_obs, n_rec=n)
ABC_Marjoram
#> $param
#>            [,1]      [,2]
#>  [1,] 0.8156467 0.6613202
#>  [2,] 0.8132653 0.7117298
#>  [3,] 0.8268796 0.7528865
#>  [4,] 0.8704218 0.6560219
#>  [5,] 0.7940921 0.6481498
#>  [6,] 0.7758323 0.7067397
#>  [7,] 0.8229934 0.7033363
#>  [8,] 0.8797230 0.5543400
#>  [9,] 0.8988029 0.6478492
#> [10,] 0.8988029 0.6478492
#> 
#> $stats
#>           [,1]      [,2]
#>  [1,] 1.542392 0.5205872
#>  [2,] 1.559272 0.5470707
#>  [3,] 1.492039 0.4694074
#>  [4,] 1.448708 0.5497803
#>  [5,] 1.530955 0.5530765
#>  [6,] 1.481620 0.4812322
#>  [7,] 1.427922 0.5121051
#>  [8,] 1.452025 0.4737984
#>  [9,] 1.430581 0.5268393
#> [10,] 1.430581 0.5268393
#> 
#> $dist
#>  [1] 0.0007339775 0.0024096080 0.0006730610 0.0023798899 0.0022121933
#>  [6] 0.0003295032 0.0013637998 0.0010410004 0.0016757382 0.0016757382
#> 
#> $stats_normalization
#> [1] 2.029893 1.192913
#> 
#> $epsilon
#> [1] 0.002409608
#> 
#> $nsim
#> [1] 10091
#> 
#> $n_between_sampling
#> [1] 10
#> 
#> $computime
#> [1] 0.3375671

To perform the algorithm of Wegmann et al. (2009a), one needs to specify four arguments: the number \(n\_calibration\) of simulations to perform at the calibration step, the tolerance quantile \(tolerance\_quantile\) to be used for the determination of \(dist\_max\), the scale factor \(proposal\_phi\) to determine the proposal range, and the number of components \(numcomp\) to be used in PLS regressions. The arguments are set by default to: \(n\_calibration=10000\), \(tolerance\_quantile=0.01\), \(proposal\_phi=1\) and \(numcomp=0\); this last default value corresponds to a number of PLS components equal to the number of summary statistics.

n=10
ABC_Wegmann<-ABC_mcmc(method="Wegmann", model=toy_model,
prior=toy_prior, summary_stat_target=sum_stat_obs, n_rec=n)
ABC_Wegmann
#> $param
#>            [,1]      [,2]
#>  [1,] 0.2722069 1.3531304
#>  [2,] 0.2717984 1.3133540
#>  [3,] 0.2947780 1.2116622
#>  [4,] 0.2947780 1.2116622
#>  [5,] 0.3748534 1.2900388
#>  [6,] 0.3748534 1.2900388
#>  [7,] 0.4720616 1.0156964
#>  [8,] 0.5790331 0.9456075
#>  [9,] 0.4862960 0.8426098
#> [10,] 0.5470196 0.9231012
#> 
#> $stats
#>           [,1]      [,2]
#>  [1,] 1.449920 0.4645954
#>  [2,] 1.483816 0.4450861
#>  [3,] 1.487523 0.4559809
#>  [4,] 1.487523 0.4559809
#>  [5,] 1.529353 0.5540701
#>  [6,] 1.529353 0.5540701
#>  [7,] 1.580289 0.5256440
#>  [8,] 1.598992 0.5245697
#>  [9,] 1.426881 0.5004935
#> [10,] 1.610406 0.4871551
#> 
#> $dist
#>  [1] 0.001559741 0.002398932 0.001538073 0.001538073 0.002464988 0.002464988
#>  [7] 0.002007425 0.002740974 0.001237080 0.002931023
#> 
#> $epsilon
#> [1] 0.002931023
#> 
#> $nsim
#> [1] 10091
#> 
#> $n_between_sampling
#> [1] 10
#> 
#> $min_stats
#> [1] -6.007372 -4.873890
#> 
#> $max_stats
#> [1] 9.273201 8.222326
#> 
#> $lambda
#> [1] 0.6060606 0.6060606
#> 
#> $geometric_mean
#> [1] 1.483339 1.406508
#> 
#> $boxcox_mean
#> [1] 0.5238013 0.4351455
#> 
#> $boxcox_sd
#> [1] 0.13159714 0.08956804
#> 
#> $pls_transform
#>            [,1]       [,2]
#> [1,] -0.7122440 -0.7048392
#> [2,] -0.6566048  0.7542348
#> 
#> $n_component
#> [1] 2
#> 
#> $computime
#> [1] 70.3014

4.5 Performing a SABC scheme

For the SABC algorithm by Albert et al. (2014) we need to provide the prior in the form of a sampler and a density:

r.prior <- function()   c(runif(1,0,1),rnorm(1,1,2))
d.prior <- function(x)  dunif(x[1],0,1)*dnorm(x[2],1,2)

Furthermore, we need to specify the size of the ensemble, the number of simulations and the initial tolerance:

n.sample <- 300
iter.max <- n.sample * 30
eps.init <- 2

Since, for this example, the prior carries relevant information, we choose the method “informative”. A sketch of the call (with argument names assumed from help(SABC); check the help page for the exact signature) is:
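
# Assumed call, based on the argument names in help(SABC) - check there:
ABC_Albert<-SABC(r.model=toy_model, r.prior=r.prior, d.prior=d.prior,
n.sample=n.sample, eps.init=eps.init, iter.max=iter.max,
y=sum_stat_obs, method="informative")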

An approximate posterior parameter sample is contained in ABC_Albert$E[,1:2] and can be plotted with, e.g.:

plot(ABC_Albert$E[,1:2])

4.6 Using multiple cores

The functions of the package EasyABC can launch the simulations on multiple cores of a computer: users have to indicate the number of cores they wish to use in the argument n_cluster of the functions, and they have to use the option use_seed=TRUE. Users also need to design their code in a slightly different way so that it is compatible with the option use_seed=TRUE (see Section The simulation code - for use with multiple cores for additional details). For the toy model above, the modifications needed are the following:

toy_model_parallel<-function(x){
set.seed(x[1]) # so that each core is initialized with a different seed value.
c( x[2] + x[3] + rnorm(1,0,0.1) , x[2] * x[3] + rnorm(1,0,0.1) ) 
}
set.seed(1)
n=10
p=0.2
ABC_rej<-ABC_rejection(model=toy_model_parallel, prior=toy_prior,
nb_simul=n, summary_stat_target=sum_stat_obs, tol=p, n_cluster=2,
use_seed=TRUE)
ABC_rej
#> $param
#>           [,1]      [,2]
#> [1,] 0.6870228 0.4105591
#> [2,] 0.2121425 1.7796865
#> 
#> $stats
#>          [,1]      [,2]
#> [1,] 1.013496 0.4204994
#> [2,] 1.983370 0.4615872
#> 
#> $weights
#> [1] 0.5 0.5
#> 
#> $stats_normalization
#> [1] 1.420082 0.687775
#> 
#> $nsim
#> [1] 10
#> 
#> $nrec
#> [1] 2
#> 
#> $computime
#> [1] 1.685923

5 A second worked example

5.1 The trait model

We turn now to a stochastic ecological model hereafter called trait_model to illustrate how to use EasyABC with models not initially coded in the R language. trait_model represents the stochastic dynamics of an ecological community where each species is represented by a set of traits (i.e. characteristics) which determine its competitive ability. A detailed description and analysis of the model can be found in Jabot (2010). The model requires four parameters: an immigration rate \(I\), and three additional parameters (\(h\), \(A\) and \(\sigma\)) describing the way traits determine species competitive ability. The model additionally requires two fixed variables: the total number of individuals in the local community \(J\) and the number of traits used \(n\_t\).

The model outputs four summary statistics: the species richness of the community \(S\), its Shannon’s index \(H\), the mean of the trait value among individuals \(MTV\) and the skewness of the trait value distribution \(STV\).

NB: Three parameters (\(I\), \(A\) and \(\sigma\)) have non-uniform prior distributions: instead, their log-transformed values have a uniform prior distribution. The simulation code trait_model therefore takes an exponential transform of the values proposed by EasyABC for these parameters at the beginning of each simulation.

In the following, we will use the values \(J=500\) and \(n\_t=1\), and uniform prior distributions for \(\ln(I)\) in \([3;5]\), \(h\) in \([-25;125]\), \(\ln(A)\) in \([\ln(0.1);\ln(5)]\) and \(\ln(\sigma)\) in \([\ln(0.5);\ln(25)]\). The simulation code trait_model reads sequentially \(J\), \(I\), \(A\), \(n\_t\), \(h\) and \(\sigma\).

NB: Note that the fixed variables \(J\) and \(n\_t\) are hard-coded in the function trait_model (see this section for details). If they were not, we would have included these constants in the prior list using uniform distributions with trivial ranges, like c("unif",500,500) for example.

trait_prior=list(c("unif",3,5),c("unif",-2.3,1.6),
c("unif",-25,125), c("unif",-0.7,3.2))

We will consider an imaginary dataset whose summary statistics are \((S,H,MTV,STV) = (100,2.5,20,30000)\):

sum_stat_obs=c(100,2.5,20,30000)

5.2 Performing a standard ABC-rejection procedure

A standard ABC-rejection procedure can simply be performed with the function ABC_rejection, by specifying the number \(n\) of simulations to be performed and the proportion \(p\) of retained simulations. Note that the option use_seed=TRUE is used, since trait_model requires a seed initializing value for its pseudo-random number generator:

set.seed(1)
n=10
p=0.2
ABC_rej<-ABC_rejection(model=trait_model, prior=trait_prior, nb_simul=n,
summary_stat_target=sum_stat_obs, tol=p, use_seed=TRUE)
ABC_rej
#> $param
#>          [,1]      [,2]      [,3]     [,4]
#> [1,] 4.435237  1.568434  32.00528 2.332036
#> [2,] 3.534441 -0.794155 -22.99145 0.791313
#> 
#> $stats
#>      [,1]     [,2]    [,3]     [,4]
#> [1,]  116 4.104226 32.9882 3029.800
#> [2,]   84 3.994615 49.1732 1950.147
#> 
#> $weights
#> [1] 0.5 0.5
#> 
#> $stats_normalization
#> [1] 3.981680e+01 6.631974e-01 1.451333e+01 1.526571e+04
#> 
#> $nsim
#> [1] 10
#> 
#> $nrec
#> [1] 2
#> 
#> $computime
#> [1] 2.71795

Alternatively, ABC_rejection can be used solely to launch the simulations and store the simulation outputs without performing the rejection step. This option enables the user to make use of the R package abc (Csilléry et al. 2012), which offers an array of more sophisticated post-processing treatments than the simple rejection procedure:

install.packages("abc")
library(abc)
set.seed(1)
n=10
p=0.2
ABC_rej<-ABC_rejection(model=trait_model, prior=trait_prior, nb_simul=n, use_seed=TRUE)
rej<-abc(sum_stat_obs, ABC_rej$param, ABC_rej$stats,
tol=0.2, method="rejection")
#> Warning in abc(sum_stat_obs, ABC_rej$param, ABC_rej$stats, tol = 0.2, method =
#> "rejection"): No parameter names are given, using P1, P2, ...
#> Warning in abc(sum_stat_obs, ABC_rej$param, ABC_rej$stats, tol = 0.2, method =
#> "rejection"): No summary statistics names are given, using S1, S2, ...
# simulations selected:
rej$unadj.values
#>          [,1]      [,2]      [,3]     [,4]
#> [1,] 4.435237  1.568434  32.00528 2.332036
#> [2,] 3.534441 -0.794155 -22.99145 0.791313
# their associated summary statistics:
rej$ss
#>      [,1]     [,2]    [,3]     [,4]
#> [1,]  116 4.104226 32.9882 3029.800
#> [2,]   84 3.994615 49.1732 1950.147
# their normalized euclidean distance to the data summary statistics:
rej$dist
#>  [1] 6.030324 9.400513 5.613765 8.993603 3.847707 5.885814 4.960505 5.605648
#>  [9] 8.706420 7.113637

Note that a simulation code My_simulation_code can be passed to the function ABC_rejection in several ways depending on its nature:

  • if it is an R function: ABC_rejection(My_simulation_code, prior, nb_simul,...)
  • if it is a binary executable file and a single core is used (see section The simulation code - for use on a single core for compatibility constraints): ABC_rejection(binary_model("./My_simulation_code"), prior, nb_simul, use_seed=TRUE,...)
  • if it is a binary executable file and multiple cores are used (see section The simulation code - for use with multiple cores for compatibility constraints): ABC_rejection(binary_model_cluster("./My_simulation_code"), prior, nb_simul, n_cluster=2, use_seed=TRUE)

5.3 Performing a sequential ABC scheme

Other functions of the EasyABC package are used in a very similar manner. To perform the algorithm of Beaumont et al. (2009), one needs to specify the sequence of tolerance levels \(tolerance\_tab\) and the number \(nb\_simul\) of simulations to be obtained below the tolerance level at each iteration:

n=10
tolerance=c(8,5)
ABC_Beaumont<-ABC_sequential(method="Beaumont", model=trait_model,
prior=trait_prior, nb_simul=n, summary_stat_target=sum_stat_obs,
tolerance_tab=tolerance, use_seed=TRUE)
ABC_Beaumont
#> $param
#>           [,1]       [,2]       [,3]         [,4]
#>  [1,] 3.362110 -1.9163965 23.0654334 -0.599666827
#>  [2,] 3.543818  0.9455070 17.2848491  0.716968486
#>  [3,] 3.260380 -0.9270426 25.3968857  0.779964720
#>  [4,] 3.398492  0.4703746 15.1099903  0.938328200
#>  [5,] 3.017150 -1.8711144 26.1860829 -0.002416091
#>  [6,] 4.053298 -1.7809923 14.1823387 -0.159038819
#>  [7,] 4.398392  1.0919829  5.7191547  2.949045622
#>  [8,] 4.358355  0.7294188 14.7202497  2.627312598
#>  [9,] 4.694502  1.1509709  0.9577352  3.095833860
#> [10,] 3.672407  0.7490497 13.0817551  3.067217787
#> 
#> $stats
#>       [,1]     [,2]    [,3]      [,4]
#>  [1,]   60 2.406482 35.3478 15610.260
#>  [2,]   52 1.891610 21.0018 12123.542
#>  [3,]   50 1.762070 31.1942 10202.962
#>  [4,]   45 2.162352 17.7182 11280.426
#>  [5,]   46 1.682788 34.1128 12557.523
#>  [6,]  119 3.615008 36.6096 13923.754
#>  [7,]  122 4.144510 14.6302 17235.203
#>  [8,]  118 4.079771 19.0042 12605.343
#>  [9,]  149 4.212484 14.8760 20535.516
#> [10,]   86 3.506517 17.2494  6748.929
#> 
#> $weights
#>  [1] 0.11597293 0.06118753 0.07779398 0.05992568 0.15439436 0.07654255
#>  [7] 0.11863352 0.08190165 0.18240961 0.07123820
#> 
#> $stats_normalization
#> [1] 4.530833e+01 9.914929e-01 1.550687e+01 1.311587e+04
#> 
#> $epsilon
#> [1] 4.782638
#> 
#> $nsim
#> [1] 72
#> 
#> $computime
#> [1] 15.67006

To perform the algorithm of Drovandi and Pettitt (2011), one needs to specify four arguments: the initial number of simulations \(nb\_simul\), the final tolerance level \(tolerance\_tab\), the proportion \(\alpha\) of best-fit simulations to update the tolerance level at each step, and the target proportion \(c\) of unmoved particles during the MCMC jump. Note that default values \(\alpha=0.5\) and \(c=0.01\) are used if not specified, following Drovandi and Pettitt (2011).

n=10
tolerance=3
c_drov=0.7
ABC_Drovandi<-ABC_sequential(method="Drovandi", model=trait_model,
prior=trait_prior, nb_simul=n, summary_stat_target=sum_stat_obs,
tolerance_tab=tolerance, c=c_drov, use_seed=TRUE)
ABC_Drovandi
#> $param
#>           [,1]         [,2]       [,3]        [,4]
#>  [1,] 4.372120 -0.001121019  5.4715372 -0.12568710
#>  [2,] 4.051867 -0.125022011 12.1207510 -0.67154259
#>  [3,] 4.357840 -0.158636879  4.0214488 -0.14801890
#>  [4,] 4.003453  0.155106599 11.8676334 -0.43394316
#>  [5,] 4.497001  0.296629792  0.4179174  0.04238219
#>  [6,] 4.614254 -0.421605678 11.7121630 -0.13055283
#>  [7,] 4.255693 -0.030613125 13.7085732 -0.09558018
#>  [8,] 4.320190  0.177067473 15.5170178  0.26745481
#>  [9,] 4.704030 -0.007652111  9.6886842  0.34755989
#> [10,] 4.202139 -0.031661415  9.6060166 -0.11315922
#> 
#> $stats
#>       [,1]     [,2]    [,3]     [,4]
#>  [1,]   96 2.038106 16.2344 24729.78
#>  [2,]   76 1.519483 22.5206 27628.05
#>  [3,]   95 2.213074 16.8628 31063.59
#>  [4,]   65 1.960228 18.8398 21922.02
#>  [5,]   93 2.712320 13.6572 37907.42
#>  [6,]  110 2.657473 25.9604 20921.48
#>  [7,]   91 1.819804 22.4596 19743.10
#>  [8,]   85 2.351128 21.5434 15320.82
#>  [9,]  124 2.648244 22.0450 26811.07
#> [10,]   85 2.017672 17.5428 20111.10
#> 
#> $weights
#>  [1] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
#> 
#> $stats_normalization
#> [1]    45.687583     1.130895    18.184548 14175.858029
#> 
#> $epsilon
#> [1] 1.204596
#> 
#> $nsim
#> [1] 40
#> 
#> $computime
#> [1] 6.751506

To perform the algorithm of Del Moral et al. (2012), one needs to specify five arguments: the initial number of simulations \(nb\_simul\), the number \(\alpha\) controlling the decrease in effective sample size of the particle set at each step, the number \(M\) of simulations performed for each particle, the minimal effective sample size \(nb\_threshold\) below which a resampling of particles is performed, and the final tolerance level \(tolerance\_target\). Note that default values \(\alpha=0.5\), \(M=1\) and \(nb\_threshold=nb\_simul/2\) are used if not specified.

n=10
alpha_delmo=0.5
tolerance=3
ABC_Delmoral<-ABC_sequential(method="Delmoral", model=trait_model,
prior=trait_prior, nb_simul=n, summary_stat_target=sum_stat_obs,
alpha=alpha_delmo, tolerance_target=tolerance, use_seed=TRUE)
ABC_Delmoral
#> $param
#>           [,1]        [,2]      [,3]        [,4]
#>  [1,] 4.340312  0.27803253 12.362276  1.38257144
#>  [2,] 4.142546 -0.85552836  8.562840  0.51965039
#>  [3,] 4.137010  0.08351261  3.671017  2.36717400
#>  [4,] 4.108617 -0.06937268 10.516238 -0.03101547
#>  [5,] 4.254593  0.35733560 13.356750  1.44322454
#>  [6,] 4.129874  0.13015864 10.201600 -0.10552508
#>  [7,] 4.157252 -0.16565665  4.865025  1.99019890
#>  [8,] 4.034561 -0.46067583  5.810586  1.22583149
#>  [9,] 4.186188 -0.13743236  7.820983  1.35175359
#> [10,] 4.254593  0.35733560 13.356750  1.44322454
#> 
#> $stats
#>       [,1]     [,2]    [,3]     [,4]
#>  [1,]  101 3.233236 20.2856 22151.89
#>  [2,]   93 2.580371 22.1176 22572.67
#>  [3,]  107 3.402067 15.8006 32264.44
#>  [4,]   77 1.631609 19.8354 25750.84
#>  [5,]   87 3.080927 19.0860 17732.19
#>  [6,]   81 2.024566 17.0106 18401.83
#>  [7,]   97 3.042123 14.1450 26042.48
#>  [8,]   87 3.104230 14.3686 28931.67
#>  [9,]   97 2.808620 17.5424 28791.39
#> [10,]   87 3.080927 19.0860 17732.19
#> 
#> $weights
#>  [1] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
#> 
#> $stats_normalization
#> [1]    32.616288     1.050966    14.887083 13081.213750
#> 
#> $epsilon
#> [1] 1.37042
#> 
#> $nsim
#> [1] 52
#> 
#> $computime
#> [1] 8.676857

To perform the algorithm of Lenormand et al. (2012), one needs to specify three arguments: the initial number of simulations \(nb\_simul\), the proportion \(\alpha\) of best-fit simulations to update the tolerance level at each step, and the stopping criterion \(p\_acc\_min\). Note that default values \(\alpha=0.5\) and \(p\_acc\_min=0.05\) are used if not specified, following Lenormand et al. (2012).

n=10
pacc=0.4
ABC_Lenormand<-ABC_sequential(method="Lenormand", model=trait_model,
prior=trait_prior, nb_simul=n, summary_stat_target=sum_stat_obs,
p_acc_min=pacc, use_seed=TRUE)
ABC_Lenormand
#> $param
#>          [,1]       [,2]      [,3]       [,4]
#> [1,] 4.872057  0.9787403  9.217730 -0.5341278
#> [2,] 3.292700 -1.5865848 -4.613586  2.9811316
#> [3,] 3.875684  0.8784209 47.125550  1.6683782
#> [4,] 4.216460  1.2599320 53.953265  1.7230472
#> [5,] 4.057230 -2.0734062 22.124426  0.2728228
#> 
#> $stats
#>      [,1]     [,2]    [,3]      [,4]
#> [1,]  112 2.531360 19.2316 24271.635
#> [2,]   72 3.091854 20.3568 39848.829
#> [3,]   72 2.655304 48.0888  1081.154
#> [4,]   98 2.943912 53.2572 -1734.814
#> [5,]  112 3.995667 44.8408 10194.798
#> 
#> $weights
#> [1] 0.22554134 0.22554134 0.22554134 0.22554134 0.09783465
#> 
#> $stats_normalization
#> [1]    26.6426976     0.8538883    19.4103525 20397.9850272
#> 
#> $epsilon
#> [1] 5.851488
#> 
#> $nsim
#> [1] 15
#> 
#> $computime
#> [1] 3.721657

5.4 Performing an ABC-MCMC scheme

To perform the algorithm of Marjoram et al. (2003), one needs to specify five arguments: the number of sampled points \(n\_rec\) in the Markov chain, the number of chain points between two sampled points \(n\_between\_sampling\), the maximal distance accepted between simulations and data \(dist\_max\), a vector \(tab\_normalization\) specifying the scale of each summary statistic, and a vector \(proposal\_range\) specifying the maximal distance in each dimension of the parameter space for a jump of the MCMC. All these arguments have default values (see the package help for the function ABC_mcmc), so that ABC_mcmc will work without user-defined values.

n=10
ABC_Marjoram_original<-ABC_mcmc(method="Marjoram_original", model=trait_model,
prior=trait_prior, summary_stat_target=sum_stat_obs, n_rec=n, use_seed=TRUE)
#> [1] "Warning: summary statistics are normalized by default through a division by the target summary statistics - it may not be appropriate to your case."
#> [1] "Consider providing normalization constants for each summary statistics in the option 'tab_normalization' or using the method 'Marjoram' which automatically determines these constants."
#> [1] "Warning: default values for proposal distributions are used - they may not be appropriate to your case."
#> [1] "Consider providing proposal range constants for each parameter in the option 'proposal_range' or using the method 'Marjoram' which automatically determines these constants."
#> [1] "Warning: a default value for the tolerance has been computed - it may not be appropriate to your case."
#> [1] "Consider providing a tolerance value in the option 'dist_max' or using the method 'Marjoram' which automatically determines this value."
ABC_Marjoram_original
#> $param
#>           [,1]       [,2]       [,3]     [,4]
#>  [1,] 4.199740 -1.1294670   8.906919 1.789854
#>  [2,] 4.236296 -0.9263885  10.258952 2.143386
#>  [3,] 4.434562 -0.7351417   6.881969 2.348461
#>  [4,] 4.583995 -0.8471750   2.503599 2.339180
#>  [5,] 4.378520 -0.9813868  14.280402 2.219657
#>  [6,] 4.455056 -1.3638328   1.880114 2.338873
#>  [7,] 4.587697 -1.6464353  -2.076588 2.112828
#>  [8,] 4.486547 -1.7021830 -15.758385 2.296798
#>  [9,] 4.401557 -2.1317335 -12.844467 2.782585
#> [10,] 4.272777 -2.2316310 -16.090006 2.630626
#> 
#> $stats
#>       [,1]     [,2]    [,3]       [,4]
#>  [1,]  118 3.933861 32.0702 22218.7174
#>  [2,]  124 3.902011 27.1996 27035.4492
#>  [3,]  130 4.091712 23.3602 25443.7013
#>  [4,]  157 4.462166 26.2230 25485.6756
#>  [5,]  129 4.120971 31.0466 21129.3689
#>  [6,]  150 4.279180 30.3934 23215.4559
#>  [7,]  171 4.680078 42.5138  4829.8046
#>  [8,]  162 4.729817 49.8080   766.8731
#>  [9,]  145 4.550159 50.9206 -2667.9723
#> [10,]  131 4.450486 50.5850  2537.2852
#> 
#> $dist
#>  [1] 0.7928533 0.5114523 0.5466613 1.0603728 0.8970070 1.0776786 3.2356483
#>  [8] 4.3507541 4.4509867 3.8814095
#> 
#> $stats_normalization
#> [1]   100.0     2.5    20.0 30000.0
#> 
#> $epsilon
#> [1] 4.450987
#> 
#> $nsim
#> [1] 92
#> 
#> $n_between_sampling
#> [1] 10
#> 
#> $computime
#> [1] 16.70014

To perform the algorithm of Marjoram et al. (2003) in which some of the arguments (\(dist\_max\), \(tab\_normalization\) and \(proposal\_range\)) are automatically determined by the algorithm via an initial calibration step, one needs to specify three arguments: the number \(n\_calibration\) of simulations to perform at the calibration step, the tolerance quantile \(tolerance\_quantile\) to be used for the determination of \(dist\_max\), and the scale factor \(proposal\_phi\) to determine the proposal range. These modifications are drawn from the algorithm of Wegmann et al. (2009a), without relying on PLS regressions. The arguments are set by default to: \(n\_calibration=10000\), \(tolerance\_quantile=0.01\) and \(proposal\_phi=1\). This automatic determination of \(dist\_max\), \(tab\_normalization\) and \(proposal\_range\) is strongly recommended over the crude automatic determination used in the method Marjoram_original.

n=10
n_calib=10
tol_quant=0.2 
ABC_Marjoram<-ABC_mcmc(method="Marjoram", model=trait_model, prior=trait_prior,
summary_stat_target=sum_stat_obs,
n_rec=n, n_calibration=n_calib, tolerance_quantile=tol_quant, use_seed=TRUE)
ABC_Marjoram
#> $param
#>           [,1]        [,2]     [,3]     [,4]
#>  [1,] 4.377532 -0.43685036 16.00214 2.251974
#>  [2,] 3.885844 -0.28414243 15.45805 2.256866
#>  [3,] 3.713414 -0.40552403 16.51290 2.392034
#>  [4,] 3.352807 -0.12041734 15.13927 2.190195
#>  [5,] 3.629966  0.26055628 14.04613 2.270628
#>  [6,] 3.838667  0.04460109 13.16472 2.415645
#>  [7,] 3.798929 -0.59853210 13.47327 2.582163
#>  [8,] 3.653076 -0.71307810 12.40725 2.439392
#>  [9,] 4.478488 -0.68144386 11.64735 2.461565
#> [10,] 3.427418 -0.74796895 13.09694 2.421449
#> 
#> $stats
#>       [,1]     [,2]    [,3]      [,4]
#>  [1,]  132 4.216634 27.9314 24996.140
#>  [2,]   93 3.174627 23.2592 20959.218
#>  [3,]   84 3.409929 20.6658  8826.281
#>  [4,]   44 2.013663 18.2780  8654.713
#>  [5,]   65 3.174639 16.4488 13314.321
#>  [6,]   80 3.381207 18.2916 19262.376
#>  [7,]   91 3.496436 22.4586 17348.881
#>  [8,]   77 3.167358 18.6004 13989.242
#>  [9,]  150 4.480797 30.5738 21189.463
#> [10,]   83 3.047479 18.0238 17154.225
#> 
#> $dist
#>  [1] 3.000164 1.263669 5.606173 6.990029 4.119004 1.996978 2.456306 3.416704
#>  [9] 5.274917 2.190825
#> 
#> $stats_normalization
#> [1]   41.685196    1.270533   14.262361 9522.886173
#> 
#> $epsilon
#> [1] 6.990029
#> 
#> $nsim
#> [1] 101
#> 
#> $n_between_sampling
#> [1] 10
#> 
#> $computime
#> [1] 12.04817

To perform the algorithm of Wegmann et al. (2009a), one needs to specify four arguments: the number \(n\_calibration\) of simulations to perform at the calibration step, the tolerance quantile \(tolerance\_quantile\) to be used for the determination of \(dist\_max\), the scale factor \(proposal\_phi\) to determine the proposal range, and the number of components \(numcomp\) to be used in PLS regressions. The arguments are set by default to: \(n\_calibration=10000\), \(tolerance\_quantile=0.01\), \(proposal\_phi=1\) and \(numcomp=0\); this last default value corresponds to a number of PLS components equal to the number of summary statistics.

n=10
n_calib=10
tol_quant=0.2 
ABC_Wegmann<-ABC_mcmc(method="Wegmann", model=trait_model, prior=trait_prior,
summary_stat_target=sum_stat_obs,
n_rec=n, n_calibration=n_calib, tolerance_quantile=tol_quant, use_seed=TRUE)
ABC_Wegmann
#> $param
#>           [,1]       [,2]     [,3]       [,4]
#>  [1,] 3.418997 -1.8441175 15.79981 -0.3531885
#>  [2,] 3.424944 -1.4579560 15.24388  0.1872806
#>  [3,] 3.460813 -1.7687966 14.61885 -0.4819984
#>  [4,] 3.830361 -1.7979041 14.12632  0.9398673
#>  [5,] 3.903148 -0.5012784 14.44541  0.6418533
#>  [6,] 3.758177 -0.4700121 14.26322  0.2699922
#>  [7,] 3.937060  1.1058046 14.35394  1.3264595
#>  [8,] 4.531886  0.8582610 14.32158  1.2852569
#>  [9,] 4.545157  1.0951576 14.80598  1.1290289
#> [10,] 4.559757 -0.6612092 15.26312 -0.1962148
#> 
#> $stats
#>       [,1]     [,2]    [,3]      [,4]
#>  [1,]   71 2.190161 27.5200 13954.961
#>  [2,]   59 2.116203 30.9712 29472.847
#>  [3,]   74 2.594792 26.9422 15934.574
#>  [4,]   95 3.702872 32.6552 17190.443
#>  [5,]   81 2.263678 23.9988 17508.712
#>  [6,]   61 1.978142 21.4404 22695.913
#>  [7,]   67 2.609104 17.6650  7290.894
#>  [8,]   97 2.897983 19.4404 12758.479
#>  [9,]   88 2.963642 20.0662 13630.234
#> [10,]  128 3.189369 30.9628 22055.391
#> 
#> $dist
#>  [1] 1.4784034 2.1311425 1.0755683 2.3644112 0.7036211 2.1148378 2.0914487
#>  [8] 0.7833533 0.9541797 1.5737981
#> 
#> $epsilon
#> [1] 2.364411
#> 
#> $nsim
#> [1] 101
#> 
#> $n_between_sampling
#> [1] 10
#> 
#> $min_stats
#> [1]     43.000000      1.632943     18.051800 -36048.727962
#> 
#> $max_stats
#> [1]   133.000000     4.417974    84.024000 21035.394747
#> 
#> $lambda
#> [1] 0.6060606 0.6060606 1.8181818 0.6060606
#> 
#> $geometric_mean
#> [1] 1.511961 1.539539 1.462830 1.483128
#> 
#> $boxcox_mean
#> [1] 0.5767484 0.6038067 0.4759215 0.5468069
#> 
#> $boxcox_sd
#> [1] 0.3602366 0.3218550 0.3699768 0.3741916
#> 
#> $pls_transform
#>             [,1]        [,2]        [,3]       [,4]
#> [1,] -0.49792327 -0.46804745 -0.52229791  0.5123616
#> [2,] -0.49423098 -0.58543918  0.43815101 -0.4805024
#> [3,]  0.72267120 -0.68569737  0.06543188  0.1562168
#> [4,] -0.06593739 -0.06311256  0.76456955  0.6380458
#> 
#> $n_component
#> [1] 4
#> 
#> $computime
#> [1] 14.88913

5.5 Performing a SABC scheme

For the SABC algorithm by Albert et al. (2014) we need to provide the prior in the form of a sampler and a density:

r.prior <- function() c(runif(1,3,5), runif(1,-2.3,1.6),
    runif(1,-25,125),runif(1,-0.7,3.2),1)
d.prior <- function(x) dunif(x[1],3,5) * dunif(x[2],-2.3,1.6) *
    dunif(x[3],-25,125) * dunif(x[4],-0.7,3.2)

Furthermore, we need to specify the size of the ensemble, the number of simulations and the initial tolerance:

n.sample <- 30
iter.max <- n.sample * 30
eps.init <- 20

Since, for this example, the prior is flat, we choose the method “noninformative”. A sketch of the call (with argument names assumed from help(SABC); check the help page for the exact signature) is:
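
# Assumed call, based on the argument names in help(SABC) - check there:
ABC_Albert<-SABC(r.model=trait_model, r.prior=r.prior, d.prior=d.prior,
n.sample=n.sample, eps.init=eps.init, iter.max=iter.max,
y=sum_stat_obs, method="noninformative")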

We may plot the marginals of the posterior using, e.g.:

hist(ABC_Albert$E[,3],breaks=100)

5.6 Using multiple cores

The functions of the package EasyABC can launch the simulations on multiple cores of a computer: users only have to indicate the number of cores they wish to use in the argument n_cluster of the functions. The compatibility constraints of the simulation code are slightly different when using multiple cores: please refer to section The simulation code - for use with multiple cores for more information.

6 Troubleshooting and development

Please send comments, suggestions and bug reports to the package maintainers. Any new development of more efficient ABC schemes that could be included in the package is particularly welcome.

7 Programming Acknowledgements

The EasyABC package makes use of a number of R tools, among which:

  • the R package lhs (Carnell 2012) for Latin hypercube sampling.
  • the R package MASS (Venables and Ripley 2002) for Box-Cox transformation.
  • the R package mnormt (Genz and Azzalini 2012) for multivariate normal generation.
  • the R package pls (Mevik and Wehrens 2011) for partial least squares regression.
  • the R script for the algorithm of Wegmann et al. (2009a), drawn from the ABCtoolbox documentation (Wegmann et al. 2009b).

We thank Sylvie Huet, Albert Ko, Matteo Fasiolo and Wim Delva for their suggestions and inputs in the development of this version.

8 References

Albert, C., Künsch, H.R., and Scheidegger, A. (2014) A Simulated Annealing Approach to Approximate Bayes Computations. Statistics and Computing, 1–16, arXiv:1208.2157 [stat.CO].

Beaumont, M.A., Cornuet, J., Marin, J., and Robert, C.P. (2009) Adaptive approximate Bayesian computation. Biometrika, 96, 983–990.

Carnell, R. (2012) lhs: Latin Hypercube Samples. R package version 0.10. https://doi.org/10.32614/CRAN.package.lhs

Csilléry, K., François, O., and Blum, M.G.B. (2012) abc: an R package for approximate Bayesian computation (ABC). Methods in Ecology and Evolution, 3, 475–479.

Del Moral, P., Doucet, A., and Jasra, A. (2012) An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, 22, 1009–1020.

Drovandi, C. C. and Pettitt, A. N. (2011) Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics, 67, 225–233.

Fearnhead, P. and Prangle, D. (2012) Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. Journal of the Royal Statistical Society: Series B, 74, 419–474.

Genz, A., and Azzalini, A. (2012) mnormt: The multivariate normal and t distributions. R package version 1.4-5. https://doi.org/10.32614/CRAN.package.mnormt

Jabot, F. (2010) A stochastic dispersal-limited trait-based model of community dynamics. Journal of Theoretical Biology, 262, 650–661.

Lenormand, M., Jabot, F., Deffuant G. (2012) Adaptive approximate Bayesian computation for complex models. http://arxiv.org/pdf/1111.1308.pdf

Marjoram, P., Molitor, J., Plagnol, V. and Tavaré, S. (2003) Markov chain Monte Carlo without likelihoods. PNAS, 100, 15324–15328.

Mevik, B.-H., and Wehrens, R. (2011) pls: Partial Least Squares and Principal Component regression. R package version 2.3-0. https://doi.org/10.32614/CRAN.package.pls

Pritchard, J.K., Seielstad, M.T., Perez-Lezaun, A., and Feldman, M.W. (1999) Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution, 16, 1791–1798.

Sisson, S.A., Fan, Y., and Tanaka, M.M. (2007) Sequential Monte Carlo without likelihoods. PNAS, 104, 1760–1765.

Venables, W.N., and Ripley, B.D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York.

Wegmann, D., Leuenberger, C. and Excoffier, L. (2009a) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics, 182, 1207-1218.

Wegmann, D., Leuenberger, C. and Excoffier, L. (2009b) Using ABCtoolbox. https://cmpg.unibe.ch/software/ABCtoolbox/ABCtoolbox_manual.pdf