---
title: "Bayesian Stabilization of kappa"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Bayesian Stabilization of kappa}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, include=FALSE}
library(braidrm)
set.seed(20240820)
```

## The Problem

One of the long-standing limitations of the BRAID model (and indeed of any parametric response model) is how to handle conditions in which one or both of the compounds show little to no activity. Synergy, in the pharmacological sense, is a circumstance in which a combination is more potent than one would expect based on its individual behaviors. But what does synergy look like when one of the compounds does nothing? A more potent nothing is still ... nothing. Even in cases where a compound does show activity, if the concentrations tested lie too far below its dose of median effect, that activity will be virtually undetectable. This results in many experiments where the values of the response surface parameters, including the interaction parameter $\kappa$, are severely under-determined.

Take the example dataset `incompleteExample`. This dataset represents a checkerboard of measurements tested at concentrations between 0.125 and 8, in which the second compound has a dose of median effect of 100, well above the range tested; in addition, the maximal effect of this compound is just 10% that of the first compound, barely distinguishable from noise. This results in an experiment in which the effect of the second compound is virtually identical to no effect at all:

```{r}
surface <- incompleteExample
head(surface)

head(surface[surface$concA == 0, ])
```

This surface was generated with the parameter vector shown below in `trueParameter`, and indeed we find that the root-mean-squared error between this ideal model and the measured data is quite low:

```{r}
trueParameter <- c(1, 100, 3, 3, 0, 0, 1, 0.1, 1)

trueSurface <- evalBraidModel(surface$concA, surface$concB, trueParameter)
trueError <- surface$measure - trueSurface

sqrt(mean(trueError^2))
```

But when we fit the data with an unstabilized BRAID model, the resulting parameter vector is quite different:

```{r}
bfit <- braidrm(measure ~ concA + concB, surface, prior = "none")

coef(bfit)
```

The best-fitting model is one in which the dose of median effect of the second drug is actually quite *low*, the maximal effect of the second drug is even lower than before, and the surface exhibits a synergistic interaction with a $\kappa$ over 3. Despite this disagreement with the original parametric representation, this BRAID model actually fits the data *more* closely:

```{r}
sqrt(mean(resid(bfit)^2))
```

The severity of the issue is made even clearer by examining the confidence intervals on the parameters:

```{r}
summary(bfit)
```

Consider the precise meaning of the confidence interval on $\kappa$ here. These are bootstrapped confidence intervals generated by resampling residuals (see `vignette("confidenceIntervals")`), so a confidence interval ranging from -1.36 (indicating very pronounced antagonism) to 39.6 (*extremely* high synergy) means that if a new experiment with the same pattern of errors or residuals were run on this underlying response surface, the best-fit values of $\kappa$ would lie in this range 95% of the time; more importantly, they would lie *outside* this range up to 5% of the time. One out of every twenty runs would be best fit by an extreme value of $\kappa$.
In fact, in practice, the frequency of this outcome is even higher, as slight patterns of noise in under-determined surfaces are often best explained by extreme variations in $\kappa$. Of course, in some cases this is hardly an issue: the reason these variations occur is that all of these values explain the data nearly equally well, so predictions and quantitative measures of combination activity within the range tested would still be quite stable. But if one hopes to draw meaningful conclusions from the value of $\kappa$, or to compare $\kappa$ across a large set of comparable experiments, this instability can be extremely frustrating.

## The Solution: Bayesian Stabilization

When we began updating the BRAID code for this package, addressing this instability was one of our chief concerns. We have worked with many analyses intended to tease out the drivers of synergy and antagonism, represented quantitatively by $\kappa$, and this is much more difficult to do if many of the most extreme values of $\kappa$ are nothing more than noise. So with the updated fits, we wanted to operate by the following principle: a large value of $\kappa$ should correspond to a large or significant deviation. Put another way, extreme $\kappa$ values should only be fit when there is sufficient evidence to support them.

The most straightforward way for us to do this was to introduce a Bayesian prior on the value of $\kappa$. The derivation goes as follows. Suppose we assume that our measured response surface is drawn from a normal noise distribution around the true response surface, so the likelihood of a given set of measurements is:

$$
L(\{x_i\}) = \prod_{i}{N(x_i-y_i,\sigma^2)}=\prod_i{\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x_i-y_i)^2}{2\sigma^2}\right)}
$$

where $y_i$ is the predicted effect at a given point based on the current set of parameters. Maximizing this likelihood reduces to minimizing the sum of squared errors, hence the traditional approach to fitting. Introducing a prior, however, requires maximizing not the likelihood of the data given the parameters, but the posterior probability of the parameters given the data *and* the prior probabilities of those parameters. According to Bayes' rule:

$$
P(\theta|\{x_i\}) \propto L(\{x_i\}|\theta)P(\theta)
$$

where $\theta$ is just a representation of a particular set of parameters and $P(\theta)$ is the prior distribution on those parameters. We have no reason to place priors on the other parameters of our response surface, so we assume that the prior distribution on them is effectively uniform; for our purposes, then, $P(\theta) = P(\kappa)$, and the objective function that we want to maximize is:

$$
O(\theta) = P(\kappa)\cdot\prod_i{\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x_i-y_i)^2}{2\sigma^2}\right)}
$$

As with many probability models, it is much easier to *minimize* the negative logarithm of this function than to maximize the function directly, so the loss function we seek to minimize reduces to:

$$
L(\theta) = -\log{P(\kappa)} + \frac{N}{2}\log\left(2\pi\sigma^2\right)+\frac{1}{2\sigma^2}\sum_i{(x_i-y_i)^2}
$$

Because $\sigma^2$ is a property of the experiment and is not affected by the parameters of our response surface, minimizing this loss is equivalent to minimizing a similar expression in which the second (constant) term has been dropped and the whole expression has been multiplied by $2\sigma^2$, giving:

$$
L'(\theta) = \sum_i{(x_i-y_i)^2} - 2\sigma^2\log{P(\kappa)}
$$

There! So maximizing the posterior amounts to minimizing the sum of squared errors *with an extra term* representing the loss from the prior on $\kappa$.
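To make this concrete, here is a minimal sketch of such a penalized objective in R. Note that `penalizedLoss` is a purely hypothetical helper, not part of `braidrm` or its internal implementation, and the log-prior is left as a generic argument:

```{r}
# Hypothetical helper (not part of braidrm): the penalized objective above,
# combining the ordinary sum of squared errors with the prior term
# -2 * sigma^2 * log P(kappa)
penalizedLoss <- function(measured, predicted, logPriorKappa, sigma2) {
  sum((measured - predicted)^2) - 2 * sigma2 * logPriorKappa
}

# With a flat prior (log P(kappa) = 0), the prior term vanishes and the
# objective is just the familiar sum of squared errors (so the value of
# sigma2 does not matter here):
penalizedLoss(surface$measure, trueSurface, logPriorKappa = 0, sigma2 = 1)
sum(trueError^2)
```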
Of course, this means that to add Bayesian stabilization of $\kappa$ we need two values that we would not have included in a traditional least-squares fit: $\sigma^2$, representing the variance of measurement noise, and $P(\kappa)$, our new Bayesian prior. The simplest way to include $\sigma^2$ is to supply it directly, often estimated from some other experimental data, such as the variance of measurements in controls. But when such values are not readily available, the variance can instead be estimated from the data itself by fitting a fully agnostic response surface and taking the average deviation; this is what functions like `braidrm` do by default.

For the prior on $\kappa$, we chose a function that is symmetric in its treatment of synergy and antagonism, and adjustable in the severity with which it holds values of $\kappa$ close to zero. The precise definition is:

$$
P(\kappa) = \frac{1}{\sqrt{2\pi\eta^2}}\exp\left(-\frac{\epsilon^2}{2\eta^2}\right)
$$

where $\epsilon=\log\left(\frac{\kappa+2}{2}\right)$ is a transformed, symmetrized version of $\kappa$ that goes to negative infinity for extreme antagonistic values and to positive infinity for extreme synergistic values, and $\eta$ is a parameter controlling the narrowness and severity of the prior. For the default prior in the `braidrm` package (called "moderate"), the value of $\eta$ is simply 1. Bringing all these together (and filtering out an additional constant term), we have our final, Bayesian stabilized loss function for BRAID fitting:

$$
L^{*}(\theta) = \sum_i{(x_i-y_i)^2}+\frac{\sigma^2\epsilon^2}{\eta^2}
$$

## Using Bayesian Stabilization in the braidrm Package

In most cases, the user will (hopefully) not have to consider these details at all. Bayesian stabilization is implemented in BRAID fitting functions by default, and should quietly hold $\kappa$ values close to zero in cases where no evidence of interaction is present. But in the event that one *does* want more precise control of the behavior, several options are available.

The simplest is the `prior` parameter in fitting functions such as `braidrm` and `findBestBraid`. By default, this parameter is set to `"moderate"`, which automatically estimates $\sigma^2$ from the data and uses the default $\eta$ of 1; setting it to `"high"` decreases $\eta$ to $2/3$, holding $\kappa$ closer to zero, while setting it to `"mild"` increases $\eta$ to $3/2$, allowing $\kappa$ more latitude. Finally, setting it to `"none"` (as we did in the example above) removes the prior on $\kappa$ altogether and simply fits the least-squares model.

Another option is to specify the prior on $\kappa$ more explicitly using the `kappaPrior()` function. This function produces an object of class `kappaPrior` which can be passed into BRAID fitting functions as the `prior` parameter. It takes two parameters: `spread`, representing the standard deviation of expected noise (i.e. $\sigma$), and `strength`, which is either one of the strings specified above or a numeric value corresponding to the severity of the prior (which is simply $1/\eta$). This function is particularly useful when a reliable estimate of the magnitude of noise in an experiment is known or can be obtained. For example, suppose we have reason to believe that measurements in our `incompleteExample` experiment had a standard noise deviation of 0.05.
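To get a sense of how strongly such a prior constrains the interaction parameter, the short illustration below evaluates the symmetrized transform $\epsilon$ and the resulting penalty term $\sigma^2\epsilon^2/\eta^2$ for a handful of $\kappa$ values, using that noise estimate of 0.05 and the default $\eta$ of 1. This is just a hand computation from the formulas above, not a call into the package's internals:

```{r}
# Illustrative only: the symmetrized transform epsilon = log((kappa + 2)/2)
# and the penalty sigma^2 * epsilon^2 / eta^2 that it contributes to the
# stabilized loss, with sigma = 0.05 and the default ("moderate") eta of 1
kappaValues <- c(-1.9, -1, 0, 2, 10, 40)
epsilon <- log((kappaValues + 2) / 2)
penalty <- (0.05^2) * epsilon^2 / 1^2

round(data.frame(kappa = kappaValues, epsilon = epsilon, penalty = penalty), 5)
```

Note how the penalty grows only slowly (logarithmically in $\kappa$) for large synergistic values, but diverges as $\kappa$ approaches $-2$, where $\epsilon$ goes to negative infinity.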
We could then fit our surface using the following:

```{r}
stableFit <- braidrm(measure ~ concA + concB, surface,
                     prior = kappaPrior(0.05, "moderate"))

sqrt(mean(resid(stableFit)^2))
summary(stableFit)
```

This adjusted model fits the data nearly as well as the unstabilized model, but the magnitude of $\kappa$ is moderately reduced and, more importantly, the confidence intervals are much more tightly constrained. In practice, we have found that even moderate Bayesian stabilization drastically reduces the incidence of extreme yet uninformative $\kappa$ values, improving the overall quantitative power of $\kappa$ as a measure of interaction.
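Finally, because `strength` also accepts the other named levels described above, the same approach can be used to explore how tightly different priors rein in $\kappa$ on the same surface. The chunk below is an illustrative sketch along those lines (not evaluated here), still assuming a noise standard deviation of 0.05:

```{r, eval=FALSE}
# Sketch: refitting the same surface with a stronger and a milder prior on
# kappa, keeping the assumed noise standard deviation of 0.05
strongFit <- braidrm(measure ~ concA + concB, surface,
                     prior = kappaPrior(0.05, "high"))
mildFit <- braidrm(measure ~ concA + concB, surface,
                   prior = kappaPrior(0.05, "mild"))

coef(strongFit)
coef(mildFit)
```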