--- title: "Using `BRcal` to Assess and Embolden Probability Forecasts" subtitle: "Last Updated September 13, 2024" output: rmarkdown::html_vignette bibliography: "../inst/REFERENCES.bib" vignette: > %\VignetteIndexEntry{Using `BRcal` to Assess and Embolden Probability Forecasts} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} # Save original options and par to reset at the end original_options <- options() original_par <- par(no.readonly=TRUE) # save the built-in output hook hook_output <- knitr::knit_hooks$get("output") # set a new output hook to truncate text output knitr::knit_hooks$set(output = function(x, options) { if (!is.null(ns <- c(options$out.lines.head, options$out.lines.tail))) { x <- xfun::split_lines(x) if (length(x) > sum(ns)) { # truncate the output x <- c(head(x, ns[1]), "...\n", tail(x, ns[2])) } x <- paste(x, collapse = "\n") } hook_output(x, options) }) knitr::opts_chunk$set( collapse = TRUE, comment = ">" ) library(ggplot2) library(devtools) library(gridExtra) ``` This vignette demonstrates how to use the `BRcal` package via a case study involving hockey home team win probability predictions and game outcomes. The core features of this package are (1) probability calibration assessment via `bayes_ms()` and `llo_lrt()`, (2) MLE (Maximum Likelihood Estimation) recalibration via `mle_recal()`, and (3) boldness-recalibration via `brcal()`. Supporting features include visualizations via `lineplot()` and `plot_params()` and the linear in log odds (LLO) function (`LLO()`) for manual adjustment of probability predictions. For more details on the methodology under the hood of the `BRcal` package, see @GuthrieFranck2024. Note that in many cases the output is truncated with `...` for readability. ## Installation There are three ways to install the package. The first is to install it directly from GitHub via the snippet below. ```{r, eval=FALSE} # Install via github devtools::install_github("apguthrie/BRcal") ``` The second way is to install the package from the most recent tarball available on GitHub. The tarball must first be downloaded and the first argument in the example below must be the file path to the tarball. ```{r, eval=FALSE} # Install via tarball install.packages("path_to_file\BRcal_1.0.0.tar.gz", repos = NULL, type="source") ``` The third and simplest way to install the package is directly from CRAN (upon acceptance) via the usual `install.packages`. ```{r, eval=FALSE} # Install via CRAN install.packages("BRcal") ``` Once the package is installed, it can be loaded with the usual `library()` call. ```{r setup} library(BRcal) ``` # Illustrative Data Analysis with Core Functionality In this section, we demonstrate the core functionality of the `BRcal` package. ## Loading the Data The hockey data we will use throughout this vignette is directly available through the BRcal package in the object called `hockey` and can be loaded as follows: ```{r} data("hockey") ``` The data contains probability predictions and outcomes for all 868 regular season games in the 2020-21 National Hockey League (NHL) season. The column descriptions are as follows: * `y`: game outcome, 1 if home team win, 0 if home team loss * `x`: probability predictions of home team win from FiveThirtyEight * `rand`: probability predictions of home team win from random noise forecaster * `winner`: game outcome in text, "home" if home team win, "away" if home team loss The predictions from @fivethirtyeight can also be found on their [website](https://data.fivethirtyeight.com/). The random noise forecaster predictions were generated via random uniforms draws from 0.26 to 0.77 (the observed range in the FiveThirtyEight data) to mimic the behavior of a completely uninformed forecaster. ## Common Convention Across the BRcal Package The functions in the `BRcal` package share a lot of common arguments. Two that we want to explain are `x` and `y`, where `x` is a vector of probabilities predictions of binary events and `y` is the corresponding true event outcomes. Since we are dealing with probabilities, `x` must contain numeric values from 0 to 1 (inclusive). And as we are dealing with binary events, `y` must only take on two values, i.e. the two possible outcomes. By default, the functions in `BRcal` expect `y` to be a vector of 0s and 1s where a 1 represents a "event" and 0 represents a "non-event". Additionally, these functions expect that the probabilities in `x` are the probabilities of "event" at each observation. For demonstration of how to pass `y` as a binary vector that contains values other than 0s and 1s and properly specify `event`, see the [Specifying `event`](#event) Section. ## Assessing Calibration via `bayes_ms()` and `llo_lrt()` The `BRcal` package provides two approaches to assessing calibration: an approximate Bayesian model selection-based approach implemented in `bayes_ms()` and a likelihood ratio test for calibration in `llo_lrt()`. ### Bayesian Model Selection-based approach (`bayes_ms()`) {#bayesms} We'll first look at the Bayesian model selection-based approach to calibration assessment using `bayes_ms()`. The snippet below uses this function to assess the calibration of FiveThirtyEight's predictions in `hockey$x`. ```{r} bt538 <- bayes_ms(hockey$x, hockey$y) bt538 ``` Notice this function returns a list, which we've stored in `bt538` for later use. The first element in the returned list is `Pmc` which shows the prior probability of the calibrated model. The next two elements in the return list are `BIC_Mc` and `BIC_Mu` which are the Bayesian Information Criteria (BIC) for the calibrated model and the uncalibrated model, respectively. These are used in the approximation for the Bayes Factor comparing the uncalibrated model to the calibrated model, which is returned in list element `BF`. Next, `posterior_model_prob` is the posterior model probability for the calibrated model given the outcomes `y`. This serves as the assessment measure of calibration. In this case, since the posterior model probability is high, we consider the predictions from FiveThirtyEight to be well calibrated. The maximum likelihood estimates (MLEs) for $\delta$ and $\gamma$ are returned in element `MLEs`. These parameters are used throughout the package to govern the adjustment of probability predictions (i.e. [MLE-recalibration](#mlerecal), [boldness-recalibration](#brcal), and [LLO-adjustment](#LLO)). Additionally, whenever MLEs are found for $\delta$ and $\gamma$ throughout the package, the `optim` function is used to perform the optimization. The final list element is called `optim_details`, which itself is a list returned by the call to `optim()` to get the MLEs. Note that we expect to see an `NA` returned for the number of calls to the gradient function as shown in `$optim_details$counts` when we use the default algorithm (`Nelder-Mead`) since it does not use a gradient function. For more details and how to change the call to `optim`, see the [Passing Additional/Different Arguments to `optim`](#optim) section. #### Setting the prior model probabilities via `Pmc` Imagine we wish to assess the extent to which lower prior probabilities of calibration influence results. By default, this function uses assumes equal prior probabilities for the calibrated and uncalibrated models. However, it is important for users to determine what priors make sense for the context of their problem. Argument `Pmc` allows easy specification of the prior model probability for the calibrated model. Naturally, `Pmc` must be a value in [0,1] and the prior model probability for the uncalibrated model is taken to be `1 - Pmc`. For example, let's say it is reasonable to assume that the prior model probability of calibration for FiveThirtyEight is 0.2 instead of 0.5 based on past experience. Then we could use `bayes_ms()` in the following way. ```{r} bayes_ms(hockey$x, hockey$y, Pmc=0.2) ``` Note that while many of the elements returned are unchanged (as `Pmc` is not used in their calculation), `BF` and `posterior_model_prob` are different than the previous call. ### Likelihood Ratio Test for Calibration (`llo_lrt()`) Now let's look at the likelihood ratio test for calibration using `llo_lrt()`. The null hypothesis for this test is that the predictions `x` are well calibrated. The alternative is that they are not well calibrated. The snippet below uses this function to test the calibration of FiveThirtyEight's predictions in `hockey$x` and returns a list. ```{r} llo_lrt(hockey$x, hockey$y) ``` The first element `test_stat` in the returned list is the test statistic under this likelihood ratio test. The element `pval` is the p-value associated with this test and can be used to determine if the predictions in `x` are well calibrated. In this case, since the p-value is relatively high, we will fail to reject the null that FiveThirtyEight's predictions are well calibrated, which is consistent with the conclusion from `bayes_ms()`. The elements in `MLEs` and `optim_details` are the same as described in the section for [`bayes_ms()`](#bayesms). ## MLE Recalibration {#mlerecal} To get the MLE recalibrated set (i.e. adjust the probability predictions in `x`via the Linear in Log Odds (LLO) function such that calibration is maximized), we provide the `mle_recal` function. Below is an example of using `mle_recal()` with the `hockey` data. ```{r, out.lines.head=4, out.lines.tail=23} mle_recal(hockey$x, hockey$y) ``` #### Specifying `probs_only = TRUE` In some cases, you may just want the recalibrated probabilities and without any of the other items in the return list. In this case, you can set `probs_only=TRUE` and just the vector of probabilities will be returned. Note that it's still recommended that you check `optim_details` to verify convergence before using this option. Below is the result of using `probs_only=TRUE`. ```{r, out.lines.head=3, out.lines.tail=3} mle_recal(hockey$x, hockey$y, probs_only=TRUE) ``` ### Visualizing MLE-recalibrated Probabilities via `lineplot()` We can use the `lineplot()` function to visualize how the predicted probabilities change from the original set to the MLE recalibrated set. ```{r, fig.width=6, fig.height=4, fig.align='center'} lineplot(hockey$x, hockey$y) ``` Notice the left hand column of points are the original probability predictions. The original probabilities are connect by a line to the MLE recalibrated probabilities in the right hand column. These points and lines are colored based on the corresponding outcome in `y` where blue corresponds to an "event" (i.e. a home team win) and red corresponds to a "non-event" (i.e. a home team loss). For more details on modifying this plot and additional features, see the [Visualizing Recalibrated Probability Forecasts via `lineplot()`](#lineplot) Section. ## Boldness-Recalibration via `brcal()` {#brcal} The `brcal` functions allows users to boldness-recalibrate `x` [@GuthrieFranck2024]. More specifically, we are maximizing the standard deviation of the predictions while ensuring the posterior model probability of calibration is at least `t`. By default, `brcal` performs 95% boldness recalibration (i.e. `t` is set to 0.95). Below is an example of 90% boldness-recalibration for FiveThirtyEight's predictions. ```{r br90, cache=TRUE, out.lines.head=12, out.lines.tail=9} br90 <- brcal(hockey$x, hockey$y, t=0.9) ``` Notice there would be quite a bit of output from this call if it were not truncated (4 lines for each of the 176 iterations). This is coming from a call to the `nloptr` function from the `nloptr` package under the hood of `brcal` [@nloptr-package]. The `nloptr` package is an R interface to [NLopt](https://nlopt.readthedocs.io/en/latest/), which is an excellent open-source nonlinear optimization library [@nloptr-package]. Since both our objective and constraint functions are are nonlinear with respect to parameters $\delta$ and $\gamma$, we leverage the `nloptr` package for the optimization rather than `optim()`. The output printed at each iteration shows (1) the current values of the parameter estimates, (2) the value of the objective function, and (3) the value of the constraint function. Next, let's look at the list returned by `brcal()`. The first entry `nloptr` is exactly the object returned from `nloptr()` when the optimizer stops. This provides useful information about convergence and should always be checked by users to ensure that the algorithm converged properly. Next in the list are `Pmc` and `t`, which are just the prior model probability and constraint on probability as specified by the user (or the default values). The solution for the boldness-recalibration parameters is found in `BR_params`. The achieved maximal spread is `sb`. Lastly, `brcal()` returns the vector of boldness-recalibrated probabilities in `probs`. ```{r, out.lines.head=43, out.lines.tail=3} br90 ``` ### Visualizing Boldness-Recalibrated Probabilities via `lineplot()` We can again use the `lineplot()` function to visualize how the predicted probabilities change from the original set to varying levels of boldness-recalibrated sets. Below we visualize 95% and 90% boldness-recalibration. By default, these will also be compared to the MLE-recalibrated set. ```{r, fig.width=6, fig.height=4, fig.align='center'} lineplot(x=hockey$x, y=hockey$y, t_levels = c(0.95, 0.9)) ``` Notice the left hand column of points are the original probability predictions. The original probabilities are connect by a line to the corresponding MLE-recalibrated, 95%, and 90% boldness-recalibrated probabilities in the right hand columns. These points and lines are colored based on the corresponding outcome in `y` where blue corresponds to an "event" (i.e. a home team win) and red corresponds to a "non-event" (i.e. a home team loss). Notice that the posterior model probabilities on the x-axis are not in ascending or descending order. Instead, they simply follow the ordering of how one might use the `BRcal` package: first looking at the original predictions, then maximizing calibration, then examining how far they can spread out predictions while maintaining calibration with boldness-recalibration. For more details on modifying this plot and additional features, see the [Visualizing Recalibrated Probability Forecasts via `lineplot()`](#lineplot) Section. ### Using `tau = TRUE` Sometimes the optimizer is more efficient when optimizing over $\tau = log(\delta)$ instead of $\delta$. In this case, users can set `tau = TRUE`. Specification of start location `x0` and bounds `lb`, `ub` should still be specified in terms of $\delta$. The `brcal` function will automatically convert from $\delta$ to $\tau$. Let's try optimizing over $\tau$ with the `hockey` data. ```{r br90_tau, cache=TRUE, out.lines.head=12, out.lines.tail=9} br90_tau <- brcal(hockey$x, hockey$y, t=0.9, tau=TRUE) ``` Notice that the parameter values in the output from `nloptr` are in terms of $\tau$ instead of $\delta$. The same is true of the solution reported in `br90_tau$nloptr`. In general, all results returned directly from `nloptr` will reflect whichever scale `nloptr()` optimized on, whether it be $\delta$ or $\tau$. However, `BR_params` in the returned list will always be reported in terms of $\delta$. ```{r, out.lines.head=43, out.lines.tail=3} br90_tau ``` # Additional Features and Details In this section, we demonstrate additional features in the `BRcal` package and discuss additional details that may be useful to more advanced use-cases. ## Passing Additional / Different Arguments to `nloptr()` {#nloptr_opts} The [nloptr](https://cran.r-project.org/package=nloptr) package is extremely flexible, as it provides numerous arguments to users for tailoring the optimization to the problem at hand. In the `nloptr` function, most of these arguments are specified via the `opts` argument, a convention we partially borrow for `brcal()`. However, we allow users to directly specify a few arguments that are typically relegated to `opts` for easier adjustments. All other arguments, except those related to the objective and constraint functions, are available to users by passing a list of these arguments `opts`. We do not allow users to change the objective or constraint functions, or their Jacobian functions, as these are the backbone of this method and the sole purpose for using the `brcal` function. For those who wish to optimize a different function or use a different constraint, we recommend using the `nloptr` function directly. ### Start Values While the algorithm used by `optim()` was not very sensitive to the starting location of $\delta$ and $\gamma$, we've found the nature of boldness-recalibration makes `nloptr()` more sensitive to starting location. By default, `brcal()` initializes the optimization at the MLEs for $\delta$ and $\gamma$. While we've found this to be the most stable approach to setting a starting location, users can specify their own starting location by specifying `x0` and setting `start_at_MLEs=FALSE`. To show the sensitivity of this approach to the starting values, try set `x0` to $\delta = 10$, $\gamma = -5$ like we did with `optim()`. The line of code below will run `brcal()` with these starting values, however we do not run it automatically in this vignette as it causes an error. In this case, the algorithm cannot converge and ends up crashing due to trying extreme values of the parameters. We encourage readers to run this code themselves to verify this result. ```{r out.lines.head=12, out.lines.tail=14} try(brcal(hockey$x, hockey$y, x0 = c(10, -5), start_at_MLEs = FALSE)) ``` ### Algorithm By default, the `brcal` function uses the Augmented Lagrangian Algorithm (AUGLAG) [@auglag1; @auglag2], which requires an inner optimization routine. We use Sequential Least-Squares Quadratic Programming (SLSQP) [@slsqp1; @slsqp2] as the inner optimization routine, by default. For a complete list of inner and outer optimization algorithms, see the [NLopt website.](https://nlopt.readthedocs.io/en/latest/) Note that not all algorithms can be used with a non-linear constraint (which is necessary for boldness-recalibration). Also note that other algorithms may be more sensitive to starting conditions or may need different stopping criteria to converge. All changes to the inner algorithm are made via passing a sub-list to `local_opts`, which itself an entry of the list passed to `opts`. Instead of SLSQP as the inner algorithm, let's try the method of moving asymptotes (MMA) [@MMA]. ```{r, out.lines.head=8, out.lines.tail=9} br90_inMMA <- brcal(hockey$x, hockey$y, t=0.9, opts=list(local_opts=list(algorithm="NLOPT_LD_MMA"))) ``` ```{r} br90_inMMA$nloptr ``` Now, let's try changing the outer algorithm to MMA. This is a great example of an algorithm that is more sensitive to both starting conditions and stopping criteria. In order to get convergence here, we'll opt to start the optimizer at `c(1,2)` and relax the relative tolerance. Stopping criteria is discussed in a later section. ```{r, out.lines.head=8, out.lines.tail=9} br90_outMMA <- brcal(hockey$x, hockey$y, t=0.9, x0=c(1, 2), start_at_MLEs = FALSE, xtol_rel_outer = 1e-05, opts=list(algorithm="NLOPT_LD_MMA")) ``` ```{r} br90_outMMA$nloptr ``` Note that many of the algorithms do not use an inner algorithm. In most of these cases, `local_opts` will be ignored, otherwise `nloptr()` may throw a warning or error. ### Bounds Following the convention of `nloptr()`, arguments `lb` and `ub` allow users to specify the lower and upper bounds of optimization, respectively. The bounds set using `lb` and `ub` are used for both the inner and outer algorithms. By default, we only place a lower bound on $\delta$ requiring $\delta > 0$. Similar to setting the bounds of optimization in `optim()`, setting the lower bound to `-Inf` and the upper bound to `Inf` will leave the corresponding parameter unbounded. For example, the code below shows how to bound $0.25 < \delta < 5$ and leave $\gamma$ unbounded. ```{r, out.lines.head=8, out.lines.tail=9} br90_bound <- brcal(hockey$x, hockey$y, t=0.9, lb=c(0.25, -Inf), ub=c(5, Inf)) ``` ```{r} br90_bound$nloptr ``` ### Stopping criteria Since there are quite a few options for stopping criteria that can be set via `opts` in the `nloptr` function, we select just a few to specify directly in the `brcal` function. As with `optim()`, it is important to check convergence and adjust the stopping criteria accordingly. Keep in mind that stopping criteria also work together so several adjustments may be necessary. Argument `maxeval` sets the maximum number of iterations (approximately) allowed by the outer optimization routine. Argument `maxtime` sets the maximum number of seconds (approximately) the outer algorithm will be allowed to run. Argument `xtol_rel_outer` and `xtol_rel_inner` set the relative tolerance for the change in $\delta$ and $\gamma$ between iterations of the outer and inner algorithms, respectively. Again, `nloptr` allows several other options for stopping criteria that can be specified via `opts`. The example below shows how to set the maximum number of evaluations to 100, the maximum time to 30 seconds, and the relative tolerance of both the inner and outer algorithms to 0.001. These are not necessarily recommended settings, rather they are intended for demonstration purposes only. ```{r, out.lines.head=8, out.lines.tail=9} br90_stop <- brcal(hockey$x, hockey$y, t=0.9, maxeval=100, maxtime = 30, xtol_rel_outer = 0.001, xtol_rel_inner = 0.001) ``` ```{r} br90_stop$nloptr ``` ### Suppressing output from `nloptr()` To minimize the amount of output printed at each iteration of the optimizer, we leverage `print_level` from `nloptr()`. By default, we choose to print the maximum amount of information by setting `print_level=3`. To simply reduce the output, try `print_level=1`... ```{r br95_2, cache=TRUE, out.lines.head=6, out.lines.tail=7} br90 <- brcal(hockey$x, hockey$y, t=0.9, print_level=1) ``` ...or `print_level=2`. ```{r br95_3, cache=TRUE, out.lines.head=9, out.lines.tail=10} br90 <- brcal(hockey$x, hockey$y, t=0.9, print_level=2) ``` Or to fully suppress, use `print_level=0`. Notice nothing prints from the following call. ```{r br95, cache=TRUE, out.lines.head=12, out.lines.tail=9} br90 <- brcal(hockey$x, hockey$y, t=0.9, print_level=0) ``` ## Passing Additional / Different Arguments to `optim()` {#optim} While `optim()` is not used for the non-linear constrained optimization for finding the boldness-recalibration parameters, it is used throughout this package wherever the MLEs are needed, including in the calculation for the posterior model posterior (`bayes_ms()`, `llo_lrt()`, `mle_recal`, `brcal()`, `line_plot()`, and `plot_params()`). While we've found that `optim()` converges in most cases we've tried using the default settings in this package, it is important to double check convergence. It is for this reason that we allow users to pass additional arguments to `optim` and include the returned list from `optim()` in several of our own functions so users can see the convergence information. If the algorithm did not converge, which can be checked via list elements `optim_details$convergence` and `optim_details$message` from `bayes_ms()`, `llo_lrt()`, or `mle_recal()`, users should adjust the arguments passed to `optim()` using `...` to achieve convergence before trusting results. Below are a few examples of how to adjust the call to `optim()`. See the documentation for `optim()` for more information. ### Conventions for Passing Arguments to `optim()` The BRcal package uses two different conventions for passing arguments to `optim()` depending on the situation. For the functions where `optim()` is the only function to which we are passing additional arguments (`bayes_ms()`, `llo_lrt()`, and `mle_recal()`), we pass arguments to `optim()` using the `...`. The next few sections demonstrate how to pass arguments in this way. While the examples in this section use `bayes_ms()`, the same conventions apply to `llo_lrt()` and `mle_recal()`. For the functions where we pass additional arguments to multiple functions (`brcal()`, `plot_params()`, and `lineplot()`), we cannot leverage the flexibility of `...`. Instead, users can pass arguments to `optim()` in the form of a list via `optim_options`. Demonstration of passing arguments to `optim()` in this fashion can be found in [Passing Arguments via a List](#optim_opts_pp). ### Start Values In some cases, the optimization routine may sensitive to the starting location of the optimizer. By default, all calls to `optim()` in this package use users can change the starting values of $\delta$ and $\gamma$ by passing them as a vector via argument `par`. In the following example, we will start the optimizer at $\delta = 10$, $\gamma = -5$. ```{r} bayes_ms(hockey$x, hockey$y, par = c(10, -5)) ``` In this case, the optimizer does not seem overly sensitive to start values, as $\delta = 10$, $\gamma = -5$ are not close to the solution (the MLEs) at $\delta = 0.9455$, $\gamma = 1.4005$. The algorithm converged and we get the same solution as when we started at $\delta = 0.5$, $\gamma = 0.5$. ### Stopping Criteria Sometimes it's useful to adjust the optimization stopping criteria to improve convergence or refine the solution. We use the default settings in `optim()`. These can be adjust via the argument `control` which takes a list with an array of possible components. We defer to the documentation for `optim()` for further explanation of how to use `control`. Below is a simple example where we set the relative convergence tolerance (`reltol`) to `1e-10` instead of the default of `sqrt(.Machine$double.eps)`. ```{r} bayes_ms(hockey$x, hockey$y, control=list(reltol=1e-10)) ``` Notice how we arrive at a slightly different solution and there were more function calls than when using the default. ### Algorithm and Bounds It's not uncommon that some optimization algorithms work better than others depending on the nature of the problem. We use the default algorithm `"Nelder-Mead"` [@nelder-mead] which we've found to reliable in most cases we've tried. This algorithm does not accept bounds on the parameters. To circumvent the fact that $\delta > 0$, meaning one of our parameters of interest is bounded, we perform the optimization in terms of $log(\delta)$. Should users prefer to use a bounded algorithm, the algorithm can be specified using `method` and the bounds can be specified using `lower` and `upper`. For example, let's say we want to use `"L-BFGS-B"` where we bound $0 < \delta < 10$ and $-1 < \gamma < 25$: ```{r} bayes_ms(hockey$x, hockey$y, method = "L-BFGS-B", lower = c(0, -1), upper = c(10, 25)) ``` Or, if we'd rather only impose the lower bound on $\delta$ and leave $\gamma$ unbounded, we could use the following snippet: ```{r} bayes_ms(hockey$x, hockey$y, method = "L-BFGS-B", lower = c(0, -Inf), upper = c(Inf, Inf)) ``` ### Suppressing the output from `optim()` Once you are confident that the algorithm converges via the arguments you've specified, you can set `optim_details=FALSE` to reduce the output if desired. As long as you continue to use the same settings, the convergence should not change. Below is what the output looks like using the `hockey` data when `optim_details=FALSE`. ```{r} bayes_ms(hockey$x, hockey$y, optim_details=FALSE) ``` ### Passing Arguments via a List {#optim_opts_pp} As previously mentioned, the `brcal`, `plot_params`, and `lineplot` functions use `optim()` under the hood but we choose not to leverage the flexibility of `...` since they also pass additional arguments to other functions. For this reason, we instead pass additional arguments to `optim()` in the form of a list via the parameter `optim_options`. While we only demonstrate this using `brcal()` below, the same convention applies to `plot_params()` and `lineplot()`. The example below shows how to use `optim_options` to change the algorithm to `"L-BFGS-B"`, set bounds on $\delta$ and $\gamma$, and set the max number of iterations to 200 in the `brcal` function. Note we specify `print_level` = 0 and only print the list returned from `nloptr()` to reduce output. ```{r} br <- brcal(hockey$x, hockey$y, t=0.9, print_level = 0, optim_options=list(method="L-BFGS-B", lower=c(0.0001, 10), upper=c(0.0001, 10), control=list(maxit=200))) br$nloptr ``` While not demonstrated here, the options in the sections above that are passed via `...` can also be specified in a list passed to `optim_options`. ## Linear in Log Odds (LLO) function `LLO()` {#LLO} The `LLO` function allows you to apply a linear in log odds (LLO) adjustment to predicted probabilities in `x`. Specifically it shifts the probabilities (on the log odds scale) by `delta` and scales them by `gamma`. Note that the value supplied to `delta` must be greater than 0 and `gamma` can be any real number, though very extreme values may cause instabilities (which we'll see later). By default, this function is used under the hood of all other functions in this package, specifically when making any adjustment to probabilities (i.e. MLE recalibration, boldness-recalibration). While most users of this package probably will not use the `LLO()` function directly, there are some cases where users may want to do so. One example of this could be when you have estimates for $\delta$ and $\gamma$ based on historical data and want to LLO-adjust unlabeled out of sample predictions. A user could use the `LLO` function to LLO-adjust the new set of predictions by passing them to function argument `x`, and the parameter estimates to `delta` and `gamma`. The following sections provide examples of how to use the `LLO` function. ### Using `LLO()` with MLEs or Boldness-Recalibration Estimates Another example of when users may want to use `LLO()` directly is in getting the MLE recalibrated set. We have already demonstrated how to go about this using `mle_recal()`. Alternatively, you can get the MLE recalibrated set manually by first getting the MLEs from another function (such as `bayes_ms()` or `llo_lrt()`) and then passing those as `delta` and `gamma` in `LLO()`. This may be useful in cases where you have large data and finding the MLEs is slower than desired. If you already have the MLEs from a previous function call to `bayes_ms()` or `llo_lrt()` then you don't need to take the time to re-solve for the MLEs. Additionally, this approach serves as a good alternative to using `mle_recal()` with `probs_only=TRUE`. In the example below, we'll use the MLEs for FiveThirtyEight's predictions we saved from `bayes_ms()` earlier on and pass those to `LLO()`. Notice the probabilities below are identical to those we got from `mle_recal()`. ```{r, out.lines.head=3, out.lines.tail=3} hockey$x_mle <- LLO(x=hockey$x, delta=bt538$MLEs[1], gamma=bt538$MLEs[2]) hockey$x_mle ``` Let's visualize what this LLO-adjustment looks like by plotting the original probabilities (`hockey$x`) on the x-axis and the MLE recalibrated probabilities (`hockey$x_mle`) on the y-axis. Note that the points below are the actual observations and the curve represents how any prediction along the 0-1 range would behave under `delta`=`r bt538$MLEs[1]` and `gamma`=`r bt538$MLEs[2]`. ```{r, fig.width=6, fig.height=4, fig.align='center'} # plot original vs LLO-adjusted via MLEs ggplot2::ggplot(data=hockey, mapping=aes(x=x, y=x_mle)) + stat_function(fun=LLO, args=list(delta=bt538$MLEs[1], gamma=bt538$MLEs[2]), geom="line", linewidth=1, color="black", xlim=c(0, 1)) + geom_point(aes(color=winner), alpha=0.75, size=2) + lims(x=c(0,1), y=c(0,1)) + labs(x="Original", y="LLO(x, 2, 2)", color="Winner") + theme_bw() ``` Similarly, say you have already obtained boldness-recalibration estimates but did not save the vector of boldness-recalibrated probabilities. You can use `LLO()` to boldness-recalibrate with your estimates without taking the time to let the optimizer in `brcal()` re-solve for these estimates. In the example below, we'll use the 90% boldness-recalibration estimates for FiveThirtyEight's predictions we saved from `brcal()` earlier on and pass those to `LLO()`. This essentially achieves the same probabilities in `probs` returned by `brcal()` by LLO-adjusting via the returned boldness-recalibration parameters in `BR_params` returned by `brcal()`. ```{r, out.lines.head=3, out.lines.tail=3} LLO(x=hockey$x, br90$BR_params[1], br90$BR_params[2]) ``` ### Extreme Value of `delta` or `gamma` As previously mentioned, using extreme values of `delta` or `gamma` can cause instabilities. For example, let's see what happens when when we set `delta=2`, `gamma=1000`. In the plot below, we show the original probabilities on the x-axis and the LLO-adjusted probabilities via `delta=2`, `gamma=1000` on the y-axis. Notice that nearly all predictions were pushed to approximately 0 or 1. ```{r, fig.width=6, fig.height=4, fig.align='center'} # LLO-adjust using delta=2, gamma=1000 hockey$x2 <- LLO(hockey$x, delta=2, gamma=1000) # plot original vs LLO-adjusted via 2,2 ggplot2::ggplot(data=hockey, mapping=aes(x=x, y=x2)) + stat_function(fun=LLO, args=list(delta=2, gamma=1000), geom="line", linewidth=1, color="black", xlim=c(0, 1)) + geom_point(aes(color=winner), alpha=0.75, size=2) + lims(x=c(0,1), y=c(0,1)) + labs(x="Original", y="LLO(x, 2, 1000)", color="Winner") + theme_bw() ``` While this didn't cause problems plotting the LLO curve, there are cases in this package where probabilities very close to 0 or 1 will cause instability and possibly even break the code. With this in mind, we do have safeguard in place to keep predictions from getting to close (numerically speaking) to 0 or 1, which is discussed further in [Specifying `epsilon`](#epsilon). ## Specifying `epsilon` {#epsilon} When probability predictions that are "too close" to 0 or 1, calculations using the log likelihood become unstable. This is due to the fact that the logit function is not defined at 0 and 1. To mitigate this instability, we use `epsilon` to specify how close is "too close". For values in `x` that are less than `epsilon` (i.e. too close to zero), these are replaced with `epsilon`. For values in `x` greater than `1 - epsilon` (i.e. too close to 1), these are replaced with `1 - epsilon`. Naturally, `epsilon` must take on a value between 0 and 1, but it is recommended that this be a very small number. By default, we use `.Machine$double.eps`. In this example with the `hockey` data, there are no values close to 0 or 1 in `x` or `rand`, so `epsilon` is not used at all. ## Specifying `event` {#event} Throughout this package, the functions assume by default that `y` is a vector of 0s and 1s where a 1 represents a "event" and 0 represents a "non-event". Additionally, these functions expect that the probabilities in `x` are the probabilities of "event" at each observation. While switching the labels of "event" and "non-event" will not change the fundamental conclusions, it is good practice to make sure `event` is defined properly. However, there may be cases where you want to pass a binary vector of outcomes that are not defined as 0s and 1s. To define which of the two values in `y` should be considered a "success" or the event of interest, users may set argument `event` to that value. For example, let's use the column `winner` from the `hockey` dataset as our vector of binary outcomes instead of `y` in `bayes_ms()`. We can do this by specifying `event = "home"`. We'll continue to set `optim_details=FALSE` for easier reading of the output. ```{r} bayes_ms(hockey$x, hockey$winner, event="home", optim_details=FALSE) ``` As expected, the results are identical to our previous call. Maybe you want to approach this problem from the perspective of an event being a home team loss. To do so, you need to pass the predicted probabilities of a home team loss as `x = 1 - hockey$x`, and specify which value in `y` is an event via `event=0` for `y=hockey$y`... ```{r} bayes_ms(1-hockey$x, hockey$y, event=0, optim_details=FALSE) ``` ... or `event="away"` for `hockey$winner`. ```{r} bayes_ms(1-hockey$x, hockey$winner, event="away", optim_details=FALSE) ``` Note that we get the same results using either specification. Additionally, the posterior model probability is the same as when we considered event to be a home team win, as one might expect with binary event predictions. ## Visualizing Posterior Model Probability as a Function of $\delta$ and $\gamma$ via `plot_params()` The goal of the `plot_params` function is to visualize how the posterior model probability changes under different recalibration parameters, as this is used in boldness-recalibration. Let's see what the default version of this plot looks like with FiveThirtyEight's data. ```{r, fig.width=6, fig.height=4, fig.align='center'} plot_params(hockey$x, hockey$y) ``` You might immediately notice the imagine is grainy and unnecessarily small. In the following sections, we will demonstrate how to fix these issues and make further adjustments via arguments in `plot_params()`. ### Setting bounds and grid density To adjust the grid of $\delta$ and $\gamma$ values, we can use `dlim` and `glim` to "zoom in" on the area of non-zero posterior model probabilities. Let's reduce the range of $\delta$ and $\gamma$ so we can better visualize the area of non-zero posterior model probability of calibration. ```{r, fig.width=6, fig.height=4, fig.align='center'} plot_params(hockey$x, hockey$y, dlim=c(0.5, 1.5), glim=c(0.25, 2.75)) ``` Next, we can use `k` to form a denser grid (and thus a smoother plotting surface). However, it's important to note that the larger `k` values used, the slower this plot will generate. ```{r eval=FALSE, fig.align='center', fig.height=4, fig.width=6, include=TRUE} plot_params(hockey$x, hockey$y, k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75)) ``` ```{r echo=FALSE, fig.align='center', fig.height=4, fig.width=6} zmat_list <- plot_params(hockey$x, hockey$y, k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75), return_z = TRUE) ``` ### Saving and Reusing `z` Once you've settled on the bounds for $\delta$ and $\gamma$ you want to plot and the density of the grid via `k`, you can save the underlying matrix of posterior model probabilities to reuse while making minor plot adjustments. This will save you the hassle of having to recalculate these values, which is where the bottleneck on time occurs. To save this matrix, set `return_z=TRUE` and store the returned list from `plot_params()`. ```{r save_z, eval=FALSE, fig.align='center', fig.height=4, fig.width=6, include=TRUE} zmat_list <- plot_params(hockey$x, hockey$y, k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75), return_z = TRUE) ``` ```{r echo=FALSE, fig.align='center', fig.height=4, fig.width=6} plot_params(z=zmat_list$z, k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75)) ``` The returned list is printed below. Note that the matrix is the first entry in the list named `z`. Additionally, the bounds passed via `dlim` and `glim`, and `k` are returned as a reminder of what values were used to construct `z`. ```{r, out.lines.head=4, out.lines.tail=9} zmat_list ``` While it's a little hard to see what `z` looks like in the output above, below shows that `z` is a 200 by 200 matrix (where 200 is the grid size set by `k`). The rows and columns are named based on the value of $\delta$ and $\gamma$ for which the posterior model probability was calculated. ```{r} class(zmat_list$z) dim(zmat_list$z) ``` Now we can reuse the matrix to adjust the plot by passing `zmat_list$z` to `z` in `plot_params()`. However, this trick is only useful so long as you do not expand the bounds over which you want to plot. Below shows the same `z` matrix plotted on (1) the same range, (2) a smaller range, and (3) a larger range than over which is was calculated. The white that appears in the third frame reflects the fact that the posterior model probability was not calculated for those grid cells since they are outside of the original bounds. ```{r, fig.width=8, fig.height=4, fig.align='center'} par(mfrow=c(1,3)) plot_params(z=zmat_list$z, k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75), main="Same range as z") plot_params(z=zmat_list$z, k=200, dlim=c(0.7, 1.3), glim=c(0.5, 2.5), main="Smaller range than z") plot_params(z=zmat_list$z, k=200, dlim=c(1e-04, 5), glim=c(1e-04, 5), main="Larger range than z") ``` We'll continue to use this trick throughout to reduce plotting time. ### Adding contours and using `countors_only=TRUE` To add contours at specific levels of the posterior model probability of calibration, users can specify these levels in a vector via `t_levels`. Below, we add contours at t = 0.95, 0.9, and 0.8. See (Passing Additional / Different Arguments to `image.plot()` and `contour()`)[#contour_opts] for guidance on adjusting the contours (i.e. changing color, line width, labels, etc). ```{r, fig.width=6, fig.height=4, fig.align='center'} plot_params(z=zmat_list$z, t_levels=c(0.95, 0.9, 0.8), k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75)) ``` To plot the contour surface without the color background, users should specify `contours_only=TRUE` as shown below. We'll also add a few more contour levels. ```{r, fig.width=6, fig.height=4, fig.align='center'} plot_params(z=zmat_list$z, t_levels=c(0.99, 0.95, 0.9, 0.8, 0.7), k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75), contours_only=TRUE) ``` ### Passing Additional / Different Arguments to `image.plot()` and `contour()` {#contour_opts} While `plot_params()` obscures the call to `image.plot()` and `contour()`, we allow users to leverage their full capabilities via arguments `imgplt_options` and `contour_options`. To pass additional arguments to `image.plot()` and `contour()`, users can create a list whose entries match the names of the arguments in these functions and pass them to `imgplt_options` and `contour_options`, respectively. For example, below we can move the legend to appear horizontally below the plot using argument `horizontal = TRUE` in `image.plot()`. Additionally, we'll reduce the number of colors used for plotting via `nlevel=10` and add a legend label via `legend.lab`. Notice how these are passed as a list to `imgplt_options`. ```{r, fig.width=6, fig.height=6, fig.align='center'} plot_params(z=zmat_list$z, k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75), imgplt_options=list(horizontal = TRUE, nlevel=10, legend.lab="Posterior Model Prob")) ``` We can also pass additional arguments to `contour()` via `contour_options`. Below, we'll draw contours at t = 0.99 and 0.1. To make them dotted lines instead of solid, we'll use `lty="dotted"`, we'll change the contour color to hot pink, and we'll increase the line width using `lwd=2`. ```{r, fig.width=6, fig.height=4, fig.align='center'} plot_params(z=zmat_list$z, k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75), t_levels = c(0.99, 0.1), contour_options=list(lty = "dotted", col="hotpink", lwd=2)) ``` ## Visualizing Recalibrated Probability Forecasts via `lineplot()` {#lineplot} The goal of the `lineplot` function is to visualize how predicted probabilities change under different recalibration parameters. By default this function shows changes in the original probabilities to the MLE recalibrated probabilities. Let's see what this looks like with the FiveThirtyEight `hockey` data. ```{r, fig.width=6, fig.height=4, fig.align='center'} lineplot(x=hockey$x, y=hockey$y) ``` Note that this function leverages the flexibility of the `ggplot2` package, so cosmetic modification and how the plot is returned is a little different than `plot_params`. We will demonstrate this throughout the following sections. ### Specifying Levels of Boldness-Recalibration via `t_levels` Similar to specifying contour levels in `plot_params`, users can specify levels of boldness-recalibration using `t_levels` in `lineplot()`. Below we add 95%, 90%, and 80% boldness-recalibration to the plot. Notice in the lineplot for FiveThirtyEight below that the posterior model probabilities on the x-axis are not in ascending or descending order. Instead, they simply follow the ordering of how one might use the `BRcal` package: first looking at the original predictions, then maximizing calibration, then examining how far they can spread out predictions while maintaining calibration with boldness-recalibration. ```{r, fig.width=6, fig.height=4, fig.align='center'} lineplot(x=hockey$x, y=hockey$y, t_levels = c(0.95, 0.9, 0.8)) ``` ### Saving and reusing `df` While the time expense of `lineplot` isn't as high as `plot_params`, there is still a call to `optim` for MLE recalibration and a call to `nloptr` for each level of boldness-recalibration. Similar to returning the `z` matrix in `plot_params`, users of `lineplot` can specify `return_df` to return the underlying dataframe used to construct the plot. By default, `return_df=FALSE` and `lineplot()` only returns the `ggplot` object of the lineplot. When `return_df=TRUE`, a list is returned with two entries: (1) `plot` which is the `ggplot` object of the lineplot and (2) `df` which is the underlying dataframe. The code below shows how to specify `return_df=TRUE` and saved the returned list. ```{r} lp <- lineplot(x=hockey$x, y=hockey$y, return_df = TRUE, t_levels = c(0.95, 0.9, 0.8)) ``` To extract the plot, we can use the following code. ```{r, fig.width=6, fig.height=4, fig.align='center'} lp$plot ``` We can also extract the dataframe using `lp$df`. The first and last few rows of the dataframe plus a summary of the dataframe are printed below. Notice there are 6 columns, here are their brief descriptions (more details below): * `probs`: the values of each predicted probability under each set * `outcome`: the corresponding outcome for each predicted probability of calibration * `post`: the posterior model probability of the set as a whole * `id`: the id of each probability used for mapping observations between sets * `set`: the set with which the probability belongs to * `label`: the label used for the x-axis in the lineplot ```{r, out.lines.head=7, out.lines.tail=6} lp$df ``` ```{r} summary(lp$df) ``` The `probs` column contains each set of predicted probabilities that are plotted (i.e. the original, MLE recalibrated, and each set of boldness-recalibrated probabilities). The `outcome` column contains the corresponding outcome for each value in `probs`. Each set of probabilities and outcomes are essentially "stacked" on top of each other. The `id` column contains a unique id for each observation to map how observations change between sets. This essentially tells the plotting function how to connect (with line) the same observation as is changes from the original set to MLE- or boldness-recalibration. The `set` column contains a factor for which set that observation belongs to. The `post` column contains the value of the posterior model probability for the set as a whole, so this value will be the same for all observations in the same set. Lastly, the `label` column is a string that is used to label the x-axis for each set in the lineplot and is also the same for all observations in a set. Since this dataframe has each set of probabilities stacked, the number of rows in the dataframe will be $\text{# of sets } \times n$, where $n$ is the number of observations per set. For example, in the data frame above, there are 5 sets (original, MLE, 95%, 90%, and 80% boldness-recalibration) with 868 observations total. So the length of the dataframe is 5 * 868 = `r 5*868`. This is confirmed below. ```{r} nrow(lp$df) ``` Now with this dataframe saved, we can pass it back via `df` without needing to specify `x` or `y` and we get the same plot as before, but much more quickly. This functionality makes it easier to make cosmetic adjustments to your plot without needing to wait for computations under the hood. ```{r, fig.width=6, fig.height=4, fig.align='center'} lineplot(df=lp$df, t_levels=c(0.95, 0.9, 0.8)) ``` Users can also choose to omit specific boldness-recalibration sets even if they are included in `df`. In the code below, we omit the 90\% and 80\% boldness-recalibrated sets by only setting `t_levels=0.95`. ```{r, fig.width=6, fig.height=4, fig.align='center'} lineplot(df=lp$df, t_levels=c(0.95)) ``` ### Omitting the Original or MLE-Recalibrated sets from plotting via `plot_original` and `plot_MLE` To omit either the original set or the MLE-recalibrated set from plotting, users can specify `plot_original=FALSE` or `plot_MLE=FALSE`, respectively. Below is an example of omitting the original set by setting `plot_original=FALSE` while plotting the MLE-recalibrated and boldness-recalibration sets returned in `df` above. ```{r, fig.width=6, fig.height=4, fig.align='center'} lineplot(df=lp$df, t_levels=c(0.95, 0.9, 0.8), plot_original=FALSE) ``` ### Cosmetic Adjustments A key difference in how this plot is created compared to `plot_params()` is that it uses `ggplot2` instead of base `R` graphics. Users of `lineplot()` have quite a few options for making cosmetic changes to these plots. The first set of options we'll discuss are those related to the points and lines already present on the lineplot. To modify the appearance of the points, users can pass options to `geom_point()` via a list to `ggpoint_options`. For example, the code below shows how to change the shape and size of the points. ```{r, fig.width=6, fig.height=4, fig.align='center'} lineplot(df=lp$df, t_levels=c(0.95, 0.9, 0.8), ggpoint_options=list(size=3, shape=1)) ``` ```{r, fig.width=6, fig.height=4, fig.align='center'} lineplot(df=lp$df, t_levels=c(0.95, 0.9, 0.8), ggline_options=list(linewidth=2, alpha=0.1)) ``` An important thing to note about these two options is that their purpose is solely for adjustments that are not part of the `aes` statement in `geom_point()` or `geom_line()`. Should users specify `aes` options via `ggpoint_options` or `ggline_options`, this will cause `ggplot()` to create a new layer of points or lines on top of those following the default settings we use under the hood. Another option for making cosmetic adjustments is to use the convention of `ggplot2` and add additional `ggplot2` functions to the returned lineplot using `+`. For example, the code below shows how we could add the `scale_color_manual()` function to our plot to change the colors of the points and lines. Notice that `ggplot2` throws a warning about another color scale already being present. This is because under the hood we have already specified the default color scheme to be red and blue. A similar warning may appear in any case where a user adds a function that is already present in our original construction of the lineplot. ```{r, fig.width=6, fig.height=4, fig.align='center'} lineplot(df=lp$df, t_levels=c(0.95, 0.9, 0.8)) + scale_color_manual(values = c("orchid", "darkgreen")) ``` ### Thinning Another strategy to save time when plotting is to reduce, or "thin", the amount of data plotted. When sample sizes are large, the plot can become overcrowded and slow to plot. We provide three options for thinning: * `thin_to`: the observations in (x,y) are randomly sampled without replacement to form a set of size `thin_to` * `thin_prop`: the observations in (x,y) are randomly sampled without replacement to form a set that is `thin_prop` * 100% of the original size of (x,y) * `thin_by`: the observations in (x,y) are thinned by selecting every `thin_by` observation By default, all three of these settings are set to `NULL`, meaning no thinning is performed. Users can only specify one thinning strategy at a time. Care should be taken in selecting a thinning approach based on the nature of your data and problem. Note that MLE recalibration and boldness-recalibration will be done using the full set, so the posterior model probabilities are those of the full set. Also note that if a thinning strategy is used with `return_df=TRUE`, the returned data frame will **only contain the reduced set** (i.e. the data *after* thinning). Below is an example of each of these strategies. Notice that each plot looks a little different since the observations selected are a little different under each strategy. ```{r, fig.width=10, fig.height=4, fig.align='center'} lp1 <- lineplot(df=lp$df, t_levels=c(0.95, 0.9, 0.8), thin_to=500) lp2 <- lineplot(df=lp$df, t_levels=c(0.95, 0.9, 0.8), thin_prop=0.5) lp3 <- lineplot(df=lp$df, t_levels=c(0.95, 0.9, 0.8), thin_by=2) gridExtra::grid.arrange(lp1, lp2, lp3, ncol=3) ``` By default, there is a seed set so that the observations selected stay the same through successive calls using the same thinning strategy. To change the seed, users can specify their own via `seed`. Notice in the example below that we're using the same strategy as panel one in the above set of plots but the points are slightly different. ```{r, fig.width=6, fig.height=4, fig.align='center'} lineplot(df=lp$df, t_levels=c(0.95, 0.9, 0.8), thin_to=500, seed = 47) ``` Alternatively, if users would prefer no seed be set, they can set `seed=NULL`. In the plot below, we again have a different set of points and lines. In fact, each compilation of this vignette will have a different set. ```{r, fig.width=6, fig.height=4, fig.align='center'} lineplot(df=lp$df, t_levels=c(0.95, 0.9, 0.8), thin_to=500, seed=NULL) ``` ### Passing Additional Arguments to `optim` and `nloptr` Similar to the `plot_params` function, this function allows users to specify additional arguments to `optim` via `optim_options`. Additionally, we also allow users to pass arguments to `nloptr` via `nloptr_options` in a similar fashion. In this case, the list passed via `nloptr_options` is passed to `opts` in the `brcal` function. For more information on how to structure this list, see [Passing Additional / Different Arguments to `nloptr()`](#nloptr_opts). ```{r include=FALSE} # Reset user options and par options(original_options) par(original_par) ``` # References