---
title: "Getting Started with likelihood.model"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Getting Started with likelihood.model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4
)
old_opts <- options(digits = 4)
```

## Introduction

The `likelihood.model` package provides a flexible framework for specifying and using likelihood models for statistical inference in R. It supports:

- Standard R distributions (normal, Weibull, exponential, etc.)
- Censored data (left, right, and interval censoring)
- Custom likelihood contributions for complex models
- Maximum likelihood estimation with confidence intervals
- Bootstrap sampling distributions

This vignette provides a quick introduction to the main features.

## Installation

```{r install, eval=FALSE}
# Install from GitHub
if (!require(devtools)) install.packages("devtools")
devtools::install_github("queelius/likelihood.model")
```

## Loading Packages

```{r load, message=FALSE, warning=FALSE}
library(likelihood.model)
```

## Example 1: Basic MLE with Weibull Distribution

Let's start with a simple example: fitting a Weibull distribution to survival data using the `weibull_uncensored` model, which provides analytical derivatives for faster and more accurate estimation.
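As a sketch of where those analytical derivatives come from (assuming the shape/scale parameterization used by `rweibull`, with shape $k$ and scale $\lambda$), the log-likelihood for exact observations $x_1, \dots, x_n$ is

$$
\ell(k, \lambda) = n \log k - n k \log \lambda + (k - 1) \sum_{i=1}^n \log x_i - \sum_{i=1}^n \left( \frac{x_i}{\lambda} \right)^k,
$$

with score components

$$
\frac{\partial \ell}{\partial k} = \frac{n}{k} + \sum_{i=1}^n \log\frac{x_i}{\lambda} - \sum_{i=1}^n \left(\frac{x_i}{\lambda}\right)^k \log\frac{x_i}{\lambda},
\qquad
\frac{\partial \ell}{\partial \lambda} = \frac{k}{\lambda}\left[\sum_{i=1}^n \left(\frac{x_i}{\lambda}\right)^k - n\right].
$$

Supplying these in closed form avoids the numerical differentiation a generic optimizer would otherwise perform.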
```{r weibull-example}
# Generate synthetic survival data
set.seed(42)
n <- 150
true_shape <- 2.5
true_scale <- 3.0
df <- data.frame(x = rweibull(n, shape = true_shape, scale = true_scale))

# Create the likelihood model
model <- weibull_uncensored("x")

# View model assumptions
assumptions(model)

# Fit the MLE
solver <- fit(model)
mle <- solver(df, par = c(1, 1))  # initial guess: shape=1, scale=1

# View results
summary(mle)
```

### Extracting Results

```{r weibull-results}
# Parameter estimates
cat("Estimated parameters:\n")
print(coef(mle))

# Confidence intervals (Wald-based)
cat("\n95% Confidence Intervals:\n")
print(confint(mle))

# Standard errors
cat("\nStandard Errors:\n")
print(se(mle))

# Log-likelihood value
cat("\nLog-likelihood:", loglik_val(mle), "\n")

# AIC for model selection
cat("AIC:", aic(mle), "\n")
```

## Example 2: Handling Censored Data

A key feature of `likelihood.model` is proper handling of censored data. Let's simulate data with right-censoring and show how ignoring censoring leads to biased estimates.
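To see why censoring must enter the likelihood, consider the standard censored-data likelihood (a general sketch, not package-specific notation). With density $f$ and survival function $S = 1 - F$, each exact observation contributes its density, while each observation right-censored at $c_i$ contributes only the probability of surviving past $c_i$:

$$
L(\theta) = \prod_{i \,:\, \text{exact}} f(x_i; \theta) \prod_{i \,:\, \text{right}} S(c_i; \theta).
$$

Treating censored values as if they were exact replaces $S(c_i; \theta)$ with $f(c_i; \theta)$, systematically understating both the mean and the spread.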
```{r censoring-example}
# Generate normal data with right-censoring
set.seed(123)
n <- 200
true_mean <- 10
true_sd <- 2
censor_time <- 11

# Generate latent (true) values
x_latent <- rnorm(n, true_mean, true_sd)

# Apply censoring
censored <- x_latent > censor_time
df_cens <- data.frame(
  x = ifelse(censored, censor_time, x_latent),
  censor = ifelse(censored, "right", "exact")
)

cat("Censoring rate:", mean(censored) * 100, "%\n")
```

### Fitting with Proper Censoring

```{r censoring-fit}
# Create model that handles censoring
model_cens <- likelihood_name("norm", ob_col = "x", censor_col = "censor")

# Fit the model
solver_cens <- fit(model_cens)
mle_cens <- suppressWarnings(solver_cens(df_cens, par = c(0, 1)))

cat("MLE (accounting for censoring):\n")
cat("  Mean:", coef(mle_cens)[1], "(true:", true_mean, ")\n")
cat("  SD: ", coef(mle_cens)[2], "(true:", true_sd, ")\n")
```

### Comparison: Naive vs Proper Estimation

```{r censoring-comparison}
# Naive estimates (ignoring censoring)
naive_mean <- mean(df_cens$x)
naive_sd <- sd(df_cens$x)

cat("\nComparison of estimates:\n")
cat("                   Mean     SD\n")
cat(sprintf("True values:    %7.3f  %5.3f\n", true_mean, true_sd))
cat(sprintf("MLE (correct):  %7.3f  %5.3f\n", coef(mle_cens)[1], coef(mle_cens)[2]))
cat(sprintf("Naive (biased): %7.3f  %5.3f\n", naive_mean, naive_sd))

cat("\nNote: Naive estimates are biased downward due to censoring!\n")
```

## Example 3: Bootstrap Sampling Distribution

For more robust inference, especially with small samples, we can use bootstrap methods to estimate the sampling distribution of the MLE.
```{r bootstrap-example, cache=TRUE}
# Using the Weibull example from before
model <- weibull_uncensored("x")
df <- data.frame(x = rweibull(150, shape = 2.5, scale = 3.0))

# Create bootstrap sampler
boot_sampler <- sampler(model, df = df, par = c(1, 1))

# Generate bootstrap samples (100 replicates for speed)
boot_result <- boot_sampler(n = 100)

# Compare asymptotic vs bootstrap standard errors
mle <- fit(model)(df, par = c(1, 1))
asymp_se <- se(mle)
boot_se <- se(boot_result)  # Uses vcov from bootstrap replicates

cat("Standard Error Comparison:\n")
cat("        Asymptotic  Bootstrap\n")
cat(sprintf("Shape: %10.4f  %9.4f\n", asymp_se[1], boot_se[1]))
cat(sprintf("Scale: %10.4f  %9.4f\n", asymp_se[2], boot_se[2]))

# Bootstrap bias estimate
cat("\nBootstrap Bias Estimate:\n")
print(bias(boot_result))
```

## Example 4: Verifying the Score at the MLE

At the MLE, the score function (gradient of the log-likelihood) should be approximately zero. This provides a useful diagnostic.

```{r score-check}
model <- weibull_uncensored("x")
df <- data.frame(x = rweibull(100, 2, 1.5))

# Fit MLE
mle <- fit(model)(df, par = c(1, 1))

# Evaluate score at MLE
score_func <- score(model)
score_at_mle <- score_func(df, coef(mle))

cat("Score at MLE (should be near zero):\n")
print(score_at_mle)

# The score is also stored in the MLE object
cat("\nScore stored in MLE object:\n")
print(score_val(mle))
```

The score values are very close to zero, confirming that the optimizer successfully found the MLE (a stationary point of the log-likelihood).

## Example 5: Fisherian Likelihood Inference

The package supports pure likelihood-based inference in the Fisherian tradition. Instead of confidence intervals, we can compute likelihood intervals that make no probability statements about parameters.
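The two central quantities are the support $S(\theta) = \ell(\theta) - \ell(\hat{\theta})$ and the relative likelihood $R(\theta) = e^{S(\theta)}$; a $1/k$ likelihood interval collects all $\theta$ with $R(\theta) \ge 1/k$. As a rough asymptotic calibration for a single parameter (a sketch via Wilks' theorem, not a package guarantee), $-2 S(\theta)$ is approximately $\chi^2_1$, so

$$
R(\theta) \ge \frac{1}{k} \iff -2 S(\theta) \le 2 \log k,
$$

and with $k = 8$ the cutoff $2 \log 8 \approx 4.16$ slightly exceeds the 95% $\chi^2_1$ quantile of $3.84$. This is why a $1/8$ likelihood interval is often described as roughly comparable to, and slightly wider than, a 95% confidence interval.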
```{r fisherian-example}
# Fit a model
model <- weibull_uncensored("x")
df <- data.frame(x = rweibull(100, 2.0, 1.5))
mle <- fit(model)(df, par = c(1, 1))

# Support function: log relative likelihood
# S(theta) = logL(theta) - logL(theta_hat)
# Support at MLE is always 0
s_at_mle <- support(mle, coef(mle), df, model)
cat("Support at MLE:", s_at_mle, "\n")

# Support at alternative parameter values (negative = less supported)
s_alt <- support(mle, c(1.5, 1.0), df, model)
cat("Support at (1.5, 1.0):", s_alt, "\n")

# Relative likelihood = exp(support)
rl <- relative_likelihood(mle, c(1.5, 1.0), df, model)
cat("Relative likelihood at (1.5, 1.0):", rl, "\n")
```

Negative support and a very small relative likelihood indicate that the alternative parameter values $(1.5, 1.0)$ are poorly supported by the data compared to the MLE.

### Likelihood Intervals

```{r likelihood-intervals}
# Compute 1/8 likelihood interval (roughly equivalent to 95% CI)
# This is the set of theta where R(theta) >= 1/8
li <- likelihood_interval(mle, df, model, k = 8)
print(li)

# Compare with Wald confidence interval
cat("\nWald 95% CI:\n")
print(confint(mle))
```

## Example 6: Closed-Form MLE with Exponential Lifetime Model

The `exponential_lifetime` model demonstrates a key design feature: specialized models can override `fit()` to bypass `optim` entirely when the MLE has a closed-form solution. For Exponential($\lambda$) data, the MLE is simply $\hat{\lambda} = d / T$, where $d$ is the number of uncensored observations and $T$ is the total observation time.

```{r exponential-example}
# Generate exponential survival data
set.seed(99)
df_exp <- data.frame(t = rexp(200, rate = 3.0))

# Create model and fit -- no initial guess needed!
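# (First, an illustrative hand check of the closed-form formula, not package
# API: with no censoring, d = n and T = sum(t), so lambda_hat = n / sum(t).)
lambda_hat_manual <- nrow(df_exp) / sum(df_exp$t)
cat("Hand-computed d/T:", lambda_hat_manual, "\n")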
model_exp <- exponential_lifetime("t")
mle_exp <- fit(model_exp)(df_exp)

cat("Closed-form MLE (no optim):\n")
cat("  lambda_hat:", coef(mle_exp), "(true: 3.0)\n")
cat("  SE:", se(mle_exp), "\n")

# The score at the MLE is exactly zero (by construction)
cat("  Score at MLE:", score_val(mle_exp), "\n")
```

### Right-Censored Exponential Data

The model handles right-censoring naturally through the sufficient statistics.

```{r exponential-censored}
# Generate data with right-censoring at time 0.5
set.seed(99)
true_lambda <- 3.0
x <- rexp(200, rate = true_lambda)
censored <- x > 0.5
df_cens_exp <- data.frame(
  t = pmin(x, 0.5),
  status = ifelse(censored, "right", "exact")
)
cat("Censoring rate:", mean(censored) * 100, "%\n")

model_cens_exp <- exponential_lifetime("t", censor_col = "status")
mle_cens_exp <- fit(model_cens_exp)(df_cens_exp)

cat("MLE (censored):", coef(mle_cens_exp), "(true:", true_lambda, ")\n")
cat("95% CI:", confint(mle_cens_exp)[1, ], "\n")
```

### Cross-Validation: Matches `likelihood_name("exp", ...)`

The analytical model produces the same log-likelihood as the generic approach:

```{r exponential-crossval}
# Compare log-likelihood values at the MLE
lambda_hat <- coef(mle_exp)
ll_analytical <- loglik(model_exp)(df_exp, lambda_hat)

# likelihood_name("exp") uses dexp(x, rate), so pass unnamed parameter
df_exp_name <- data.frame(t = df_exp$t, censor = rep("exact", nrow(df_exp)))
model_generic <- likelihood_name("exp", ob_col = "t", censor_col = "censor")
ll_generic <- loglik(model_generic)(df_exp_name, unname(lambda_hat))

cat("Analytical loglik:", ll_analytical, "\n")
cat("Generic loglik:   ", ll_generic, "\n")
cat("Match:", isTRUE(all.equal(unname(ll_analytical), ll_generic, tolerance = 1e-6)), "\n")
```

## Summary of Key Functions

### Model Creation

| Function | Description |
|----------|-------------|
| `likelihood_name()` | Create model for standard R distributions |
| `weibull_uncensored()` | Weibull model with analytical derivatives (exact obs.) |
| `exponential_lifetime()` | Exponential model with closed-form MLE and optional censoring |
| `likelihood_contr_model$new()` | Custom likelihood contributions |

### Likelihood Calculus

| Function | Description |
|----------|-------------|
| `loglik()` | Get log-likelihood function |
| `score()` | Get score (gradient) function |
| `hess_loglik()` | Get Hessian of log-likelihood |
| `fim()` | Fisher information matrix |
| `observed_info()` | Observed information matrix |

### Estimation and Inference

| Function | Description |
|----------|-------------|
| `fit()` | Create MLE solver (returns `fisher_mle`) |
| `sampler()` | Create bootstrap sampler (returns `fisher_boot`) |
| `coef()` | Extract parameter estimates |
| `vcov()` | Variance-covariance matrix |
| `confint()` | Confidence intervals |
| `se()` | Standard errors |

### Fisherian Inference

| Function | Description |
|----------|-------------|
| `support()` | Log relative likelihood: S(θ) = logL(θ) - logL(θ̂) |
| `relative_likelihood()` | Likelihood ratio: R(θ) = L(θ)/L(θ̂) |
| `likelihood_interval()` | Interval where R(θ) ≥ 1/k |
| `profile_loglik()` | Profile log-likelihood |
| `evidence()` | Evidence for θ₁ vs θ₂ |

### Model Comparison

| Function | Description |
|----------|-------------|
| `lrt()` | Likelihood ratio test |
| `aic()` | Akaike Information Criterion |
| `bic()` | Bayesian Information Criterion |
| `deviance()` | Deviance statistic |

## Next Steps

For more advanced usage, see the other vignettes:

- **Likelihood Named Distributions Model**: Detailed coverage of `likelihood_name` with various censoring types
- **Likelihood Contributions Model**: Building custom models for complex scenarios like series systems with component failures

## Session Info

```{r session}
sessionInfo()
```

```{r cleanup, include=FALSE}
options(old_opts)
```