---
title: "Introduction to metafrontier"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to metafrontier}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## What is a metafrontier?

In efficiency analysis, we often study firms that operate under fundamentally different technologies. Steel producers using electric arc furnaces (EAF) face a different production possibility set than those using the blast furnace-basic oxygen furnace (BF-BOF) route. Hospitals in rural areas face different constraints than urban ones. Banks in developing economies operate under different regulatory and technological environments than those in advanced economies.

Standard stochastic frontier analysis (SFA) or data envelopment analysis (DEA) applied to the pooled sample implicitly assumes that all firms share the same technology, an assumption that may be unrealistic. Estimating a separate frontier for each group solves this problem but makes efficiency scores incomparable across groups: a firm that is 90% efficient relative to a less advanced group frontier may actually be less productive than a firm that is 70% efficient relative to a more advanced frontier.

The **metafrontier** framework, introduced by Battese, Rao, and O'Donnell (2004) and extended by O'Donnell, Rao, and Battese (2008) and Huang, Huang, and Liu (2014), resolves this by:

1. Estimating **group-specific frontiers** for each technology group
2. Estimating a **metafrontier** that envelops all group frontiers
3. Decomposing efficiency into two components:

$$TE^*_i = TE_i \times TGR_i$$

where:

- $TE_i$ is efficiency relative to the **group frontier** (within-group inefficiency)
- $TGR_i$ is the **technology gap ratio**, measuring how close the group frontier is to the metafrontier (between-group technology gap)
- $TE^*_i$ is efficiency relative to the **metafrontier** (overall efficiency)

The `metafrontier` package provides a unified interface for estimating metafrontier models using both SFA and DEA approaches.

## Quick start

```{r setup}
library(metafrontier)
```

### Simulate data

The package includes `simulate_metafrontier()` for generating data from a known data-generating process. This is useful for Monte Carlo studies and for learning the package.

```{r simulate}
sim <- simulate_metafrontier(
  n_groups = 3,
  n_per_group = 200,
  beta_meta = c(1.0, 0.5, 0.3),  # intercept, elasticity_1, elasticity_2
  tech_gap = c(0, 0.25, 0.5),    # intercept shifts (0 = best technology)
  sigma_u = c(0.2, 0.3, 0.4),    # inefficiency SD per group
  sigma_v = 0.15,                # noise SD
  seed = 42
)

str(sim$data[, c("log_y", "log_x1", "log_x2", "group")])
table(sim$data$group)
```

The simulation generates a Cobb-Douglas frontier:

$$\ln y_i = \beta_0^{(j)} + \beta_1 \ln x_{1i} + \beta_2 \ln x_{2i} + v_i - u_i$$

where the intercept $\beta_0^{(j)} = \beta_0^* - \delta_j$ is shifted down from the metafrontier intercept $\beta_0^*$ by the technology gap $\delta_j$ for group $j$.

### Estimate the metafrontier

```{r estimate}
fit <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  method = "sfa",
  meta_type = "deterministic"
)
fit
```

## Deterministic SFA metafrontier (Battese, Rao, and O'Donnell, 2004)

The deterministic metafrontier is estimated in two stages:

1. **Stage 1**: Fit separate SFA models for each group via maximum likelihood.
2. **Stage 2**: Find metafrontier coefficients $\hat\beta^*$ by minimising
   $$\sum_i \left[\ln f(x_i; \hat\beta^*) - \ln f(x_i; \hat\beta_j)\right]^2$$
   subject to the constraint that the metafrontier envelops all group frontiers: $\ln f(x_i; \hat\beta^*) \ge \ln f(x_i; \hat\beta_j)$ for all $i$ and $j$.

This is the default method:

```{r deterministic}
fit_det <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  meta_type = "deterministic"
)
summary(fit_det)
```

## Stochastic metafrontier (Huang, Huang, and Liu, 2014)

The stochastic metafrontier replaces the constrained optimisation in Stage 2 with a second-stage SFA, using the fitted group frontier values as the dependent variable:

$$\ln \hat{f}(x_i; \hat\beta_j) = x_i'\beta^* + v^*_i - u^*_i$$

where $u^*_i \ge 0$ captures the technology gap stochastically. This provides a distributional framework for the TGR, enabling standard errors and hypothesis testing.

```{r stochastic}
fit_sto <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  meta_type = "stochastic"
)
summary(fit_sto)
```

The stochastic metafrontier provides a variance-covariance matrix:

```{r vcov}
vcov(fit_sto)
```

## DEA-based metafrontier

For a nonparametric approach, set `method = "dea"`:

```{r dea}
fit_dea <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  method = "dea",
  rts = "vrs"
)
fit_dea
```

The DEA metafrontier computes:

1. Group-specific DEA efficiencies under variable returns to scale (VRS), constant returns to scale (CRS), or other assumptions.
2. A pooled DEA across all observations to form the metafrontier.
3. $TGR_i = TE^{pool}_i / TE^{group}_i$.
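
The ratio in step 3 can be checked by hand. The chunk below is a minimal base-R sketch with made-up efficiency scores; the vectors `te_group_demo` and `te_pool_demo` are hypothetical and are not produced by the package. It illustrates only the arithmetic, not how `metafrontier()` computes DEA scores internally.

```{r dea-tgr-sketch}
# Hypothetical DEA scores in (0, 1] for three firms (illustrative values only)
te_group_demo <- c(0.90, 0.80, 1.00)  # efficiency against the firm's own group frontier
te_pool_demo  <- c(0.72, 0.80, 0.85)  # efficiency against the pooled (meta) frontier

# TGR = pooled efficiency / group efficiency
tgr_demo <- te_pool_demo / te_group_demo
tgr_demo

# The pooled frontier envelops every group frontier, so no TGR exceeds 1
all(tgr_demo <= 1)
```

In this toy example the second firm has a TGR of 1, meaning its group frontier coincides with the metafrontier at that firm's input mix, while the third firm is fully efficient within its group but still faces a technology gap.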
## Extracting results

### Efficiency scores

Use `efficiencies()` to extract the three components of the decomposition:

```{r efficiencies}
te <- efficiencies(fit_det, type = "group")
tgr <- efficiencies(fit_det, type = "tgr")
te_star <- efficiencies(fit_det, type = "meta")

# Verify the fundamental identity: TE* = TE x TGR
all.equal(te_star, te * tgr)
```

### Technology gap ratio

The `technology_gap_ratio()` function returns TGR values grouped by technology:

```{r tgr}
tgr_by_group <- technology_gap_ratio(fit_det)
lapply(tgr_by_group, summary)
```

For a formatted summary table:

```{r tgr-summary}
tgr_summary(fit_det)
```

### Coefficients

```{r coefs}
# Metafrontier coefficients
coef(fit_det, which = "meta")

# Group-specific coefficients
coef(fit_det, which = "group")
```

### Model information

```{r model-info}
# Log-likelihood (sum of group log-likelihoods for deterministic)
logLik(fit_det)

# Number of observations
nobs(fit_det)

# AIC (BIC is likewise available via the logLik method)
AIC(fit_det)
```

## Visualisation

The package provides built-in plot types, three of which are shown here:

### TGR distributions

```{r plot-tgr, fig.height=4}
plot(fit_det, which = "tgr")
```

### Efficiency scatter

```{r plot-eff, fig.height=4}
plot(fit_det, which = "efficiency")
```

Points below the 45-degree line indicate a technology gap ($TE^* < TE$). The vertical distance from the line reflects the TGR.

### Efficiency decomposition

```{r plot-decomp, fig.height=4, fig.width=9}
plot(fit_det, which = "decomposition")
```

This plot shows side-by-side boxplots of TE, TGR, and TE* by group.

## Hypothesis testing

### Poolability test

The poolability test evaluates whether group-specific frontiers are statistically different from a single pooled frontier:

```{r poolability}
poolability_test(fit_det)
```

A significant result (small p-value) indicates that the technology groups have genuinely different production technologies, justifying the metafrontier approach.
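
A common way to carry out this kind of test in the metafrontier literature is a likelihood-ratio comparison of the pooled frontier with the set of group frontiers. The sketch below uses hypothetical log-likelihood values and parameter counts in base R; whether `poolability_test()` computes exactly this statistic is an assumption, so treat it as an illustration of the idea rather than a description of the package internals.

```{r lr-sketch}
# Hypothetical log-likelihoods (illustrative values, not package output)
ll_groups_demo <- c(-40.2, -55.1, -61.7)  # one SFA fit per group
ll_pooled_demo <- -190.5                  # single SFA fit on the pooled sample

# Assumed parameter count per frontier: 3 betas + sigma_u + sigma_v = 5
k_per_frontier <- 5
df_demo <- k_per_frontier * length(ll_groups_demo) - k_per_frontier

# LR statistic: twice the gain in log-likelihood from allowing separate frontiers
lr_demo <- 2 * (sum(ll_groups_demo) - ll_pooled_demo)
p_demo  <- pchisq(lr_demo, df = df_demo, lower.tail = FALSE)
c(LR = lr_demo, df = df_demo, p_value = p_demo)
```

The degrees of freedom equal the number of extra parameters that the separate group frontiers introduce over the pooled model.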
## Inefficiency distributions

The package supports three distributional assumptions for the one-sided inefficiency term $u_i$ in SFA:

```{r distributions, eval=FALSE}
# Half-normal (default): u ~ |N(0, sigma_u^2)|
fit_hn <- metafrontier(log_y ~ log_x1 + log_x2,
                       data = sim$data, group = "group",
                       dist = "hnormal")

# Truncated normal: u ~ N+(mu, sigma_u^2)
fit_tn <- metafrontier(log_y ~ log_x1 + log_x2,
                       data = sim$data, group = "group",
                       dist = "tnormal")

# Exponential: u ~ Exp(1/sigma_u)
fit_exp <- metafrontier(log_y ~ log_x1 + log_x2,
                        data = sim$data, group = "group",
                        dist = "exponential")
```

## Comparing true and estimated values

Because the data are simulated, we can compare the estimates against the truth:

```{r compare-truth}
# True vs estimated metafrontier coefficients
cbind(
  True = sim$params$beta_meta,
  Estimated = coef(fit_det, which = "meta")
)

# True vs estimated mean TGR by group
true_tgr <- tapply(sim$data$true_tgr, sim$data$group, mean)
est_tgr <- tapply(fit_det$tgr, fit_det$group_vec, mean)
cbind(True = true_tgr, Estimated = est_tgr)

# Correlation between true and estimated efficiency
cor(sim$data$true_te, fit_det$te_group)
cor(sim$data$true_te_star, fit_det$te_meta)
```

## Panel SFA metafrontier

The package supports panel data via the Battese and Coelli (1992, 1995) models.
Use the `panel` argument:

```{r panel-sfa, eval=FALSE}
# Simulate panel data
panel_sim <- simulate_panel_metafrontier(
  n_groups = 2,
  n_firms_per_group = 20,
  n_periods = 5,
  seed = 42
)

# BC92: time-varying inefficiency u_it = u_i * exp(-eta*(t-T))
fit_panel <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = panel_sim$data,
  group = "group",
  panel = list(id = "firm", time = "year"),
  panel_dist = "bc92"
)
summary(fit_panel)

# The eta parameter captures time-varying inefficiency
# eta > 0: inefficiency decreasing over time
# eta < 0: inefficiency increasing over time
```

## Bootstrap confidence intervals for TGR

The `boot_tgr()` function provides parametric and nonparametric bootstrap confidence intervals for the technology gap ratio:

```{r bootstrap, eval=FALSE}
sim <- simulate_metafrontier(n_groups = 2, n_per_group = 100, seed = 42)
fit <- metafrontier(log_y ~ log_x1 + log_x2,
                    data = sim$data, group = "group",
                    meta_type = "stochastic")

# Nonparametric bootstrap (case resampling within groups)
boot <- boot_tgr(fit, R = 499, type = "nonparametric", seed = 1)
print(boot)

# Observation-level CIs
ci <- confint(boot)
head(ci)

# Group-level mean TGR CIs
boot$ci_group

# Parametric bootstrap (resample from estimated error distributions)
boot_par <- boot_tgr(fit, R = 499, type = "parametric", seed = 1)
```

## Murphy-Topel variance correction

The stochastic metafrontier is a two-stage estimator in which Stage 2 uses the fitted values from Stage 1 as its dependent variable. Because these values are themselves estimates, naive Stage 2 standard errors ignore the first-stage sampling error and therefore tend to understate uncertainty, a situation analogous to the "generated regressor" problem.
The Murphy-Topel (1985) correction adjusts for this:

```{r murphy-topel, eval=FALSE}
fit <- metafrontier(log_y ~ log_x1 + log_x2,
                    data = sim$data, group = "group",
                    meta_type = "stochastic")

# Naive (uncorrected) standard errors
vcov(fit)

# Murphy-Topel corrected standard errors
vcov(fit, correction = "murphy-topel")

# Corrected confidence intervals
confint(fit, correction = "murphy-topel")
```

## Latent class metafrontier

When group membership is unobserved, use `latent_class_metafrontier()`:

```{r latent-class, eval=FALSE}
sim <- simulate_metafrontier(n_groups = 2, n_per_group = 100, seed = 42)

# Fit with 2 latent classes
lc <- latent_class_metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  n_classes = 2,
  n_starts = 5,
  seed = 123
)
print(lc)
summary(lc)

# Select optimal number of classes via BIC
bic_table <- select_n_classes(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  n_classes_range = 2:4,
  n_starts = 3,
  seed = 42
)
print(bic_table)  # choose n_classes with lowest BIC
```

## Directional distance functions (DDF)

For an additive efficiency decomposition, use the DDF-based metafrontier:

```{r ddf, eval=FALSE}
sim <- simulate_metafrontier(n_groups = 2, n_per_group = 50, seed = 42)

# Use raw (non-log) data for DEA
sim$data$y <- exp(sim$data$log_y)
sim$data$x1 <- exp(sim$data$log_x1)
sim$data$x2 <- exp(sim$data$log_x2)

fit_ddf <- metafrontier(
  y ~ x1 + x2,
  data = sim$data,
  group = "group",
  method = "dea",
  type = "directional",
  direction = "output"
)
summary(fit_ddf)

# Additive decomposition: beta_meta = beta_group + ddf_tgr
head(data.frame(
  beta_meta = fit_ddf$beta_meta,
  beta_group = fit_ddf$beta_group,
  ddf_tgr = fit_ddf$ddf_tgr
))
```

## References

- Battese, G.E., Rao, D.S.P. and O'Donnell, C.J. (2004). A metafrontier production function for estimation of technical efficiencies and technology gaps for firms operating under different technologies. *Journal of Productivity Analysis*, 21(1), 91--103.
- Huang, C.J., Huang, T.-H. and Liu, N.-H. (2014). A new approach to estimating the metafrontier production function based on a stochastic frontier framework. *Journal of Productivity Analysis*, 42(3), 241--254.
- O'Donnell, C.J., Rao, D.S.P. and Battese, G.E. (2008). Metafrontier frameworks for the study of firm-level efficiencies and technology ratios. *Empirical Economics*, 34(2), 231--255.