5. Agreement and ICC for Wide Data

Scope

Agreement and reliability are related to correlation, but they are not the same problem. Correlation describes co-movement. Agreement describes similarity on the measurement scale itself. Reliability describes the proportion of variation attributable to stable differences among subjects rather than to measurement error or method disagreement.

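This vignette focuses on ccc() for wide data and uses the related agreement and reliability functions as context: ba(), icc(), cia(), and the repeated-measures variants ccc_rm_reml(), ccc_rm_ustat(), icc_rm_reml(), and cia_rm().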

Pairwise concordance and Bland-Altman analysis

Lin’s concordance correlation coefficient combines precision and accuracy in a single number. In matrixCorr, ccc() computes Lin’s pairwise CCC for numeric wide data and optionally returns large-sample confidence intervals. No formal hypothesis test is implemented; inference is based on the estimate and its confidence interval.
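For orientation, the population form of Lin's coefficient (Lin, 1989) can be written as follows. The symbols are notation introduced here rather than names used by the package: the means, variances, and covariance of the two methods are denoted by mu, sigma squared, and sigma_xy, rho is the Pearson correlation, and C_b is the bias-correction (accuracy) factor.

```latex
\rho_c
  = \frac{2\,\sigma_{xy}}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}
  = \rho \, C_b,
\qquad
C_b = \frac{2}{v + 1/v + u^2},
\quad
v = \frac{\sigma_x}{\sigma_y},
\quad
u = \frac{\mu_x - \mu_y}{\sqrt{\sigma_x \sigma_y}} .
```

The factor rho measures precision (how tightly the points follow a line), while C_b measures accuracy (how close that line is to the 45-degree line). C_b equals 1 only when the two methods share the same mean and variance, so the CCC can never exceed the Pearson correlation in absolute value.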

Bland-Altman analysis separates the agreement question into estimated bias and limits of agreement.

library(matrixCorr)

set.seed(40)
ref <- rnorm(50, mean = 100, sd = 10)
m1 <- ref + rnorm(50, sd = 2)
m2 <- ref + 1.2 + rnorm(50, sd = 3)

fit_ba <- ba(m1, m2)
fit_ccc <- ccc(data.frame(m1 = m1, m2 = m2), ci = TRUE)

print(fit_ba)
#> Bland-Altman preview:
#>   based_on    : 50
#>   loa_rule    : mean +/- 1.96 * SD
#>   ci          : 95%
#>   sd_diff     : 3.722
#>   width       : 14.589
#> 
#>  quantity        estimate lwr     upr   
#>  Mean difference -1.290   -2.347  -0.232
#>  Lower LoA       -8.584   -10.416 -6.752
#>  Upper LoA       6.005    4.173   7.837
summary(fit_ccc)
#> Lin's concordance summary
#>   dimensions  : 2 x 2
#>   pairs       : 1
#>   estimate    : 0.9299
#>   most_negative: m1-m2 (0.9299)
#>   most_positive: m1-m2 (0.9299)
#>   ci          : 95%
#>   ci_method   : lin_delta_fisher_z
#>   ci_width    : 0.08
#> 
#>  item1 item2 estimate n  95% CI      
#>  m1    m2    0.9299   50 [0.88, 0.96]
estimate(fit_ccc)
#>           m1        m2
#> m1 1.0000000 0.9299231
#> m2 0.9299231 1.0000000
confint(fit_ccc)
#>   item1 item2       lwr       upr
#> 1    m1    m2 0.8804487 0.9593657
ci(fit_ccc)
#> $lwr
#>           m1        m2
#> m1 1.0000000 0.8804487
#> m2 0.8804487 1.0000000
#> 
#> $upr
#>           m1        m2
#> m1 1.0000000 0.9593657
#> m2 0.9593657 1.0000000
#> 
#> $conf.level
#> [1] 0.95
#> 
#> $ci.method
#> [1] "lin_delta_fisher_z"
tidy(fit_ccc)
#>     item1 item2  estimate       lwr       upr
#> row    m1    m2 0.9299231 0.8804487 0.9593657

The two summaries are complementary rather than redundant. ccc() gives a single concordance coefficient, while ba() makes the scale of disagreement explicit.
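The quantities behind both summaries can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes Lin's moment estimators with 1/n denominators for the CCC and the usual sample standard deviation for the differences, and ccc() or ba() may scale variances differently.

```r
# Base-R sketch of the quantities behind ba() and ccc().
# Assumption: 1/n moment estimators for the CCC, as in Lin (1989).
set.seed(40)
ref <- rnorm(50, mean = 100, sd = 10)
m1 <- ref + rnorm(50, sd = 2)
m2 <- ref + 1.2 + rnorm(50, sd = 3)

# Bland-Altman: bias and 95% limits of agreement on the measurement scale
d <- m1 - m2
bias <- mean(d)
sd_diff <- sd(d)
loa <- bias + c(lower = -1.96, upper = 1.96) * sd_diff

# Lin's CCC: covariance penalised by scale and location differences
s_xy <- mean((m1 - mean(m1)) * (m2 - mean(m2)))
s_xx <- mean((m1 - mean(m1))^2)
s_yy <- mean((m2 - mean(m2))^2)
ccc_hand <- 2 * s_xy / (s_xx + s_yy + (mean(m1) - mean(m2))^2)

# Decomposition: precision (Pearson r) times accuracy (C_b)
r <- cor(m1, m2)
c_b <- ccc_hand / r

round(c(bias = bias, sd_diff = sd_diff, loa,
        ccc = ccc_hand, r = r, C_b = c_b), 4)
```

The Bland-Altman quantities stay in the units of the measurement, whereas the CCC is unitless. A small C_b with a high r would indicate that the methods track each other closely but disagree in location or scale.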

If you have at least 3 methods in wide form, ba() can now compute every unordered Bland-Altman contrast directly:

fit_ba_pairwise <- ba(data.frame(m1 = m1, m2 = m2, m3 = ref - 0.8 + rnorm(50, sd = 2.5)))
print(fit_ba_pairwise)
#> 
#> Bland-Altman (row - column) (95% CI)
#> 
#>  bias   sd_loa loa_low loa_up width  n_obs
#>  -1.290 3.722  -8.584   6.005 14.589 50   
#>   1.502 3.461  -5.280   8.285 13.565 50   
#>   2.792 4.515  -6.058  11.642 17.700 50
summary(fit_ba_pairwise)
#> 
#> Bland-Altman (pairwise row - column) (95% CI)
#> 
#> Agreement estimates
#> 
#>  n_obs bias   sd_loa loa_low loa_up width  loa_multiplier
#>  50    -1.290 3.722  -8.584   6.005 14.589 1.96          
#>  50     1.502 3.461  -5.280   8.285 13.565 1.96          
#>  50     2.792 4.515  -6.058  11.642 17.700 1.96          
#> 
#> Confidence intervals
#> 
#>  bias_lwr bias_upr lo_lwr  lo_upr up_lwr up_upr
#>  -2.347   -0.232   -10.416 -6.752 4.173   7.837
#>   0.519    2.486    -6.984 -3.577 6.582   9.988
#>   1.509    4.075    -8.281 -3.835 9.419  13.864

Pairwise ICC

icc() extends the wide-data reliability workflow in two directions. It can return a pairwise matrix across method pairs, or it can return the overall classical ICC table for the full set of methods.

wide_methods <- data.frame(
  J1 = ref + rnorm(50, sd = 1.5),
  J2 = ref + 4.0 + rnorm(50, sd = 1.8),
  J3 = ref - 3.0 + rnorm(50, sd = 2.0),
  J4 = ref + rnorm(50, sd = 1.6)
)

fit_icc_pair <- icc(
  wide_methods,
  model = "twoway_random",
  type = "agreement",
  unit = "single",
  scope = "pairwise"
)

fit_icc_overall <- icc(
  wide_methods,
  model = "twoway_random",
  type = "agreement",
  unit = "single",
  scope = "overall",
  ci = TRUE
)

print(fit_icc_pair, digits = 2)
#> Intraclass correlation matrix
#>   method      : Intraclass correlation (two-way random, agreement, single)
#>   dimensions  : 4 x 4
#> 
#>      J1   J2   J3   J4
#> J1 1.00 0.90 0.94 0.98
#> J2 0.90 1.00 0.80 0.91
#> J3 0.94 0.80 1.00 0.93
#> J4 0.98 0.91 0.93 1.00
summary(fit_icc_pair)
#> Intraclass correlation summary
#>   method      : Intraclass correlation (two-way random, agreement, single)
#>   dimensions  : 4 x 4
#>   pairs       : 6
#>   n_complete  : 50
#>   estimate    : 0.7961 to 0.9817
#>   most_negative: J2-J3 (0.7961)
#>   most_positive: J1-J4 (0.9817)
#> 
#>  item1 item2 estimate n 
#>  J1    J4    0.9817   50
#>  J1    J3    0.9398   50
#>  J3    J4    0.9329   50
#>  J2    J4    0.9136   50
#>  J1    J2    0.9043   50
#>  J2    J3    0.7961   50
print(fit_icc_overall)
#> Overall intraclass correlation
#>   method      : Overall intraclass correlation table
#>   subjects    : 50
#>   raters      : 4
#>   selected    : ICC2
#> 
#> Coefficient table
#> 
#>  coefficient label            estimate ... upr    selected
#>  ICC1        Single absolute  0.9050   ... 0.9400 FALSE   
#>  ICC2        Single random    0.9067   ... 0.9647  TRUE   
#>  ICC3        Single fixed     0.9757   ... 0.9850 FALSE   
#>  ICC1k       Average absolute 0.9744   ... 0.9843 FALSE   
#>  ICC2k       Average random   0.9749   ... 0.9909 FALSE   
#>  ICC3k       Average fixed    0.9938   ... 0.9962 FALSE   
#> ... 5 more variables not shown (omitted)
#> Use as.data.frame()/tidy()/as.matrix() to inspect the full result.

Pairwise versus overall ICC

This is the most important distinction in the ICC interface.

scope = "pairwise" answers: “How reliable is each specific pair of methods?”

scope = "overall" answers: “How reliable is the full set of methods when analysed jointly?”

Those are different quantities. The overall ICC cannot, in general, be recovered by averaging the pairwise matrix.

Consistency versus agreement

This simulation also includes systematic method bias, so it is a natural place to contrast type = "consistency" with type = "agreement".

fit_icc_cons <- icc(
  wide_methods,
  model = "twoway_random",
  type = "consistency",
  unit = "single",
  scope = "overall",
  ci = FALSE
)

fit_icc_agr <- icc(
  wide_methods,
  model = "twoway_random",
  type = "agreement",
  unit = "single",
  scope = "overall",
  ci = FALSE
)

data.frame(
  type = c("consistency", "agreement"),
  selected_coefficient = c(
    attr(fit_icc_cons, "selected_coefficient"),
    attr(fit_icc_agr, "selected_coefficient")
  ),
  estimate = c(
    attr(fit_icc_cons, "selected_row")$estimate,
    attr(fit_icc_agr, "selected_row")$estimate
  )
)
#>          type selected_coefficient  estimate
#> 1 consistency                 ICC3 0.9757153
#> 2   agreement                 ICC2 0.9066943

Consistency discounts additive method shifts, whereas agreement penalises them. When methods differ mainly by a systematic offset, consistency can therefore look substantially better than agreement.
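The difference can be seen directly in the classical mean-square formulas (Shrout and Fleiss, 1979; McGraw and Wong, 1996). The sketch below is a base-R illustration on separately simulated data with a hypothetical seed and offsets. It is not the code used by icc(), and its numbers will not match the table above.

```r
# Base-R sketch: single-measure ICCs from two-way ANOVA mean squares.
set.seed(1)                      # hypothetical seed, not the vignette's data
n <- 50
k <- 4
truth <- rnorm(n, mean = 100, sd = 10)
offsets <- c(0, 4, -3, 0)        # systematic method shifts (illustrative)
x <- sapply(offsets, function(b) truth + b + rnorm(n, sd = 1.8))

grand <- mean(x)
msr <- k * sum((rowMeans(x) - grand)^2) / (n - 1)   # between subjects
msc <- n * sum((colMeans(x) - grand)^2) / (k - 1)   # between methods
resid <- x - outer(rowMeans(x), colMeans(x), "+") + grand
mse <- sum(resid^2) / ((n - 1) * (k - 1))           # residual

# Consistency ignores the method mean square; agreement adds it
# to the denominator, so method offsets lower the coefficient.
icc_consistency <- (msr - mse) / (msr + (k - 1) * mse)
icc_agreement <- (msr - mse) /
  (msr + (k - 1) * mse + k * (msc - mse) / n)

round(c(consistency = icc_consistency, agreement = icc_agreement), 4)
```

The only difference between the two formulas is the term k * (msc - mse) / n, which grows with the spread of the method means.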

Model, type, and unit

The classical ICC family is controlled by three arguments: model, type, and unit.

For pairwise ICC, average-measure output uses k = 2 because each estimate is based on exactly two methods. For overall ICC, average-measure output uses the full number of analysed columns.
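For the classical mean-square forms, the average-measure coefficient is linked to the single-measure coefficient by the Spearman-Brown relation. The notation below is introduced here for illustration: ICC_1 is the single-measure value, ICC_k is the average-measure value, and k is the number of methods being averaged.

```latex
\mathrm{ICC}_k = \frac{k \, \mathrm{ICC}_1}{1 + (k - 1)\, \mathrm{ICC}_1},
\qquad
k = 2 \ \text{(pairwise scope)},
\quad
k = \text{number of analysed columns} \ \text{(overall scope)} .
```

As a check against the overall table printed earlier, ICC2 = 0.9067 with k = 4 gives 4(0.9067) / (1 + 3(0.9067)), which is approximately 0.975, in line with the ICC2k row.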

Agreement indices implemented in matrixCorr

CCC, ICC and CIA address related but distinct agreement questions. CCC quantifies agreement between measurements by combining precision and accuracy (Lin, 1989). ICC expresses reliability or agreement through variance components; its interpretation depends on the study design, model form, and whether agreement or consistency is targeted (Shrout and Fleiss, 1979; McGraw and Wong, 1996). CIA targets individual agreement or interchangeability by comparing disagreement between methods with disagreement within methods (Barnhart, Kosinski, and Haber, 2007; Barnhart et al., 2007).

These indices should not be treated as interchangeable without considering the scientific question, data structure, and implemented estimator.

CCC functions

ccc() is the simple wide-data CCC implementation. It estimates pairwise Lin CCC values and, when ci = TRUE, returns Lin delta-method/Fisher-z confidence intervals. It does not report a p-value or test decision.
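One practical consequence of a Fisher-z interval is that it is symmetric on the transformed scale rather than on the CCC scale. The sketch below illustrates this using only the estimate and bounds printed by confint(fit_ccc) earlier in this vignette. It does not reproduce Lin's delta-method standard error; it merely backs out the implied value.

```r
# Values copied from the confint(fit_ccc) output shown earlier
est <- 0.9299231
lwr <- 0.8804487
upr <- 0.9593657

z <- atanh(est)                  # Fisher z-transform of the estimate
gap_lower <- z - atanh(lwr)
gap_upper <- atanh(upr) - z
c(gap_lower = gap_lower, gap_upper = gap_upper)

# Implied standard error on the z scale for a 95% interval
se_z <- mean(c(gap_lower, gap_upper)) / qnorm(0.975)

# Back-transforming z +/- qnorm(0.975) * se_z recovers the bounds
ci_back <- tanh(z + c(-1, 1) * qnorm(0.975) * se_z)
ci_back
```

On the CCC scale the interval is asymmetric, with the lower bound further from the estimate than the upper bound, which is expected for estimates close to 1.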

For repeated-measures data, the package provides two CCC routes. ccc_rm_reml() fits pairwise mixed models by REML and converts the fitted variance components and fixed-effect bias term into repeated-measures CCC estimates. ccc_rm_ustat() computes a nonparametric U-statistic repeated-measures CCC with optional Fisher-z confidence intervals. These repeated-measures functions estimate agreement parameters and optional confidence intervals; they do not implement a formal hypothesis test of the CCC parameter. ccc_rm_reml() can use boundary-aware likelihood-ratio tests for variance-component selection when vc_select = "auto", but those tests are model-selection diagnostics rather than tests of CCC agreement.

ICC functions

icc() computes classical ANOVA ICC forms for wide data. With scope = "pairwise" it returns a pairwise matrix; with scope = "overall" it returns the standard six-form overall coefficient table. The selected ICC form is controlled by model, type, and unit, so the same numeric value should not be interpreted without those design choices.

Pairwise icc() and repeated-measures icc_rm_reml() provide estimates and optional confidence intervals without a formal test of a target ICC parameter. For icc(scope = "overall"), the package reports ANOVA F statistics, degrees of freedom, and p-values in the overall coefficient table. The implementation and tests verify those reported p-values, but the package documentation does not define a generic ICC agreement hypothesis for users; this vignette therefore does not describe the overall ICC p-values as a generic test of agreement.

CIA functions

cia() and cia_rm() are included here only as conceptual comparisons. CIA targets individual agreement or interchangeability rather than the same population concordance question targeted by CCC. In cia(), replicated readings within each subject-method cell are required because the estimator compares between-method disagreement with within-method replicate disagreement. The function supports pairwise and overall CIA, optional reference-method scaling, and optional confidence intervals by the selected inference method. No formal hypothesis test is implemented; inference is based on the estimate and its confidence interval.

cia_rm() is different from cia() because it targets matched repeated measurements under conditions such as visits, raters, laboratories, treatments, or time points, and it is not a technical-replicate estimator. It reports condition-specific CIA estimates, optional confidence intervals, and a homogeneity test for agreement across conditions. As implemented, the reported test statistic is MS_method_time / MS_error, with an upper-tail F-test p-value using df_method_time and df_error. The null hypothesis documented by the function is homogeneous agreement across conditions; the alternative is that agreement changes across conditions.
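The arithmetic of an upper-tail F-test of this form can be sketched in base R. The mean squares and degrees of freedom below are hypothetical numbers chosen for illustration; in practice they would come from the fitted cia_rm() object, whose accessor names are not shown here.

```r
# Hypothetical mean squares and degrees of freedom, illustration only
ms_method_time <- 2.4
ms_error <- 1.1
df_method_time <- 3
df_error <- 147

# Test statistic and upper-tail p-value under the F reference distribution
f_stat <- ms_method_time / ms_error
p_value <- pf(f_stat, df_method_time, df_error, lower.tail = FALSE)

c(F = f_stat, p = p_value)
```

A small p-value would be read as evidence against homogeneous agreement across conditions, not as a statement about the overall level of agreement.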

Function summary

In this package, the functions differ not only in the target index but also in the supported data structure and inferential procedure. Some functions provide estimates and confidence intervals only. A formal hypothesis test should only be interpreted when the function explicitly implements and reports one.

Function | Index family | Target question | Data/design supported | Estimation approach | Inference reported | Formal hypothesis test implemented? (Yes/No/Requires verification)
--- | --- | --- | --- | --- | --- | ---
ccc() | Lin CCC | Pairwise concordance combining precision and accuracy | Numeric wide data; rows are paired observational units | Moment CCC with Lin delta-method/Fisher-z CI when requested | Estimate; optional confidence interval | No
ccc_rm_reml() | Repeated-measures CCC | Pairwise repeated-measures concordance from fitted variance components | Long repeated-measures data; subject plus optional method/time structure | REML mixed-model variance components and fixed-effect bias term | Estimate; optional confidence interval; variance-component diagnostics | No for the CCC parameter; variance-component selection tests are not CCC tests
ccc_rm_ustat() | Repeated-measures CCC | Pairwise repeated-measures concordance from U-statistic distances | Long repeated-measures data; balanced method/time coverage per pair | Nonparametric U-statistic estimator with Fisher-z CI when requested | Estimate; optional confidence interval | No
icc() | Classical ICC | Reliability or agreement under the selected ICC model, type, and unit | Numeric wide data; pairwise or overall all-column scope | Classical ANOVA mean-square formulas | Pairwise: estimate and optional CI. Overall: coefficient table with F statistic, df, p-value, and optional CI | Requires verification for overall p-values; no pairwise ICC test is documented
icc_rm_reml() | Repeated-measures ICC | Pairwise repeated-measures reliability/agreement from fitted variance components | Long repeated-measures data; subject plus optional method/time structure | REML mixed-model variance components and fixed-effect bias term | Estimate; optional confidence interval; variance-component diagnostics | No for the ICC parameter; variance-component selection tests are not ICC tests
cia() | CIA | Individual agreement/interchangeability relative to within-method replicate disagreement | Long replicated method-comparison data with replicate identifiers | Method-of-moments disagreement ratios; optional bounded variance-component variant | Estimate; optional confidence interval | No
cia_rm() | Repeated-measures CIA | Individual agreement/interchangeability across matched repeated conditions | Long matched repeated-measures data with one observation per subject-method-condition cell | Categorical-condition ANOVA estimator | Estimate; optional confidence interval; homogeneity F statistic and p-value | Yes: homogeneity of agreement across conditions

Choosing among CCC, BA, ICC, and CIA

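In practice these methods answer different questions. ccc() summarises pairwise concordance in a single coefficient that combines precision and accuracy. ba() reports the estimated bias and limits of agreement on the measurement scale. icc() expresses reliability or agreement through variance components under the chosen model, type, and unit. cia() targets individual agreement or interchangeability relative to within-method replicate disagreement.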

There is overlap in interpretation, but these are not interchangeable estimators.

References

Barnhart HX, Haber M, Lokhnygina Y, Kosinski AS. (2007). Comparison of concordance correlation coefficient and coefficient of individual agreement in assessing agreement. Journal of Biopharmaceutical Statistics, 17(4), 721-738.

Barnhart HX, Kosinski AS, Haber M. (2007). Assessing individual agreement. Journal of Biopharmaceutical Statistics, 17(4), 697-719.

Carrasco JL, Jover L. (2003). Estimating the concordance correlation coefficient: a new approach. Computational Statistics & Data Analysis, 47(4), 519-539.

Carrasco JL, Phillips BR, Puig-Martinez J, King TS, Chinchilli VM. (2013). Estimation of the concordance correlation coefficient for repeated measures using SAS and R. Computer Methods and Programs in Biomedicine, 109(3), 293-304.

Haber M, Gao J, Barnhart HX. (2010). Evaluation of agreement between measurement methods from data with matched repeated measurements via the coefficient of individual agreement. Journal of Data Science, 8, 457-469.

Lin L. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45, 255-268.

McGraw KO, Wong SP. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30-46.

Pan Y, Gao J, Haber M, Barnhart HX. (2010). Estimation of coefficients of individual agreement (CIA’s) for quantitative and binary data using SAS and R. Computer Methods and Programs in Biomedicine.

Shrout PE, Fleiss JL. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428.