Agreement and reliability are related to correlation, but they are not the same problem. Correlation describes co-movement. Agreement describes similarity on the measurement scale itself. Reliability describes the proportion of variation attributable to stable differences among subjects rather than to measurement error or method disagreement.
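Here is a quick base-R illustration of the distinction. A method that reads exactly 10 units high, or one that doubles every value, still has a Pearson correlation of 1 with the original, yet neither agrees with it on the measurement scale. The data are simulated for illustration only.

```r
# Simulated readings; x plays the role of a reference method.
set.seed(1)
x <- rnorm(30, mean = 50, sd = 5)

y_shift <- x + 10   # same ranking and spacing, constant offset of 10 units
y_scale <- 2 * x    # same ranking, but every reading is doubled

# Correlation only measures co-movement, so both comparisons give 1
# (up to floating-point rounding).
cor(x, y_shift)
cor(x, y_scale)

# On the measurement scale, however, the methods clearly disagree.
mean(y_shift - x)   # average offset of 10 units
mean(y_scale - x)   # average offset equal to mean(x)
```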
This vignette focuses on ccc() for wide data. The following related agreement and reliability functions provide context:

- ccc()
- ba()
- icc()
- ccc_rm_reml() and ccc_rm_ustat()
- icc_rm_reml()
- cia() and cia_rm()

Lin’s concordance correlation coefficient combines precision and
accuracy in a single number. In matrixCorr,
ccc() computes Lin’s pairwise CCC for numeric wide data and
optionally returns large-sample confidence intervals. No formal
hypothesis test is implemented; inference is based on the estimate and
its confidence interval.
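To make "precision and accuracy in a single number" concrete, Lin's coefficient can be written as 2 * s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2). It factors as r * C_b, where r is the Pearson correlation (precision) and C_b is a bias-correction factor (accuracy) that drops below 1 whenever the means or scales differ. The helper lin_ccc_by_hand() is hypothetical and uses n-denominator sample moments. It is not matrixCorr code, and this sketch does not assume ccc() uses the same moment conventions.

```r
# Hypothetical helper: Lin's CCC from n-denominator sample moments,
# returned together with its precision (r) and accuracy (C_b) factors.
lin_ccc_by_hand <- function(x, y) {
  mx <- mean(x)
  my <- mean(y)
  sx2 <- mean((x - mx)^2)               # variance of x (divisor n)
  sy2 <- mean((y - my)^2)               # variance of y (divisor n)
  sxy <- mean((x - mx) * (y - my))      # covariance (divisor n)

  ccc <- 2 * sxy / (sx2 + sy2 + (mx - my)^2)

  r   <- sxy / sqrt(sx2 * sy2)          # precision: Pearson correlation
  v   <- sqrt(sx2 / sy2)                # scale shift
  u   <- (mx - my) / (sx2 * sy2)^0.25   # location shift relative to scale
  c_b <- 2 / (v + 1 / v + u^2)          # accuracy: equals 1 only if v = 1 and u = 0

  c(ccc = ccc, r = r, C_b = c_b)
}

# Same simulated data as the example below.
set.seed(40)
ref <- rnorm(50, mean = 100, sd = 10)
m1 <- ref + rnorm(50, sd = 2)
m2 <- ref + 1.2 + rnorm(50, sd = 3)
lin_ccc_by_hand(m1, m2)
```

Since v + 1/v is at least 2, C_b can never exceed 1. The CCC therefore never exceeds the Pearson correlation in absolute value.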
Bland-Altman analysis separates the agreement question into estimated bias and limits of agreement.
```r
library(matrixCorr)
set.seed(40)
ref <- rnorm(50, mean = 100, sd = 10)
m1 <- ref + rnorm(50, sd = 2)
m2 <- ref + 1.2 + rnorm(50, sd = 3)
fit_ba <- ba(m1, m2)
fit_ccc <- ccc(data.frame(m1 = m1, m2 = m2), ci = TRUE)
print(fit_ba)
#> Bland-Altman preview:
#> based_on : 50
#> loa_rule : mean +/- 1.96 * SD
#> ci : 95%
#> sd_diff : 3.722
#> width : 14.589
#>
#> quantity estimate lwr upr
#> Mean difference -1.290 -2.347 -0.232
#> Lower LoA -8.584 -10.416 -6.752
#> Upper LoA 6.005 4.173 7.837
summary(fit_ccc)
#> Lin's concordance summary
#> dimensions : 2 x 2
#> pairs : 1
#> estimate : 0.9299
#> most_negative: m1-m2 (0.9299)
#> most_positive: m1-m2 (0.9299)
#> ci : 95%
#> ci_method : lin_delta_fisher_z
#> ci_width : 0.08
#>
#> item1 item2 estimate n 95% CI
#> m1 m2 0.9299 50 [0.88, 0.96]
estimate(fit_ccc)
#> m1 m2
#> m1 1.0000000 0.9299231
#> m2 0.9299231 1.0000000
confint(fit_ccc)
#> item1 item2 lwr upr
#> 1 m1 m2 0.8804487 0.9593657
ci(fit_ccc)
#> $lwr
#> m1 m2
#> m1 1.0000000 0.8804487
#> m2 0.8804487 1.0000000
#>
#> $upr
#> m1 m2
#> m1 1.0000000 0.9593657
#> m2 0.9593657 1.0000000
#>
#> $conf.level
#> [1] 0.95
#>
#> $ci.method
#> [1] "lin_delta_fisher_z"
tidy(fit_ccc)
#> item1 item2 estimate lwr upr
#> row m1 m2 0.9299231 0.8804487 0.9593657
```

The two summaries complement each other rather than duplicate each other.
ccc() gives a single concordance coefficient, while
ba() makes the scale of disagreement explicit.
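The Bland-Altman quantities can be reproduced in a few lines of base R. The sketch below takes differences as x - y, which matches the sign of the mean difference printed above because m2 was simulated with a +1.2 offset. It sets the limits at mean +/- 1.96 SD and builds t-based intervals from two common approximations: SE(bias) = s / sqrt(n) and SE(limit) = s * sqrt(3 / n). These approximations appear numerically consistent with the intervals printed above. Even so, the helpers ba_by_hand() and ba_pairwise_by_hand() are hypothetical, and ba() may differ in details such as missing-data handling.

```r
# Hypothetical helper: Bland-Altman bias, limits of agreement (LoA), and
# approximate t-based confidence intervals for one pair of methods.
ba_by_hand <- function(x, y, multiplier = 1.96, conf = 0.95) {
  d <- x - y                        # differences, first minus second method
  n <- length(d)
  bias <- mean(d)
  s <- sd(d)
  loa <- bias + c(-1, 1) * multiplier * s
  tq <- qt(1 - (1 - conf) / 2, df = n - 1)
  se_bias <- s / sqrt(n)            # standard error of the mean difference
  se_loa <- s * sqrt(3 / n)         # common approximate SE of each limit
  c(bias = bias, sd_diff = s,
    lower_loa = loa[1], upper_loa = loa[2],
    bias_lwr = bias - tq * se_bias, bias_upr = bias + tq * se_bias,
    lower_lwr = loa[1] - tq * se_loa, lower_upr = loa[1] + tq * se_loa,
    upper_lwr = loa[2] - tq * se_loa, upper_upr = loa[2] + tq * se_loa)
}

# Hypothetical helper: every unordered pair of columns, first minus second.
ba_pairwise_by_hand <- function(df) {
  col_pairs <- combn(names(df), 2)
  out <- t(apply(col_pairs, 2, function(p) ba_by_hand(df[[p[1]]], df[[p[2]]])))
  rownames(out) <- paste(col_pairs[1, ], col_pairs[2, ], sep = " - ")
  out
}

set.seed(40)
ref <- rnorm(50, mean = 100, sd = 10)
m1 <- ref + rnorm(50, sd = 2)
m2 <- ref + 1.2 + rnorm(50, sd = 3)
round(ba_by_hand(m1, m2), 3)
```

The pairwise helper relies on combn(), so its row order follows the column order of the input data frame.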
With three or more methods in wide form, ba() computes every
unordered pairwise Bland-Altman contrast directly:
```r
fit_ba_pairwise <- ba(data.frame(m1 = m1, m2 = m2, m3 = ref - 0.8 + rnorm(50, sd = 2.5)))
print(fit_ba_pairwise)
#>
#> Bland-Altman (row - column) (95% CI)
#>
#> bias sd_loa loa_low loa_up width n_obs
#> -1.290 3.722 -8.584 6.005 14.589 50
#> 1.502 3.461 -5.280 8.285 13.565 50
#> 2.792 4.515 -6.058 11.642 17.700 50
summary(fit_ba_pairwise)
#>
#> Bland-Altman (pairwise row - column) (95% CI)
#>
#> Agreement estimates
#>
#> n_obs bias sd_loa loa_low loa_up width loa_multiplier
#> 50 -1.290 3.722 -8.584 6.005 14.589 1.96
#> 50 1.502 3.461 -5.280 8.285 13.565 1.96
#> 50 2.792 4.515 -6.058 11.642 17.700 1.96
#>
#> Confidence intervals
#>
#> bias_lwr bias_upr lo_lwr lo_upr up_lwr up_upr
#> -2.347 -0.232 -10.416 -6.752 4.173 7.837
#> 0.519 2.486 -6.984 -3.577 6.582 9.988
#> 1.509 4.075 -8.281 -3.835 9.419 13.864
```

icc() extends the wide-data reliability workflow in two
directions. It can return a pairwise matrix across method pairs, or it
can return the overall classical ICC table for the full set of
methods.
```r
wide_methods <- data.frame(
  J1 = ref + rnorm(50, sd = 1.5),
  J2 = ref + 4.0 + rnorm(50, sd = 1.8),
  J3 = ref - 3.0 + rnorm(50, sd = 2.0),
  J4 = ref + rnorm(50, sd = 1.6)
)
fit_icc_pair <- icc(
  wide_methods,
  model = "twoway_random",
  type = "agreement",
  unit = "single",
  scope = "pairwise"
)
fit_icc_overall <- icc(
  wide_methods,
  model = "twoway_random",
  type = "agreement",
  unit = "single",
  scope = "overall",
  ci = TRUE
)
print(fit_icc_pair, digits = 2)
#> Intraclass correlation matrix
#> method : Intraclass correlation (two-way random, agreement, single)
#> dimensions : 4 x 4
#>
#> J1 J2 J3 J4
#> J1 1.00 0.90 0.94 0.98
#> J2 0.90 1.00 0.80 0.91
#> J3 0.94 0.80 1.00 0.93
#> J4 0.98 0.91 0.93 1.00
summary(fit_icc_pair)
#> Intraclass correlation summary
#> method : Intraclass correlation (two-way random, agreement, single)
#> dimensions : 4 x 4
#> pairs : 6
#> n_complete : 50
#> estimate : 0.7961 to 0.9817
#> most_negative: J2-J3 (0.7961)
#> most_positive: J1-J4 (0.9817)
#>
#> item1 item2 estimate n
#> J1 J4 0.9817 50
#> J1 J3 0.9398 50
#> J3 J4 0.9329 50
#> J2 J4 0.9136 50
#> J1 J2 0.9043 50
#> J2 J3 0.7961 50
print(fit_icc_overall)
#> Overall intraclass correlation
#> method : Overall intraclass correlation table
#> subjects : 50
#> raters : 4
#> selected : ICC2
#>
#> Coefficient table
#>
#> coefficient label estimate ... upr selected
#> ICC1 Single absolute 0.9050 ... 0.9400 FALSE
#> ICC2 Single random 0.9067 ... 0.9647 TRUE
#> ICC3 Single fixed 0.9757 ... 0.9850 FALSE
#> ICC1k Average absolute 0.9744 ... 0.9843 FALSE
#> ICC2k Average random 0.9749 ... 0.9909 FALSE
#> ICC3k Average fixed 0.9938 ... 0.9962 FALSE
#> ... 5 more variables not shown (omitted)
#> Use as.data.frame()/tidy()/as.matrix() to inspect the full result.
```

The choice of scope is the most important distinction in the ICC interface.

- scope = "pairwise" answers: “How reliable is each specific pair of methods?”
- scope = "overall" answers: “How reliable is the full set of methods when analysed jointly?”
Those are different quantities. The overall ICC cannot, in general, be recovered by averaging the pairwise matrix.
This simulation also includes systematic method bias, so it is a
natural place to contrast type = "consistency" with
type = "agreement".
```r
fit_icc_cons <- icc(
  wide_methods,
  model = "twoway_random",
  type = "consistency",
  unit = "single",
  scope = "overall",
  ci = FALSE
)
fit_icc_agr <- icc(
  wide_methods,
  model = "twoway_random",
  type = "agreement",
  unit = "single",
  scope = "overall",
  ci = FALSE
)
data.frame(
  type = c("consistency", "agreement"),
  selected_coefficient = c(
    attr(fit_icc_cons, "selected_coefficient"),
    attr(fit_icc_agr, "selected_coefficient")
  ),
  estimate = c(
    attr(fit_icc_cons, "selected_row")$estimate,
    attr(fit_icc_agr, "selected_row")$estimate
  )
)
#> type selected_coefficient estimate
#> 1 consistency ICC3 0.9757153
#> 2 agreement ICC2 0.9066943
```

Consistency discounts additive method shifts, whereas agreement penalises them. When methods differ mainly by a systematic offset, consistency can therefore look substantially better than agreement.
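You can check the effect of an additive offset directly. The sketch below computes the single-measure two-way consistency and absolute-agreement forms from ANOVA mean squares, which correspond to ICC3 and ICC2 in the coefficient table above. It then shifts one method by a constant. The helper icc_c1_a1() and the simulated data are hypothetical. This is a textbook calculation, not a call into icc().

```r
# Hypothetical helper: single-measure consistency (ICC3) and absolute
# agreement (ICC2) from two-way ANOVA mean squares on a complete matrix.
icc_c1_a1 <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- ncol(x)
  g <- mean(x)
  msr <- k * sum((rowMeans(x) - g)^2) / (n - 1)     # between subjects
  msc <- n * sum((colMeans(x) - g)^2) / (k - 1)     # between methods
  resid <- x - outer(rowMeans(x), colMeans(x), "+") + g
  mse <- sum(resid^2) / ((n - 1) * (k - 1))         # residual
  c(consistency = (msr - mse) / (msr + (k - 1) * mse),
    agreement   = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))
}

set.seed(2)
truth <- rnorm(50, mean = 100, sd = 10)
x <- cbind(A = truth + rnorm(50), B = truth + rnorm(50), C = truth + rnorm(50))

x_shifted <- x
x_shifted[, "C"] <- x_shifted[, "C"] + 10   # purely additive offset for method C

rbind(original = icc_c1_a1(x), shifted = icc_c1_a1(x_shifted))
```

A constant column shift changes only the between-method mean square. As a result, the consistency form stays the same while the agreement form falls.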
Three arguments control the classical ICC family:

- model selects the one-way, two-way random, or two-way mixed formulation.
- type selects consistency or agreement.
- unit selects single-measure or average-measure reliability.

For pairwise ICC, average-measure output uses k = 2
because each estimate is based on exactly two methods. For overall ICC,
average-measure output uses the full number of analysed columns.
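The six classical forms differ only in which ANOVA mean squares enter the formula and in the k used for average-measure output. The sketch below applies the standard Shrout and Fleiss (1979) and McGraw and Wong (1996) mean-square formulas to a complete n x k matrix. Applying it to each two-column pair gives k = 2; applying it to all columns gives the overall table. It also shows that averaging the pairwise estimates does not reproduce the overall coefficient. The function icc_table_by_hand() is hypothetical: it ignores missing data and omits the confidence intervals that icc() can report.

```r
# Hypothetical helper: the six classical ICC forms from ANOVA mean squares.
icc_table_by_hand <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)                                   # subjects
  k <- ncol(x)                                   # methods analysed jointly
  g <- mean(x)
  ss_rows <- k * sum((rowMeans(x) - g)^2)        # between subjects
  ss_cols <- n * sum((colMeans(x) - g)^2)        # between methods
  ss_err  <- sum((x - g)^2) - ss_rows - ss_cols  # residual
  msr <- ss_rows / (n - 1)
  msc <- ss_cols / (k - 1)
  mse <- ss_err / ((n - 1) * (k - 1))
  msw <- (ss_cols + ss_err) / (n * (k - 1))      # within subjects (one-way)
  c(
    ICC1  = (msr - msw) / (msr + (k - 1) * msw),
    ICC2  = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
    ICC3  = (msr - mse) / (msr + (k - 1) * mse),
    ICC1k = (msr - msw) / msr,
    ICC2k = (msr - mse) / (msr + (msc - mse) / n),
    ICC3k = (msr - mse) / msr
  )
}

set.seed(3)
truth <- rnorm(40, mean = 100, sd = 10)
wide <- data.frame(
  J1 = truth + rnorm(40, sd = 1.5),
  J2 = truth + 4 + rnorm(40, sd = 1.8),
  J3 = truth - 3 + rnorm(40, sd = 2.0)
)

# Overall scope: k = ncol(wide) = 3.
overall <- icc_table_by_hand(wide)

# Pairwise scope: each estimate uses exactly k = 2 columns.
col_pairs <- combn(names(wide), 2)
pairwise_icc2 <- apply(col_pairs, 2, function(p) icc_table_by_hand(wide[, p])[["ICC2"]])
names(pairwise_icc2) <- paste(col_pairs[1, ], col_pairs[2, ], sep = "-")

round(overall, 4)
round(pairwise_icc2, 4)
mean(pairwise_icc2)   # not, in general, equal to overall[["ICC2"]]
```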
CCC, ICC and CIA address related but distinct agreement questions. CCC quantifies agreement between measurements by combining precision and accuracy (Lin, 1989). ICC expresses reliability or agreement through variance components; its interpretation depends on the study design, model form, and whether agreement or consistency is targeted (Shrout and Fleiss, 1979; McGraw and Wong, 1996). CIA targets individual agreement or interchangeability by comparing disagreement between methods with disagreement within methods (Barnhart, Kosinski, and Haber, 2007; Barnhart et al., 2007).
These indices should not be treated as interchangeable without considering the scientific question, data structure, and implemented estimator.
ccc() is the simple wide-data CCC implementation. It
estimates pairwise Lin CCC values and, when ci = TRUE,
returns Lin delta-method/Fisher-z confidence intervals. It does not
report a p-value or test decision.
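A Lin-type interval of this kind can be sketched in four steps. First, estimate the CCC from sample moments. Second, apply the large-sample variance formula commonly used for Lin's CCC. Third, move to the Fisher-z scale with atanh() and form a normal interval there. Finally, back-transform with tanh(). The helper name, the n-denominator moments, and the exact variance expression are assumptions of this sketch. ccc() may differ in such details, so results need not match its output exactly.

```r
# Hypothetical helper: Lin's CCC with a delta-method interval on the
# Fisher-z scale.
ccc_fisher_z_by_hand <- function(x, y, conf = 0.95) {
  n  <- length(x)
  mx <- mean(x)
  my <- mean(y)
  sx2 <- mean((x - mx)^2)                      # n-denominator moments (assumption)
  sy2 <- mean((y - my)^2)
  sxy <- mean((x - mx) * (y - my))
  rc <- 2 * sxy / (sx2 + sy2 + (mx - my)^2)    # Lin's CCC
  r  <- sxy / sqrt(sx2 * sy2)                  # Pearson correlation
  u  <- (mx - my) / (sx2 * sy2)^0.25           # location shift relative to scale
  # Large-sample variance of the CCC estimate (commonly used Lin-type formula).
  var_rc <- ((1 - r^2) * rc^2 * (1 - rc^2) / r^2 +
               2 * rc^3 * (1 - rc) * u^2 / r -
               0.5 * rc^4 * u^4 / r^2) / (n - 2)
  # Delta method for z = atanh(rc): dz/drc = 1 / (1 - rc^2).
  se_z <- sqrt(var_rc) / (1 - rc^2)
  z <- atanh(rc)
  q <- qnorm(1 - (1 - conf) / 2)
  c(estimate = rc, lwr = tanh(z - q * se_z), upr = tanh(z + q * se_z))
}

set.seed(40)
ref <- rnorm(50, mean = 100, sd = 10)
m1 <- ref + rnorm(50, sd = 2)
m2 <- ref + 1.2 + rnorm(50, sd = 3)
ccc_fisher_z_by_hand(m1, m2)
```

Working on the z scale keeps the back-transformed limits inside (-1, 1), even when the estimate is close to 1.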
For repeated-measures data, the package provides two CCC routes.
ccc_rm_reml() fits pairwise mixed models by REML and
converts the fitted variance components and fixed-effect bias term into
repeated-measures CCC estimates. ccc_rm_ustat() computes a
nonparametric U-statistic repeated-measures CCC with optional Fisher-z
confidence intervals. These repeated-measures functions estimate
agreement parameters and optional confidence intervals; they do not
implement a formal hypothesis test of the CCC parameter.
ccc_rm_reml() can use boundary-aware likelihood-ratio tests
for variance-component selection when vc_select = "auto",
but those tests are model-selection diagnostics rather than tests of CCC
agreement.
icc() computes classical ANOVA ICC forms for wide data.
With scope = "pairwise" it returns a pairwise matrix; with
scope = "overall" it returns the standard six-form overall
coefficient table. The selected ICC form is controlled by
model, type, and unit, so the
same numeric value should not be interpreted without those design
choices.
Pairwise icc() and repeated-measures
icc_rm_reml() provide estimates and optional confidence
intervals without a formal test of a target ICC parameter. For
icc(scope = "overall"), the package reports ANOVA F
statistics, degrees of freedom, and p-values in the overall coefficient
table. The implementation and tests verify those reported p-values, but
the package documentation does not define a generic ICC agreement
hypothesis for users; this vignette therefore does not describe the
overall ICC p-values as a generic test of agreement.
cia() and cia_rm() are included here only
as conceptual comparisons. CIA targets individual agreement or
interchangeability rather than the same population concordance question
targeted by CCC. In cia(), replicated readings within each
subject-method cell are required because the estimator compares
between-method disagreement with within-method replicate disagreement.
The function supports pairwise and overall CIA, optional
reference-method scaling, and optional confidence intervals by the
selected inference method. No formal hypothesis test is implemented;
inference is based on the estimate and its confidence interval.
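The sketch below makes the "between-method versus within-method disagreement" idea concrete. It computes a no-reference CIA-type ratio for two methods with two replicates per subject-method cell: the average within-method replicate mean squared difference divided by the between-method mean squared difference. Values near 1 indicate that switching methods adds little disagreement beyond replicate noise, and a systematic bias pulls the ratio down. This is an illustrative method-of-moments sketch in the spirit of Barnhart, Kosinski, and Haber (2007). The helper name and simulated layout are hypothetical, and the sketch is not the estimator implemented in cia().

```r
# Hypothetical helper: no-reference CIA-type ratio for two methods with
# two replicate readings per subject-method cell (columns = replicates).
cia_ratio_sketch <- function(y1, y2) {
  within1 <- mean((y1[, 1] - y1[, 2])^2)   # replicate disagreement, method 1
  within2 <- mean((y2[, 1] - y2[, 2])^2)   # replicate disagreement, method 2
  # Between-method disagreement over all four replicate pairings.
  between <- mean(c((y1[, 1] - y2[, 1])^2, (y1[, 1] - y2[, 2])^2,
                    (y1[, 2] - y2[, 1])^2, (y1[, 2] - y2[, 2])^2))
  (within1 + within2) / (2 * between)
}

set.seed(6)
n <- 100
subject_level <- rnorm(n, mean = 20, sd = 4)
y1 <- cbind(subject_level + rnorm(n), subject_level + rnorm(n))  # method 1
y2 <- cbind(subject_level + rnorm(n), subject_level + rnorm(n))  # method 2

cia_ratio_sketch(y1, y2)       # methods simulated as interchangeable
cia_ratio_sketch(y1, y2 + 3)   # method 2 shifted by a constant bias of +3
```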
cia_rm() is different from cia() because it
targets matched repeated measurements under conditions such as visits,
raters, laboratories, treatments, or time points, and it is not a
technical-replicate estimator. It reports condition-specific CIA
estimates, optional confidence intervals, and a homogeneity test for
agreement across conditions. As implemented, the reported test statistic
is MS_method_time / MS_error, with an upper-tail F-test
p-value using df_method_time and df_error. The
null hypothesis documented by the function is homogeneous agreement
across conditions; the alternative is that agreement changes across
conditions.
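The documented homogeneity statistic, MS_method_time / MS_error, is referred to an upper-tail F distribution with df_method_time and df_error degrees of freedom. It can be mimicked in base R on simulated data with one observation per subject-method-condition cell. Here the error term comes from a hypothetical additive model with subject, method, condition, and method-by-condition effects. cia_rm() may build its mean squares differently, so treat this only as a sketch of how such a p-value is formed.

```r
# Simulated matched repeated measurements: every subject measured once by
# each method under each condition (time point); values are illustrative.
set.seed(5)
n <- 20
long <- expand.grid(
  subject = factor(seq_len(n)),
  method  = factor(c("M1", "M2")),
  time    = factor(c("T1", "T2", "T3"))
)
subject_effect <- rnorm(n, sd = 5)
long$y <- 50 + subject_effect[long$subject] + rnorm(nrow(long), sd = 1)

# Hypothetical error model: subject + method + time + method:time.
tab <- anova(lm(y ~ subject + method * time, data = long))

# Homogeneity statistic and its upper-tail F p-value.
f_mt <- tab["method:time", "Mean Sq"] / tab["Residuals", "Mean Sq"]
p_mt <- pf(f_mt, df1 = tab["method:time", "Df"],
           df2 = tab["Residuals", "Df"], lower.tail = FALSE)
c(F = f_mt, p = p_mt)
```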
In this package, the functions differ not only in the target index but also in the supported data structure and inferential procedure. Some functions provide estimates and confidence intervals only. A formal hypothesis test should only be interpreted when the function explicitly implements and reports one.
| Function | Index family | Target question | Data/design supported | Estimation approach | Inference reported | Formal hypothesis test implemented? (Yes / No / Requires verification) |
|---|---|---|---|---|---|---|
| ccc() | Lin CCC | Pairwise concordance combining precision and accuracy | Numeric wide data; rows are paired observational units | Moment CCC with Lin delta-method/Fisher-z CI when requested | Estimate; optional confidence interval | No |
| ccc_rm_reml() | Repeated-measures CCC | Pairwise repeated-measures concordance from fitted variance components | Long repeated-measures data; subject plus optional method/time structure | REML mixed-model variance components and fixed-effect bias term | Estimate; optional confidence interval; variance-component diagnostics | No for the CCC parameter; variance-component selection tests are not CCC tests |
| ccc_rm_ustat() | Repeated-measures CCC | Pairwise repeated-measures concordance from U-statistic distances | Long repeated-measures data; balanced method/time coverage per pair | Nonparametric U-statistic estimator with Fisher-z CI when requested | Estimate; optional confidence interval | No |
| icc() | Classical ICC | Reliability or agreement under the selected ICC model, type, and unit | Numeric wide data; pairwise or overall all-column scope | Classical ANOVA mean-square formulas | Pairwise: estimate and optional CI. Overall: coefficient table with F statistic, df, p-value, and optional CI | Requires verification for overall p-values; no pairwise ICC test is documented |
| icc_rm_reml() | Repeated-measures ICC | Pairwise repeated-measures reliability/agreement from fitted variance components | Long repeated-measures data; subject plus optional method/time structure | REML mixed-model variance components and fixed-effect bias term | Estimate; optional confidence interval; variance-component diagnostics | No for the ICC parameter; variance-component selection tests are not ICC tests |
| cia() | CIA | Individual agreement/interchangeability relative to within-method replicate disagreement | Long replicated method-comparison data with replicate identifiers | Method-of-moments disagreement ratios; optional bounded variance-component variant | Estimate; optional confidence interval | No |
| cia_rm() | Repeated-measures CIA | Individual agreement/interchangeability across matched repeated conditions | Long matched repeated-measures data with one observation per subject-method-condition cell | Categorical-condition ANOVA estimator | Estimate; optional confidence interval; homogeneity F statistic and p-value | Yes: homogeneity of agreement across conditions |
In practice these methods answer different questions.

- Use ccc() when one concordance coefficient per pair is the main target.
- Use ccc_rm_ustat() or ccc_rm_reml() when the concordance target is repeated-measures CCC rather than simple paired wide-data CCC.
- Use ba() when the size and direction of disagreement should be visible on the original measurement scale. Use ba_rm() for the repeated-measures Bland-Altman workflow.
- Use icc() when the target is reliability under a classical variance-components interpretation.
- Use icc_rm_reml() when the target is repeated-measures ICC from the package’s REML variance-component backend.
- Use cia() when replicated readings within subject-method cells are available and the target is individual agreement or interchangeability relative to within-method replicate disagreement.
- Use cia_rm() when the target is individual agreement across matched repeated conditions rather than technical replicates.

There is overlap in interpretation, but these are not interchangeable estimators.
Barnhart HX, Haber M, Lokhnygina Y, Kosinski AS. (2007). Comparison of concordance correlation coefficient and coefficient of individual agreement in assessing agreement. Journal of Biopharmaceutical Statistics, 17(4), 721-738.
Barnhart HX, Kosinski AS, Haber M. (2007). Assessing individual agreement. Journal of Biopharmaceutical Statistics, 17(4), 697-719.
Carrasco JL, Jover L. (2003). Estimating the concordance correlation coefficient: a new approach. Computational Statistics & Data Analysis, 47(4), 519-539.
Carrasco JL, Phillips BR, Puig-Martinez J, King TS, Chinchilli VM. (2013). Estimation of the concordance correlation coefficient for repeated measures using SAS and R. Computer Methods and Programs in Biomedicine, 109(3), 293-304.
Haber M, Gao J, Barnhart HX. (2010). Evaluation of agreement between measurement methods from data with matched repeated measurements via the coefficient of individual agreement. Journal of Data Science, 8, 457-469.
Lin L. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45, 255-268.
McGraw KO, Wong SP. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30-46.
Pan Y, Gao J, Haber M, Barnhart HX. (2010). Estimation of coefficients of individual agreement (CIA’s) for quantitative and binary data using SAS and R. Computer Methods and Programs in Biomedicine.
Shrout PE, Fleiss JL. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428.