1. Introduction to matrixCorr

Introduction

matrixCorr is organised around a simple idea: correlation, agreement, and reliability workflows are often used together, so they should not require completely different interfaces, object classes, and inspection patterns.

The package therefore combines several estimator families behind a common matrix-oriented interface for wide data and a shared long-format workflow for repeated-measures designs. The purpose of this vignette is not to describe every supported method in detail, but to show how the package is structured, what kinds of inputs it expects, and how the returned objects can be inspected.

Main workflow families

At a practical level, the package is centred on five workflow families.

The same broad inspection pattern is used throughout to fit an estimator, print it for a compact view, call summary() for a longer digest, and use plot() when a graphical summary is available.

Wide matrix workflow

The matrix-style functions accept a numeric matrix or data frame and return a square result indexed by the original column names.

library(matrixCorr)

set.seed(1)
X <- as.data.frame(matrix(rnorm(120), ncol = 4))
names(X) <- paste0("V", 1:4)

fit_pearson <- pearson_corr(X, ci = TRUE)
fit_spearman <- spearman_rho(X, ci = TRUE)

print(fit_pearson, digits = 2)
#> Pearson correlation matrix
#>   method      : pearson
#>   dimensions  : 4 x 4
#>   ci          : yes
#> 
#>      V1    V2   V3    V4
#> V1 1.00  0.05 0.02  0.00
#> V2 0.05  1.00 0.45 -0.12
#> V3 0.02  0.45 1.00  0.00
#> V4 0.00 -0.12 0.00  1.00
summary(fit_spearman)
#> Spearman correlation summary
#>   output      : matrix
#>   dimensions  : 4 x 4
#>   retained_pairs: 6
#>   threshold   : 0.0000
#>   diag        : included
#>   estimate    : -0.1524 to 0.4318
#>   ci          : 95%
#>   ci_method   : jackknife_euclidean_likelihood
#>   ci_width    : 0.644 to 0.908
#>   cross_zero  : 5 pair(s)
#> 
#>  item1 item2 estimate n  95% CI            p_value
#>  V2    V3     0.4318  30 [0.1077, 0.7514]  0.0163 
#>  V1    V3    -0.1524  30 [-0.5819, 0.2781] 0.4248 
#>  V1    V4    -0.0861  30 [-0.4636, 0.2924] 0.6538 
#>  V2    V4    -0.0345  30 [-0.4660, 0.3982] 0.8577 
#>  V1    V2     0.0154  30 [-0.3972, 0.4276] 0.9364 
#>  V3    V4     0.0087  30 [-0.4458, 0.4624] 0.9640

For fitted objects with confidence intervals, the public accessors expose the stored pieces without asking users to read attributes directly.

estimate(fit_pearson)
#>             V1          V2           V3            V4
#> V1 1.000000000  0.04866964 0.0153318005  0.0011601689
#> V2 0.048669642  1.00000000 0.4540745507 -0.1204674721
#> V3 0.015331801  0.45407455 1.0000000000  0.0009970037
#> V4 0.001160169 -0.12046747 0.0009970037  1.0000000000
confint(fit_pearson)
#>   item1 item2        lwr       upr
#> 1    V1    V2 -0.3171607 0.4018920
#> 2    V1    V3 -0.3468533 0.3735378
#> 3    V2    V3  0.1121522 0.6998551
#> 4    V1    V4 -0.3592592 0.3612784
#> 5    V2    V4 -0.4607402 0.2506815
#> 6    V3    V4 -0.3594013 0.3611365
ci(fit_pearson)
#> $est
#>             V1          V2           V3            V4
#> V1 1.000000000  0.04866964 0.0153318005  0.0011601689
#> V2 0.048669642  1.00000000 0.4540745507 -0.1204674721
#> V3 0.015331801  0.45407455 1.0000000000  0.0009970037
#> V4 0.001160169 -0.12046747 0.0009970037  1.0000000000
#> 
#> $lwr.ci
#>            V1         V2         V3         V4
#> V1        NaN -0.3171607 -0.3468533 -0.3592592
#> V2 -0.3171607        NaN  0.1121522 -0.4607402
#> V3 -0.3468533  0.1121522        NaN -0.3594013
#> V4 -0.3592592 -0.4607402 -0.3594013        NaN
#> 
#> $upr.ci
#>           V1        V2        V3        V4
#> V1       NaN 0.4018920 0.3735378 0.3612784
#> V2 0.4018920       NaN 0.6998551 0.2506815
#> V3 0.3735378 0.6998551       NaN 0.3611365
#> V4 0.3612784 0.2506815 0.3611365       NaN
#> 
#> $conf.level
#> [1] 0.95
tidy(fit_pearson)
#>   item1 item2      estimate        lwr       upr
#> 1    V1    V2  0.0486696416 -0.3171607 0.4018920
#> 2    V1    V3  0.0153318005 -0.3468533 0.3735378
#> 3    V2    V3  0.4540745507  0.1121522 0.6998551
#> 4    V1    V4  0.0011601689 -0.3592592 0.3612784
#> 5    V2    V4 -0.1204674721 -0.4607402 0.2506815
#> 6    V3    V4  0.0009970037 -0.3594013 0.3611365

This pattern extends to other wide-data estimators such as kendall_tau(), dcor(), bicor(), pbcor(), wincor(), skipped_corr(), shrinkage_corr(), pcorr(), ccc(), and icc().

Agreement and reliability workflow

Agreement methods answer a different question from ordinary correlation. Correlation asks whether two variables move together. Agreement asks whether two methods give sufficiently similar values on the measurement scale itself.

set.seed(2)
ref <- rnorm(40, mean = 10, sd = 2)
alt <- ref + 0.3 + rnorm(40, sd = 0.8)

fit_ba <- ba(ref, alt)
print(fit_ba)
#> Bland-Altman preview:
#>   based_on    : 40
#>   loa_rule    : mean +/- 1.96 * SD
#>   ci          : 95%
#>   sd_diff     : 0.927
#>   width       : 3.632
#> 
#>  quantity        estimate lwr    upr   
#>  Mean difference -0.269   -0.566 0.027 
#>  Lower LoA       -2.085   -2.599 -1.572
#>  Upper LoA       1.547    1.034  2.060

wide_methods <- data.frame(
  m1 = ref + rnorm(40, sd = 0.2),
  m2 = ref + 0.2 + rnorm(40, sd = 0.3),
  m3 = ref - 0.1 + rnorm(40, sd = 0.4)
)

fit_ccc <- ccc(wide_methods)
fit_icc <- icc(wide_methods, scope = "pairwise")

summary(fit_ccc)
#> Correlation summary
#>   output      : matrix
#>   dimensions  : 3 x 3
#>   retained_pairs: 3
#>   threshold   : 0.0000
#>   diag        : included
#>   estimate    : 0.9567 to 0.9813
#> 
#>  item1 item2 estimate n 
#>  m1    m2    0.9813   40
#>  m1    m3    0.9803   40
#>  m2    m3    0.9567   40
summary(fit_icc)
#> Intraclass correlation summary
#>   method      : Intraclass correlation (one-way, consistency, single)
#>   dimensions  : 3 x 3
#>   pairs       : 3
#>   n_complete  : 40
#>   estimate    : 0.9574 to 0.9817
#>   most_negative: m2-m3 (0.9574)
#>   most_positive: m1-m2 (0.9817)
#> 
#>  item1 item2 estimate n 
#>  m1    m2    0.9817   40
#>  m1    m3    0.9807   40
#>  m2    m3    0.9574   40

For ICC, scope = "pairwise" and scope = "overall" answer different questions as well. Pairwise ICC asks how reliable each specific method pair is. Overall ICC asks how reliable the full set of columns is when analysed jointly.

fit_icc_overall <- icc(wide_methods, scope = "overall", ci = TRUE)
print(fit_icc_overall)
#> Overall intraclass correlation
#>   method      : Overall intraclass correlation table
#>   subjects    : 40
#>   raters      : 3
#>   selected    : ICC1
#> 
#> Coefficient table
#> 
#>  coefficient label            estimate ... upr    selected
#>  ICC1        Single absolute  0.9732   ... 0.9848  TRUE   
#>  ICC2        Single random    0.9733   ... 0.9876 FALSE   
#>  ICC3        Single fixed     0.9819   ... 0.9898 FALSE   
#>  ICC1k       Average absolute 0.9909   ... 0.9949 FALSE   
#>  ICC2k       Average random   0.9909   ... 0.9958 FALSE   
#>  ICC3k       Average fixed    0.9939   ... 0.9966 FALSE   
#> ... 5 more variables not shown (omitted)
#> Use as.data.frame()/tidy()/as.matrix() to inspect the full result.
summary(fit_icc_overall)
#> Overall intraclass correlation summary
#>   selected    : ICC1
#>   subjects    : 40
#>   raters      : 3
#>   n_complete  : 40
#>   ms_subject  : 14.4485
#>   ms_rater    : 1.8009
#>   ms_error    : 0.0884
#> 
#> ICC coefficients
#> 
#>  coefficient label            estimate statistic df1 df2 p_value lwr  upr 
#>  ICC1        Single absolute  0.9732   110.1464  39  80  0       0.96 0.98
#>  ICC2        Single random    0.9733   163.5137  39  78  0       0.93 0.99
#>  ICC3        Single fixed     0.9819   163.5137  39  78  0       0.97 0.99
#>  ICC1k       Average absolute 0.9909   110.1464  39  80  0       0.98 0.99
#>  ICC2k       Average random   0.9909   163.5137  39  78  0       0.98 1.00
#>  ICC3k       Average fixed    0.9939   163.5137  39  78  0       0.99 1.00
#>  selected
#>   TRUE   
#>  FALSE   
#>  FALSE   
#>  FALSE   
#>  FALSE   
#>  FALSE   
#> 
#> ANOVA decomposition
#> 
#>  component df sum_sq   mean_sq statistic p_value
#>  subjects  39 563.4912 14.4485 110.1464   0     
#>  raters     2   3.6017  1.8009  20.3805   0     
#>  residual  78   6.8923  0.0884       NA  NA
estimate(fit_icc_overall)
#> [1] 0.9732 0.9733 0.9819 0.9909 0.9909 0.9939
confint(fit_icc_overall)
#>    lwr  upr
#> 1 0.96 0.98
#> 2 0.93 0.99
#> 3 0.97 0.99
#> 4 0.98 0.99
#> 5 0.98 1.00
#> 6 0.99 1.00
ci(fit_icc_overall)
#> $lwr
#> [1] 0.96 0.93 0.97 0.98 0.98 0.99
#> 
#> $upr
#> [1] 0.98 0.99 0.99 0.99 1.00 1.00
#> 
#> $conf.level
#> [1] 0.95
#> 
#> $ci.method
#> NULL
tidy(fit_icc_overall)
#>   coefficient            label estimate statistic df1 df2 p_value  lwr  upr
#> 1        ICC1  Single absolute   0.9732  110.1464  39  80       0 0.96 0.98
#> 2        ICC2    Single random   0.9733  163.5137  39  78       0 0.93 0.99
#> 3        ICC3     Single fixed   0.9819  163.5137  39  78       0 0.97 0.99
#> 4       ICC1k Average absolute   0.9909  110.1464  39  80       0 0.98 0.99
#> 5       ICC2k   Average random   0.9909  163.5137  39  78       0 0.98 1.00
#> 6       ICC3k    Average fixed   0.9939  163.5137  39  78       0 0.99 1.00
#>   selected
#> 1     TRUE
#> 2    FALSE
#> 3    FALSE
#> 4    FALSE
#> 5    FALSE
#> 6    FALSE

Repeated-measures workflow

Repeated-measures functions require long-format data and explicit identifiers for subjects and, when relevant, methods and time.

set.seed(3)
n_subject <- 12
n_rep <- 3

subject <- rep(seq_len(n_subject), each = n_rep)
signal <- rnorm(n_subject * n_rep)
subject_x <- rnorm(n_subject, sd = 1.2)[subject]
subject_y <- rnorm(n_subject, sd = 1.0)[subject]

dat_rm <- data.frame(
  id = subject,
  x = subject_x + signal + rnorm(n_subject * n_rep, sd = 0.2),
  y = subject_y + 0.7 * signal + rnorm(n_subject * n_rep, sd = 0.3),
  z = subject_y - 0.4 * signal + rnorm(n_subject * n_rep, sd = 0.4)
)

fit_rmcorr <- rmcorr(dat_rm, response = c("x", "y", "z"), subject = "id")
print(fit_rmcorr, digits = 2)
#> Repeated-measures correlation matrix
#>   method      : rmcorr
#>   dimensions  : 3 x 3
#>   p_values    : yes
#> 
#>       x     y     z
#> x  1.00  0.93 -0.60
#> y  0.93  1.00 -0.64
#> z -0.60 -0.64  1.00
summary(fit_rmcorr)
#> Correlation summary
#>   output      : matrix
#>   dimensions  : 3 x 3
#>   retained_pairs: 3
#>   threshold   : 0.0000
#>   diag        : included
#>   estimate    : -0.6408 to 0.9294
#> 
#>  item1 item2 estimate n 
#>  x     y      0.9294  36
#>  y     z     -0.6408  36
#>  x     z     -0.6018  36

Agreement and reliability in repeated designs use a different long-format interface because they need method and often time identifiers.

set.seed(4)
n_id <- 10
n_time <- 3

dat_agree <- expand.grid(
  id = factor(seq_len(n_id)),
  time = factor(seq_len(n_time)),
  method = factor(c("A", "B"))
)

subj <- rnorm(n_id, sd = 1.0)[dat_agree$id]
subj_method <- rnorm(n_id * 2, sd = 0.2)
sm <- subj_method[(as.integer(dat_agree$id) - 1L) * 2L + as.integer(dat_agree$method)]

dat_agree$y <- subj + sm + 0.25 * (dat_agree$method == "B") +
  rnorm(nrow(dat_agree), sd = 0.35)

fit_icc_rm <- icc_rm_reml(
  dat_agree,
  response = "y",
  subject = "id",
  method = "method",
  time = "time",
  type = "consistency"
)

summary(fit_icc_rm)
#> 
#> Repeated-measures intraclass correlation (REML)
#> 
#> ICC estimates
#> 
#>  item1 item2 estimate n_subjects n_obs se_icc residual_model
#>  A     B     0.9585   10         60    0.0064 iid           
#> 
#> Variance components
#> 
#>  sigma2_subject sigma2_subject_method sigma2_subject_time sigma2_error SB    
#>  0.9329         0.0044                0.0024              0.1055       0.0126
#> 
#> AR(1) diagnostics
#> 
#>  ar1_rho_lag1 ar1_rho_mom ar1_pairs ar1_pval use_ar1 ar1_recommend
#>  -0.1734      -0.1734     40        0.2728   FALSE   FALSE

Shared inspection methods

Most returned objects support at least print() and summary(). Many also support plot(). Matrix-style objects are intentionally compact when printed, while summary() returns a longer digest of the strongest or most relevant pairs.

Display defaults can be controlled through ordinary R options. For example:

options(
  matrixCorr.threads = parallel::detectCores(logical = FALSE),
  matrixCorr.print_max_rows = 20L,
  matrixCorr.print_topn = 5L,
  matrixCorr.print_max_vars = 10L,
  matrixCorr.print_show_ci = "yes",
  matrixCorr.summary_max_rows = 12L,
  matrixCorr.summary_topn = 5L,
  matrixCorr.summary_max_vars = 10L,
  matrixCorr.summary_show_ci = "yes"
)

matrixCorr.threads defines the default for n_threads across thread-aware estimators. The print options control compact console previews returned by print(). The summary options control the longer digest returned by summary(). Current values can be inspected with getOption("matrixCorr.print_max_rows") and the same pattern for the remaining options.

This shared display layer is part of the package design. The goal is that users can move across workflow families without relearning how objects are inspected, while still being able to tune how much output is shown by default.

Where to go next

The remaining vignettes focus on specific workflow families. Specifically