This vignette explains how mfrmr fits MML
models and how to interpret the newer strict marginal diagnostics.
For a response vector \(\mathbf{x}_n\) and parameter vector \(\beta\), the current MML path
targets the marginal likelihood
\[ L(\beta) = \prod_{n=1}^{N} \int p(\mathbf{x}_n \mid \theta, \beta) g(\theta) \, d\theta \approx \prod_{n=1}^{N} \sum_{q=1}^{Q} w_q \, p(\mathbf{x}_n \mid \theta_q, \beta), \]
where \(g(\theta)\) is the population density of \(\theta\) and \((\theta_q, w_q)\) are
Gauss-Hermite nodes and weights. In mfrmr, the marginal log-likelihood
is optimized from this shared quadrature kernel, and person summaries are
computed post hoc from the posterior bundle. When a latent-regression
population model is active, the package uses person-specific transformed
nodes derived from the same quadrature basis rather than one
unconditional fixed grid.
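The quadrature approximation can be sketched numerically. The block below is a minimal illustration, not mfrmr's implementation: it assumes a dichotomous Rasch response model (rather than RSM / PCM), a standard normal \(g(\theta)\), and a fixed five-point Gauss-Hermite rule whose weights already absorb the density. The names `NODES`, `WEIGHTS`, `response_prob`, and `marginal_loglik` are hypothetical.

```python
import math

# Five-point Gauss-Hermite rule for a standard normal g(theta)
# (probabilists' form; the weights already sum to 1).
_S10 = math.sqrt(10.0)
NODES = [-math.sqrt(5 + _S10), -math.sqrt(5 - _S10), 0.0,
         math.sqrt(5 - _S10), math.sqrt(5 + _S10)]
WEIGHTS = [(7 - 2 * _S10) / 60, (7 + 2 * _S10) / 60, 8 / 15,
           (7 + 2 * _S10) / 60, (7 - 2 * _S10) / 60]


def response_prob(x, theta, beta):
    """p(x | theta, beta) for one dichotomous Rasch response vector (assumed model)."""
    p = 1.0
    for x_i, b_i in zip(x, beta):
        p_one = 1.0 / (1.0 + math.exp(-(theta - b_i)))
        p *= p_one if x_i == 1 else 1.0 - p_one
    return p


def marginal_loglik(data, beta):
    """log L(beta) = sum_n log sum_q w_q * p(x_n | theta_q, beta)."""
    total = 0.0
    for x in data:
        total += math.log(sum(w * response_prob(x, t, beta)
                              for t, w in zip(NODES, WEIGHTS)))
    return total
```

With a single item of difficulty 0 and one response of 1, the symmetric rule gives a marginal probability of 0.5, so `marginal_loglik([[1]], [0.0])` equals `math.log(0.5)` up to rounding.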
The posterior weight for person \(n\) at node \(q\) is
\[ \omega_{nq} = \frac{w_q \, p(\mathbf{x}_n \mid \theta_q, \hat{\beta})} {\sum_{r=1}^{Q} w_r \, p(\mathbf{x}_n \mid \theta_r, \hat{\beta})}. \]
Expected a posteriori (EAP) scoring then uses
\[ \hat{\theta}_n^{\mathrm{EAP}} = \sum_{q=1}^{Q} \theta_q \, \omega_{nq}. \]
This is the kernel that now feeds logLik, the gradient,
EAP summaries, and strict marginal expected values.
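The posterior weights and the EAP score can be computed directly from the two formulas above. This is a sketch under stated assumptions: a dichotomous Rasch model stands in for the package's RSM / PCM kernel, the five-point rule is fixed, and `posterior_weights` and `eap` are hypothetical names rather than mfrmr functions.

```python
import math

# Five-point Gauss-Hermite rule for a standard normal prior (weights sum to 1).
_S10 = math.sqrt(10.0)
NODES = [-math.sqrt(5 + _S10), -math.sqrt(5 - _S10), 0.0,
         math.sqrt(5 - _S10), math.sqrt(5 + _S10)]
WEIGHTS = [(7 - 2 * _S10) / 60, (7 + 2 * _S10) / 60, 8 / 15,
           (7 + 2 * _S10) / 60, (7 - 2 * _S10) / 60]


def response_prob(x, theta, beta):
    """p(x | theta, beta) under an assumed dichotomous Rasch model."""
    p = 1.0
    for x_i, b_i in zip(x, beta):
        p_one = 1.0 / (1.0 + math.exp(-(theta - b_i)))
        p *= p_one if x_i == 1 else 1.0 - p_one
    return p


def posterior_weights(x, beta):
    """omega_nq: normalized w_q * p(x_n | theta_q, beta) over the nodes."""
    joint = [w * response_prob(x, t, beta) for t, w in zip(NODES, WEIGHTS)]
    norm = sum(joint)
    return [j / norm for j in joint]


def eap(x, beta):
    """EAP score: posterior-weighted mean of the quadrature nodes."""
    return sum(t * om for t, om in zip(NODES, posterior_weights(x, beta)))
```

For two items of equal difficulty, the response vector `[1, 0]` yields a symmetric posterior and an EAP of zero, while `[1, 1]` and `[0, 0]` yield EAPs of equal magnitude and opposite sign.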
For the current public RSM / PCM release:
- mml_engine = "direct" uses gradient-based direct optimization of the marginal log-likelihood.
- mml_engine = "em" and mml_engine = "hybrid" are also available for RSM / PCM, while unsupported branches fall back to direct.

This is the implemented scope for the current release.
The strict marginal branch is not based on plugging \(\hat{\theta}_n^{\mathrm{EAP}}\) back into the response model. Instead, it works with posterior-integrated expectations. For a grouped summary \(g\) and category \(c\),
\[ \mathbb{E}_{\hat{\beta}}(N_{gc}) = \sum_{n=1}^{N} \sum_{q=1}^{Q} \omega_{nq} \, I(n \in g) \, P(X_n = c \mid \theta_q, \hat{\beta}). \]
The corresponding residual compares the observed count to that
latent-integrated expectation rather than to an EAP plug-in
prediction.
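The difference between a latent-integrated expectation and an EAP plug-in prediction can be seen in a toy calculation. The sketch below assumes a dichotomous Rasch model with two items, a fixed five-point Gauss-Hermite rule, and category \(c = 1\); `expected_counts` and the other names are hypothetical and do not reflect mfrmr's internals.

```python
import math

# Five-point Gauss-Hermite rule for a standard normal prior (weights sum to 1).
_S10 = math.sqrt(10.0)
NODES = [-math.sqrt(5 + _S10), -math.sqrt(5 - _S10), 0.0,
         math.sqrt(5 - _S10), math.sqrt(5 + _S10)]
WEIGHTS = [(7 - 2 * _S10) / 60, (7 + 2 * _S10) / 60, 8 / 15,
           (7 + 2 * _S10) / 60, (7 - 2 * _S10) / 60]


def p_one(theta, b):
    """P(X = 1 | theta, b) under an assumed dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def posterior_weights(x, beta):
    """omega_nq for one response vector x."""
    joint = []
    for t, w in zip(NODES, WEIGHTS):
        lik = 1.0
        for x_i, b_i in zip(x, beta):
            p = p_one(t, b_i)
            lik *= p if x_i == 1 else 1.0 - p
        joint.append(w * lik)
    norm = sum(joint)
    return [j / norm for j in joint]


def expected_counts(data, beta, item, group):
    """Return (strict, plug_in) expected counts of category 1 on `item`."""
    strict = 0.0
    plug_in = 0.0
    for n in group:  # `group` lists the person indices with I(n in g) = 1
        omega = posterior_weights(data[n], beta)
        # Strict: integrate the category probability over the posterior.
        strict += sum(om * p_one(t, beta[item]) for t, om in zip(NODES, omega))
        # Plug-in: evaluate the category probability at the EAP score.
        theta_eap = sum(t * om for t, om in zip(NODES, omega))
        plug_in += p_one(theta_eap, beta[item])
    return strict, plug_in


data = [[1, 1], [0, 0], [1, 0]]
beta_hat = [0.0, 0.0]
strict_all, plug_all = expected_counts(data, beta_hat, item=0, group=[0, 1, 2])
strict_one, plug_one = expected_counts(data, beta_hat, item=0, group=[0])
```

In this symmetric toy data set the whole-group totals coincide, but for the single-person group the strict expectation is smaller than the plug-in value, because the response function is nonlinear in \(\theta\).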
For pairwise local-dependence follow-up, the package keeps the same
posterior weights but replaces the one-category event with agreement or
adjacency events for the relevant pair of facet levels. That is why
top_marginal_cells and top_marginal_pairs are
conceptually related but not numerically comparable.
diagnose_mfrm() now keeps two evidence paths explicit:
- legacy: residual/EAP-oriented diagnostics inherited from the earlier stack
- marginal_fit: strict latent-integrated first-order and pairwise screens
- both: returns both without collapsing them into one decision rule

The object returned by summary(diag) exposes diagnostic_basis so the two paths can be interpreted separately.
The current design is deliberately aligned with five strands of the IRT fit literature.
Limited-information item-fit logic. Orlando and Thissen (2000,
2003) show why grouped or score-conditioned comparisons can be more
stable than full-information contingency-table statistics in realistic
IRT settings. The current package borrows that limited-information
logic, but it does not implement S-X2 or S-G2
literally. Instead, it applies posterior-integrated grouped residual
screens to many-facet cells and levels.
Generalized residual logic. Haberman and Sinharay (2013) define a generalized residual for a summary statistic \(T\) as
\[ r = \frac{T - \hat{\mathbb{E}}(T)}{\hat{s}_D}, \]
where \(\hat{\mathbb{E}}(T)\) and
\(\hat{s}_D\) are computed under the
fitted model. This is the clearest template for thinking about the
current marginal_fit outputs. The current pairwise
local-dependence summaries are informed by the same
observed-versus-expected logic, but they should still be read as
exploratory agreement screens rather than as formal Haberman-Sinharay
generalized residual tests.
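As a worked instance of the ratio \(r = (T - \hat{\mathbb{E}}(T)) / \hat{s}_D\), the sketch below standardizes an observed category count against model-implied probabilities. It is deliberately naive: the denominator is a plain binomial-sum standard deviation that ignores parameter-estimation uncertainty, so it is not the Haberman-Sinharay \(\hat{s}_D\) and is not claimed to match what mfrmr computes. The function name is hypothetical.

```python
import math


def naive_standardized_residual(observed_count, probs):
    """(T - E(T)) / sd for a count T of independent Bernoulli events.

    probs holds the model-implied probability of the event for each
    observation. The sd treats those probabilities as known, which is a
    simplifying assumption.
    """
    expected = sum(probs)
    sd = math.sqrt(sum(p * (1.0 - p) for p in probs))
    return (observed_count - expected) / sd


# Four observations, each with event probability 0.5, and three observed events.
r = naive_standardized_residual(3, [0.5, 0.5, 0.5, 0.5])
```

Here the expected count is 2 and the naive standard deviation is 1, so `r` is 1.0.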
Multi-method fit assessment and practical significance. Sinharay and Monroe (2025) review limited-information statistics, generalized residuals, posterior predictive checking, and practical significance, and recommend prioritizing fit procedures by intended use rather than treating one index as universally decisive.
Posterior predictive follow-up. Sinharay et al. (2006) treat
posterior predictive checking as a separate model-checking family built
around replicated datasets and discrepancy measures. That is the
intended follow-up role of the package’s currently scaffolded
posterior_predictive_follow_up path.
Many-facet reporting context. Linacre’s FACETS framework and applied MFRM studies such as Eckes (2005) remain the primary references for severity/leniency, mean-square fit, separation, and inter-rater agreement. The current strict marginal branch is designed to sit alongside that many-facet toolkit, not to replace it.
The strict marginal branch is currently a screening layer, not a fully calibrated inferential test battery.
This package therefore treats strict marginal diagnostics as structured evidence about possible misfit, not as a single definitive accept/reject rule. That design choice follows the broader review logic in Sinharay and Monroe (2025): use several complementary diagnostics, match them to the intended use of the scores, and examine practical significance before making strong claims.
For many-facet reporting, one additional boundary matters.
Facet-level separation/reliability and inter-rater agreement answer
different questions. High rater separation reliability can coexist with
weak observed agreement, and strong observed agreement does not imply
that raters are interchangeable on the latent severity scale. That is
why mfrmr reports diagnostics$reliability and
diagnostics$interrater as separate objects.
The current simulation-based validation covers:
These checks target RSM and PCM.
GPCM is now supported only within a bounded core route:
fitting, slope summaries, posterior scoring, information curves, direct
curve/category reports, and exploratory residual-based follow-up.
Broader APA/report bundles, fair-average semantics, and
planning/forecasting helpers remain out of scope for GPCM
in this release.
GPCM is the current upper supported scope for three
reasons.
- The MML kernel and the response-probability core already generalize to the bounded GPCM branch without changing the main package architecture.

This is a narrower but more defensible claim than saying the whole package is uniformly generalized to free-discrimination many-facet work.
Robitzsch and Steinfeld (2018) are helpful because they separate two arguments that are often conflated in applied many-facet work.
If the intended score interpretation requires equal contributions of
items and raters, then the Rasch-family route remains substantively
attractive even when a slope-aware model fits better. mfrmr
therefore treats RSM / PCM as the
equal-weighting reference models and bounded GPCM as a
supported alternative for users who explicitly want to inspect or allow
discrimination-based reweighting.
This is also why some score-side helpers remain out of scope for
bounded GPCM. FACETS-style fair averages are Rasch-family
score transformations, and a slope-aware analogue should not silently
reuse the Rasch-family calculation.
One additional distinction matters for implementation. The
weight argument in fit_mfrm() is an
observation-weight column. It changes how rating events enter estimation
and summaries, but it is not the same thing as the equal-weighting
versus discrimination-weighting question discussed above.
Posterior-predictive checking, MCMC engines, and heavier
runtime infrastructure remain future extensions. They are not required
for the current quadrature-based MML route or for the
bounded GPCM support described here.
For the current release, the most defensible interpretation sequence is:
1. summary(fit) for estimation status and precision basis.
2. summary(diag) with diagnostic_mode = "both" to keep legacy and strict evidence separate.
3. marginal_fit and marginal_pairwise as screening layers for first-order and local-dependence follow-up.