Introduction to LearnVizLMM

Introduction

The objective of this package is to summarize characteristics of linear mixed effects models (LMMs) without data or a fitted model. Its functions convert code for fitting nlme::lme() and lme4::lmer() models into tables, equations, and visuals. These outputs can be used for learning how to fit LMMs in R and communicating about LMMs in presentations, manuscripts, and analysis plans.

Why without data?

To expand the types of examples used in statistics education.
To inform study design (e.g., model and variable selection prior to data collection or model fitting).

Installation

To install this package, you can use CRAN (the central R package repository).

library(LearnVizLMM)

Background

In many settings, multiple samples are collected from the same individual or group (e.g., achievement scores of students from different schools and classrooms within schools; impact of different diets on the weights of individuals sampled once a month for six months). This type of data is often referred to as multilevel and can be analyzed using LMMs. LMMs can be fit in R using the following packages and functions.

library(lme4)
lme4::lmer(formula = outcome ~ fixed_effects + random_effects)

library(nlme)
nlme::lme(fixed = outcome ~ fixed_effects, random = random_effects)

The outcome and fixed effects are similar to that of a linear model.

lmer(formula = Y ~ 1 + X1 + X2 + random_effects)
lme(fixed = Y ~ 1 + X1 + X2, random = random_effects)

The random effects are specified with respect to groups or grouping factors (GFs), which are factor variables under which observations are nested or grouped. Using different notations, users can specify random intercepts, random slopes, and more to be estimated by the model.

# Random Intercept
lmer(formula = outcome ~ fixed_effects + (1|GF))
lme(fixed = outcome ~ fixed_effects, random = ~1|GF)
lme(fixed = outcome ~ fixed_effects, random = list(GF=~1))
# Random Intercept & Slope
lmer(formula = outcome ~ fixed_effects + (1 + X1|GF))
lme(fixed = outcome ~ fixed_effects, random = ~1 + X1|GF)
lme(fixed = outcome ~ fixed_effects, random = list(GF=~1 + X1))

extract_equation()

extract_equation() returns a ‘LaTeX’ model equation. This function has three output options.

output_type = “latex” (default)

extract_equation(model = "lmer(score ~ 1 + age + (1|subject))")

$$
\begin{aligned}
  \operatorname{score}_{ij} &= \beta_{0} + \beta_{1}(\operatorname{age}) \\
  &+ u_{0i} \\
 &+ \epsilon_{ij} \\
 u_{0i} &\sim N \left(0, \tau^2_{u_{0}} \right), \text{ for subject i = 1,} \dots \text{,a} \\
 \epsilon_{ij} &\sim N \left(0, \sigma^2 \right), \text{ for Observation j = 1,} \dots \text{,n}  
\end{aligned}
$$

What do I do with this output?

To see the equation, copy and paste the output into any file that supports ‘LaTeX’ equations. Below is an example of what will happen if you paste the output from above into ‘R Markdown’.

\[ \begin{aligned} \operatorname{score}_{ij} &= \beta_{0} + \beta_{1}(\operatorname{age}) \\ &+ u_{0i} \\ &+ \epsilon_{ij} \\ u_{0i} &\sim N \left(0, \tau^2_{u_{0}} \right), \text{ for subject i = 1,} \dots \text{,a} \\ \epsilon_{ij} &\sim N \left(0, \sigma^2 \right), \text{ for Observation j = 1,} \dots \text{,n} \end{aligned} \]

output_type = “string”

extract_equation(model = "lmer(score ~ 1 + age + (1|subject))",
                 output_type = "string")

[1] "$$\n\\begin{aligned}\n  \\operatorname{score}_{ij} &= \\beta_{0} + \\beta_{1}(\\operatorname{age}) \\\\\n  &+ u_{0i} \\\\\n &+ \\epsilon_{ij} \\\\\n u_{0i} &\\sim N \\left(0, \\tau^2_{u_{0}} \\right), \\text{ for subject i = 1,} \\dots \\text{,a} \\\\\n \\epsilon_{ij} &\\sim N \\left(0, \\sigma^2 \\right), \\text{ for Observation j = 1,} \\dots \\text{,n}  \n\\end{aligned}\n$$\n"

output_type = “none”

extract_equation(model = "lmer(score ~ 1 + age + (1|subject))",
                 output_type = "none")

extract_variables()

extract_variables() returns a data frame with information on the variables in the model. The columns of the data frame include:

Effect - whether the effect is random or fixed
Group - group or grouping factor associated with random effects
Term - notation used to include the variable in the model
Description - description of the Term
Parameter - parameter estimated when the model is fit

Note: The data frames for the examples below are displayed using kbl() from the kableExtra package. Below are examples of how this function works for different models.

One Group, Random Intercept and Slopes

By default, all predictor variables are assumed to be numeric.

extract_variables(model = "lme(weight~1+sex+time+I(time^2), random=~1+time+I(time^2)|ID)")

Effect	Group	Term	Description	Parameter
fixed		1	intercept	(Intercept)
fixed		sex	numeric	sex
fixed		time	numeric	time
fixed		I(time^2)	numeric	I(time^2)
random	ID	~1+…+…\|ID	ID-specific intercepts	SD (Intercept)
random	ID	~…+time+…\|ID	ID-specific slopes for time	SD (time)
random	ID	~…+…+I(time^2)\|ID	ID-specific slopes for I(time^2)	SD (I(time^2))
random	ID			Cor (Intercept,time)
random	ID			Cor (Intercept,I(time^2))
random	ID			Cor (time,I(time^2))
random				SD (Residual)

Two Groups, Random Intercepts and Slope

To include one or more categorical predictors, use cat_vars and cat_vars_nlevels. In addition, fixed and random intercepts are added to the data frame unless explicitly excluded (e.g., “age-1” and “0+age”).

extract_variables(model = "lme(Score~type+age*sex,random=list(School=pdDiag(~type),Class=~1))",
                  cat_vars = c("type", "sex"),
                  cat_vars_nlevels = c(3, 2))

Effect	Group	Term	Description	Parameter
fixed		1	intercept	(Intercept)
fixed		type	categorical	typeB
fixed		type	categorical	typeC
fixed		age	numeric	age
fixed		sex	categorical	sexB
fixed		age*sex	interaction	age:sexB
random	School	School=pdDiag(~1+…)	School-specific intercepts	SD (Intercept)
random	School	School=pdDiag(~…+type)	School-specific slopes for typeB	SD (typeB)
random	School	School=pdDiag(~…+type)	School-specific slopes for typeC	SD (typeC)
random	Class (within School)	Class=~1	Class-specific intercepts	SD (Intercept)
random				SD (Residual)

extract_structure()

There are two ways to run extract_structure() to get an image of the multilevel data structure.

Use the model input, which contains lme() or lmer() code.
Specify the number, description, and names of the groups or grouping factors of interest.

Below are examples of both approaches for one, two, and three groups.

One group

extract_structure(n_gf = 1, 
                  gf_names = "Subject")

To specify the number of Subjects, use gf_nlevels.

extract_structure(model = "lme(Weight ~ Time, random=~Time|Subject, data)",
                  gf_nlevels = 47)

To remove the labels for the levels on the left side of the image, use label_levels.

extract_structure(model = "lme(Weight ~ Time, random=~Time|Subject, data)",
                  gf_nlevels = "m", 
                  label_levels = "no")

Two groups

When there are two groups, they can either be crossed (e.g., every level of Worker is observed for every level of Machine)

extract_structure(model = "lmer(Strength ~ 1 + (1|Machine) + (1|Worker))",
                  gf_nlevels = c(10, 20))

or nested (e.g., class is nested within school).

extract_structure(model = "lme(score~type, random=list(school=pdDiag(~1+type),class=~1))")

These figures can also be created using the following code.

extract_structure(n_gf = 2, 
                  gf_description = "crossed",
                  gf_names = c("Machine", "Worker"),
                  gf_nlevels = c(10, 20))

extract_structure(n_gf = 2, 
                  gf_description = "nested",
                  gf_names = c("school", "class"))

Three groups

When there are three groups, they can also be crossed or nested.

extract_structure(n_gf = 3,
                  gf_description = "crossed",
                  gf_names = c("District", "School", "Class"),
                  gf_nlevels = c(8, 15, 5))

The index used for the highest-level group or the group that entered the model or gf_names first can be left as "i" (see above) or redefined using gf3_index.

extract_structure(n_gf = 3,
                  gf_description = "nested",
                  gf_names = c("District", "School", "Class"),
                  gf_nlevels = c(8, 15, 5),
                  gf3_index = "q")

There may also be a combination of nested and crossed groups. For example, one group may be crossed with two nested groups (i.e. crossed with nested).

extract_structure(n_gf = 3,
                  gf_description = "crossed with nested",
                  gf_names = c("GF1", "GF2", "GF3"))

Another example is when two crossed groups are nested within another group (i.e. crossed within nested).

extract_structure(n_gf = 3,
                  gf_description = "crossed within nested",
                  gf_names = c("GF1", "GF2", "GF3"))

Export Type

If you wish to edit the output beyond what is offered by extract_structure(), set export_type = "text". After editing the text, the figure can be created using grViz() in the DiagrammeR package.

diagram_text <- extract_structure(n_gf = 1, 
                                  gf_names = "Subject",
                                  export_type = "text")

DiagrammeR::grViz(diagram_text)

Scope & Tips

The current version of this package does not work for all possible lme() or lmer() models. However, its functions do work for:

One, two, or three groups

# One
(1|GF)
random=~1|GF
random=list(GF=~1)
# Two
(1|GF1)+(1|GF2)
(1|GF1/GF2)
random=~1|GF1/GF2
# Three
(1|GF1/GF2/GF3)
(1|GF1)+(1|GF2/GF3)
random=list(GF1=~1,GF2=~1,GF3=~1)

One to four random effects per group

# One
(1|GF)
# Two
(1+X1|GF)
(X1|GF)
# Three
(1+X1+X2|GF)
(X1+X2|GF)
# Four
(1+X1+X2+X3|GF)
(X1+X2+X3|GF)

Two- and three-way interactions written using the "*" notation

# Two-way
X1 + X2 + X1:X2
X1*X2
# Three-way
X1 + X2 + X3 + X1:X2 + X1:X3 + X2:X3 + X1:X2:X3
X1*X2*X3
# Four-way
X1 + X2 + X3 + X3 + X1:X2 + X1:X3 + X1:X4 + X2:X3 + X2:X4 + 
  X1:X2:X3 + X1:X2:X4 + X1:X4:X3 + X4:X2:X3 + X1:X2:X3:X4
X1*X2*X3*X4 # does not work

Interactions between two and three categorical variables

Here are some additional tips for using this package:

If the model input is code for fitting a lmer() model, only the first input is used and it is assumed to be the formula input. While other information can be included, it will be ignored.

# works
model = "lmer(formula = outcome ~ fixed_effects + random_effects)"
model = "lmer(outcome ~ fixed_effects + random_effects)"
model = "lmer(outcome ~ fixed_effects + random_effects, data, ...)"
# does not work
model = "lmer(data, formula = outcome ~ fixed_effects + random_effects)"
model = "lmer(data, outcome ~ fixed_effects + random_effects)"

If the model input is code for fitting a lme() model, only the fixed and random inputs are used. The fixed input is assumed to be the first input and the random input must include random =. Other information can be included, but it will be ignored.

# works
model = "lme(fixed = outcome ~ fixed_effects, random = random_effects, data, ...)"
model = "lme(outcome ~ fixed_effects, random = random_effects)"
# does not work
model = "lme(outcome ~ fixed_effects, random_effects, data, ...)"
model = "lme(random = random_effects, fixed = outcome ~ fixed_effects)"