midfieldr is an R package that provides tools and methods for
studying undergraduate student-level records from the MIDFIELD
database.

Provides tools and demonstrates methods for working with individual undergraduate student-level records (registrar’s data) in R. Tools include filters for program codes, data sufficiency, and timely completion. Methods include gathering blocs of records, computing quantitative metrics such as graduation rate, and creating charts to visualize comparisons. midfieldr is designed to work with data from the MIDFIELD research database, a sample of which is available in the midfielddata data package.
midfieldr provides these functions for manipulating student-level data:

add_completion_status(): Determine completion status for every student
add_data_sufficiency(): Determine data sufficiency for every student
add_timely_term(): Calculate a timely completion term for every student
filter_cip(): Filter CIP data to match search strings
prep_fye_mice(): Prepare FYE data for multiple imputation
select_required(): Select required midfieldr variables

Additional functions for processing intermediate results:

order_multiway(): Order categorical variables of multiway data

R packages in examples and vignettes
In this example, we gather all students ever enrolled in Engineering and summarize their graduation status (in any major), grouping by race/ethnicity and sex. If you are writing your own script to follow along, we use these packages in this example:
library(midfieldr)
library(midfielddata)
library(data.table)

Load the practice data. Reduce initial dimensions of data tables.
# Load the practice data
data(student, term, degree)
# Reduce dimensions of source data tables
student <- select_required(student)
term <- select_required(term)
degree <- select_required(degree)

Filter for data sufficiency.
# Initialize the working data frame
DT <- term[, .(mcid, cip6)]
# Filter observations for data sufficiency
DT <- add_timely_term(DT, term)
DT <- add_data_sufficiency(DT, term)
DT <- DT[data_sufficiency == "include"]

Filter for degree-seeking students ever enrolled in Engineering.
# Inner join to filter observations for degree-seeking
cols_we_want <- student[, .(mcid)]
DT <- cols_we_want[DT, on = c("mcid"), nomatch = NULL]
# Filter observations for engineering programs
DT <- DT[cip6 %like% "^14"]
# Filter observations for unique students (first instance)
DT <- DT[, .SD[1], by = c("mcid")]

Determine completion status.
# Add completion status variable
DT <- add_completion_status(DT, degree)

Aggregate observations by groupings.
# Left join to add race/ethnicity and sex variables (omit unknowns)
cols_we_want <- student[, .(mcid, race, sex)]
DT <- cols_we_want[DT, on = c("mcid")]
DT <- DT[!(race %ilike% "unknown" | sex %ilike% "unknown")]
# Create a variable combining race/ethnicity and sex
DT[, people := paste(race, sex)]
# Aggregate observations by groupings
DT_display <- DT[, .N, by = c("completion_status", "people")]
setorderv(DT_display, c("completion_status", "people"))

Reshape results for display.
# Transform to row-record form
DT_display <- dcast(DT_display, people ~ completion_status, value.var = "N", fill = 0)
# Prepare the table for display
setcolorder(DT_display, c("people", "timely", "late"))
setkeyv(DT_display, c("people"))
setnames(DT_display,
  old = c("people", "timely", "late", "NA"),
  new = c("People", "Timely completion", "Late completion", "Did not complete")
)

Print the results table. “Timely completion” is the count of graduates completing their programs in no more than 6 years; “Late completion” is the count of those graduating in more than 6 years; “Did not complete” is the count of non-graduates.

| People | Timely completion | Late completion | Did not complete |
|---|---|---|---|
| Asian Female | 87 | 4 | 43 |
| Asian Male | 315 | 19 | 163 |
| Black Female | 26 | 3 | 39 |
| Black Male | 80 | 5 | 84 |
| Hispanic Female | 36 | 3 | 31 |
| Hispanic Male | 181 | 19 | 102 |
| International Female | 110 | 9 | 51 |
| International Male | 501 | 41 | 280 |
| Native American Female | 2 | 0 | 2 |
| Native American Male | 13 | 3 | 6 |
| White Female | 985 | 51 | 386 |
| White Male | 4100 | 269 | 2034 |
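
As a small worked example of turning these counts into a comparable metric, the sketch below computes the percentage of each group that completed on time. It copies three rows from the table above; the column names `did_not_complete`, `total`, and `timely_share` are hypothetical names chosen for illustration. This is simple arithmetic on the displayed counts for students ever enrolled in Engineering. It is not midfieldr's own definition of graduation rate.

```r
library(data.table)

# Counts copied from three rows of the results table above
counts <- data.table(
  people = c("Asian Female", "Black Male", "White Male"),
  timely = c(87, 80, 4100),
  late = c(4, 5, 269),
  did_not_complete = c(43, 84, 2034) # hypothetical column name
)

# Total students in each group
counts[, total := timely + late + did_not_complete]

# Percentage of each group completing in no more than 6 years
counts[, timely_share := round(100 * timely / total, 1)]

counts[]
```

For these three rows the totals are 134, 169, and 6403 students, so the timely shares come to about 64.9, 47.3, and 64.0 percent respectively.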
Reminder. midfielddata is suitable for learning to work with student-level data but not for drawing inferences about program attributes or student experiences. midfielddata supplies practice data, not research data.
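
The data.table idioms used in the example can be tried without the practice data. The sketch below uses hypothetical toy tables (`toy_student`, `toy_term`, `toy_counts`, with made-up IDs and counts) to illustrate the inner join, the `%like%` filter on program codes, the first-row-per-student selection, and the reason the reshaped table has a column named "NA". It assumes only that data.table is installed and does not call any midfieldr function.

```r
library(data.table)

# Hypothetical toy records, not MIDFIELD data
toy_term <- data.table(
  mcid = c("s1", "s1", "s2", "s3", "s4"),
  cip6 = c("140801", "141901", "270101", "140901", "141001")
)
toy_student <- data.table(mcid = c("s1", "s2", "s3"))

# Inner join: keep only term rows whose mcid appears in toy_student
# (nomatch = NULL drops s4, which has no match)
toy <- toy_student[toy_term, on = c("mcid"), nomatch = NULL]

# Keep program codes that start with "14"
toy <- toy[cip6 %like% "^14"]

# Keep the first remaining row for each student
toy <- toy[, .SD[1], by = c("mcid")]

# Hypothetical counts in which a missing status marks non-completers
toy_counts <- data.table(
  people = c("Group A", "Group A", "Group B"),
  completion_status = c("timely", NA, "late"),
  N = c(2L, 1L, 3L)
)

# Reshape to one row per group; the missing status becomes a column named "NA"
wide <- dcast(toy_counts, people ~ completion_status, value.var = "N", fill = 0)
names(wide)
```

Here `toy` ends up with one row each for students `s1` and `s3`. The "NA" column in `wide` is the one that the example above renames to "Did not complete".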
Install from CRAN with:
install.packages("midfieldr")

You can install the development version of midfieldr from GitHub with:
# install.packages("pak")
pak::pak("MIDFIELDR/midfieldr")

The installed size of midfielddata is about 24 MB, so the installation takes some time. Install from the MIDFIELDR drat repository with:
install.packages("midfielddata",
  repos = "https://MIDFIELDR.github.io/drat/",
  type = "source"
)

MIDFIELD. A database of anonymized student-level records for approximately 2.4M undergraduates at 21 US institutions from 1987 to 2022. In 2023, control and management of the database were transferred to the American Society for Engineering Education (ASEE). For further information, contact ASEE.
midfielddata. An R data package that supplies anonymized student-level records for 98,000 undergraduates at three US institutions from 1988 to 2018. A sample of the MIDFIELD database, midfielddata provides practice data for the tools and methods in the midfieldr package.
As of the transfer of MIDFIELD to the American Society for Engineering Education (ASEE) in 2023, the development, expansion, and study of MIDFIELD has been supported by the National Science Foundation grants 0337629, 0646441, 0729596, 0734062, 0835914, 0935157, 0935058, 0969474, 1025171, 1129383, 1232740, 1329283, 1361058, 1545667, 2142087, 2141903, and 2152441.