---
title: "ACE Models with the NLSY"
author:
  - William Howard [Beasley](https://scholar.google.com/citations?user=ffsJTC0AAAAJ) (Howard Live Oak LLC, Norman)
  - Joseph Lee [Rodgers](https://www.vanderbilt.edu/psychological_sciences/bio/joe-rodgers) (Vanderbilt University, Nashville)
  - David [Bard](https://medicine.ouhsc.edu/Academic-Departments/Pediatrics/Sections/Developmental-Behavioral-Pediatrics/Faculty/david-e-bard-phd) (University of Oklahoma Health Sciences Center, OKC)
  - Kelly [Williams](https://oklahoma.gov/omma.html) (Oklahoma City University, OKC)
  - Michael D. [Hunter](https://acquia-prod.hhd.psu.edu/contact/michael-hunter) (Penn State)
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{ACE Models with the NLSY}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

We describe how to use the NlsyLinks package to examine various biometric models, using the NLSY79.

```{r preliminaries, echo=FALSE, message=FALSE}
library(knitr)
# library(magrittr)
library(xtable)
opts_chunk$set(echo = TRUE)
options(width = 88, show.signif.stars = FALSE, continue = " ") # , lattice.theme = function() canonical.theme("pdf", color = FALSE)
if (any(search() == "package:NlsyLinks")) detach("package:NlsyLinks")
```

Terminology
===============================

Researchers and grad students interested in using the [NLSY](https://www.nlsinfo.org/) for Behavior Genetics and family research, please start with our 2016 article, [*The NLSY Kinship Links: Using the NLSY79 and NLSY-Children Data to Conduct Genetically-Informed and Family-Oriented Research*](https://link.springer.com/article/10.1007/s10519-016-9785-3).

This package considers both Gen1 and Gen2 subjects. 'Gen1' refers to subjects in the original NLSY79 sample (https://www.nlsinfo.org/content/cohorts/nlsy79).
'Gen2' subjects are the biological offspring of the Gen1 females, *i.e.*, those in the NLSY79 Children and Young Adults sample (https://www.nlsinfo.org/content/cohorts/nlsy79-children). The NLSY97 is a third dataset that can be used for behavior genetic research (https://www.nlsinfo.org/content/cohorts/nlsy97), although this vignette focuses on the two generations in the NLSY79.

Standard terminology is to refer to second-generation subjects as 'children' when they are younger than age 15 (NLSYC), and as 'young adults' when they are 15 and older (NLSY79-YA); though they are the same respondents, different funding mechanisms and different survey items necessitate the distinction. This cohort is sometimes abbreviated as 'NLSY79-C', 'NLSY79C', 'NLSY-C' or 'NLSYC'. This package uses 'Gen2' to refer to subjects of this generation, regardless of their age at the time of the survey.

Within our own team, we've mostly stopped using terms like 'NLSY79', 'NLSY79-C' and 'NLSY79-YA', because we conceptualize it as one big sample containing two related generations. In many senses, the responses collected from the second generation can be viewed as outcomes of the first generation. Likewise, the parents in the first generation provide many responses that can be viewed as explanatory variables for the 2nd generation. Depending on your research, there can be big advantages to using one cohort to augment the other. There are also survey items that provide information about the 3rd generation and the 0th generation.

The `SubjectTag` variable uniquely identifies NLSY79 subjects when a dataset contains both generations. For Gen2 subjects, the `SubjectTag` is identical to their CID (*i.e.*, C00001.00, the ID assigned in the NLSY79-Children files). However, for Gen1 subjects, the `SubjectTag` is their CaseID (*i.e.*, R00001.00), with "00" appended. This manipulation is necessary to identify subjects uniquely in inter-generational datasets. A Gen1 subject with an ID of 43 becomes 4300.
The `SubjectTag`s of her four children remain 4301, 4302, 4303, and 4304.

The *expected coefficient of relatedness* of a pair of subjects is typically represented by the statistical variable $R$. Examples are: monozygotic twins have $R$=1; dizygotic twins have $R$=0.5; full siblings (*i.e.*, those who share both biological parents) have $R$=0.5; half-siblings (*i.e.*, those who share exactly one biological parent) have $R$=0.25; adopted siblings have $R$=0.0. Other uncommon possibilities are mentioned in the documentation for `Links79Pair`. The font (and hopefully the context) should distinguish the variable $R$ from the software R. To make things slightly more confusing, the computer variable for $R$ in the `Links79Pair` dataset is written with a monospace font: `R`.

A subject's `ExtendedID` indicates their extended family. Two subjects will be in the same extended family if either: [1] they are Gen1 housemates, [2] they are Gen2 siblings, [3] they are Gen2 cousins (*i.e.*, they have mothers who are Gen1 sisters in the NLSY79), [4] they are mother and child (in Gen1 and Gen2, respectively), or [5] they are (aunt|uncle) and (niece|nephew) (in Gen1 and Gen2, respectively).

An **outcome variable** is directly relevant to the applied researcher; these might represent constructs like height, IQ, and income. A **plumbing variable** is necessary to manage behavior genetic (BG) datasets; examples are `R`, a subject's ID, and the date of a subject's last survey.

An ACE model is the basic biometrical model used by Behavior Genetic researchers, where the genetic and environmental effects are assumed to be additive. The three primary variance components are (1) the proportion of variability due to a shared genetic influence (typically represented as $a^2$, or sometimes $h^2$), (2) the proportion of variability due to shared common environmental influence (typically $c^2$), and (3) the proportion of variability due to unexplained/residual/error influence (typically $e^2$).
The variables are scaled so that they account for all observed variability in the outcome variable; specifically: $a^2 + c^2 + e^2 = 1$. Using appropriate designs that can logically distinguish these different components (under carefully specified assumptions), the basic biometrical modeling strategy is to estimate the magnitude of $a^2$, $c^2$, and $e^2$ within the context of a particular model. For gentle introductions to Behavior Genetic research, we recommend [Plomin (1990)](https://books.google.com/books?id=r7AgAQAAIAAJ&source=gbs_navlinks_s) and [Carey (2003)](https://www.colorado.edu/psych-neuro/gregory-carey). For more in-depth ACE model-fitting strategies, we recommend [Neale & Maes (2004)](http://ibgwww.colorado.edu/workshop2006/cdrom/HTML/book2004a.pdf).

The **NLS Investigator** (https://www.nlsinfo.org/investigator/) is the best way to obtain the NLSY79 and NLSY97 datasets. See our vignette dedicated to the NLS Investigator by typing `vignette("NlsInvestigator")` or by visiting https://cran.r-project.org/package=NlsyLinks.

Before starting the real examples, first verify that the NlsyLinks package is installed correctly. If not, please refer to the [installation appendix](#appendix-installing-and-loading-the-nlsylinks-package).

```{r}
any(.packages(all.available = TRUE) == "NlsyLinks") # Should evaluate to TRUE.
library(NlsyLinks) # Load the package into the current session.
```

The package's documentation manual can be opened by typing `?NlsyLinks` in R or clicking the appropriate entry in RStudio's 'Packages' tab (which is usually in the lower right panel).

Example: DF analysis with a Simple Outcome for Gen2 Subjects, Using a Package Variable
===============================

The vignette's first example uses a simple statistical model and all available Gen2 subjects. The `CreatePairLinksDoubleEntered` function will create a data frame where each row represents one pair of siblings, respective of order (*i.e.*, there is a row for Subjects 201 and 202, and a second row for Subjects 202 and 201).
This function examines the subjects' IDs and determines who is related to whom (and by how much). By default, each row it produces has at least six values/columns: (i) ID for the older member of the kinship pair: `Subject1Tag`, (ii) ID for the younger member: `Subject2Tag`, (iii) ID for their extended family: `ExtendedID`, (iv) their estimated coefficient of genetic relatedness: `R`, (v *and beyond*) outcome values for the older member; (vi *and beyond*) outcome values for the younger member.

A DeFries-Fulker (**DF**) Analysis uses linear regression to estimate the $a^2$, $c^2$, and $e^2$ of a univariate biometric system. The interpretations of the DF analysis can be found in Rodgers & Kohler (2005) and Rodgers, Rowe, & Li (1994). This vignette example uses the newest variation, which estimates two parameters; the corresponding function is called `DeFriesFulkerMethod3`.

The steps are:

1. Use the NLS Investigator to select and download a Gen2 dataset.
1. Open R and create a new script (see [Appendix: R Scripts](#appendix-creating-and-saving-r-scripts)) and load the NlsyLinks package. If you haven't done so, [install the NlsyLinks package](#appendix-installing-and-loading-the-nlsylinks-package). Within the R script, identify the location of the downloaded data file, and load it into a data frame.
1. Within the R script, load the linking dataset. Then select only Gen2 subjects. The 'Pair' version of the linking dataset is essentially an upper triangle of a symmetric sparse matrix.
1. Load and assign the `ExtraOutcomes79` dataset.
1. Specify the outcome variable name and filter out all subjects who have a negative value in this variable. The NLSY typically uses negative values to indicate different types of missingness (see 'Further Information' below).
1. Create a double-entered file by calling the `CreatePairLinksDoubleEntered` function. At minimum, pass the (i) outcome dataset, the (ii) linking dataset, and the (iii) name(s) of the outcome variable(s).
*(There are occasions when a single-entered file is more appropriate for a DF analysis. See Rodgers & Kohler, 2005, for additional information.)*
1. Use the `DeFriesFulkerMethod3` function (*i.e.*, general linear model) to estimate the coefficients of the DF model.

```{r}
### R Code for Example DF analysis with a simple outcome and Gen2 subjects
# Step 2: Load the package containing the linking routines.
library(NlsyLinks)

# Step 3: Load the LINKING dataset and filter for the Gen2 subjects
dsLinking <- subset(Links79Pair, RelationshipPath == "Gen2Siblings")
summary(dsLinking) # Notice there are 11,088 records (one for each unique pair).

# Step 4: Load the OUTCOMES dataset, and then examine the summary.
dsOutcomes <- ExtraOutcomes79 # 'ds' stands for 'Data Set'
summary(dsOutcomes)

# Step 5: This step isn't necessary for this example, because Kelly Meredith already
#   groomed the values. If the negative values (which represent NLSY missing or
#   skip patterns) still exist, then:
dsOutcomes$MathStandardized[dsOutcomes$MathStandardized < 0] <- NA

# Step 6: Create the double entered dataset.
dsDouble <- CreatePairLinksDoubleEntered(
  outcomeDataset   = dsOutcomes,
  linksPairDataset = dsLinking,
  outcomeNames     = c("MathStandardized")
)
summary(dsDouble) # Notice there are 22176=(2*11088) records now (two for each unique pair).

# Step 7: Estimate the ACE components with a DF Analysis
ace <- DeFriesFulkerMethod3(
  dataSet  = dsDouble,
  oName_S1 = "MathStandardized_S1",
  oName_S2 = "MathStandardized_S2"
)
ace
```

*Further Information*: If the different reasons for missingness are important, further work is necessary. For instance, some analyses that use item `Y19940000` might need to distinguish a response of "Don't Know" (which is coded as -2) from "Missing" (which is coded as -7). For this vignette example, we'll assume it's safe to clump the responses together.
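
If the reasons for missingness do matter, one option is to store the reason in a separate column before the negative codes are overwritten with `NA`. The following is a minimal sketch in base R and is not part of NlsyLinks: the data frame `dsToy`, its values, and the helper `RecodeMissing` are hypothetical, and only the two codes mentioned above (-2 and -7) are labeled individually. Please consult the NLSY documentation for the codes that apply to your item.

```r
# Hypothetical toy responses; -2 and -7 are the two missingness codes discussed above.
dsToy <- data.frame(
  SubjectTag = c(201, 202, 301, 302, 401),
  Response   = c(12, -2, 7, -7, 3)
)

# Hypothetical helper (not an NlsyLinks function): record the reason for
# missingness in a new column, then set all negative values to NA.
RecodeMissing <- function(ds, variableName) {
  values <- ds[[variableName]]

  reason <- rep("Observed", length(values))
  reason[which(values == -2)] <- "DontKnow"
  reason[which(values == -7)] <- "Missing"
  reason[which(values < 0 & !(values %in% c(-2, -7)))] <- "OtherNegativeCode"

  ds[[paste0(variableName, "MissingReason")]] <- reason
  values[which(values < 0)] <- NA
  ds[[variableName]] <- values
  ds
}

dsToy <- RecodeMissing(dsToy, "Response")
dsToy
table(dsToy$ResponseMissingReason)
```

The outcome column can then be passed to `CreatePairLinksDoubleEntered` as usual, while the reason column remains available for follow-up checks.
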

Example: DF analysis with a univariate outcome from a Gen2 Extract
===============================

The vignette's second example differs from the previous example in two ways. First, the outcome variables are read from a CSV ([comma separated values](https://en.wikipedia.org/wiki/Comma-separated_values) file) that was downloaded from the **NLS Investigator**. Second, the DF analysis is called through the function `AceUnivariate`; this function is a wrapper around some simple ACE methods, and will help us smoothly transition to more techniques later in the vignette.

The steps are:

1. Use the NLS Investigator to select and download a Gen2 dataset. Select the variables 'length of gestation of child in weeks' (`C03280.00`), 'weight of child at birth in ounces' (`C03286.00`), and 'length of child at birth' (`C03288.00`), and then download the `*.zip` file to your local computer.
1. [Open R and create a new script](#appendix-creating-and-saving-r-scripts) and load the NlsyLinks package.
1. Within the R script, load the linking dataset. Then select only Gen2 subjects.
1. Read the CSV into R as a `data.frame` using `ReadCsvNlsy79Gen2`.
1. Verify the desired outcome column exists, and rename it something meaningful to your project. It is important that the `data.frame` is reassigned (*i.e.*, `ds <- RenameNlsyColumn(...)`). In this example, we rename column `C0328600` to `BirthWeightInOunces`.
1. Filter out all subjects who have a negative `BirthWeightInOunces` value. See the 'Further Information' note in the previous example.
1. Create a double-entered file by calling the `CreatePairLinksDoubleEntered` function. At minimum, pass the (i) outcome dataset, the (ii) linking dataset, and the (iii) name(s) of the outcome variable(s).
1. Call the `AceUnivariate` function to estimate the coefficients.

```{r}
### R Code for Example of a DF analysis with a simple outcome and Gen2 subjects
# Step 2: Load the package containing the linking routines.
library(NlsyLinks)

# Step 3: Load the linking dataset and filter for the Gen2 subjects
dsLinking <- subset(Links79Pair, RelationshipPath == "Gen2Siblings")

# Step 4: Load the outcomes dataset from the hard drive and then examine the summary.
#   Your path might be: filePathOutcomes <- 'C:/BGResearch/NlsExtracts/gen2-birth.csv'
filePathOutcomes <- file.path(path.package("NlsyLinks"), "extdata", "gen2-birth.csv")
dsOutcomes <- ReadCsvNlsy79Gen2(filePathOutcomes)
summary(dsOutcomes)

# Step 5: Verify and rename an existing column.
VerifyColumnExists(dsOutcomes, "C0328600") # Should return '10' in this example.
dsOutcomes <- RenameNlsyColumn(dsOutcomes, "C0328600", "BirthWeightInOunces")

# Step 6: For this item, a negative value indicates the parent refused, didn't know,
#   invalidly skipped, or was missing for some other reason.
#   For our present purposes, we'll treat these responses equivalently.
#   Then clip/Winsorize/truncate the weight to something reasonable.
dsOutcomes$BirthWeightInOunces[dsOutcomes$BirthWeightInOunces < 0] <- NA
dsOutcomes$BirthWeightInOunces <- pmin(dsOutcomes$BirthWeightInOunces, 200)

# Step 7: Create the double entered dataset.
dsDouble <- CreatePairLinksDoubleEntered(
  outcomeDataset   = dsOutcomes,
  linksPairDataset = dsLinking,
  outcomeNames     = c("BirthWeightInOunces")
)

# Step 8: Estimate the ACE components with a DF Analysis
ace <- AceUnivariate(
  method   = "DeFriesFulkerMethod3",
  dataSet  = dsDouble,
  oName_S1 = "BirthWeightInOunces_S1",
  oName_S2 = "BirthWeightInOunces_S2"
)
ace
```

For another example of incorporating CSVs downloaded from the NLS Investigator, please see the "Race and Gender Variables" entry in the [FAQ](https://nlsy-links.github.io/NlsyLinks/articles/faq.html).

Example: Multiple Group SEM of a Simple Outcome for Gen2 Subjects
===============================

This example differs from the first one in the statistical mechanism used to estimate the components.
The first example uses multiple regression to estimate the influence of the shared genetic and environmental factors, while this example uses structural equation modeling (SEM).

The `CreatePairLinksSingleEntered` function will create a `data.frame` where each row represents one unique pair of siblings, *irrespective of order*. Other than producing half the number of rows, this function is identical to `CreatePairLinksDoubleEntered`.

The steps are:

(Steps 1-5 proceed identically to the first example.)

6. Create a *single*-entered file by calling the `CreatePairLinksSingleEntered` function. At minimum, pass the (i) outcome dataset, the (ii) linking dataset, and the (iii) name(s) of the outcome variable(s).
7. Declare the names of the outcome variables corresponding to the two members in each pair. Assuming the variable is called 'ZZZ' and the preceding steps have been followed, the variable 'ZZZ\_S1' corresponds to the first members and 'ZZZ\_S2' corresponds to the second members.
8. Create a GroupSummary `data.frame`, which identifies the `R` groups that should be considered by the model. Inspect the output to see if the groups show unexpected or fishy differences.
9. Create a `data.frame` with cleaned variables to pass to the SEM function. This `data.frame` contains only the three necessary rows and columns.
10. Estimate the SEM with the lavaan package. The function returns an `S4` object, which shows the basic ACE information.
11. Inspect details of the SEM, beyond the ACE components. In this example, we look at the fit stats and the parameter estimates. The lavaan package has additional methods that may be useful for your purposes.

```{r}
### R Code for Example lavaan estimation analysis with a simple outcome and Gen2 subjects
# Steps 1-5 are explained in the vignette's first example:
library(NlsyLinks)
dsLinking <- subset(Links79Pair, RelationshipPath == "Gen2Siblings")
dsOutcomes <- ExtraOutcomes79
dsOutcomes$MathStandardized[dsOutcomes$MathStandardized < 0] <- NA

# Step 6: Create the single entered dataset.
dsSingle <- CreatePairLinksSingleEntered(
  outcomeDataset   = dsOutcomes,
  linksPairDataset = dsLinking,
  outcomeNames     = c("MathStandardized")
)

# Step 7: Declare the names for the two outcome variables.
oName_S1 <- "MathStandardized_S1" # Stands for Outcome1
oName_S2 <- "MathStandardized_S2" # Stands for Outcome2

# Step 8: Summarize the R groups and determine which groups can be estimated.
dsGroupSummary <- RGroupSummary(dsSingle, oName_S1, oName_S2)
dsGroupSummary

# Step 9: Create a cleaned dataset
dsClean <- CleanSemAceDataset(dsDirty = dsSingle, dsGroupSummary, oName_S1, oName_S2)

# Step 10: Run the model
ace <- AceLavaanGroup(dsClean)
ace
# Notice the 'CaseCount' is 8,390 instead of 17,440.
#   This is because (a) one pair with R=.75 was excluded, and
#   (b) the SEM uses a single-entered dataset instead of double-entered.

# Step 11: Inspect the output further
library(lavaan) # Load the package to access methods of the lavaan class.
GetDetails(ace)

# Examine fit stats like Chi-Squared, RMSEA, CFI, etc.
fitMeasures(GetDetails(ace)) # 'fitMeasures' is defined in the lavaan package.

# Examine low-level details like each group's individual parameter estimates and standard
#   errors. Uncomment the next line to view the entire output (which is roughly 4 pages).
# summary(GetDetails(ace))
```

Example: Multiple Group SEM of a Simple Outcome for Gen1 Subjects
===============================

The example differs from the previous one in three ways. First, Gen1 subjects are used. Second, standardized height is used instead of math.
Third, pairs are dropped if their `R` is zero; we return to this last issue after the code is run.

```{r}
### R Code for Example lavaan estimation analysis with a simple outcome and Gen1 subjects
# Steps 1-5 are explained in the vignette's first example:
library(NlsyLinks)
dsLinking <- subset(Links79Pair, RelationshipPath == "Gen1Housemates")
dsOutcomes <- ExtraOutcomes79
# The HeightZGenderAge variable is already groomed

# Step 6: Create the single entered dataset.
dsSingle <- CreatePairLinksSingleEntered(
  outcomeDataset   = dsOutcomes,
  linksPairDataset = dsLinking,
  outcomeNames     = c("HeightZGenderAge")
)

# Step 7: Declare the names for the two outcome variables.
oName_S1 <- "HeightZGenderAge_S1"
oName_S2 <- "HeightZGenderAge_S2"

# Step 8: Summarize the R groups and determine which groups can be estimated.
dsGroupSummary <- RGroupSummary(dsSingle, oName_S1, oName_S2)
dsGroupSummary

# Step 9: Create a cleaned dataset
dsClean <- CleanSemAceDataset(dsDirty = dsSingle, dsGroupSummary, oName_S1, oName_S2)

# Step 10: Run the model
ace <- AceLavaanGroup(dsClean)
ace

# Step 11: Inspect the output further (see the final step in the previous example).
```

Most of the subjects in the $R$=0 pairs responded that they were `Non-relative`s to the explicit items asked in 1979 (*i.e.*, NLSY79 variables `R00001.50` through `R00001.59`). Yet the observed correlation of their heights is far larger than would be expected for a sample of unrelated subjects. Since our team began BG research with the NLSY in the mid-1990s, the $R$=0 group has consistently presented higher than expected correlations, across many domains of outcome variables. For a long time, we have had substantial doubts that subject pairs in this group share a low proportion of their selective genes. Consequently, we suggest applied researchers consider excluding this group from their biometric analyses.

If you wish to exclude additional groups from the analyses, Step 8 should change slightly.
For instance, to exclude MZ twins, replace the two lines in Step 8 with the following four. This is mostly for demonstration. It is unlikely to be a useful idea in the current example, and is more likely to be useful when using the `RFull` variable, which includes all values of `R` we were able to determine.

```{r}
# Step 8: Summarize the R groups and determine which groups can be estimated.
dsGroupSummary <- RGroupSummary(dsSingle, oName_S1, oName_S2)
rGroupsToDrop <- c(1)
dsGroupSummary[dsGroupSummary$R %in% rGroupsToDrop, "Included"] <- FALSE
dsGroupSummary
```

Example: Multiple Group SEM of a Simple Outcome all pairs in Gen1 and Gen2
===============================

The example differs from the previous example in one way: all possible pairs are considered for the analysis. Pairs are only excluded (a) if they belong to one of the small `R` groups that are difficult to estimate, or (b) if the value for adult height is missing. This includes all `r scales::comma(nrow(Links79Pair))` relationships in the following five types of NLSY79 relationships.

```{r, results='asis'}
xt <- xtable(table(Links79Pair$RelationshipPath, dnn = c("Relationship Frequency")),
  caption = "Number of NLSY79 relationships, by `RelationshipPath`. (Recall that 'AuntNiece' also contains uncles and nephews.)"
)
print.xtable(xt, format.args = list(big.mark = ","), type = "html")
```

In our opinion, using the intergenerational links is one of the most exciting new opportunities for NLSY researchers to pursue. We will be happy to facilitate such research through consultation or collaboration, or even by generating new data structures that may be of value. The complete kinship linking file facilitates many different kinds of cross-generational research, using both biometrical and other kinds of modeling methods.

```{r}
### R Code for Example lavaan estimation analysis with a simple outcome and all Gen1 and Gen2 pairs
# Steps 1-5 are explained in the vignette's first example:
library(NlsyLinks)
dsLinking <- subset(Links79Pair, RelationshipPath %in% c(
  "Gen1Housemates", "Gen2Siblings", "Gen2Cousins",
  "ParentChild", "AuntNiece"
))
# Because all five paths are specified, the line above is equivalent to:
# dsLinking <- Links79Pair

dsOutcomes <- ExtraOutcomes79
# The HeightZGenderAge variable is already groomed

# Step 6: Create the single entered dataset.
dsSingle <- CreatePairLinksSingleEntered(
  outcomeDataset   = dsOutcomes,
  linksPairDataset = dsLinking,
  outcomeNames     = c("HeightZGenderAge")
)

# Step 7: Declare the names for the two outcome variables.
oName_S1 <- "HeightZGenderAge_S1"
oName_S2 <- "HeightZGenderAge_S2"

# Step 8: Summarize the R groups and determine which groups can be estimated.
dsGroupSummary <- RGroupSummary(dsSingle, oName_S1, oName_S2)
dsGroupSummary

# Step 9: Create a cleaned dataset
dsClean <- CleanSemAceDataset(dsDirty = dsSingle, dsGroupSummary, oName_S1, oName_S2)

# Step 10: Run the model
ace <- AceLavaanGroup(dsClean)
ace

# Step 11: Inspect the output further (see the final step two examples above).
```

Notice the ACE estimates are very similar to the previous version, but the number of pairs has increased about sixfold, from 4,185 to 24,700. The number of *subjects* doubles when Gen2 is added, and the number of *relationship pairs* really takes off. When an extended family's entire pedigree is considered by the model, many more types of links are possible than if just nuclear families are considered. This increased statistical power is even more important when the population's $a^2$ is small or moderate, instead of something large like 0.7.

You may notice that the analysis has `r scales::comma(ace@CaseCount)` relationships instead of the entire `r scales::comma(nrow(Links79Pair))`.
This is primarily because not all subjects have a value for 'adult height' (and that's mostly because a lot of Gen2 subjects are too young). There are `r scales::comma(sum(!is.na(Links79PairExpanded$RFull)))` pairs with a nonmissing value in `RFull`, meaning that `r round(mean(!is.na(Links79PairExpanded$RFull))*100, 1)`% are classified. We feel comfortable claiming that if a researcher has a phenotype for both members of a pair, there's a 99+% chance we have an `RFull` for it. For a description of the `R` and `RFull` variables, please see the `Links79Pair` entry in the package [reference manual](https://nlsy-links.github.io/NlsyLinks/).

**References:**

The standard errors (but not the coefficients) are biased downward in these analyses, because individuals are included in multiple pairs. Our MDAN article presents a [GEE](https://en.wikipedia.org/wiki/Generalized_estimating_equation) method for handling this (p. 572). The CARB model (or any model that treats the full pedigree as a single unit of analysis in the multivariate or multilevel sense) also would produce more accurate standard error estimates. One of our [2013 BGA presentations](https://r-forge.r-project.org/forum/forum.php?thread_id=28498&forum_id=4266&group_id=1330) discusses these benefits in the context of the current NlsyLinks package, and our 2008 MDAN article accomplishes something similar using a GEE with females in both generations.

Bard, D.E., Beasley, W.H., Meredith, K., & Rodgers, J.L. (2012). [*Biometric Analysis of Complex NLSY Pedigrees: Introducing a Conditional Autoregressive Biometric (CARB) Mixed Model*](https://link.springer.com/article/10.1007/s10519-012-9566-6). Behavior Genetics Association 42nd Annual Meeting. [[Slides](https://r-forge.r-project.org/forum/forum.php?thread_id=4761&forum_id=4266&group_id=1330)]

Beasley, W.H., Bard, D.E., Meredith, K., Hunter, M., & Rodgers, J.L. (2013).
[*NLSY Kinship Links: Creating Biometrical Design Structures from Cross-Generational Data*](https://link.springer.com/article/10.1007/s10519-013-9623-9). Behavior Genetics Association 43rd Annual Meeting. [[Slides](https://r-forge.r-project.org/forum/forum.php?thread_id=28498&forum_id=4266&group_id=1330)]

Rodgers, J. L., Bard, D., Johnson, A., D'Onofrio, B., & Miller, W. B. (2008). [The Cross-Generational Mother-Daughter-Aunt-Niece Design: Establishing Validity of the MDAN Design with NLSY Fertility Variables](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2712575/). *Behavior Genetics, 38*, 567-578.

Appendix: Receiving Help for the NlsyLinks Package
===============================

A portion of our current grant covers a small, part-time support staff. If you have questions about BG research with our kinship links, or questions about our package, we'd like to hear from you.

We provide personal support for researchers in several ways. Perhaps the best place to start is the forums on R-Forge (https://r-forge.r-project.org/forum/?group_id=1330); there are forums for people using R, as well as other software such as SAS.

[This post](https://r-forge.r-project.org/forum/forum.php?thread_id=4537&forum_id=4266&group_id=1330) is a good overview of the current project; it was originally an email Joe sent to previous users of our kinship links (many of them are/were SAS users).

Appendix: Creating and Saving R Scripts
===============================

There are several options and environments for executing R code. Our current recommendation is [RStudio](https://posit.co/), because it is easy to install, and has features targeting beginner and experienced R users. We've had good experiences with it on Windows, OS X, and Ubuntu Linux.

RStudio allows you to create and save R files; these are simply text files that have a file extension of '.R'. RStudio will execute the commands written in the file.

Appendix: Installing and Loading the NlsyLinks Package
===============================

There are three operations you'll typically do with a package: (a) install, (b) load, and (c) update.

The simplest way to *install* NlsyLinks is to type `install.packages("NlsyLinks")` in the console. R then will download NlsyLinks to your local computer. It may try to save and install the package to a location that you don't have permission to write files in. If so, R will ask if you would like to install it to a better location (*i.e.*, somewhere you do have permission to write files). Approve this decision (which is acceptable for everyone except for some network administrators).

For a given computer, you'll need to *install* a package only once for each version of R (new versions of R are released every few months). However, you'll need to *load* a package in every session in which you call its functions. To *load* NlsyLinks, type `library(NlsyLinks)`. Loading reads NlsyLinks information from the hard drive and places it in temporary memory. Once it's loaded, you won't need to load it again until R is closed and reopened later.

Developers are continually improving their packages by adding functions and documentation. These newer versions are then uploaded to the CRAN servers. You may *update* all your installed packages at once by typing `update.packages(ask = FALSE)`. The command checks a CRAN server for newer versions of the packages installed on your local machine. Then they are automatically downloaded and installed.

The grant supporting NlsyLinks extends until 2020. Until then, we'll be including new features and documentation, as we address additional user needs (if you have suggestions, we'd like to hear from you). When the NLSY periodically updates its data, we'll update our kinship links (embedded in NlsyLinks) with the newest information.
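
The three operations described in this appendix can be collected in a short script. This is only a sketch, not a required workflow: the helper `IsInstalled` is a hypothetical convenience (it is not an NlsyLinks function), and the install and update calls are left commented out so that the script does not change your library unless you choose to run them.

```r
# Hypothetical helper: TRUE if a package with this name is installed.
# It wraps the same check used near the start of this vignette.
IsInstalled <- function(packageName) {
  any(.packages(all.available = TRUE) == packageName)
}

# (a) Install: needed only once for each version of R.
if (!IsInstalled("NlsyLinks")) {
  message("NlsyLinks is not installed; run install.packages(\"NlsyLinks\").")
  # install.packages("NlsyLinks")
}

# (b) Load: needed in every R session that calls the package's functions.
if (IsInstalled("NlsyLinks")) {
  library(NlsyLinks)
}

# (c) Update: checks CRAN for newer versions of all installed packages.
# update.packages(ask = FALSE)
```
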

Appendix: References
===============================

A list of some articles that have used the NLSY for behavior genetics is available at: https://nlsy-links.github.io/NlsyLinks/articles/publications.html.

Carey, Gregory (2003). [*Human Genetics for the Social Sciences*](https://www.colorado.edu/psych-neuro/gregory-carey). Sage.

Plomin, Robert (1990). [*Nature and nurture: an introduction to human behavioral genetics*](https://books.google.com/books?id=r7AgAQAAIAAJ&source=gbs_navlinks_s). Brooks/Cole Publishing Company.

Rodgers, J. L., Bard, D., Johnson, A., D'Onofrio, B., & Miller, W. B. (2008). [The Cross-Generational Mother-Daughter-Aunt-Niece Design: Establishing Validity of the MDAN Design with NLSY Fertility Variables](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2712575/). *Behavior Genetics, 38*, 567-578.

Rodgers, Joseph Lee, & Kohler, Hans-Peter (2005). [Reformulating and simplifying the DF analysis model](https://pubmed.ncbi.nlm.nih.gov/15685433/). *Behavior Genetics, 35 (2)*, 211-217.

Rodgers, Joseph Lee, Rowe, David C., & Li, Chengchang (1994). [Beyond nature versus nurture: DF analysis of nonshared influences on problem behaviors](https://psycnet.apa.org:443/journals/dev/30/3/374/). *Developmental Psychology, 30 (3)*, 374-384.

Neale, Michael C., & Cardon, Lou R. (1992). [*Methodology for genetic studies of twins and families*](https://books.google.com/books/about/Methodology_for_genetic_studies_of_twins.html?id=vVzDmDv6WDkC). Norwell, MA: Kluwer Academic Publishers. (Also see [Neale & Maes](http://ibgwww.colorado.edu/workshop2006/cdrom/HTML/book2004a.pdf).)

Funding
===============================

This package's development has been supported by two grants from NIH.
The first, NIH Grant [1R01HD65865](https://taggs.hhs.gov/Detail/AwardDetail?arg_awardNum=R01HD065865&arg_ProgOfficeCode=50), “NLSY Kinship Links: Reliable and Valid Sibling Identification” (PI: Joe Rodgers; Vignette Construction by Will Beasley) supported the (virtually) final completion of the NLSY79 and NLSYC/YA kinship linking files. The second, NIH Grant [1R01HD087395](https://reporter.nih.gov/project-details/9239744), “New NLSY Kinship Links and Longitudinal/Cross-Generational Models: Cognition and Fertility Research” (PI: Joe Rodgers; Vignette Construction by Will Beasley) is supporting the development of the NLSY97 kinship links, and slight updates/extensions in the links for the two earlier data sources.

Version Information
===============================

For the sake of documentation and reproducibility, the current report was rendered in the following environment. Click the line below to expand. But you'll probably regret it.

<details>
<summary>Environment</summary>

```{r session-info-2, echo=FALSE}
devtools::session_info()$platform
knitr::kable(devtools::session_info()$packages[, c("loadedversion", "date")], format = "html")
```

</details>