--- title: "The Jacquard package" author: - Jan Graffelman - Dpt. of Statistics, Universitat Politecnica de Catalunya; Dpt. of Biostatistics, University of Washington date: "`r Sys.Date()`" output: rmarkdown::html_vignette bibliography: Jacquard.bib vignette: > %\VignetteIndexEntry{The Jacquard package} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) knitr::opts_chunk$set(fig.align = "center") knitr::opts_chunk$set(warnings = FALSE) knitr::opts_chunk$set(message = FALSE) knitr::opts_chunk$set(fig.width = 6, fig.height = 6) ``` ## Introduction
This document explains the basic functionality of the **Jacquard** package which provides functions for the estimation of the nine condensed Jacquard coefficients (@Jacquard) for biallelic genetic marker data using a constrained least squares approach (@Graffelman2024). Outline: 1. [Installation](#installation) 2. [Estimation of the Jacquard coefficients](#coefficients) 3. [Estimation of derived relatedness parameters](#derived) 4. [Alternative estimation procedures](#mom) 5. [References](#refs) ## 1. Installation The package can be installed from CRAN with the instructions ```{r preinstall} #install.packages("Jacquard") library(Jacquard) ``` ## 2. Estimation of Jacquard the coefficients We illustrate the estimation of the Jacquard coefficients using (0,1,2) coded genotype data available in the data set `SimulatedPedigree`, of which we show a small part ```{r data} data(SimulatedPedigree) SimulatedPedigree[1:5,1:10] ``` This matrix contains `r formatC(nrow(SimulatedPedigree), digits=0, format = "f")` individuals, spanning seven generations, in its rows and pedigree information plus genotype data of 20,000 SNPs in its columns. We first separate the pedigree information from the genotype information. ```{r} Xped <- SimulatedPedigree[,1:5] Xgen <- as.matrix(SimulatedPedigree[,6:ncol(SimulatedPedigree)]) ``` We next determine the nine joint genotype counts for all pairs of individuals, storing these counts in a list object with nine lower triangular matrices, using function `JointGenotypeCounts`. For efficiency, we here load the precalculated joint counts. ```{r} #GTC <- JointGenotypeCounts(Xgen) data(GTC) names(GTC) ``` E.g., the counts of the major homozygote pairs for the first five individuals are ```{r} GTC[[1]][1:5,1:5] ``` We create a vector with the minor allele frequency of each SNP. ```{r} mafvec <- mafvector(Xgen) mafvec[1:5] ``` We proceed to estimate the Jacquard coefficients by constrained least squares with function `Jacquard.cls`. For the sake of illustration, we take the first three founders of the pedigree, and subset their joint genotype counts ```{r echo = TRUE} ii <- 1:3 Xped[ii,] GTCsubset <- list(length = 9) for (k in 1:9) { GTCsubset[[k]] <- matrix(numeric(3^2), ncol = 3) GTCsubset[[k]] <- GTC[[k]][ii,ii] } ``` We set random initial values for the Jacquard coefficients ```{r} set.seed(123) delta.init <- runif(9) delta.init <- delta.init/sum(delta.init) ``` And estimate the pairwise Jacquard coefficients of the three pairs. ```{r echo = TRUE} output <- Jacquard.cls(GTCsubset,mafvec=mafvec, eps=1e-06, delta.init=delta.init) Delta.cls <- output$delta ``` Convergence of the solver can be checked by looking at the field `convergence`, where 0 indicates proper convergence. ```{r} output$convergence ``` Particular estimates of Jacquard coefficients can be extracted from the `Delta.cls` list object, which is a list of nine matrices. E.g., $\Delta_9$ of the first pair of individuals (1,2) can be extracted by ```{r} Delta.cls[[9]][1,2] ``` All nine estimates of the Jacquard coefficients can be obtained with function `DeltaPair` ```{r} DeltaPair(Delta.cls,1,2) ``` A pairwise list of the nine Jacquard coefficients of each pair can be obtained with the function `PairwiseList`. ```{r echo = TRUE} PairwiseList(Delta.cls) ``` The full set of all pairwise Jacquard coefficients can be calculated with the instructions below. This result is also available in the precalculated data object `DeltaSimulatedPedigree`. ```{r} #DeltaSimulatedPedigree <- Jacquard.cls(GTC,mafvec=mafvec, # eps=1e-06, # delta.init=delta.init)$delta data(DeltaSimulatedPedigree) ``` Boxplots of the estimated Jacquard coefficients can be obtained with function `BoxplotDelta`. Separate boxplots are shown for the diagonals of $\Delta_1$ and $\Delta_7$. ```{r} BoxplotDelta(DeltaSimulatedPedigree) ``` ## 3. Estimation of derived relatedness parameters The set of five identifiable relatedness parameters dicussed by @Csuros, among them coancestry and inbreeding coefficients, can be calculated with the function `CalculateTheta`. ```{r} Theta <- CalculateTheta(DeltaSimulatedPedigree) ``` E.g., estimates of the kinship coefficients of the first five (founder) individuals are obtained by ```{r} Theta[[1]][1:5,1:5] ``` Individual inbreeding coefficients can be obtained from ... ```{r} diag(Theta[[2]])[1:5] diag(DeltaSimulatedPedigree[[1]])[1:5] ``` Boxplots of identifiable relatedness parameters can be made with `BoxplotTheta` ```{r} BoxplotTheta(Theta) ``` ## 4. Alternative estimation procedures There are many estimators for coancestry and inbreeding. Function `CalculateThetaMom` calculates moment estimators for the five identifiable relatedness parameters defined by @Csuros. ```{r mom, echo = FALSE} KS.mom <- CalculateMom(Xgen[1:10,],mafvec,verbose=FALSE) ``` E.g., moment estimators of the kinship coefficients of the first five individuals are given by ```{r} KS.mom[[1]][1:5,1:5] ```
## 5. References