QTL detection in multiparental populations characterized in multiple environments

Vincent Garin

24/12/2022

Multiparental populations

Multiparental populations (MPPs) like the nested association mapping (NAM; McMullen et al. (2009)) or multi-parent advanced generation inter-cross (MAGIC; Cavanagh et al. (2008)) populations have emerged as central genetic resources for research and breeding purposes (Scott et al. 2020). The diallels (Blanc et al. 2006), the factorial designs or the collections of crosses developed in breeding programs (Würschum 2012) can also be considered as MPPs. We can classify MPPs given the use of intrecrossing. In a first type of MPPs, there is no intercrossing. Each genotype inherits its alleles from two parents. Those MPPs can be seen as a collection of crosses (e.g. NAM or factorial design). A second type of MPPs like MAGIC involve genotypes that are a mosaic of more than two parents or founders. In this package, we only consider the first type of MPPs without intercrossing or collections of crosses with shared parents.

MPP-ME QTL detection

Even though several statistical methodologies and software are available to perform QTL detection in MPPs, there is potentially few or no open-source solution for the detection of QTL in MPPs characterized in multiple environments (MPP-ME). In this extension of mppR, we propose a method to detect QTLs in MPP-ME data.

Model description

We start from the QTL detection model 3 described in (Garin, Malosetti, and Eeuwijk 2020):

\[\underline{y}_{icj} = E_{j} + C_{cj} + x_{i_{q}p} * \beta_{pj} + \underline{GE}_{icj} + \underline{e}_{icj} \quad [1]\]

where \(\underline{y}_{icj}\) is the genotype adjusted mean (e.g. BLUE) of genotype \(i\) from cross \(c\) in environment \(j\). \(E_{j}\) is a fixed environment term, \(C_{cj}\) a fixed cross within environment interaction, \(x_{i_{q}p}\) is the number of allele copies coming from parent \(p\) at QTL position \(q\), and \(\beta_{pj}\) represent the QTL effect of parental allele \(p\) in environment \(j\). Different definitions of the QTL allelic effect are possible (Garin et al. 2017). In this version of the model we assumed that each parent carries a different allele and that each of those alleles can have an environmental specific effect. The model is therefore saturated with the estimation of \(N_{env} * N_{par}\) effects with the most frequent allele set as the reference in all environments (e.g. the central or recurrent parent in NAM). In model 1, QTL terms are considered as fixed. Therefore, the user needs to provide best linear unbiased estimates (BLUEs) as trait value.

The term \(GE_{icj}\) is the residual genetic variation and \(e_{icj}\) the plot error term. In model 1 because genotype values are averaged over replication \(e_{icj}\) and \(GE_{icj}\) are confounded. The variance of the \((GE_{icj} + e_{icj})\) term can be modeled with different variance covariance (VCOV) structures. A first possibility is to assume a constant genotypic variance across environments \(\sigma_{G}^{2}\) (compound symmetry - CS) and a unique variance for the plot error term \(\sigma_{e}^{2}\):

\[V\begin{bmatrix} y_{i11} \\ y_{i'21} \\ y_{i12} \\ y_{i'22} \\ y_{i13} \\ y_{i'23} \end{bmatrix} = \begin{bmatrix} \sigma_{G}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G}^{2} & 0 & \sigma_{G}^{2} & 0 \\ 0 & \sigma_{G}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G}^{2} & 0 & \sigma_{G}^{2} \\ \sigma_{G}^{2} & 0 & \sigma_{G}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G}^{2} & 0 \\ 0 & \sigma_{G}^{2} & 0 & \sigma_{G}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G}^{2} \\ \sigma_{G}^{2} & 0 & \sigma_{G}^{2} & 0 & \sigma_{G}^{2} + \sigma_{e}^{2} & 0 \\ 0 & \sigma_{G}^{2} & 0 & \sigma_{G}^{2} & 0 & \sigma_{G}^{2} + \sigma_{e}^{2} \\ \end{bmatrix}\]

The CS model requires the estimation of a single term (\(\sigma_{G}^{2}\)) to describe genetic the covariance between all pairs of environments. This model assumes that the background polygenic effect (effect of all genome positions except the tested QTL and cofactors) is the same in all environments (Van Eeuwijk et al. 2001).

A more realistic option called unstructured (UN) model allows the genetic covariance to be different in every pairs of environments. From a genetic point of view, it means that a set of genes have a different effect in each environments (Van Eeuwijk et al. 2001). In the unstructured model, each pair of environment get a specific genetic covariance term \(cov(y..j, y..j') = \sigma_{G_{jj'}}^{2}\)

\[V\begin{bmatrix} y_{i11} \\ y_{i'21} \\ y_{i12} \\ y_{i'22} \\ y_{i13} \\ y_{i'23} \end{bmatrix} = \begin{bmatrix} \sigma_{G_{1}}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G_{12}} & 0 & \sigma_{G_{13}} & 0 \\ 0 & \sigma_{G_{1}}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G_{12}} & 0 & \sigma_{G_{13}} \\ \sigma_{G_{12}} & 0 & \sigma_{G_{2}}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G_{23}} & 0 \\ 0 & \sigma_{G_{12}} & 0 & \sigma_{G_{2}}^{2} + \sigma_{e}^{2} & 0 & \sigma_{G_{23}} \\ \sigma_{G_{13}} & 0 & \sigma_{G_{23}} & 0 & \sigma_{G_{3}}^{2} + \sigma_{e}^{2} & 0 \\ 0 & \sigma_{G_{13}} & 0 & \sigma_{G_{23}} & 0 & \sigma_{G_{3}}^{2} + \sigma_{e}^{2} \\ \end{bmatrix}\]

We considered the UN model as the default option.

QTL detection procedure

The QTL detection procedure is composed of the following steps:

  1. SIM scan
  2. Selection of cofactors
  3. CIM scan
  4. Selection of QTLs
  5. Estimation of the QTL allele effects
  6. Estimation of the QTL R2 contribution
  7. Plot of the QTL profile
  8. Visualisation of the allelic significance along the genome

The function mppGE_proc() perform the whole procedure detailed above. For further details, see the examples below.

Approximate Wald statistic computation

The calculation of an ‘exact’ mixed model at each marker (QTL) position of the genome is computationally demanding, which can quickly make detection unfeasible, especially with large populations. Therefore, to reduce the computation demand for a genome scan, we implemented an approximate mixed model computation similar to the generalized least square strategy implemented in Kruijer et al. (2015) or Garin et al. (2017). The procedure consists of estimating a general VCOV (\(\hat{V}\)) using model 1 without the tested QTL position for the simple interval mapping (SIM) scan. For the composite interval mapping (CIM) model, \(\hat{V}\) is estimated including the cofactors as fixed effect without the QTL position. For the CIM scan, there will be as many \(\hat{V}\) as the number of cofactors combinations given cofactor exclusion around the tested position.

There are two option for the estimation of \(\hat{V}\). In the first option we estimate a unique \(\hat{V}\) taking all cofactors into consideration (VCOV_data = "unique"). In the second option, different \(\hat{V}\) should be formed by removing the cofactor information that is too close of a tested QTL position (VCOV_data = "minus_cof").

The statistical significance of the tested QTL position and the allelic effects is obtained by substituting \(\hat{V}\) to get the following Wald statistic

\[W_Q = \beta^{T} [V(\beta)]^{-1} \beta \quad [2]\]

where,

\[\beta = (X^{T} \hat{V}^{-1} X)^{-1} X^T \hat{V}^{-1} y \quad [3]\]

and

\[V(\beta) = (X^{T} \hat{V}^{-1} X)^{-1}\]

\(X\) represents the fixed effect matrix including the cross effect, the cofactors, and the QTL position, and \(y\) is the vector of genotypic values. \(W_Q \sim \chi_{d}^{2}\) with degree of freedom (\(d\)) equal to the number of tested QTL allelic effects (e.g. \((N_{par}-1) * N_{env}\) for a QTL term with environment-specific allelic effects).

QTL effects estimation

It is possible to estimate the QTL parental allelic effect using model 1. Model 1 allows the estimation of QTL allelic effect for each parents in each environment. One parent is set as reference. By default, it is the most frequent parental allele. Therefore, in populations like the nested association mapping (NAM) populations, the central parent is considered as the reference. The reference parent can be specified by the user through the ref_par argument. The parental QTL allelic effect must be interpreted as the extra effect of one allele copy with respect to the reference parent in the considered environment.

QTL by environment effect determination

After QTL detection, it is possible to determine if the QTL allelic effects interact with the environment or rather have a consistent effect across environments. For that purpose, the parental QTL effects can be decomposed into a main effect component (\(\alpha_p\)) and a QEI component (\(\beta_{pj}\)) that are estimated simultaneously, which gives the following multi-QTL model:

\[\underline{y}_{icj} = E_{j} + C_{cj} + \sum_{q=1}^{n_{QTL}} x_{i_{q}p} * (\alpha_p + \beta_{pj}) + \underline{GE}_{icj} + \underline{e}_{icj} \quad [4]\]

We can test for the significance of the \(\alpha_p\) and \(\beta_{pj}\) terms using the Wald test. Those tests allow the determination of the specific parental effect significance (\(\alpha_p\) or \(\beta_{pj}\) significant), and the determination of the specific parental allele interaction with the environment (\(\beta_{pj}\) significant).

QTL by environmental covariate (EC) effect estimation

For the parental allelic effects showing a significant interaction with the environment in model 4, it is possible to extend the model to test the sensitivity of the QTL allele to a specific environmental covariate (\(EC_j\), e.g. temperature or humidity). For that purpose, we can rewrite the QTL effect term \(\beta_{pj}\) of the parents with a significant environmental interaction (pxE). We can replace \(\beta_{pj}\) with \(EC_j*S_p+l_{p\epsilon}\). \(EC_j\) represents the EC value in environment j associated with the sensitivity term \(S_p\). The \(S_{p}\) determines the rate of change of the parental QTL allelic additive effect given an extra unit of EC. Finally, \(l_{p\epsilon}\) is a residual effect. The fitted model becomes:

\[\underline{y}_{icj} = E_{j} + C_{cj} + \sum_{q=1}^{n_{QTL}} x_{i_{q}p} * (\alpha_p + \beta_{pj}) + x_{i_{q}pxE} * (\alpha_p + EC_j*S_p+l_{p\epsilon}) + \underline{GE}_{icj} + \underline{e}_{icj} \quad [5]\] The significance of the sensitivity term \(S_p\) can be evaluated with the Wald test. Contrary to the model fitted for the QTL detection scan, model 1 for QTL effect estimation, models 4 and 5 are fitted using an ‘exact’ REML solution using the R pakage nlme.

QTL analysis example

Data

The format of the data is the same as the default mppData objects format used in mppR. The traits measured in the different environments are simply added as separate columns of the pheno argument when you create the mppData object with create.mppData().

The data used in this vignette (mppData_GE) come from the EU-NAM Flint population (Bauer et al. 2013). The genotype data come from five crosses between the donor parents: D152, F03802, F2, F283, DK105 with the recurrent parent UH007. We selected a subset of 100 markers spread over chromosomes five and six. The genetic map was calculated by Giraud et al. (2014). The phenotypic data represent the within environment adjusted means for dry matter yield (DMY) calculated at La Coruna (CIAM), Roggenstein (TUM), Einbeck (KWS), and Ploudaniel (INRA_P) Lehermeier et al. (2014).

library(mppR)
#> mppR has been developed with the support of
#> Wageningen University
#> KWS SAAT SE & Co. KGaA
#> Swiss National Science Foundation
#> (Grant Postdoc.Mobility P500PB_203030)
data(mppData_GE)
design_connectivity(par_per_cross = mppData_GE$par.per.cross)

#> $`1`
#> [1] "UH007"  "D152"   "F03802" "F2"     "F283"   "DK105"

Simple interval mapping (SIM)

To perform the SIM scan, you simply need to pass the mppData object to the function and specify the environments you want to use in the trait argument. If needed, a reference parent can be specified using the ref_par argument. For example, here we analyse DMY characterized in the CIAM and TUM environments. After calculation, the SIM profile can be ploted with plot()

SIM <- mppGE_SIM(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'), ref_par = 'UH007')
plot(x = SIM)

Cofactors selection

The cofactors as well as the QTLs are selected using the same function (QTL_select()). It performs an iterative search per chromosome starting from the most significant position above the threshold value. Then, marker positions falling on the left and right of the selected position by a distance specified in window are excluded. The search continues with the remaining positions by repeating the selection and exclusion steps. The search stops when no position remains above the threshold.

The default value of window (50) is deliberately large to limit the number of included cofactors and not overfit the model. In the mppGE_proc() function, the cofactor threshold and window parameters can be modified through the thre.cof and win.cof arguments.

cofactors <- QTL_select(Qprof = SIM, threshold = 4, window = 50)

Composite interval mapping (CIM)

To perform the CIM scan, you simply need to pass the selected cofators in the mppGE_CIM() function. The window parameter specifies the distance on the left and right where cofactors are excluded in the neighborhood of a tested position. The VCOV_data argument allow to specify if the user want to estimate a unique general VCOV with all cofactors included ("unique") or several VCOV removing cofactor in the neighbouring of a QTL position ("minus_cof"). The unique option is faster.

CIM <- mppGE_CIM(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'),
                 cofactors = cofactors, window = 20, VCOV_data = "unique")

QTLs selection

The selection of QTLs is done using the same function as the one used for cofactors. For the selection of QTLs it is possible to restrict the size of the exclusion window around the selected positions. In the mppGE_proc() function, the QTL threshold and window parameters can be modified through the thre.QTL and win.QTL arguments.

QTL <- QTL_select(Qprof = CIM, threshold = 4, window = 20)

QTLs effect and significance estimations

The function QTL_effect_GE() allows the estimation of the QTL allele effects and their significances. It calculates an exact version of mixed model 1 with single or multiple QTLs. The QTL allele effects are estimated using [3] and their significance with the Wald test [2].

The results of the QTL allelic effect estimation are returned in a list where each component represent the effects of one QTL. The individual within environment QTL effects are represented in row while the effects (\(\hat{\beta}\)), their standard error, as well as their significance are represented in columns. For example, looking at the first QTL, we can see that in the first environment the allele of parent DK105 decreases the DMY by -9.861 deciton/ha compared to the central parent UH007. In the second environment, the effect of DK105 is only -1.559. The effect is highly significant in the first environment but not in the second.

Qeff <- QTL_effect_GE(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'),
                       QTL = QTL)
Qeff$QTL_1
#>                 Effect Std.dev  Wald df        p.val sign
#> QTL_1_UH007_E1      NA      NA    NA NA           NA <NA>
#> QTL_1_D152_E1    0.979   3.107  0.10  1 0.7526802395     
#> QTL_1_F03802_E1 -4.632   2.265  4.18  1 0.0408643496    *
#> QTL_1_F2_E1     -3.170   3.523  0.81  1 0.3683083123     
#> QTL_1_F283_E1   -7.691   2.189 12.35  1 0.0004412536  ***
#> QTL_1_DK105_E1  -9.861   2.615 14.22  1 0.0001626113  ***
#> QTL_1_UH007_E2      NA      NA    NA NA           NA <NA>
#> QTL_1_D152_E2    2.351   3.010  0.61  1 0.4346726625     
#> QTL_1_F03802_E2  1.921   2.173  0.78  1 0.3764976065     
#> QTL_1_F2_E2     -3.200   3.386  0.89  1 0.3445433645     
#> QTL_1_F283_E2   -0.561   2.110  0.07  1 0.7904495664     
#> QTL_1_DK105_E2  -1.559   2.513  0.38  1 0.5350108782

A specific parent can be selected as reference using the ref_par argument. For example we can set as reference the parent ‘F2’ instead of the central parent (UH007).

Qeff <- QTL_effect_GE(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'),
                       QTL = QTL, ref_par = 'F2')
Qeff$QTL_1
#>                 Effect Std.dev Wald df     p.val sign
#> QTL_1_UH007_E1   3.170   3.523 0.81  1 0.3683083     
#> QTL_1_D152_E1    4.149   4.697 0.78  1 0.3771534     
#> QTL_1_F03802_E1 -1.462   4.188 0.12  1 0.7269845     
#> QTL_1_F2_E1         NA      NA   NA NA        NA <NA>
#> QTL_1_F283_E1   -4.522   4.148 1.19  1 0.2756179     
#> QTL_1_DK105_E1  -6.691   4.387 2.33  1 0.1272424     
#> QTL_1_UH007_E2   3.200   3.386 0.89  1 0.3445434     
#> QTL_1_D152_E2    5.552   4.530 1.50  1 0.2203929     
#> QTL_1_F03802_E2  5.122   4.023 1.62  1 0.2029702     
#> QTL_1_F2_E2         NA      NA   NA NA        NA <NA>
#> QTL_1_F283_E2    2.640   3.989 0.44  1 0.5081781     
#> QTL_1_DK105_E2   1.641   4.217 0.15  1 0.6970780

QTLs R2 calculation

The function QTL_R2_GE() allows the estimation of an \(R^{2}\) contribution of all the QTLs (global) and of each QTL positions (partial). For simplicity, the \(R^{2}\) is estimated using a linear model version of [1] and not a mixed model, which means that the \(GE_{icj}\) term is absent from the model and that only a unique variance error term \(\sigma_{e}^{2}\) is estimated. Therefore, the \(R^{2}\) values represent general tendencies and should be taken with caution. The global adjusted \(R^{2}\) is defined as

\[R^{2} = 100*(1-(\frac{RSS_{full}/d_{full}}{RSS_{red}/d_{red}}))\]

where \(RSS_{full}\) and \(RSS_{red}\) are the residual sum of squares of the model with and without the QTL positions. \(d_{full}\) and \(d_{red}\) represent the degrees of freedom of the corresponding models. The partial (adjusted) \(R^{2}\) of QTL \(i\) is the difference between the global \(R^{2}\) obtained with all QTLs and the \(R^{2}\) obtained with a model minus QTL \(i\).

QR2 <- QTL_R2_GE(mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'), QTL = QTL)

QR2$glb.adj.R2
#> [1] 6.328735
QR2$part.adj.R2.diff
#>       Q1       Q2 
#> 2.563632 3.582575

QTL profile plot

The profile plot of the QTL significance can be obtained using the plot.QTLprof() function that plots the \(-log_{10}(p-val)\) of the position along the genome.

plot(x = CIM)

Whole-genome genetic effect significance plot

It is also possible to visualize the significance of the QTL parental allele along the genome by passing the QTL profile information (CIM) to the plot_Qeff_prof() function. The QTL argument can be used to specify potential QTL positions.

The color intensity \(z = -log10(p-val)\) is proportional to the statistical significance of the parental allelic effects obtained from the Wald test [2]. \(z\) has an upper limit of six to not let extreme signficant value influence too much the color scale. Compared to the reference parent (top of the graph) the red (blue) colour represents a negative (positive) effect. The different panels from the top to the bottom represent the different environments.

plot_allele_eff_GE(mppData = mppData_GE, nEnv = 2,
                   EnvNames = c('CIAM', 'TUM'), Qprof = CIM,
                   QTL = QTL, text.size = 14)

mppGE_proc: wrapper function

It is possible to calculate the whole procedure described using the mppGE_proc() function. This function uses the same arguments as the one defined in the sub-functions (e.g. mppData, trait, thre.cof, win.cof, window, ref_par etc.). In this function, it is possible to specify a working directory in output.loc. A folder with the pop.name and trait.name will be created to store intermediate data output like the SIM and CIM profiles, some results (list of QTLs, plots), as well as a summary of the results (QTL_REPORT.txt). Here we can still introduce the argument n.cores which allows to run the function in parallel.

MPP_GE_QTL <- mppGE_proc(pop.name = 'EUNAM', trait.name = 'DMY',
                         mppData = mppData_GE, trait = c('DMY_CIAM', 'DMY_TUM'),
                         n.cores = 1, verbose = FALSE, output.loc = tempdir())

QTL by environment effect determination

Once a list of QTL has been determined using the genome scan functions mppGE_SIM(), mppGE_CIM() or mppGE_proc(), it is possible to refine our understanding of the QTL parental allelic effects by determining which of those effects have a significant interaction with the environment. To do that, it is possible to use the function QTL_effect_main_QEI(), which calculates model 4. For the illustration, we extended the number of environments by adding the INRA and KWS locations.


Qeff <- QTL_effect_main_QEI(mppData = mppData_GE,
                            trait = c('DMY_CIAM', 'DMY_TUM', 'DMY_INRA_P', 'DMY_KWS'),
                            env_id = c('CIAM', 'TUM', 'INRA', 'KWS'),
                            QTL = QTL)

The function return two lists of \(N_{QTL}\) length. The first list (Q_sign) describes the significance of the main (\(\alpha_p\)) and QTLxE (\(\beta_{pj}\)) term per parent for each QTL. For example, at QTL 2, we can notice that the ‘F2’, ‘F283’ and the ‘F03802’ parents have a significant interaction with the environment because the \(-log10(p-val)\) of the \(\beta_{pj}\) is larger than 1.301, which correspond to p-value of 0.05.

Qeff$Q_sign$QTL2
#>      par  logP_main  logP_QxE
#> 1  UH007         NA        NA
#> 2     F2 1.05793090 3.1420765
#> 3   D152 0.96910440 0.2871339
#> 4  DK105 0.05370229 0.2525703
#> 5   F283 0.36207766 3.9077148
#> 6 F03802 0.01518149 4.6439537

The main and the QTLxE effect estimates are stored in the Q_eff list. For example for QTL 2, we can notice that the last environment ‘KWS’ was automatically set as the reference.

Qeff$Q_eff$QTL2
#>                  Effect Std.dev  Wald df        p.val sign
#> QTL2_main_UH007      NA      NA    NA NA           NA <NA>
#> QTL2_main_D152   -3.193   1.983  2.59  1 1.073731e-01     
#> QTL2_main_F03802  0.062   1.438  0.00  1 9.656472e-01     
#> QTL2_main_F2     -3.879   2.270  2.92  1 8.751230e-02    .
#> QTL2_main_F283   -1.129   1.444  0.61  1 4.344325e-01     
#> QTL2_main_DK105   0.248   1.692  0.02  1 8.836855e-01     
#> QTL2_CIAM_UH007      NA      NA    NA NA           NA <NA>
#> QTL2_CIAM_D152   -2.391   2.762  0.75  1 3.866235e-01     
#> QTL2_CIAM_F03802 -0.363   2.011  0.03  1 8.567614e-01     
#> QTL2_CIAM_F2     -0.617   3.148  0.04  1 8.445821e-01     
#> QTL2_CIAM_F283   -0.122   2.008  0.00  1 9.516950e-01     
#> QTL2_CIAM_DK105   0.859   2.353  0.13  1 7.150577e-01     
#> QTL2_TUM_UH007       NA      NA    NA NA           NA <NA>
#> QTL2_TUM_D152    -3.664   2.609  1.97  1 1.601385e-01     
#> QTL2_TUM_F03802  -7.902   1.871 17.84  1 2.407956e-05  ***
#> QTL2_TUM_F2      -4.382   2.942  2.22  1 1.363365e-01     
#> QTL2_TUM_F283    -7.312   1.879 15.14  1 9.969571e-05  ***
#> QTL2_TUM_DK105   -2.741   2.198  1.55  1 2.124397e-01     
#> QTL2_INRA_UH007      NA      NA    NA NA           NA <NA>
#> QTL2_INRA_D152   -0.708   2.051  0.12  1 7.299216e-01     
#> QTL2_INRA_F03802  1.853   1.488  1.55  1 2.130998e-01     
#> QTL2_INRA_F2     -2.172   2.347  0.86  1 3.547953e-01     
#> QTL2_INRA_F283    1.693   1.496  1.28  1 2.576782e-01     
#> QTL2_INRA_DK105  -0.137   1.750  0.01  1 9.375956e-01     
#> QTL2_KWS_UH007       NA      NA    NA NA           NA <NA>
#> QTL2_KWS_D152        NA      NA    NA NA           NA <NA>
#> QTL2_KWS_F03802      NA      NA    NA NA           NA <NA>
#> QTL2_KWS_F2          NA      NA    NA NA           NA <NA>
#> QTL2_KWS_F283        NA      NA    NA NA           NA <NA>
#> QTL2_KWS_DK105       NA      NA    NA NA           NA <NA>

QTLxEC effect estimation

Once the QTL alleles with a significant interaction with the environment have been determined, it is possible to investigate if those alleles are sensitive to specific environmental covariates using the function QTL_effect_main_QxEC(). This function estimates model 5. For that purpose, you can pass the results obtained with QTL_effect_main_QEI() to the Qmain_QxE argument from QTL_effect_main_QxEC(). You also need to provide the environmental covariate (EC) information in the format of a numeric matrix with environment in row and EC in column

# provide the environmental covariate information as a matrix
EC <- matrix(c(180, 310, 240, 280), 4, 1)
rownames(EC) <- c('CIAM', 'TUM', 'INRA', 'KWS')
colnames(EC) <- 'cum_rain'

Qeff <- QTL_effect_main_QxEC(mppData = mppData_GE,
                        trait = c('DMY_CIAM', 'DMY_TUM', 'DMY_INRA_P', 'DMY_KWS'),
                        env_id = c('CIAM', 'TUM', 'INRA', 'KWS'),
                        QTL = QTL, EC = EC, Qmain_QEI = Qeff, thre_QTL = 1.301)

The function extends the QTLxE term for the parental allelic effects with a \(-log10(p-value)\) superior to the thre_QTL argument. It returns a list with the second element ‘Qeff_EC’ being a list with the \(\alpha_p\) (all parental alleles) and \(S_p\) (only parental allele with significant QTLxE) estimates. For example, with a \(-log10(p-value)\) for the \(S_p\) term equal to 2.26 and 2.08, the ‘F03802’ and the ‘F283’ parental allele interact significantly with the cumulated rain.

Qeff$Qeff_EC$QTL2
#>            B_main  logP_main   B_cum_rain logP_cum_rain
#> UH007          NA         NA           NA            NA
#> D152   -3.1931542 0.96792428           NA            NA
#> F03802 12.0348838 2.31797894 -0.044433097     2.2647914
#> F2     -2.9970562 0.18508865 -0.008060809     0.1266095
#> F283   10.2417600 1.78749085 -0.042232587     2.0862106
#> DK105   0.1792433 0.03829626           NA            NA

It is possible to estimate the effect of several ECs but user should be careful to have enough degrees of freedom (df) to estimate those sensitivity curves. For example, with four environments, you should only estimate the effect of one EC at a time because you only have two remaining degrees of freedom (4 env - 1 df main effect - 1 df sensitivity slope).

Plot QTL x EC sensitivity slopes

After calculation, it is possible to plot the sensitivity slopes estimated with the function QTL_effect_main_QxEC(). The slope represent the rate of change in the parental allelic effect given change in EC with respect to the reference parent.

plot_QxEC(Qeff, EC = EC, env_id = c('CIAM', 'TUM', 'INRA', 'KWS'), 
          QTL = 2, EC_id = 'cum rain', trait_id = 'DMY')

Computation time

We tested the proposed procedure on several sorghum BCNAM populations as well as maize NAM population and breeding MPPs. The table below show the computation time required for different configurations. Computations were realized on an AMD Ryzen 7-cores 5800 U processor with 16 GB RAM.

Computation time examples
Populations Ngeno Nmarker Nenv Ncore SIM [m] CIM [h] QTL_effects [m] Total [h]
BCNAM Grinkan 1598 51545 4 4 20.0 11.00 15.0 11.60
BCNAM Kenin-Keni 575 51545 4 4 1.5 0.17 1.5 0.25
BCNAM Lata 896 51545 3 4 3.0 0.33 2.0 0.40
EUNAM Dent 841 18621 4 1 18.0 9.50 19.0 10.20
EUNAM Flint 811 5949 6 1 15.0 16.00 150.0 19.00
Breeding pop1 2071 1812 3 1 5.0 1.30 9.0 1.50
Breeding pop2 820 1760 5 1 9.0 10.20 33.0 10.80

From a general point of view, even though we tried to reduce as much as possible the computation time, it can still take several hours to perform a complete analysis. The computation time depends principally on the population size (number of genotypes), the number of environments, as well as the number of markers. The use of multiple cores (n.cores argument) is a first possibility to reduce the computational time.

The computation of the CIM part including cofactors is the most demanding. The whole procedure time can be largely reduced if the user only perform a SIM (SIM_only = TRUE in mppGE_proc()). Generally, the large effect QTLs are the same or fall in similar region in the SIM and CIM analysis. The number of cofactor position selected will also considerably extend the computational time. For that reason we advise to restrict their number by selecting maximum one cofactor per chrosome.

Ultimately, even if the computation time can be long we should always put that in balance with the time required to develop the population, phenotype it and get the genotypic data.

References

Bauer, Eva, Matthieu Falque, Hildrun Walter, Cyril Bauland, Christian Camisan, Laura Campo, Nina Meyer, et al. 2013. “Intraspecific Variation of Recombination Rate in Maize.” Genome Biol 14 (9): R103.
Blanc, G, A Charcosset, B Mangin, A Gallais, and L Moreau. 2006. “Connected Populations for Detecting Quantitative Trait Loci and Testing for Epistasis: An Application in Maize.” Theoretical and Applied Genetics 113 (2): 206–24.
Cavanagh, Colin, Matthew Morell, Ian Mackay, and Wayne Powell. 2008. “From Mutations to MAGIC: Resources for Gene Discovery, Validation and Delivery in Crop Plants.” Current Opinion in Plant Biology 11 (2): 215–21.
Garin, Vincent, Marcos Malosetti, and Fred van Eeuwijk. 2020. “Multi-Parent Multi-Environment QTL Analysis: An Illustration with the EU-NAM Flint Population.” Theoretical and Applied Genetics 133 (9): 2627–38.
Garin, Vincent, Valentin Wimmer, Sofiane Mezmouk, Marcos Malosetti, and Fred van Eeuwijk. 2017. “How Do the Type of QTL Effect and the Form of the Residual Term Influence QTL Detection in Multi-Parent Populations? A Case Study in the Maize EU-NAM Population.” Theoretical and Applied Genetics 130 (8): 1753–64.
Giraud, Heloise, Christina Lehermeier, Eva Bauer, Matthieu Falque, Vincent Segura, Cyril Bauland, Christian Camisan, et al. 2014. “Linkage Disequilibrium with Linkage Analysis of Multiline Crosses Reveals Different Multiallelic QTL for Hybrid Performance in the Flint and Dent Heterotic Groups of Maize.” Genetics 198 (4): 1717–34.
Kruijer, Willem, Martin P Boer, Marcos Malosetti, Pádraic J Flood, Bas Engel, Rik Kooke, Joost JB Keurentjes, and Fred A van Eeuwijk. 2015. “Marker-Based Estimation of Heritability in Immortal Populations.” Genetics 199 (2): 379–98.
Lehermeier, Christina, Nicole Krämer, Eva Bauer, Cyril Bauland, Christian Camisan, Laura Campo, Pascal Flament, et al. 2014. “Usefulness of Multiparental Populations of Maize (Zea Mays l.) for Genome-Based Prediction.” Genetics 198 (1): 3–16.
McMullen, Michael D, Stephen Kresovich, Hector Sanchez Villeda, Peter Bradbury, Huihui Li, Qi Sun, Sherry Flint-Garcia, et al. 2009. “Genetic Properties of the Maize Nested Association Mapping Population.” Science 325 (5941): 737–40.
Scott, Michael F, Olufunmilayo Ladejobi, Samer Amer, Alison R Bentley, Jay Biernaskie, Scott A Boden, Matt Clark, et al. 2020. “Multi-Parent Populations in Crops: A Toolbox Integrating Genomics and Genetic Mapping with Breeding.” Heredity 125 (6): 396–416.
Van Eeuwijk, FA, M Cooper, IH DeLacy, S Ceccarelli, and S Grando. 2001. “Some Vocabulary and Grammar for the Analysis of Multi-Environment Trials, as Applied to the Analysis of FPB and PPB Trials.” Euphytica 122 (3): 477–90.
Würschum, Tobias. 2012. “Mapping QTL for Agronomic Traits in Breeding Populations.” Theoretical and Applied Genetics 125 (2): 201–10.