Getting started with xtune

Jingxuan He and Chubing Zeng

2023-06-16

Purpose of this vignette

This vignette is a tutorial on how to use the xtune package to fit feature-specific regularized regression models based on external information.

This tutorial covers the following points: an overview of xtune, four worked examples (a general example, a dietary example, a gene expression example, and a multi-class example), and two special cases (no external information, and Z as an identity matrix).

xtune Overview

The main usage of xtune is to tune multiple shrinkage parameters in regularized regression (Lasso, Ridge, and Elastic-net) based on external information.

Classical penalized regression uses a single penalty parameter \(\lambda\), applied equally to all regression coefficients, to control the overall amount of regularization. This single penalty parameter is typically tuned using cross-validation.

Here, we instead apply an individual shrinkage parameter \(\lambda_j\) to each regression coefficient \(\beta_j\), and the vector of shrinkage parameters \(\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_p)\) is guided by external information \(Z\). Specifically, the \(\lambda_j\) are modeled as a log-linear function of \(Z\). Allowing individual shrinkage for each regression coefficient based on external information can yield better prediction accuracy than a single shared penalty.
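Concretely, writing \(Z_{jk}\) for the \(k\)-th external feature of predictor \(j\), the log-linear penalty model takes the form (a sketch of the parameterization; the intercept \(\alpha_0\) is our notation)

\[ \log \lambda_j = \alpha_0 + \sum_{k=1}^{q} Z_{jk}\,\alpha_k, \qquad j = 1, \ldots, p, \]

so predictors sharing the same external features receive the same penalty, and the \(q+1\) hyperparameters \(\alpha\), rather than \(p\) separate penalties, are what needs to be estimated (hence the "Start estimating alpha" messages in the output below).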

To tune the differential shrinkage parameter vector \(\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_p)\), we employ an Empirical Bayes approach based on the random-effects formulation of the Elastic-net. Once the tuning parameters \(\lambda_j\) are estimated, and the penalties are therefore known, the regression coefficients are obtained using the glmnet package.

The response variable can be either quantitative or categorical. Utilities for carrying out post-fitting summary and prediction are also provided.

Examples

Here, we use four simulated examples to illustrate the usage and syntax of xtune. The first example gives users a general sense of the data structure and the model fitting process. The second and third examples use simulated data in concrete scenarios to illustrate the usage of the package. In the second example, diet, we provide simulated data that mimic the dietary example described in this paper:

Witte, J. S., Greenland, S., Haile, R. W., & Bird, C. L. (1994). Hierarchical Regression Analysis Applied to a Study of Multiple Dietary Exposures and Breast Cancer. Epidemiology, 5(6), 612-621. doi:10.1097/00001648-199411000-00009.

In the third example, gene, we provide simulated data that mimic the bone density data published in the European Bioinformatics Institute (EMBL-EBI) ArrayExpress repository, ID: E-MEXP-1618.

In the fourth example, we simulate data with a categorical outcome with three levels to illustrate multi-class classification with xtune.

General Example

In the first example, \(Y\) is an \(n = 100\)-dimensional continuous outcome vector, \(X\) is an \(n \times p\) matrix of \(p = 300\) potential predictors observed on the \(n\) observations, and \(Z\) is a set of \(q = 4\) external features available for the \(p\) predictors.

library(xtune)
data("example")
X <- example$X; Y <- example$Y; Z <- example$Z
dim(X); dim(Z)
#> [1] 100 300
#> [1] 300   4

Each column of Z contains information about the predictors in the design matrix X. The number of rows in Z equals the number of predictors (columns) in X.
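A quick sanity check of this alignment (plain base R, not part of the package):

# Z must have exactly one row per column (predictor) of X
stopifnot(nrow(Z) == ncol(X))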

X[1:3,1:10]
#>               Predictor_1 Predictor_2 Predictor_3 Predictor_4 Predictor_5
#> Observation_1  -0.7667960   0.9212806   2.0149030  0.79004563  -1.4244699
#> Observation_2  -0.8164583  -0.3144157  -0.2253684  0.08712746  -1.0296026
#> Observation_3  -0.1415352   0.6623149  -1.0398456  1.87611212   0.7340254
#>               Predictor_6 Predictor_7 Predictor_8 Predictor_9 Predictor_10
#> Observation_1  -0.9529327  -0.9344928 -0.32964818   0.4486023  -0.70894600
#> Observation_2   2.2546851   0.2732793 -0.03852896   1.3830463   0.03070716
#> Observation_3   1.7534320   0.3263808  0.09564893  -2.2104531   0.22615224

The external information is encoded as follows:

Z[1:10,]
#>              External_variable_1 External_variable_2 External_variable_3
#> Predictor_1                    1                   0                   0
#> Predictor_2                    1                   0                   0
#> Predictor_3                    0                   1                   0
#> Predictor_4                    0                   1                   0
#> Predictor_5                    0                   0                   1
#> Predictor_6                    0                   0                   1
#> Predictor_7                    0                   0                   0
#> Predictor_8                    0                   0                   0
#> Predictor_9                    0                   0                   0
#> Predictor_10                   0                   0                   0
#>              External_variable_4
#> Predictor_1                    0
#> Predictor_2                    0
#> Predictor_3                    0
#> Predictor_4                    0
#> Predictor_5                    0
#> Predictor_6                    0
#> Predictor_7                    1
#> Predictor_8                    1
#> Predictor_9                    1
#> Predictor_10                   1

Here, each variable in Z is binary: \(Z_{jk}\) indicates whether \(Predictor_j\) has \(ExternalVariable_k\). This Z is an example of a (non-overlapping) grouping of the predictors: predictors 1 and 2 belong to group 1, predictors 3 and 4 to group 2, predictors 5 and 6 to group 3, and the remaining predictors to group 4.
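As an aside, a grouping matrix of this kind can be built from a vector of group labels with model.matrix; a minimal sketch, assuming the group sizes described above (2, 2, 2, and 294):

# Hypothetical group labels for the p = 300 predictors
groups <- factor(rep(paste0("group", 1:4), times = c(2, 2, 2, 294)))
Z_grouping <- model.matrix(~ groups - 1)  # one 0/1 indicator column per group
rownames(Z_grouping) <- colnames(X)
dim(Z_grouping)                           # 300 x 4, matching dim(Z)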

To fit a differential-shrinkage lasso model to this data:

fit.example1 <- xtune(X, Y, Z, family = "linear", c = 1)
#> Z provided, start estimating individual tuning parameters 
#> Start estimating alpha:
#> Done!

Here, we specify a continuous response via family = "linear" and the Lasso penalty by setting \(c = 1\). The individual penalty parameters are returned by:

fit.example1$penalty.vector

In this example, predictors in different groups receive different estimated penalty parameters:

unique(fit.example1$penalty.vector)
#>                     [,1]
#> Predictor_1 7.861552e-03
#> Predictor_3 1.434484e-02
#> Predictor_5 3.337497e-02
#> Predictor_7 1.804549e+02
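Since every predictor belongs to exactly one group, we can verify that the estimated penalties are constant within groups (a quick check in base R, not part of the package API):

# max.col gives, for each predictor, the index of its indicator column in Z
group_id <- max.col(Z)
tapply(as.vector(fit.example1$penalty.vector), group_id, unique)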

Coefficient estimates and predicted values can be obtained via coef_xtune and predict_xtune:

coef_xtune(fit.example1)
predict_xtune(fit.example1, newX = X)

The mse function can be used to compute the mean squared error (MSE) between predicted and observed values.

mse(predict_xtune(fit.example1, newX = X), Y)
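Note that evaluating on the data used for fitting is optimistic. A minimal sketch of an out-of-sample check, using a train/test split of our own (not part of the package):

set.seed(1)
train <- sample(nrow(X), 80)  # hold out 20 observations
fit.tr <- xtune(X[train, ], Y[train], Z, family = "linear", c = 1)
mse(predict_xtune(fit.tr, newX = X[-train, ]), Y[-train])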

Dietary example

Suppose we want to predict a person's weight loss (a binary outcome) from his/her weekly dietary intake. Our external information Z could incorporate information about the levels of relevant food constituents in the dietary items.

data(diet)
head(diet$DietItems)
#>      Milk Margarine Eggs Apples Lettuce Celery Hot dogs Liver Dark bread Pasta
#> [1,]    1         1    4      1       2      1        1     0          0     1
#> [2,]    1         0    0      0       2      4        0     0          2     0
#> [3,]    0         1    2      3       1      3        0     0          4     0
#> [4,]    0         2    0      1       0      1        1     0          3     1
#> [5,]    0         1    1      3       1      1        0     2          2     2
#> [6,]    2         1    2      1       3      0        1     2          2     1
#>      Beer Liquor Cookies Bran
#> [1,]    0      2       3    4
#> [2,]    2      1       0    3
#> [3,]    1      2       1    1
#> [4,]    1      1       1    2
#> [5,]    0      0       0    3
#> [6,]    1      0       2    1
head(diet$weightloss)
#> [1] 0 1 1 1 1 1

The external information Z in this example is:

head(diet$NuitritionFact)
#> NULL
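The NULL printed above suggests that the element name does not match this build of the packaged data; listing the available element names is a sensible first step (base R):

# The Z element name may differ across versions of the packaged data
names(diet)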

In this example, Z is not a grouping of the predictors: the idea is that the nutrition facts about the dietary items may carry information about the importance of each predictor in the model.

Similar to the previous example, the xtune model could be fit by:

fit.diet = xtune(X = diet$DietItems, Y = diet$weightloss, Z = diet$NuitritionFact, family = "binary", c = 0)
#> No Z matrix provided, only a single tuning parameter will be estimated using empirical Bayes tuning 
#> Start estimating alpha:

Here, we fit a Ridge model by specifying \(c = 0\). With a valid nutrition-fact matrix Z, each dietary predictor would receive an individual tuning parameter; because Z resolved to NULL above, xtune fell back to a single empirical-Bayes tuning parameter, which is why the penalty vector below is constant.

fit.diet$penalty.vector
#>  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1

To make predictions using the trained model:

predict_xtune(fit.diet, newX = diet$DietItems)

The above code returns the predicted probabilities (scores). To make a class prediction, use the type = "class" option.

pred_class <- predict_xtune(fit.diet, newX = diet$DietItems, type = "class")
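Presumably the class labels are obtained by thresholding the probabilities; an equivalent manual version, assuming the conventional 0.5 cutoff (our assumption, not documented package behavior):

pred_prob <- predict_xtune(fit.diet, newX = diet$DietItems)
pred_class_manual <- ifelse(pred_prob > 0.5, 1, 0)  # assumed 0.5 cutoff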

The misclassification() function can be used to extract the misclassification rate. The prediction AUC can be calculated using the auc() function from the AUC package.

misclassification(pred_class, true = diet$weightloss)
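A sketch of the AUC calculation, assuming the AUC package's roc(predictions, labels) interface with the labels supplied as a factor:

library(AUC)
pred_prob <- predict_xtune(fit.diet, newX = diet$DietItems)
auc(roc(as.vector(pred_prob), factor(diet$weightloss)))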

Gene expression data example

The gene data contain simulated gene expression data of dimension \(50 \times 200\). The outcome Y is continuous (bone mineral density). The external information consists of the results of four previous studies on the biological importance of the genes: \(Z_{jk} = 1\) means that gene \(j\) was identified as important by previous study \(k\), and \(Z_{jk} = 0\) means that it was not.

data(gene)
gene$GeneExpression[1:3,1:5]
#>          Gene_1     Gene_2     Gene_3     Gene_4     Gene_5
#> [1,] -0.7667960  1.7520578  0.9212806 -0.6273008  2.0149030
#> [2,] -0.8164583 -0.5477714 -0.3144157 -0.8796116 -0.2253684
#> [3,] -0.1415352 -0.8585257  0.6623149 -0.3053110 -1.0398456
gene$PreviousStudy[1:5,]
#>        Identified by previous study 1 Identified by previous study 2
#> Gene_1                              0                              0
#> Gene_2                              0                              0
#> Gene_3                              0                              0
#> Gene_4                              0                              0
#> Gene_5                              0                              0
#>        Identified by previous study 3 Identified by previous study 4
#> Gene_1                              0                              0
#> Gene_2                              0                              0
#> Gene_3                              0                              0
#> Gene_4                              0                              0
#> Gene_5                              0                              0

A gene can be identified as important by several previous studies, so the external information Z in this example can be seen as overlapping groups of predictors.
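A quick way to see the overlap is to count, for each gene, how many studies identified it (base R):

# Values greater than 1 indicate genes identified by multiple studies
table(rowSums(gene$PreviousStudy))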

Model fitting:

fit.gene = xtune(X = gene$GeneExpression, Y = gene$bonedensity, Z = gene$PreviousStudy, family = "linear", c = 0.5)

We use the Elastic-net model by specifying \(c = 0.5\) (\(c\) can be any numeric value between 0 and 1, with \(c = 0\) corresponding to Ridge and \(c = 1\) to Lasso). The rest of the steps are the same as in the previous two examples.
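For completeness, a compact sketch of that post-fitting workflow, mirroring the earlier examples:

# Coefficients, predictions, and in-sample MSE, as before
est <- coef_xtune(fit.gene)
pred <- predict_xtune(fit.gene, newX = gene$GeneExpression)
mse(pred, gene$bonedensity)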

Multi-classification data example

data("example.multiclass")
dim(example.multiclass$X); dim(example.multiclass$Y); dim(example.multiclass$Z)
#> [1] 600 800
#> [1] 600   1
#> [1] 800   5
head(example.multiclass$X)[,1:5]
#>            [,1]        [,2]       [,3]        [,4]       [,5]
#> [1,]  0.7445670  1.47901632 -0.1556682 -0.64053491 -1.2694581
#> [2,]  0.2170827  0.08309395  0.8571237 -0.12263159  0.4443480
#> [3,]  0.7843483  2.07273526  0.5046653 -0.56627993 -0.3385034
#> [4,]  0.4509514 -0.34583708 -0.5824597 -0.71907762 -0.5209697
#> [5,]  1.2444328  2.14042805  0.4639056 -0.01205312  1.2121137
#> [6,] -1.2819254 -0.83835437  0.6999223  0.04828028  1.0246388
head(example.multiclass$Y)
#>      [,1]
#> [1,]    3
#> [2,]    2
#> [3,]    1
#> [4,]    2
#> [5,]    3
#> [6,]    2
head(example.multiclass$Z)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    0    0    0    0
#> [2,]    0    0    0    0    1
#> [3,]    0    0    0    0    0
#> [4,]    0    0    0    0    0
#> [5,]    0    0    0    1    0
#> [6,]    0    0    0    0    0
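This example also comes with a matrix U of additional covariates that is passed to xtune() below; judging from the trailing zeros in the penalty vector printed further down, the columns of U are left unpenalized (that reading is our inference from the output). Its dimensions can be inspected directly:

# U: additional covariates supplied to xtune() alongside X
dim(example.multiclass$U)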

Model fitting:

fit.multiclass = xtune(X = example.multiclass$X, Y = example.multiclass$Y, Z = example.multiclass$Z, U = example.multiclass$U, family = "multiclass", c = 0.5)
#> Z provided, start estimating individual tuning parameters 
#> Start estimating alpha:
#> Done!

# check the tuning parameter
fit.multiclass$penalty.vector
#>   [1] 0.019033238 0.020019923 0.019033238 0.019033238 0.019070537 0.019033238
#>   [7] 0.019406624 0.019033238 0.016390346 0.019033238 0.019406624 0.020019923
#> ... [output truncated for brevity: 810 values in total] ...
#> [799] 0.019033238 0.019406624 0.000000000 0.000000000 0.000000000 0.000000000
#> [805] 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000

To make predictions using the trained model (note that newX must contain the columns of U as well as those of X):

pred.prob = predict_xtune(fit.multiclass, newX = cbind(example.multiclass$X, example.multiclass$U))
head(pred.prob)
#>           1.1        2.1        3.1
#> 1 0.152714695 0.04651872 0.80076659
#> 2 0.001784845 0.96370971 0.03450544
#> 3 0.594903813 0.14663364 0.25846255
#> 4 0.040101522 0.84879438 0.11110409
#> 5 0.084372394 0.17279569 0.74283192
#> 6 0.012523512 0.89878034 0.08869615

The above code returns the predicted probabilities (scores) for each class. To make a class prediction, specify the argument type = "class".

pred.class <- predict_xtune(fit.multiclass, newX = cbind(example.multiclass$X, example.multiclass$U), type = "class")
head(pred.class)
#> [1] "3" "2" "1" "2" "3" "2"

The misclassification() function can be used to extract the misclassification rate. The multiclass AUC can be calculated using the multiclass.roc function from the pROC package; a sketch follows the output below.

misclassification(pred.class, true = example.multiclass$Y)
#> [1] 0.006666667
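A sketch of the multiclass AUC computation, assuming pROC's multiclass.roc() accepts a matrix of class probabilities whose columns are named after the class levels (the renaming step below is our assumption):

library(pROC)
prob <- pred.prob
colnames(prob) <- c("1", "2", "3")  # match the class levels of Y
multiclass.roc(response = factor(as.vector(example.multiclass$Y)), predictor = prob)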

Two special cases

No external information Z

If you just want to tune a single penalty parameter using empirical Bayes tuning, simply do not provide Z in the xtune() function. If no external information Z is provided, the function performs empirical Bayes tuning to choose the single penalty parameter of the penalized regression, as an alternative to cross-validation. For example:

fit.eb <- xtune(X,Y, family = "linear", c = 0.5)
#> No Z matrix provided, only a single tuning parameter will be estimated using empirical Bayes tuning 
#> Start estimating alpha:
#> Done!

The estimated tuning parameter is:

fit.eb$lambda
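For comparison, the conventional cross-validated choice of a single penalty can be obtained from glmnet (a sketch; we assume xtune's \(c\) plays the role of glmnet's alpha, and the two selected values need not coincide since the selection criteria differ):

library(glmnet)
cv <- cv.glmnet(X, Y, alpha = 0.5)  # same mixing as c = 0.5 above
c(xtune = fit.eb$lambda, cv.glmnet = cv$lambda.min)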

Z as an identity matrix

If you provide an identity matrix as external information Z to xtune(), the function will estimate a separate tuning parameter \(\lambda_j\) for each regression coefficient \(\beta_j\). Note that this is not advised when the number of predictors \(p\) is very large.

Using the dietary example, the following code would estimate a separate penalty parameter for each coefficient.


Z_iden = diag(ncol(diet$DietItems))
fit.diet.identity = xtune(diet$DietItems,diet$weightloss,Z_iden, family = "binary", c = 0.5)
#> Z provided, start estimating individual tuning parameters 
#> Start estimating alpha:
fit.diet.identity$penalty.vector
#>               [,1]
#>  [1,] 1.619150e+02
#>  [2,] 2.667150e+02
#>  [3,] 8.667794e-02
#>  [4,] 1.160597e-02
#>  [5,] 1.514302e+02
#>  [6,] 2.667643e+02
#>  [7,] 2.585258e+02
#>  [8,] 1.315713e-01
#>  [9,] 2.784969e-03
#> [10,] 4.277238e+02
#> [11,] 5.663159e+02
#> [12,] 2.438229e+02
#> [13,] 1.000000e+03
#> [14,] 6.516246e+02

A predictor is excluded from the model (its regression coefficient is set to zero) if its corresponding penalty parameter is estimated to be infinite.
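In the output above, the value 1e+03 appears to act as an upper bound on the estimated penalties; reading such values as effectively infinite (our interpretation, not documented package behavior), the excluded predictors can be located directly:

# Penalties at the apparent upper bound mark effectively excluded predictors
which(fit.diet.identity$penalty.vector >= 1e3)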

Conclusion

We have presented the main usage of the xtune package. For more details about each function, please consult the package documentation. To give feedback or report an issue, please open an issue on GitHub.