Pooling and Selection of Linear Regression Models

Martijn W Heymans

2023-06-16

Introduction

With the psfmi_lm function you can pool linear regression models using
one of the following pooling methods: RR (Rubin’s Rules), D1, D2 and MPR (Median P Rule).

You can also use forward or backward selection from the pooled model.

This vignette shows examples of how to apply these procedures.
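As background for the RR method, the following is a minimal base R sketch of Rubin’s Rules for a single coefficient. The estimates and standard errors are hypothetical illustration values, not output from lbpmilr, and `rubin_pool` is a helper name introduced here, not a psfmi function. The degrees of freedom use the classic large-sample formula; the df printed by psfmi_lm may come from a small-sample adjusted formula, so this sketch is not expected to reproduce them.

```r
# Hypothetical estimates and standard errors of one coefficient,
# one value per imputed dataset (m = 5). Illustration only.
est <- c(-0.12, -0.14, -0.13, -0.11, -0.15)
se  <- c(0.040, 0.042, 0.041, 0.039, 0.043)

# Hypothetical helper: pool one coefficient with Rubin's Rules.
rubin_pool <- function(est, se) {
  m     <- length(est)
  q_bar <- mean(est)                # pooled estimate
  w     <- mean(se^2)               # within-imputation variance
  b     <- var(est)                 # between-imputation variance
  t_var <- w + (1 + 1 / m) * b      # total variance
  r     <- (1 + 1 / m) * b / w      # relative increase in variance
  df    <- (m - 1) * (1 + 1 / r)^2  # large-sample degrees of freedom
  stat  <- q_bar / sqrt(t_var)
  c(estimate = q_bar, std.error = sqrt(t_var), statistic = stat,
    df = df, p.value = 2 * pt(-abs(stat), df))
}

pooled <- rubin_pool(est, se)
pooled
```

The pooled standard error is always at least as large as the average within-imputation standard error, because the between-imputation variance is added on top.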

Examples

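Pooling without BS and method D1

Pooling linear regression models over 5 imputed datasets with method D1 and without backward selection (BS).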


  library(psfmi)
  pool_lm <- psfmi_lm(data=lbpmilr, nimp=5, impvar="Impnr", 
                      formula = Pain ~ Gender + Smoking + 
                      Function + JobControl + JobDemands + SocialSupport, 
                      method="D1")
  
  pool_lm$RR_model
#> $`Step 1 - no variables removed -`
#>            term     estimate  std.error   statistic        df     p.value
#> 1   (Intercept)  7.626750501 2.37470136  3.21166721 103.21605 0.001760151
#> 2        Gender -0.549897436 0.41763180 -1.31670395  97.10997 0.191036859
#> 3       Smoking -0.184822738 0.35459284 -0.52122524  60.23783 0.604120893
#> 4      Function -0.126983721 0.04264394 -2.97776686  46.48759 0.004600709
#> 5    JobControl -0.018201443 0.01884372 -0.96591573 117.54453 0.336069460
#> 6    JobDemands  0.015351105 0.03590006  0.42760673 121.85071 0.669692207
#> 7 SocialSupport -0.003435975 0.05621115 -0.06112622  96.21255 0.951385488
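The columns of the pooled table relate to each other in the usual way for a t-test. The sketch below takes the Function row printed above and recomputes the statistic and p-value, then derives a 95% confidence interval. The interval is not part of the printed output; it is computed here under the assumption that the reported df belongs to a t reference distribution.

```r
# Values copied from the Function row of pool_lm$RR_model above.
est <- -0.126983721
se  <- 0.04264394
df  <- 46.48759

t_stat <- est / se                          # pooled t statistic
p_val  <- 2 * pt(-abs(t_stat), df)          # two-sided p-value
ci_95  <- est + c(-1, 1) * qt(0.975, df) * se  # 95% confidence interval

t_stat
p_val
ci_95
```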


Pooling with BS and method D1

Pooling linear regression models over 5 imputed datasets with backward selection (BS), using a p-value threshold of 0.05 and method D1, while forcing the predictor “Smoking” to stay in the model during selection.


  library(psfmi)
  pool_lm <- psfmi_lm(data=lbpmilr, nimp=5, impvar="Impnr", 
                      formula = Pain ~ Gender + Smoking + 
                      Function + JobControl + JobDemands + SocialSupport, 
                      keep.predictors = "Smoking", method="D1", p.crit=0.05, 
                      direction="BW")
#> Removed at Step 1 is - SocialSupport
#> Removed at Step 2 is - JobDemands
#> Removed at Step 3 is - JobControl
#> Removed at Step 4 is - Gender
#> 
#> Selection correctly terminated, 
#> No more variables removed from the model
  
  pool_lm$RR_model_final
#> $`Step 5`
#>          term   estimate  std.error  statistic       df      p.value
#> 1 (Intercept)  6.7504947 0.47607990 14.1793314 78.48419 2.368256e-23
#> 2     Smoking -0.1998222 0.35556369 -0.5619871 57.20990 5.763201e-01
#> 3    Function -0.1403048 0.04077998 -3.4405314 51.97198 1.153144e-03
  pool_lm$multiparm_final
#> $`Step 5`
#>           p-values D1 F-statistic
#> Smoking  0.5753099186   0.3158295
#> Function 0.0008794238  11.8372561
  pool_lm$predictors_out
#>         Gender Smoking Function JobControl JobDemands SocialSupport
#> Step 1       0       0        0          0          0             1
#> Step 2       0       0        0          0          1             0
#> Step 3       0       0        0          1          0             0
#> Step 4       1       0        0          0          0             0
#> Removed      1       0        0          1          1             1
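The predictors_out table can be summarised programmatically. The sketch below rebuilds the table by hand from the printed output, so that it runs without the package, and extracts the removed predictors and their order of removal. It assumes the real `pool_lm$predictors_out` object is a matrix-like object with the same row and column names; with the package loaded you would use that object directly.

```r
# Table rebuilt by hand from the printed pool_lm$predictors_out above.
predictors_out <- matrix(
  c(0, 0, 0, 0, 0, 1,
    0, 0, 0, 0, 1, 0,
    0, 0, 0, 1, 0, 0,
    1, 0, 0, 0, 0, 0,
    1, 0, 0, 1, 1, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("Step 1", "Step 2", "Step 3", "Step 4", "Removed"),
    c("Gender", "Smoking", "Function", "JobControl",
      "JobDemands", "SocialSupport")))

# Predictors flagged in the summary row.
removed <- colnames(predictors_out)[predictors_out["Removed", ] == 1]

# Order of removal: the flagged column in each step row.
steps <- predictors_out[rownames(predictors_out) != "Removed", , drop = FALSE]
removal_order <- apply(steps, 1, function(row) colnames(steps)[row == 1])

removed
removal_order
```

Smoking never appears in either result, consistent with it being forced into the model through keep.predictors.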


Pooling with BS and method MPR

Pooling linear regression models over 5 imputed datasets with backward selection (BS), using a p-value threshold of 0.05 and method MPR, while forcing the predictor “Smoking” to stay in the model during selection.


  library(psfmi)
  pool_lm <- psfmi_lm(data=lbpmilr, nimp=5, impvar="Impnr", 
                      formula = Pain ~ Gender + Smoking + 
                      Function + JobControl + JobDemands + SocialSupport, 
                      keep.predictors = "Smoking", method="MPR", p.crit=0.05, 
                      direction="BW")
#> Removed at Step 1 is - SocialSupport
#> Removed at Step 2 is - JobDemands
#> Removed at Step 3 is - JobControl
#> Removed at Step 4 is - Gender
#> 
#> Selection correctly terminated, 
#> No more variables removed from the model
  
  pool_lm$RR_model_final
#> $`Step 5`
#>          term   estimate  std.error  statistic       df      p.value
#> 1 (Intercept)  6.7504947 0.47607990 14.1793314 78.48419 2.368256e-23
#> 2     Smoking -0.1998222 0.35556369 -0.5619871 57.20990 5.763201e-01
#> 3    Function -0.1403048 0.04077998 -3.4405314 51.97198 1.153144e-03
  pool_lm$multiparm_final
#> $`Step 5`
#>           p-value MPR
#> Smoking  0.6019832504
#> Function 0.0001268997
  pool_lm$predictors_out  
#>         Gender Smoking Function JobControl JobDemands SocialSupport
#> Step 1       0       0        0          0          0             1
#> Step 2       0       0        0          0          1             0
#> Step 3       0       0        0          1          0             0
#> Step 4       1       0        0          0          0             0
#> Removed      1       0        0          1          1             1
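The idea behind the Median P Rule can be sketched in a few lines: the test for a predictor is run in each imputed dataset separately, and the median of the resulting p-values is used as the pooled p-value. The p-values below are hypothetical illustration values, not output from lbpmilr, and the sketch does not claim to mirror how psfmi computes them internally.

```r
# Hypothetical p-values for one predictor, one per imputed dataset (m = 5).
p_by_imputation <- c(0.58, 0.61, 0.60, 0.55, 0.64)
p_crit <- 0.05

# Median P Rule: the pooled p-value is the median over the imputations.
median_p <- median(p_by_imputation)

# During backward selection, a predictor is a removal candidate
# when its pooled p-value exceeds the threshold.
removal_candidate <- median_p > p_crit

median_p
removal_candidate
```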


Pooling with BS including interaction terms and method D2

Pooling linear regression models over 5 imputed datasets with BS, using a p-value threshold of 0.05 and method D2. Several interaction terms, including one with a categorical predictor, are part of the selection procedure.


  library(psfmi)
  pool_lm <- psfmi_lm(data=lbpmilr, nimp=5, impvar="Impnr", 
                      formula = Pain ~ Gender + Smoking + 
                        Function + JobControl + factor(Carrying) + 
                        factor(Satisfaction) +
                        factor(Carrying):Smoking + Gender:Smoking, 
                      method="D2", p.crit=0.05, 
                      direction="BW")
#> Removed at Step 1 is - Function
#> Removed at Step 2 is - Gender*Smoking
#> Removed at Step 3 is - Smoking*factor(Carrying)
#> Removed at Step 4 is - Smoking
#> Removed at Step 5 is - JobControl
#> Removed at Step 6 is - Gender
#> 
#> Selection correctly terminated, 
#> No more variables removed from the model
  
  pool_lm$RR_model_final
#> $`Step 7`
#>                    term  estimate std.error  statistic        df      p.value
#> 1           (Intercept) 3.8156476 0.3621860 10.5350490 129.49014 4.010553e-19
#> 2     factor(Carrying)2 0.8759161 0.3761904  2.3283850 113.24274 2.166656e-02
#> 3     factor(Carrying)3 1.8001704 0.3746799  4.8045553 145.87249 3.811778e-06
#> 4 factor(Satisfaction)2 0.1385358 0.3729809  0.3714288 108.41273 7.110431e-01
#> 5 factor(Satisfaction)3 1.4420012 0.4685986  3.0772635  74.55846 2.921715e-03
  pool_lm$multiparm_final
#> $`Step 7`
#>                       p-values D2 F-statistic
#> factor(Carrying)     1.653888e-05   11.150999
#> factor(Satisfaction) 7.789587e-03    5.372204
  pool_lm$predictors_out 
#>         Gender Smoking Function JobControl factor(Carrying)
#> Step 1       0       0        1          0                0
#> Step 2       0       0        0          0                0
#> Step 3       0       0        0          0                0
#> Step 4       0       1        0          0                0
#> Step 5       0       0        0          1                0
#> Step 6       1       0        0          0                0
#> Removed      1       1        1          1                0
#>         factor(Satisfaction) Smoking*factor(Carrying) Gender*Smoking
#> Step 1                     0                        0              0
#> Step 2                     0                        0              1
#> Step 3                     0                        1              0
#> Step 4                     0                        0              0
#> Step 5                     0                        0              0
#> Step 6                     0                        0              0
#> Removed                    0                        1              1
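For readers unfamiliar with D2, the following base R sketch shows the textbook way of combining per-imputation Wald chi-square statistics into one F-type statistic (Li, Meng, Raghunathan and Rubin, 1991). The input statistics are hypothetical illustration values, `pool_d2` is a helper name introduced here, and psfmi may differ in implementation details, so this is not expected to reproduce the numbers printed above.

```r
# Hypothetical Wald chi-square statistics for one multi-parameter term,
# one per imputed dataset (m = 5), each testing k = 2 parameters.
d <- c(10.2, 12.5, 9.8, 11.9, 13.1)
k <- 2

# Hypothetical helper: combine chi-square statistics with the D2 rule.
pool_d2 <- function(d, k) {
  m  <- length(d)
  r  <- (1 + 1 / m) * var(sqrt(d))                 # relative increase in variance
  d2 <- (mean(d) / k - (m + 1) / (m - 1) * r) / (1 + r)
  v  <- k^(-3 / m) * (m - 1) * (1 + 1 / r)^2       # denominator df
  p  <- pf(d2, df1 = k, df2 = v, lower.tail = FALSE)
  c(statistic = d2, df1 = k, df2 = v, p.value = p)
}

d2_result <- pool_d2(d, k)
d2_result
```

The more the statistics vary between imputed datasets, the larger r becomes and the more the pooled statistic is shrunk below the simple average mean(d) / k.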


Pooling with BS and forcing interaction terms and method D1

Same as above, but now with method D1 and forcing several predictors, including an interaction term, to stay in the model during BS.


  library(psfmi)
  pool_lm <- psfmi_lm(data=lbpmilr, nimp=5, impvar="Impnr", 
                      formula = Pain ~ Gender + Smoking + 
                      Function + JobControl + factor(Carrying) + factor(Satisfaction) +
                        factor(Carrying):Smoking + Gender:Smoking, 
                      keep.predictors = c("Smoking*Carrying", "JobControl"), method="D1", 
                      p.crit=0.05, direction="BW")
#> Removed at Step 1 is - Function
#> Removed at Step 2 is - Gender*Smoking
#> Removed at Step 3 is - Gender
#> 
#> Selection correctly terminated, 
#> No more variables removed from the model
  
  pool_lm$RR_model_final
#> $`Step 4`
#>                        term    estimate  std.error  statistic        df
#> 1               (Intercept)  5.05673749 1.11537162  4.5336796  87.35469
#> 2                   Smoking -0.75879295 0.59455328 -1.2762405  50.60796
#> 3                JobControl -0.01558801 0.01737846 -0.8969733  87.93658
#> 4         factor(Carrying)2  0.51735642 0.51915658  0.9965325 132.99359
#> 5         factor(Carrying)3  1.31863192 0.50113424  2.6312948 126.77358
#> 6     factor(Satisfaction)2  0.11077123 0.37320587  0.2968100 117.98206
#> 7     factor(Satisfaction)3  1.44590689 0.48154484  3.0026423  64.65768
#> 8 Smoking:factor(Carrying)2  0.81312389 0.77812973  1.0449721  87.32029
#> 9 Smoking:factor(Carrying)3  1.13073244 0.79050622  1.4303903 104.46161
#>        p.value
#> 1 1.832877e-05
#> 2 2.076965e-01
#> 3 3.721823e-01
#> 4 3.208012e-01
#> 5 9.561386e-03
#> 6 7.671335e-01
#> 7 3.802284e-03
#> 8 2.989200e-01
#> 9 1.555895e-01
  pool_lm$multiparm_final
#> $`Step 4`
#>                           p-values D1 F-statistic
#> Smoking                  0.5399398352   0.7214117
#> JobControl               0.3705279273   0.8045610
#> factor(Carrying)         0.0001017566   5.9155402
#> factor(Satisfaction)     0.0025183119   6.2392581
#> Smoking*factor(Carrying) 0.3318368885   1.1068101
  pool_lm$predictors_out 
#>         Gender Smoking Function JobControl factor(Carrying)
#> Step 1       0       0        1          0                0
#> Step 2       0       0        0          0                0
#> Step 3       1       0        0          0                0
#> Removed      1       0        1          0                0
#>         factor(Satisfaction) Smoking*factor(Carrying) Gender*Smoking
#> Step 1                     0                        0              0
#> Step 2                     0                        0              1
#> Step 3                     0                        0              0
#> Removed                    0                        0              1
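The effect of forcing predictors can be pictured as a backward elimination loop that never considers the forced terms for removal. The sketch below does this on a single complete dataset (the built-in mtcars data) with ordinary F-tests from drop1. It is a simplification under stated assumptions: psfmi_lm bases each step on pooled p-values over the imputed datasets, and `backward_select` is a helper name introduced here, not a psfmi function.

```r
# Hypothetical helper: backward elimination on one complete dataset,
# with predictors listed in `keep` never eligible for removal.
backward_select <- function(data, outcome, predictors,
                            keep = character(0), p_crit = 0.05) {
  current <- predictors
  removed <- character(0)
  repeat {
    candidates <- setdiff(current, keep)
    if (length(candidates) == 0) break
    fit   <- lm(reformulate(current, response = outcome), data = data)
    # F-test for dropping each candidate; the first row is "<none>".
    tests <- drop1(fit, scope = reformulate(candidates), test = "F")
    p     <- setNames(tests[["Pr(>F)"]][-1], rownames(tests)[-1])
    if (max(p) <= p_crit) break          # everything left is significant
    worst   <- names(p)[which.max(p)]    # least significant candidate
    current <- setdiff(current, worst)
    removed <- c(removed, worst)
  }
  list(final = current, removed = removed)
}

res <- backward_select(mtcars, "mpg", c("wt", "hp", "qsec", "drat"),
                       keep = "drat", p_crit = 0.05)
res
```

Whatever its own p-value, the forced predictor stays in the final model, which mirrors how Smoking, JobControl and the Smoking by Carrying interaction remain in the output above despite p-values above 0.05.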


Pooling with BS including spline coefficient and method D1

Pooling linear regression models over 5 imputed datasets with BS, using a p-value threshold of 0.05 and method D1. A spline predictor and an interaction term are part of the selection procedure.


  library(psfmi)
  pool_lm <- psfmi_lm(data=lbpmilr, nimp=5, impvar="Impnr", 
                      formula = Pain ~ Gender + Smoking + 
                      JobControl + factor(Carrying) + factor(Satisfaction) +
                      factor(Carrying):Smoking + rcs(Function, 3), 
                      method="D1", 
                      p.crit=0.05, direction="BW")
#> Removed at Step 1 is - rcs(Function,3)
#> Removed at Step 2 is - Smoking*factor(Carrying)
#> Removed at Step 3 is - Smoking
#> Removed at Step 4 is - JobControl
#> Removed at Step 5 is - Gender
#> 
#> Selection correctly terminated, 
#> No more variables removed from the model
  
  pool_lm$RR_model_final
#> $`Step 6`
#>                    term  estimate std.error  statistic        df      p.value
#> 1           (Intercept) 3.8156476 0.3621860 10.5350490 129.49014 4.010553e-19
#> 2     factor(Carrying)2 0.8759161 0.3761904  2.3283850 113.24274 2.166656e-02
#> 3     factor(Carrying)3 1.8001704 0.3746799  4.8045553 145.87249 3.811778e-06
#> 4 factor(Satisfaction)2 0.1385358 0.3729809  0.3714288 108.41273 7.110431e-01
#> 5 factor(Satisfaction)3 1.4420012 0.4685986  3.0772635  74.55846 2.921715e-03
  pool_lm$multiparm_final
#> $`Step 6`
#>                       p-values D1 F-statistic
#> factor(Carrying)     1.752967e-05   11.125118
#> factor(Satisfaction) 2.477744e-03    6.275617
  pool_lm$predictors_out 
#>         Gender Smoking JobControl factor(Carrying) factor(Satisfaction)
#> Step 1       0       0          0                0                    0
#> Step 2       0       0          0                0                    0
#> Step 3       0       1          0                0                    0
#> Step 4       0       0          1                0                    0
#> Step 5       1       0          0                0                    0
#> Removed      1       1          1                0                    0
#>         rcs(Function,3) Smoking*factor(Carrying)
#> Step 1                1                        0
#> Step 2                0                        1
#> Step 3                0                        0
#> Step 4                0                        0
#> Step 5                0                        0
#> Removed               1                        1
