
Given a multiple linear regression model with n observations and k independent variables, the degree of near-multicollinearity affects its statistical analysis (at significance level alpha) if there is a variable i, with i = 1,...,k, for which the null hypothesis is not rejected in the original model but is rejected in the orthonormal reference model.

Usage

multicollinearity(y, x, alpha = 0.05)

Arguments

y

A numerical vector representing the dependent variable of the model.

x

A numerical design matrix that should contain more than one regressor, with the intercept in the first column.

alpha

Significance level (by default, 5%).

Details

This function compares the individual significance tests (inference) of the original model with those of the orthonormal model taken as reference.

Thus, if the null hypothesis is rejected in the individual significance tests of the model in which there are no linear relationships between the independent variables (the orthonormal model) but is not rejected in the original model, the non-rejection is attributed to the existing linear relationships between the independent variables (multicollinearity) in the original model.

The second model is obtained from the first by a QR decomposition of the design matrix, which eliminates the initial linear relationships.
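
A minimal sketch of this comparison on simulated data (an illustration of the idea, not the internal code of multicollinearity()): the design matrix is replaced by an orthonormal basis obtained from its QR decomposition, and the individual t-tests of both fits are compared.

  # Minimal sketch on simulated data; not the internal code of multicollinearity()
  set.seed(1)
  obs = 50
  cte = rep(1, obs)
  x2 = rnorm(obs, 5, 10)
  x3 = x2 + rnorm(obs, 0, 0.25)        # x3 strongly related to x2
  X = cbind(cte, x2, x3)
  y = 4 - 9*x2 - 2*x3 + rnorm(obs, 0, 2)
  Q = qr.Q(qr(X))                      # orthonormal columns spanning the same space
  summary(lm(y ~ X - 1))$coefficients  # original model: inference may be masked
  summary(lm(y ~ Q - 1))$coefficients  # orthonormal reference model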

Value

For each coefficient, the function returns the value of the RVIF and the established thresholds (columns c0 and c3), the scenario detected, and an indication of whether or not the individual significance analysis is affected by multicollinearity at the chosen significance level.
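
If the printed table is also returned as a data frame with the columns shown in the examples below (RVIFs, c0, c3, Scenario, Affects) — an assumption made here purely for illustration — the affected coefficients could be filtered directly:

  res = multicollinearity(y, x)    # y and x as in the examples below
  subset(res, Affects == "Yes")    # coefficients whose individual inference is affected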

References

Salmerón, R., García, C.B. and García, J. (2025). A Redefined Variance Inflation Factor: overcoming the limitations of the Variance Inflation Factor. Computational Economics, 65, 337-363. https://doi.org/10.1007/s10614-024-10575-8.

Salmerón, R., García, C.B. and García, J. Overcoming the inconsistences of the variance inflation factor: a redefined VIF and a test to detect statistical troubling multicollinearity. Working paper, https://arxiv.org/pdf/2005.02245.

Author

Román Salmerón Gómez (University of Granada) and Catalina B. García García (University of Granada).

Maintainer: Román Salmerón Gómez (romansg@ugr.es)

Examples

### Example 1
  
  set.seed(2024)
  obs = 100
  cte = rep(1, obs)
  x2 = rnorm(obs, 5, 0.01)  # related to the intercept: non-essential multicollinearity
  x3 = rnorm(obs, 5, 10)
  x4 = x3 + rnorm(obs, 5, 0.5) # related to x3: essential multicollinearity
  x5 = rnorm(obs, -1, 3)
  x6 = rnorm(obs, 15, 0.5)
  y = 4 + 5*x2 - 9*x3 -2*x4 + 2*x5 + 7*x6 + rnorm(obs, 0, 2)
  x = cbind(cte, x2, x3, x4, x5, x6)
  multicollinearity(y, x)
#>          RVIFs          c0           c3 Scenario Affects
#> 1 2.522626e+03 710.3979874 1.502436e-01      b.1     Yes
#> 2 9.875420e+01  41.8217535 3.659512e-01      b.1     Yes
#> 3 5.555945e-02   4.5808651 7.263924e-07      a.1      No
#> 4 5.528041e-02   0.3113603 1.122944e-02      a.1      No
#> 5 1.234970e-03   0.2585459 6.317865e-06      a.1      No
#> 6 5.039751e-02   3.3626976 7.553190e-04      a.1      No

### Example 2
### Effect of sample size
  
  obs = 25 # decreasing the number of observations makes the inference on x4 affected
  cte = rep(1, obs)
  x2 = rnorm(obs, 5, 0.01)  # related to the intercept: non-essential multicollinearity
  x3 = rnorm(obs, 5, 10)
  x4 = x3 + rnorm(obs, 5, 0.5) # related to x3: essential multicollinearity
  x5 = rnorm(obs, -1, 3)
  x6 = rnorm(obs, 15, 0.5)
  y = 4 + 5*x2 - 9*x3 -2*x4 + 2*x5 + 7*x6 + rnorm(obs, 0, 2)
  x = cbind(cte, x2, x3, x4, x5, x6)
  multicollinearity(y, x)
#>          RVIFs           c0           c3 Scenario Affects
#> 1 1.286600e+04 1.053297e+04 9.384591e-01      b.1     Yes
#> 2 5.707087e+02 5.156288e+02 1.726140e+02      b.1     Yes
#> 3 4.045355e-01 6.263143e+00 2.210555e-05      a.1      No
#> 4 4.005247e-01 4.780367e-02 9.441789e-02      b.1     Yes
#> 5 4.651546e-03 2.659262e-01 7.675577e-05      a.1      No
#> 6 4.833250e-01 1.945277e+00 1.200872e-01      a.1      No

### Example 3
  
  y = 4 - 9*x3 - 2*x5 + rnorm(obs, 0, 2)
  x = cbind(cte, x3, x5) # x3 and x5 generated independently of each other
  multicollinearity(y, x)
#>          RVIFs        c0           c3 Scenario Affects
#> 1 0.0446929027 0.7442711 1.003540e-04      a.1      No
#> 2 0.0004039021 4.2977544 3.601141e-08      a.1      No
#> 3 0.0044674952 0.2131438 9.363871e-05      a.1      No
  
### Example 4
### Detection of multicollinearity in Wissel data
  
  head(Wissel, n=5)
#>      t      D cte      C      I     CP
#> 1 1996 3.8051   1 4.7703 4.8786 808.23
#> 2 1997 3.9458   1 4.7784 5.0510 798.03
#> 3 1998 4.0579   1 4.9348 5.3620 806.12
#> 4 1999 4.1913   1 5.0998 5.5585 865.65
#> 5 2000 4.3585   1 5.2907 5.8425 997.30
  y = Wissel[,2]
  x = Wissel[,3:6]
  multicollinearity(y, x)
#>          RVIFs           c0           c3 Scenario Affects
#> 1 1.948661e+02 7.371069e+00 1.017198e+00      b.1     Yes
#> 2 3.032628e+01 4.456018e+00 9.157898e-01      b.1     Yes
#> 3 4.765888e+00 2.399341e+00 1.053598e+01      b.2      No
#> 4 3.821626e-05 2.042640e-06 7.149977e-04      b.2      No
  
### Example 5
### Detection of multicollinearity in euribor data
  
  head(euribor, n=5)
#>      E cte  HIPC    BC       GD
#> 1 3.63   1 92.92 17211 -51384.0
#> 2 3.90   1 93.85  2724 -49567.1
#> 3 3.45   1 93.93 17232 -52128.4
#> 4 3.01   1 94.41  9577 -53593.3
#> 5 2.54   1 95.08  4117 -65480.0
  y = euribor[,1]
  x = euribor[,2:5]
  multicollinearity(y, x)
#>          RVIFs           c0           c3 Scenario Affects
#> 1 5.325408e+00 1.575871e+01 2.166907e-02      a.1      No
#> 2 5.357830e-04 3.219456e-06 4.249359e-05      b.1     Yes
#> 3 5.109564e-11 1.098649e-09 2.586237e-12      a.1      No
#> 4 1.631439e-11 3.216522e-10 8.274760e-13      a.1      No
  
### Example 6
### Detection of multicollinearity in Cobb-Douglas production function data

  head(CDpf, n=5)
#>          P cte     logK     logW
#> 1 37641114   1 17.93734 15.55598
#> 2 42620804   1 18.01187 15.60544
#> 3 37989413   1 17.98800 15.54486
#> 4 40464915   1 18.00700 15.58605
#> 5 41002031   1 18.02283 15.59570
  y = CDpf[,1]
  x = CDpf[,2:4]  
  multicollinearity(y, x)
#>         RVIFs           c0           c3 Scenario Affects
#> 1 6388.881402 88495.933700   1.64951764      a.1      No
#> 2    4.136993   207.628058   0.05043083      a.1      No
#> 3   37.336325     9.445619 147.58213164      b.2      No
  
### Example 7
### Detection of multicollinearity in number of employees of Spanish companies data
  
  head(employees, n=5)
#>       NE cte       FA       OI        S
#> 1   2637   1    44153    38903    38867
#> 2  15954   1  9389509  4293386  4231043
#> 3 162503   1 17374000 23703000 23649000
#> 4 162450   1  9723088 23310532 23310532
#> 5  28389   1 95980120 29827663 29215382
  y = employees[,1]
  x = employees[,3:5]
  multicollinearity(y, x)
#>          RVIFs           c0           c3 Scenario Affects
#> 1 1.829154e-16 2.307712e-16 4.679301e-17      a.1      No
#> 2 1.696454e-12 9.594942e-13 2.129511e-13      b.1     Yes
#> 3 1.718535e-12 1.100437e-12 2.683809e-12      b.2      No
  
### Example 8
### Detection of multicollinearity in simple linear model simulated data
  
  head(SLM1, n=5)
#>           y1 cte         V
#> 1  82.392059   1 19.001420
#> 2  -1.942157   1 -1.733458
#> 3   7.474090   1  1.025146
#> 4 -12.303381   1 -4.445014
#> 5  30.378203   1  6.689864
  y = SLM1[,1]
  x = SLM1[,2:3]
  multicollinearity(y, x)
#>          RVIFs        c0           c3 Scenario Affects
#> 1 0.0403049717 0.6454323 1.045802e-05      a.1      No
#> 2 0.0002675731 0.8383436 8.540101e-08      a.1      No

  head(SLM2, n=5)
#>         y2 cte         Z
#> 1 43.01204   1  9.978211
#> 2 40.04163   1  9.878235
#> 3 40.17086   1  9.924592
#> 4 40.79076   1 10.019123
#> 5 44.72774   1 10.104728
  y = SLM2[,1]
  x = SLM2[,2:3]
  multicollinearity(y, x)
#>        RVIFs         c0         c3 Scenario Affects
#> 1 187.800878 21.4798003 0.03277691      b.1     Yes
#> 2   1.879296  0.3687652 9.57724567      b.2      No
    
### Example 9
### Detection of multicollinearity in soil characteristics data

  head(soil, n=5)
#>   BaseSat SumCation CECbuffer     Ca     Mg      K    Na     P    Cu    Zn
#> 1    2.34    0.1576     0.614 0.0892 0.0328 0.0256 0.010 0.000 0.080 0.184
#> 2    1.64    0.0970     0.516 0.0454 0.0218 0.0198 0.010 0.000 0.064 0.112
#> 3    5.20    0.4520     0.828 0.3306 0.0758 0.0336 0.012 0.240 0.136 0.350
#> 4    4.10    0.3054     0.698 0.2118 0.0536 0.0260 0.014 0.030 0.126 0.364
#> 5    2.70    0.2476     0.858 0.1568 0.0444 0.0304 0.016 0.384 0.078 0.376
#>      Mn HumicMatter Density    pH ExchAc Diversity
#> 1 3.200      0.1220  0.0822 0.516  0.466 0.2765957
#> 2 2.734      0.0952  0.0850 0.512  0.430 0.2613982
#> 3 4.148      0.1822  0.0746 0.554  0.388 0.2553191
#> 4 3.728      0.1646  0.0756 0.546  0.408 0.2401216
#> 5 4.756      0.2472  0.0692 0.450  0.624 0.1884498
  y = soil[,16]
  x = soil[,-16] 
  x = cbind(rep(1, length(y)), x) # the intercept must be in the first column of the design matrix
  multicollinearity(y, x)
#> System is computationally singular. Modify the design matrix before running the code.
  multicollinearity(y, x[,-3]) # eliminating the problematic variable (SumCation)
#>           RVIFs           c0           c3 Scenario Affects
#> 1  4.407184e+02 6.150190e-03 1.480048e+00      b.1     Yes
#> 2  3.828858e+00 1.142356e-02 7.653413e+00      b.2      No
#> 3  1.093791e+05 1.254955e+02 7.236491e+04      b.1     Yes
#> 4  9.883235e+04 3.938383e+01 2.237445e+05      b.2      No
#> 5  1.767758e+05 1.101028e+03 3.609837e+05      b.2      No
#> 6  1.150029e+05 1.627349e+03 1.976176e+05      b.2      No
#> 7  4.627807e+04 5.960870e+02 2.033176e+06      b.2      No
#> 8  1.338591e+01 6.062571e-01 4.060382e+02      b.2      No
#> 9  3.113066e+02 4.089095e+01 5.246698e+05      b.2      No
#> 10 5.177176e+01 6.371216e+00 8.094828e+02      b.2      No
#> 11 1.905089e-01 3.907589e-02 9.787963e-01      b.2      No
#> 12 3.379360e+02 4.534540e+01 2.861964e+02      b.1     Yes
#> 13 4.761238e+04 8.453066e+01 3.828016e+08      b.2      No
#> 14 1.502903e+03 7.901580e+01 9.961215e+03      b.2      No
#> 15 1.066711e+05 2.369347e+02 4.802466e+07      b.2      No
  
### Example 10
### The intercept must be in the first column of the design matrix
  
  set.seed(2025)
  obs = 100
  cte = rep(1, obs)
  x2 = sample(1:500, obs)
  x3 = sample(1:500, obs)
  x4 = rep(4, obs)
  x = cbind(cte, x2, x3, x4)
  u = rnorm(obs, 0, 2)
  y = 5 + 2*x2 - 3*x3 + 10*x4 + u
  multicollinearity(y, x)
#> There is a constant variable. Delete it before running the code or, if it is the intercept, it must be the first column of the design matrix.
#> Perfect multicollinearity exists. Modify the design matrix before running the code.
  multicollinearity(y, x[,-4]) # the constant variable is removed
#>          RVIFs          c0           c3 Scenario Affects
#> 1 7.404884e-02 121.0498871 4.062417e-07      a.1      No
#> 2 4.750899e-07   0.2408853 9.995920e-13      a.1      No
#> 3 4.977510e-07   0.5417202 4.573507e-13      a.1      No