Regression Diagnostics by Period using REPS

Introduction

The calculate_regression_diagnostics() function in REPS provides regression diagnostics by period. It is designed for panel or repeated cross-section data (e.g. property transactions over time) to evaluate the quality of period-specific log-linear regressions.

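For each period, it fits a log-linear regression of the dependent variable on the numerical and categorical variables and reports:
- norm_pvalue: the p-value of a normality test on the residuals;
- r_adjust: the adjusted R-squared of the regression;
- bp_pvalue: the p-value of a Breusch-Pagan test for heteroscedasticity;
- autoc_pvalue and autoc_dw: the p-value and the statistic of a Durbin-Watson test for autocorrelation.

These diagnostics help assess model quality over time by identifying periods with issues such as non-normality, low fit, heteroscedasticity, or autocorrelation.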
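To make the per-period logic concrete, here is a minimal base-R sketch of the same kind of computation. It is not the REPS implementation. The function name diagnostics_by_period_sketch is hypothetical, and the sketch assumes a log-linear model with log(dependent) as the response, a Shapiro-Wilk test for normality, the studentized (Koenker) form of the Breusch-Pagan test, and the plain Durbin-Watson statistic without a p-value (an exact p-value needs extra machinery such as lmtest::dwtest). REPS may use different test variants.

```r
# Hypothetical sketch, NOT the REPS implementation. Assumptions:
# log-linear model log(dep_var) ~ regressors, Shapiro-Wilk normality test,
# studentized (Koenker) Breusch-Pagan test, plain Durbin-Watson statistic.
diagnostics_by_period_sketch <- function(data, period_var, dep_var,
                                         num_vars, cat_vars) {
  # Build e.g. log(price) ~ floor_area + factor(city)
  rhs <- c(num_vars, sprintf("factor(%s)", cat_vars))
  fml <- as.formula(paste0("log(", dep_var, ") ~ ", paste(rhs, collapse = " + ")))

  periods <- sort(unique(data[[period_var]]))
  rows <- lapply(periods, function(p) {
    # One regression per period (each categorical needs >= 2 levels here)
    sub <- data[data[[period_var]] == p, , drop = FALSE]
    fit <- lm(fml, data = sub)
    e <- residuals(fit)
    n <- length(e)

    # Koenker BP: n * R^2 from regressing squared residuals on the regressors
    aux <- lm.fit(model.matrix(fit), e^2)
    r2_aux <- 1 - sum(aux$residuals^2) / sum((e^2 - mean(e^2))^2)
    bp_pvalue <- pchisq(n * r2_aux, df = fit$rank - 1, lower.tail = FALSE)

    data.frame(
      period = p,
      norm_pvalue = shapiro.test(e)$p.value,   # needs 3 <= n <= 5000
      r_adjust = summary(fit)$adj.r.squared,
      bp_pvalue = bp_pvalue,
      dw = sum(diff(e)^2) / sum(e^2)           # residuals taken in row order
    )
  })
  do.call(rbind, rows)
}

# Usage on synthetic toy data (not data_constraxion)
set.seed(1)
n <- 200
log_area <- log(runif(n, 50, 150))
toy <- data.frame(
  period = rep(c("2020Q1", "2020Q2"), each = n / 2),
  price = exp(10 + 0.8 * log_area + rnorm(n, sd = 0.1)),
  floor_area = log_area,
  city = sample(c("0", "1"), n, replace = TRUE)
)
res <- diagnostics_by_period_sketch(toy, "period", "price", "floor_area", "city")
print(res)
```

The result has one row per period, mirroring the layout of the REPS output shown further on, although the values will not match REPS exactly if it uses other test variants.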

Required Data

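Your dataset should include:
- a period variable (here period, with values such as 2008Q1);
- a dependent variable (here price);
- numerical explanatory variables (here floor_area and dist_trainstation);
- categorical explanatory variables (here dummy_large_city and neighbourhood_code).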

library(REPS)

# Example dataset (you should already have this loaded)
head(data_constraxion)
#>   period   price floor_area dist_trainstation neighbourhood_code
#> 1 2008Q1 1142226  127.41917       2.887992985                  E
#> 2 2008Q1  667664   88.70604       2.903955192                  D
#> 3 2008Q1  636207  107.26257       8.250659447                  B
#> 4 2008Q1  777841  112.65725       0.005760792                  E
#> 5 2008Q1  795527  108.08537       1.842145127                  E
#> 6 2008Q1  539206   97.87751       6.375981360                  D
#>   dummy_large_city
#> 1                0
#> 2                1
#> 3                1
#> 4                0
#> 5                0
#> 6                1

# Log-transform floor_area again (see the vignette on calculating a price index for why)
dataset <- data_constraxion
dataset$floor_area <- log(dataset$floor_area)
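Because a separate regression is fitted for every period, it can help to check beforehand that all required columns exist and that each period has more observations than the model has coefficients. The helper below, check_diagnostic_inputs, is hypothetical and not part of REPS; it counts categorical levels on the full dataset, so it is a conservative check.

```r
# Hypothetical helper, not part of REPS: checks that the columns exist and
# that each period has more observations than the model has coefficients.
check_diagnostic_inputs <- function(data, period_var, dep_var,
                                    num_vars, cat_vars) {
  needed <- c(period_var, dep_var, num_vars, cat_vars)
  missing_cols <- setdiff(needed, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Intercept + one slope per numerical variable
  # + (number of levels - 1) dummies per categorical variable,
  # with levels counted on the full data (conservative)
  n_levels <- vapply(cat_vars, function(v) length(unique(data[[v]])), integer(1))
  n_coef <- 1 + length(num_vars) + sum(n_levels - 1)

  counts <- table(data[[period_var]])
  data.frame(
    period = names(counts),
    n_obs = as.integer(counts),
    enough = as.integer(counts) > n_coef
  )
}

# Usage on a tiny synthetic example: 3 coefficients are needed,
# so 2020Q1 (5 rows) passes and 2020Q2 (2 rows) does not
toy <- data.frame(
  period = c(rep("2020Q1", 5), rep("2020Q2", 2)),
  price = c(300, 350, 400, 420, 500, 310, 360),
  floor_area = log(c(80, 90, 100, 110, 120, 85, 95)),
  city = c("0", "1", "0", "1", "0", "1", "0")
)
chk <- check_diagnostic_inputs(toy, "period", "price", "floor_area", "city")
print(chk)
```

On the real data the call would be check_diagnostic_inputs(dataset, "period", "price", c("floor_area", "dist_trainstation"), c("dummy_large_city", "neighbourhood_code")).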

Using calculate_regression_diagnostics()

Example:

# Fit one regression per period and collect its diagnostics
diagnostics <- calculate_regression_diagnostics(
  dataset = dataset,
  period_variable = "period",
  dependent_variable = "price",
  numerical_variables = c("floor_area", "dist_trainstation"),
  categorical_variables = c("dummy_large_city", "neighbourhood_code")
)

head(diagnostics)
#>   period norm_pvalue  r_adjust  bp_pvalue autoc_pvalue autoc_dw
#> 1 2008Q1   0.9586930 0.8633499 0.74178260 0.5842200307 2.038772
#> 2 2008Q2   0.8191076 0.8607036 0.81813032 0.9540503936 2.274047
#> 3 2008Q3   0.4560750 0.8825515 0.15220690 0.3246547621 1.924436
#> 4 2008Q4   0.9064669 0.9098143 0.97583499 0.7436197200 2.108734
#> 5 2009Q1   0.4036003 0.8624850 0.04268543 0.4948207614 2.003177
#> 6 2009Q2   0.4644423 0.9002921 0.32760619 0.0007476682 1.487031
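Rather than scanning the table by eye, periods that fail a test can be listed programmatically. The sketch below assumes a conventional 5% significance level and the usual reading that a small p-value means the tested assumption is rejected; alpha and diag_head are illustrative names. To stay self-contained it retypes the six rows printed above; with real output, use the diagnostics object returned by calculate_regression_diagnostics() instead.

```r
# First six rows of the diagnostics table printed above, retyped by hand
diag_head <- data.frame(
  period = c("2008Q1", "2008Q2", "2008Q3", "2008Q4", "2009Q1", "2009Q2"),
  norm_pvalue = c(0.9586930, 0.8191076, 0.4560750, 0.9064669, 0.4036003, 0.4644423),
  r_adjust = c(0.8633499, 0.8607036, 0.8825515, 0.9098143, 0.8624850, 0.9002921),
  bp_pvalue = c(0.74178260, 0.81813032, 0.15220690, 0.97583499, 0.04268543, 0.32760619),
  autoc_pvalue = c(0.5842200307, 0.9540503936, 0.3246547621, 0.7436197200,
                   0.4948207614, 0.0007476682),
  autoc_dw = c(2.038772, 2.274047, 1.924436, 2.108734, 2.003177, 1.487031)
)

alpha <- 0.05  # assumed significance level

# TRUE where the test rejects its null hypothesis at level alpha
flags <- data.frame(
  period = diag_head$period,
  non_normal = diag_head$norm_pvalue < alpha,
  heteroscedastic = diag_head$bp_pvalue < alpha,
  autocorrelated = diag_head$autoc_pvalue < alpha
)

# Keep only periods with at least one rejected assumption
flagged <- flags[rowSums(flags[, -1]) > 0, ]
print(flagged)
```

For these six periods, only 2009Q1 (bp_pvalue of about 0.043) and 2009Q2 (autoc_pvalue of about 0.0007) are flagged; no period is flagged for non-normality.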

Visualizing Diagnostics

For convenient visualization:

plot_regression_diagnostics(diagnostics)

This generates a 3x2 grid of plots showing how the diagnostics evolve over the periods, with one row per group of model assumptions.

Interpreting the Output

The hedonic price index relies on a log-linear regression model, which assumes that certain statistical conditions hold. The diagnostics plot provides an overview of how well these assumptions are met across different periods.

Each subplot corresponds to a specific model assumption:

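Row 1: Normality and Linearity

These panels are based on the p-value of the residual normality test (norm_pvalue) and the adjusted R-squared (r_adjust). A small normality p-value (for example below 0.05) suggests that the residuals in that period are not normally distributed; a low adjusted R-squared suggests that the regressors explain little of the variation in the dependent variable.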
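For reference, the usual definition of the adjusted R-squared is shown below, for a period with n observations and p regressors besides the intercept (dummy columns included), where y_i is the log-transformed dependent variable. The source does not spell out the formula REPS uses, so this standard definition is an assumption.

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} \hat e_i^2}{\sum_{i=1}^{n} (y_i - \bar y)^2}, \qquad
\bar R^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
$$

The factor (n - 1)/(n - p - 1) penalizes extra regressors, so adding a variable raises the adjusted R-squared only if it improves the fit enough to offset the lost degree of freedom.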

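Row 2: Independence

These panels are based on the Durbin-Watson statistic (autoc_dw) and its p-value (autoc_pvalue). A statistic close to 2 is consistent with no first-order autocorrelation in the residuals, while a small p-value indicates autocorrelation, as for 2009Q2 in the table above (autoc_pvalue of about 0.0007, autoc_dw of about 1.49).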
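In its standard form, the Durbin-Watson statistic is computed from the residuals of the period's regression taken in data order. The ordering REPS uses within a period is not stated in the source, so treat that part as an assumption.

$$
d = \frac{\sum_{t=2}^{n} (\hat e_t - \hat e_{t-1})^2}{\sum_{t=1}^{n} \hat e_t^2} \approx 2\,(1 - \hat\rho_1)
$$

Here $\hat\rho_1$ is the first-order autocorrelation of the residuals. Values of d well below 2 point to positive autocorrelation and values above 2 to negative autocorrelation; for example, the 2009Q2 value of 1.487 corresponds to $\hat\rho_1 \approx 1 - 1.487/2 \approx 0.26$.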

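Row 3: Homoscedasticity

These panels are based on the Breusch-Pagan p-value (bp_pvalue). A small p-value indicates that the residual variance is not constant across observations in that period, as for 2009Q1 in the table above (bp_pvalue of about 0.043).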
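A common form of the Breusch-Pagan test is Koenker's studentized version: regress the squared residuals on the k regressors $z_i$ of the period's model and multiply the R-squared of that auxiliary regression by n. Whether REPS uses this variant or the original Breusch-Pagan statistic is not stated in the source, so this is an assumption.

$$
\hat e_i^2 = \gamma_0 + \gamma^\top z_i + u_i, \qquad
LM = n\,R^2_{\text{aux}} \;\overset{a}{\sim}\; \chi^2_k \quad \text{under } H_0\colon \gamma = 0
$$

A large LM value, and hence a small bp_pvalue, indicates that the residual variance depends on the regressors.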

Summary

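The calculate_regression_diagnostics() and plot_regression_diagnostics() functions in REPS enable:
- period-by-period checks of residual normality, model fit, heteroscedasticity, and autocorrelation;
- visual inspection of these diagnostics over time, to spot periods in which the model assumptions break down.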

Together, they support hedonic price index modeling by systematically checking the regression assumptions in every period.