The labelled_spss_survey class

library(retroharmonize)

Use the labelled_spss_survey() helper function to create vectors of class retroharmonize_labelled_spss_survey.

sl1 <- labelled_spss_survey(
  x = c(1, 1, 0, 8, 8, 8),
  labels = c("yes" = 1,
             "no" = 0,
             "declined" = 8),
  label = "Do you agree?",
  na_values = 8,
  id = "survey1")

print(sl1)
#> [1] 1 1 0 8 8 8
#> attr(,"labels")
#>      yes       no declined 
#>        1        0        8 
#> attr(,"label")
#> [1] "Do you agree?"
#> attr(,"na_values")
#> [1] 8
#> attr(,"class")
#> [1] "retroharmonize_labelled_spss_survey" "haven_labelled_spss"                
#> [3] "haven_labelled"                     
#> attr(,"survey1_name")
#> [1] "c(1, 1, 0, 8, 8, 8)"
#> attr(,"survey1_values")
#> 0 1 8 
#> 0 1 8 
#> attr(,"survey1_label")
#> [1] "Do you agree?"
#> attr(,"survey1_labels")
#>      yes       no declined 
#>        1        0        8 
#> attr(,"survey1_na_values")
#> [1] 8
#> attr(,"id")
#> [1] "survey1"

You can check the type:

is.labelled_spss_survey (sl1)
#> [1] TRUE

The labelled_spss_survey class inherits from the labelled classes of haven (its class vector includes haven_labelled_spss and haven_labelled), so it can be manipulated with the labelled package; see in particular the vignette Introduction to labelled by Joseph Larmarange.

haven::is.labelled(sl1)
#> [1] TRUE
labelled::val_labels(sl1)
#>      yes       no declined 
#>        1        0        8
labelled::na_values(sl1)
#> [1] 8

Subsetting keeps the labels and the survey metadata:

sl1[3:4]
#> [1] 0 8
#> attr(,"labels")
#>      yes       no declined 
#>        1        0        8 
#> attr(,"label")
#> [1] "Do you agree?"
#> attr(,"na_values")
#> [1] 8
#> attr(,"class")
#> [1] "retroharmonize_labelled_spss_survey" "haven_labelled_spss"                
#> [3] "haven_labelled"                     
#> attr(,"survey1_name")
#> [1] "c(1, 1, 0, 8, 8, 8)"
#> attr(,"survey1_values")
#> 0 1 8 
#> 0 1 8 
#> attr(,"survey1_label")
#> [1] "Do you agree?"
#> attr(,"survey1_labels")
#>      yes       no declined 
#>        1        0        8 
#> attr(,"survey1_na_values")
#> [1] 8
#> attr(,"id")
#> [1] "survey1"
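
Keeping the metadata after subsetting is not automatic in R. As a minimal base R sketch (using a hypothetical plain vector x with made-up attributes, not the package's class or its implementation), `[` on an unclassed vector drops custom attributes, which is why a labelled class needs its own subsetting method:

```r
# Hypothetical plain numeric vector with custom attributes (no class set).
x <- structure(c(1, 1, 0, 8, 8, 8),
               label = "Do you agree?",
               na_values = 8)

# Base R `[` keeps only names, dim and dimnames; other attributes are dropped.
x_sub <- x[3:4]

attr(x, "label")     # the full vector still carries its label
attributes(x_sub)    # NULL: the subset has lost the metadata
```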

Inside a tibble::tibble(), the modern version of data.frame, the column prints each value together with its label and marks user-defined missing values with (NA).

df <- tibble::tibble (v1 = sl1)
## Use tibble instead of data.frame(v1=sl1) ...
print(df)
#> # A tibble: 6 x 1
#>                  v1
#>        <retroh_dbl>
#> 1 1 [yes]          
#> 2 1 [yes]          
#> 3 0 [no]           
#> 4 8 (NA) [declined]
#> 5 8 (NA) [declined]
#> 6 8 (NA) [declined]
## ... which inherits the methods of a data.frame 
subset(df, v1 == 1)
#> # A tibble: 2 x 1
#>             v1
#>   <retroh_dbl>
#> 1      1 [yes]
#> 2      1 [yes]

Coercion rules and type casting

To avoid any confusion with mis-labelled surveys, combining a labelled_spss_survey vector with a double or integer vector returns a plain double or integer vector without labels. vctrs::vec_c() is generally safer than base R c().

#double
c(sl1, 1/7)
#> [1] 1.0000000 1.0000000 0.0000000 8.0000000 8.0000000 8.0000000 0.1428571
vctrs::vec_c(sl1, 1/7)
#> [1] 1.0000000 1.0000000 0.0000000 8.0000000 8.0000000 8.0000000 0.1428571
c(sl1, 1:3)
#> [1] 1 1 0 8 8 8 1 2 3
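
One reason vctrs::vec_c() is considered safer can be sketched with plain vectors (no labelled class involved): base c() silently coerces mismatched types, while vec_c() only combines compatible types and raises an error otherwise. The variable name res is only for illustration.

```r
library(vctrs)

# Base R c() silently falls back to character when types do not match.
c(1, "a")

# vec_c() combines compatible types (integer and double give double) ...
vec_c(1L, 2.5)

# ... but refuses incompatible ones with an error instead of guessing.
res <- tryCatch(vec_c(1, "a"), error = function(e) e)
inherits(res, "error")
```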

Conversion to character works as expected:

as.character(sl1)
#> [1] "1" "1" "0" "8" "8" "8"

Base as.factor() uses the underlying numeric codes as levels and ignores the value labels; base R factors are integer vectors with a levels attribute.

as.factor(sl1)
#> [1] 1 1 0 8 8 8
#> Levels: 0 1 8
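
The structure of a base R factor can be inspected directly. The sketch below uses a plain numeric vector (named codes here only for illustration) with the same values as sl1; it shows that the stored integers are positions in the levels, not the original codes:

```r
codes <- c(1, 1, 0, 8, 8, 8)
f <- factor(codes)

levels(f)       # the distinct values, stored as character levels
typeof(f)       # "integer"
as.integer(f)   # positions in levels(f), not the original codes

# To get the original codes back, go through the level text.
recovered <- as.numeric(as.character(f))
```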

Conversion to factor with as_factor converts the value labels to factor levels:

as_factor(sl1)
#> [1] yes      yes      no       declined declined declined
#> Levels: no yes declined

Similarly, when converting to numeric types, the user-defined missing values have to be converted to the NA values of the R language. Base as.numeric() keeps the code 8 as an ordinary number; for numerical analysis, convert with as_numeric() instead.

as.numeric(sl1)
#> [1] 1 1 0 8 8 8
as_numeric(sl1)
#> [1]  1  1  0 NA NA NA
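
The idea behind this conversion can be sketched in base R on a plain vector. This only illustrates the recoding step and is not the package's implementation; na_codes and x_num are hypothetical names.

```r
x <- c(1, 1, 0, 8, 8, 8)
na_codes <- 8   # hypothetical: the codes declared as user-defined missing

# Replace every declared missing code with R's NA.
x_num <- x
x_num[x_num %in% na_codes] <- NA
x_num

# Statistics can then skip the missing cases explicitly.
mean(x_num, na.rm = TRUE)
```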

Arithmetic

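Only a few arithmetic methods are implemented, and in the output below they do not all treat the user-defined missing value 8 the same way. quantile() and mean() with na.rm = TRUE leave it out, while median(), mean() without na.rm, weighted.mean() and sum() (even with na.rm = TRUE) return the same result as on as.numeric(sl1), which counts 8 as an ordinary number: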

median (as.numeric(sl1))
#> [1] 4.5
median (sl1)
#> [1] 4.5
quantile (as.numeric(sl1), 0.9)
#> 90% 
#>   8
quantile (sl1, 0.9)
#> 90% 
#>   1
mean (as.numeric(sl1))
#> [1] 4.333333
mean (sl1)
#> [1] 4.333333
mean (sl1, na.rm=TRUE)
#> [1] 0.6666667
weights1 <- runif (n = 6, min = 0, max = 1)  # random weights: the two results below change between runs
weighted.mean(as.numeric(sl1), weights1)
#> [1] 3.770921
weighted.mean(sl1, weights1)
#> [1] 3.770921
sum (as.numeric(sl1))
#> [1] 26
sum (sl1, na.rm=TRUE)
#> [1] 26
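
The figures above can be traced by hand in base R. The sketch below works on a plain numeric vector with the same values and compares statistics computed with and without the code 8; valid is a hypothetical name for the answers that are not user-defined missing.

```r
x <- c(1, 1, 0, 8, 8, 8)
valid <- x[x != 8]       # answers without the user-defined missing code

# With the code 8 counted as a number:
median(x)                # 4.5
mean(x)                  # 4.333333
sum(x)                   # 26
quantile(x, 0.9)         # 8

# With the code 8 left out:
median(valid)            # 1
mean(valid)              # 0.6666667
sum(valid)               # 2
quantile(valid, 0.9)     # 1
```

Comparing a result with both columns shows whether a given call treated 8 as missing.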

The result of the conversion to numeric can be used with other mathematical and statistical functions. Because it contains NA, pass na.rm = TRUE where needed.

as_numeric(sl1)
#> [1]  1  1  0 NA NA NA
min ( as_numeric(sl1))
#> [1] NA
min ( as_numeric(sl1), na.rm=TRUE)
#> [1] 0