--- title: "Tips and tricks" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Tips and tricks} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE # , comment = "#>" ) ``` # Conditional independence test Suppose you want to test the equality of proportions of a specific variable across all levels of another variable. For example, suppose you are interested in visits by patients aged 65-74 years (`AGER` value of `"65-74 years"`). You'd like to test the proportion of these visits for different values of physician specialty (`SPECCAT`). To perform this test, use the `tab_subset()` function, with argument `test` set to the level of interest, in this case, `"65-74 years"`. Thus, in this case, the analysis is as follows: ```{r, results='asis', message=FALSE} library(surveytable) set_survey(namcs2019sv) tab_subset("AGER", "SPECCAT", test = "65-74 years") ``` # Citing this package If you use `surveytable` in your research, please include the following statement in the Methods section of your article or report: ```{r, echo=FALSE} version = packageVersion("surveytable") ``` > Data analyses were performed using the R package "surveytable" (version `r version`). Please cite this package as follows: > Strashny A (2023). _surveytable: Streamlining Complex Survey Estimation and Reliability Assessment in R_. doi:10.32614/CRAN.package.surveytable, R package version `r version`, . # Reset the options Depending on your coding style, you might be changing options in multiple locations in your code. In these situations, `set_opts(reset = TRUE)` might be useful. This command resets all `surveytable` options to their default values. # Verify the results Some analysts might wish to compare the output from `surveytable` to the output from other statistical software, such as SAS / SUDAAN. In this situation, `set_opts(output = "raw")` might be useful. This command tells `surveytable` to print unformatted and unrounded tables. In addition, when performing hypothesis testing, this option prints the test statistic and the degrees of freedom, not just the p-value. ```{r, results='asis', message=FALSE} library(surveytable) set_survey(namcs2019sv) ``` ```r set_opts(output = "raw") tab("SPECCAT", test = TRUE) set_opts(reset = TRUE) ``` ```{r, echo=FALSE} set_opts(output = "raw") print( tab("SPECCAT", test = TRUE), destination = "") set_opts(reset = TRUE) ``` # Subset a survey Consider this example, in which we estimate the number of medications by age group: ```{r, results='asis', message=FALSE} library(surveytable) set_survey(namcs2019sv) ``` ```{r, results='asis'} tab_subset("NUMMED", "AGER") ``` What if we'd like to estimate the same thing, but only for the visits for which `NUMMED > 0`? One way to do this is to create another survey object for which `NUMMED > 0`, and then analyze this new survey object. ```{r, results='asis'} newsurvey = survey_subset(namcs2019sv, NUMMED > 0 , label = "NAMCS 2019 PUF: NUMMED 1+") set_survey(newsurvey) ``` Note that we called `set_survey()`, to let R know that we now want to analyze the new object `newsurvey`, not `namcs2019sv`. Now, let's create the table: ```{r, results='asis'} tab_subset("NUMMED", "AGER") ``` Be sure to check the table title to verify that you are tabulating the new survey object. # Advanced variable editing and data flow ## Advanced variable editing First, let's review what I call "advanced variable editing". * `surveytable` provides a number of functions to create or modify survey variables. * Some examples include `var_collapse()` and `var_cut()`. * Occasionally, you might need to do advanced variable editing. Here's how: Keep in mind that every survey object has an element called `variables`. This is a data frame where the survey's variables are located. 1. Create a new variable in the `variables` data frame (which is part of the survey object). 2. Call `set_survey()` again. **Any time** you modify the `variables` data frame, call `set_survey()`. 3. Tabulate the new variable. For an example of this, see `vignette("Example-Residential-Care-Community-Services-User-NSLTCP-RCC-SU-report")`. ## Data flow The above explanation raises the question of why `set_survey()` must be called again, after `variables` is modified. Here is an explanation: The survey that you're analyzing actually exists in **three** separate places: 1. A file on your computer data storage that contains the survey object. For example, it could be an RDS file on your hard disk drive that contains the survey object named something like `mysurvey.rds`. 2. The survey object in R's global environment, named something like `mysurvey`. 3. A hidden copy of the survey object that's used by `surveytable`. **This is what `surveytable` analyzes**. Why is there (3) that's different from (2), you might ask. That's due to an arcane issue with how R packages work -- both (2) and (3) are necessary. Normally, **information only flows forwards**, from (1) to (2) and from (2) to (3). Forwards flow: * Going from (1) to (2): call `readRDS()`. * Going from (2) to (3): call `set_survey()`. Backwards flow: * Going from (3) to (2): you probably don't need this, but see below. If you really need this, use `surveytable:::.load_survey()`. * Going from (2) to (1): call `saveRDS()`. Normally, you probably don't want to do this. Normally, the survey file (`mysurvey.rds`) should probably not be changed. The functions for modifying or creating variables that are part of the `surveytable` package (like `var_cut()` or `var_collapse()`) modify (3). Since (3) is what `surveytable` works with and tabulates, you can use `var_collapse()`, and then immediately use `tab()`. You don't need to do anything extra in between. If you are modifying the `variables` data frame directly, you are modifying (2). After you modify (2), you need to copy it over to (3), so that `surveytable` can use it. You do that by calling `set_survey()`. Thus, any time you modify `variables` yourself, call `set_survey()`. You modify (2), then copy (2) -> (3) by calling `set_survey()`. On the flip side, the changes that you make in (3) (using `surveytable` functions like `var_cut()` or `var_collapse()`) are not reflected in (2). If you make changes in (3), then call `set_survey()`, **those changes are lost**, because `set_survey()` copies (2) -> (3). If those changes were important, you can just rerun the code that created them. If you really need to go from (3) to (2), use `mysurvey = surveytable:::.load_survey()`.