% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/create_format.R
\name{formats}
\alias{formats}
\alias{discrete_format}
\alias{interval_format}
\title{Create Format Container}
\usage{
discrete_format(...)

interval_format(...)
}
\arguments{
\item{...}{Named arguments that define the desired recodings. Each name is the label of a
new category, and each value holds the values or value ranges to be recoded into that category.}
}
\value{
Returns a data table that contains the values or value ranges together with their corresponding labels.
}
\description{
Create a format container which stores discrete or interval values
with corresponding labels that can be applied by using \code{\link[=summarise_plus]{summarise_plus()}}.

Create a format container independent of any data frame. Define which values
should be recoded into which new categories when the format is applied to
a variable in a data frame.
A single value can be assigned to multiple new categories to create
a multilabel.
It is recommended to end format names with a dot to make them stand out.
}
\details{
The concept of formats as molds or stencils that the data is put through is inspired by
'SAS' formats. In 'SAS', formats are defined with the procedure PROC FORMAT, which is adapted here
as \code{\link[=discrete_format]{discrete_format()}} and \code{\link[=interval_format]{interval_format()}}. With these functions you define which values
should be transferred into which result categories. This is completely detached from the data
you're working with.

The main benefit is that you can not only label and recode values, but also
define so-called multilabels: one original value can be transferred into multiple
result categories.

A cell in a data frame can only hold one distinct value, which is normally a good thing.
But suppose you want to convert single ages into age categories. The age "3", for example,
could go into the category "under 6", but also into "under 12", "under 18" and "total". Normally
you would either compute additional variables that hold the different categorizations, or
duplicate the observations for each category. Both approaches bloat the data frame and
cost additional memory, particularly when working with big data sets.

With these format containers, you only keep a small reference of original values and result
categories. Formats and data are combined only just before the results are computed,
meaning the original data frame can be passed into a function capable of handling formats (see below)
without any data transformation beforehand. You just tell the function which format should
be applied to which variable. The function handles the rest and outputs all the
desired categories.

This approach is memory-efficient and readable, and it makes it easy to produce larger and more
complex outputs in a single step.
}
\examples{
age. <- discrete_format(
    "Total"          = 0:100,
    "under 18"       = 0:17,
    "18 to under 25" = 18:24,
    "25 to under 55" = 25:54,
    "55 to under 65" = 55:64,
    "65 and older"   = 65:100)

sex. <- discrete_format(
    "Total"  = 1:2,
    "Male"   = 1,
    "Female" = 2)

education. <- discrete_format(
    "Total"            = c("low", "middle", "high"),
    "low education"    = "low",
    "middle education" = "middle",
    "high education"   = "high")

income. <- interval_format(
    "Total"              = 0:99999,
    "below 500"          = 0:499,
    "500 to under 1000"  = 500:999,
    "1000 to under 2000" = 1000:1999,
    "2000 and more"      = 2000:99999)

state. <- discrete_format(
    "Germany"                       = 1:16,
    "Schleswig-Holstein"            = 1,
    "Hamburg"                       = 2,
    "Lower Saxony"                  = 3,
    "Bremen"                        = 4,
    "North Rhine-Westphalia"        = 5,
    "Hesse"                         = 6,
    "Rhineland-Palatinate"          = 7,
    "Baden-Württemberg"             = 8,
    "Bavaria"                       = 9,
    "Saarland"                      = 10,
    "West"                          = 1:10,
    "Berlin"                        = 11,
    "Brandenburg"                   = 12,
    "Mecklenburg-Western Pomerania" = 13,
    "Saxony"                        = 14,
    "Saxony-Anhalt"                 = 15,
    "Thuringia"                     = 16,
    "East"                          = 11:16)
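
# Illustrative sketch only: this uses base R and is NOT the package's internal
# implementation. It pictures the multilabel idea from the details section as a
# small lookup table that is joined to the data just before counting.
# The names age_groups, age_lookup, toy_data, expanded and counts are
# hypothetical and exist only for this illustration.
age_groups <- list(
    "Total"    = 0:17,
    "under 6"  = 0:5,
    "under 12" = 0:11,
    "under 18" = 0:17)

# One row per (value, label) pair; a value may appear under several labels
age_lookup <- data.frame(
    age   = unlist(age_groups, use.names = FALSE),
    label = rep(names(age_groups), lengths(age_groups)))

# The original data stays untouched and holds one value per cell
toy_data <- data.frame(age = c(3L, 10L, 15L))

# Values and labels are combined only at evaluation time
expanded <- merge(toy_data, age_lookup, by = "age")
counts   <- table(factor(expanded$label, levels = names(age_groups)))
counts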

}
\seealso{
Functions that can handle formats: \code{\link[=summarise_plus]{summarise_plus()}}, \code{\link[=frequencies]{frequencies()}}, \code{\link[=crosstabs]{crosstabs()}},
\code{\link[=any_table]{any_table()}}, \code{\link[=recode_multi]{recode_multi()}}.
}
\keyword{internal}
