% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/standardize_date.R
\name{standardize_dates}
\alias{standardize_dates}
\title{Standardize date variables}
\usage{
standardize_dates(
  data,
  target_columns = NULL,
  format = NULL,
  timeframe = NULL,
  error_tolerance = 0.4,
  orders = list(world_named_months = c("Ybd", "dby"), world_digit_months = c("dmy",
    "Ymd"), US_formats = c("Omdy", "YOmd"))
)
}
\arguments{
\item{data}{The input \code{<data.frame>} or \code{<linelist>}}

\item{target_columns}{A \code{<vector>} of the names of the date columns to
standardize. When the input data is a \code{<linelist>} object, this
parameter can be set to \code{linelist_tags} to standardize only the tagged
date columns. Default is \code{NULL}.}

\item{format}{A \code{<vector>} of the expected formats of the date values
in the target columns. Default is \code{NULL}.}

\item{timeframe}{A \code{<vector>} of two values of type \code{<Date>}. If
provided, date values that do not fall within this timeframe are set to
\code{NA}. Default is \code{NULL}.}

\item{error_tolerance}{A \code{<numeric>} between 0 and 1 giving the
tolerated proportion of entries that cannot be identified as dates. If this
proportion is exceeded, the original vector is returned and a message is
issued. Default is 0.4 (40 percent).}

\item{orders}{A \code{<list>} or \code{<vector>} of characters with the date
codes for fine-grained parsing of dates. This allows for parsing of mixed
dates. If a \code{<list>} is supplied, its elements are tried successively
when parsing. When \code{orders = NULL}, the function uses the following
orders defined in the guesser:

\if{html}{\out{<div class="sourceCode">}}\preformatted{list(
  quarter_partial_dates = c("Y", "Ym", "Yq"),
  world_digit_months = c("Yq", "ymd", "ydm", "dmy", "mdy", "myd", "dym",
                         "Ymd", "Ydm", "dmY", "mdY", "mYd", "dYm"),
  world_named_months = c("dby", "dyb", "bdy", "byd", "ybd", "ydb",
                         "dbY", "dYb", "bdY", "bYd", "Ybd", "Ydb"),
  us_format = c("Omdy", "YOmd")
)
}\if{html}{\out{</div>}}}
}
\value{
The input dataset with the date columns standardized. Date values that fall
outside the specified timeframe, as well as date values that comply with
multiple formats, are recorded in the report object.
}
\description{
When the format of the values in a column or the target columns are not
specified, we strongly recommend manually checking a few converted dates to
make sure that the dates extracted from a \code{character} vector or a
\code{factor} are correct.\cr
}
\details{
Date values that comply with more than one format are listed in the
\code{$multi_format_dates} element of the \code{report}; check this element
after the conversion.\cr

Converting ambiguous character strings to dates is difficult for
many reasons:
\itemize{
\item dates may not use the standard Ymd format
\item within the same variable, dates may follow different formats
\item dates may be mixed with things that are not dates
\item the behavior of \code{as.Date()} in the presence of non-date values is
hard to predict, sometimes returning \code{NA}, sometimes issuing an error.
}

This function tries to address all the above issues. Dates in the following
formats should be detected automatically, irrespective of separators
(e.g. "-", " ", "/") and surrounding text:
\itemize{
\item "19 09 2018"
\item "2018 09 19"
\item "19 Sep 2018"
\item "2018 Sep 19"
\item "Sep 19 2018"
}

\subsection{How it works}{

This function relies heavily on \code{\link[lubridate:parse_date_time]{lubridate::parse_date_time()}},
an extremely flexible date parser that works well for consistent date
formats but can quickly become unwieldy and may produce spurious results.
\code{standardize_dates()} runs \code{parse_date_time()} separately with each
vector of formats in the \code{orders} argument and keeps the first correctly
parsed date across all trials.

With the default orders shown above, the dates 03 Jan 2018, 07/03/1982, and
08/20/85 are correctly interpreted as 2018-01-03, 1982-03-07, and 1985-08-20.
The examples section shows how to customize the \code{orders} for your
situation.
}
}
\examples{
x <- c("03 Jan 2018", "07/03/1982", "08/20/85")
# Coerce only the values where the month is written in letters into Date.
as.Date(lubridate::parse_date_time(x, orders = c("Ybd", "dby")))

# Coerce values where the month is written in letters or numbers into Date.
as.Date(lubridate::parse_date_time(x, orders = c("dmy", "Ymd")))
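
# A minimal sketch of the successive tries described in the Details section:
# each vector of orders is tried in turn, and the first successful parse of
# each value is kept. This only illustrates the idea and is not the internal
# implementation of standardize_dates(); sketch_orders and parsed are
# hypothetical names used for this example.
sketch_orders <- list(
  world_named_months = c("Ybd", "dby"),
  world_digit_months = c("dmy", "Ymd"),
  US_formats = c("Omdy", "YOmd")
)
parsed <- rep(as.Date(NA), length(x))
for (ord in sketch_orders) {
  # only retry the values that no earlier set of orders could parse
  todo <- is.na(parsed)
  if (!any(todo)) break
  parsed[todo] <- as.Date(
    lubridate::parse_date_time(x[todo], orders = ord, quiet = TRUE)
  )
}
parsed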

# How to use standardize_dates()
data <- readRDS(system.file("extdata", "test_df.RDS", package = "cleanepi"))

# convert values in the 'date.of.admission' column into "\%Y-\%m-\%d"
# format
dat <- standardize_dates(
  data = data,
  target_columns = "date.of.admission",
  format = NULL,
  timeframe = as.Date(c("2021-01-01", "2021-12-01")),
  error_tolerance = 0.4,
  orders = list(
    world_named_months = c("Ybd", "dby"),
    world_digit_months = c("dmy", "Ymd"),
    US_formats = c("Omdy", "YOmd")
  )
)
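
# A sketch, in base R, of the timeframe rule used by the call above: dates
# outside the two bounds are set to NA. Treating both bounds as inclusive is
# an assumption of this sketch, and toy_dates and toy_timeframe are
# hypothetical names.
toy_dates <- as.Date(c("2020-12-31", "2021-06-15", "2022-01-01"))
toy_timeframe <- as.Date(c("2021-01-01", "2021-12-01"))
outside <- toy_dates < toy_timeframe[1] | toy_dates > toy_timeframe[2]
toy_dates[outside] <- NA
toy_dates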

# print the report
print_report(dat, "date_standardization")
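
# A sketch of the error_tolerance rule: when the proportion of entries that
# cannot be identified as dates exceeds the tolerance, the original vector is
# kept. This is an illustration under assumed names (toy_values, toy_parsed,
# failure_rate), not the internal implementation of standardize_dates().
toy_values <- c("03 Jan 2018", "unknown", "07/03/1982", "n/a", "missing")
toy_parsed <- as.Date(
  lubridate::parse_date_time(
    toy_values,
    orders = c("dby", "dmy"),
    quiet = TRUE
  )
)
# proportion of entries that could not be identified as dates; at least three
# of the five entries here are not dates
failure_rate <- mean(is.na(toy_parsed))
# keep the original values when the failure rate exceeds the 0.4 tolerance
toy_result <- if (failure_rate > 0.4) toy_values else toy_parsed
toy_result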
}
