% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/orsf_predict.R
\name{predict.ObliqueForest}
\alias{predict.ObliqueForest}
\title{Prediction for ObliqueForest Objects}
\usage{
\method{predict}{ObliqueForest}(
  object,
  new_data = NULL,
  pred_type = NULL,
  pred_horizon = NULL,
  pred_aggregate = TRUE,
  pred_simplify = FALSE,
  oobag = FALSE,
  na_action = NULL,
  boundary_checks = TRUE,
  n_thread = NULL,
  verbose_progress = NULL,
  ...
)
}
\arguments{
\item{object}{(\emph{ObliqueForest}) a trained oblique random forest object (see \link{orsf}).}

\item{new_data}{a \link{data.frame}, \link[tibble:tibble-package]{tibble}, or \link[data.table:data.table]{data.table} containing the observations to compute predictions for.}

\item{pred_type}{(\emph{character}) the type of predictions to compute. Valid
options for survival are:
\itemize{
\item 'risk' : probability of having an event at or before \code{pred_horizon}.
\item 'surv' : 1 - risk.
\item 'chf': cumulative hazard function
\item 'mort': mortality prediction
\item 'time': survival time prediction
}

For classification:
\itemize{
\item 'prob': probability for each class
\item 'class': predicted class
}

For regression:
\itemize{
\item 'mean': predicted mean, i.e., the expected value
}}

\item{pred_horizon}{(\emph{double}) Only relevant for survival forests.
A value or vector indicating the time(s) that predictions will be
calibrated to. E.g., if you were predicting risk of incident heart
failure within the next 10 years, then \code{pred_horizon = 10}.
\code{pred_horizon} can be \code{NULL} if \code{pred_type} is \code{'mort'}, since
mortality predictions are aggregated over all event times.}

\item{pred_aggregate}{(\emph{logical}) If \code{TRUE} (the default), predictions
will be aggregated over all trees by taking the mean. If \code{FALSE}, the
returned output will contain one row per observation and one column
for each tree. If the length of \code{pred_horizon} is two or more and
\code{pred_aggregate} is \code{FALSE}, then the result will be a list of such
matrices, with the i'th item in the list corresponding to the i'th
value of \code{pred_horizon}.}

\item{pred_simplify}{(\emph{logical}) If \code{FALSE} (the default), predictions
will always be returned in a numeric matrix or a list of numeric matrices.
If \code{TRUE}, predictions may be simplified to a vector, e.g., if \code{pred_type}
is \code{'mort'} for survival or \code{'class'} for classification, or an array of
matrices if \code{length(pred_horizon) > 1}.}

\item{oobag}{(\emph{logical}) If \code{FALSE} (the default), predictions will
be computed using all trees for each observation. If \code{TRUE}, then
out-of-bag predictions will be computed. This input parameter should
only be set to \code{TRUE} if \code{new_data} is \code{NULL}.}

\item{na_action}{(\emph{character}) what should happen when \code{new_data} contains missing values (i.e., \code{NA} values). Valid options are:
\itemize{
\item 'fail' : an error is thrown if \code{new_data} contains \code{NA} values
\item 'pass' : the output will have \code{NA} in all rows where \code{new_data} has one or more \code{NA} values for the predictors used by \code{object}
\item 'omit' : rows in \code{new_data} with incomplete data will be dropped
\item 'impute_meanmode' : missing values for continuous and categorical variables in \code{new_data} will be imputed using the mean and mode, respectively. To clarify,
the mean and mode used to impute missing values are from the
training data of \code{object}, not from \code{new_data}.
}}

\item{boundary_checks}{(\emph{logical}) if \code{TRUE}, \code{pred_horizon} will be
checked to make sure the requested values are less than the maximum
observed time in \code{object}'s training data. If \code{FALSE}, these checks
are skipped.}

\item{n_thread}{(\emph{integer}) number of threads to use while computing predictions. Default is 0, which allows a suitable number of threads to be used based on availability.}

\item{verbose_progress}{(\emph{logical}) if \code{TRUE}, progress messages are
printed in the console. If \code{FALSE} (the default), nothing is printed.}

\item{...}{Further arguments passed to or from other methods (not currently used).}
}
\value{
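By default, a \code{matrix} of predictions. Row \code{i} of the matrix
corresponds to row \code{i} in \code{new_data}. For survival predictions,
column \code{j} of the matrix corresponds to value \code{j} in
\code{pred_horizon}. The output may instead be a list of matrices, a vector,
or an array, depending on \code{pred_aggregate} and \code{pred_simplify}
(see the descriptions of those arguments).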
}
\description{
Compute predicted values from an oblique random forest. Predictions
may be returned in aggregate (i.e., averaging over all the trees)
or tree-specific.
}
\details{
\code{new_data} must have the same columns with equivalent types as the data
used to train \code{object}. Also, factors in \code{new_data} must not have levels
that were not in the data used to train \code{object}.
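
These requirements can be checked before calling \code{predict()}. The sketch
below is a minimal illustration in base R: \code{check_new_data()} is a
hypothetical helper, not part of \code{aorsf}, and it assumes that the
training data are still available and that \code{train} holds only the
predictor columns.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# hypothetical helper, not part of aorsf: list the ways in which
# new_data is incompatible with the data used to train a forest
check_new_data <- function(train, new_data) {
  problems <- character(0)
  # columns of the training data that are absent from new_data
  missing_cols <- setdiff(names(train), names(new_data))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing column:", missing_cols))
  }
  for (col in intersect(names(train), names(new_data))) {
    # compare the first class of each column as a rough type check
    if (class(train[[col]])[1] != class(new_data[[col]])[1]) {
      problems <- c(problems, paste("type mismatch:", col))
    } else if (is.factor(train[[col]])) {
      # factor levels in new_data that were not seen in training
      new_lvls <- setdiff(levels(new_data[[col]]), levels(train[[col]]))
      if (length(new_lvls) > 0) {
        problems <- c(problems,
                      paste("unseen level in", col, ":", new_lvls))
      }
    }
  }
  problems
}

train <- data.frame(x = c(1.5, 2.0), g = factor(c("a", "b")))
new_ok <- data.frame(x = 3.1, g = factor("a", levels = c("a", "b")))
new_bad <- data.frame(x = "3.1", g = factor("c"))

check_new_data(train, new_ok)   # no problems found
check_new_data(train, new_bad)  # one type mismatch, one unseen level
}\if{html}{\out{</div>}}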

\code{pred_horizon} values should not exceed the maximum follow-up time in
\code{object}'s training data, but if you truly want to do this, set
\code{boundary_checks = FALSE} and you can use a \code{pred_horizon} as large
as you want. Note that predictions beyond the maximum follow-up time
in \code{object}'s training data are equal to predictions at the
maximum follow-up time, because \code{aorsf} does not estimate survival
beyond its maximum observed time.
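
A brief sketch of this behavior, using the \code{pbc_orsf} data that ship
with \code{aorsf}. The choice of \code{n_tree = 50} and a horizon of twice the
maximum follow-up time are arbitrary assumptions made for illustration, and
\code{boundary_checks = FALSE} is used in both calls so that neither request
is rejected.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(aorsf)

set.seed(329)

fit_bounds <- orsf(data = pbc_orsf,
                   formula = Surv(time, status) ~ . - id,
                   n_tree = 50)

max_time <- max(pbc_orsf$time)

# predicted risk at the maximum follow-up time
risk_max <- predict(fit_bounds,
                    new_data = pbc_orsf[1:5, ],
                    pred_type = 'risk',
                    pred_horizon = max_time,
                    boundary_checks = FALSE)

# predicted risk at twice the maximum follow-up time
risk_beyond <- predict(fit_bounds,
                       new_data = pbc_orsf[1:5, ],
                       pred_type = 'risk',
                       pred_horizon = 2 * max_time,
                       boundary_checks = FALSE)

# per the paragraph above, the two sets of predictions should agree
all.equal(risk_max, risk_beyond)
}\if{html}{\out{</div>}}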

If unspecified, \code{pred_horizon} may be automatically specified as the value
used for \code{oobag_pred_horizon} when \code{object} was created (see \link{orsf}).
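
One way to see which horizon is used when \code{pred_horizon} is left
unspecified is to compare against an explicit request. This is a sketch
under assumptions: \code{oobag_pred_horizon = 1000} and \code{n_tree = 50}
are arbitrary values chosen for illustration.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(aorsf)

set.seed(329)

fit_horizon <- orsf(data = pbc_orsf,
                    formula = Surv(time, status) ~ . - id,
                    n_tree = 50,
                    oobag_pred_horizon = 1000)

# pred_horizon left unspecified
risk_default <- predict(fit_horizon,
                        new_data = pbc_orsf[1:5, ],
                        pred_type = 'risk')

# pred_horizon set to the value given for oobag_pred_horizon
risk_explicit <- predict(fit_horizon,
                         new_data = pbc_orsf[1:5, ],
                         pred_type = 'risk',
                         pred_horizon = 1000)

# if the default is taken from oobag_pred_horizon, these should agree
all.equal(risk_default, risk_explicit)
}\if{html}{\out{</div>}}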
}
\section{Examples}{
\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(aorsf)
}\if{html}{\out{</div>}}
\subsection{Classification}{

\if{html}{\out{<div class="sourceCode r">}}\preformatted{set.seed(329)

index_train <- sample(nrow(penguins_orsf), 150) 

penguins_orsf_train <- penguins_orsf[index_train, ]
penguins_orsf_test <- penguins_orsf[-index_train, ]

fit_clsf <- orsf(data = penguins_orsf_train, 
                 formula = species ~ .)
}\if{html}{\out{</div>}}

Predict probability for each class or the predicted class:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# predicted probabilities, the default
predict(fit_clsf, 
        new_data = penguins_orsf_test[1:5, ],
        pred_type = 'prob')
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{##         Adelie  Chinstrap      Gentoo
## [1,] 0.9405310 0.04121955 0.018249405
## [2,] 0.9628988 0.03455909 0.002542096
## [3,] 0.9032074 0.08510528 0.011687309
## [4,] 0.9300133 0.05209040 0.017896329
## [5,] 0.7965703 0.16243492 0.040994821
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# predicted class (as a matrix by default)
predict(fit_clsf, 
        new_data = penguins_orsf_test[1:5, ],
        pred_type = 'class')
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{##      [,1]
## [1,]    1
## [2,]    1
## [3,]    1
## [4,]    1
## [5,]    1
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# predicted class (as a factor if you use simplify)
predict(fit_clsf, 
        new_data = penguins_orsf_test[1:5, ],
        pred_type = 'class',
        pred_simplify = TRUE)
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{## [1] Adelie Adelie Adelie Adelie Adelie
## Levels: Adelie Chinstrap Gentoo
}\if{html}{\out{</div>}}
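
Missing values in \code{new_data} are handled according to \code{na_action}.
The sketch below refits a smaller forest so that it can run on its own;
\code{n_tree = 50} and the choice of which value to set to \code{NA} are
arbitrary assumptions made for illustration.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(aorsf)

set.seed(329)

index_train <- sample(nrow(penguins_orsf), 150)

fit_na <- orsf(data = penguins_orsf[index_train, ],
               formula = species ~ .,
               n_tree = 50)

# copy a few test rows and remove one predictor value
new_rows <- penguins_orsf[-index_train, ][1:3, ]
new_rows$bill_length_mm[2] <- NA

# 'pass': the row with a missing predictor gets NA predictions
pred_pass <- predict(fit_na,
                     new_data = new_rows,
                     pred_type = 'prob',
                     na_action = 'pass')

# 'impute_meanmode': the missing value is imputed with the mean
# from the training data before predicting
pred_impute <- predict(fit_na,
                       new_data = new_rows,
                       pred_type = 'prob',
                       na_action = 'impute_meanmode')

# 'omit': the incomplete row is dropped
pred_omit <- predict(fit_na,
                     new_data = new_rows,
                     pred_type = 'prob',
                     na_action = 'omit')

nrow(pred_pass)
nrow(pred_omit)
}\if{html}{\out{</div>}}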
}

\subsection{Regression}{

\if{html}{\out{<div class="sourceCode r">}}\preformatted{set.seed(329)

index_train <- sample(nrow(penguins_orsf), 150) 

penguins_orsf_train <- penguins_orsf[index_train, ]
penguins_orsf_test <- penguins_orsf[-index_train, ]

fit_regr <- orsf(data = penguins_orsf_train, 
                 formula = bill_length_mm ~ .)
}\if{html}{\out{</div>}}

Predict the mean value of the outcome:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{predict(fit_regr, 
        new_data = penguins_orsf_test[1:5, ], 
        pred_type = 'mean')
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{##          [,1]
## [1,] 37.74136
## [2,] 37.42367
## [3,] 37.04598
## [4,] 39.89602
## [5,] 39.14848
}\if{html}{\out{</div>}}
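
Tree-specific predictions can be requested with
\code{pred_aggregate = FALSE}. The sketch below refits a small forest so
that it can run on its own; \code{n_tree = 25} is an arbitrary assumption
that keeps the output narrow.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(aorsf)

set.seed(329)

index_train <- sample(nrow(penguins_orsf), 150)

fit_trees <- orsf(data = penguins_orsf[index_train, ],
                  formula = bill_length_mm ~ .,
                  n_tree = 25)

pred_by_tree <- predict(fit_trees,
                        new_data = penguins_orsf[-index_train, ][1:5, ],
                        pred_type = 'mean',
                        pred_aggregate = FALSE)

# one row per observation and one column per tree
dim(pred_by_tree)

# spread of the tree-specific predictions for each observation
apply(pred_by_tree, 1, sd)
}\if{html}{\out{</div>}}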
}

\subsection{Survival}{

Begin by fitting an oblique survival random forest:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{set.seed(329)

index_train <- sample(nrow(pbc_orsf), 150) 

pbc_orsf_train <- pbc_orsf[index_train, ]
pbc_orsf_test <- pbc_orsf[-index_train, ]

fit_surv <- orsf(data = pbc_orsf_train, 
                 formula = Surv(time, status) ~ . - id,
                 oobag_pred_horizon = 365.25 * 5)
}\if{html}{\out{</div>}}

Predict risk, survival, or cumulative hazard at one or several times:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# predicted risk, the default
predict(fit_surv, 
        new_data = pbc_orsf_test[1:5, ], 
        pred_type = 'risk', 
        pred_horizon = c(500, 1000, 1500))
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{##             [,1]        [,2]       [,3]
## [1,] 0.013648562 0.058393393 0.11184029
## [2,] 0.003811413 0.026857586 0.04774151
## [3,] 0.030548361 0.100600301 0.14847107
## [4,] 0.040381075 0.169596943 0.27018952
## [5,] 0.001484698 0.006663576 0.01337655
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# predicted survival, i.e., 1 - risk
predict(fit_surv, 
        new_data = pbc_orsf_test[1:5, ], 
        pred_type = 'surv',
        pred_horizon = c(500, 1000, 1500))
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{##           [,1]      [,2]      [,3]
## [1,] 0.9863514 0.9416066 0.8881597
## [2,] 0.9961886 0.9731424 0.9522585
## [3,] 0.9694516 0.8993997 0.8515289
## [4,] 0.9596189 0.8304031 0.7298105
## [5,] 0.9985153 0.9933364 0.9866235
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# predicted cumulative hazard function
# (expected number of events for person i at time j)
predict(fit_surv, 
        new_data = pbc_orsf_test[1:5, ], 
        pred_type = 'chf',
        pred_horizon = c(500, 1000, 1500))
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{##             [,1]        [,2]       [,3]
## [1,] 0.015395388 0.067815817 0.14942956
## [2,] 0.004022524 0.028740305 0.05424314
## [3,] 0.034832754 0.127687156 0.20899732
## [4,] 0.059978334 0.233048809 0.42562310
## [5,] 0.001651365 0.007173177 0.01393016
}\if{html}{\out{</div>}}

Predict mortality, defined as the number of events in the forest’s
population if all observations had characteristics like the current
observation. This type of prediction does not require you to specify a
prediction horizon.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{predict(fit_surv, 
        new_data = pbc_orsf_test[1:5, ], 
        pred_type = 'mort')
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{##           [,1]
## [1,] 23.405016
## [2,] 15.362916
## [3,] 26.180648
## [4,] 36.515629
## [5,]  5.856674
}\if{html}{\out{</div>}}
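
Because mortality predictions have a single column, they are a natural
candidate for \code{pred_simplify = TRUE}. The sketch below refits a small
forest so that it can run on its own; \code{n_tree = 50} is an arbitrary
assumption made for illustration.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(aorsf)

set.seed(329)

index_train <- sample(nrow(pbc_orsf), 150)

fit_mort <- orsf(data = pbc_orsf[index_train, ],
                 formula = Surv(time, status) ~ . - id,
                 n_tree = 50)

# default output: a one-column matrix
mort_matrix <- predict(fit_mort,
                       new_data = pbc_orsf[-index_train, ][1:5, ],
                       pred_type = 'mort')

# simplified output: expected to be a plain numeric vector
mort_vector <- predict(fit_mort,
                       new_data = pbc_orsf[-index_train, ][1:5, ],
                       pred_type = 'mort',
                       pred_simplify = TRUE)

dim(mort_matrix)
length(mort_vector)
}\if{html}{\out{</div>}}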
}
}

