% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Simulations.R
\name{SimulateOneRep}
\alias{SimulateOneRep}
\title{Simulate a single replicate of NR-seq data}
\usage{
SimulateOneRep(
  nfeatures,
  read_vect = NULL,
  label_time = 2,
  sample_name = "sampleA",
  feature_prefix = "Gene",
  fn_vect = NULL,
  kdeg_vect = NULL,
  ksyn_vect = NULL,
  pnew = 0.05,
  pold = 0.002,
  logkdeg_mean = -1.9,
  logkdeg_sd = 0.7,
  logksyn_mean = 2.3,
  logksyn_sd = 0.7,
  seqdepth = nfeatures * 2500,
  readlength = 200,
  Ucont_alpha = 25,
  Ucont_beta = 75,
  feature_pnew = FALSE,
  pnew_kdeg_corr = FALSE,
  logit_pnew_mean = -2.5,
  logit_pnew_sd = 0.1
)
}
\arguments{
\item{nfeatures}{Number of "features" (e.g., genes) to simulate data for}

\item{read_vect}{Vector of length = \code{nfeatures}; specifies the number of reads
to be simulated for each feature. If this is not provided, the number of reads
simulated is equal to \code{round(seqdepth * (ksyn_i/kdeg_i)/sum(ksyn/kdeg))}. In other words,
the normalized steady-state abundance of a feature is multiplied by the total number
of reads to be simulated and rounded to the nearest integer.}

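\item{label_time}{Length of the s4U (4-thiouridine) feed to simulate, in time units consistent
with those of the rate constants (e.g., hours if kdegs are in 1/hours).}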

\item{sample_name}{Character vector to assign to \code{sample} column of output simulated
data table (the cB table).}

\item{feature_prefix}{Name given to the i-th feature is \code{paste0(feature_prefix, i)}. Shows up in the
\code{feature} column of the output simulated data table.}

\item{fn_vect}{Vector of length = \code{nfeatures}; specifies the fraction new to use for each
feature's simulation. If this is not provided and \code{kdeg_vect} is, then \code{fn_vect = 1 - exp(-kdeg_vect*label_time)}.
If neither \code{fn_vect} nor \code{kdeg_vect} is provided, then kdegs are simulated as described
for \code{kdeg_vect} below and converted to a \code{fn_vect} in the same way as a user-provided \code{kdeg_vect}.}

\item{kdeg_vect}{Vector of length = \code{nfeatures}; specifies the degradation rate constant to use for each
feature's simulation. If this is not provided and \code{fn_vect} is, then \code{kdeg_vect = -log(1 - fn_vect)/label_time}.
If neither \code{kdeg_vect} nor \code{fn_vect} is provided, each feature's \code{kdeg_vect} value is drawn from a log-normal distribution
with meanlog = \code{logkdeg_mean} and sdlog = \code{logkdeg_sd}. \code{kdeg_vect} is only simulated in the case
where \code{read_vect} is also not provided, as it is then used to simulate read counts as described above.}

\item{ksyn_vect}{Vector of length = \code{nfeatures}; specifies the synthesis rate constant to use for each
feature's simulation. If neither this nor \code{read_vect} is provided, then each
feature's \code{ksyn_vect} value is drawn from a log-normal distribution with meanlog = \code{logksyn_mean} and
sdlog = \code{logksyn_sd}. ksyns do not need to be simulated if \code{read_vect} is provided, as they only
influence read counts.}

\item{pnew}{Probability that a T is mutated to a C if a read is new.}

\item{pold}{Probability that a T is mutated to a C if a read is old.}

\item{logkdeg_mean}{meanlog of the log-normal distribution from which
kdegs are simulated. Only used if kdegs need to be simulated.}

\item{logkdeg_sd}{sdlog of the log-normal distribution from which
kdegs are simulated. Only used if kdegs need to be simulated.}

\item{logksyn_mean}{meanlog of the log-normal distribution from which
ksyns are simulated. Only used if ksyns need to be simulated.}

\item{logksyn_sd}{sdlog of the log-normal distribution from which
ksyns are simulated. Only used if ksyns need to be simulated.}

\item{seqdepth}{Only relevant if \code{read_vect} is not provided; in that case, this is
the total number of reads to simulate.}

\item{readlength}{Length of simulated reads. In this simple simulation, all reads
are simulated as being exactly this length.}

\item{Ucont_alpha}{The probability that a nucleotide in a simulated read from a given feature
is a U (the feature's U content) is drawn from a beta distribution with shape1 = \code{Ucont_alpha}.}

\item{Ucont_beta}{The probability that a nucleotide in a simulated read from a given feature
is a U (the feature's U content) is drawn from a beta distribution with shape2 = \code{Ucont_beta}.}

\item{feature_pnew}{Boolean; if TRUE, simulate a different pnew for each feature}

\item{pnew_kdeg_corr}{Boolean; only relevant if \code{feature_pnew} is TRUE. If so, then
setting \code{pnew_kdeg_corr} to TRUE will ensure that higher kdeg transcripts have a higher
pnew.}

\item{logit_pnew_mean}{If \code{feature_pnew} is TRUE, then the logit(pnew) for each feature
will be drawn from a normal distribution with this mean.}

\item{logit_pnew_sd}{If \code{feature_pnew} is TRUE, then the logit(pnew) for each feature
will be drawn from a normal distribution with this standard deviation.}
}
\value{
List with two elements:
\itemize{
\item cB: Tibble that can be passed as the \code{cB} arg to \code{EZbakRData()}.
\item ground_truth: Tibble containing simulated ground truth.
}
}
\description{
In \code{SimulateOneRep}, users can either provide vectors of feature-specific
read counts, fractions new, kdegs, and ksyns for the simulation, or have those drawn
from relevant distributions whose properties can be tuned by the various optional
parameters of \code{SimulateOneRep}. The number of mutable nucleotides (nT) in
a read is drawn from a binomial distribution with \code{readlength} trials and a probability
of "success" equal to the feature's U content (drawn from a beta distribution with shape
parameters \code{Ucont_alpha} and \code{Ucont_beta}). A read's status as new or old is drawn from a Bernoulli
distribution with probability of "success" equal to the feature's fraction new. If a read
is new, the number of mutations in the read is drawn from a binomial distribution with
nT trials and probability of mutation equal to pnew. If a read is old, the number of mutations is instead
drawn from a binomial distribution with nT trials and probability of mutation equal to pold.
}
\examples{
simdata <- SimulateOneRep(30)
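
# The calls below are further illustrative sketches, not prescribed usage.
# Argument names are those documented above; the specific values and the
# object names are assumptions chosen for illustration.

# Supply feature-specific read counts and fractions new directly, so that
# neither kdegs nor ksyns need to be simulated.
simdata_fn <- SimulateOneRep(
  nfeatures = 3,
  read_vect = c(100, 200, 300),
  fn_vect = c(0.2, 0.5, 0.8),
  label_time = 2
)

# The returned list holds the two documented elements.
simdata_fn$cB
simdata_fn$ground_truth

# Documented relationship between kdeg and fraction new, in base R.
kdeg_example <- 0.15
label_time_example <- 2
fn_example <- 1 - exp(-kdeg_example * label_time_example)
all.equal(-log(1 - fn_example) / label_time_example, kdeg_example)

# Base-R sketch of the generative model in the Description for one feature.
# This is not the package's internal code. It assumes that the number of
# trials for the mutation draw is the read's nT.
set.seed(42)
nreads_example <- 1000
Ucont_example <- rbeta(1, shape1 = 25, shape2 = 75)   # feature's U content
nT_example <- rbinom(nreads_example, size = 200, prob = Ucont_example)
is_new_example <- rbinom(nreads_example, size = 1, prob = fn_example)
pmut_example <- ifelse(is_new_example == 1, 0.05, 0.002)   # pnew vs. pold
TC_example <- rbinom(nreads_example, size = nT_example, prob = pmut_example)

# New reads should carry more mutations on average than old reads.
tapply(TC_example, is_new_example, mean)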
}
