Welcome to the ‘Get started’ vignette of the `jfa` package. This
vignette provides a simple explanation of the functions in the package
and how they facilitate the statistical audit sampling workflow. See the
other vignettes for a more detailed explanation of the functionality of
the package.

To concretely illustrate `jfa`’s functionality, we consider the
`BuildIt` data set that is included in the package (for more info, see
`?BuildIt`). This data set contains a population of 3500 invoices paid
to the fictional ‘BuildIt’ construction company. Each invoice has an
identification number (`ID`), a recorded value (`bookValue`), and a
corresponding audit (true) value (`auditValue`).

**Note:** The information in the `auditValue` column is added for
illustrative purposes. In practice, the audit value of an invoice is
known to the auditor only after that invoice has been inspected as part
of a sample.

First, we load the `jfa` package and the `BuildIt` data set. The first
10 invoices from the data set are displayed below.

```
library(jfa)
data('BuildIt')
head(BuildIt, n = 10)
```

```
## ID bookValue auditValue
## 1 82884 242.61 242.61
## 2 25064 642.99 642.99
## 3 81235 628.53 628.53
## 4 71769 431.87 431.87
## 5 55080 620.88 620.88
## 6 93224 501.76 501.76
## 7 24331 466.01 466.01
## 8 81460 295.20 295.20
## 9 14608 216.48 216.48
## 10 79064 243.43 243.43
```

For a fully illustrated walkthrough of `jfa`’s workflow functionality
using the `BuildIt` data set, see the vignette Workflow: Classical audit
sampling. For a Bayesian version of the illustrated walkthrough, see the
vignette Workflow: Bayesian audit sampling.

`auditPrior()`: The basics

The `auditPrior()` function can be used to create a prior distribution
for the misstatement parameter in a statistical audit sampling model. In
an audit sampling context, an advantage of Bayesian inference is that
the prior distribution can be used to incorporate existing information
into the statistical procedure. Incorporating existing information can
potentially yield a decrease in sample size and an increase in
efficiency. The type of audit information that can be incorporated
depends on the information that is available in the context of the
audit. See the vignette Planning: Prior distributions or the
accompanying article for a detailed explanation of the types of audit
information that `jfa` is able to incorporate into a prior distribution.

With the prior distribution in hand, Bayesian audit sampling can be
performed by providing the object returned by the `auditPrior()`
function as input for the `prior` argument in subsequent calls to the
`planning()` and `evaluation()` functions.
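To make the effect of a prior distribution concrete, the sketch below uses base R only and does not call `jfa`. It assumes a gamma prior combined with a Poisson likelihood, in which case the posterior is again a gamma distribution. The helper `bayes_min_n()` and both sets of prior parameters are hypothetical and chosen purely for illustration; they are not taken from `auditPrior()`.

```r
# Hypothetical helper (not part of jfa): smallest sample size n for which the
# posterior upper bound on the misstatement falls below the materiality.
# Assumes a gamma(alpha, beta) prior and a Poisson likelihood, so that after
# observing `expected` errors in n units the posterior is
# gamma(alpha + expected, beta + n).
bayes_min_n <- function(materiality, alpha, beta, expected = 0, conf.level = 0.95) {
  n <- 0
  while (qgamma(conf.level, shape = alpha + expected, rate = beta + n) >= materiality) {
    n <- n + 1
  }
  n
}

# A weakly informative prior versus a prior that encodes more existing evidence
n_weak <- bayes_min_n(materiality = 0.05, alpha = 1, beta = 1)
n_informed <- bayes_min_n(materiality = 0.05, alpha = 1, beta = 20)
c(weak = n_weak, informed = n_informed)
```

Under these assumptions, the more informative prior requires 40 sampling units instead of 59, which illustrates how existing information can reduce the required sample size.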

`planning()`: The basics

Planning a minimum sample size requires knowledge of the conditions that lead to acceptance of the population (i.e., the sampling objectives). Generally, a sampling objective can be one (or both) of the following:

- **Hypothesis testing**: Obtain measures of evidence for the claim that the misstatement in the population is lower than a given performance materiality (i.e., the maximum tolerable misstatement).
- **Estimation**: Obtain measures of accuracy for the claim that the misstatement in the population is a certain value (with a minimum precision).

Besides determining the sampling objective(s), it is also important to
determine the statistical distribution linking the sample outcomes to
the population misstatement (e.g., `poisson`, `binomial`, or
`hypergeometric`). All three distributions are standard in an audit
sampling context because they are (approximations of) the
hypergeometric distribution, but `poisson` is the default in `jfa`
because it is the most conservative.
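As a rough illustration of what ‘most conservative’ means here, the base R sketch below computes, for each of the three distributions, the smallest sample size for which finding zero errors would be sufficiently unlikely if the misstatement equalled a materiality of 5%. The helper `min_n()` is hypothetical, and the way the hypergeometric case translates the materiality into a number of misstated units (`K`) is an assumption; `jfa` may handle this differently.

```r
# Hypothetical helper (not part of jfa): smallest sample size for which the
# probability of observing at most `expected` errors, if the misstatement
# equals the materiality, is at most 1 - conf.level.
min_n <- function(prob_at_most, conf.level = 0.95, n_max = 5000) {
  for (size in 1:n_max) {
    if (prob_at_most(size) <= 1 - conf.level) return(size)
  }
  NA_integer_
}

materiality <- 0.05
expected <- 0
N <- 3500                       # population size of the BuildIt data set
K <- ceiling(materiality * N)   # assumed number of misstated units at materiality

n_poisson <- min_n(function(size) ppois(expected, lambda = size * materiality))
n_binomial <- min_n(function(size) pbinom(expected, size = size, prob = materiality))
n_hypergeometric <- min_n(
  function(size) phyper(expected, m = K, n = N - K, k = size),
  n_max = N
)
c(poisson = n_poisson, binomial = n_binomial, hypergeometric = n_hypergeometric)
```

In this sketch the Poisson distribution requires 60 units and the binomial distribution 59, while the hypergeometric distribution requires no more than the binomial. The Poisson distribution therefore asks for the most audit work under these assumptions.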

Lastly, it is advised to obtain knowledge of the expected (or tolerable) errors in the sample. It is strongly recommended to set the value for the expected errors in the sample conservatively to minimize the chance of the observed errors in the sample exceeding the expected errors, which would imply that insufficient work has been done in the end.
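The cost of a conservative choice can be sketched in base R. The helper `poisson_min_n()` below is hypothetical and assumes a Poisson likelihood with a whole number of expected errors; it is meant to show how the minimum sample size grows with the number of tolerated errors, not to reproduce the internals of `planning()`.

```r
# Hypothetical helper (not part of jfa): minimum sample size under a Poisson
# likelihood when `expected` errors are tolerated in the sample.
# qgamma(conf.level, expected + 1, rate = 1) is the upper bound on the expected
# number of errors, so dividing it by the materiality gives the required n.
poisson_min_n <- function(materiality, expected, conf.level = 0.95) {
  ceiling(qgamma(conf.level, shape = expected + 1, rate = 1) / materiality)
}

# Minimum sample sizes for 0, 1, and 2 expected errors at 5% materiality
sizes <- sapply(0:2, poisson_min_n, materiality = 0.05)
setNames(sizes, paste0("expected_", 0:2))
```

Under these assumptions, tolerating 0, 1, or 2 errors requires 60, 95, or 126 units, respectively. Planning for more errors is costlier up front, but it lowers the risk of having to extend the sample afterwards.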

With the `BuildIt` data set, because the booked amounts (monetary
values) of each invoice in the population are given, an auditor may want
to make a statement about the amount of misstatement in the population.
For illustrative purposes we will tolerate zero misstatements in the
sample.

First, let’s take a look at how you can use the `planning()` function to
calculate the minimum sample size for testing the hypothesis that the
misstatement in the population is lower than the performance
materiality. In this example the performance materiality is set to 5% of
the total population value, meaning that the population may not contain
more than 5% misstatement.

**Sampling objective**: Calculate a minimum sample size such that, when
no misstatements are found in the sample, the auditor can conclude with
95% confidence that the misstatement in the population is lower than 5%
of the population value.

A minimum sample size for this sampling objective can be calculated by
specifying the `materiality` parameter in the `planning()` function, see
the code below. Next, a summary of the statistical results can be
obtained using the `summary()` function. The results show that, given
zero tolerable errors, the minimum sample size is 60 units.

```
# Plan a minimum sample size for testing against a performance materiality of 5%
stage1 <- planning(materiality = 0.05, expected = 0, likelihood = 'poisson', conf.level = 0.95)
summary(stage1)
```

```
##
## Classical Audit Sample Planning Summary
##
## Options:
## Confidence level: 0.95
## Materiality: 0.05
## Hypotheses: H₀: Θ >= 0.05 vs. H₁: Θ < 0.05
## Expected: 0
## Likelihood: poisson
##
## Results:
## Minimum sample size: 60
## Tolerable errors: 0
## Expected most likely error: 0
## Expected upper bound: 0.049929
## Expected precision: 0.049929
## Expected p-value: < 2.22e-16
```

Next, let’s take a look at how you can use the `planning()` function to
calculate the minimum sample size for estimating the misstatement in the
population with a minimum precision. The precision is defined as the
difference between the most likely misstatement and the upper confidence
bound on the misstatement. For this example, the minimum precision is
set to 2% of the population value.

**Sampling objective**: Calculate a minimum sample size such that, when
zero misstatements are found in the sample, the auditor can conclude
with 95% confidence that the misstatement in the population is at most
2% above the most likely misstatement.

A minimum sample size for this sampling objective can be calculated by
specifying the `min.precision` parameter in the `planning()` function,
see the code below. The results show that, given zero tolerable errors,
the minimum sample size is 150 units.

```
# Plan a minimum sample size for estimating with a minimum precision of 2%
stage1 <- planning(min.precision = 0.02, expected = 0, likelihood = 'poisson', conf.level = 0.95)
summary(stage1)
```

```
##
## Classical Audit Sample Planning Summary
##
## Options:
## Confidence level: 0.95
## Min. precision: 0.02
## Expected: 0
## Likelihood: poisson
##
## Results:
## Minimum sample size: 150
## Tolerable errors: 0
## Expected most likely error: 0
## Expected upper bound: 0.019971
## Expected precision: 0.019971
```

`selection()`: The basics

Selecting a sample using the `selection()` function requires knowledge
of units in the population that are eligible for selection (i.e.,
sampling units). Sampling units can be items or monetary units. Items
can be selected from the population using record sampling (also known as
attribute sampling or item sampling) with `units = 'items'`. On the
other hand, monetary units can be selected from the population using
monetary unit sampling (MUS) with `units = 'values'`.

Once the sampling units are determined, the method used to select the
units (i.e., the selection method) should also be determined. Sampling
units can be selected with a fixed interval sampling (also known as
systematic sampling) scheme using `method = 'interval'` (the default),
with a cell sampling scheme using `method = 'cell'`, with random
sampling using `method = 'random'`, or with modified sieve sampling
using `method = 'sieve'`. See the vignette Selection: Sampling
methodology for a more detailed explanation of the selection methods
implemented in `jfa`.

First, let’s take a look at how the `selection()` function can be used
to perform random record sampling. Random record sampling implies that
the sampling units are set to `items` and the selection method is set to
`random`. The code below selects the 60 planned invoices from the
`BuildIt` data set using such a random record sampling scheme.

```
set.seed(1)
stage2 <- selection(data = BuildIt, size = 60, units = 'items', method = 'random')
summary(stage2)
```

```
##
## Audit Sample Selection Summary
##
## Options:
## Requested sample size: 60
## Sampling units: items
## Method: random sampling
##
## Data:
## Population size: 3500
##
## Results:
## Selected sampling units: 60
## Selected items: 60
## Proportion of size: 0.017143
```

Next, let’s take a look at how the `selection()` function can be used to
perform fixed interval monetary unit sampling. Fixed interval monetary
unit sampling implies that the sampling units are set to `values` and
the selection method is set to `interval`. The code below selects 150
monetary units from the `BuildIt` data set using such a fixed interval
monetary unit sampling scheme.

```
stage2 <- selection(data = BuildIt, size = 150, units = 'values', method = 'interval', values = 'bookValue')
summary(stage2)
```

```
##
## Audit Sample Selection Summary
##
## Options:
## Requested sample size: 150
## Sampling units: monetary units
## Method: fixed interval sampling
## Starting point: 1
##
## Data:
## Population size: 3500
## Population value: 1403221
## Selection interval: 9354.8
##
## Results:
## Selected sampling units: 150
## Proportion of value: 0.0001069
## Selected items: 150
## Proportion of size: 0.042857
```

The selected units and corresponding items are stored in the object that
is returned by the `selection()` function. The sample can be extracted
from this object by indexing it via `$sample`, see the code below. After
this step it is up to the auditor to annotate the sample.

```
set.seed(1)
stage2 <- selection(data = BuildIt, size = 60, units = 'items', method = 'random')
sample <- stage2$sample # Extract the selected items
head(sample, n = 10)
```

```
## row times ID bookValue auditValue
## 1 1017 1 50755 618.24 618.24
## 2 679 1 20237 669.75 669.75
## 3 2177 1 9517 454.02 454.02
## 4 930 1 85674 257.82 257.82
## 5 1533 1 31051 308.53 308.53
## 6 471 1 84375 824.66 824.66
## 7 2347 1 75616 623.70 623.70
## 8 270 1 82033 352.75 352.75
## 9 1211 1 12877 52.89 52.89
## 10 3379 1 85322 330.24 330.24
```

`evaluation()`: The basics

After annotating the items in the sample with their audit values you can
perform statistical inference about the misstatement in the population
with the `evaluation()` function. Besides a data sample as input, this
function can also be used when only summary statistics from a data
sample (e.g., sample size and number of errors) are available. For a
more elaborate explanation of the output of this function for each
sampling objective, see the package vignettes Evaluation: Testing
misstatement and Evaluation: Estimating misstatement.

First, let’s take a look at how the `evaluation()` function can be
combined with summary statistics from a sample. Suppose that in the
previously selected sample of 60 invoices it is found that a single
invoice is missing a signature. These summary statistics can be provided
to the `evaluation()` function with `x = 1` and `n = 60`. The function
also requires that you specify the sampling objectives using the
`materiality` or `min.precision` arguments. Again, a performance
materiality of 5% applies.

```
# Evaluate summary statistics: 1 error in a sample of 60 items
stage4 <- evaluation(materiality = 0.05, method = 'poisson', conf.level = 0.95, x = 1, n = 60)
summary(stage4)
```

```
##
## Classical Audit Sample Evaluation Summary
##
## Options:
## Confidence level: 0.95
## Materiality: 0.05
## Hypotheses: H₀: Θ >= 0.05 vs. H₁: Θ < 0.05
## Method: poisson
##
## Data:
## Sample size: 60
## Number of errors: 1
## Sum of taints: 1
##
## Results:
## Most likely error: 0.016667
## 95 percent confidence interval: [0, 0.079064]
## Precision: 0.062398
## p-value: 0.19915
```

The results indicate that the most likely error in the population is
1.67%. Moreover, the 95% one-sided confidence interval for the
population misstatement ranges from 0% to 7.9% and contains the
performance materiality of 5%. This implies that we cannot reject the
null hypothesis that the population misstatement is greater than or
equal to 5%, which is also indicated by a non-significant *p* value
(*p* = 0.199).
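The numbers in this summary can be traced back with a few lines of base R. The sketch below assumes that, for the `poisson` method, the upper bound is a gamma quantile and the *p* value is the Poisson probability of observing at most `x` errors when the misstatement equals the materiality. These formulas are an assumption about the calculation rather than a description of `jfa`’s source code, but they reproduce the reported figures.

```r
x <- 1               # number of errors found in the sample
n <- 60              # sample size
materiality <- 0.05
conf.level <- 0.95

mle <- x / n                                           # most likely error
upper <- qgamma(conf.level, shape = x + 1, rate = n)   # one-sided upper bound
precision <- upper - mle                               # upper bound minus most likely error
p_value <- ppois(x, lambda = n * materiality)          # P(at most x errors | misstatement = 5%)

round(c(mle = mle, upper = upper, precision = precision, p_value = p_value), 6)
```

Since the upper bound of roughly 7.9% exceeds the materiality of 5%, and the *p* value exceeds 0.05, both quantities lead to the same conclusion.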

Next, let’s take a look at how the `evaluation()` function can be
combined with a data sample. Returning to our annotated sample from the
`selection()` function, suppose that in the previously selected sample
of 60 invoices it is found that a single invoice has a true value that
deviates from its booked value.

```
sample$auditValue <- sample$bookValue # Start from a sample without misstatements
sample$auditValue[1] <- sample$auditValue[1] - 100 # The first invoice is overstated by 100
```

These data can be provided to the `evaluation()` function using the
`data`, `values`, `values.audit`, and `times` arguments. The `method`
argument determines the method of inference. For example, the code below
evaluates the misstatement in the population using the commonly used
Stringer bound. You can find more information about which evaluation
methods are implemented on the home page.

```
stage4 <- evaluation(materiality = 0.05, method = 'stringer', conf.level = 0.95,
                     data = sample, values = 'bookValue', values.audit = 'auditValue',
                     times = 'times')
summary(stage4)
```

```
##
## Classical Audit Sample Evaluation Summary
##
## Options:
## Confidence level: 0.95
## Materiality: 0.05
## Method: stringer
##
## Data:
## Sample size: 60
## Number of errors: 1
## Sum of taints: 0.1617495
##
## Results:
## Most likely error: 0.0026958
## 95 percent confidence interval: [0, 0.053222]
## Precision: 0.050526
```

The results indicate that the most likely error in the population is
0.27%. Moreover, the 95% one-sided confidence interval for the
population misstatement ranges from 0% to 5.3% and contains the
performance materiality of 5%. The `stringer` method does not provide a
*p* value for hypothesis testing.
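For readers who want to see where these numbers come from, the base R sketch below implements the textbook form of the Stringer bound for overstatements, using binomial (Clopper-Pearson) upper bounds. The helper `stringer_bound()` is hypothetical and is not claimed to be `jfa`’s implementation. The single taint of 100 / 618.24 corresponds to the first invoice in the sample shown earlier, whose book value of 618.24 is overstated by 100.

```r
# Hypothetical helper (not part of jfa): textbook Stringer bound for
# overstatement taints between 0 and 1.
stringer_bound <- function(taints, conf.level = 0.95) {
  n <- length(taints)
  t_sorted <- sort(taints[taints > 0], decreasing = TRUE)
  m <- length(t_sorted)
  # Binomial upper bounds for 0, 1, ..., m errors in n items
  p <- qbeta(conf.level, shape1 = 0:m + 1, shape2 = n - 0:m)
  # Basic bound for zero errors plus the taint-weighted increments
  p[1] + sum(diff(p) * t_sorted)
}

taints <- c(100 / 618.24, rep(0, 59))   # one overstated invoice, 59 correct ones
most_likely_error <- mean(taints)
upper_bound <- stringer_bound(taints)
round(c(most_likely_error = most_likely_error, upper_bound = upper_bound), 6)
```

Under these assumptions the sketch arrives at a most likely error of about 0.27% and an upper bound of about 5.3%, in line with the summary above.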

`report()`: The basics

With the results from the `evaluation()` function in hand, a call to the
`report()` function automatically generates a report containing the
data, the statistical results and their interpretation, and the
conclusion of the sampling procedure with respect to the sampling
objectives. The object returned by the `evaluation()` function can be
supplied directly to the `report()` function, see the code below.

```
stage4 <- evaluation(materiality = 0.05, method = 'stringer', conf.level = 0.95,
                     data = sample, values = 'bookValue', values.audit = 'auditValue',
                     times = 'times')
report(stage4, file = 'report.html', format = 'html_document') # Generates .html report
```