`library(scrutiny)`

Large parts of scrutiny are about rounding. This topic is more
diverse and variegated than one might think. “Rounding” is often taken
to simply mean rounding up from 5, and although that’s wrong, it doesn’t
make much of a difference most of the time. Where it does matter is in
*reconstructing* the rounding procedures that others used, claim
to have used, or are alleged to have used. In short, rounding and all
its details matter for error detection.

This vignette covers scrutiny’s infrastructure for reconstructing rounding procedures and results from summary statistics. Doing so is essential for the package: It is a precondition for translating assumptions about the processes behind published summary data into a few simple, higher-level function calls. Some of the functions presented here might be useful beyond the package, as well. Feel free to skip the more theoretical parts if your focus is on the code.

Base R’s `round()` function is surprisingly sophisticated, which
distinguishes it from the very simple ways in which decimal numbers are
normally rounded: most of the time, rounding up from 5. For this very
reason, however, it can’t be used to reconstruct the rounding procedures
of other software programs. This is the job of scrutiny’s rounding
functions.

First, I will present `reround()`, a general interface to
reconstructing rounded numbers, before going through the individual
rounding functions and commenting on each of them.

I will also discuss `unround()`, which works the reverse way: It takes
a rounded number and reconstructs the bounds of the original number,
taking details about the assumed rounding procedure into account.

Finally, I will take a closer look at bias from rounding raw numbers.

`reround()`

None of the error detection techniques in scrutiny calls the
individual rounding functions directly. Instead, all of them call
`reround()`, which mediates between these two levels. Its first
argument, `x`, is the vector of “raw” reconstructed numbers that have
not yet been rounded by the procedure assumed to have been used
originally. Its next argument is `digits`, the number of decimal places
to round to.

The remaining three arguments are about the rounding procedure. Most
of the time, only `rounding` will be of any interest. It takes a string
with the name of one of the rounding procedures discussed below.

Here is an example of a `reround()` call:

```
reround(x = c(5.812, 7.249), digits = 2, rounding = "up")
#> [1] 5.81 7.25
```

The two remaining arguments are mostly forgettable: They only concern
obscure cases of rounding with a threshold other than 5 (`threshold`)
and rounding such that the absolute values of positive and negative
numbers are the same (`symmetric`). Ignore them otherwise.
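To make this division of labor concrete, here is a minimal base R sketch of a dispatcher in the spirit of `reround()`. The name `sketch_reround()` and its internals are assumptions made for illustration only; scrutiny’s actual implementation supports more procedures and may treat floating-point edge cases differently.

```r
# Hypothetical helper, not part of scrutiny: pick a rounding procedure
# by name and apply it to `x` at the given number of decimal places.
sketch_reround <- function(x, digits = 0, rounding = c("up", "down", "even")) {
  rounding <- match.arg(rounding)
  p10 <- 10^digits
  switch(
    rounding,
    up   = floor(x * p10 + 0.5) / p10,    # ties at 5 go up
    down = ceiling(x * p10 - 0.5) / p10,  # ties at 5 go down
    even = round(x, digits)               # base R's round-to-even
  )
}

sketch_reround(c(5.812, 7.249), digits = 2, rounding = "up")
#> [1] 5.81 7.25

# 0.5 is stored exactly, so it is a true tie:
sketch_reround(0.5, rounding = "up")
#> [1] 1
sketch_reround(0.5, rounding = "down")
#> [1] 0
```

The point of the design is that calling code only needs to pass a string, so the same reconstruction can be repeated under different assumptions about the original rounding procedure.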

`round_up()` does what most people think of as rounding. If the
decimal portion to be cut off by rounding is 5 or greater, it rounds up.
Otherwise, it rounds down. SAS, SPSS, Stata, Matlab, and Excel use this
procedure.

```
round_up(x = 1.24, digits = 1)
#> [1] 1.2
round_up(x = 1.25, digits = 1)
#> [1] 1.3
round_up(x = 1.25) # default for `digits` is 0
#> [1] 1
```

Rounding up from 5 is actually a special case of `round_up_from()`,
which can take any numeric threshold, not just 5:

```
round_up_from(x = 4.28, digits = 1, threshold = 9)
#> [1] 4.2
round_up_from(x = 4.28, digits = 1, threshold = 1)
#> [1] 4.3
```
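The threshold logic can be sketched in a single line of base R. This is an illustration under simplifying assumptions, not scrutiny’s implementation: `sketch_round_up_from()` is a hypothetical name, and the sketch ignores floating-point edge cases as well as the `symmetric` option.

```r
# Hypothetical helper: round up if the first truncated digit is at or
# above `threshold`, and round down otherwise.
sketch_round_up_from <- function(x, digits = 0, threshold = 5) {
  p10 <- 10^digits
  # Adding (1 - threshold / 10) pushes the scaled number past the next
  # integer exactly when the truncated part reaches the threshold.
  floor(x * p10 + (1 - threshold / 10)) / p10
}

sketch_round_up_from(4.28, digits = 1, threshold = 9)
#> [1] 4.2
sketch_round_up_from(4.28, digits = 1, threshold = 1)
#> [1] 4.3

# With the default threshold of 5, this is ordinary rounding up from 5:
sketch_round_up_from(1.25, digits = 1)
#> [1] 1.3
```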

These two functions have their mirror images in `round_down()` and
`round_down_from()`. The arguments are the same as in `round_up()`:

```
round_down(x = 1.24, digits = 1)
#> [1] 1.2
round_down(x = 1.25, digits = 1)
#> [1] 1.2
round_down(x = 1.25) # default for `digits` is 0
#> [1] 1
```

`round_down_from()`, then, is just the reverse of `round_up_from()`:

```
round_down_from(x = 4.28, digits = 1, threshold = 9)
#> [1] 4.3
round_down_from(x = 4.28, digits = 1, threshold = 1)
#> [1] 4.2
```

Like Python’s `round()` function, R’s `base::round()` doesn’t round up
or down, or use any other procedure based solely on the truncated part
of the number. Instead, `round()` strives to round to the next even
number. This is also called “banker’s rounding”, and it follows a
technical standard, IEEE 754.

Realizing that `round()` works in a highly unintuitive way sometimes
leads to consternation. Why can’t we just round like we learned in
school, that is, up from 5? The reason seems to be bias. Because 5 is
right in between two whole numbers, any procedure that rounds 5 in some
predetermined direction introduces a bias toward that direction.
Rounding up from 5 is therefore biased upward, and rounding down from 5
is biased downward.

As shown in the *Rounding bias* section below, this is
unlikely to be a major issue when rounding raw numbers that originally
have many decimal places. It might be more serious, however, if the
initial number of decimal places is low (for whatever reason) and the
need for precision is high.

At least in theory, “rounding to even” is not biased in either
direction, and it preserves the mean of the original distribution. That
is how `round()` aims to operate. Here is a case in which it works out,
whereas the bias of rounding up or down is fully apparent:

```
vec1 <- seq(from = 0.5, to = 9.5)
up1 <- round_up(vec1)
down1 <- round_down(vec1)
even1 <- round(vec1)

vec1
#> [1] 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
up1
#> [1]  1  2  3  4  5  6  7  8  9 10
down1
#> [1] 0 1 2 3 4 5 6 7 8 9
even1
#> [1]  0  2  2  4  4  6  6  8  8 10

# Original mean
mean(vec1)
#> [1] 5

# Means when rounding up or down: bias!
mean(up1)
#> [1] 5.5
mean(down1)
#> [1] 4.5

# Mean when rounding to even: no bias
mean(even1)
#> [1] 5
```

However, this noble goal of unbiased rounding runs up against the
reality of floating point arithmetic. You might therefore get results
from `round()` that first seem bizarre, or at least unpredictable.
Consider:

```
vec2 <- seq(from = 4.5, to = 10.5)
up2 <- round_up(vec2)
down2 <- round_down(vec2)
even2 <- round(vec2)

vec2
#> [1]  4.5  5.5  6.5  7.5  8.5  9.5 10.5
up2
#> [1]  5  6  7  8  9 10 11
down2
#> [1]  4  5  6  7  8  9 10
even2 # No symmetry here...
#> [1]  4  6  6  8  8 10 10

mean(vec2)
#> [1] 7.5
mean(up2)
#> [1] 8
mean(down2)
#> [1] 7
mean(even2) # ... and the mean is slightly biased downward!
#> [1] 7.428571

vec3 <- c(
  1.05, 1.15, 1.25, 1.35, 1.45,
  1.55, 1.65, 1.75, 1.85, 1.95
)

# No bias here, though:
round(vec3, 1)
#> [1] 1.0 1.1 1.2 1.4 1.4 1.6 1.6 1.8 1.9 2.0
mean(vec3)
#> [1] 1.5
mean(round(vec3, 1))
#> [1] 1.5
```

Sometimes `round()` behaves just as it should, but at other times,
results can be hard to explain. Martin Mächler, who wrote the present
version of `round()`, describes the issue about as follows:

The reason for the above behavior is that most decimal fractions can’t, in fact, be represented as double precision numbers. Even seemingly “clean” numbers with only a few decimal places come with a long invisible mantissa, and are therefore closer to one side or the other.

We usually think that rounding rules are all about breaking a tie
that occurs at 5. Most floating-point numbers, however, are just
somewhat less than or greater than 5. There is no tie! Consequently,
Mächler says, rounding functions need to “*measure, not guess* which of
the two possible decimals is closer to `x`”, and therefore which way to
round.
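The point about invisible digits can be checked directly in base R. The following sketch only prints stored values with more digits than R shows by default; the exact digits depend on the platform’s `sprintf()` implementation, so no output is given for the inexact cases.

```r
# Print 20 decimal places to reveal what is actually stored.
# 0.5 is a power of two and is represented exactly; 0.15 and 2.675
# are not, so they sit slightly to one side of the apparent tie.
sprintf("%.20f", c(0.5, 0.15, 2.675))

# A familiar consequence of representation error:
(0.1 + 0.2) == 0.3
#> [1] FALSE

# Comparisons with a tolerance are the usual remedy:
isTRUE(all.equal(0.1 + 0.2, 0.3))
#> [1] TRUE
```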

This seems better than going with mathematical intuitions that may not always correspond to the way computers actually deal with these issues. R has been using the present solution since version 4.0.0.

`base::round()` can look like a black box, but it appears to be
unbiased in the long run. I recommend using `round()` for original
work, even though it is quite different from other rounding procedures
and therefore unsuitable for reconstructing them. Instead, we need
something like scrutiny’s `round_*()` functions.

`unround()`

Rounding leads to a loss of information. The mantissa is cut off in part or in full, and the resulting number is underdetermined with respect to the original number: The latter can’t be inferred from the former. It might be of interest, however, to compute the range of the original number given the rounded number (especially the number of decimal places to which it was rounded) and the presumed rounding method.

While it’s often easy to infer such a range, it is better to have the
computer do it. Enter `unround()`. It returns the lower and upper
bounds, and it says whether these bounds are inclusive or not,
something that varies greatly by rounding procedure. Currently,
`unround()` is used as a helper within scrutiny’s DEBIT implementation;
see `vignette("debit")`.

The default rounding procedure for `unround()` is `"up_or_down"`:

```
unround(x = "8.0")
#> # A tibble: 1 × 7
#>   range                  rounding   lower incl_lower x     incl_upper upper
#>   <chr>                  <chr>      <dbl> <lgl>      <chr> <lgl>      <dbl>
#> 1 7.95 <= x(8.0) <= 8.05 up_or_down  7.95 TRUE       8.0   TRUE        8.05
```

For a complete list of featured rounding procedures, see the
documentation for `unround()`, section *Rounding*.

On the left, the `range` column displays a pithy graphical overview of
the other columns (except for `rounding`) in the same order:

- `lower` is the lower bound for the original number.
- `incl_lower` is `TRUE` if the lower bound is inclusive and `FALSE` otherwise.
- `x` is the input value.
- `incl_upper` is `TRUE` if the upper bound is inclusive and `FALSE` otherwise.
- `upper` is the upper bound for the original number.

By default, decimal places are counted internally so that the
function always operates on the appropriate decimal level. This creates
a need to take trailing zeros into account, which is why `x` needs to be
a string:

```
unround(x = "3.50", rounding = "up")
#> # A tibble: 1 × 7
#>   range                    rounding lower incl_lower x     incl_upper upper
#>   <chr>                    <chr>    <dbl> <lgl>      <chr> <lgl>      <dbl>
#> 1 3.495 <= x(3.50) < 3.505 up        3.50 TRUE       3.50  FALSE       3.50
```
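The underlying arithmetic is simple enough to sketch in base R. The helper below is hypothetical and not scrutiny’s implementation: it only covers the three procedures whose inclusivity is shown in this vignette (`"up_or_down"`, `"up"`, and `"even"`), and it returns a plain data frame rather than a tibble.

```r
# Hypothetical helper: bounds of the unrounded number, half a unit
# of the last reported decimal place on either side of `x`.
sketch_unround <- function(x, rounding = c("up_or_down", "up", "even")) {
  stopifnot(is.character(x))
  rounding <- match.arg(rounding)
  # Count decimal places from the string so trailing zeros are kept
  decimals <- nchar(sub("^[^.]*\\.?", "", x))
  half_unit <- 0.5 * 10^(-decimals)
  value <- as.numeric(x)
  data.frame(
    lower = value - half_unit,
    incl_lower = rounding %in% c("up_or_down", "up"),
    x = x,
    incl_upper = rounding == "up_or_down",
    upper = value + half_unit
  )
}

# Bounds 7.95 and 8.05, both inclusive
sketch_unround("8.0")

# Bounds 3.495 (inclusive) and 3.505 (exclusive)
sketch_unround("3.50", rounding = "up")
```

Passing `"3.5"` instead of `"3.50"` would widen the interval to 3.45 and 3.55, which is why the trailing zero has to survive as part of a string.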

Alternatively, a function that uses `unround()` as a helper might
count decimal places by itself (i.e., by internally calling
`decimal_places()`). It should then pass these numbers to `unround()`
via the `digits` argument instead of letting it redundantly count
decimal places a second time.

In this case, `x` can be numeric because trailing zeros are no longer
needed. (That, in turn, is because the responsibility to count decimal
places in number-strings rather than numeric values shifts from
`unround()` to the higher-level function.)

The following call returns the exact same tibble as above:

```
unround(x = 3.5, digits = 2, rounding = "up")
#> # A tibble: 1 × 7
#>   range                   rounding lower incl_lower     x incl_upper upper
#>   <chr>                   <chr>    <dbl> <lgl>      <dbl> <lgl>      <dbl>
#> 1 3.495 <= x(3.5) < 3.505 up        3.50 TRUE         3.5 FALSE       3.50
```

Since `x` is vectorized, you might test several reported numbers at
once:

```
vec2 <- c(2, 3.1, 3.5) %>%
  restore_zeros()

vec2 # `restore_zeros()` returns "2.0" for 2
#> [1] "2.0" "3.1" "3.5"

vec2 %>%
  unround(rounding = "even")
#> # A tibble: 3 × 7
#>   range                rounding lower incl_lower x     incl_upper upper
#>   <chr>                <chr>    <dbl> <lgl>      <chr> <lgl>      <dbl>
#> 1 1.95 < x(2.0) < 2.05 even      1.95 FALSE      2.0   FALSE       2.05
#> 2 3.05 < x(3.1) < 3.15 even      3.05 FALSE      3.1   FALSE       3.15
#> 3 3.45 < x(3.5) < 3.55 even      3.45 FALSE      3.5   FALSE       3.55
```

What if you want to round numbers to a fraction instead of an
integer? Check out `reround_to_fraction()` and
`reround_to_fraction_level()`:

```
reround_to_fraction(x = 0.4, denominator = 2, rounding = "up")
#> [1] 0.5
```

This function rounds `0.4` to `0.5` because that’s the closest
multiple of 1/2, i.e., the closest fraction with a denominator of `2`.
It is inspired by `janitor::round_to_fraction()`, and credit for the
core implementation goes there. `reround_to_fraction()` blends
janitor’s fractional rounding with the flexibility and precision that
`reround()` provides.

What’s more, `reround_to_fraction_level()` rounds to the nearest
fraction at the decimal level specified via its `digits` argument:

```
reround_to_fraction_level(
  x = 0.777, denominator = 5, digits = 0, rounding = "down"
)
#> [1] 0.8
reround_to_fraction_level(
  x = 0.777, denominator = 5, digits = 1, rounding = "down"
)
#> [1] 0.78
reround_to_fraction_level(
  x = 0.777, denominator = 5, digits = 2, rounding = "down"
)
#> [1] 0.776
```
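The arithmetic behind fractional rounding can be sketched in base R. Both helpers below are hypothetical: they use base `round()` in place of scrutiny’s selectable rounding procedures, and the way `digits` shifts the decimal level is an assumption inferred from the outputs above, not taken from the package source.

```r
# Hypothetical helper: round to the nearest multiple of 1 / denominator
sketch_to_fraction <- function(x, denominator) {
  round(x * denominator) / denominator
}

# Hypothetical helper: do the same at a given decimal level by scaling
# `x` up by 10^digits, rounding to the fraction, and scaling back down
sketch_to_fraction_level <- function(x, denominator, digits = 0) {
  p10 <- 10^digits
  sketch_to_fraction(x * p10, denominator) / p10
}

sketch_to_fraction(0.4, denominator = 2)
#> [1] 0.5
sketch_to_fraction_level(0.777, denominator = 5, digits = 0)
#> [1] 0.8
sketch_to_fraction_level(0.777, denominator = 5, digits = 1)
#> [1] 0.78
```

Neither of these inputs hits an exact tie after scaling, so the choice of rounding procedure does not affect the results here. For ties, it would.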

These two functions are not currently part of any error detection workflow.

I wrote above that rounding up or down from 5 is biased. However,
this points to a wider problem: It is true of any rounding procedure
that doesn’t take active precautions against such bias.
`base::round()` does, and that is why I recommend it for original work
(as opposed to reconstruction).

It might be useful to have a general and flexible way to quantify how
far rounding biases a distribution compared to how it looked before
rounding. The function `rounding_bias()` fulfills this role. It is a
wrapper around `reround()`, so it can access any rounding procedure
that `reround()` can, and it takes all of the same arguments. However,
the default for `rounding` is `"up"` instead of `"up_or_down"` because
`rounding_bias()` only makes sense with single rounding procedures.

In general, bias due to rounding is computed by subtracting the original distribution from the rounded one:

\[ bias = x_{rounded} - x \]

By default, the mean is computed to reduce the bias to a single data point:

```
vec3 <- seq(from = 0.6, to = 0.7, by = 0.01)

vec3
#> [1] 0.60 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.70

# The mean before rounding...
mean(vec3)
#> [1] 0.65

# ...is not the same as afterwards...
mean(round_up(vec3))
#> [1] 1

# ...and the difference is bias:
rounding_bias(x = vec3, digits = 0, rounding = "up")
#> [1] 0.35
```

Set `mean` to `FALSE` to return the whole vector of individual biases
instead:

```
rounding_bias(x = vec3, digits = 0, rounding = "up", mean = FALSE)
#> [1] 0.40 0.39 0.38 0.37 0.36 0.35 0.34 0.33 0.32 0.31 0.30
```

Admittedly, this example is somewhat overdramatic. Here is a rather harmless one:

```
vec4 <- rnorm(50000, 100, 15)

rounding_bias(vec4, digits = 2)
#> [1] -7.297672e-06
```

What is responsible for such a difference? It seems to be (1) the sample size and (2) the number of decimal places to which the vector is rounded. The rounding method doesn’t appear to matter if numbers with many decimal places are rounded:

```
#> # A tibble: 10 × 3
#> bias decimal_digits rounding
#> <dbl> <chr> <chr>
#> 1 0.00000810 1 up up
#> 2 0.00000730 2 up up
#> 3 0.00000126 3 up up
#> 4 0.00000000767 4 up up
#> 5 0.0000000141 5 up up
#> 6 0.00000810 1 even even
#> 7 0.00000730 2 even even
#> 8 0.00000126 3 even even
#> 9 0.00000000767 4 even even
#> 10 0.0000000141 5 even even
```

However, if the raw values are preliminarily rounded to 2 decimal places before rounding proceeds as above, the picture is different:

```
#> # A tibble: 10 × 3
#> bias decimal_digits rounding
#> <dbl> <chr> <chr>
#> 1 0.00487 1 up up
#> 2 0 2 up up
#> 3 0 3 up up
#> 4 0 4 up up
#> 5 0 5 up up
#> 6 0.0000408 1 even even
#> 7 0 2 even even
#> 8 0 3 even even
#> 9 0 4 even even
#> 10 0 5 even even
```
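A back-of-the-envelope check suggests where a bias of roughly 0.005 comes from when values with two decimal places are rounded up to one. The sketch below is an illustration in base R, not the code that produced the table above. It works in integer hundredths, a deliberate simplification that keeps ties exact and sidesteps floating-point representation error.

```r
# Every value from 0.00 to 99.99 with two decimal places, as integers
hundredths <- 0:9999
tenths_floor <- hundredths %/% 10  # value truncated to one decimal place
remainder <- hundredths %% 10      # the second decimal digit

# Rounding up from 5: ties (remainder == 5) always go up
tenths_up <- tenths_floor + (remainder >= 5)

# Rounding to even: ties go to whichever neighbor is even
tie <- remainder == 5
tenths_even <- tenths_floor + (remainder > 5) + (tie & tenths_floor %% 2 == 1)

# Bias = mean(rounded - original), converted back to the original scale
bias_up <- mean(tenths_up * 10 - hundredths) / 100
bias_even <- mean(tenths_even * 10 - hundredths) / 100

bias_up
#> [1] 0.005
bias_even
#> [1] 0
```

One in ten values is an exact tie, and each tie that goes up adds 0.05, which yields a mean bias of 0.005. Rounding to even sends half of the ties up and half down, so their contributions cancel.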

In sum, the function allows users to quantify the degree to which rounding biases a distribution, so that they can assess the relative merits of different rounding procedures. This is partly to sensitize readers to potential bias in edge cases, but also to enable them to make informed rounding decisions on their own.