The pricelevels-package provides index number methods for price comparisons within or between countries. As price comparisons over time usually rely on the same index number methods, the package has been denoted more generally as pricelevels, though its primary focus is on spatial price comparisons. Currently, the following index number methods (or price indices) are implemented.

Moreover, the package offers functions for sampling and characterizing price data. This vignette highlights the main package features. It contains four sections. The first is on price data, the second on bilateral indices, and the third on multilateral indices. The last section section is on some further package features.

Price data

Data for spatial price comparisons usually have three dimensions:

the region $(r=1,\ldots,R)$, which can be a country, city, or any other regional entity,
the basic heading $(b=1,\ldots,B)$, which is a group of similar products for which expenditure weights, $w_{b}$ or $w_{b}^{r}$, are available,
and the individual product $(n=1,\ldots,N_b)$, for which prices $p_{bn}^r$ are recorded.

For transaction and supermarket scanner data, purchased quantities $q_{bn}^r$ of the individual products are often available.

Having such detailed data can pose some additional challenges. Not every individual product might be available in each region. Hence, there are gaps in the data. In the worst case these gaps can result in non-connected price data (World Bank 2013, 98), where price level comparisons of all regions are no longer possible.

Characteristics

The pricelevels-package offers various functions to characterize price data and to deal with the issue of data gaps. As a first step, it’s good to check the sparsity (share of gaps) of a dataset and if the dataset is connected.

Connected price data

Suppose we have prices for 5 products (all belonging to the same basic heading) in 4 regions. These data are stored in the data.table dt. Transforming dt into a price matrix (rows: products; columns: regions) shows that the data exhibit gaps as products 4 and 5 are not priced in regions c and d, respectively.

# example price data with gaps:
dt <- data.table(
  "r"=c(rep(c("a","b"), each=5), rep(c("c","d"), each=3)),
  "n"=as.character(c(rep(1:5, 2), rep(1:3, 2))),
  "p"=round(runif(16, min=10, max=20), 1))

# price matrix:
dt[, tapply(X=p, INDEX=list(n,r), FUN=mean)]
#>      a    b    c    d
#> 1 15.1 17.9 18.3 15.0
#> 2 13.7 19.6 18.1 17.5
#> 3 16.6 15.6 14.2 16.8
#> 4 12.2 18.3   NA   NA
#> 5 14.2 17.5   NA   NA

Due to the gaps, the sparsity() of this dataset is greater 0. As there are only a few gaps, however, function is.connected() indicates that the dataset is connected. The function comparisons() provides some further details. It shows that all four regions form one ‘block of connected regions’ since they are connected via direct or indirect links. Moreover, the function output highlights that each region can be directly compared with the other regions using at least one price ratio.

dt[, sparsity(r=r, n=n)] # sparsity
#> [1] 0.2

dt[, is.connected(r=r, n=n)] # is connected
#> [1] TRUE

dt[, comparisons(r=r, n=n)] # one group of regions
#> Key: <group_id>
#>    group_id group_members group_size total direct indirect n_obs
#>       <int>        <char>      <int> <int>  <int>    <int> <int>
#> 1:        1       a;b;c;d          4     6      0        6    16

Non-connected price data

Let’s assume now that prices for products 1, 2, and 3 are no longer available in regions a and b (for example because these products are not sold there anymore). This situation is captured in data.table dt2. In total, the price data still include 4 regions and 5 products. However, the price matrix reveals that the pattern of gaps changed.

# non-connected example price data:
dt2 <- dt[!(n%in%1:3 & r%in%c("a","b")),]

# price matrix:
dt2[, tapply(X=p, INDEX=list(n,r), FUN=mean)]
#>      a    b    c    d
#> 1   NA   NA 18.3 15.0
#> 2   NA   NA 18.1 17.5
#> 3   NA   NA 14.2 16.8
#> 4 12.2 18.3   NA   NA
#> 5 14.2 17.5   NA   NA

The sparsity increased to 0.5. Even worse, the dataset is no longer connected. This can be easily seen from the price matrix, where regions a and b form one block of connected regions, and regions c and d another but separate block. This is also highlighted in the output of comparisons(), which now has two rows.

dt2[, sparsity(r=r, n=n)] # sparsity
#> [1] 0.5

dt2[, is.connected(r=r, n=n)] # is not connected
#> [1] FALSE

dt2[, comparisons(r=r, n=n)] # two groups of regions
#> Key: <group_id>
#>    group_id group_members group_size total direct indirect n_obs
#>       <int>        <char>      <int> <int>  <int>    <int> <int>
#> 1:        1           a;b          2     1      0        1     4
#> 2:        2           c;d          2     1      0        1     6

The fact that the dataset dt2 is not connected does not mean that no price comparison is possible. It only means that a price comparison of all regions cannot be done. The function output of comparisons() shows that separate price comparisons within each block of regions could be done. However, it is clear that a comparison of the price levels between the two blocks is still not possible. To follow this approach, the regions in dt2 can be classified into blocks of connected regions using the function neighbors(). Otherwise, if one wants to carry out the price comparison only for the biggest block of connected regions, dt2 can be connected using function connect(), as shown below.

# group regions into groups of connected regions:
dt2[, "block" := neighbors(r=r, n=n, simplify=TRUE)]
dt2[, is.connected(r=r, n=n), by="block"]
#>     block     V1
#>    <fctr> <lgcl>
#> 1:      1   TRUE
#> 2:      2   TRUE

# subset to biggest group of connected regions:
dt2[connect(r=r, n=n), is.connected(r=r, n=n)]
#> [1] TRUE

Both approaches allow to perform a price comparison on a subset of the data by excluding some regions.

Simulation

The quality of price data is of utmost importance for the reliability of a regional price comparison. To assess the impact of gaps on the reliability of price index numbers, it is helpful to control the amount of gaps in the price data. At the same time, however, it must be ensured that the price data stay connected. A simplistic approach of simulating price data would not guarantee this. Therefore, the pricelevels-package offers some functions for data sampling.

In the following, we want to simulate random price data for 5 products within the same basic heading in 4 regions. This can be easily achieved if there are no data gaps. First, products and regions are sampled. Second, random prices are added.

R <- 4 # number of regions
N <- 5 # number of products

# set frame:
dt <- data.table("r"=as.character(rep(1:R, each=N)),
                 "n"=as.character(rep(1:N, times=R)))

# add random prices:
set.seed(1)
dt[, "p":=round(x=rnorm(n=.N, mean=as.integer(n), sd=0.1), digits=1)]

# price matrix:
dt[, tapply(X=p, INDEX=list(n,r), FUN=mean)]
#>     1   2   3   4
#> 1 0.9 0.9 1.2 1.0
#> 2 2.0 2.0 2.0 2.0
#> 3 2.9 3.1 2.9 3.1
#> 4 4.2 4.1 3.8 4.1
#> 5 5.0 5.0 5.1 5.1

Data gaps

It is clear that the price data dt are connected. However, to get more realistic data, we have to introduce gaps. This can be done randomly as well. If there are only few gaps, the likelihood to end up with non-connected price data is relatively low. However, with more gaps, it can happen that the resulting price data are no longer connected. To overcome this issue, the function rgaps() introduces gaps into an existing (complete) dataset while ensuring that the data stay connected. In the following, we introduce approx. 20% gaps into the price data dt.

# introduce gaps:
set.seed(1)
dt.gaps <- dt[!rgaps(r=r, n=n, amount=0.2), ]

# price matrix:
dt.gaps[, tapply(X=p, INDEX=list(n,r), FUN=mean)]
#>     1   2   3   4
#> 1  NA 0.9 1.2 1.0
#> 2 2.0 2.0 2.0 2.0
#> 3 2.9 3.1 2.9 3.1
#> 4 4.2  NA 3.8  NA
#> 5 5.0  NA 5.1 5.1

From the price matrix, we see that there is no price for product 1 in region 1. The rgaps()-function also allows to exclude specific regions and/or products from the sampling of gaps. If we want, for example, that there are generally no gaps in region 1, we can implement this as follows:

# introduce gaps but not for region 1:
set.seed(1)
dt.gaps <- dt[!rgaps(r=r, n=n, amount=0.2, exclude=data.frame(r="1", n=NA)), ]

# price matrix:
dt.gaps[, tapply(X=p, INDEX=list(n,r), FUN=mean)]
#>     1   2   3   4
#> 1 0.9 0.9 1.2  NA
#> 2 2.0 2.0 2.0 2.0
#> 3 2.9 3.1 2.9 3.1
#> 4 4.2  NA 3.8  NA
#> 5 5.0  NA 5.1 5.1

We see from the price matrix that no price is missing in region 1.

If the gaps should occur with a higher likelihood for specific regions and/or products, this can be defined as well. The prob-argument in rgaps() allows to specify for each observation the (relative) likelihood that gaps occur at this position in the data. In the following example, gaps will occur more often for products 4 and 5.

# introduce gaps with probability:
set.seed(1)
dt.gaps <- dt[!rgaps(r=r, n=n, amount=0.2, prob=as.integer(n)^2), ]

# price matrix:
dt.gaps[, tapply(X=p, INDEX=list(n,r), FUN=mean)]
#>     1   2   3   4
#> 1 0.9 0.9 1.2 1.0
#> 2 2.0 2.0 2.0 2.0
#> 3 2.9 3.1 2.9 3.1
#> 4  NA  NA 3.8 4.1
#> 5  NA 5.0 5.1  NA

Expenditure weights

The price data in dt represent the data that statistical offices usually collect as quantities or weights on individual products are not available. However, transaction data often include purchased quantities while data at the basic heading level exhibit expenditure weights as an indication of the relative importance for household consumption.

As the purchased quantities can be expected to heavily depend on the prices of products, there is no stand-alone function for the sampling of random quantities. However, random expenditure weights for basic headings can be added to some dataset using the function rweights(). These weights can be the same for each basic heading in each region, differ across basic headings but are the same for each region, or differ across basic headings and regions. This is shown below, where we sample the weights for the individual products (and not basic headings):

# constant expenditure weights:
dt.gaps[, "w1" := rweights(r=r, b=n, type=~1)]
dt.gaps[, tapply(X=w1, INDEX=list(n,r), FUN=mean)]
#>     1   2   3   4
#> 1 0.2 0.2 0.2 0.2
#> 2 0.2 0.2 0.2 0.2
#> 3 0.2 0.2 0.2 0.2
#> 4  NA  NA 0.2 0.2
#> 5  NA 0.2 0.2  NA

# weights different for basic headings but same among regions:
dt.gaps[, "w2" := rweights(r=r, b=n, type=~n)]
dt.gaps[, tapply(X=w2, INDEX=list(n,r), FUN=mean)]
#>            1          2          3          4
#> 1 0.19621397 0.19621397 0.19621397 0.19621397
#> 2 0.08129106 0.08129106 0.08129106 0.08129106
#> 3 0.50029883 0.50029883 0.50029883 0.50029883
#> 4         NA         NA 0.05927477 0.05927477
#> 5         NA 0.16292136 0.16292136         NA

# weights different for basic headings and regions:
dt.gaps[, "w3" := rweights(r=r, b=n, type=~n+r)]
dt.gaps[, tapply(X=w3, INDEX=list(n,r), FUN=mean)]
#>            1          2          3          4
#> 1 0.01418186 0.01418186 0.01418186 0.01418186
#> 2 0.40005184 0.40005184 0.40005184 0.40005184
#> 3 0.12417969 0.12417969 0.12417969 0.12417969
#> 4         NA         NA 0.33378579 0.33378579
#> 5         NA 0.12780082 0.12780082         NA

If there are no gaps in the data, the weights always add up to 1. With gaps, however, the type of weights is prioritized, meaning that constant weights remain constant even though a renormalization would have led to different weights across basic heading and regions. Therefore, the weights do not necessarily add up to 1 in each region. Of course, the weights can easily be renormalized if necessary.

# weights not necessarily add up to 1 if there are gaps:
dt.gaps[, list("w1"=sum(w1),"w2"=sum(w2),"w3"=sum(w3)), by="r"]
#>         r    w1        w2        w3
#>    <char> <num>     <num>     <num>
#> 1:      1   0.6 0.7778039 0.5384134
#> 2:      2   0.8 0.9407252 0.6662142
#> 3:      3   1.0 1.0000000 1.0000000
#> 4:      4   0.8 0.8370786 0.8721992

Price data

The previous steps to simulate random price data are put together in the function rdata(). This function easily simulates random price data for a specified number of regions, basic headings, and individual products including gaps and weights. However, beside this easier usage, there are some additional benefits:

Prices and quantities for each individual product are simulated.
Sales can be introduced. Some prices and quantities are adjusted by a random factor. The sales are flagged in the data.
The generation of prices follows a more sophisticated model (see also the NLCPD method below): \[\begin{equation} p_{bn}^r = \pi_{bn} \ (P^r)^{\delta_{bn}} \ u_{bn}^r \ , \end{equation}\] where the price of product $n$ in region $r$, $p_{bn}^r$, is explained by the general product price, $\pi_{bn}$, the product-specific elasticities $\delta_{bn}$, and the regional price level, $P^r$. The error term $u_{bn}^r$ follows a lognormal distribution.
The parameters used for the data generating process (i.e., the model above) are stored by rdata() and can be added to the function output if desired. This is particularly useful for simulations where some calculated price levels are compared to the ‘true’ price levels underlying the data generation.

The following code snippet shows how easily a connected price dataset with gaps, sales and expenditure weights for $B=4$ basic headings with $N_b=(2,2,3,3)$ products in $R=9$ regions can be simulated.

# simulate random price data:
set.seed(123)
srp <- rdata(
  R=9, # number of regions
  B=4, # number of product groups
  N=c(2,2,3,3), # number of individual products per basic heading
  gaps=0.2, # share of gaps
  weights=~b, # same varying expenditure weights for regions
  sales=0.1, # share of sales
  settings=list(par.add=TRUE) # add true parameters to function output
  )

# true parameters:
srp$param
#> $lnP
#>           1           2           3           4           5           6 
#>  0.01351101 -0.06729194 -0.03758062 -0.02719543 -0.13336083  0.11908720 
#>           7           8           9 
#>  0.05064581 -0.07850519  0.16068999 
#> 
#> $pi
#>       01       02       03       04       05       06       07       08 
#> 2.151639 2.109964 1.372202 1.754044 5.870784 5.714721 5.906962 1.797949 
#>       09       10 
#> 1.728033 1.619993 
#> 
#> $delta
#>         01         02         03         04         05         06         07 
#>  1.2247660  1.2530426  1.0527690  0.5926094  2.2100173  1.3200648 -0.3813724 
#>         08         09         10 
#>  1.4605623  0.6499459  0.2391432

# price data:
head(srp$data)
#> Key: <group, product, region>
#>     group     weight region product   sale price quantity
#>    <fctr>      <num> <fctr>  <fctr> <lgcl> <num>    <num>
#> 1:      1 0.07939543      1      01   TRUE  8.13    53705
#> 2:      1 0.07939543      2      01  FALSE  7.90     9765
#> 3:      1 0.07939543      3      01  FALSE  8.29    10992
#> 4:      1 0.07939543      5      01  FALSE  7.36     9829
#> 5:      1 0.07939543      6      01  FALSE 10.02    38338
#> 6:      1 0.07939543      7      01  FALSE  9.20    22230

The graph shows that the individual products’ prices vary across regions and within the basic heading. However, while they are at similar levels within the same basic heading, they can considerably differ between basic headings.

Bilateral price indices

Bilateral price indices compute the overall price level difference between two regions. For that purpose, the prices of individual products available in both regions are aggregated into a price index showing the price level difference of the comparison region relative to the base region. If quantities, expenditure shares, or any other sort of information on economic importance are available, these data can serve as weights in the aggregation. Otherwise, prices must be aggregated without any weights. Both approaches are shown in the following examples, where price data for four products belonging to the same basic heading were collected in five regions.

# simulate random price data:
set.seed(1)
dt <- rdata(R=5, B=1, N=4)
head(dt)
#> Key: <group, product, region>
#>     group weight region product   sale price quantity
#>    <fctr>  <num> <fctr>  <fctr> <lgcl> <num>    <num>
#> 1:      1      1      1       1  FALSE 17.81    14529
#> 2:      1      1      2       1  FALSE 17.48   102595
#> 3:      1      1      3       1  FALSE 17.22    76664
#> 4:      1      1      4       1  FALSE 20.07   127831
#> 5:      1      1      5       1  FALSE 15.53   157119
#> 6:      1      1      1       2  FALSE 20.79    18665

Elementary (unweighted) indices

If prices are collected by statistical offices locally (local or field-based price collection), quantities or any other information of the economic importance of the individual products are usually lacking. Therefore, the prices must be aggregated without any weights. The most prominent elementary price indices are implemented in the pricelevels-package:

Carli index (Carli 1804): \[P_{\text{Carli}}^{sr} = \frac{1}{N} \sum_{n=1}^{N} p_n^r / p_n^s\]
Jevons index (Jevons 1865): \[P_{\text{Jevons}}^{sr} = \prod_{n=1}^{N} \left( p_n^r / p_n^s \right)^{1/N}\]
Harmonic mean index (Jevons 1865; Coggeshall 1886): \[P_{\text{Harmonic}}^{sr} = \frac{N}{\sum_{n=1}^{N} p_n^s / p_n^r}\]
Dutot index (Dutot 1738): \[P_{\text{Dutot}}^{sr} = \frac{\sum_{n=1}^{N} p_n^r}{\sum_{n=1}^{N} p_n^s}\]
BMW index (Balk 2005; Mehrhoff 2007): \[P^{sr}_{\text{BMW}} = \frac{\sum_{n=1}^N \sqrt{p_n^r / p_n^s} }{ \sum_{n=1}^{N} \sqrt{p_n^s / p_n^r} }\]
CSWD index (Carruthers, Sellwood, and Ward 1980; Dalén 1992): \[P_{\text{CSWD}}^{sr} = \sqrt{P_{\text{Carli}}^{sr} \ P_{\text{Harmonic}}^{sr}} \ ,\]

where the indices express the prices of each region $r$ relative to the base region $s$ based on the prices of the individual products $n \left(n=1,\ldots,N\right)$.

If we want to compare the prices of each region in the data dt to those in region 1 using the Jevons index, we can set region 1 as the base region in the function jevons():

dt[, jevons(p=price, r=region, n=product, base="1")]
#>         1         2         3         4         5 
#> 1.0000000 0.9995577 0.9924891 1.0624618 0.9397925

Similarly, we can compute the price index numbers relative the base region 2 using the Dutot index:

dt[, dutot(p=price, r=region, n=product, base="2")]
#>         1         2         3         4         5 
#> 1.0002609 1.0000000 0.9932159 1.0627528 0.9419439

The price comparison of each region relative to region 2 are carried out separately (or bilaterally). Each bilateral price comparison thus relies solely on the prices of the two regions that are compared. This is the rationale of bilateral price indices. However, it also results in some undesirable properties.

A price comparison of region 2 to 1 should generally give the same result as the reciprocal of the price comparison of region 1 to 2. This requirement is denoted as the country reversal test. However, as shown below, only the Jevons index and the Dutot index comply with this requirement for the considered dataset.

# harmonic mean index fails:
all.equal(
 dt[, harmonic(p=price, r=region, n=product, base="1")][2],
 1/dt[, harmonic(p=price, r=region, n=product, base="2")][1],
 check.attributes=FALSE)
#> [1] "Mean relative difference: 0.0001990829"

# carli fails:
all.equal(
 dt[, carli(p=price, r=region, n=product, base="1")][2],
 1/dt[, carli(p=price, r=region, n=product, base="2")][1],
 check.attributes=FALSE)
#> [1] "Mean relative difference: 0.0001990433"

# jevons index ok:
all.equal(
 dt[, jevons(p=price, r=region, n=product, base="1")][2],
 1/dt[, jevons(p=price, r=region, n=product, base="2")][1],
 check.attributes=FALSE)
#> [1] TRUE

# dutot index ok:
all.equal(
 dt[, dutot(p=price, r=region, n=product, base="1")][2],
 1/dt[, dutot(p=price, r=region, n=product, base="2")][1],
 check.attributes=FALSE)
#> [1] TRUE

The resulting price index numbers of the Carli index and the Harmonic mean index depend on the choice of the base region. Consequently, if one changes the base region also the regional price level differences change.

Another important requirement on spatial price index numbers is transitivity, which demands that a direct price comparison of two regions gives the same result as an indirect comparison of the two regions via a third one. Consequently, transitivity ensures the consistency of a set of price index numbers. Again, if there are no data gaps, the Jevons and Dutot indices are transitive, while the Carli and Harmonic mean indices fail this test.

Unit value related indices

A unit value divides the total expenditure of several products by the purchased quantities. This concept is only meaningful if the products are homogeneous. Otherwise, the quantities could not be added up in a meaningful way, which would result in a unit value bias. Several approaches exist that try to adjust the quantities in a way that they can be properly added up. Prominent examples are the indices proposed by Banerjee (1977), Davies (1924), and Lehr (1885). They are defined as follows:

Unit value index (Drobisch 1871a, 1871b): \[P^{sr}_{\text{Unit-Value}} = \frac{\sum_{n=1} p_n^r q_n^r}{\sum_{n=1} p_n^s q_n^s} \frac{\sum_{n=1} q_n^s}{\sum_{n=1} q_n^r}\]
Banerjee index (Banerjee 1977): \[P^{sr}_{\text{Banerjee}} = \frac{\sum_{n=1} p_n^r q_n^r }{\sum_{n=1} p_n^s q_n^s} \frac{\sum_{n=1} q_n^s \left( p_n^r + p_n^s \right) / 2}{\sum_{n=1} q_n^r \left( p_n^r + p_n^s \right) / 2}\]
Davies index (Davies 1924): \[P^{sr}_{\text{Davies}} = \frac{\sum_{n=1} p_n^r q_n^r }{\sum_{n=1} p_n^s q_n^s} \frac{\sum_{n=1} q_n^s \sqrt{p_n^r p_n^s}}{\sum_{n=1} q_n^r \sqrt{p_n^r p_n^s}}\]
Lehr index (Lehr 1885): \[P^{sr}_{\text{Lehr}} = \frac{\sum_{n=1} p_n^r q_n^r }{\sum_{n=1} p_n^s q_n^s} \frac{\sum_{n=1} q_n^s \left( p_n^r q_n^r + p_n^s q_n^s \right) / \left( q_n^r + q_n^s\right)}{\sum_{n=1} q_n^r \left( p_n^r q_n^r + p_n^s q_n^s \right) / \left( q_n^r + q_n^s\right)} \ ,\]

where $q_n^r$ is the purchased quantity of product $n$ in region $r$. The latter three indices can be somehow seen as quality-adjusted unit value indices. This becomes more clear when rewriting the underlying formula as: \[P^{sr}_{\text{QAUV}} = \frac{\sum_{n=1} p_n^r q_n^r }{\sum_{n=1} p_n^s q_n^s} \frac{\sum_{n=1} q_n^s z_n^{sr}}{\sum_{n=1} q_n^r z_n^{sr}} \ ,\] where the quality-adjustment factors $z_n^{sr}$ are defined as \[z_n^{sr} = \begin{cases} \left( p_n^r + p_n^s \right) / 2 & \text{if Banerjee index} \\ \sqrt{p_n^r p_n^s} & \text{if Davies index} \\ \left( p_n^r q_n^r + p_n^s q_n^s \right) / \left( q_n^r + q_n^s\right) & \text{if Lehr index} \end{cases}\] For the standard unit value index, the quality-adjustment factor would simply be 1.

The indices can be computed separately for each region using the quantities available in the data, e.g., for a unit value index as follows:

# unit value index:
dt[, uvalue(p=price, r=region, n=product, q=quantity, base="1")]
#>         1         2         3         4         5 
#> 1.0000000 0.9919158 0.9876468 1.0571118 0.9348979

The unit value indices have the drawback that the resulting price index might show price level differences which solely come from the quantities. Hence, even when all prices in two regions are identical, the price index might not be unity due to different quantities. This often undesirable property is avoided by the following weighted price indices.

Quantity or expenditure share weighted indices

Not each individual product or basic heading is equally important to consumers. Household consumption considerably varies across basic headings but also for individual products within a basic heading. Neglecting these differences may cause a bias in the price index numbers as the price differences of less important products receive the same weight as more important ones. Therefore, any weighting information available should be included in the price comparison. The pricelevels-package offers the following weighted bilateral price indices. They equally work with quantities, $q_n^r$, and weights, $w_n^r$, when the weights can be expressed as regional expenditure shares, i.e., $w_n^r = p_n^r q_n^r / \sum_{j=1}^N p_j^r q_j^r$.

Laspeyres index (Laspeyres 1871): \[P_{\text{Laspeyres}}^{sr} = \frac{\sum_{n=1}^{N} p_n^r q_n^s}{\sum_{n=1}^{N} p_n^s q_n^s} = \sum_{n=1}^{N} w_n^s \left( p_n^r / p_n^s \right)\]
Paasche index (Paasche 1874): \[P_{\text{Paasche}}^{sr} = \frac{\sum_{n=1}^{N} p_n^r q_n^r}{\sum_{n=1}^{N} p_n^s q_n^r} = \frac{1}{\sum_{n=1}^{N} w_n^r \left( p_n^s / p_n^r \right)}\]
Palgrave index (Palgrave 1886): \[P_{\text{Palgrave}}^{sr} = \sum_{n=1}^N w_n^r \left( p_n^r / p_n^s \right)\]
Fisher index (Fisher 1921): \[P_{\text{Fisher}}^{sr} = \sqrt{P_{\text{Laspeyres}}^{sr} \ P_{\text{Paasche}}^{sr}}\]
Drobisch index (Drobisch 1871a, 1871b): \[P_{\text{Drobisch}}^{sr} = \left( P_{\text{Laspeyres}}^{sr} + P_{\text{Paasche}}^{sr} \right) / 2\]
Lowe index (Lowe 1823): \[\begin{aligned} P_{\text{Lowe}}^{sr} &= \frac{\sum_{n=1}^{N} p_n^r q_n^b}{\sum_{n=1}^{N} p_n^s q_n^b}\\ &= \sum_{n=1}^{N} w_n^b \left( p_n^r / p_n^s \right) \quad , \text{with} \ w_n^{b} = \frac{p_n^s q_n^b}{\sum_{j=1}^N p_j^s q_j^b} \end{aligned}\]
Young index (Young 1812): \[P_{\text{Young}}^{sr} = \sum_{n=1}^{N} w_n^b \left( p_n^r / p_n^s \right) \quad , \text{with} \ w_n^{b} = \frac{p_n^b q_n^b}{\sum_{j=1}^N p_j^b q_j^b}\]
Marshall-Edgeworth index (Marshall 1887; Edgeworth 1925): \[\begin{aligned} P_{\text{Marshall-Edgeworth}}^{sr} &= \frac{\sum_{n=1}^N p_n^r \left( q_n^r + q_n^s \right)}{\sum_{n=1}^N p_n^s \left( q_n^r + q_n^s \right)} \\ &= \sum_{n=1}^N \bar{w}_n^{sr} \left( p_n^r / p_n^s \right) \quad , \text{with} \ \bar{w}_n^{sr} = \frac{p_n^s \left( q_n^r + q_n^s \right)}{\sum_{j=1}^N p_j^s \left( q_j^r + q_j^s \right)} \end{aligned}\]
Walsh index (Walsh 1901): \[\begin{aligned} P_{\text{Walsh}}^{sr} &= \frac{\sum_{n=1}^{N} \sqrt{q_n^r q_n^s} \ p_n^r}{\sum_{n=1}^{N} \sqrt{q_n^r q_n^s} \ p_n^s}\\ &= \frac{\sum_{n=1}^{N} \left( \bar{w}_n^{sr} / \sum_{j=1}^N \bar{w}_j^{sr} \right) \sqrt{p_n^r / p_n^s}}{\sum_{n=1}^{N} \left( \bar{w}_n^{sr} / \sum_{j=1}^N \bar{w}_j^{sr} \right) \sqrt{p_n^s / p_n^r}} \quad , \text{with} \ \bar{w}_n^{sr} = \sqrt{w_n^r w_n^s} \end{aligned}\]
Geometric Laspeyres index: \[P_{\text{Geo-Laspeyres}}^{sr} = \prod_{n=1}^N \left( p_n^r / p_n^s \right)^{w_n^s}\]
Geometric Paasche index: \[P_{\text{Geo-Paasche}}^{sr} = \prod_{n=1}^N \left( p_n^r / p_n^s \right)^{w_n^r}\]
Törnqvist index (Törnqvist 1936): \[\begin{aligned} P_{\text{Törnqvist}}^{sr} &= \sqrt{P_{\text{Geo-Laspeyres}}^{sr} \ P_{\text{Geo-Paasche}}^{sr}}\\ &= \prod_{n=1}^N \left( p_n^r / p_n^s \right)^{\bar{w}_n^{sr} / \sum_{j=1}^N \bar{w}_j^{sr}} \quad , \text{with} \ \bar{w}_n^{sr} = \frac{1}{2} \left( w_n^r + w_n^s \right) \end{aligned}\]
Geometric Walsh index (Walsh 1901): \[P_{\text{Geo-Walsh}}^{sr} = \prod_{n=1}^N \left( p_n^r / p_n^s \right)^{\bar{w}_n^{sr} / \sum_{j=1}^N \bar{w}_j^{sr}} \quad , \text{with} \ \bar{w}_n^{sr} = \sqrt{w_n^r w_n^s}\]
Theil index (Theil 1973): \[P_{\text{Theil}}^{sr} = \prod_{n=1}^N \left( p_n^r / p_n^s \right)^{\bar{w}_n^{sr} / \sum_{j=1}^N \bar{w}_j^{sr}} \quad , \text{with} \ \bar{w}_n^{sr} = \left[ \frac{1}{2} \left( w_n^r + w_n^s \right) \ w_n^r \ w_n^s \right]^{1/3}\]
Sato-Vartia index (Sato 1976; Vartia 1976): \[P_{\text{Sato-Vartia}}^{sr} = \prod_{n=1}^N \left( p_n^r / p_n^s \right)^{\bar{w}_n^{sr} / \sum_{j=1}^N \bar{w}_j^{sr}} \quad , \text{with} \ \bar{w}_n^{sr} = \begin{cases} \frac{w_n^r-w_n^s}{\ln{w_n^r}-\ln{w_n^s}} & w_n^r \neq w_n^s \\ w_n^s & w_n^r = w_n^s \end{cases}\]

The well-known Laspeyres index can be computed for each region using the quantities available in the data:

dt[, laspeyres(p=price, r=region, n=product, q=quantity, base="1")]
#>         1         2         3         4         5 
#> 1.0000000 1.0000873 0.9929240 1.0607180 0.9429748

The same result can be achieved by deriving expenditure shares of the individual products and using them as weights w in the laspeyres()-function:

dt[, "share" := price*quantity/sum(price*quantity), by="region"]
dt[, laspeyres(p=price, r=region, n=product, w=share, base="1")]
#>         1         2         3         4         5 
#> 1.0000000 1.0000873 0.9929240 1.0607180 0.9429748

While the Laspeyres index uses the quantities (or expenditure shares) of the base region for weighting, the Paasche index relies on the quantities (or expenditure shares) of the comparison regions. By contrast, the Fisher, Walsh, and Törnqvist index average the quantities (or expenditure shares) of the two regions involved in the price comparison. They are also denoted as superlative price indices. However, all bilateral indices fail the country reversal and transitivity test if there are gaps in the data. Therefore, multilateral price indices have been developed to deal with this issue. Their use is recommended in particular for spatial price comparisons though they are nowadays more and more used for temporal price comparisons as well.

Multilateral price indices

Multilateral price indices simultaneously use the prices of all regions involved in the price comparison to compute a set of transitive price index numbers - irrespective of any data gaps. The pricelevels-package offers the three most prominent multilateral price indices, and a newly developed extension:

(Nonlinear) CPD method: cpd() and nlcpd()
GEKS method: geks()
Multilateral systems of equations: gkhamis(), ikle(), rao(), and rhajargasht()
Gerardi index: gerardi()

CPD and NLCPD methods

The CPD method (Summers 1973) is a linear regression model that explains the logarithmic price of product $n$ in region $r$, $\ln p_n^r$, by the general product prices, $\ln \pi_n \ (n=1,\ldots,N)$, and the overall regional price levels, $\ln P^r \ (r=1,\ldots,R)$:

\[\ln p_n^r = \ln \pi_n + \ln P^r + \ln \epsilon_n^r \quad \text{with} \ \ln \epsilon_n^r \sim N(0, \sigma^2)\] Auer and Weinand (2022) recently proposed a generalization of the CPD method. This nonlinear CPD method (NLCPD method) inflates the CPD model by product-specific elasticities $\delta_n \ (n=1,\ldots,N)$:

\[\ln p_n^r = \ln \pi_n + \delta_n \ln P^r + \ln \epsilon_n^r \quad \text{with} \ \ln \epsilon_n^r \sim N(0, \sigma^2)\] The CPD method implicitly assumes that all $\delta_n=1$. However, this assumption is not very realistic as price elasticities can considerably differ between the products, particularly at higher levels (e.g., rents versus food).

Estimating the CPD and NLCPD regression model produces estimates for the regional price levels, respectively. However, both methods require a normalization of the estimated price levels $\widehat{\ln P^r}$ to avoid multicollinearity. With normalization $\sum_{r=1}^{R} \widehat{\ln P^r}=0$, the price levels are expressed relative to the regional average; otherwise, the (logarithm of the) price level of one specific region is set to 0. The NLCPD method additionally imposes the restriction $\sum_{n=1}^{N} w_n \widehat{\delta}_n=1$. In this package (function nlcpd()), it is always the parameter $\widehat{\delta}_1$ that is derived residually from this restriction.

The CPD and NLCPD methods are implemented in the functions cpd() and nlcpd(). The functions can be used similarly to those for bilateral indices as shown below. However, if requested by the user, they provide some additional information and they can express the regional price levels relative to the unweighted regional average by setting base=NULL.

# CPD estimation with respect to regional average:
dt[, cpd(p=price, r=region, n=product, q=quantity, base=NULL)]
#>         1         2         3         4         5 
#> 0.9992531 0.9959948 0.9952321 1.0750871 0.9390731

# CPD estimation with respect to region 1:
dt[, cpd(p=price, r=region, n=product, q=quantity, base="1")]
#>         1         2         3         4         5 
#> 1.0000000 0.9967392 0.9959759 1.0758907 0.9397750

# same price levels with shares as weights:
dt[, cpd(p=price, r=region, n=product, w=share, base="1")]
#>         1         2         3         4         5 
#> 1.0000000 0.9967392 0.9959759 1.0758907 0.9397750

# NLCPD estimation with shares as weights:
dt[, nlcpd(p=price, r=region, n=product, w=share, base="1")]
#>         1         2         3         4         5 
#> 1.0000000 0.9923211 0.9860177 1.0615430 0.9341702

If not only the (unlogged) price level estimates but the full regression output is of interest, this can be retrieved by setting simplify=FALSE in the functions (shown below for cpd()):

# full CPD regression output:
dt[, cpd(p=price, r=region, n=product, w=share, simplify=FALSE)]
#> 
#> Call:
#> stats::lm(formula = cpd_mod, data = pdata, weights = w, singular.ok = FALSE)
#> 
#> Coefficients:
#>       pi.1        pi.2        pi.3        pi.4       lnP.1       lnP.2  
#>  2.8741375   3.0253536   2.9090996   2.9991116  -0.0007471  -0.0040133  
#>      lnP.3       lnP.4  
#> -0.0047793   0.0724017

The NLCPD method solves a nonlinear optimization problem. Therefore, it is computationally much slower than most other price indices. However, computation speed can be improved by using the Jacobian matrix for the optimization instead of deriving this matrix analytically during optimization. This can be achieved by changing the settings in nlcpd().

set.seed(123)
dt.big <- rdata(R=50, B=1, N=30, gaps=0.25)

# don't use jacobian matrix:
system.time(m1 <- dt.big[, nlcpd(p=price, r=region, n=product, q=quantity,
                              settings=list(use.jac=FALSE), simplify=FALSE,
                              control=minpack.lm::nls.lm.control("maxiter"=200))])
#>    user  system elapsed 
#>    0.64    0.07    0.77

# use jacobian matrix:
system.time(m2 <- dt.big[, nlcpd(p=price, r=region, n=product, q=quantity,
                              settings=list(use.jac=TRUE), simplify=FALSE,
                              control=minpack.lm::nls.lm.control("maxiter"=200))])
#>    user  system elapsed 
#>    0.09    0.00    0.11

# less computation time needed for m2, but same results as m1:
all.equal(m1$par, m2$par, tol=1e-05)
#> [1] TRUE

As stated earlier, the CPD and NLCPD methods produce transitive price levels. This is also true when there are data gaps. Below, we introduce random gaps in the data dt, reestimate the CPD and NLCPD models, and test for transitivity. More precisely, we check if the direct price comparison between regions 2 and 1 is the same as the indirect comparison including region 3, i.e., $\widehat{P}^{12} = \widehat{P}^{13} \cdot \widehat{P}^{32}$.

# introduce 20% data gaps:
set.seed(1)
dt.gaps <- dt[!rgaps(r=region, n=product, amount=0.2)]

# estimate CPD model using different base regions and check transitivity:
P1.cpd <- dt.gaps[, cpd(p=price, r=region, n=product, q=quantity, base="1")]
P3.cpd <- dt.gaps[, cpd(p=price, r=region, n=product, q=quantity, base="3")]
all.equal(P1.cpd[2], P1.cpd[3]*P3.cpd[2], check.attributes=FALSE)
#> [1] TRUE

# estimate NLCPD model using different base regions and check transitivity:
P1.nlcpd <- dt.gaps[, nlcpd(p=price, r=region, n=product, q=quantity, base="1")]
P3.nlcpd <- dt.gaps[, nlcpd(p=price, r=region, n=product, q=quantity, base="3")]
all.equal(P1.nlcpd[2], P1.nlcpd[3]*P3.nlcpd[2], check.attributes=FALSE)
#> [1] TRUE

The results indicate that both the CPD method and the NLCPD method produce transitive price level estimates even when data gaps are present.

GEKS method

The GEKS method (Gini 1924, 1931; Eltetö and Köves 1964; Szulc 1964) is a two-step approach. First, prices are aggregated into bilateral index numbers for every pair of regions in the data. Second, the bilateral index numbers are transformed into a set of transitive index numbers by estimating the linear regression model \[\ln \dot{P}^{sr} = \ln \left( P^r / P^s \right) + \ln \epsilon^{sr} \quad \text{with} \ \ln \epsilon^{sr} \sim N(0,\sigma^2) \ ,\] where $\dot{P}^{sr}$ is the bilateral price index for regions $s$ and $r$ computed in the first stage. For this computation any bilateral index could theoretically be used (and is supported in the pricelevels-package). Rao and Banerjee (1986), however, recommend that the bilateral price index satisfies the country reversal test. If, for example, the Jevons index is used in the computations, then it is more precise to denote the resulting index numbers as the GEKS-Jevons price index.

# GEKS using Törnqvist:
dt.gaps[, geks(p=price, r=region, n=product, q=quantity, settings=list(type="toernqvist"))]
#>         1         2         3         4         5 
#> 0.9968630 0.9939038 0.9905194 1.0940081 0.9314009

# GEKS using Jevons so quantities have no impact:
dt.gaps[, geks(p=price, r=region, n=product, q=quantity, settings=list(type="jevons"))]
#>         1         2         3         4         5 
#> 0.9954545 0.9945994 0.9916983 1.0880181 0.9360838

The geks()-function internally calls the function index.pairs() to compute the bilateral indices of all region pairs. The quantities q or weights w are used within this aggregation of prices into bilateral index numbers (first stage) while the subsequent transformation of these index numbers (second stage) usually does not rely on any weights (but can if specified in settings$wmethod). In this case, the solution to the regression model above simplifies to \[P_{\text{GEKS-J}}^{1r} = \prod_{s=1}^{R} \left( \dot{P}^{1s}_{\text{J}} \dot{P}^{sr}_{\text{J}} \right)^{1/R} \ ,\] if the Jevons index is used and region 1 serves as the base region (ILO et al. 2020, 448).

Again, the following checks show that the price index numbers derived with the GEKS-Törnqvist index are transitive:

# estimate GEKS-Törnqvist using different base regions and 
# applying weights in second aggregation stage:
P1.geks <- dt.gaps[, geks(p=price, r=region, n=product, q=quantity, base="1",
                          settings=list(type="toernqvist", wmethod="obs"))]
P3.geks <- dt.gaps[, geks(p=price, r=region, n=product, q=quantity, base="3",
                          settings=list(type="toernqvist", wmethod="obs"))]

# check transitivity:
all.equal(P1.geks[2], P1.geks[3]*P3.geks[2], check.attributes=FALSE)
#> [1] TRUE

Multilateral systems of equations

The following price indices belong to this very general class of multilateral price indices and are implemented in the pricelevels-package.

Geary-Khamis method (Geary 1958; Khamis 1972): gkhamis()
Rao method (Rao 1990): rao()
Iklé method (Ikle 1972; Dikhanov 1994; Balk 1996): ikle()
Rao-Hajargasht arithmetic method (Rao and Hajargasht 2016): rhajargasht()

All methods have in common that they set up a system of interrelated equations of international product prices and regional price levels, which must be solved iteratively. It is only the definition of the international product prices and the price levels that differ between the methods.

The Geary-Khamis method defines the international average prices, $\pi_n \ (n=1,\ldots,N)$, and the regional price levels, $P^r \ (r=1,\ldots,R)$, as \[\pi_n = \frac{\sum_{r=1}^{R} q_n^r \left( p_n^r/ P^r \right)}{\sum_{r=1}^{R} q_n^r} \quad \text{and} \quad P^r = \frac{\sum_{n=1}^{N} p_n^r q_n^r}{\sum_{n=1}^{N} \pi_n q_n^r} = \left[ \sum_{n=1}^{N} w_n^r \left( p_n^r / \pi_n \right)^{-1} \right]^{-1} \, ,\] where the weights $w_n^r= p_n^r q_n^r / \sum_{n=1}^{N} p_n^r q_n^r$ represent expenditure shares. This system of interrelated equations can be solved iteratively or analytically using matrix algebra (Diewert 1999). Which approach is computationally faster depends on the data.

# sample data with gaps:
set.seed(123)
dt.big <- rdata(R=99, B=1, N=50, gaps=0.25)

# iterative processing:
system.time(m1 <- dt.big[, gkhamis(p=price, r=region, n=product, q=quantity,
                                   settings=list(solve="iterative"))])
#>    user  system elapsed 
#>    0.00    0.00    0.02

# matrix algebra:
system.time(m2 <- dt.big[, gkhamis(p=price, r=region, n=product, q=quantity,
                                   settings=list(solve="matrix"))])
#>    user  system elapsed 
#>    0.03    0.00    0.05

# compare results:
all.equal(m1, m2, tol=1e-05)
#> [1] TRUE

The Iklé index defines the regional price levels $P^r$ exactly in the same way as the Geary-Khamis index. However, the calculation of the international product prices $\pi_n$ uses expenditure shares instead of quantities: \[\pi_n = \left[ \frac{\sum_{r=1}^{R} w_n^r \left( p_n^r / P^r \right)^{-1}}{\sum_{r=1}^{R} w_n^r} \right]^{-1} \quad \text{and} \quad P^r = \left[ \sum_{n=1}^{N} w_n^r \left( p_n^r / \pi_n \right)^{-1} \right]^{-1} \, . \] Therefore, the Iklé index suffers less from the Gerschenkorn effect than the Geary-Khamis index as bigger countries do not receive more influence in the calculations than smaller countries.

In contrast to the Geary-Khamis index, the Iklé index can be computed with quantities $q_n^r$ or expenditure share weights $w_n^r$. This also applies to the Rao index, which derives the international product prices, $\pi_n \ (n=1,\ldots,N)$, and regional price levels, $P^r \ (r=1,\ldots,R)$, from the following system of interrelated equations: \[\pi_n = \prod_{r=1}^{R} \left( p_n^r / P^r \right)^{w_n^r / \sum_{s=1}^{R} w_n^s} \quad \text{and} \quad P^r = \prod_{n=1}^{N} \left( p_n^r / \pi_n \right)^{w_n^r} \, , \] while the Rao-Hajargasht index sets up the following system of equations:

\[\pi_n = \sum_{r=1}^{R} \frac{w_n^r}{\sum_{s=1}^{R} w_n^s} \left( p_n^r / P^r \right) \quad \text{and} \quad P^r = \sum_{n=1}^{N} w_n^r \left( p_n^r / \pi_n \right) \ . \] All formulas above for $\pi_n$ and $P^r$ require quantities (or expenditure share weights). Following Rao and Hajargasht (2016, 417), however, unweighted variants of the multilateral indices exist, which are implemented in the corresponding functions by setting q=NULL or w=NULL.

Also the price indices belonging to this class of multilateral systems of equations produce transitive price levels.

# Geary-Khamis index transitive:
P1.gk <- dt.gaps[, gkhamis(p=price, r=region, n=product, q=quantity, base="1")]
P3.gk <- dt.gaps[, gkhamis(p=price, r=region, n=product, q=quantity, base="3")]
all.equal(P1.gk[2], P1.gk[3]*P3.gk[2], check.attributes=FALSE)
#> [1] TRUE

Gerardi index

The multilateral Gerardi index used by Eurostat (1978) is implemented in the function gerardi(). Similar to the indices from the previous section, the Gerardi index defines international product prices, $\pi_n$, and regional price levels, $P^r$:

\[\pi_n = \left( \prod_{r=1}^{R} p_n^r \right)^{1/R} \quad n=1,\ldots,N\] and \[P^r = \frac{\sum_{n=1}^{N} p_n^r q_n^r}{\sum_{n=1}^{N} \pi_n q_n^r} = \left[ \sum_{n=1}^{N} w_n^r \left( p_n^r / \pi_n \right)^{-1} \right]^{-1} \quad r=1,\ldots,R \, .\]

Consequently, the regional price levels are defined in the same way as for the Geary-Khamis and Ikle methods. It is only the definition of the international product prices that differs. More importantly, however, the $\pi_n$ rely only on the observed prices. Therefore, the Gerardi index does not require an iterative procedure, but can be solved in one step.

One obvious drawback of the Gerardi index is the definition of $\pi_n$ as an unweighted geometric mean even when quantities or expenditure shares are available. Hence, the function gerardi() offers another variant where the $\pi_n$’s are computed as weighted geometric means. This can be set by settings=list(variant="adjusted").

dt.gaps[, gerardi(p=price, r=region, n=product, q=quantity, base="1", settings=list(variant="adjusted"))]
#>         1         2         3         4         5 
#> 1.0000000 0.9818328 0.9909638 1.0803662 0.9304644

The Gerardi index produces transitive regional price levels.

# Gerardi index transitive:
P1 <- dt.gaps[, gerardi(p=price, r=region, n=product, q=quantity, base="1")]
P3 <- dt.gaps[, gerardi(p=price, r=region, n=product, q=quantity, base="3")]
all.equal(P1[2], P1[3]*P3[2], check.attributes=FALSE)
#> [1] TRUE

Some further package features

Price indices with non-connected price data

Bilateral price indices use solely the direct matches or links between two regions. Hence, indirect links between two regions via a third one are not taken into account. This is different for multilateral price indices, which make use of these indirect links. Consequently, the function output between these indices may differ.

# example data:
dt <- data.table(
  "region"=rep(letters[1:5], c(3,3,7,4,4)),
  "product"=as.character(c(1:3, 1:3, 1:7, 4:7, 4:7)))
set.seed(123)
dt[, "price":=rnorm(n=.N, mean=20, sd=2)]
dt[, "quantity":=rnorm(n=.N, mean=999, sd=100)]

# price matrix:
with(dt, tapply(X=price, list(product, region), mean))
#>          a        b        c        d        e
#> 1 18.87905 20.14102 20.92183       NA       NA
#> 2 19.53965 20.25858 17.46988       NA       NA
#> 3 23.11742 23.43013 18.62629       NA       NA
#> 4       NA       NA 19.10868 20.22137 16.06677
#> 5       NA       NA 22.44816 18.88832 21.40271
#> 6       NA       NA 20.71963 23.57383 19.05442
#> 7       NA       NA 20.80154 20.99570 17.86435

The price matrix shows that prices for products 1 to 3 are available in regions a, b, and c, while prices for products 4 to 7 were recorded in regions c, d, and e. Since in region c the prices are complete, the data are connected - either through direct or indirect regional links.

dt[, is.connected(r=region, n=product)] # true
#> [1] TRUE

However, the price matrix also shows that there are no product matches for regions a and d, for example. Therefore, any bilateral price index between these two regions is not defined, while the multilateral price indices will provide a price level by taking into account the indirect link of regions a and d via region c.

# bilateral jevons index:
dt[, jevons(p=price, r=region, n=product, base="a")]
#>         a         b         c         d         e 
#> 1.0000000 1.0388264 0.9276696        NA        NA

# multilateral unweighted cpd index:
dt[, cpd(p=price, r=region, n=product, base="a")]
#>         a         b         c         d         e 
#> 1.0000000 1.0388264 0.9276696 0.9328507 0.8274965

If we now assume that there is no region c, the data will consist of two separate blocks of regions and, thus, become non-connected.

# drop region 'c':
dt2 <- dt[!region%in%"c",]

# price matrix:
with(dt2, tapply(X=price, list(product, region), mean))
#>          a        b        d        e
#> 1 18.87905 20.14102       NA       NA
#> 2 19.53965 20.25858       NA       NA
#> 3 23.11742 23.43013       NA       NA
#> 4       NA       NA 20.22137 16.06677
#> 5       NA       NA 18.88832 21.40271
#> 6       NA       NA 23.57383 19.05442
#> 7       NA       NA 20.99570 17.86435

# check if connected:
dt2[, is.connected(r=region, n=product)]
#> [1] FALSE

In this case, no price comparison involving all regions is possible. The price indices implemented in the pricelevels-package deal with this issue by using either the block of regions that includes the base region or the biggest block of regions for a price comparison on a subset of the data.

# bilateral jevons index:
dt2[, jevons(p=price, r=region, n=product, base="a")]
#> Warning: Non-connected regions -> computations with subset of data
#>        a        b        d        e 
#> 1.000000 1.038826       NA       NA

# multilateral unweighted cpd index:
dt2[, cpd(p=price, r=region, n=product, base="a")]
#> Warning: Non-connected regions -> computations with subset of data
#>        a        b        d        e 
#> 1.000000 1.038826       NA       NA

As can be seen, the two functions return the price levels for regions a and b only, while the price levels for regions c and d are set to NA. The functions also print a corresponding warning. All pricelevels-warnings can be suppressed in the settings if wanted.

dt2[, cpd(p=price, r=region, n=product, base="a", settings=list(chatty=FALSE))]
#>        a        b        d        e 
#> 1.000000 1.038826       NA       NA

Also the checking of connectedness can be suppressed. This can be useful in simulations or other cases when it is known that the data are connected. Of course, in our example this setting would result in an error.

dt2[, cpd(p=price, r=region, n=product, base="a", settings=list(connect=FALSE))]
#> Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...): singular fit encountered

If the data contains duplicated prices or missing values NA, these are removed before any calculations start. Prices and weights are averaged within each basic heading for each region and product, while quantities are summed up. Again, a corresponding warning message is returned.

# example data:
dt <- data.table(
  "region"=c("a","a","a","b","b","b","b"),
  "product"=as.character(c(1,1,2,1,1,2,2)))
set.seed(123)
dt[, "price":=rnorm(n=.N, mean=20, sd=2)]
dt[, "quantity":=rnorm(n=.N, mean=999, sd=100)]
dt[1, "price" := NA_real_]

dt[, cpd(p=price, r=region, n=product)]
#> Warning: 1 incomplete case(s) found and removed
#> Warning: Duplicated observations found and aggregated
#>         a         b 
#> 1.0020896 0.9979148

Calculation of multiple price indices at once

In some situations it is useful to compute the price levels of the regions using multiple price indices. This could be done by calculating, for example, the CPD index, the GEKS-Jevons index, and the Jevons index, separately using the corresponding package functions.

# example data:
set.seed(123)
dt <- rdata(R=5, B=1, N=7, gaps=0.2)

# cpd:
dt[, cpd(p=price, r=region, n=product, base="1")]
#>        1        2        3        4        5 
#> 1.000000 1.304266 1.170156 1.098752 1.197490

# geks-jevons:
dt[, geks(p=price, r=region, n=product, base="1", setting=list(type="jevons"))]
#>        1        2        3        4        5 
#> 1.000000 1.294057 1.165078 1.093972 1.191073

# jevons:
dt[, jevons(p=price, r=region, n=product, base="1")]
#>        1        2        3        4        5 
#> 1.000000 1.294572 1.145790 1.097794 1.206428

However, this approach is not convenient and also not efficient as each price index function checks the same user inputs, removes the same missing values and duplicated observations, and connects the same data if needed. This can be time consuming for bigger datasets. The pricelevels-package therefore offers the pricelevels()-function, which can be used to efficiently calculate multiple price indices at once.

dt[, pricelevels(p=price, r=region, n=product, base="1", settings=list(type=c("jevons","cpd","geks-jevons")))]
#>             1        2        3        4        5
#> cpd         1 1.304266 1.170156 1.098752 1.197490
#> geks-jevons 1 1.294057 1.165078 1.093972 1.191073
#> jevons      1 1.294572 1.145790 1.097794 1.206428

If the user wants to compute all unweighted price indices available in the pricelevels-package, this can be achieved by setting settings$type=NULL:

dt[, pricelevels(p=price, r=region, n=product, base="1")]
#>               1        2        3        4        5
#> bmw           1 1.294645 1.145793 1.097794 1.206430
#> carli         1 1.303176 1.147201 1.098526 1.209764
#> cpd           1 1.304266 1.170156 1.098752 1.197490
#> cswd          1 1.294864 1.145803 1.097795 1.206437
#> dutot         1 1.303007 1.150488 1.099698 1.208837
#> geks-bmw      1 1.294092 1.165093 1.093983 1.191088
#> geks-carli    1 1.294195 1.165138 1.094014 1.191133
#> geks-cswd     1 1.294195 1.165138 1.094014 1.191133
#> geks-dutot    1 1.302091 1.168106 1.096550 1.194862
#> geks-harmonic 1 1.294195 1.165138 1.094014 1.191133
#> geks-jevons   1 1.294057 1.165078 1.093972 1.191073
#> gkhamis       1 1.311388 1.172914 1.101115 1.199854
#> harmonic      1 1.286606 1.144407 1.097065 1.203118
#> ikle          1 1.304265 1.171281 1.099912 1.198626
#> jevons        1 1.294572 1.145790 1.097794 1.206428
#> nlcpd         1 1.313387 1.164121 1.096029 1.193531
#> rao           1 1.304266 1.170156 1.098752 1.197490
#> rhajargasht   1 1.304206 1.168994 1.097576 1.196306

Similarly, if there are quantities or weights available in the data, all weighted and unweighted price indices available in the pricelevels-package are computed at once:

dt[, pricelevels(p=price, r=region, n=product, q=quantity, base="1")]
#>                   1        2        3        4        5
#> banerjee          1 1.253417 1.141503 1.094111 1.198268
#> bmw               1 1.294645 1.145793 1.097794 1.206430
#> carli             1 1.303176 1.147201 1.098526 1.209764
#> cpd               1 1.266089 1.152739 1.082465 1.182087
#> cswd              1 1.294864 1.145803 1.097795 1.206437
#> davies            1 1.252229 1.142528 1.094370 1.199939
#> drobisch          1 1.252491 1.142447 1.094359 1.200005
#> dutot             1 1.303007 1.150488 1.099698 1.208837
#> fisher            1 1.252462 1.142372 1.094347 1.199874
#> geks-banerjee     1 1.271009 1.153711 1.083487 1.180644
#> geks-bmw          1 1.294092 1.165093 1.093983 1.191088
#> geks-carli        1 1.294195 1.165138 1.094014 1.191133
#> geks-cswd         1 1.294195 1.165138 1.094014 1.191133
#> geks-davies       1 1.270677 1.154297 1.084168 1.181476
#> geks-drobisch     1 1.270818 1.154259 1.084083 1.181445
#> geks-dutot        1 1.302091 1.168106 1.096550 1.194862
#> geks-fisher       1 1.270818 1.154259 1.084083 1.181445
#> geks-geolaspeyres 1 1.270597 1.154364 1.084272 1.181536
#> geks-geopaasche   1 1.270597 1.154364 1.084272 1.181536
#> geks-geowalsh     1 1.269451 1.153443 1.084131 1.181396
#> geks-harmonic     1 1.294195 1.165138 1.094014 1.191133
#> geks-jevons       1 1.294057 1.165078 1.093972 1.191073
#> geks-laspeyres    1 1.270818 1.154259 1.084083 1.181445
#> geks-lehr         1 1.253837 1.142062 1.078325 1.171155
#> geks-lowe         1 1.281260 1.159068 1.089011 1.182639
#> geks-medgeworth   1 1.270634 1.155082 1.084910 1.183136
#> geks-paasche      1 1.270818 1.154259 1.084083 1.181445
#> geks-palgrave     1 1.270620 1.154579 1.084537 1.181742
#> geks-svartia      1 1.269841 1.153757 1.084181 1.181444
#> geks-theil        1 1.269829 1.153749 1.084179 1.181445
#> geks-toernqvist   1 1.270597 1.154364 1.084272 1.181536
#> geks-uvalue       1 1.236111 1.163371 1.076930 1.211239
#> geks-walsh        1 1.269482 1.153458 1.084141 1.181411
#> geks-young        1 1.281712 1.159905 1.090079 1.183622
#> geolaspeyres      1 1.255885 1.128502 1.088582 1.179864
#> geopaasche        1 1.248368 1.157057 1.100236 1.220512
#> geowalsh          1 1.252576 1.140507 1.093937 1.200037
#> gerardi           1 1.241508 1.141248 1.081700 1.175969
#> gkhamis           1 1.267113 1.156586 1.084440 1.183675
#> harmonic          1 1.286606 1.144407 1.097065 1.203118
#> ikle              1 1.266336 1.153682 1.083412 1.183243
#> jevons            1 1.294572 1.145790 1.097794 1.206428
#> laspeyres         1 1.261008 1.129372 1.089128 1.182314
#> lehr              1 1.246348 1.126641 1.087801 1.183916
#> lowe              1 1.279239 1.142610 1.093595 1.196531
#> medgeworth        1 1.250211 1.142092 1.096414 1.203379
#> nlcpd             1 1.304658 1.160267 1.094826 1.189399
#> paasche           1 1.243973 1.155522 1.099591 1.217695
#> palgrave          1 1.253266 1.158605 1.100882 1.223354
#> rao               1 1.266089 1.152739 1.082465 1.182087
#> rhajargasht       1 1.265828 1.151783 1.081527 1.180898
#> svartia           1 1.252423 1.141251 1.094091 1.200032
#> theil             1 1.252424 1.141228 1.094087 1.200039
#> toernqvist        1 1.252121 1.142690 1.094394 1.200016
#> uvalue            1 1.222189 1.171691 1.093794 1.197584
#> walsh             1 1.252639 1.140511 1.093938 1.200043
#> young             1 1.287435 1.144315 1.095294 1.200834

It is important to note that only the simplified output including the regions’ price levels is returned. Moreover, the user has to provide a specific base region as bilateral indices do not allow a comparison to the average regional price level. However, any additional settings of some price index function (e.g., settings$use.jac=TRUE for nlcpd()) can be used in pricelevels() as well.

References

Auer, Ludwig von, and Sebastian Weinand. 2022. “A Nonlinear Generalization of the Country-Product-Dummy Method.” Discussion Paper 45. Deutsche Bundesbank.

Balk, Bert. 1996. “A comparison of ten methods for multilateral international price and volume comparisons.” Journal of Official Statistics 12 (1): 199–222.

———. 2005. “Price Indexes for Elementary Aggregates: The Sampling Approach.” Journal of Official Statistics 21 (4): 675–99.

Banerjee, Kali S. 1977. On the factorial approach providing the true index of cost of living. Göttingen: Vandenhoeck & Ruprecht.

Carli, Gian Rinaldo. 1804. “Del Valore e Della Proporzione Dei Metallic Monetati.” In Scrittori Classici Italiani Di Economia Politica, 297–366. Milano: G. G. Destefanis.

Carruthers, A. G., D. J. Sellwood, and P. W. Ward. 1980. “Recent Developments in the Retail Prices Index.” The Statistician 29 (1): 1–32. https://doi.org/10.2307/2987492.

Coggeshall, F. 1886. “The Arithmetic, Geometric, and Harmonic Means.” The Quarterly Journal of Economics 1 (1): 83–86. https://doi.org/10.2307/1883111.

Dalén, Jörgen. 1992. “Computing Elementary Aggregates in the Swedish Consumer Price Index.” Journal of Official Statistics 8 (2): 129–47.

Davies, George R. 1924. “The Problem of a Standard Index Number Formula.” Journal of the American Statistical Association 19 (146): 180–88.

Diewert, Walter Erwin. 1999. “Axiomatic and Economic Approaches to International Comparisons.” In International and Interarea Comparisons of Income, Output, and Prices, 13–107. University of Chicago Press. http://www.nber.org/chapters/c8385.

Dikhanov, Yuri. 1994. “Sensitivity of PPP-based income estimates to the choice of aggregation procedures.”

Drobisch, Moritz Wilhelm. 1871a. “Über die Berechnung der Veränderungen der Waarenpreise und des Geldwerths.” Jahrbücher für Nationalökonomie und Statistik 16: 143–56.

———. 1871b. “Über einige Einwürfe gegen die indiesen Jahrbüchern veröffentlichte neue Methode, die Veränderungen der Waarenpreise und des Geldwerths zu berechten.” Jahrbücher für Nationalökonomie und Statistik 16: 416–27.

Dutot, Nicolas. 1738. Réflections politiques sur les finances et le commerce. La Haye: Les frères Vaillant et N. Prevost.

Edgeworth, Francis Ysidro. 1925. Papers Relating to Political Economy. Vol. 1. London: Macmillan for the Royal Economic Society.

Eltetö, Ödön, and Pál Köves. 1964. “On a Problem of Index Number Computation Relating to International Comparison.” Statisztikai Szemle 42: 507–18.

Eurostat. 1978. Comparison in real values of the aggregates of ESA 1975. Luxembourg: Publications Office of the European Commission.

Fisher, Irving. 1921. “The Best Form of Index Number.” Quarterly Publications of the American Statistical Association 17 (133): 533–37. https://doi.org/10.2307/2965310.

Geary, Robert Charles. 1958. “A Note on the Comparison of Exchange Rates and Purchasing Power Between Countries.” Journal of the Royal Statistical Society. Series A (General) 121 (1): 97–99.

Gini, Corrado. 1924. “Quelques Considerations Au Sujet de La Construction Des Nombres Indices Des Prix Et Des Questions Analogues.” Mentron 4 (1): 3–162.

———. 1931. “On the Circular Test of Index Numbers.” International Statistical Review 9 (2): 3–25.

Ikle, Doris M. 1972. “A New Approach to the Index Number Problem.” The Quarterly Journal of Economics 86 (2): 188. https://doi.org/10.2307/1880559.

ILO, IMF, OECD, UNECE, Eurostat, and World Bank. 2020. Consumer Price Index Manual: Concepts and Methods. Washington DC: International Monetary Fund. https://doi.org/10.5089/9781484354841.069.

Jevons, William Stanley. 1865. “On the Variation of Prices and the Value of the Currency Since 1782.” Journal of the Statistical Society of London 28 (2): 294–320. http://www.jstor.org/stable/2338419.

Khamis, Salem H. 1972. “A New System of Index Numbers for National and International Purposes.” Journal of the Royal Statistical Society. Series A (General) 135 (1): 96–121.

Laspeyres, Etienne. 1871. “Die Berechnung einer mittleren Waarenpreissteigerung.” Jahrbücher für Nationalökonomie und Statistik 16 (1): 296–314.

Lehr, Julius. 1885. Beiträge zur Statistik der Preise: Insbesondere des Geldes und des Holzes. Frankfurt am Main: F. D. Sauerländer Verlag.

Lowe, Joseph. 1823. The Present State of England in Regard to Agriculture, Trade and Finance. 2nd ed. London: Longman, Hurst, Rees, Orme,; Brown.

Marshall, Alfred. 1887. “Remedies for Fluctuations of General Prices.” Contemporary Review 51: 355–75.

Mehrhoff, Jens. 2007. “A Linear Approximation to the Jevons Index.” In Index Theory and Price Statistics, edited by Peter von der Lippe. Peter Lang Ltd. International Academic Publishers.

Paasche, Hermann. 1874. “Über die Preisentwicklung der letzten Jahre nach den Hamburger Börsennotierungen.” Jahrbücher für Nationalökonomie und Statistik 12: 168–78.

Palgrave, Robert Harry Inglis. 1886. “Currency and Standard of Value in England, France and India and the Rates of Exchange Between These Countries.” Memorandum for the Royal Commission on the Depression of Trade and Industry. In British Parliamentary Papers: Session 1886: Third Report and Final Report of the Royal Commission on the Depression in Trade and Industry, with Minutes of Evidence and Appendices. Shannon, Ireland: Irish University Press, 1969.

Rao, D. S. Prasada. 1990. “A System of Log-Change Index Numbers for Multilateral Comparisons.” In Comparisons of Prices and Real Products in Latin America. Contributions to Economic Analysis Series, edited by Salazar-Carrillo and Rao. Amsterdam: North-Holland Publishing Company.

Rao, D. S. Prasada, and K. S. Banerjee. 1986. “A Multilateral Index Number System Based on the Factorial Approach.” Statistische Hefte 27 (1): 297–313. https://doi.org/10.1007/BF02932575.

Rao, D. S. Prasada, and Gholamreza Hajargasht. 2016. “Stochastic Approach to Computation of Purchasing Power Parities in the International Comparison Program (ICP).” Journal of Econometrics 191 (2): 414–25. https://doi.org/https://doi.org/10.1016/j.jeconom.2015.12.012.

Sato, Kazuo. 1976. “The Ideal Log-Change Index Number.” The Review of Economics and Statistics 58 (2): 223–28. https://doi.org/10.2307/1924029.

Summers, Robert. 1973. “International Price Comparisons Based Upon Incomplete Data.” Review of Income and Wealth 19 (1): 1–16. https://doi.org/https://doi.org/10.1111/j.1475-4991.1973.tb00870.x.

Szulc, Bohdan J. 1964. “Indices for Multiregional Comparisons.” Przeglad Statystyczny 3: 239–54.

Theil, Henri. 1973. “A New Index Number Formula.” The Review of Economics and Statistics 55 (4): 498–502. https://doi.org/10.2307/1925674.

Törnqvist, Leo. 1936. “The Bank of Finland’s Consumption Price Index.” Bank of Finland Monthly Bulletin 16 (10): 27–34.

Vartia, Yrjö O. 1976. “Ideal Log-Change Index Numbers.” Scandinavian Journal of Statistics 3 (3): 121–26.

Walsh, Correa Moylan. 1901. The Measurement of General Exchange-Value. New York: Macmillan.

World Bank. 2013. Measuring the Real Size of the World Economy: The Framework, Methodology, and Results of the International Comparison Program. Washington, DC: World Bank. https://doi.org/10.1596/978-0-8213-9728-2.

Young, Arthur. 1812. An Inquiry into the Progressive Value of Money in England as Marked by the Price of Agricultural Products. London: McMillan.

Introduction to the pricelevels-package

Sebastian Weinand

22 May 2024

Price data

Characteristics

Connected price data

Non-connected price data

Simulation

Data gaps

Expenditure weights

Price data

Bilateral price indices

Elementary (unweighted) indices

Multilateral price indices

CPD and NLCPD methods

GEKS method

Multilateral systems of equations

Gerardi index

Some further package features

Price indices with non-connected price data

Calculation of multiple price indices at once

References