--- title: "Making price indexes" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Making price indexes} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` Most price indexes are made with a two-step procedure, where period-over-period *elemental indexes* are calculated for a collection of *elemental aggregates* at each point in time, and then aggregated according to a *price index aggregation structure*. These indexes can then be chained together to form a time series that gives the evolution of prices with respect to a fixed base period. The **piar** package contains a collection of functions that revolve around this work flow, making it easy to build standard price indexes in **R**. The purpose of this vignette is to give several extended examples of how to use the functions in this package to make different types of price indexes. This should serve both as a introduction to the functionality in **piar**, and a reference for solving specific index-number problems. ## Matched-sample index The first example covers calculating a matched-sample index, where a fixed set of businesses each provide prices for a collection of products over time. The products reported by a businesses can change over time, but the set of businesses is fixed for the duration of the sample. Each businesses has a weight that is established when the sample is drawn, and represents a particular segment of the economy. The usual approach for calculating a matched-sample index starts by computing the elemental index for each business as an equally-weighted geometric mean of price relatives (i.e., a Jevons index). From there, index values for different segments of the economy are calculated as an arithmetic mean of the elemental indexes, using the businesses-level weights (either a Young or Lowe index, depending how the weights are constructed). The `ms_prices` dataset has price data for five businesses over four quarters, and the `ms_weights` dataset has the weight data. Note that these data have fairly realistic patterns of missing data. ```{r} library(piar) head(ms_prices) ms_weights ``` The `elemental_index()` function makes, well, elemental indexes, using information on price relatives, elemental aggregates (businesses), and time periods (quarters). By default it makes a Jevons index, but any bilateral generalized-mean index is possible. The only wrinkle is that price data here are in levels, and not relatives, but the `price_relative()` function can make the necessary conversion. ```{r} relatives <- with( ms_prices, price_relative(price, period = period, product = product) ) ms_elemental <- with( ms_prices, elemental_index(relatives, period = period, ea = business, na.rm = TRUE) ) ms_elemental ``` (Homogeneous elemental aggregates often leads to unit-value elemental indexes that are not based on price relatives. These cases can be dealt with by first aggregating prices for each elemental aggregate, `aggregate(price ~ period + product, ms_prices, mean)`, at each point in time with an arithmetic mean, then forming price relatives to feed into `elemental_index()`.) As with most functions in **R**, missing values are contagious by default in **piar**. Setting `na.rm = TRUE` in `elemental_index()` means that missing price relatives are ignored, which is equivalent to imputing these missing relatives with the value of the elemental index for the respective businesses (i.e., parental or overall mean imputation). Other types of imputation are possible, and are the topic of a subsequent example. The `elemental_index()` function returns a special index object, and there are a number of methods for working with these objects. Probably the most useful of these methods allows the resulting elemental indexes to be extracted like a matrix, even though it's not a matrix. (Note that there are only indexes for four businesses, not five, because the fifth business never reports any prices; as will be seen in another example, an elemental index can be made for this business with a small change to the call to `elemental_index()`.) ```{r} ms_elemental[, "202004"] ms_elemental["B1", ] ``` With the elemental indexes out of the way, it's time to make a price-index aggregation structure that maps each business to its position in the aggregation hierarchy. The only hiccup is unpacking the digit-wise classification for each businesses that defines the hierarchy. That's the job of the `expand_classification()` function. ```{r} hierarchy <- with( ms_weights, c(expand_classification(classification), list(business)) ) pias <- aggregation_structure(hierarchy, weights = ms_weights$weight) ``` It is now simple to aggregate the elemental indexes according to this aggregation structure with the `aggregate()` function. As with the elemental indexes, missing values are ignored by setting `na.rm = TRUE`, which is equivalent to parentally imputing missing values. Note that, unlike the elemental indexes, missing values are filled in to ensure the index can be chained over time. ```{r} ms_index <- aggregate(ms_elemental, pias, na.rm = TRUE) ms_index ``` Although simple, this example covers the core functionality of **piar**. The remaining examples in the vignette build on this one by adding complexities that often arise in practice. ## Chaining The `elemental_index()` function makes period-over-period elemental indexes by default, which can then be aggregated to make a period-over-period index. Chaining an index is the process of taking the cumulative product of each of these period-over-period indexes to make a time series that compares prices to a fixed base period. The `chain()` function can be used to chain the values in an index object. ```{r} ms_index_chained <- chain(ms_index) ms_index_chained ``` This gives almost the same result as directly manipulating the index as a matrix, except that the former returns an index object (not a matrix). ```{r} t(apply(as.matrix(ms_index), 1, cumprod)) ``` Chained indexes often need be to rebased, and this can be done with the `rebase()` function. For example, rebasing the index so that 202004 is the base period just requires dividing the chained index by the slice for 202004. ```{r} rebase(ms_index_chained, ms_index_chained[, "202004"]) ``` In some cases the base period is the average of several periods; setting the base period to the second half of 2020 just requires averaging the index over subperiods before rebasing. ```{r} rebase( ms_index_chained, mean(window(ms_index_chained, "202003")) ) ``` ## Multi-dimensional aggregation structures Price indexes are often aggregated over multiple dimensions. Matched sample indexes that use sequential Poisson sampling are a good example, as there are usually take-all and take-some strata in addition to, say, an industry classification. ```{r} ms_weights$stratum <- c("TS", "TA", "TS", "TS", "TS") ms_weights ``` The easiest way to deal with multiple digit-wise classifications is to turn them into one classification. In this example the "stratum" dimension comes before the "classification" dimension for the purposes of parental imputation. ```{r} classification_sps <- with(ms_weights, paste0(classification, stratum)) classification_sps ``` This classification can be expanded with the `expand_classification()` function as before, just with an extra instruction to say that the last "digit" in the classification is two characters wide, not one. ```{r} classification_sps <- expand_classification( classification_sps, width = c(1, 1, 2) ) pias_sps <- with( ms_weights, aggregation_structure(c(classification_sps, list(business)), weight) ) ``` The elemental indexes can now be aggregated according to this new aggregation structure. ```{r} index_sps <- aggregate(ms_elemental, pias_sps, na.rm = TRUE) index_sps ``` When a price index has many dimensions (e.g., industry, sampling stratum, region), it can be useful to interact the classifications for these different dimensions to get all possible aggregation structures. The aggregated index can then be re-aggregated to get index values for all dimensions. Continuing with the example, the industry and strata structures can be interacted to get two aggregation structures that can be used to re-aggregate `index_sps`. ```{r} interacted_hierarchy <- with( ms_weights, interact_classifications( expand_classification(classification), expand_classification(stratum) ) ) pias_sps2 <- lapply( interacted_hierarchy, \(x) aggregation_structure(c(x, ms_weights["business"]), ms_weights$weight) ) index_sps2 <- lapply(pias_sps2, \(x) aggregate(index_sps, x, include_ea = FALSE)) ``` The resulting indexes can be merged together to give an index that includes all combinations of industry and sampling stratum. ```{r} Reduce(merge, index_sps2) ``` ## Matrix aggregation Aggregating a price index can be done as a matrix operation. Although this approach is less flexible than the `aggregate()` method, it can be considerably faster for larger indexes. The key is to turn the aggregation structure into an aggregation matrix. ```{r} pias_matrix <- as.matrix(pias) pias_matrix ``` Multiplying this matrix by a matrix of fixed-base elemental indexes now computes the aggregate index in each time period. ```{r} pias_matrix %*% as.matrix(chain(ms_index[ms_weights$business])) ``` ## Computing the shadow of an index It's often useful to determine which higher-level index values are missing, and subsequently get imputed during aggregation (i.e., compute the shadow of an index). This is simple to do if there's an elemental index for each elemental aggregate in the aggregation structure. ```{r} ms_elemental2 <- elemental_index( ms_prices, relatives ~ period + factor(business, ms_weights$business), na.rm = TRUE ) ``` The idea is to simply aggregate an indicator for missingness to get a matrix that gives the share of missing elemental indexes for each higher-level index. ```{r} pias_matrix <- as.matrix(pias) > 0 pias_matrix %*% is.na(ms_elemental2) / rowSums(pias_matrix) ``` A value of 1 means that there are no non-missing elemental indexes, and that the value for this level of the index is imputed from its parent in the aggregation structure. A value below 1 but above zero means that some but not all elemental indexes are missing, and the index value for this level is based on the non-missing elemental indexes. A value of zero means there's no imputation for this level of the index. ## Non-parental imputation during aggregation Parental imputation is the usual way to impute missing index values during aggregation, and it is simple to do with `aggregate()`. In some cases, however, a business-level index may get imputed with the value for, say, another business, rather than for an entire group of businesses. The simplest way to do this sort of imputation is to alter the elemental indexes prior to aggregation. It is also possible to augment the aggregation structure with an imputation layer, but this is more complex. Suppose that missing index values for business B2 should be imputed as 1, rather than the value for group 11. This replacement can be done as if the index was a matrix. ```{r} ms_elemental2 <- ms_elemental ms_elemental2["B2", 2:3] <- 1 ms_elemental2 ``` The index can now be aggregated as usual. ```{r} aggregate(ms_elemental2, pias, na.rm = TRUE) ``` ## Alternate index-number formulas By default, the `elemental_index()` function calculates a Jevons index. Although this is the standard index-number formula for making elemental indexes, many other types of index-numbers are possible. The Carli index (equally-weighted arithmetic mean of price relatives) is the main competitor to the Jevons, and requires specifying the order of the index `r` when calling `elemental_index()`. An order of 1 corresponds to an arithmetic mean. ```{r} elemental_index(ms_prices, relatives ~ period + business, na.rm = TRUE, r = 1) ``` The Coggeshall index (equally-weighted harmonic mean of price relatives) is another competitor to the Jevons, but is seldom used in practice. Despite it being more exotic, it is just as easy to make by specifying an order `r` of -1. ```{r} elemental_index(ms_prices, relatives ~ period + business, na.rm = TRUE, r = -1) ``` The type of mean used to aggregate elemental indexes can be controlled in the same way in the call to `aggregate()`. The default makes an arithmetic index, but any type of generalized-mean index is possible. Many superlative indexes can be made by supplying unequal and (usually) time-varying weights when making the elemental indexes. These weights often come from information on quantities. ```{r} ms_prices2 <- transform(ms_prices, quantity = 10 - price) ``` The Tornqvist index is a popular superlative index-number formula, using average period-over-period value shares as the weights in a geometric mean. The only tricky part is making the weights from data on prices and quantities. ```{r} library(gpindex) tw <- grouped(index_weights("Tornqvist")) ms_prices2[c("back_price", "back_quantity")] <- ms_prices2[back_period(ms_prices2$period, ms_prices2$product), c("price", "quantity")] ms_prices2 <- na.omit(ms_prices2) # can't have NAs for Tornqvist weights ms_prices2$weight <- with( ms_prices2, tw( price, back_price, quantity, back_quantity, group = interaction(period, business) ) ) ``` As `elemental_index()` makes a geometric index by default, all that is needed to make a Tornqvist index is to provide the weights. ```{r} elemental_index( ms_prices2, price / back_price ~ period + business, weights = weight ) ``` ## Percent-change contributions It's often convenient to decompose an index into the (additive) contribution of each price relative, also known as the percent-change contributions. This can be done with the same work flow as in the previous examples, specifying `contrib = TRUE` when calling `elemental_index()`. ```{r} ms_elemental <- elemental_index( ms_prices, relatives ~ period + business, contrib = TRUE, na.rm = TRUE ) ``` As with index values, percent-change contributions for a given level of the index can be extracted as a matrix. ```{r} contrib(ms_elemental) ``` Aggregating the elemental indexes automatically aggregates percent-change contributions, so no extra steps are needed after the elemental indexes are made. ```{r} contrib(aggregate(ms_elemental, pias, na.rm = TRUE)) ``` ## Updating a basket The functions in **piar** are all designed to work within a "basket", which is a fancy way of saying within a given aggregation structure. Over time, however, aggregation structures change as the weights used to aggregate an index get updated, and new samples of businesses are drawn. The general approach to keep the time series going is to "chain" the index across baskets. It is easier to see how to chain an index over time with a simple example that just splits the `ms_prices` data in two. ```{r} ms_prices1 <- subset(ms_prices, period <= "202003") ms_prices2 <- subset(ms_prices, period >= "202003") ``` The index for the first basket can be calculate as usual. ```{r} ms_elemental1 <- elemental_index( ms_prices1, price_relative(price, period = period, product = product) ~ period + business, na.rm = TRUE ) ms_index1 <- aggregate(ms_elemental1, pias, na.rm = TRUE) ms_index1 ``` Nothing special needs to be done to make the elemental indexes for the new basket, but it's easier to remove the index values of 1 for quarter 3 2020. ```{r} ms_elemental2 <- ms_prices2 |> transform(rel = price_relative(price, period = period, product = product)) |> subset(period > "202003") |> elemental_index(rel ~ period + business, na.rm = TRUE) ``` Aggregating these elemental indexes, however, requires an aggregation structure. The results of the first example can be reproduced by simply "price updating" the original weights, then building the aggregation structure as usual. ```{r} ms_index2 <- aggregate(ms_elemental2, update(pias, ms_index1), na.rm = TRUE) ms_index2 ``` This produces two sets of period-over-period indexes that can be stacked together and then chained. ```{r} chain(stack(ms_index1, ms_index2)) ``` ## Carry-forward imputation The previous examples used parental imputation to both impute missing price relatives when calculating the elemental indexes, and to impute missing elemental indexes during aggregation. Another common imputation strategy when making elemental indexes is to carry forward the previous price to impute for missing prices, and parentally impute missing elemental indexes during aggregation. As the `elemental_index()` function accepts price relatives as its input, other types of imputations can be done prior to passing price relatives to this function. ```{r} ms_elemental2 <- ms_prices |> transform(imputed_price = carry_forward(price, period = period, product = product)) |> elemental_index( price_relative(imputed_price, period = period, product = product) ~ period + business, na.rm = TRUE ) ms_elemental2 ``` Aggregation follows the same steps as in the previous examples, with missing values set to be ignored in order to parentally impute missing elemental indexes. ```{r} ms_index <- aggregate(ms_elemental2, pias, na.rm = TRUE) ms_index ``` ## Calculating an index from multiple sources All of the examples so far have built an index from a single source of price data. In many cases the elemental indexes are built from multiple sources of data, either because no single source of data has the necessary coverage, or different index-number formulas are employed for different elemental aggregates. It is straightforward to merge index objects together, provided they're for the same time periods. To keep the example simple, suppose that `ms_prices` is split in two. ```{r} ms_prices1 <- subset(ms_prices, business %in% c("B1", "B2", "B3")) ms_prices2 <- subset(ms_prices, business == "B4") ``` Elemental indexes can be made for both groups separately with the usual recipe. Note that there is no data for business B4 in the first two periods, so the time periods need to be made explicit. ```{r} ms_elemental1 <- elemental_index( ms_prices1, price_relative(price, period = period, product = product) ~ period + business, na.rm = TRUE ) ms_elemental1 ms_elemental2 <- ms_prices2 |> transform(period = factor(period, levels = time(ms_elemental1))) |> elemental_index( price_relative(price, period = period, product = product) ~ period + business, na.rm = TRUE ) ms_elemental2 ``` Once the elemental indexes are made, they can be merged together and then aggregated. ```{r} aggregate(merge(ms_elemental1, ms_elemental2), pias, na.rm = TRUE) ``` A slightly more complex case is when some of the input data are already a price index. For example, suppose the index values for businesses B4 and B5 come from some outside process, and are taken as inputs. ```{r} ms_prices2 <- subset( as.data.frame(aggregate(ms_elemental, pias, na.rm = TRUE)), level %in% c("B4", "B5") ) ms_prices2 ``` All that is required is to pass the pre-existing indexes to `as_index()` to cast them into the correct form. This won't affect their values, but will allow them to be merged with the other elemental indexes, and aggregated. ```{r} ms_elemental2 <- as_index(ms_prices2) aggregate(merge(ms_elemental1, ms_elemental2), pias, na.rm = TRUE) ``` ## Aggregating a Paasche index All of the examples so far have used a single set of weights to aggregate an index. Although this is by far the most common case, there are situations where the aggregation weights change every period. The Paasche index is the notable example, as the weights for aggregation are the current-period revenue shares in each period. ```{r} weights <- data.frame( period = rep(c("202001", "202002", "202003", "202004"), each = 5), classification = ms_weights$classification, weight = 1:20 ) head(weights) ``` The only new tools needed to deal with time-varying weights are the `stack()` and `unstack()` functions. `stack()` appends a later index series onto an earlier one for the same levels, whereas `unstack()` pulls apart an index series for many periods into a collection of one-period indexes. The first step to making a Paasche index is to unstack the elemental indexes into a list of elemental indexes for each period. (Trying to make the elemental indexes period-by-period can be dangerous when there are missing values.) ```{r} ms_elemental <- unstack(ms_elemental) ms_elemental ``` The second step is to make a sequence of aggregation structures for each set of weights. ```{r} pias <- with( weights, Map(aggregation_structure, list(hierarchy), split(weight, period)) ) ``` Making the Paasche index for each period is now just a case of mapping the `aggregate()` function to each elemental index and aggregation structure, and then reducing the result with the `stack()` function. ```{r} paasche <- Reduce( stack, Map(aggregate, ms_elemental, pias, na.rm = TRUE, r = -1) ) paasche ``` ## Making a Fisher index With a Paasche index in hand, it is now trivial to make a Fisher index by first making the period-over-period Laspeyres index, and then doing a simple matrix operation. ```{r} laspeyres <- Reduce( stack, Map(aggregate, ms_elemental, pias[c(1, 1, 2, 3)], na.rm = TRUE) ) fisher <- sqrt(as.matrix(laspeyres) * as.matrix(paasche)) fisher ``` Percent-change contributions can similarly be computed with a matrix operation. ```{r} geometric_weights <- transmute_weights(0, 1) w <- mapply( \(x, y) scale_weights(geometric_weights(c(x, y))), as.numeric(laspeyres[1]), as.numeric(paasche[1]) ) laspeyres_contrib <- contrib(laspeyres) paasche_contrib <- contrib(paasche) fisher_contrib <- w[1, col(laspeyres_contrib)] * laspeyres_contrib + w[2, col(paasche_contrib)] * paasche_contrib fisher_contrib ``` Despite being a matrix, the resulting Fisher index can be chained just like any other index. ```{r} chain(fisher) ``` A chained Fisher index can also be made by first chaining the Laspeyres and Paasche indexes, then taking the geometric mean. ```{r} sqrt(as.matrix(chain(laspeyres)) * as.matrix(chain(paasche))) ```