Fast Design of Risk Parity Portfolios

Zé Vinícius and Daniel P. Palomar
The Hong Kong University of Science and Technology (HKUST)

2019-10-05



This vignette illustrates the design of risk parity portfolios, which are widely used by practitioners in the financial industry, with the package riskParityPortfolio, giving a description of the algorithms and comparing the performance against existing packages.

Quick Start

Let’s start by loading the package:

The simplest use is for the vanilla risk parity portfolio:

To obtain the naive diagonal solution, also known as inverse volatility portfolio, make use of the formulation argument:

For a more realistic formulation including the expected return in the objective and box constraints: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \sum_{i,j=1}^{N}\left(\dfrac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}}{b_i}-\dfrac{w_{j}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{j}}{b_j}\right)^{2} \color{blue}{- \lambda \;\mathbf{w}^T\boldsymbol{\mu}}\\ \textsf{subject to} & \mathbf{w} \ge \mathbf{0}, \quad\mathbf{1}^T\mathbf{w}=1, \quad\color{blue}{\mathbf{l}\le\mathbf{w}\le\mathbf{u}}. \end{array}\]

To plot and compare the results:

What is a Risk Parity Portfolio?

Signal model

Let us denote by \(\mathbf{r}_{t}\) the vector of the returns of \(N\) assets at time \(t\) and suppose they follow an i.i.d. distribution (which is not a totally accurate assumption, but it is widely adopted, nonetheless) with mean \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\).

The portfolio vector \(\mathbf{w}\) denotes the normalized dollar weights allocated to the \(N\) assets, such that \(\mathbf{1}^{T}\mathbf{w}=1\), and the portfolio return is then \(r_{t}^{\sf portf} = \mathbf{w}^{T}\mathbf{r}_{t}\), with expected return \(\mathbf{w}^{T}\boldsymbol{\mu}\) and variance \(\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\).

Modern Portfolio Theory

In 1952, Markowitz proposed in his seminal paper [1] to find a trade-off between the portfolio expected return and its risk measured by the variance: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}-\lambda\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{w} \ge \mathbf{0}, \quad\mathbf{1}^T\mathbf{w}=1, \end{array}\] where \(\lambda\) is a parameter that controls how risk-averse the investor is.

Markowitz’s portfolio has been heavily critized for over half a century and has never been fully embraced by practitioners, among many reasons because:

  • it only considers the risk of the portfolio as a whole and ignores the risk diversification (i.e., concentrates too much risk in few assets, which was observed in the 2008 financial crisis)
  • it is highly sensitive to the estimation errors in the parameters (i.e., small estimation errors in the parameters may change the designed portfolio drastically).

Although portfolio management did not change much during the 40 years after the seminal works of Markowitz and Sharpe, the development of risk budgeting techniques marked an important milestone in deepening the relationship between risk and asset management.

From “dollar” to risk diversification

Since the global financial crisis in 2008, risk management has particularly become more important than performance management in portfolio optimization. Indeed, risk parity became a popular financial model after the global financial crisis in 2008 [2], [3].

The alternative risk parity portfolio design has been receiving significant attention from both the theoretical and practical sides [4], [5] because it: (1) diversifies the risk, instead of the capital, among the assets and (2) is less sensitive to parameter estimation errors.

Nowadays, pension funds and institutional investors are using this approach in the development of smart indexing and the redefinition of long-term investment policies.

Risk parity is an approach to portfolio management that focuses on allocation of risk rather than allocation of capital. The risk parity approach asserts that when asset allocations are adjusted to the same risk level, the portfolio can achieve a higher Sharpe ratio and can be more resistant to market downturns.

While the minimum variance portfolio tries to minimize the variance (with the disadvantage that a few assets may be the ones contributing most to the risk), the risk parity portfolio tries to constrain each asset (or asset class, such as bonds, stocks, real estate, etc.) to contribute equally to the portfolio overall volatility.

The term “risk parity” was coined by Edward Qian from PanAgora Asset Management [2] and was then adopted by the asset management industry. Some of its theoretical components were developed in the 1950s and 1960s, but the first risk parity fund, called the “All Weather” fund, was pioneered by Bridgewater Associates LP in 1996. The interest in the risk parity approach has increased since the financial crisis in the late 2000s as the risk parity approach fared better than portfolios designed in traditional fashions.

Some portfolio managers have expressed skepticism with risk parity, while others point to its performance during the financial crisis of 2007-2008 as an indication of its potential success.

The idea of the risk parity portfolio (RPP), aka equal risk portfolio (ERP), is to “equalize” the risk so that the risk contribution of every asset is equal, rather than simply having an equal capital allocation like the equally weighted portfolio (EWP):

Risk parity portfolio

From Euler’s theorem, the volatility of the portfolio \(\sigma\left(\mathbf{w}\right)=\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\) can be decomposed as \[\sigma\left(\mathbf{w}\right)=\sum_{i=1}^{N}w_i\frac{\partial\sigma}{\partial w_i} = \sum_{i=1}^N\frac{w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}.\]

The risk contribution (RC) from the \(i\)th asset to the total risk \(\sigma(\mathbf{w})\) is defined as \[{\sf RC}_i =\frac{w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\] which satisfies \(\sum_{i=1}^{N}{\sf RC}_i=\sigma\left(\mathbf{w}\right)\).

The relative risk contribution (RRC) is a normalized version: \[{\sf RRC}_i = \frac{w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\] so that \(\sum_{i=1}^{N}{\sf RRC}_i=1\).

The risk parity portfolio (RPP) attemps to “equalize” the risk contributions: \[{\sf RC}_i = \frac{1}{N}\sigma(\mathbf{w})\quad\text{or}\quad{\sf RRC}_i = \frac{1}{N}.\]

More generally, the risk budgeting portfolio (RBP) attemps to allocate the risk according to the risk profile determined by the weights \(\mathbf{b}\) (with \(\mathbf{1}^T\mathbf{b}=1\) and \(\mathbf{b}\ge \mathbf{0}\)): \[{\sf RC}_i = b_i \sigma(\mathbf{w})\quad\text{or}\quad{\sf RRC}_i = b_i.\]

In practice, one can express the condition \({\sf RC}_i = \frac{1}{N}\sigma(\mathbf{w})\) in different equivalent ways such as \[w_i(\Sigma \mathbf{w})_{i} = w_j(\Sigma \mathbf{w})_{j}, \quad\forall i, j.\] The budget condition \({\sf RC}_i = b_i \sigma(\mathbf{w})\) can also be expressed as \[w_i (\Sigma \mathbf{w})_i = b_i \mathbf{w}^{T}\Sigma\mathbf{w}, \quad\forall i.\]

Solving the Risk Parity Portfolio (RPP)

Naive diagonal formulation

Assuming that the assets are uncorrelated, i.e., that \(\boldsymbol{\Sigma}\) is diagonal, and simply using the volatilities \(\boldsymbol{\sigma} = \sqrt{{\sf diag(\boldsymbol{\Sigma})}}\), one obtains \[\mathbf{w} = \frac{\boldsymbol{\sigma}^{-1}}{\mathbf{1}^T\boldsymbol{\sigma}^{-1}}\] or, more generally, \[\mathbf{w} = \frac{\sqrt{\mathbf{b}}\odot\boldsymbol{\sigma}^{-1}}{\mathbf{1}^T\left(\sqrt{\mathbf{b}}\odot\boldsymbol{\sigma}^{-1}\right)}.\] However, for non-diagonal \(\boldsymbol{\Sigma}\) or with other additional constraints or objective function terms, a closed-form solution does not exist and some optimization procedures have to be constructed. The previous diagonal solution can always be used and is called naive risk budgeting portfolio.

Vanilla convex formulation

Suppose we only have the constraints \(\mathbf{1}^T\mathbf{w}=1\) and \(\mathbf{w} \ge \mathbf{0}\). Then, after the change of variable \(\mathbf{x}=\mathbf{w}/\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\), the equations \(w_i (\Sigma \mathbf{w})_i = b_i \mathbf{w}^{T}\Sigma\mathbf{w}\) become \(x_i\left(\boldsymbol{\Sigma}\mathbf{x}\right)_i = b_i\) or, more compactly in vector form, as \[\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\] with \(\mathbf{x} \ge \mathbf{0}\) and we can always recover the portfolio by normalizing: \(\mathbf{w} = \mathbf{x}/(\mathbf{1}^T\mathbf{x})\).

At this point, one could use a nonlinear multivariate root finder for \(\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\). For example, in R we can use the package rootSolve.

With the goal of designing risk budget portfolios, Spinu proposed in [6] to solve the following convex optimization problem: \[\underset{\mathbf{x}\ge\mathbf{0}}{\textsf{minimize}} \quad \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \sum_{i=1}^{N}b_i\log(x_i),\] where the portfolio can be recovered as \(\mathbf{w} = \mathbf{x}/(\mathbf{1}^T\mathbf{x})\).

Indeed, Spinu realized in [6] that precisely the risk budgeting equation \(\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\) corresponds to the gradient of the convex function \(f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})\) set to zero: \[\nabla f(\mathbf{x}) = \boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}/\mathbf{x} = \mathbf{0}.\]

Thus, a convenient way to solve the problem is by solving the following convex optimization problem: \[\underset{\mathbf{x}\ge\mathbf{0}}{\textsf{minimize}} \quad \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})\] which has optimality condition \(\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\).

Such solution can be computed using a general-purpose convex optimization package, but faster algorithms such as the Newton method and the cyclical coordinate descent method, employed in [6] and [7], are implemented in this package. (Yet another convex formulation was proposed in [8].)

General nonconvex formulation

The previous methods are based on a convex reformulation of the problem so they are guaranteed to converge to the optimal risk budgeting solution. However, they can only be employed for the simplest risk budgeting formulation with a simplex constraint set (i.e., \(\mathbf{1}^T\mathbf{w}=1\) and \(\mathbf{w} \ge \mathbf{0}\)). They cannot be used if

  • we have other constraints like allowing shortselling or box constraints: \(l_i \le w_i \le u_i\)
  • on top of the risk budgeting constraints \(w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i = b_i \;\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\) we have other objectives like maximizing the expected return \(\mathbf{w}^T\boldsymbol{\mu}\) or minimizing the overall variance \(\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}\).

For those more general cases, we need more sophisticated formulations, which unfortunately are not convex. The idea is to try to achieve equal risk contributions \({\sf RC}_i = \frac{w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\) by penalizing the differences between the terms \(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}\).

There are many reformulations possible. For illustrative purposes, one such formulation is \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \sum_{i,j=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}-w_{j}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{j}\right)^{2} \color{blue}{- \;F(\mathbf{w})}\\ \textsf{subject to} & \mathbf{w} \ge \mathbf{0}, \quad\mathbf{1}^T\mathbf{w}=1, \quad\color{blue}{\mathbf{w}\in\cal{W}} \end{array}\] where \(F(\mathbf{w})\) denotes some additional objective function and \(\cal{W}\) denotes an arbitrary convex set of constraints. More expressions for the risk concentration terms are listed in Appendix I.

The way to solve this general problem is derived in [9], [10] and is based on a powerful optimization framework named successive convex approximation (SCA) [11]. See Appendix II for a general idea of the method.

Using the Package riskParityPortfolio

A simple code example on how to design a risk parity portfolio is as follows:

As presented earlier, risk parity portfolios are designed in such a way as to ensure equal risk contribution from the assests, which can be noted in the chart above.

Modern Risk Parity Portfolio

The design of risk parity portfolios as solved by [6] and [7] is of paramount importance both for academia and industry. However, practitioners would like the ability to include additional constraints and objective terms desired in practice, such as the mean return, box constraints, etc. In such cases, the risk-contribution constraints cannot be met exactly due to the trade-off among different objectives or additional constraints.

RPP with additional expected return term

Let us explore, for instance, the effect of including the expected return as an additional objective in the optimization problem. The problem can be formulated as \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & R(\mathbf{w}) - \lambda_{\mu} \mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1, \mathbf{w} \ge \mathbf{0}, \end{array}\] where \(R(\mathbf{w}) = \sum_{i=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i - b_i\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}\right)^{2}\) is the risk concentration function or risk parity function, \(\mathbf{w}^{T}\boldsymbol{\mu}\) is the expected return, and \(\lambda_{\mu}\) is a trade-off parameter.

RPP with additional variance term

Similarly, the riskParityPortfolio package allows us to include the variance \(\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\) as an objective term: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & R(\mathbf{w}) + \lambda_{\sf var} \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1, \mathbf{w} \ge \mathbf{0}, \end{array}\] where \(\lambda_{\sf var}\) is a trade-off parameter.

In the code, that can be done by passing a positive value to the parameter lmd_var. Let’s check the following illustrative example that depicts the trade-off between volatility and risk parity:

RPP with general linear constraints

In version 2.0, we added support for general linear constraints, i.e., riskParityPortfolio is now able to solve the following problem: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & R(\mathbf{w}) + \lambda F(\mathbf{w})\\ \textsf{subject to} & \mathbf{C}\mathbf{w} = \mathbf{c},~~\mathbf{D}\mathbf{w} \leq \mathbf{d}. \end{array}\]

Users interested in the details of the algorithm used to solve this problems are refered to Section V (Advanced Solving Approaches) of [9]. In summary, the algorithm fits well within the SCA framework, while preserving speed and scalability.

It was recently mentioned by [12] that the problem of designing risk parity portfolios with general constraints is harder than it seems. Indeed, [12] shows that, after imposing general linear constraints, the property of equal risk contributions (ERC) is unlikely to be preserved among the assets affected by the constraints.

Let’s check out a numerical example from [12]. Consider we have a universe of eight assets and we would like to design a risk parity portfolio \(\mathbf{w}\) satisfying the following constraints: \[ w_5 + w_6 + w_7 + w_8 \geq 30\%, \] \[ w_2 + w_6 \geq w_1 + w_5 + 5\%, \] and \[ \sum_i w_i = 100\% . \]

As we can observe, the risk contributions are somewhat clustered according to the relationship among assets defined by the linear constraints, as mentioned by [12].

The linear constraints are obviously satisfied:

The results obtained by our implementation agree with those reported by [12].

A pratical example using FAANG price data

In [13], the author showed how to build a risk parity portfolio for FAANG companies (Facebook, Apple, Amazon, Netflix, and Google) using riskParityPortfolio. Here, we will attempt to replicate their backtest results, but using the package portfolioBacktest instead.

library(xts)
library(portfolioBacktest)
library(riskParityPortfolio)

# download price data
faang_data <- stockDataDownload(c("GOOG", "NFLX", "AAPL", "AMZN", "FB"),
                                from = "2014-01-01", to = "2019-06-25")

# define portfolios to be backtested
# risk parity portfolio
risk_parity <- function(dataset) {
  prices <- dataset$adjusted
  log_returns <- diff(log(prices))[-1]
  return(riskParityPortfolio(cov(log_returns))$w)
}

# tangency portfolio (maximum sharpe ratio)
library(quadprog)
max_sharpe_ratio <- function(dataset) {
    prices <- dataset$adjusted
    log_returns <- diff(log(prices))[-1]
    N <- ncol(prices)
    Sigma <- cov(log_returns)
    mu <- colMeans(log_returns)
    if (all(mu <= 1e-8))
        return(rep(0, N))
    Dmat <- 2 * Sigma
    Amat <- diag(N)
    Amat <- cbind(mu, Amat)
    bvec <- c(1, rep(0, N))
    dvec <- rep(0, N)
    res <- solve.QP(Dmat = Dmat, dvec = dvec, Amat = Amat, bvec = bvec, meq = 1)
    w <- res$solution
    return(w/sum(w))
}

# call portfolioBacktest and benchmark against the uniform (1/N) portfolio
bt <- portfolioBacktest(list("risk parity portfolio" = risk_parity,
                             "tangency portfolio"    = max_sharpe_ratio),
                        list(faang_data),
                        T_rolling_window = 12*20, 
                        optimize_every = 3*20, rebalance_every = 3*20)

# dates of the designed portfolios
index(bt$tangency$data1$w_designed)
#>  [1] "2014-12-12" "2015-03-12" "2015-06-08" "2015-09-01" "2015-11-25" "2016-02-24" "2016-05-19" "2016-08-15"
#>  [9] "2016-11-08" "2017-02-06" "2017-05-03" "2017-07-28" "2017-10-23" "2018-01-19" "2018-04-17" "2018-07-12"
#> [17] "2018-10-05" "2019-01-03" "2019-04-01"

# check performance summary
backtestSummary(bt)$performance
#>                   risk parity portfolio tangency portfolio
#> Sharpe ratio                  1.3800144          0.8787596
#> max drawdown                  0.3062046          0.3516856
#> annual return                 0.3117200          0.2324203
#> annual volatility             0.2258817          0.2644868
#> Sterling ratio                1.0180122          0.6608751
#> Omega ratio                   1.2710283          1.1793760
#> ROT (bps)                  8310.1199557        793.0188434

# plot cumulative returns chart
backtestChartCumReturns(bt)

Indeed, the charts are quite similar to those reported in [13]. Note that the weights evolution of the Markowitz’s tangency portfolio is quite sensitive to minute changes in the rebalance/optimization dates, whereas no significant difference is noticed for the risk parity portfolio weights.

Comparison with other Packages

Others R packages with the goal of designing risk parity portfolios do exist, such as FinCovRegularization, cccp, and RiskPortfolios. Let’s check how do they perform against riskParityPortfolio. (Note that other packages like FRAPO use cccp under the hood.)

Depending on the condition number of the covariance matrix, we found that the packages FinCovRegularization and RiskPortfolios may fail unexpectedly. Apart from that, all functions perform similarly.

Now, let’s see a comparison, in terms of computational time, of our cyclical coordinate descent implementation against the rp() function from the cccp package and the optimalPortfolio() function from the RiskPortfolios package. (For a fair comparison, instead of calling our function riskParityPortfolio(), we call directly the core internal function risk_parity_portfolio_ccd_spinu(), which only computes the risk parity weights, just like rp() and optimalPortfolio().)

As it can be observed, our implementation is orders of maginitude faster than the interior-point method used by cccp and the formulation used by RiskPortfolios.

Appendix I - Risk concentration formulations

In general, with different constraints and objective functions, exact parity cannot be achieved and one needs to define a risk term to be minimized: \(R(\mathbf{w}) = \sum_{i=1}^{N}\left(g_{i}\left(\mathbf{w}\right)\right)^{2}\), where the \(g_{i}\)’s denote the different risk contribution errors, e.g., \(g_{i} = w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i - b_i\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}\). A double-index summation can also be used: \(R(\mathbf{w}) = \sum_{i,j=1}^{N}\left(g_{ij}\left(\mathbf{w}\right)\right)^{2}\).

We consider the risk formulations as presented in [9], [10]. They can be passed through the keyword argument formulation in the function riskParityPortfolio().

The name of the formulations and their mathematical expressions are presented as follows.

Formulation “rc-double-index”: \[R(\mathbf{w}) = \sum_{i,j=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}-w_{j}\left(\boldsymbol{\Sigma}\mathbf{w }\right)_{j}\right)^{2}\]

Formulation “rc-vs-theta”: \[ R(\mathbf{w},\theta) = \sum_{i=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i - \theta \right)^{2} \]

Formulation “rc-over-var-vs-b”: \[ R(\mathbf{w}) = \sum_{i=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}}-b_i\right)^{2} \]

Formulation “rc-over-b double-index”: \[ R(\mathbf{w}) = \sum_{i,j=1}^{N}\left(\frac{w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{b_i} - \frac{w_j\left(\boldsymbol{\Sigma}\mathbf{w}\right)_j}{b_j}\right)^{2} \]

Formulation “rc-vs-b-times-var”: \[ R(\mathbf{w}) = \sum_{i=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i - b_i\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}\right)^{2} \]

Formulation “rc-over-sd vs b-times-sd”: \[ R(\mathbf{w}) = \sum_{i=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\sqrt{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}}}-b_i\sqrt{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}}\right)^{2} \]

Formulation “rc-over-b vs theta”: \[ R(\mathbf{w},\theta) = \sum_{i=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{b_i} - \theta \right)^{2} \]

Formulation “rc-over-var”: \[ R(\mathbf{w}) = \sum_{i=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}}\right)^{2} \]

Appendix II - Numerical algorithms for the risk parity portfolio

In this appendix we describe the algorithms implemented for both the vanilla risk parity portfolio and the modern risk parity portfolio that may contain additional objective terms and constraints.

Algorithms for the vanilla risk parity formulation

We now describe the implementation of the Newton method and the cyclical (coordinate) descent algorithm for the vanilla risk parity formulations presented in [6] and [7].

Consider the risk budgeting equations \[w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i = b_i \;\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}, \qquad i=1,\ldots,N\] with \(\mathbf{1}^T\mathbf{w}=1\) and \(\mathbf{w} \ge \mathbf{0}\).

If we define \(\mathbf{x}=\mathbf{w}/\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\), then we can rewrite the risk budgeting equations compactly as \[\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\] with \(\mathbf{x} \ge \mathbf{0}\) and we can always recover the portfolio by normalizing: \(\mathbf{w} = \mathbf{x}/(\mathbf{1}^T\mathbf{x})\).

Spinu [6] realized that precisely that equation corresponds to the gradient of the function \(f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})\) set to zero, which is the optimality condition for its minimization.

So we can finally formulate the risk budgeting problem as the following convex optimization problem: \[\underset{\mathbf{x}\ge\mathbf{0}}{\textsf{minimize}} \quad \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x}).\]

Roncalli et al. [7] proposed a slightly different formulation (also convex): \[\underset{\mathbf{x}\ge\mathbf{0}}{\textsf{minimize}} \quad \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}} - \mathbf{b}^T\log(\mathbf{x}).\]

Unfortunately, even though these two problems are convex, they do not conform with the typical classes that most solvers embrace (i.e., LP, QP, QCQP, SOCP, SDP, GP, etc.).

Nevertheless, there are several simple iterative algorithms that can be used, like the Newton method and the cyclical coordinate descent algorithm.

Newton method
The Newton method obtains the iterates based on the gradient \(\nabla f\) and the Hessian \({\sf H}\) of the objective function \(f(\mathbf{x})\) as follows: \[\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - {\sf H}^{-1}(\mathbf{x}^{(k)})\nabla f(\mathbf{x}^{(k)})\]

In practice, one may need to use the backtracking method to properly adjust the step size of each iteration [14].

  • For the function \(f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})\), the gradient and Hessian are given by \[\begin{array}{ll} \nabla f(\mathbf{x}) &= \boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}/\mathbf{x}\\ {\sf H}(\mathbf{x}) &= \boldsymbol{\Sigma} + {\sf Diag}(\mathbf{b}/\mathbf{x}^2). \end{array}\]

  • For the function \(f(\mathbf{x}) = \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}} - \mathbf{b}^T\log(\mathbf{x})\), the gradient and Hessian are given by \[\begin{array}{ll} \nabla f(\mathbf{x}) &= \boldsymbol{\Sigma}\mathbf{x}/\sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}} - \mathbf{b}/\mathbf{x}\\ {\sf H}(\mathbf{x}) &= \left(\boldsymbol{\Sigma} - \boldsymbol{\Sigma}\mathbf{x}\mathbf{x}^T\boldsymbol{\Sigma}/\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}\right) / \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}} + {\sf Diag}(\mathbf{b}/\mathbf{x}^2). \end{array}\]

Cyclical coordinate descent algorithm
This method simply minimizes in a cyclical manner with respect to each element of the variable \(\mathbf{x}\) (denote \(\mathbf{x}_{-i}=[x_1,\ldots,x_{i-1},0,x_{i+1},\ldots,x_N]^T\)), while helding the other elements fixed.

  • For the function \(f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})\), the minimization w.r.t. \(x_i\) is \[\underset{x_i>0}{\textsf{minimize}} \quad \frac{1}{2}x_i^2\boldsymbol{\Sigma}_{ii} + x_i(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i}) - b_i\log{x_i}\] with gradient \(\nabla_i f = x_i\boldsymbol{\Sigma}_{ii} + (\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i}) - b_i/x_i\). Setting the gradient to zero gives us the second order equation \[x_i^2\boldsymbol{\Sigma}_{ii} + x_i(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i}) - b_i = 0\] with positive solution given by \[x_i^\star = \frac{-(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i})+\sqrt{(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i})^2+ 4\boldsymbol{\Sigma}_{ii} b_i}}{2\boldsymbol{\Sigma}_{ii}}.\]

  • The derivation for the function \(f(\mathbf{x}) = \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}} - \mathbf{b}^T\log(\mathbf{x})\) follows similarly. The update for \(x_i\) is given by \[x_i^\star = \frac{-(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i})+\sqrt{(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i})^2+ 4\boldsymbol{\Sigma}_{ii} b_i \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}}}}{2\boldsymbol{\Sigma}_{ii}}.\]

Successive convex approximation algorithm for the modern risk parity formulation

Many practical formulations deployed to design risk parity portfolios lead to nonconvex problems, specially when additional objective terms such as mean return or variance, or additional constraints, namely, shortselling, are taken into account. To circumvent the complications that arise in such formulations, Feng & Palomar [9] proposed a method called sucessive convex approximation (SCA). The SCA method works by convexifying the risk concentration term at some pre-defined point, casting the nonconvex problem into a much simpler strongly convex optimization problem. This procedure is then iterated until convergence is achieved. It is important to highlight that the SCA method always converges to a stationary point.

At the \(k\)-th iteration, the SCA method aims to solve \[\begin{align}\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \sum_{i=1}^{n}\left(g_i(\mathbf{w}^k) + (\nabla g_i(\mathbf{w}^{k}))^{T}(\mathbf{w} - \mathbf{w}^{k})\right)^2 + \tau ||\mathbf{w} - \mathbf{w}^{k}||^{2}_{2} + \lambda F(\mathbf{w})\\ \textsf{subject to} & \mathbf{w}^{T}\mathbf{1} = 1, \mathbf{w} \in \mathcal{W}, \end{array}\end{align}\] where the first order Taylor expasion of \(g_i(\mathbf{w})\) has been used.

After some mathematical manipulations described in detail in [9], the optimization problem above can be rewritten as \[\begin{align}\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \dfrac{1}{2}\mathbf{w}^{T}\mathbf{Q}^{k}\mathbf{w} + \mathbf{w}^{T}\mathbf{q}^{k} + \lambda F(\mathbf{w})\\ \textsf{subject to} & \mathbf{w}^{T}\mathbf{1} = 1, \mathbf{w} \in \mathcal{W}, \end{array}\end{align}\] where \[\begin{align} \mathbf{Q}^{k} & \triangleq 2(\mathbf{A}^{k})^{T}\mathbf{A}^{k} + \tau \mathbf{I},\\ \mathbf{q}^{k} & \triangleq 2(\mathbf{A}^{k})^{T}\mathbf{g}(\mathbf{w}^{k}) - \mathbf{Q}^{k}\mathbf{w}^{k}, \end{align}\] and \[\begin{align} \mathbf{A}^{k} & \triangleq \left[\nabla_{\mathbf{w}} g_{1}\left(\mathbf{w}^{k}\right), ..., \nabla_{\mathbf{w}} g_{n}\left(\mathbf{w}^{k}\right)\right]^{T} \\ \mathbf{g}\left(\mathbf{w}^{k}\right) & \triangleq \left[g_{1}\left(\mathbf{w}^{k}\right), ..., g_{n}\left(\mathbf{w}^{k}\right)\right]^{T}. \end{align}\]

The above problem is a quadratic program (QP) which can be efficiently solved by standard R libraries. Furthermore, it is straightforward that adding the mean return or variance terms still keeps the structure of the problem intact.

Appendix III - Computational time

In order to efficiently design high dimensional portfolios that follows the risk parity criterion, we implement the cyclical coordinate descent algorithm proposed by [7]. In brief, this algorithm optimizes for one portfolio weight at a time while leaving the rest fixed.

The plot below illustrates the computational scaling of both Newton and cyclical algorithms. Note that the cyclical algorithm is implemented for two different formulations used by [6] and [7], respectively. Nonetheless, they output the same solution, as they should.

References

[1] H. Markowitz, “Portfolio selection,” J. Financ., vol. 7, no. 1, pp. 77–91, 1952.

[2] E. Qian, “Risk parity portfolios: Efficient portfolios through true diversification,” PanAgora Asset Management, 2005.

[3] C. S. Asness, A. Frazzini, and L. H. Pedersen, “Leverage aversion and risk parity,” Financial Analysts Journal, vol. 68, no. 1, pp. 47–59, 2012.

[4] T. Roncalli, Introduction to risk parity and budgeting. CRC Press, 2013.

[5] E. E. Qian, Risk parity fundamentals. CRC Press, 2016.

[6] F. Spinu, “An algorithm for computing risk parity weights,” SSRN, 2013.

[7] T. Griveau-Billion, J.-C. Richard, and T. Roncalli, “A fast algorithm for computing high-dimensional risk parity portfolios,” SSRN, 2013.

[8] H. Kaya and W. Lee, “Demystifying risk parity,” Neuberger Berman, 2012.

[9] Y. Feng and D. P. Palomar, “SCRIP: Successive convex optimization methods for risk parity portfolios design,” IEEE Trans. Signal Processing, vol. 63, no. 19, pp. 5285–5300, 2015.

[10] Y. Feng and D. P. Palomar, A Signal Processing Perspective on Financial Engineering. Foundations; Trends in Signal Processing, Now Publishers, 2016.

[11] G. Scutari, F. Facchinei, P. Song, D. P. Palomar, and J.-S. Pang, “Decomposition by partial linearization: Parallel optimization of multi-agent systems,” IEEE Trans. Signal Processing, vol. 62, no. 3, pp. 641–656, 2014.

[12] J.-C. Richard and T. Roncalli, “Constrained risk budgeting portfolios: Theory, algorithms, applications, & puzzles,” SSRN, 2019.

[13] T. Souza, “DIY ray dalio etf: How to build your own hedge fund strategy with risk parity portfolios,” Medium: https://towardsdatascience.com/ray-dalio-etf-900edfe64b05. 2019.

[14] S. P. Boyd and L. Vandenberghe, Convex optimization. Cambridge University Press, 2004.