Hierarchical Risk Parity
Hierarchical Risk Parity (HRP), developed by Marcos Lopez de Prado, combines hierarchical clustering with inverse-variance allocation to create more stable and robust portfolios. Unlike mean-variance optimization, HRP does not require matrix inversion of the covariance matrix, making it numerically stable even when the number of assets is large relative to the sample size.
Overview
Algorithm Steps:
- Tree Clustering: Group assets based on correlation distance
- Quasi-Diagonalization: Reorder covariance matrix to group similar assets
- Recursive Bisection: Allocate weights through the hierarchy using inverse-variance weighting
Advantages
- More stable out-of-sample performance
- Less sensitive to estimation errors
- No need for matrix inversion (avoids numerical instability)
- Intuitive interpretation through hierarchical structure
HRP was introduced in Lopez de Prado (2016), “Building Diversified Portfolios that Outperform Out of Sample”. The paper demonstrates that HRP outperforms both mean-variance and risk parity in Monte Carlo experiments with estimation noise.
Implementation
Puffin provides several HRP utilities in puffin.portfolio:
| Function | Description |
|---|---|
hrp_weights |
Returns a NumPy array of HRP weights |
hrp_weights_with_names |
Returns a pandas Series with asset names as the index |
plot_dendrogram |
Visualizes the hierarchical clustering tree |
hrp_allocation_stats |
Detailed per-asset allocation statistics |
Computing HRP Weights
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from puffin.portfolio import (
hrp_weights,
hrp_weights_with_names,
plot_dendrogram,
hrp_allocation_stats
)
# Load historical returns
returns = pd.read_csv('returns.csv', index_col=0, parse_dates=True)
# Calculate HRP weights
hrp_w = hrp_weights_with_names(returns)
print("HRP Weights:")
print(hrp_w)
# Visualize hierarchical clustering
plt.figure(figsize=(12, 6))
linkage_matrix, dend = plot_dendrogram(returns, linkage_method='single')
plt.title('Asset Clustering Dendrogram')
plt.xlabel('Assets')
plt.ylabel('Distance')
plt.show()
# Get detailed allocation statistics
stats = hrp_allocation_stats(returns, hrp_w.values)
print("\nHRP Allocation Statistics:")
print(stats)
The dendrogram is one of the most useful diagnostic plots in HRP. Assets that merge at low distance are highly correlated and will share a subtree. If two assets you consider distinct merge early, investigate whether their correlation has increased recently.
Comparing Linkage Methods
The choice of linkage method affects how the hierarchy is built and, consequently, the final weights. The four standard methods are:
- Single: Merges clusters by the minimum pairwise distance (can produce long chains)
- Complete: Merges by maximum pairwise distance (produces compact clusters)
- Average: Uses the mean pairwise distance (a compromise)
- Ward: Minimizes within-cluster variance (tends to produce balanced trees)
# Test different clustering methods
methods = ['single', 'complete', 'average', 'ward']
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
axes = axes.ravel()
for i, method in enumerate(methods):
weights = hrp_weights(returns, linkage_method=method)
axes[i].bar(range(len(returns.columns)), weights)
axes[i].set_xticks(range(len(returns.columns)))
axes[i].set_xticklabels(returns.columns, rotation=45)
axes[i].set_title(f'HRP Weights ({method.capitalize()} Linkage)')
axes[i].set_ylabel('Weight')
axes[i].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Different linkage methods can produce materially different weight vectors. Always compare several methods and validate with out-of-sample backtests before committing to one.
Performance Analysis with Tearsheets
Once portfolio weights are determined, it is essential to evaluate performance using standard risk-return metrics. Puffin provides a tearsheet module for this purpose.
Computing Portfolio Statistics
from puffin.portfolio import (
compute_stats,
generate_tearsheet,
print_tearsheet_summary
)
# Create portfolio returns using HRP weights
portfolio_returns = (returns * hrp_w.values).sum(axis=1)
# Compute statistics
stats = compute_stats(portfolio_returns, risk_free_rate=0.02, periods_per_year=252)
print("Portfolio Performance:")
print(f"Annual Return: {stats['annual_return']:.2%}")
print(f"Annual Volatility: {stats['annual_vol']:.2%}")
print(f"Sharpe Ratio: {stats['sharpe']:.3f}")
print(f"Sortino Ratio: {stats['sortino']:.3f}")
print(f"Maximum Drawdown: {stats['max_dd']:.2%}")
print(f"Calmar Ratio: {stats['calmar']:.3f}")
print(f"VaR (95%): {stats['var_95']:.2%}")
print(f"CVaR (95%): {stats['cvar_95']:.2%}")
Generating Tearsheets
A tearsheet aggregates cumulative returns, drawdowns, rolling statistics, and benchmark comparisons into a single report.
# Generate comprehensive tearsheet
# If you have a benchmark (e.g., S&P 500 returns)
benchmark_returns = pd.read_csv('sp500_returns.csv', index_col=0, parse_dates=True)
tearsheet = generate_tearsheet(
portfolio_returns,
benchmark=benchmark_returns.squeeze(),
risk_free_rate=0.02
)
# Print formatted summary
print_tearsheet_summary(tearsheet)
# Access specific metrics
print(f"\nBeta: {tearsheet.get('beta', 'N/A'):.3f}")
print(f"Alpha: {tearsheet.get('alpha', 'N/A'):.2%}")
print(f"Information Ratio: {tearsheet.get('information_ratio', 'N/A'):.3f}")
Visualization
from puffin.portfolio import (
plot_returns,
plot_drawdown,
plot_monthly_returns,
plot_rolling_metrics
)
# Cumulative returns
fig1 = plot_returns(portfolio_returns, benchmark=benchmark_returns.squeeze())
plt.show()
# Drawdown analysis
fig2 = plot_drawdown(portfolio_returns)
plt.show()
# Monthly returns heatmap
fig3 = plot_monthly_returns(portfolio_returns)
plt.show()
# Rolling performance metrics
fig4 = plot_rolling_metrics(portfolio_returns, window=252)
plt.show()
The rolling Sharpe chart (
plot_rolling_metrics) is especially useful for detecting regime changes. A sustained drop in the 1-year rolling Sharpe may indicate that the correlation structure has shifted and the portfolio needs re-clustering.
Portfolio Rebalancing
Rebalancing Strategies
After constructing optimal weights, portfolios drift as asset prices move. Rebalancing restores the target allocation but incurs transaction costs. Puffin’s RebalanceEngine supports several scheduling strategies.
from puffin.portfolio import (
RebalanceEngine,
CostModel,
rebalance_schedule,
backtest_rebalancing
)
# Define transaction cost model
cost_model = CostModel(
commission_pct=0.001, # 0.1% commission
commission_fixed=1.0, # $1 fixed fee
slippage_pct=0.0005, # 0.05% slippage
min_commission=1.0 # $1 minimum
)
# Create rebalancing engine
engine = RebalanceEngine(cost_model=cost_model)
# Define target weights (e.g., using HRP)
target_weights = dict(zip(returns.columns, hrp_w.values))
# Monthly rebalancing schedule
monthly_schedule = rebalance_schedule(strategy='monthly')
# Backtest the strategy
backtest_result = backtest_rebalancing(
returns,
target_weights,
monthly_schedule,
initial_value=100000.0,
cost_model=cost_model
)
print("Rebalancing Backtest Results:")
print(f"Final Portfolio Value: ${backtest_result['portfolio_value'].iloc[-1]:,.2f}")
print(f"Total Transaction Costs: ${backtest_result['transaction_costs'].iloc[-1]:,.2f}")
print(f"Number of Rebalances: {backtest_result['rebalanced'].sum()}")
Threshold-Based Rebalancing
Calendar-based rebalancing (monthly, quarterly) trades on a fixed schedule regardless of drift. Threshold-based rebalancing only triggers when weights deviate beyond a tolerance, often reducing turnover and costs.
# Rebalance only when weights drift significantly
threshold_schedule = rebalance_schedule(strategy='threshold', threshold=0.05)
threshold_result = backtest_rebalancing(
returns,
target_weights,
threshold_schedule,
initial_value=100000.0,
cost_model=cost_model
)
print(f"\nThreshold Rebalancing:")
print(f"Number of Rebalances: {threshold_result['rebalanced'].sum()}")
print(f"Total Transaction Costs: ${threshold_result['transaction_costs'].iloc[-1]:,.2f}")
Cost-Aware Rebalancing
The most sophisticated approach performs a cost-benefit analysis before each potential rebalance, only executing trades when the expected improvement exceeds the transaction costs.
# Current portfolio state (after drift)
current_weights = {
'AAPL': 0.27,
'GOOGL': 0.24,
'MSFT': 0.26,
'AMZN': 0.23
}
# Current prices
prices = {
'AAPL': 150.0,
'GOOGL': 2800.0,
'MSFT': 300.0,
'AMZN': 3200.0
}
# Decide whether to rebalance based on cost-benefit analysis
result = engine.optimize_with_costs(
current_weights,
target_weights,
portfolio_value=100000.0,
prices=prices,
cost_threshold=0.001 # Only rebalance if benefit exceeds 0.1% of portfolio
)
print("\nCost-Benefit Analysis:")
print(f"Should Rebalance: {result['should_rebalance']}")
print(f"Expected Benefit: ${result['expected_benefit']:.2f}")
print(f"Transaction Costs: ${result['total_cost']:.2f}")
print(f"Benefit-Cost Ratio: {result['benefit_cost_ratio']:.2f}")
if result['should_rebalance']:
print("\nProposed Trades:")
for trade in result['trades']:
action = "BUY" if trade.quantity > 0 else "SELL"
print(f" {action} {abs(trade.quantity):.2f} shares of {trade.symbol} @ ${trade.price:.2f}")
The cost-benefit analysis uses expected tracking error reduction as the “benefit”. This estimate is only as good as the covariance matrix. In volatile markets, consider using a shorter lookback window for covariance estimation.
Comparing Rebalancing Strategies
from puffin.portfolio import compare_rebalancing_strategies
# Compare multiple strategies
strategies = ['monthly', 'quarterly', 'threshold']
comparison = compare_rebalancing_strategies(
returns,
target_weights,
strategies=strategies,
initial_value=100000.0,
cost_model=cost_model
)
# Plot comparison
plt.figure(figsize=(12, 6))
for strategy, result in comparison.items():
plt.plot(result.index, result['portfolio_value'], label=strategy.capitalize())
plt.xlabel('Date')
plt.ylabel('Portfolio Value ($)')
plt.title('Rebalancing Strategy Comparison')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
# Summary statistics
print("\nStrategy Comparison:")
for strategy, result in comparison.items():
final_value = result['portfolio_value'].iloc[-1]
total_costs = result['transaction_costs'].iloc[-1]
n_rebalances = result['rebalanced'].sum()
print(f"{strategy.capitalize()}:")
print(f" Final Value: ${final_value:,.2f}")
print(f" Total Costs: ${total_costs:,.2f}")
print(f" Rebalances: {n_rebalances}")
print(f" Return: {(final_value/100000 - 1)*100:.2f}%")
print()
Complete Example: Multi-Strategy Portfolio
This example brings together all three optimization methods – mean-variance, risk parity, and HRP – backtests each with monthly rebalancing, and compares their performance side by side.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from puffin.portfolio import (
MeanVarianceOptimizer,
risk_parity_weights,
hrp_weights,
compute_stats,
plot_returns,
RebalanceEngine,
CostModel,
rebalance_schedule,
backtest_rebalancing
)
# Load data
returns = pd.read_csv('returns.csv', index_col=0, parse_dates=True)
# Strategy 1: Maximum Sharpe Ratio
optimizer = MeanVarianceOptimizer()
max_sharpe = optimizer.max_sharpe(returns, risk_free_rate=0.02)
sharpe_weights = dict(zip(returns.columns, max_sharpe['weights']))
# Strategy 2: Risk Parity
rp_w = risk_parity_weights(returns)
rp_weights = dict(zip(returns.columns, rp_w))
# Strategy 3: Hierarchical Risk Parity
hrp_w = hrp_weights(returns)
hrp_weights_dict = dict(zip(returns.columns, hrp_w))
# Backtest each strategy
cost_model = CostModel(commission_pct=0.001, slippage_pct=0.0005)
schedule = rebalance_schedule(strategy='monthly')
strategies = {
'Max Sharpe': sharpe_weights,
'Risk Parity': rp_weights,
'HRP': hrp_weights_dict
}
results = {}
for name, weights in strategies.items():
result = backtest_rebalancing(
returns,
weights,
schedule,
initial_value=100000.0,
cost_model=cost_model
)
results[name] = result
# Compare performance
fig, axes = plt.subplots(2, 1, figsize=(14, 10))
# Portfolio values
for name, result in results.items():
axes[0].plot(result.index, result['portfolio_value'], label=name, linewidth=2)
axes[0].set_xlabel('Date')
axes[0].set_ylabel('Portfolio Value ($)')
axes[0].set_title('Strategy Performance Comparison')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# Cumulative returns
for name, result in results.items():
cumulative = (result['portfolio_value'] / 100000 - 1) * 100
axes[1].plot(result.index, cumulative, label=name, linewidth=2)
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Cumulative Return (%)')
axes[1].set_title('Cumulative Returns')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Print summary statistics
print("Strategy Performance Summary:")
print("=" * 70)
for name, result in results.items():
# Calculate portfolio returns
portfolio_returns = result['portfolio_value'].pct_change().dropna()
stats = compute_stats(portfolio_returns, risk_free_rate=0.02/252, periods_per_year=252)
print(f"\n{name}:")
print(f" Final Value: ${result['portfolio_value'].iloc[-1]:,.2f}")
print(f" Total Return: {(result['portfolio_value'].iloc[-1]/100000 - 1)*100:.2f}%")
print(f" Annual Return: {stats['annual_return']:.2%}")
print(f" Annual Volatility: {stats['annual_vol']:.2%}")
print(f" Sharpe Ratio: {stats['sharpe']:.3f}")
print(f" Max Drawdown: {stats['max_dd']:.2%}")
print(f" Transaction Costs: ${result['transaction_costs'].iloc[-1]:,.2f}")
Source Code
Browse the implementation: puffin/portfolio/