Bayesian Sharpe Ratio
The Sharpe ratio is a key metric for strategy evaluation, but traditional estimates don’t account for estimation uncertainty. A strategy with 252 days of data might have a Sharpe of 1.5, but how confident are we in that number? Bayesian methods provide the full distribution of possible Sharpe ratios, enabling principled strategy comparison and selection.
The Bayesian Sharpe ratio uses a Student’s t-distribution for returns rather than a normal distribution, making it more robust to the fat tails commonly observed in financial data.
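The library's exact model is not reproduced here, but the core idea can be sketched with a closed-form Monte Carlo posterior. This simplified version uses a normal likelihood with a Jeffreys prior (not the Student's t likelihood the library uses), and the function name is illustrative only:

```python
import numpy as np

def sharpe_posterior_sketch(returns, samples=5000, periods=252, seed=0):
    """Draw posterior samples of the annualized Sharpe ratio.

    Simplified sketch: normal likelihood + Jeffreys prior, which gives a
    closed-form posterior for (mu, sigma^2). The library's bayesian_sharpe
    instead uses a Student's t likelihood fitted by sampling.
    """
    rng = np.random.default_rng(seed)
    r = np.asarray(returns)
    n, xbar, s2 = len(r), r.mean(), r.var(ddof=1)
    # sigma^2 | data ~ scaled inverse chi-square: (n-1) * s2 / chi2_{n-1}
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=samples)
    # mu | sigma^2, data ~ Normal(xbar, sigma^2 / n)
    mu = rng.normal(xbar, np.sqrt(sigma2 / n))
    # Annualized Sharpe ratio for each posterior draw
    sharpe = mu / np.sqrt(sigma2) * np.sqrt(periods)
    return {
        'mean': sharpe.mean(),
        'std': sharpe.std(),
        'prob_positive': (sharpe > 0).mean(),
    }
```

Every summary statistic falls out of the posterior draws directly: the mean is the point estimate, the spread of draws quantifies estimation uncertainty, and `prob_positive` is just the fraction of draws above zero.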
Basic Usage
```python
from puffin.models.bayesian import bayesian_sharpe
import numpy as np

# Simulate strategy returns
np.random.seed(42)
returns = np.random.randn(252) * 0.015 + 0.0008  # Daily returns

# Compute Bayesian Sharpe ratio
sharpe_stats = bayesian_sharpe(returns, samples=5000)

print(f"Posterior Mean Sharpe: {sharpe_stats['mean']:.2f}")
print(f"94% HDI: [{sharpe_stats['hdi_low']:.2f}, {sharpe_stats['hdi_high']:.2f}]")
print(f"P(Sharpe > 0): {sharpe_stats['prob_positive']:.1%}")
```
The `bayesian_sharpe` function returns a dictionary with:

| Key | Description |
|---|---|
| `mean` | Posterior mean of the annualized Sharpe ratio |
| `std` | Posterior standard deviation |
| `hdi_low` | Lower bound of the 94% Highest Density Interval |
| `hdi_high` | Upper bound of the 94% Highest Density Interval |
| `prob_positive` | Posterior probability that the Sharpe ratio is positive |
The width of the HDI depends on sample size. A strategy with only 60 days of data will have a much wider interval than one with 3 years of data – reflecting the greater uncertainty in the estimate.
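The sample-size effect can be quantified with the standard large-sample approximation for the Sharpe estimator's standard error, se ≈ sqrt((1 + SR²/2) / n) (Lo, 2002). This is a frequentist approximation, not the library's posterior, but it shows the same roughly 1/√n shrinkage of the interval:

```python
import numpy as np

def sharpe_standard_error(sr_annual, n_days, periods=252):
    """Approximate standard error of an annualized Sharpe estimate.

    Large-sample formula: se(SR_daily) ~= sqrt((1 + SR_daily^2 / 2) / n),
    annualized by sqrt(periods). Intervals shrink roughly as 1/sqrt(n).
    """
    sr_daily = sr_annual / np.sqrt(periods)
    se_daily = np.sqrt((1 + 0.5 * sr_daily**2) / n_days)
    return se_daily * np.sqrt(periods)

for n in (60, 252, 756):
    print(f"{n:4d} days: se ~= {sharpe_standard_error(1.5, n):.2f}")
```

For a true annualized Sharpe of 1.5, 60 days of data gives a standard error above 2.0, while three years of data brings it under 0.6 — a direct analogue of the wide-versus-narrow HDI described above.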
Strategy Comparison
Comparing strategies using point estimates of the Sharpe ratio can be misleading. Bayesian comparison accounts for estimation uncertainty and tells you the probability that one strategy truly outperforms another.
```python
from puffin.models.bayesian import compare_strategies_bayesian
import numpy as np

# Simulate three strategies
np.random.seed(42)
momentum_returns = np.random.randn(252) * 0.012 + 0.001
mean_reversion_returns = np.random.randn(252) * 0.010 + 0.0008
ml_strategy_returns = np.random.randn(252) * 0.015 + 0.0012

# Compare strategies
comparison = compare_strategies_bayesian({
    'Momentum': momentum_returns,
    'Mean Reversion': mean_reversion_returns,
    'ML Strategy': ml_strategy_returns,
}, samples=5000)

print(comparison)
```
Output:

```text
   rank        strategy  sharpe_mean  sharpe_std  hdi_low  hdi_high  prob_positive
0     1     ML Strategy         1.27        0.18     0.92      1.61           1.00
1     2        Momentum         1.13        0.17     0.80      1.45           1.00
2     3  Mean Reversion         1.01        0.16     0.70      1.32           1.00
```
Notice that while ML Strategy ranks first, its HDI overlaps with Momentum’s. This means we cannot conclude with high confidence that ML Strategy is truly better – a nuance that point estimates would miss entirely.
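The overlap can be turned into a direct probability statement. Given posterior draws for each strategy's Sharpe ratio, P(A > B) is simply the fraction of paired draws where A exceeds B. The draws below are simulated from the posterior summaries in the table above, purely for illustration; in practice they would come from the sampler behind `compare_strategies_bayesian`:

```python
import numpy as np

# Illustrative posterior draws, approximated as normals matching the
# summary table above (mean, std); real draws come from the sampler.
rng = np.random.default_rng(42)
ml_draws = rng.normal(1.27, 0.18, 5000)        # ML Strategy posterior
momentum_draws = rng.normal(1.13, 0.17, 5000)  # Momentum posterior

# Probability that ML Strategy truly outperforms Momentum
prob_ml_better = (ml_draws > momentum_draws).mean()
print(f"P(ML > Momentum) = {prob_ml_better:.2f}")
```

Here the probability lands around 0.7 — well short of the ~0.95 one might want before confidently preferring ML Strategy, exactly the nuance the overlapping HDIs hint at.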
Key Advantages
- Accounts for Estimation Uncertainty: Small sample sizes have wider credible intervals
- Robust to Outliers: Uses Student’s t-distribution instead of normal
- Probability Statements: Can ask “What’s the probability this strategy has positive Sharpe?”
- Principled Comparison: Compare strategies accounting for uncertainty
Example: Strategy Selection with Uncertainty
This example compares five strategies with different risk-return characteristics and visualizes the results with credible intervals.
```python
from puffin.models.bayesian import compare_strategies_bayesian
import matplotlib.pyplot as plt
import numpy as np

# Simulate 5 strategies with different characteristics
np.random.seed(42)
n_days = 252
strategies = {
    'High Sharpe Low Vol': np.random.randn(n_days) * 0.008 + 0.0010,
    'High Sharpe High Vol': np.random.randn(n_days) * 0.020 + 0.0025,
    'Low Sharpe Low Vol': np.random.randn(n_days) * 0.005 + 0.0003,
    'Negative Sharpe': np.random.randn(n_days) * 0.015 - 0.0005,
    'Zero Sharpe': np.random.randn(n_days) * 0.012,
}

# Compare all strategies
results = compare_strategies_bayesian(strategies, samples=5000)
print(results)

# Visualize posterior means with 94% credible intervals
fig, ax = plt.subplots(figsize=(10, 6))
x = range(len(results))
ax.barh(x, results['sharpe_mean'])
ax.errorbar(
    results['sharpe_mean'],
    x,
    xerr=[
        results['sharpe_mean'] - results['hdi_low'],
        results['hdi_high'] - results['sharpe_mean'],
    ],
    fmt='none',
    color='black',
    capsize=5,
)
ax.set_yticks(x)
ax.set_yticklabels(results['strategy'])
ax.set_xlabel('Sharpe Ratio')
ax.set_title('Strategy Comparison (with 94% Credible Intervals)')
ax.axvline(x=0, color='r', linestyle='--', alpha=0.5)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```
When comparing strategies, ensure the return series cover the same time period. Comparing a strategy tested during a bull market with one tested during a bear market will produce misleading results regardless of the statistical method used.
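One simple way to enforce a common period, assuming the return series are pandas Series indexed by date (the series names here are hypothetical):

```python
import numpy as np
import pandas as pd

# Two return series covering different (overlapping) date ranges
idx_a = pd.date_range('2023-01-02', periods=252, freq='B')
idx_b = pd.date_range('2023-04-03', periods=252, freq='B')
rng = np.random.default_rng(0)
strat_a = pd.Series(rng.normal(0.0010, 0.012, 252), index=idx_a)
strat_b = pd.Series(rng.normal(0.0008, 0.010, 252), index=idx_b)

# Restrict both series to the overlapping dates before comparing
common = strat_a.index.intersection(strat_b.index)
aligned = {
    'A': strat_a.loc[common].to_numpy(),
    'B': strat_b.loc[common].to_numpy(),
}
print(len(common), "overlapping days")
```

The aligned arrays can then be passed to the comparison function, so both strategies are judged on identical market conditions.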
When to Use Bayesian Sharpe
The Bayesian approach to Sharpe ratio estimation is particularly valuable when:
- Limited data: The strategy has fewer than 2 years of live or backtest data
- Strategy selection: Choosing among multiple candidate strategies for deployment
- Risk budgeting: Allocating capital proportional to confidence in each strategy
- Regime changes: Assessing whether a strategy’s performance has genuinely shifted
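As an illustration of the risk-budgeting use case, here is a hypothetical confidence-weighted allocation. The strategy names and numbers are invented, standing in for summaries returned by `bayesian_sharpe`, and the scoring rule (posterior-mean Sharpe scaled by `prob_positive`, floored at zero) is one simple choice among many:

```python
# Hypothetical posterior summaries for three candidate strategies
stats = {
    'Momentum':       {'mean': 1.13, 'prob_positive': 1.00},
    'Mean Reversion': {'mean': 1.01, 'prob_positive': 1.00},
    'Carry':          {'mean': 0.30, 'prob_positive': 0.78},
}

# Score = posterior-mean Sharpe scaled by the probability it is
# positive; strategies with negative scores receive zero capital.
scores = {k: max(v['mean'] * v['prob_positive'], 0.0) for k, v in stats.items()}
total = sum(scores.values())
weights = {k: s / total for k, s in scores.items()}

for name, w in weights.items():
    print(f"{name}: {w:.1%}")
```

Strategies the posterior is less sure about automatically receive less capital, which is the point of budgeting on the full distribution rather than on point estimates.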
For strategies with extensive track records (5+ years), the frequentist and Bayesian estimates will largely agree, but the Bayesian approach still provides the useful `prob_positive` metric.
Source Code
Browse the implementation: puffin/models/