Stochastic Volatility & Bayesian Pairs Trading

This page covers two advanced Bayesian applications for trading: dynamic hedge ratios for pairs trading and stochastic volatility models. Both leverage PyMC’s MCMC sampling to provide uncertainty-aware estimates that update as new data arrives.

Unlike GARCH models, which specify volatility as a deterministic function of past returns, stochastic volatility models treat volatility as a latent random process, allowing more flexible modeling of volatility dynamics.

Dynamic Hedge Ratios for Pairs Trading

In pairs trading, the hedge ratio (beta between two assets) changes over time. Bayesian methods allow us to track this with uncertainty quantification, so we know not only the current hedge ratio but also how confident we are in it.
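To build intuition for what the model estimates, here is a frequentist sketch on synthetic data: a rolling OLS slope with its standard error standing in for the posterior mean and std that the Bayesian fit below produces. All names and parameter values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pair: y tracks x with a beta that drifts from 1.0 to 1.5
n = 500
x = rng.normal(0, 0.01, n)
beta_true = np.linspace(1.0, 1.5, n)
y = beta_true * x + rng.normal(0, 0.005, n)

window = 60
betas, beta_ses = [], []
for t in range(window, n):
    xw, yw = x[t - window:t], y[t - window:t]
    # OLS slope over the trailing window
    beta = np.sum(xw * yw) / np.sum(xw ** 2)
    resid = yw - beta * xw
    # Standard error of the slope: a rough proxy for the posterior std
    se = np.sqrt(np.sum(resid ** 2) / (window - 1) / np.sum(xw ** 2))
    betas.append(beta)
    beta_ses.append(se)

print(f"final beta estimate: {betas[-1]:.2f} +/- {beta_ses[-1]:.2f}")
```

The rolling estimate drifts upward with the true beta; the Bayesian version adds proper priors and a full posterior instead of a point estimate with a standard error.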

Basic Pairs Trading

from puffin.models.bayesian import BayesianPairsTrading
import yfinance as yf

# Download pair
start_date = '2022-01-01'
end_date = '2023-12-31'

stock_y = yf.download('PEP', start=start_date, end=end_date, auto_adjust=False)['Adj Close']
stock_x = yf.download('KO', start=start_date, end=end_date, auto_adjust=False)['Adj Close']

# Initialize model
pairs_model = BayesianPairsTrading()

# Fit dynamic hedge ratios
hedge_ratios = pairs_model.fit_dynamic_hedge(
    stock_y,
    stock_x,
    window=60  # 60-day rolling window
)

print(hedge_ratios.tail())

Output:

            hedge_ratio_mean  hedge_ratio_std    spread
Date
2023-12-21             1.23             0.08      2.45
2023-12-22             1.24             0.09      2.31
2023-12-26             1.22             0.08      2.67
2023-12-27             1.23             0.08      2.52
2023-12-28             1.25             0.09      2.18

Generate Trading Signals

import numpy as np

# Generate signals based on spread z-score
signals = pairs_model.generate_signals(
    entry_threshold=2.0,  # Enter when |z-score| > 2
    exit_threshold=0.5    # Exit when |z-score| < 0.5
)

# Backtest signals
returns_y = stock_y.pct_change()
returns_x = stock_x.pct_change()

# Strategy return: signal * (y - hedge_ratio * x),
# lagging signals one bar so today's position uses yesterday's signal (no lookahead)
strategy_returns = signals.shift(1) * (
    returns_y - hedge_ratios['hedge_ratio_mean'] * returns_x
)

print(f"Strategy Sharpe: {strategy_returns.mean() / strategy_returns.std() * np.sqrt(252):.2f}")

The hedge_ratio_std column tells you how uncertain the hedge ratio estimate is. When uncertainty is high, consider reducing position sizes or widening entry thresholds.
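One concrete way to act on that advice is to shrink exposure as the relative uncertainty of the hedge ratio grows. A toy sketch (the `hedge_demo` frame mirrors the columns shown above; the scaling constant is an illustrative choice, not part of the library):

```python
import pandas as pd

# Stand-in for the hedge_ratios output shown above
hedge_demo = pd.DataFrame({
    'hedge_ratio_mean': [1.23, 1.24, 1.22],
    'hedge_ratio_std':  [0.08, 0.15, 0.06],
})

# Relative uncertainty: std as a fraction of the mean
rel_uncertainty = hedge_demo['hedge_ratio_std'] / hedge_demo['hedge_ratio_mean']

# Scale exposure down smoothly as uncertainty rises; cap at full size
position_scale = (1.0 / (1.0 + 10 * rel_uncertainty)).clip(upper=1.0)

print(position_scale.round(2).tolist())
```

The row with the widest posterior (std 0.15) gets the smallest position, without a hard on/off threshold.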

Visualize Hedge Ratio Evolution

import matplotlib.pyplot as plt

fig, axes = plt.subplots(3, 1, figsize=(12, 10))

# Plot prices
axes[0].plot(stock_y.index, stock_y.values, label='PEP')
axes[0].plot(stock_x.index, stock_x.values, label='KO')
axes[0].set_title('Asset Prices')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot hedge ratio with uncertainty
axes[1].plot(hedge_ratios.index, hedge_ratios['hedge_ratio_mean'], label='Hedge Ratio')
axes[1].fill_between(
    hedge_ratios.index,
    hedge_ratios['hedge_ratio_mean'] - 2 * hedge_ratios['hedge_ratio_std'],
    hedge_ratios['hedge_ratio_mean'] + 2 * hedge_ratios['hedge_ratio_std'],
    alpha=0.3,
    label='95% Credible Interval'
)
axes[1].set_title('Dynamic Hedge Ratio')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Plot spread with entry/exit thresholds
rolling_spread = hedge_ratios['spread'].rolling(20)
spread_zscore = (hedge_ratios['spread'] - rolling_spread.mean()) / rolling_spread.std()

axes[2].plot(spread_zscore.index, spread_zscore.values, label='Spread Z-Score')
axes[2].axhline(y=2.0, color='r', linestyle='--', label='Entry Threshold')
axes[2].axhline(y=-2.0, color='r', linestyle='--')
axes[2].axhline(y=0.5, color='g', linestyle='--', label='Exit Threshold')
axes[2].axhline(y=-0.5, color='g', linestyle='--')
axes[2].set_title('Spread Z-Score')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Stochastic Volatility Models

Stochastic volatility (SV) models treat volatility as a latent time-varying process, providing more flexible modeling than GARCH.

Model Specification

The standard SV model:

r_t = σ_t × ε_t                    (returns equation)
log(σ_t²) = μ + φ(log(σ_{t-1}²) - μ) + η_t   (volatility equation)

Where:

  • σ_t: volatility at time t (unobserved)
  • μ: long-run mean of log-volatility
  • φ: persistence (0 < φ < 1)
  • ε_t, η_t: independent standard normal innovations
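The two equations above can be simulated directly, which is a useful sanity check before fitting. A minimal numpy sketch (parameter values are illustrative, not estimates):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative SV parameters: long-run log-variance mu,
# persistence phi, and vol-of-vol sigma_eta
mu, phi, sigma_eta = -9.0, 0.95, 0.2
n = 1000

log_var = np.empty(n)
log_var[0] = mu
for t in range(1, n):
    # Volatility equation: AR(1) in log-variance
    log_var[t] = mu + phi * (log_var[t - 1] - mu) + sigma_eta * rng.normal()

sigma = np.exp(log_var / 2)           # sigma_t recovered from log(sigma_t^2)
returns = sigma * rng.normal(size=n)  # returns equation: r_t = sigma_t * eps_t

print(f"mean daily vol: {sigma.mean():.4f}")
```

With mu = -9 the long-run daily volatility is exp(-4.5), roughly 1.1%, a plausible equity-index level; phi near 1 makes volatility clusters persist.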

Basic Usage

from puffin.models.stochastic_vol import StochasticVolatilityModel
import yfinance as yf

# Download data
spy = yf.download('SPY', start='2022-01-01', end='2023-12-31', auto_adjust=False)
returns = spy['Adj Close'].pct_change().dropna()

# Fit stochastic volatility model
sv_model = StochasticVolatilityModel()
sv_model.fit(returns, samples=2000, tune=1000)

# Extract volatility path
vol_path = sv_model.volatility_path
vol_forecast = sv_model.volatility_forecast

print(f"Current volatility: {vol_path[-1]:.4f}")
print(f"Next-period forecast: {vol_forecast:.4f}")

# Get parameter estimates
summary = sv_model.summary()
print(summary)

Visualize Volatility

# Plot volatility over time
fig = sv_model.plot_volatility()
plt.show()

# Plot parameter posteriors
fig = sv_model.plot_posterior()
plt.show()

Quick Volatility Estimation

For faster estimation without storing the full model:

from puffin.models.stochastic_vol import estimate_volatility_regime

# Quick estimation
vol_df = estimate_volatility_regime(returns, samples=1000)

print(vol_df.tail())

Output:

            volatility  vol_lower  vol_upper
Date
2023-12-21      0.0092     0.0078     0.0108
2023-12-22      0.0095     0.0080     0.0112
2023-12-26      0.0089     0.0075     0.0105
2023-12-27      0.0091     0.0077     0.0107
2023-12-28      0.0093     0.0079     0.0109

Trading Applications

1. Volatility-Adjusted Position Sizing

# Scale positions by inverse volatility
target_vol = 0.01  # 1% daily vol target

position_sizes = target_vol / vol_df['volatility']

# Cap maximum leverage
position_sizes = position_sizes.clip(upper=2.0)

2. Dynamic Stop Losses

# Set stops at 2 standard deviations
stop_distance = 2 * vol_df['volatility']

# Wider stops in high vol, tighter in low vol

3. Volatility Regime Detection

import pandas as pd

# Identify high/low volatility regimes
vol_median = vol_df['volatility'].median()

regime = pd.Series('normal', index=vol_df.index)
regime[vol_df['volatility'] > 1.5 * vol_median] = 'high_vol'
regime[vol_df['volatility'] < 0.5 * vol_median] = 'low_vol'

print(regime.value_counts())

Volatility regime detection can be used as a meta-strategy: reduce position sizes or switch to mean-reversion strategies during high-volatility regimes, and increase trend-following exposure during low-volatility regimes.
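A minimal sketch of such a meta-strategy maps the regime labels above to exposure multipliers (the multipliers are illustrative choices, not library defaults):

```python
import pandas as pd

# Hypothetical regime labels and base strategy returns
regimes = pd.Series(['normal', 'high_vol', 'low_vol', 'high_vol'])
base_returns = pd.Series([0.01, -0.02, 0.005, 0.015])

# Exposure map: de-risk in high vol, lever up modestly in low vol
exposure = regimes.map({'low_vol': 1.25, 'normal': 1.0, 'high_vol': 0.5})
adjusted_returns = exposure * base_returns

print(adjusted_returns.tolist())
```

Halving exposure in the high-vol regime cuts the -2% loss to -1% while only trimming the high-vol gain, which is the asymmetry this overlay is after.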

Practical Examples

Walk-Forward Pairs Trading

from puffin.models.bayesian import BayesianPairsTrading
import yfinance as yf
import numpy as np
import pandas as pd

# Download multiple pairs
pairs = [
    ('XOM', 'CVX'),   # Energy
    ('JPM', 'BAC'),   # Banks
    ('PG', 'KO')      # Consumer
]

results = []

for stock1, stock2 in pairs:
    # Download data
    data1 = yf.download(stock1, start='2022-01-01', end='2023-12-31', auto_adjust=False)['Adj Close']
    data2 = yf.download(stock2, start='2022-01-01', end='2023-12-31', auto_adjust=False)['Adj Close']

    # Fit pairs model
    pairs_model = BayesianPairsTrading()
    hedge_df = pairs_model.fit_dynamic_hedge(data1, data2, window=60)

    # Generate signals
    signals = pairs_model.generate_signals(entry_threshold=2.0)

    # Calculate returns, lagging signals one bar to avoid lookahead
    ret1 = data1.pct_change()
    ret2 = data2.pct_change()
    strategy_ret = signals.shift(1) * (ret1 - hedge_df['hedge_ratio_mean'] * ret2)

    # Compute performance
    sharpe = strategy_ret.mean() / strategy_ret.std() * np.sqrt(252)

    results.append({
        'pair': f'{stock1}-{stock2}',
        'sharpe': sharpe,
        'mean_hedge': hedge_df['hedge_ratio_mean'].mean(),
        'hedge_std': hedge_df['hedge_ratio_std'].mean()
    })

results_df = pd.DataFrame(results)
print(results_df.sort_values('sharpe', ascending=False))

Volatility-Scaled Portfolio

from puffin.models.stochastic_vol import estimate_volatility_regime
import yfinance as yf
import numpy as np
import pandas as pd

# Download multiple assets
tickers = ['SPY', 'QQQ', 'IWM', 'EFA']
data = yf.download(tickers, start='2022-01-01', end='2023-12-31', auto_adjust=False)['Adj Close']
returns = data.pct_change().dropna()

# Estimate volatility for each asset
volatilities = {}
for ticker in tickers:
    vol_df = estimate_volatility_regime(returns[ticker], samples=1000)
    volatilities[ticker] = vol_df['volatility']

vol_df = pd.DataFrame(volatilities)

# Equal risk contribution portfolio
# Weight inversely proportional to volatility
weights = (1 / vol_df).div((1 / vol_df).sum(axis=1), axis=0)

# Calculate portfolio returns
portfolio_returns = (returns * weights).sum(axis=1)

# Compare to equal-weight
equal_weight_returns = returns.mean(axis=1)

print(f"Equal Risk Sharpe: {portfolio_returns.mean() / portfolio_returns.std() * np.sqrt(252):.2f}")
print(f"Equal Weight Sharpe: {equal_weight_returns.mean() / equal_weight_returns.std() * np.sqrt(252):.2f}")

Best Practices

Prior Selection

  • Weakly Informative Priors: Balance between regularization and data influence
  • Domain Knowledge: Use reasonable ranges based on market behavior
  • Sensitivity Analysis: Check robustness to prior choice
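One quick sensitivity check is a prior predictive exercise: draw the persistence parameter from a candidate prior and inspect the volatility half-life it implies. The Beta prior here is an illustrative choice, not the library's default:

```python
import numpy as np

rng = np.random.default_rng(1)

# Candidate weakly informative prior for phi: Beta(20, 1.5) puts
# most mass near 1 (persistent volatility) while excluding phi >= 1.
# Clip away the exact boundaries so log(phi) stays finite.
phi_draws = np.clip(rng.beta(20, 1.5, size=10_000), 1e-6, 1 - 1e-6)

# Implied half-life of a volatility shock under AR(1): phi^h = 0.5
half_life = np.log(0.5) / np.log(phi_draws)

print(f"median implied half-life: {np.median(half_life):.1f} periods")
```

If the implied half-lives are implausible for your market (say, mostly under a day or over a year for daily equity data), the prior is doing more work than intended and should be revisited.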

Computational Efficiency

  • Start Small: Use fewer samples for development, scale up for production
  • Parallel Chains: Use cores=4 parameter in pm.sample()
  • Vectorization: Process multiple assets in parallel
  • Caching: Store fitted models for reuse
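A caching helper along these lines keys stored fits on a hash of the input data, so a refit only happens when the data changes. The helper and its file layout are illustrative, not part of puffin:

```python
import hashlib
import pickle
from pathlib import Path

def cached_fit(returns, fit_fn, cache_dir='model_cache'):
    """Fit a model, or load a previously saved fit for identical data."""
    Path(cache_dir).mkdir(exist_ok=True)
    # Key the cache on the input data so stale models are never reused
    key = hashlib.sha256(pickle.dumps(returns)).hexdigest()[:16]
    path = Path(cache_dir) / f'{key}.pkl'
    if path.exists():
        return pickle.loads(path.read_bytes())
    model = fit_fn(returns)
    path.write_bytes(pickle.dumps(model))
    return model

# Usage with a stand-in fit function: any picklable result works
result = cached_fit([0.01, -0.02, 0.005],
                    fit_fn=lambda r: {'mean': sum(r) / len(r)})
print(result)
```

The same pattern wraps an expensive MCMC fit: the first call pays the sampling cost, repeat calls on unchanged data load from disk.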

Production Considerations

# Save fitted model
import pickle

with open('bayesian_model.pkl', 'wb') as f:
    pickle.dump(sv_model, f)

# Load later
with open('bayesian_model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

Bayesian models can be computationally expensive. For production use, consider fitting models during off-market hours and caching results. The estimate_volatility_regime function provides a faster alternative when full posterior analysis is not needed.

Source Code

Browse the implementation: puffin/models/