Data-Driven Risk Factors
Traditional factor models (Fama-French, Barra) use predefined factors such as market, size, and value. PCA instead extracts factors directly from return data, producing a purely statistical decomposition. This page covers factor extraction, exposure analysis, attribution, and trading signals built on data-driven factors.
Data-driven factors complement traditional models. They can capture structural patterns (e.g., sector rotations, crowded trades) that predefined factors miss.
Extract Risk Factors
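Use PCA to extract latent factors from a cross-section of returns. Each factor is a weighted combination of all assets: the first captures as much return variance as possible, and each later factor captures as much of the remaining variance as possible while staying uncorrelated with the earlier ones.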
from puffin.unsupervised import extract_risk_factors, factor_exposures
# Extract 5 data-driven factors
factors = extract_risk_factors(returns, n_factors=5)
print(factors.head())
#             Factor_1  Factor_2  Factor_3  Factor_4  Factor_5
# 2020-01-01     0.012     0.005    -0.003     0.001     0.002
# 2020-01-02    -0.008     0.010     0.001    -0.004     0.000
Each row is one trading day. Each column is the return of one statistical factor on that day. Factor 1 explains the most variance (typically the market factor), Factor 2 the next most, and so on.
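To make the extraction step concrete, here is a minimal NumPy sketch of PCA on a return matrix. It is not the `extract_risk_factors` implementation; the helper name `pca_factors` and the simulated returns are assumptions made for illustration.

```python
import numpy as np

def pca_factors(returns: np.ndarray, n_factors: int):
    """Hypothetical helper: PCA factors from a T x N matrix of returns."""
    demeaned = returns - returns.mean(axis=0)
    # SVD of the demeaned data: rows of vt are the principal directions,
    # ordered from largest to smallest singular value.
    _, s, vt = np.linalg.svd(demeaned, full_matrices=False)
    weights = vt[:n_factors].T            # N x K asset weights per factor
    factor_returns = demeaned @ weights   # T x K factor return series
    explained_ratio = (s**2 / np.sum(s**2))[:n_factors]
    return factor_returns, weights, explained_ratio

# Simulated data (assumption): six assets driven by one common "market" shock.
rng = np.random.default_rng(0)
market = rng.normal(0.0, 0.01, size=(500, 1))
returns = market @ np.ones((1, 6)) + rng.normal(0.0, 0.005, size=(500, 6))

factor_returns, weights, explained_ratio = pca_factors(returns, n_factors=3)
print(np.round(explained_ratio, 3))  # share of variance per factor, descending
```

The sign of each PCA factor is arbitrary, so a factor and its negation describe the same direction. Only the relative signs of the weights within a factor carry meaning.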
Factor Exposures
Factor exposures (also called loadings or betas) measure how sensitive each asset is to each factor.
# Compute factor exposures (betas)
loadings = factor_exposures(returns, factors)
print(loadings)
#        Factor_1  Factor_2  Factor_3  Factor_4  Factor_5
# AAPL       0.85      0.23      0.15      0.05      0.01
# GOOGL      0.80      0.30      0.10      0.08      0.02
# MSFT       0.75      0.35      0.05      0.10      0.03
A loading of 0.85 on Factor 1 means that a 1% move in Factor 1 is associated with a 0.85% move in that stock’s return, holding the other factors fixed. Similar, same-signed Factor 1 loadings across all stocks suggest that it is the market factor.
Interpreting the loading matrix:
- Factor 1: Near-uniform loadings suggest the market factor
- Factor 2: Positive/negative splits may indicate sector or style tilts
- Higher-order factors: Increasingly specific patterns (may be noise in small samples)
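One common way to estimate loadings is a time-series regression of each asset’s returns on the factor returns. The sketch below assumes that approach. `ols_exposures` is a hypothetical helper, not the `factor_exposures` implementation, and the data are simulated.

```python
import numpy as np

def ols_exposures(returns: np.ndarray, factors: np.ndarray) -> np.ndarray:
    """Hypothetical helper: regress each asset on the factors, return N x K betas."""
    # Include an intercept so non-zero mean returns do not distort the betas.
    design = np.column_stack([np.ones(len(factors)), factors])
    coefs, *_ = np.linalg.lstsq(design, returns, rcond=None)
    return coefs[1:].T  # drop the intercept row, transpose to assets x factors

# Simulated data (assumption): three assets with known betas on two factors.
rng = np.random.default_rng(1)
factors = rng.normal(0.0, 0.01, size=(750, 2))
true_betas = np.array([[0.85, 0.23], [0.80, 0.30], [0.75, 0.35]])
noise = rng.normal(0.0, 0.002, size=(750, 3))
returns = factors @ true_betas.T + noise

betas = ols_exposures(returns, factors)
print(np.round(betas, 2))  # estimates should land close to true_betas
```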
Factor Attribution
Decompose returns into the portion explained by common factors and the idiosyncratic (stock-specific) residual.
from puffin.unsupervised import factor_attribution, specific_risk
# Attribute returns to factors
attribution = factor_attribution(returns, factors, loadings)
# Specific (idiosyncratic) risk
spec_risk = specific_risk(returns, attribution)
print(spec_risk)
# AAPL 0.15
# GOOGL 0.18
# MSFT 0.12
The attribution DataFrame contains the factor-explained component of each asset’s returns. The difference between actual and attributed returns is the specific return – the portion unique to each asset.
Low specific risk means the asset moves primarily with common factors (high systematic risk). High specific risk means idiosyncratic events (earnings surprises, management changes) dominate.
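The decomposition can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the library’s code: `attribute` is a hypothetical helper, the data are simulated, and annualizing specific risk with a factor of sqrt(252) is a choice made here, not something the page specifies.

```python
import numpy as np

def attribute(returns, factors, loadings):
    """Hypothetical helper: split returns into factor-explained and specific parts."""
    explained = factors @ loadings.T   # T x N factor-explained returns
    specific = returns - explained     # T x N stock-specific returns
    return explained, specific

# Simulated data (assumption): three assets, two factors, daily noise of 0.4%.
rng = np.random.default_rng(2)
factors = rng.normal(0.0, 0.01, size=(500, 2))
loadings = np.array([[0.9, 0.1], [0.8, -0.3], [0.7, 0.4]])
returns = factors @ loadings.T + rng.normal(0.0, 0.004, size=(500, 3))

explained, specific = attribute(returns, factors, loadings)
# Specific risk as annualized residual volatility (annualization is an assumption).
spec_risk = specific.std(axis=0, ddof=1) * np.sqrt(252)
print(np.round(spec_risk, 3))
```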
Variance Decomposition
Quantify how much of each asset’s total variance comes from common factors versus idiosyncratic sources.
from puffin.unsupervised import factor_variance_decomposition
decomp = factor_variance_decomposition(returns, n_factors=5)
print(decomp)
#        total_variance  factor_variance  specific_variance  pct_factor  pct_specific
# AAPL            0.040            0.032              0.008        80.0          20.0
# GOOGL           0.050            0.038              0.012        76.0          24.0
# MSFT            0.045            0.036              0.009        80.0          20.0
Interpretation: 80% of AAPL’s variance is explained by common factors and 20% is stock-specific. This split varies by:
- Market cap: Large caps tend to have a higher factor share of variance (more correlated with the market)
- Sector: Utility stocks may have a lower factor share of variance than tech
- Time period: Factor explanatory power tends to increase during crises, when correlations rise
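The percentages are simply each component divided by total variance (for AAPL above, 0.032 / 0.040 = 80%). A minimal sketch of the calculation follows, assuming the factor component is the fitted value from an OLS regression on the factors. `variance_decomposition` is a hypothetical helper and the data are simulated.

```python
import numpy as np

def variance_decomposition(returns, factors):
    """Hypothetical helper: per-asset total, factor, and specific variance."""
    design = np.column_stack([np.ones(len(factors)), factors])
    coefs, *_ = np.linalg.lstsq(design, returns, rcond=None)
    fitted = design @ coefs
    total = returns.var(axis=0, ddof=1)
    factor = fitted.var(axis=0, ddof=1)
    specific = (returns - fitted).var(axis=0, ddof=1)
    return {
        "total_variance": total,
        "factor_variance": factor,
        "specific_variance": specific,
        "pct_factor": 100.0 * factor / total,
        "pct_specific": 100.0 * specific / total,
    }

# Simulated data (assumption): same betas, but the second asset is much noisier.
rng = np.random.default_rng(3)
factors = rng.normal(0.0, 0.01, size=(500, 2))
loadings = np.array([[1.0, 0.5], [1.0, 0.5]])
noise = rng.normal(0.0, 1.0, size=(500, 2)) * np.array([0.002, 0.02])
returns = factors @ loadings.T + noise

decomp = variance_decomposition(returns, factors)
print(np.round(decomp["pct_factor"], 1))
```

Because OLS residuals are uncorrelated with the fitted values in-sample, the factor and specific variances add up to the total variance exactly in this sketch.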
Factor Mimicking Portfolio
Create a tradeable portfolio that replicates a specific statistical factor. This converts an abstract PCA factor into an implementable strategy.
from puffin.unsupervised import factor_mimicking_portfolio
# Replicate Factor 1 (usually market factor)
portfolio = factor_mimicking_portfolio(returns, target_factor_idx=0, n_factors=5)
print(portfolio)
# AAPL 0.22
# GOOGL 0.20
# MSFT 0.19
# AMZN 0.21
# TSLA 0.18
The weights represent the portfolio that best tracks the target factor’s returns. For Factor 1, the near-equal weights are consistent with it being the market factor. For Factor 2, you would typically see a long-short split reflecting the style or sector tilt it captures.
Mimicking portfolios for higher-order factors can be noisy and may have high turnover. Consider using only Factors 1-3 for practical portfolio construction.
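One simple way to build such a portfolio is to rescale the factor’s eigenvector into portfolio weights. The sketch below assumes that construction; `factor_mimicking_portfolio` may use a different method (for example, a regression-based one). `mimicking_weights` is a hypothetical helper and the returns are simulated.

```python
import numpy as np

def mimicking_weights(returns, target_factor_idx=0):
    """Hypothetical helper: eigenvector of the covariance matrix as weights."""
    cov = np.cov(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    vec = eigvecs[:, order[target_factor_idx]]
    if vec.sum() < 0:                        # eigenvector sign is arbitrary
        vec = -vec
    return vec / np.abs(vec).sum()           # scale to gross exposure of 1

# Simulated data (assumption): five assets sharing one market shock.
rng = np.random.default_rng(4)
market = rng.normal(0.0, 0.01, size=(500, 1))
returns = market @ np.ones((1, 5)) + rng.normal(0.0, 0.005, size=(500, 5))

w = mimicking_weights(returns, target_factor_idx=0)
portfolio_returns = returns @ w   # daily returns of the mimicking portfolio
print(np.round(w, 2))
```

Scaling by the sum of absolute weights keeps gross exposure at 1 for both long-only and long-short factors. A dollar-neutral or fully invested normalization would be equally valid choices.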
Dynamic Factor Exposure
Factor exposures are not static. Track how an asset’s sensitivity to each factor evolves over time using rolling windows.
from puffin.unsupervised import dynamic_factor_exposure
import matplotlib.pyplot as plt
# Rolling 252-day window
exposures = dynamic_factor_exposure(returns, window=252, n_factors=3)
# exposures is a dict: Factor_1 -> DataFrame of rolling betas
factor_1_betas = exposures["Factor_1"]
print(factor_1_betas.head())
#               AAPL  GOOGL   MSFT   AMZN   TSLA
# 2021-01-01    0.82   0.78   0.75   0.80   0.85
# 2021-01-02    0.83   0.79   0.76   0.81   0.86
# Plot AAPL's exposure to Factor 1
plt.plot(factor_1_betas.index, factor_1_betas["AAPL"])
plt.title("AAPL Exposure to Factor 1")
plt.show()
Use cases for dynamic exposures:
- Risk monitoring: Detect when a stock’s market beta is drifting
- Hedging: Adjust hedge ratios as exposures change
- Regime detection: Sharp changes in exposure may signal regime shifts
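As a rough illustration of the rolling calculation, the sketch below re-estimates OLS betas over each trailing window while holding the factor series fixed. That is an assumption: `dynamic_factor_exposure` may instead re-extract the factors inside every window. `rolling_betas` is a hypothetical helper and the drifting beta is simulated.

```python
import numpy as np

def rolling_betas(asset, factors, window):
    """Hypothetical helper: OLS betas of one asset on K factors per trailing window."""
    n_obs, n_factors = factors.shape
    out = np.full((n_obs, n_factors), np.nan)   # NaN until a full window exists
    for end in range(window, n_obs + 1):
        x = factors[end - window:end]
        y = asset[end - window:end]
        design = np.column_stack([np.ones(window), x])
        coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
        out[end - 1] = coefs[1:]                # drop the intercept
    return out

# Simulated data (assumption): true beta drifts from 0.5 to 1.5 over 600 days.
rng = np.random.default_rng(5)
factors = rng.normal(0.0, 0.01, size=(600, 1))
beta_path = np.linspace(0.5, 1.5, 600)
asset = beta_path * factors[:, 0] + rng.normal(0.0, 0.002, size=600)

betas = rolling_betas(asset, factors, window=120)
print(round(betas[119, 0], 2), round(betas[-1, 0], 2))  # early vs. late estimate
```

Shorter windows react faster to changes in exposure but give noisier estimates; longer windows are smoother but lag.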
Factor Timing Signal
Generate trading signals based on factor momentum. If a factor has performed well recently, tilt toward assets with high exposure to it; if it has performed poorly, tilt away.
from puffin.unsupervised import factor_timing_signal
# Signal based on Factor 1's 21-day momentum
signals = factor_timing_signal(returns, factor_idx=0, n_factors=3, lookback=21)
print(signals.tail())
# 2020-12-27    1   # Long
# 2020-12-28    1   # Long
# 2020-12-29    0   # Neutral
# 2020-12-30   -1   # Short
# 2020-12-31   -1   # Short
The signal is +1 (long) when the factor’s recent return is above its historical average, -1 (short) when below, and 0 (neutral) otherwise. Combine this with factor exposures to construct a factor-timed portfolio.
Factor timing is notoriously difficult. Backtest carefully and account for transaction costs. Many practitioners use factor timing only to adjust tilts, not as a standalone strategy.
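A momentum rule of this kind might look like the following sketch. The exact rule inside `factor_timing_signal` is not documented on this page, so everything here is an assumption: the hypothetical `momentum_signal` helper compares the trailing `lookback`-day mean return with the mean over all history to date, and stays neutral until more than `lookback` observations exist.

```python
import numpy as np

def momentum_signal(factor_returns, lookback=21):
    """Hypothetical helper: +1 / -1 / 0 from recent vs. historical mean return."""
    factor_returns = np.asarray(factor_returns, dtype=float)
    signal = np.zeros(len(factor_returns), dtype=int)  # 0 = neutral by default
    for t in range(lookback, len(factor_returns)):
        # Both averages use data up to and including day t only (no look-ahead).
        recent = factor_returns[t - lookback + 1 : t + 1].mean()
        history = factor_returns[: t + 1].mean()
        signal[t] = int(np.sign(recent - history))
    return signal

# Toy series (assumption): 100 losing days followed by 100 winning days.
factor_returns = np.concatenate([np.full(100, -0.001), np.full(100, 0.001)])
signal = momentum_signal(factor_returns, lookback=21)
print(signal[-5:])
```

A signal computed from data through day t can only be traded from day t+1, so shift it by one period before multiplying by returns in a backtest.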
Practical Example: Portfolio Construction
Combine clustering and PCA for a robust, diversified portfolio.
import numpy as np
import pandas as pd
from puffin.unsupervised import (
    cluster_assets,
    cluster_summary,
    extract_risk_factors,
    factor_exposures,
)
# Load returns
returns = pd.read_csv("sp500_returns.csv", index_col=0, parse_dates=True)
# Step 1: Cluster assets into 5 groups
labels = cluster_assets(returns, n_clusters=5, method='kmeans')
summary = cluster_summary(returns, labels)
print(summary)
# Step 2: Select best asset from each cluster (highest Sharpe)
selected_assets = []
for cluster_id in summary['cluster']:
    cluster_assets_col = returns.columns[labels == cluster_id]
    cluster_returns = returns[cluster_assets_col]
    # Compute annualized Sharpe ratio (risk-free rate taken as zero)
    mean_ret = cluster_returns.mean() * 252
    std_ret = cluster_returns.std() * np.sqrt(252)
    sharpe = mean_ret / std_ret
    # Pick the asset with the highest Sharpe ratio in this cluster
    best_asset = sharpe.idxmax()
    selected_assets.append(best_asset)
print(f"Selected assets: {selected_assets}")
# Step 3: Extract factors for risk management
factors = extract_risk_factors(returns[selected_assets], n_factors=3)
loadings = factor_exposures(returns[selected_assets], factors)
print("Factor exposures:")
print(loadings)
# Step 4: Construct equal-weighted portfolio
portfolio_weights = pd.Series(1/len(selected_assets), index=selected_assets)
print("\nPortfolio weights:")
print(portfolio_weights)
This pipeline selects one representative asset per cluster to promote diversification, then inspects the selected assets’ factor exposures to check for unintended concentration.
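The concentration check itself comes down to a weighted sum of the loadings. The sketch below is illustrative only: the loading values are borrowed from the example output earlier on this page, and the 0.5 exposure limit is a hypothetical threshold, not a recommendation.

```python
import numpy as np

# Loadings for three assets on three factors (illustrative values).
loadings = np.array([
    [0.85, 0.23, 0.15],
    [0.80, 0.30, 0.10],
    [0.75, 0.35, 0.05],
])
weights = np.full(3, 1.0 / 3.0)   # equal-weighted portfolio

# Portfolio exposure to each factor is the weighted sum of asset loadings.
portfolio_exposure = weights @ loadings

# Flag factors whose absolute exposure exceeds a hypothetical limit.
limit = 0.5
flagged = np.flatnonzero(np.abs(portfolio_exposure) > limit)
print(np.round(portfolio_exposure, 3), flagged)
```

A long-only portfolio will naturally carry a large Factor 1 (market) exposure, so this check is usually more informative for the later factors. With pandas objects indexed by asset, the same quantity can be computed as `loadings.mul(portfolio_weights, axis=0).sum()`.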
Source Code
Browse the implementation: puffin/unsupervised/