OLS, Ridge & Lasso Regression
This section covers the three core linear regression techniques for return prediction: Ordinary Least Squares (OLS), Ridge regression (L2 regularization), and Lasso regression (L1 regularization). Each method trades off bias and variance differently, making them suitable for different trading scenarios.
OLS Regression for Return Prediction
Ordinary Least Squares (OLS) minimizes the sum of squared residuals to estimate coefficients. It is the simplest linear model and the starting point for most regression-based trading strategies.
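In the notation used for the regularized objectives later in this section, OLS solves:
minimize ||y - Xb||^2
with no penalty term to constrain the size of the coefficient vector b.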
Basic Example
import pandas as pd
import numpy as np
from puffin.models.linear import OLSModel
from puffin.data import YFinanceProvider
# Fetch historical data
symbol = 'AAPL'
df = YFinanceProvider().fetch(symbol, start='2020-01-01', end='2023-12-31')
# Calculate returns and features
df['returns'] = df['close'].pct_change()
df['momentum_5'] = df['close'].pct_change(5)
df['momentum_20'] = df['close'].pct_change(20)
df['volume_ratio'] = df['volume'] / df['volume'].rolling(20).mean()
df['volatility'] = df['returns'].rolling(20).std()
# Prepare target (align today's features with tomorrow's return to avoid look-ahead bias)
df['target'] = df['returns'].shift(-1)  # Next day's return
df = df.dropna()
features = ['momentum_5', 'momentum_20', 'volume_ratio', 'volatility']
X = df[features]
y = df['target']
# Split into train/test
split_idx = int(len(df) * 0.8)
X_train, X_test = X[:split_idx], X[split_idx:]
y_train, y_test = y[:split_idx], y[split_idx:]
# Fit OLS model
model = OLSModel(add_constant=True)
model.fit(X_train, y_train)
# Examine results
print("Model Summary:")
summary = model.summary()
print(f"R-squared: {summary['r_squared']:.4f}")
print(f"Adjusted R-squared: {summary['adj_r_squared']:.4f}")
print(f"RMSE: {summary['rmse']:.6f}")
print("\nCoefficients:")
print(model.coefficients)
print("\nP-values:")
print(model.p_values)
# Make predictions
y_pred = model.predict(X_test)
# Calculate prediction accuracy
correlation = np.corrcoef(y_test, y_pred)[0, 1]
print(f"\nPrediction Correlation: {correlation:.4f}")
Interpreting OLS Results
Coefficients:
- Positive coefficient: an increase in the feature is associated with a higher predicted return
- Negative coefficient: an increase in the feature is associated with a lower predicted return
- Magnitude: effect size (e.g., a coefficient of 0.01 means a one-unit increase in the feature adds 0.01 to the predicted return, holding other features fixed)
Statistical Tests:
- P-value < 0.05: Feature is statistically significant
- R-squared: Proportion of variance explained (higher is better)
- Residuals: Should be randomly distributed (check for autocorrelation)
OLS is sensitive to outliers and multicollinearity. Always check the Variance Inflation Factor (VIF) for correlated features, and examine residual plots before relying on coefficient estimates.
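As a quick multicollinearity check, here is a minimal sketch using statsmodels (not part of the puffin API), continuing from the training data above:
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
# Add a constant so each VIF measures collinearity among the features themselves
X_vif = sm.add_constant(X_train)
vif = pd.Series(
    [variance_inflation_factor(X_vif.values, i) for i in range(1, X_vif.shape[1])],
    index=X_train.columns,
    name='VIF',
)
print(vif)  # Values above roughly 5-10 flag problematic collinearity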
Common Issues:
- Multicollinearity: Correlated features cause unstable coefficients
- Heteroscedasticity: Non-constant variance violates OLS assumptions
- Autocorrelation: Time-series dependencies violate independence assumption
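The last two issues can be checked directly on the training residuals. A minimal diagnostics sketch using statsmodels (assumed available; not part of the puffin API):
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan
import statsmodels.api as sm
# Residuals from the OLS fit on the training set
residuals = y_train - model.predict(X_train)
# Durbin-Watson: values near 2 indicate no first-order autocorrelation
print(f"Durbin-Watson: {durbin_watson(residuals):.2f}")
# Breusch-Pagan: a small p-value suggests heteroscedastic residuals
_, bp_pvalue, _, _ = het_breuschpagan(residuals, sm.add_constant(X_train))
print(f"Breusch-Pagan p-value: {bp_pvalue:.4f}")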
Regularization: Ridge and Lasso
Regularization adds penalties to prevent overfitting, especially with many features. Both Ridge and Lasso shrink coefficient magnitudes, but they do so differently, leading to distinct practical trade-offs.
Ridge Regression (L2 Regularization)
Ridge adds a penalty proportional to the sum of squared coefficients:
minimize ||y - Xb||^2 + alpha * ||b||_2^2
This shrinks all coefficients toward zero but never sets them exactly to zero.
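To see the shrinkage behavior directly, here is a small illustration using scikit-learn's Ridge rather than the puffin wrapper; the exact values depend on your data, but the pattern is that coefficients shrink as alpha grows without ever reaching exactly zero:
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
# Standardize so the penalty treats all features comparably
X_scaled = StandardScaler().fit_transform(X_train)
for alpha in [0.01, 1.0, 100.0]:
    coefs = Ridge(alpha=alpha).fit(X_scaled, y_train).coef_
    print(f"alpha={alpha:>7}: {np.round(coefs, 4)}")
The puffin RidgeModel below wraps the same idea and selects alpha by cross-validation: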
from puffin.models.linear import RidgeModel
# Ridge with cross-validated alpha selection
ridge = RidgeModel(alphas=np.logspace(-3, 3, 50), cv=5, normalize=True)
ridge.fit(X_train, y_train)
print(f"Selected alpha: {ridge.alpha:.4f}")
print("\nCoefficients:")
print(ridge.coefficients)
# Feature importance
print("\nFeature Importance:")
print(ridge.feature_importance())
# Predictions
y_pred_ridge = ridge.predict(X_test)
Ridge regression is often the best default choice when you have many correlated features (e.g., overlapping momentum windows). It stabilizes coefficient estimates without discarding any feature entirely.
When to use Ridge:
- Many correlated features
- Want to keep all features
- Multicollinearity present
- Need stable coefficient estimates
Lasso Regression (L1 Regularization)
Lasso adds a penalty proportional to the sum of the absolute values of the coefficients:
minimize ||y - Xb||^2 + alpha * ||b||_1
The key property of Lasso is that it can set coefficients exactly to zero, performing automatic feature selection.
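The sparsity effect is easy to see with scikit-learn's Lasso (again independent of the puffin wrapper); the alphas below are illustrative and would need tuning for your data:
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
X_scaled = StandardScaler().fit_transform(X_train)
for alpha in [1e-5, 1e-4, 1e-3]:
    coefs = Lasso(alpha=alpha, max_iter=10_000).fit(X_scaled, y_train).coef_
    kept = int(np.sum(coefs != 0))
    print(f"alpha={alpha}: {kept} of {len(features)} features kept")
The puffin LassoModel below exposes the surviving features directly: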
from puffin.models.linear import LassoModel
# Lasso with cross-validated alpha selection
lasso = LassoModel(alphas=np.logspace(-4, 0, 50), cv=5, normalize=True)
lasso.fit(X_train, y_train)
print(f"Selected alpha: {lasso.alpha:.6f}")
print("\nCoefficients:")
print(lasso.coefficients)
# Selected features (non-zero coefficients)
print("\nSelected Features:")
print(lasso.selected_features)
# Feature importance
print("\nFeature Importance:")
print(lasso.feature_importance())
When to use Lasso:
- Feature selection needed
- Many irrelevant features
- Want sparse model
- Interpretability important
In practice, Lasso is particularly useful when constructing alpha factors from a large pool of candidate signals. It automatically identifies which signals contribute to return prediction and discards the rest.
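As an illustration of that workflow, here is a hypothetical sketch that builds a wide pool of momentum signals and lets cross-validated Lasso pick the survivors; the mom_k features and the pool construction are illustrative, not part of the example above:
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
# Hypothetical candidate pool: momentum over many lookback horizons
pool = pd.DataFrame({f'mom_{k}': df['close'].pct_change(k) for k in range(1, 61)})
pool['target'] = df['returns'].shift(-1)
pool = pool.dropna()
feature_cols = [c for c in pool.columns if c != 'target']
X_pool = StandardScaler().fit_transform(pool[feature_cols])
lasso_cv = LassoCV(cv=5, max_iter=10_000).fit(X_pool, pool['target'])
survivors = [c for c, b in zip(feature_cols, lasso_cv.coef_) if b != 0]
print(f"{len(survivors)} of {len(feature_cols)} candidate signals kept:", survivors)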
Comparing OLS, Ridge, and Lasso
The following code compares all three models on the same test set, evaluating mean squared error, R-squared, and prediction correlation:
from sklearn.metrics import mean_squared_error, r2_score
models = {
'OLS': model,
'Ridge': ridge,
'Lasso': lasso,
}
print("Model Comparison:")
print("-" * 60)
for name, m in models.items():
y_pred = m.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
corr = np.corrcoef(y_test, y_pred)[0, 1]
print(f"{name:10s} - MSE: {mse:.6f}, R2: {r2:.4f}, Corr: {corr:.4f}")
Quick Comparison Guide
| Property | OLS | Ridge | Lasso |
|---|---|---|---|
| Regularization | None | L2 (squared) | L1 (absolute) |
| Feature selection | No | No | Yes |
| Coefficient behavior | Unbiased | Shrunk toward zero | Shrunk, some exactly zero |
| Best for | Few features, no collinearity | Many correlated features | Many irrelevant features |
| Overfitting risk | High with many features | Low | Low |
A common workflow: start with OLS to establish a baseline, switch to Ridge if you see multicollinearity or overfitting, and use Lasso when you suspect many features are irrelevant and want an automatically sparse model.
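One way to operationalize this workflow is a walk-forward comparison. The sketch below uses scikit-learn equivalents of the three models as stand-ins (assumptions: TimeSeriesSplit folds approximate a backtest, and scaling is refit inside each fold):
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
candidates = {
    'OLS': LinearRegression(),
    'Ridge': make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 50))),
    'Lasso': make_pipeline(StandardScaler(), LassoCV(max_iter=10_000)),
}
tscv = TimeSeriesSplit(n_splits=5)
for name, est in candidates.items():
    fold_mse = []
    for train_idx, test_idx in tscv.split(X):
        # Fit on the earlier window, evaluate on the later one
        est.fit(X.iloc[train_idx], y.iloc[train_idx])
        pred = est.predict(X.iloc[test_idx])
        fold_mse.append(mean_squared_error(y.iloc[test_idx], pred))
    print(f"{name:6s} mean walk-forward MSE: {np.mean(fold_mse):.6f}")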
Source Code
Browse the implementation: puffin/models/