Gradient Boosting
Gradient boosting builds trees sequentially, where each tree corrects the errors of the previous trees. This approach often achieves better accuracy than Random Forests, especially with careful hyperparameter tuning.
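The core mechanism can be sketched in a few lines. The snippet below is a minimal, framework-agnostic illustration of sequential error correction for squared loss; it assumes NumPy and scikit-learn are available and is not the Puffin API:
import numpy as np
from sklearn.tree import DecisionTreeRegressor
# Toy data: the target depends on the first two features plus noise
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)
learning_rate = 0.1
prediction = np.zeros_like(y)  # start from a constant (zero) prediction
for _ in range(50):
    residuals = y - prediction                     # errors left by the current ensemble
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)  # each new tree corrects the residuals
Each tree is fit to the residuals of the ensemble so far, which is why a small learning rate and a capped tree depth act as the main brakes on overfitting.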
Key Differences from Random Forests
| Aspect | Random Forest | Gradient Boosting |
|---|---|---|
| Training | Parallel (independent trees) | Sequential (error correction) |
| Bias-Variance | Reduces variance | Reduces bias |
| Tree depth | Can use deep trees | Typically shallow (3–7) |
| Learning rate | N/A | Critical hyperparameter |
| Overfitting risk | Lower | Higher (requires regularization) |
Gradient boosting’s sequential nature means each tree specializes in correcting specific errors. This produces a more accurate but potentially more fragile model, so always use regularization and early stopping with financial data.
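A minimal sketch of early stopping on a chronological validation split is shown below. It uses the public xgboost scikit-learn wrapper (version 1.6 or later) rather than the Puffin API, assumes features is a time-ordered DataFrame with target as the matching Series, and the 80/20 split and hyperparameter values are illustrative only:
import xgboost as xgb
# Chronological split: the validation window comes strictly after the training window
split = int(len(features) * 0.8)
X_train, y_train = features.iloc[:split], target.iloc[:split]
X_val, y_val = features.iloc[split:], target.iloc[split:]
clf = xgb.XGBClassifier(
    n_estimators=1000,          # upper bound; early stopping picks the effective number of trees
    learning_rate=0.05,
    max_depth=4,
    subsample=0.8,
    reg_lambda=1.0,
    early_stopping_rounds=50,   # stop once validation loss stops improving
    eval_metric="logloss",
)
clf.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
print("Best iteration:", clf.best_iteration)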
Three Major Implementations
Puffin supports three battle-tested gradient boosting frameworks:
- XGBoost: Optimized implementation with L1/L2 regularization
- LightGBM: Fast, memory-efficient histogram-based algorithm
- CatBoost: Native categorical feature handling with ordered boosting
XGBoost
XGBoost (eXtreme Gradient Boosting) is an optimized gradient boosting library with built-in regularization that has dominated structured-data ML competitions since 2015.
Implementation
from puffin.ensembles import XGBoostTrader
# Initialize with financial-optimized defaults
model = XGBoostTrader(task="classification", random_state=42)
# Fit with default parameters
model.fit(features, target)
# Or specify custom parameters
custom_params = {
    "learning_rate": 0.01,    # Lower = more conservative
    "max_depth": 5,
    "min_child_weight": 5,    # Higher = more regularization
    "subsample": 0.8,         # Row sampling
    "colsample_bytree": 0.8,  # Column sampling
    "reg_alpha": 0.1,         # L1 regularization
    "reg_lambda": 1.0,        # L2 regularization
    "n_estimators": 200,
}
model.fit(features, target, params=custom_params)
# Make predictions
predictions = model.predict(features)
# Plot feature importance
fig = model.plot_importance(max_features=10)
fig.savefig("xgboost_importance.png")
Hyperparameter Tuning
# Define parameter grid
param_grid = {
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 5, 7],
    "min_child_weight": [1, 5, 10],
    "subsample": [0.8, 1.0],
    "colsample_bytree": [0.8, 1.0],
    "n_estimators": [100, 200, 300],
}
# Tune hyperparameters
best_params = model.tune_hyperparameters(features, target, param_grid=param_grid, cv=5)
print("Best parameters:", best_params)
Start with a low learning rate (0.01–0.05) and increase n_estimators to compensate. A lower learning rate with more trees generally produces better generalization.
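As an illustration of that tradeoff using the XGBoostTrader API from the previous section (the parameter values here are hypothetical, not recommendations):
# Hypothetical settings illustrating the tradeoff: lower the learning rate, raise n_estimators
conservative = {"learning_rate": 0.01, "n_estimators": 600}  # slower, usually generalizes better
aggressive = {"learning_rate": 0.10, "n_estimators": 100}    # faster, higher overfitting risk
model.fit(features, target, params=conservative)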
LightGBM
LightGBM uses histogram-based algorithms for faster training and lower memory usage. It is particularly effective for large datasets common in cross-sectional equity strategies.
Key Features
- Leaf-wise tree growth: Grows trees leaf-by-leaf rather than level-by-level, finding better splits
- Histogram-based: Bins continuous features for faster splitting decisions
- Native categorical support: No need for one-hot encoding
- Efficient memory usage: Can handle datasets with millions of rows
LightGBM’s leaf-wise growth can overfit small datasets more easily than XGBoost’s level-wise approach. Use num_leaves carefully – it should be less than 2^max_depth.
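A quick consistency check (illustrative only) keeps num_leaves below the level-wise maximum implied by max_depth:
# Keep num_leaves below 2**max_depth so leaf-wise growth cannot exceed
# the capacity of the equivalent level-wise tree
max_depth = 5
num_leaves = min(31, 2 ** max_depth - 1)  # 31 here, since 2**5 = 32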
Implementation
from puffin.ensembles import LightGBMTrader
# Initialize model
model = LightGBMTrader(task="classification", random_state=42)
# Fit with categorical features
categorical_features = ["sector", "market_cap_category"]
model.fit(features, target, categorical_features=categorical_features)
# Tune hyperparameters
param_grid = {
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 5, 7],
    "num_leaves": [15, 31, 63],
    "min_child_samples": [10, 20, 30],
    "n_estimators": [100, 200],
}
best_params = model.tune_hyperparameters(
    features, target, param_grid=param_grid, cv=5, categorical_features=categorical_features
)
LightGBM vs XGBoost
For most trading applications:
- Speed: LightGBM trains 2–10x faster on large datasets
- Memory: LightGBM uses significantly less RAM
- Accuracy: Comparable; LightGBM may have a slight edge on large cross-sectional data
- Tuning: XGBoost is somewhat easier to tune for beginners
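The speed claim is easy to verify on your own data. The sketch below times the public lightgbm and xgboost scikit-learn wrappers directly (not the Puffin API) and assumes features contains only numeric columns:
import time
import lightgbm as lgb
import xgboost as xgb
# Time both frameworks on identical data with the same number of trees
for name, mdl in [("LightGBM", lgb.LGBMClassifier(n_estimators=200)),
                  ("XGBoost", xgb.XGBClassifier(n_estimators=200))]:
    start = time.perf_counter()
    mdl.fit(features, target)
    print(f"{name}: {time.perf_counter() - start:.1f}s")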
CatBoost
CatBoost (Categorical Boosting) provides native categorical feature handling without preprocessing, making it ideal for trading data with sector, industry, or rating categories.
Key Features
- Native categorical encoding: Handles categories without one-hot encoding
- Ordered boosting: Reduces overfitting through clever ordering of training examples
- Symmetric trees: Faster prediction at inference time
- Built-in regularization: Less prone to overfitting out of the box
Implementation
from puffin.ensembles import CatBoostTrader
# Prepare data with categorical features
features = df[["return_5d", "volatility_20d", "sector", "market_cap"]].copy()
features["sector"] = features["sector"].astype(str) # Ensure it's string type
# Initialize model
model = CatBoostTrader(task="classification", random_state=42)
# Fit with categorical features
cat_features = ["sector"]
model.fit(features, target, cat_features=cat_features)
# Get feature importance
importance = model.feature_importance()
print(importance)
# Cross-validate
cv_results = model.cross_validate(features, target, cv=5, cat_features=cat_features)
print(f"CV Score: {cv_results['mean_score']:.3f} ± {cv_results['std_score']:.3f}")
CatBoost’s ordered boosting uses permutation-based target encoding internally. While this reduces overfitting, it means training is slower than LightGBM. The tradeoff is worthwhile when you have many categorical features (sector, exchange, rating, etc.).
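The sketch below illustrates the idea behind ordered target encoding with a single permutation and an expanding mean; it is a simplification for intuition, not CatBoost's actual implementation:
import pandas as pd
# Each row is encoded using only the targets of rows that come before it,
# so a row's own label never leaks into its encoding
demo = pd.DataFrame({"sector": ["tech", "tech", "energy", "tech", "energy"],
                     "target": [1, 0, 1, 1, 0]})
prior = demo["target"].mean()  # global prior used where a category has no history yet
encoded = (demo.groupby("sector")["target"]
           .transform(lambda s: s.shift().expanding().mean()))
demo["sector_encoded"] = encoded.fillna(prior)
print(demo)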
Choosing Between Frameworks
| Use Case | Recommended Framework |
|---|---|
| Quick baseline | XGBoost (well-documented, easy to tune) |
| Large datasets (1M+ rows) | LightGBM (fastest training, lowest memory) |
| Many categorical features | CatBoost (native support, no encoding needed) |
| Production inference speed | CatBoost (symmetric trees) or LightGBM |
| Kaggle-style competitions | XGBoost or LightGBM |
In practice, the best approach is to train all three and combine them in an ensemble long-short strategy. Model diversity improves robustness – each framework learns slightly different patterns from the same data.
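A minimal sketch of such an ensemble, assuming the three Puffin traders expose the fit/predict methods shown earlier and that predict returns binary class labels (categorical-feature handling is omitted here for brevity):
import numpy as np
from puffin.ensembles import CatBoostTrader, LightGBMTrader, XGBoostTrader
# Train all three frameworks on the same data and combine by majority vote
models = [
    XGBoostTrader(task="classification", random_state=42),
    LightGBMTrader(task="classification", random_state=42),
    CatBoostTrader(task="classification", random_state=42),
]
for m in models:
    m.fit(features, target)
votes = np.column_stack([m.predict(features) for m in models])
ensemble_signal = (votes.mean(axis=1) > 0.5).astype(int)  # long only when at least two of three agree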
Source Code
Browse the implementation: puffin/ensembles/