Standard & Denoising Autoencoders
Standard autoencoders learn compressed representations by minimizing reconstruction error. Denoising autoencoders add a twist: they corrupt the input with noise during training, forcing the network to learn robust features that look past microstructure noise, bid-ask bounce, and data quality issues.
Both the Autoencoder and DenoisingAutoencoder classes live in puffin.deep.autoencoder. They share the same AETrainer for fitting and feature extraction.
Standard Autoencoder
Architecture
A feedforward autoencoder compresses n_features inputs through progressively smaller hidden layers to an encoding_dim bottleneck, then mirrors the architecture back out to reconstruct the original input.
import numpy as np
from puffin.deep.autoencoder import Autoencoder, AETrainer
# Create market data (e.g., 100 features from technical indicators)
np.random.seed(42)
n_samples = 1000
n_features = 100
# Simulate market features
market_data = np.random.randn(n_samples, n_features)
# Create autoencoder with progressive compression
model = Autoencoder(
    input_dim=100,
    encoding_dim=20,          # Compress to 20 features
    hidden_dims=[80, 50, 30]  # Encoder layers
)
# Train the model
trainer = AETrainer()
history = trainer.fit(
    model,
    market_data,
    epochs=100,
    lr=0.001,
    batch_size=64
)
# Extract compressed features
compressed_features = trainer.extract_features(model, market_data)
print(f"Original shape: {market_data.shape}") # (1000, 100)
print(f"Compressed shape: {compressed_features.shape}") # (1000, 20)
The hidden_dims parameter controls the encoder’s layer progression. The decoder automatically mirrors this in reverse order: 30 -> 50 -> 80 -> 100.
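As a quick illustration, the layer sizes on each side of the bottleneck can be written out explicitly. This is only a sketch of the mirroring described above, not the library's internal construction code.
# Sketch of the mirrored layer sizes (illustration only, not puffin internals)
input_dim = 100
encoding_dim = 20
hidden_dims = [80, 50, 30]

encoder_sizes = [input_dim] + hidden_dims + [encoding_dim]        # 100 -> 80 -> 50 -> 30 -> 20
decoder_sizes = [encoding_dim] + hidden_dims[::-1] + [input_dim]  # 20 -> 30 -> 50 -> 80 -> 100
print(encoder_sizes)
print(decoder_sizes)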
Use Cases for Standard Autoencoders
- Dimensionality reduction: Compress hundreds of technical indicators to a compact set of latent factors before feeding them to a downstream classifier or regression model
- Pre-processing: Reduce collinearity in feature sets where many indicators are correlated
- Feature extraction from tick data: Compress high-frequency order book snapshots into a manageable representation
A standard autoencoder with linear activations and MSE loss learns the same subspace as PCA. Adding nonlinear activations (ReLU) enables the network to capture nonlinear relationships that PCA misses entirely.
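A quick sanity check is to compare the autoencoder's reconstruction error against a PCA baseline with the same number of components. The sketch below reuses model and market_data from the example above; on purely synthetic noise the two will be close, but on real features a gap suggests the nonlinearity is earning its keep.
# Sanity check: PCA baseline vs. the trained autoencoder
# (reuses `model` and `market_data` from the example above)
import numpy as np
import torch
from sklearn.decomposition import PCA

pca = PCA(n_components=20)
pca_recon = pca.inverse_transform(pca.fit_transform(market_data))
pca_mse = np.mean((market_data - pca_recon) ** 2)

model.eval()
with torch.no_grad():
    ae_recon = model(torch.FloatTensor(market_data)).numpy()
ae_mse = np.mean((market_data - ae_recon) ** 2)

print(f"PCA MSE: {pca_mse:.4f} | Autoencoder MSE: {ae_mse:.4f}")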
Choosing the Encoding Dimension
The encoding dimension determines the level of information compression. Too small and you lose signal; too large and the model memorizes noise.
# Rule of thumb: start with 10-30% of input dimension
input_dim = 100
encoding_dim = int(0.2 * input_dim) # 20 features
Progressive compression works best – avoid aggressive jumps:
# Good: Gradual compression
Autoencoder(
    input_dim=100,
    encoding_dim=10,
    hidden_dims=[80, 60, 40, 20]
)

# Avoid: Too aggressive compression
Autoencoder(
    input_dim=100,
    encoding_dim=10,
    hidden_dims=[10]  # Direct jump loses too much
)
If the encoding dimension is too close to the input dimension, the autoencoder will learn an identity mapping and provide no useful compression. Monitor reconstruction loss on a held-out validation set to find the sweet spot.
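One practical way to find that sweet spot, sketched below under the assumption that the market_data array from the first example is available, is to sweep a few candidate encoding dimensions and compare reconstruction error on a held-out slice.
# Illustrative sweep over encoding dimensions (assumes `market_data` from above)
import numpy as np
import torch
from puffin.deep.autoencoder import Autoencoder, AETrainer

train, val = market_data[:800], market_data[800:]

for dim in [5, 10, 20, 40]:
    ae = Autoencoder(input_dim=100, encoding_dim=dim)
    AETrainer().fit(ae, train, epochs=50)

    ae.eval()
    with torch.no_grad():
        recon = ae(torch.FloatTensor(val)).numpy()
    print(f"encoding_dim={dim}: validation MSE={np.mean((val - recon) ** 2):.4f}")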
Denoising Autoencoder
How Noise Injection Works
A denoising autoencoder deliberately corrupts each training sample with Gaussian noise, then trains to reconstruct the clean original. This forces the network to learn features that are robust to perturbations – exactly what you need when working with noisy market data.
from puffin.deep.autoencoder import DenoisingAutoencoder, AETrainer
# Create denoising autoencoder
model = DenoisingAutoencoder(
    input_dim=100,
    encoding_dim=20,
    noise_factor=0.3  # Standard deviation of Gaussian noise
)
# Train with automatic noise injection
trainer = AETrainer()
history = trainer.fit(model, market_data, epochs=100)
# Extract robust features (no noise added during inference)
robust_features = trainer.extract_features(model, market_data)
During training, the DenoisingAutoencoder.forward() method adds noise when self.training is True. At inference time (after calling model.eval()), noise injection is disabled automatically.
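Conceptually, the corruption step looks like the sketch below. This is an illustration of the idea only, not the library's actual forward() implementation.
# Conceptual sketch of noise injection (illustration only; the real
# DenoisingAutoencoder.forward() may differ in detail)
import torch

def corrupt(x: torch.Tensor, noise_factor: float, training: bool) -> torch.Tensor:
    if training:
        # Corrupt the input with zero-mean Gaussian noise during training...
        return x + noise_factor * torch.randn_like(x)
    return x  # ...but leave it untouched at inference time

# The reconstruction loss is still computed against the *clean* input,
# so the network must denoise as well as compress.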
Noise Factor Selection
The noise_factor parameter controls how much corruption is applied:
| noise_factor | Effect | When to Use |
|---|---|---|
| 0.1 | Light corruption | Clean institutional data |
| 0.3 | Moderate corruption | General market data with bid-ask noise |
| 0.5 | Heavy corruption | Extremely noisy alternative data |
Start with noise_factor=0.3 and tune based on validation reconstruction error. If the model cannot reconstruct well even on training data, reduce the noise.
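A small sweep, again assuming the market_data array from earlier, makes the trade-off concrete: larger noise factors regularize harder but make reconstruction more difficult.
# Illustrative noise_factor sweep (assumes `market_data` from the examples above)
import numpy as np
import torch
from puffin.deep.autoencoder import DenoisingAutoencoder, AETrainer

train, val = market_data[:800], market_data[800:]

for noise in [0.1, 0.3, 0.5]:
    dae = DenoisingAutoencoder(input_dim=100, encoding_dim=20, noise_factor=noise)
    AETrainer().fit(dae, train, epochs=50)

    dae.eval()  # disables noise injection for evaluation
    with torch.no_grad():
        recon = dae(torch.FloatTensor(val)).numpy()
    print(f"noise_factor={noise}: validation MSE={np.mean((val - recon) ** 2):.4f}")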
Use Cases for Denoising Autoencoders
- Microstructure noise: Handle bid-ask spreads and tick-level noise in high-frequency data
- Data quality robustness: Learn features resistant to missing values and reporting errors
- Generalization: Denoising training acts as a regularizer, improving performance on new market conditions
Feature Extraction for Strategy Development
Both standard and denoising autoencoders excel at producing compact features for downstream models:
import numpy as np
from puffin.deep.autoencoder import Autoencoder, AETrainer
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Simulate market features and returns
np.random.seed(42)
X = np.random.randn(1000, 50)
y = np.random.randn(1000)

# Hold out a test set for the downstream model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Always normalize inputs before training (fit the scaler on training data only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train autoencoder to extract key features
ae = Autoencoder(input_dim=50, encoding_dim=10)
trainer = AETrainer()
trainer.fit(ae, X_train_scaled, epochs=50)

# Use compressed features for a trading model
compressed_train = trainer.extract_features(ae, X_train_scaled)
compressed_test = trainer.extract_features(ae, X_test_scaled)
model = Ridge()
model.fit(compressed_train, y_train)
predictions = model.predict(compressed_test)
Always normalize inputs with StandardScaler before training autoencoders. Unnormalized data with features on different scales will cause the reconstruction loss to be dominated by high-variance features, ignoring low-variance but potentially informative ones.
Anomaly Detection
High reconstruction error signals that a sample does not fit the patterns the autoencoder learned from normal data. This is a natural anomaly detector for market surveillance.
import torch
import numpy as np
from puffin.deep.autoencoder import Autoencoder, AETrainer
from sklearn.preprocessing import StandardScaler
# Simulate and normalize market data
np.random.seed(42)
X = np.random.randn(1000, 50)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Train autoencoder on normal market data
ae = Autoencoder(input_dim=50, encoding_dim=10)
trainer = AETrainer()
trainer.fit(ae, X_scaled, epochs=50)
# Calculate reconstruction errors
ae.eval()
with torch.no_grad():
    X_tensor = torch.FloatTensor(X_scaled)
    reconstructed = ae(X_tensor).numpy()
reconstruction_errors = np.mean((X_scaled - reconstructed) ** 2, axis=1)
# Define anomaly threshold (e.g., 95th percentile)
threshold = np.percentile(reconstruction_errors, 95)
anomalies = reconstruction_errors > threshold
print(f"Detected {anomalies.sum()} anomalies out of {len(X_scaled)} samples")
Anomaly detection works best when the autoencoder is trained on a representative sample of “normal” market behavior. Anomalies are then defined as samples that fall outside the learned distribution – flash crashes, liquidity events, or regime shifts.
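In practice that usually means calibrating on an earlier, quieter window and scoring later data against a threshold fixed on that window. The sketch below assumes such a chronological split is meaningful for your data (the synthetic X_scaled above has no real time structure).
# Walk-forward style variant (illustrative): fit and fix the threshold on an
# earlier "normal" window, then score a later window with the same threshold
import numpy as np
import torch
from puffin.deep.autoencoder import Autoencoder, AETrainer

calm_window, live_window = X_scaled[:800], X_scaled[800:]

ae_calm = Autoencoder(input_dim=50, encoding_dim=10)
AETrainer().fit(ae_calm, calm_window, epochs=50)

def recon_errors(model, data):
    model.eval()
    with torch.no_grad():
        recon = model(torch.FloatTensor(data)).numpy()
    return np.mean((data - recon) ** 2, axis=1)

threshold = np.percentile(recon_errors(ae_calm, calm_window), 95)  # fixed on the calm window
flags = recon_errors(ae_calm, live_window) > threshold
print(f"Flagged {flags.sum()} of {len(live_window)} later samples")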
Regime Detection via Latent Clustering
The latent space learned by an autoencoder can reveal hidden market regimes when combined with clustering:
from puffin.deep.autoencoder import Autoencoder, AETrainer
from sklearn.cluster import KMeans
import numpy as np
# Assume ae and trainer are already fitted from above
latent_features = trainer.extract_features(ae, X_scaled)
# Cluster into market regimes
n_regimes = 4
kmeans = KMeans(n_clusters=n_regimes, random_state=42)
regimes = kmeans.fit_predict(latent_features)
# Analyze regime characteristics
for regime in range(n_regimes):
    regime_mask = regimes == regime
    print(f"Regime {regime}: {regime_mask.sum()} samples")
Combining autoencoder features with K-Means or Gaussian Mixture Models gives you a data-driven regime indicator that can condition strategy behavior – for example, reducing position sizes in high-volatility regimes.
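As a hypothetical illustration of that last point, the snippet below maps each regime to a position-size multiplier based on the volatility of a returns series. Both the returns array and the scaling rule are placeholders, not part of puffin.
# Hypothetical regime-conditioned position sizing (assumes `regimes` and
# `n_regimes` from the clustering above; `returns` is a placeholder series)
import numpy as np

returns = np.random.randn(len(regimes)) * 0.01  # stand-in for realized returns

# Scale exposure down in regimes with above-median volatility
regime_vol = {r: returns[regimes == r].std() for r in range(n_regimes)}
median_vol = np.median(list(regime_vol.values()))
position_scale = {r: min(1.0, median_vol / vol) for r, vol in regime_vol.items()}

for r in range(n_regimes):
    print(f"Regime {r}: vol={regime_vol[r]:.4f}, position scale={position_scale[r]:.2f}")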
Training Best Practices
Learning Rate and Convergence
Start with a learning rate of 0.001 and monitor the training and validation loss curves:
import matplotlib.pyplot as plt
from puffin.deep.autoencoder import Autoencoder, AETrainer
ae = Autoencoder(input_dim=50, encoding_dim=10)
trainer = AETrainer()
history = trainer.fit(ae, X_scaled, epochs=100, lr=0.001)
# Plot loss to check convergence
plt.plot(history['train_loss'], label='Train')
plt.plot(history['val_loss'], label='Validation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.title('Autoencoder Training Convergence')
plt.show()
Early Stopping
The AETrainer tracks validation loss at each epoch. Implement early stopping to prevent overfitting:
# puffin.deep.training also provides an EarlyStopping utility; the loop below
# spells out the same logic manually for clarity.
from puffin.deep.training import EarlyStopping

best_val_loss = float('inf')
patience = 10
patience_counter = 0

for epoch in range(200):
    # Training step (simplified): compute train loss and val_loss for this epoch
    # ...
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        patience_counter = 0
        # Save best model weights
    else:
        patience_counter += 1
        if patience_counter >= patience:
            print(f"Early stopping at epoch {epoch}")
            break
Evaluation Metrics
Track both reconstruction error and latent space quality:
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
# Reconstruction metrics
mse = np.mean((X_scaled - reconstructed) ** 2)
mae = np.mean(np.abs(X_scaled - reconstructed))
print(f"MSE: {mse:.6f}, MAE: {mae:.6f}")
# Visualize latent space
latent = trainer.extract_features(ae, X_scaled)
pca = PCA(n_components=2)
latent_2d = pca.fit_transform(latent)
plt.scatter(latent_2d[:, 0], latent_2d[:, 1], alpha=0.5)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('Latent Space Visualization')
plt.show()
A well-trained autoencoder’s latent space should show meaningful structure when projected to 2D – clusters, gradients, or separations that correspond to known market conditions.
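One way to check this, assuming the regimes labels from the clustering section are available, is to color the 2D projection by regime and look for separation.
# Color the 2D projection by regime (assumes `latent_2d` from above and the
# `regimes` labels from the clustering section)
import matplotlib.pyplot as plt

plt.scatter(latent_2d[:, 0], latent_2d[:, 1], c=regimes, cmap='viridis', alpha=0.5)
plt.colorbar(label='Regime')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('Latent Space Colored by Regime')
plt.show()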
Source Code
- Autoencoder and DenoisingAutoencoder classes: puffin/deep/autoencoder.py
- AETrainer and training utilities: puffin/deep/training.py