Variational Autoencoders (VAE)

A Variational Autoencoder replaces the deterministic bottleneck of a standard autoencoder with a probabilistic latent space. Instead of mapping each input to a single point, the encoder outputs the parameters of a distribution (mean and variance). This enables two capabilities that standard autoencoders lack: principled anomaly detection via likelihood estimation, and generation of new synthetic samples by sampling from the learned distribution.

The VAE implementation in puffin.deep.autoencoder uses the reparameterization trick for backpropagation through stochastic nodes and supports a configurable beta parameter for disentangled representations (beta-VAE).

How VAEs Differ from Standard Autoencoders

In a standard autoencoder, the encoder maps input x to a fixed latent vector z. In a VAE, the encoder maps x to a distribution over z, parameterized by mean (mu) and log-variance (log_var):

Component         Standard AE            VAE
Encoder output    Single vector z        Distribution parameters (mu, log_var)
Latent space      Deterministic          Stochastic (sample from N(mu, sigma))
Loss function     Reconstruction only    Reconstruction + KL divergence
Generation        Not supported          Sample z ~ N(0, I), decode
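The difference shows up in the encoder's output layer. A minimal PyTorch sketch of the two heads (illustrative only, not the puffin implementation):

import torch
import torch.nn as nn

hidden_dim, latent_dim = 30, 20
h = torch.randn(8, hidden_dim)  # encoder hidden activations for a batch of 8

# Standard AE: a single head produces a deterministic latent vector
fc_z = nn.Linear(hidden_dim, latent_dim)
z_ae = fc_z(h)

# VAE: two heads produce the parameters of a Gaussian over the latent vector
fc_mu = nn.Linear(hidden_dim, latent_dim)
fc_log_var = nn.Linear(hidden_dim, latent_dim)
mu, log_var = fc_mu(h), fc_log_var(h)
z_vae = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # sampled latent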

The VAE Loss Function

The VAE loss has two terms that balance reconstruction quality against latent space regularity:

Loss = Reconstruction Loss + KL Divergence
     = MSE(x, reconstructed_x) + KL(q(z|x) || p(z))
  • Reconstruction loss: How well the decoder reproduces the input (MSE)
  • KL divergence: How close the learned distribution q(z|x) is to the prior p(z) = N(0, I); its closed form is given below
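
For a Gaussian posterior q(z|x) = N(mu, sigma^2) and a standard normal prior, the KL term has a closed form, which is exactly what the loss implementation later in this section computes:

KL(q(z|x) || p(z)) = -0.5 * sum_j (1 + log_var_j - mu_j^2 - exp(log_var_j))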

The KL term prevents the encoder from collapsing to a single point per input, ensuring the latent space is smooth and continuous – which is what makes generation possible.

If the KL term dominates, the model ignores input information and produces blurry reconstructions (posterior collapse). If the reconstruction term dominates, the latent space becomes disconnected and generation quality degrades. Tune the beta parameter to balance these.

The Reparameterization Trick

Sampling from N(mu, sigma) is not differentiable, so gradients cannot flow through it. The reparameterization trick rewrites the sampling as a deterministic function of the parameters plus external noise:

z = mu + sigma * epsilon,   where epsilon ~ N(0, I)

This makes the sampling differentiable with respect to mu and sigma, enabling standard backpropagation.
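A minimal sketch of the trick in PyTorch (illustrative; the puffin VAE performs the equivalent step internally):

import torch

def reparameterize(mu, log_var):
    # sigma = exp(0.5 * log_var); epsilon is external noise with no parameters
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + eps * std  # differentiable w.r.t. mu and log_var

mu = torch.zeros(4, 20, requires_grad=True)
log_var = torch.zeros(4, 20, requires_grad=True)
z = reparameterize(mu, log_var)
z.sum().backward()  # gradients flow back to mu and log_var
print(mu.grad.shape, log_var.grad.shape)  # torch.Size([4, 20]) torch.Size([4, 20])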

Basic VAE Usage

import numpy as np
from puffin.deep.autoencoder import VAE, AETrainer

# Simulate market data
np.random.seed(42)
n_samples = 1000
n_features = 100
market_data = np.random.randn(n_samples, n_features)

# Create VAE
model = VAE(
    input_dim=100,
    latent_dim=20,
    hidden_dims=[80, 50, 30]
)

# Train VAE (AETrainer auto-detects VAE and uses combined loss)
trainer = AETrainer()
history = trainer.fit(model, market_data, epochs=100)

# Extract features (mean of latent distribution)
latent_features = trainer.extract_features(model, market_data)
print(f"Latent shape: {latent_features.shape}")  # (1000, 20)

# Generate synthetic market data
synthetic_data = model.sample(n=100)
print(f"Generated synthetic samples: {synthetic_data.shape}")  # (100, 100)

The AETrainer automatically detects when a VAE is passed and switches to the combined reconstruction + KL divergence loss. No special configuration is needed.

VAE Loss Implementation

The AETrainer.vae_loss static method computes the combined loss with an optional beta weight:

import torch

# Equivalent of the combined loss computed by AETrainer.vae_loss
# recon: reconstructed output, x: original input
# mu, log_var: encoder distribution parameters
# beta: weight on the KL term (default 1.0)
def vae_loss(recon, x, mu, log_var, beta=1.0):
    # Reconstruction loss (MSE, summed over features)
    recon_loss = torch.nn.functional.mse_loss(recon, x, reduction='sum')

    # KL divergence: -0.5 * sum(1 + log_var - mu^2 - exp(log_var))
    kl_loss = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

    # Total loss, averaged over batch
    return (recon_loss + beta * kl_loss) / x.size(0)
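
As a quick check, the function above can be evaluated on a random batch; increasing beta raises the weight on the KL term:

# Toy batch: 8 samples, 100 features, 20 latent dimensions
x = torch.randn(8, 100)
recon = torch.randn(8, 100)
mu = torch.randn(8, 20)
log_var = torch.randn(8, 20)

print(vae_loss(recon, x, mu, log_var))            # standard VAE loss (beta=1)
print(vae_loss(recon, x, mu, log_var, beta=2.0))  # doubles the KL penalty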

Beta-VAE for Disentangled Representations

Setting beta > 1 encourages more disentangled latent dimensions, where each dimension captures a single independent factor of variation:

beta    Behavior                 Trading Use
0.5     Favor reconstruction     Better feature extraction accuracy
1.0     Standard VAE             Balanced reconstruction and generation
2.0+    Favor disentanglement    Interpretable latent factors (e.g., momentum, volatility)
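
One way to see the effect of beta after training is to inspect the average KL contribution of each latent dimension: dimensions with near-zero KL are effectively unused, and with beta > 1 fewer dimensions stay active, each tending to capture a cleaner factor. A sketch, reusing model and market_data from the basic usage example above (encode returns the distribution parameters, as in the later examples):

import torch

model.eval()
with torch.no_grad():
    x = torch.FloatTensor(market_data)
    mu, log_var = model.encode(x)

    # Average KL contribution per latent dimension across the dataset
    kl_per_dim = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp(), dim=0)

print(kl_per_dim.numpy().round(3))  # near-zero entries are inactive dimensions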

Synthetic Data Generation

One of the most powerful applications of VAEs in trading is generating synthetic market scenarios for stress testing and data augmentation.

import numpy as np
from puffin.deep.autoencoder import VAE, AETrainer
from sklearn.preprocessing import StandardScaler

# Prepare and normalize data
np.random.seed(42)
X = np.random.randn(1000, 50)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train VAE on historical data
vae = VAE(input_dim=50, latent_dim=10)
trainer = AETrainer()
trainer.fit(vae, X_scaled, epochs=100)

# Generate synthetic scenarios
n_scenarios = 1000
synthetic_scenarios = vae.sample(n=n_scenarios)
print(f"Generated {n_scenarios} synthetic market scenarios")
print(f"Shape: {synthetic_scenarios.shape}")  # (1000, 50)

# Inverse transform to original scale
synthetic_original = scaler.inverse_transform(
    synthetic_scenarios.detach().numpy()
)

Synthetic data from a VAE preserves the statistical relationships (correlations, distributions) present in the training data while introducing realistic variation. This is valuable for backtesting strategies on scenarios that have not occurred historically.
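
A quick way to check this property is to compare the feature correlation matrices of the real and synthetic data (continuing from the example above):

import numpy as np

# Correlation structure of real vs. synthetic features
real_corr = np.corrcoef(X, rowvar=False)
synth_corr = np.corrcoef(synthetic_original, rowvar=False)

# Mean absolute difference; values near zero mean correlations are preserved
corr_gap = np.abs(real_corr - synth_corr).mean()
print(f"Mean absolute correlation difference: {corr_gap:.3f}")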

Targeted Scenario Generation

You can interpolate in latent space to generate scenarios between known market states:

import numpy as np
import torch
from puffin.deep.autoencoder import VAE

# Assume vae is already trained
vae.eval()

# Encode two known market states
with torch.no_grad():
    state_a = torch.FloatTensor(X_scaled[0:1])      # Calm market
    state_b = torch.FloatTensor(X_scaled[100:101])  # Volatile market

    mu_a, _ = vae.encode(state_a)
    mu_b, _ = vae.encode(state_b)

    # Interpolate between states in latent space
    n_steps = 10
    interpolated = []
    for alpha in np.linspace(0, 1, n_steps):
        z = (1 - alpha) * mu_a + alpha * mu_b
        scenario = vae.decode(z)
        interpolated.append(scenario.numpy())

interpolated = np.vstack(interpolated)
print(f"Generated {n_steps} interpolated scenarios: {interpolated.shape}")

Latent space interpolation is a powerful tool for stress testing. You can generate a smooth transition from “normal market” to “crisis” conditions by interpolating between the encoded representations of historical calm and stress periods.

Anomaly Detection with VAE

VAEs provide a principled anomaly score: the combination of reconstruction error and the distance of the encoded distribution from the prior.

import torch
import numpy as np
from puffin.deep.autoencoder import VAE, AETrainer

# Assume vae and trainer are already fitted
vae.eval()

with torch.no_grad():
    X_tensor = torch.FloatTensor(X_scaled)
    recon, mu, log_var = vae(X_tensor)

    # Reconstruction error per sample
    recon_error = torch.mean((X_tensor - recon) ** 2, dim=1).numpy()

    # KL divergence per sample (measures how unusual the encoding is)
    kl_per_sample = -0.5 * torch.sum(
        1 + log_var - mu.pow(2) - log_var.exp(), dim=1
    ).numpy()

    # Combined anomaly score
    anomaly_score = recon_error + 0.1 * kl_per_sample

# Flag anomalies
threshold = np.percentile(anomaly_score, 95)
anomalies = anomaly_score > threshold
print(f"Detected {anomalies.sum()} anomalous market states")

The relative weighting between reconstruction error and KL divergence in the anomaly score is a hyperparameter. Reconstruction error alone often works well for detecting gross anomalies, while the KL term catches subtler distributional shifts.
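
Because the two components live on different scales, one reasonable approach is to standardize each before weighting them, rather than hand-picking a raw coefficient like the 0.1 above (a sketch, continuing from the code above):

import numpy as np

# Put both components on a comparable scale before combining
recon_z = (recon_error - recon_error.mean()) / (recon_error.std() + 1e-8)
kl_z = (kl_per_sample - kl_per_sample.mean()) / (kl_per_sample.std() + 1e-8)

w = 0.5  # relative weight on the KL component, tuned on validation data
anomaly_score = recon_z + w * kl_z

threshold = np.percentile(anomaly_score, 95)
anomalies = anomaly_score > threshold
print(f"Detected {anomalies.sum()} anomalous market states")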

Data Augmentation for Rare Events

Markets exhibit rare but important events (crashes, squeezes, flash crashes) that appear only a few times in historical data. A VAE can augment these samples:

import torch
import numpy as np
from puffin.deep.autoencoder import VAE

# Assume vae is trained and we have identified rare event samples
# rare_indices = identify_rare_events(X_scaled)
rare_indices = np.array([10, 50, 200])  # Example indices

vae.eval()
with torch.no_grad():
    rare_samples = torch.FloatTensor(X_scaled[rare_indices])
    mu_rare, log_var_rare = vae.encode(rare_samples)

    # Generate variations around rare events
    n_augmented = 50
    augmented = []
    for _ in range(n_augmented):
        # Sample near the rare event encoding
        std = torch.exp(0.5 * log_var_rare)
        eps = torch.randn_like(std) * 0.5  # Smaller noise for proximity
        z = mu_rare + eps * std
        generated = vae.decode(z)
        augmented.append(generated.numpy())

augmented = np.vstack(augmented)
print(f"Augmented rare events: {augmented.shape}")

This gives downstream models more examples of tail events to learn from, reducing the bias toward common market conditions.
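
The augmented samples can then be stacked onto the original training set before fitting whatever downstream model consumes this data (continuing from the example above):

import numpy as np

# Combine historical data with the synthetic rare-event samples
X_augmented = np.vstack([X_scaled, augmented])
print(f"Training set grew from {X_scaled.shape[0]} to {X_augmented.shape[0]} samples")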

Monitoring VAE Training

Monitor the training and validation loss over epochs to catch problems such as overfitting or posterior collapse:

import matplotlib.pyplot as plt
from puffin.deep.autoencoder import VAE, AETrainer

# X_scaled: normalized training data from the earlier examples
vae = VAE(input_dim=50, latent_dim=10)
trainer = AETrainer()
history = trainer.fit(vae, X_scaled, epochs=100, verbose=True)

# Plot combined loss
plt.figure(figsize=(10, 4))
plt.plot(history['train_loss'], label='Train Loss')
plt.plot(history['val_loss'], label='Val Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss (Recon + KL)')
plt.legend()
plt.title('VAE Training Progress')
plt.show()

If the validation loss plateaus while training loss continues to drop, the model is overfitting. Increase the beta parameter or add dropout to the encoder/decoder layers.

Source Code