LSTM Fundamentals
Long Short-Term Memory (LSTM) networks were introduced by Hochreiter and Schmidhuber (1997) to address the vanishing gradient problem that plagues vanilla RNNs. By introducing a gated cell structure, LSTMs can learn to retain or discard information over long sequences, which makes them a natural candidate for financial time series where patterns may span dozens or hundreds of time steps.
LSTM Architecture
An LSTM cell contains two state vectors and three gates that regulate the flow of information:
- Cell State (C_t): The long-term memory that runs through the entire chain
- Hidden State (h_t): The short-term output passed to the next time step
- Three Gates:
  - Forget Gate: Decides what to remove from the cell state
  - Input Gate: Decides what new information to add
  - Output Gate: Decides what to output from the cell state
Mathematical Formulation
Forget Gate: f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
Input Gate: i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
Candidate: C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
Cell State: C_t = f_t * C_{t-1} + i_t * C̃_t
Output Gate: o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
Hidden State: h_t = o_t * tanh(C_t)
Where:
- σ is the sigmoid function (output between 0 and 1)
- * denotes element-wise multiplication
- [h_{t-1}, x_t] is the concatenation of the previous hidden state and the current input
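To make the equations concrete, here is a minimal NumPy sketch of a single LSTM time step. It follows the formulation above term by term. The function names (`lstm_cell_step`, `init_params`) and the small random initialization are illustrative assumptions, not part of Puffin or PyTorch.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, squashing each element into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, params):
    """Run one LSTM time step using the equations above.

    params holds W_f, W_i, W_C, W_o with shape (hidden, hidden + input)
    and b_f, b_i, b_C, b_o with shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])                       # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])        # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])        # input gate
    c_tilde = np.tanh(params["W_C"] @ z + params["b_C"])    # candidate
    c_t = f_t * c_prev + i_t * c_tilde                      # new cell state
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])        # output gate
    h_t = o_t * np.tanh(c_t)                                # new hidden state
    return h_t, c_t


def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical helper: small random weights and zero biases."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "i", "C", "o"):
        params[f"W_{gate}"] = rng.normal(
            0.0, 0.1, size=(hidden_dim, hidden_dim + input_dim)
        )
        params[f"b_{gate}"] = np.zeros(hidden_dim)
    return params


# Feed a short sequence of scalar inputs through the cell.
params = init_params(input_dim=1, hidden_dim=4)
h, c = np.zeros(4), np.zeros(4)
for x in [0.5, -0.2, 0.1]:
    h, c = lstm_cell_step(np.array([x]), h, c, params)
print(h.shape, c.shape)  # (4,) (4,)
```

Production code would normally rely on a framework implementation such as `torch.nn.LSTM`, which batches these operations and fuses the four weight matrices. The loop above is only meant to show how the state is carried from one step to the next.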
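The cell state is the key innovation. Because it is updated only through element-wise multiplication and addition, gradients can propagate backward through many time steps without vanishing, as long as the forget gate stays close to 1. The gating does not rule out exploding gradients, so gradient clipping is still commonly applied during training.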
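The gradient argument can be written out explicitly. As a simplification, hold the gate activations fixed, that is, ignore the indirect dependence of f_t, i_t, and C̃_t on earlier cell states through h_{t-1}. Differentiating the cell-state update C_t = f_t * C_{t-1} + i_t * C̃_t along this direct path gives:

```latex
\frac{\partial C_t}{\partial C_{t-1}} = \operatorname{diag}(f_t),
\qquad
\frac{\partial C_t}{\partial C_{t-k}} = \prod_{j=0}^{k-1} \operatorname{diag}(f_{t-j})
```

When the forget-gate activations stay near 1, this product stays near 1 even for large k, so the error signal along the cell state is preserved. When they sit well below 1, the product shrinks and the cell forgets. In a vanilla RNN the corresponding product involves the same recurrent weight matrix and a tanh derivative at every step, and it is this repeated multiplication that drives the vanishing behaviour.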
Implementing LSTM for Price Prediction
The LSTMNet class in Puffin provides a configurable LSTM network, and TradingLSTM wraps it with data preparation, training, and prediction utilities.
The LSTMNet Module
import torch
import torch.nn as nn
from puffin.deep.rnn import LSTMNet
# LSTMNet is a configurable LSTM network
model = LSTMNet(
    input_dim=1,    # single feature (e.g., closing price)
    hidden_dim=64,  # 64 hidden units per layer
    num_layers=2,   # 2 stacked LSTM layers
    output_dim=1,   # predict a single value
    dropout=0.2     # dropout between layers
)
# Forward pass with dummy data
x = torch.randn(16, 20, 1) # batch=16, seq_len=20, features=1
output = model(x)
print(f"Output shape: {output.shape}") # torch.Size([16, 1])
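To get a feel for how `hidden_dim` and `num_layers` drive model size, the sketch below counts the recurrent parameters implied by the gate equations in the Mathematical Formulation section. The helper `lstm_param_count` is hypothetical. It covers only the stacked LSTM layers, because the source does not show how LSTMNet maps the final hidden state to `output_dim`. The `biases_per_gate=2` variant reflects the fact that `torch.nn.LSTM` stores two bias vectors per gate (`bias_ih` and `bias_hh`).

```python
def lstm_param_count(input_dim, hidden_dim, num_layers, biases_per_gate=1):
    """Count trainable parameters in a stacked LSTM (recurrent layers only).

    Each of the four transforms (forget, input, candidate, output) has a
    weight matrix of shape (hidden_dim, hidden_dim + layer_input_dim) and
    `biases_per_gate` bias vectors of length hidden_dim.
    """
    total = 0
    layer_input = input_dim
    for _ in range(num_layers):
        weights = 4 * hidden_dim * (hidden_dim + layer_input)
        biases = 4 * hidden_dim * biases_per_gate
        total += weights + biases
        layer_input = hidden_dim  # deeper layers consume the previous layer's h_t
    return total


# Configuration from the example above: input_dim=1, hidden_dim=64, num_layers=2
print(lstm_param_count(1, 64, 2))                     # 49920
print(lstm_param_count(1, 64, 2, biases_per_gate=2))  # 50432
```

Even this small configuration has roughly fifty thousand parameters, which is worth keeping in mind when the training set is only a few hundred daily observations.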
Training with TradingLSTM
The TradingLSTM class handles sequence windowing, train/validation splitting, normalization, and the training loop internally.
import yfinance as yf
from puffin.deep.rnn import TradingLSTM
# Download stock data
ticker = yf.Ticker("AAPL")
df = ticker.history(period="2y")
prices = df['Close'].values
# Create and train LSTM
lstm = TradingLSTM()
history = lstm.fit(
    prices,
    lookback=20,  # use 20 days of history per sample
    epochs=50,
    lr=0.001,
    batch_size=32
)
# Make predictions
predictions = lstm.predict(prices, steps=5)
print(f"Next 5 predictions: {predictions}")
Start with a lookback of 20 trading days (approximately one month). Shorter windows respond faster to regime changes but capture less context; longer windows are more stable but slower to adapt.
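For readers building their own pipeline, the sketch below shows one common way to turn a price series into lookback windows with a chronological train/validation split. It is an illustration of the general technique, not Puffin's internal implementation. The helper names `make_windows` and `chronological_split` and the 20% validation fraction are assumptions.

```python
import numpy as np


def make_windows(series, lookback):
    """Slice a 1-D series into (window, next value) pairs.

    Returns X with shape (n_samples, lookback, 1) and y with shape (n_samples,).
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - lookback
    if n_samples <= 0:
        raise ValueError("series must be longer than lookback")
    X = np.stack([series[i:i + lookback] for i in range(n_samples)])
    y = series[lookback:]  # the value immediately after each window
    return X[..., np.newaxis], y


def chronological_split(X, y, val_fraction=0.2):
    """Keep the most recent samples for validation; never shuffle time series."""
    n_val = int(len(X) * val_fraction)
    split = len(X) - n_val
    return X[:split], y[:split], X[split:], y[split:]


# Synthetic stand-in for a price series
prices_demo = np.arange(100, dtype=float)
X, y = make_windows(prices_demo, lookback=20)
X_train, y_train, X_val, y_val = chronological_split(X, y)
print(X.shape, y.shape)            # (80, 20, 1) (80,)
print(X_train.shape, X_val.shape)  # (64, 20, 1) (16, 20, 1)
```

Note that a longer lookback reduces the number of training samples by the same amount, which matters for short histories.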
Visualizing Training Progress
Monitoring the gap between training and validation loss is essential for detecting overfitting.
import matplotlib.pyplot as plt
# Plot training history
plt.figure(figsize=(10, 5))
plt.plot(history['train_loss'], label='Training Loss')
plt.plot(history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('LSTM Training Progress')
plt.legend()
plt.grid(True)
plt.show()
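The same history can be inspected programmatically. The sketch below applies a simple patience rule to a list of validation losses, such as `history['val_loss']`, and reports where training could have stopped. The function `best_epoch_with_patience` is a hypothetical helper, not part of Puffin, and the loss values are made up for illustration.

```python
def best_epoch_with_patience(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both zero-based.

    Training is considered stopped once `patience` epochs have passed
    without the validation loss improving by more than `min_delta`.
    """
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Illustrative losses: validation improves, then drifts upward (overfitting)
val_demo = [1.0, 0.8, 0.7, 0.65, 0.66, 0.67, 0.70, 0.72, 0.75]
stop, best = best_epoch_with_patience(val_demo, patience=3)
print(stop, best)  # 6 3
```

A validation loss that bottoms out early while the training loss keeps falling is the typical signature of overfitting. In that case the weights from the best epoch are usually the ones worth keeping.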
Multivariate Time Series Prediction
Real trading models typically use multiple input features – price, volume, technical indicators – rather than a single closing price. The MultivariateLSTM class extends TradingLSTM to handle DataFrames with multiple columns.
Preparing Multivariate Features
import numpy as np
import pandas as pd
import yfinance as yf
from puffin.deep.rnn import MultivariateLSTM
# Download data
ticker = yf.Ticker("AAPL")
df = ticker.history(period="2y")
# Create features
features = pd.DataFrame({
    'close': df['Close'],
    'volume': df['Volume'],
    'high': df['High'],
    'low': df['Low'],
    'returns': df['Close'].pct_change()
})
# Add technical indicators
features['sma_20'] = df['Close'].rolling(20).mean()
features['volatility'] = features['returns'].rolling(20).std()
# Create target (next day's return)
features['target'] = features['returns'].shift(-1)
features = features.dropna()
Training the Multivariate Model
# Train multivariate LSTM
lstm = MultivariateLSTM()
history = lstm.fit(
    features,
    target_col='target',
    lookback=20,
    epochs=50,
    lr=0.001,
    hidden_dims=[128, 64, 32]
)
# Predict using all features
prediction = lstm.predict(features.iloc[-20:])
print(f"Predicted next return: {prediction[0]:.4f}")
Comprehensive Feature Engineering
A richer feature set often improves predictive power. The function below builds a broad set of price-derived features suitable for a multivariate LSTM.
def create_trading_features(df):
    """Create comprehensive feature set for trading."""
    features = pd.DataFrame(index=df.index)
    # Price features
    features['close'] = df['Close']
    features['high'] = df['High']
    features['low'] = df['Low']
    features['volume'] = df['Volume']
    # Returns
    features['returns'] = df['Close'].pct_change()
    features['log_returns'] = np.log(df['Close']).diff()
    # Volatility
    features['volatility'] = features['returns'].rolling(20).std()
    # Moving averages and ratios
    for window in [5, 10, 20, 50]:
        features[f'sma_{window}'] = df['Close'].rolling(window).mean()
        features[f'sma_{window}_ratio'] = df['Close'] / features[f'sma_{window}']
    # Volume indicators
    features['volume_sma'] = df['Volume'].rolling(20).mean()
    features['volume_ratio'] = df['Volume'] / features['volume_sma']
    return features.dropna()
# Build features and train
features = create_trading_features(df)
features['target'] = features['returns'].shift(-1)
features = features.dropna()
lstm = MultivariateLSTM()
history = lstm.fit(features, target_col='target', lookback=30, epochs=50)
Always normalize or standardize features before feeding them to an LSTM. The TradingLSTM and MultivariateLSTM classes handle this internally, but if you build a custom pipeline, apply StandardScaler or min-max scaling to each feature independently.
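For a custom pipeline, the sketch below standardizes each feature column independently, using statistics computed on the training rows only so that no information from the validation period leaks into the scaling. The helper names and the synthetic data are assumptions for illustration. scikit-learn's `StandardScaler`, fitted on the training split and then applied to both splits, performs the same computation.

```python
import numpy as np


def fit_standardizer(train):
    """Compute the per-column mean and standard deviation from training rows."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant columns
    return mean, std


def apply_standardizer(data, mean, std):
    """Scale data with previously fitted statistics."""
    return (data - mean) / std


# Synthetic features on very different scales (a price-like and a volume-like column)
rng = np.random.default_rng(0)
raw = rng.normal(loc=[100.0, 1e6], scale=[5.0, 2e5], size=(200, 2))

# Chronological split: fit on the first 160 rows, reuse the statistics on the rest
train, val = raw[:160], raw[160:]
mean, std = fit_standardizer(train)
train_scaled = apply_standardizer(train, mean, std)
val_scaled = apply_standardizer(val, mean, std)

print(train_scaled.mean(axis=0).round(6), train_scaled.std(axis=0).round(6))
```

The validation columns will not have exactly zero mean and unit variance, and that is expected: they are scaled with the training statistics, just as live data would be.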
Practical Considerations
Choosing the Lookback Window
The lookback (sequence length) is one of the most impactful hyperparameters. Too short and the model misses longer-term patterns; too long and training slows with diminishing returns.
from puffin.deep.rnn import TradingLSTM
# Experiment with different lookback periods
lookbacks = [10, 20, 30, 50]
results = {}
for lookback in lookbacks:
    lstm = TradingLSTM()
    history = lstm.fit(prices, lookback=lookback, epochs=30)
    results[lookback] = history['val_loss'][-1]
# Find best lookback
best_lookback = min(results, key=results.get)
print(f"Best lookback period: {best_lookback} (val_loss: {results[best_lookback]:.4f})")
For daily equity data, lookback values between 15 and 60 tend to work well. For intraday data with more observations, consider longer windows (60-200 bars).
Source Code
- LSTMNet and TradingLSTM: puffin/deep/rnn.py
- MultivariateLSTM: puffin/deep/rnn.py
- Training utilities: puffin/deep/training.py