RNNs for Trading

Recurrent Neural Networks (RNNs) are specifically designed to handle sequential data, making them particularly well-suited for time series prediction in algorithmic trading. Unlike feedforward networks that treat each input independently, RNNs maintain an internal hidden state that captures information about previous time steps – a form of memory that is essential for modeling financial time series.

Why RNNs for Time Series?

Traditional feedforward neural networks have no concept of temporal ordering. RNNs address this by processing one time step at a time while carrying forward a hidden state that encodes the history of the sequence.
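
The recurrence itself is compact enough to sketch directly. The snippet below is a minimal NumPy illustration of a single vanilla RNN update, with placeholder sizes and random weights rather than anything used later in the chapter:

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: combine the current input with the previous hidden state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
n_features, n_hidden = 3, 8
W_xh = rng.normal(scale=0.1, size=(n_features, n_hidden))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)

# The same weights are reused at every step; h carries context forward
h = np.zeros(n_hidden)
sequence = rng.normal(size=(20, n_features))  # e.g. 20 days of return/volume features
for x_t in sequence:
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)

print(h.shape)  # (8,) final hidden state summarizing the whole sequence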

Key Characteristics:

  • Sequential Processing: Process one time step at a time while maintaining context
  • Parameter Sharing: Same weights used across all time steps
  • Variable-Length Sequences: Can handle sequences of different lengths
  • Memory: Hidden state acts as memory of past observations

RNNs are not the only option for sequence modeling. Transformers (covered in later chapters) have largely overtaken RNNs on NLP tasks, but LSTMs and GRUs remain competitive for shorter financial time series where the inductive bias of recurrence is beneficial.

The Vanishing Gradient Problem

One of the main challenges with basic (vanilla) RNNs is the vanishing gradient problem. During backpropagation through time (BPTT), gradients can become exponentially small as they propagate backward through many time steps, making it difficult to learn long-range dependencies.

# Simulate gradient flow through time
def gradient_flow(num_steps, weight=0.5):
    """Show how gradients vanish over time steps."""
    gradient = 1.0
    gradients = [gradient]

    for t in range(num_steps):
        gradient *= weight  # Gradient gets multiplied at each step
        gradients.append(gradient)

    return gradients

# After 50 time steps with weight < 1
gradients = gradient_flow(50, weight=0.5)
print(f"Initial gradient: {gradients[0]}")
print(f"Gradient after 50 steps: {gradients[-1]:.2e}")
# Output: Gradient after 50 steps: 8.88e-16 (essentially zero!)

This motivates the gated architectures – LSTM and GRU – covered in the sub-pages below.
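
As a rough intuition for why gating helps (re-using the toy simulation above, not an actual LSTM), the gated cell state lets the network learn an effective multiplier close to 1, so the gradient survives far more steps:

# Continuing the toy example: a learned multiplier near 1 keeps the gradient alive
for w in (0.5, 0.9, 0.99):
    print(f"weight={w}: gradient after 50 steps = {gradient_flow(50, weight=w)[-1]:.2e}")
# weight=0.5:  8.88e-16  (vanishes)
# weight=0.9:  5.15e-03  (still tiny)
# weight=0.99: 6.05e-01  (mostly preserved)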

RNN Architecture Pipeline

The following diagram shows the progression from raw market data through RNN-based model architectures to trading decisions.

flowchart TB
    A[Raw Market Data] --> B[Feature Engineering]
    B --> C[Sequence Windowing]
    C --> D{RNN Architecture}

    D --> E[LSTM]
    D --> F[GRU]
    D --> G[Stacked LSTM]

    E --> H[Hidden States]
    F --> H
    G --> H

    H --> I[Dense Output Layer]
    I --> J{Task}

    J --> K[Price Prediction]
    J --> L[Sentiment Classification]
    J --> M[Signal Generation]

    K --> N[Trading Decision]
    L --> N
    M --> N

    classDef input fill:#1a3a5c,stroke:#0d2137,color:#e8e0d4
    classDef process fill:#2d5016,stroke:#1a3a1a,color:#e8e0d4
    classDef arch fill:#6b2d5b,stroke:#4a1e3f,color:#e8e0d4
    classDef output fill:#8b4513,stroke:#5c2e0d,color:#e8e0d4

    class A,B,C input
    class D,J process
    class E,F,G,H,I arch
    class K,L,M,N output

Chapter Overview

This chapter covers three core topics for applying recurrent networks to algorithmic trading:

LSTM Fundamentals

Long Short-Term Memory networks solve the vanishing gradient problem through a gated cell architecture with forget, input, and output gates. This section covers the mathematical formulation, a from-scratch implementation using PyTorch, univariate price prediction with the TradingLSTM wrapper, and multivariate forecasting with technical indicators via MultivariateLSTM.
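
As a preview of what that sub-page builds toward, here is a minimal PyTorch sketch of an LSTM regressor for next-step prediction. It is a generic illustration, not the TradingLSTM or MultivariateLSTM classes described there, and the layer sizes are placeholder assumptions:

import torch
import torch.nn as nn

class MinimalLSTMForecaster(nn.Module):
    """Single-layer LSTM mapping a window of features to one predicted value."""
    def __init__(self, n_features=5, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)            # out: (batch, seq_len, hidden_size)
        return self.head(out[:, -1, :])  # predict from the last hidden state

model = MinimalLSTMForecaster()
window = torch.randn(16, 60, 5)          # 16 samples, 60 time steps, 5 features each
print(model(window).shape)               # torch.Size([16, 1])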

Stacked LSTM & GRU

Stacking multiple recurrent layers builds hierarchical feature representations, where lower layers capture short-term patterns and upper layers capture longer-term structure. This section also introduces the GRU as a lighter alternative to LSTM, covers hyperparameter tuning, regularization techniques, and walk-forward evaluation with direction accuracy metrics.
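
In code, the two ideas are small changes (hypothetical sizes, not the chapter's tuned configuration): stacking is just num_layers > 1 with dropout applied between layers, and swapping nn.LSTM for nn.GRU removes the cell state and one gate's worth of parameters:

import torch
import torch.nn as nn

stacked_lstm = nn.LSTM(input_size=5, hidden_size=32, num_layers=2,
                       dropout=0.2, batch_first=True)
gru = nn.GRU(input_size=5, hidden_size=32, num_layers=2,
             dropout=0.2, batch_first=True)

x = torch.randn(16, 60, 5)
lstm_out, (h_n, c_n) = stacked_lstm(x)  # LSTM returns hidden and cell states
gru_out, h_n_gru = gru(x)               # GRU keeps only a hidden state

# The GRU's smaller gate count means fewer parameters and faster training
count_params = lambda m: sum(p.numel() for p in m.parameters())
print(count_params(stacked_lstm), count_params(gru))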

Sentiment RNN

LSTMs with word embeddings are effective for classifying financial text sentiment. This section covers the SentimentLSTM and SentimentClassifier classes, pretrained GloVe embeddings, and integration of text-derived sentiment signals with price-based features for combined prediction.
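
A generic sketch of the embedding-plus-LSTM pattern follows (it is not the chapter's SentimentLSTM or SentimentClassifier; vocabulary size, dimensions, and the three-class output are illustrative assumptions):

import torch
import torch.nn as nn

class MinimalSentimentRNN(nn.Module):
    """Embed token ids, run an LSTM over them, classify from the final hidden state."""
    def __init__(self, vocab_size=20_000, embed_dim=100, hidden_size=64, n_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)  # could be initialized from GloVe vectors
        self.lstm = nn.LSTM(embed_dim, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, n_classes)   # e.g. bearish / neutral / bullish

    def forward(self, token_ids):             # token_ids: (batch, seq_len) integer ids
        embedded = self.embedding(token_ids)  # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)     # h_n: (1, batch, hidden_size)
        return self.classifier(h_n[-1])       # logits: (batch, n_classes)

model = MinimalSentimentRNN()
headline_ids = torch.randint(0, 20_000, (8, 30))  # 8 tokenized headlines, 30 tokens each
print(model(headline_ids).shape)                  # torch.Size([8, 3])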

Summary

RNNs and their gated variants provide a natural framework for sequential financial data:

  • LSTM: Three-gate architecture for capturing long-term dependencies in price and volume series
  • GRU: Simpler two-gate alternative with fewer parameters and faster training
  • Stacked architectures: Hierarchical feature learning through multiple recurrent layers
  • Sentiment analysis: LSTM with word embeddings for classifying financial text
  • Multivariate prediction: Combining price, volume, and indicator features in a single model

RNN-based price prediction is not a guaranteed path to profits. Markets are noisy and non-stationary. Always use walk-forward validation, test on out-of-sample data, and combine model predictions with proper risk management before deploying to live trading.

Notebook: Run the examples interactively in deep_learning.ipynb
