Puffin: Algorithmic Trading Guide

A comprehensive, hands-on guide to building algorithmic trading systems, from market fundamentals through machine learning, deep learning, and AI-assisted trading. It is based on Stefan Jansen’s Machine Learning for Algorithmic Trading (2nd Edition) and extends that material with LLM-powered trading.

This guide is designed to be followed sequentially. Each part builds on concepts and code from previous parts.

Start Learning — Part 1: Market Foundations

Learning Path

graph TD
    A[Part 1: Market Foundations] --> B[Part 2: Data Pipeline]
    B --> C[Part 3: Alternative Data]
    B --> D[Part 4: Alpha Factors]
    D --> E[Part 5: Portfolio Optimization]
    D --> F[Part 6: Trading Strategies]
    F --> G[Part 7: Backtesting]
    G --> H[Part 8: Linear Models]
    H --> I[Part 9: Time Series Models]
    I --> J[Part 10: Bayesian ML]
    G --> K[Part 11: Tree Ensembles]
    K --> L[Part 12: Unsupervised Learning]
    L --> M[Part 13: NLP for Trading]
    M --> N[Part 14: Topic Modeling]
    N --> O[Part 15: Word Embeddings]
    O --> P[Part 16: Deep Learning]
    P --> Q[Part 17: CNNs]
    P --> R[Part 18: RNNs]
    P --> S[Part 19: Autoencoders]
    S --> T[Part 20: GANs & Synthetic Data]
    P --> U[Part 21: Deep RL]
    U --> V[Part 22: AI-Assisted Trading]
    G --> W[Part 23: Live Trading]
    W --> X[Part 24: Risk Management]
    X --> Y[Part 25: Monitoring & Analytics]
    W --> Z[Part 26: Real-Time Market Data]

    style A fill:#2d5016,stroke:#1a3a1a,color:#e8e0d4
    style B fill:#2d5016,stroke:#1a3a1a,color:#e8e0d4
    style C fill:#2d5016,stroke:#1a3a1a,color:#e8e0d4
    style D fill:#1a3a5c,stroke:#0d2137,color:#e8e0d4
    style E fill:#1a3a5c,stroke:#0d2137,color:#e8e0d4
    style F fill:#8b4513,stroke:#5c2d0a,color:#e8e0d4
    style G fill:#8b4513,stroke:#5c2d0a,color:#e8e0d4
    style H fill:#4a2060,stroke:#2d1040,color:#e8e0d4
    style I fill:#4a2060,stroke:#2d1040,color:#e8e0d4
    style J fill:#4a2060,stroke:#2d1040,color:#e8e0d4
    style K fill:#4a2060,stroke:#2d1040,color:#e8e0d4
    style L fill:#4a2060,stroke:#2d1040,color:#e8e0d4
    style M fill:#0d4a5c,stroke:#083040,color:#e8e0d4
    style N fill:#0d4a5c,stroke:#083040,color:#e8e0d4
    style O fill:#0d4a5c,stroke:#083040,color:#e8e0d4
    style P fill:#6b2d5b,stroke:#401a38,color:#e8e0d4
    style Q fill:#6b2d5b,stroke:#401a38,color:#e8e0d4
    style R fill:#6b2d5b,stroke:#401a38,color:#e8e0d4
    style S fill:#6b2d5b,stroke:#401a38,color:#e8e0d4
    style T fill:#6b2d5b,stroke:#401a38,color:#e8e0d4
    style U fill:#6b2d5b,stroke:#401a38,color:#e8e0d4
    style V fill:#8b4513,stroke:#5c2d0a,color:#e8e0d4
    style W fill:#7a2020,stroke:#4a1010,color:#e8e0d4
    style X fill:#7a2020,stroke:#4a1010,color:#e8e0d4
    style Y fill:#7a2020,stroke:#4a1010,color:#e8e0d4
    style Z fill:#7a2020,stroke:#4a1010,color:#e8e0d4

    click A "01-market-foundations/"
    click B "02-data-pipeline/"
    click C "03-alternative-data/"
    click D "04-alpha-factors/"
    click E "05-portfolio-optimization/"
    click F "06-trading-strategies/"
    click G "07-backtesting/"
    click H "08-linear-models/"
    click I "09-time-series-models/"
    click J "10-bayesian-ml/"
    click K "11-tree-ensembles/"
    click L "12-unsupervised-learning/"
    click M "13-nlp-trading/"
    click N "14-topic-modeling/"
    click O "15-word-embeddings/"
    click P "16-deep-learning/"
    click Q "17-cnns-for-trading/"
    click R "18-rnns-for-trading/"
    click S "19-autoencoders/"
    click T "20-synthetic-data-gans/"
    click U "21-deep-rl/"
    click V "22-ai-assisted-trading/"
    click W "23-live-trading/"
    click X "24-risk-management/"
    click Y "25-monitoring-analytics/"
    click Z "26-realtime-data/"
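
The diagram branches, so a given part does not depend on every lower-numbered part. As a rough sketch, the edges can be transcribed into Python to list what a part actually builds on. The part numbers and edges are copied from the diagram; the `EDGES` constant and the `prerequisites` helper are illustrative names and are not part of the `puffin` package.

```python
# Edges transcribed from the Learning Path diagram as (prerequisite, dependent)
# pairs, using part numbers (A = 1, B = 2, ..., Z = 26).
EDGES = [
    (1, 2), (2, 3), (2, 4), (4, 5), (4, 6), (6, 7), (7, 8), (8, 9), (9, 10),
    (7, 11), (11, 12), (12, 13), (13, 14), (14, 15), (15, 16), (16, 17),
    (16, 18), (16, 19), (19, 20), (16, 21), (21, 22), (7, 23), (23, 24),
    (24, 25), (23, 26),
]


def prerequisites(part: int) -> list[int]:
    """Return every part that `part` depends on, directly or indirectly."""
    # Map each part to the parts that point at it in the diagram.
    parents: dict[int, set[int]] = {}
    for pre, dep in EDGES:
        parents.setdefault(dep, set()).add(pre)

    # Walk backwards through the graph, collecting all ancestors.
    seen: set[int] = set()
    stack = [part]
    while stack:
        for pre in parents.get(stack.pop(), ()):
            if pre not in seen:
                seen.add(pre)
                stack.append(pre)

    # Every edge runs from a lower to a higher part number, so ascending
    # order is also a valid reading order.
    return sorted(seen)


if __name__ == "__main__":
    print("Part 23 builds on parts:", prerequisites(23))
```

Under this reading of the diagram, Part 23 (Live Trading) depends only on the chain through Part 7 (Backtesting), not on the machine learning parts.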

Parts

| Part | Topic | What You’ll Learn |
|------|-------|-------------------|
| 1. Market Foundations | How markets work | Exchanges, order books, asset classes, market microstructure |
| 2. Data Pipeline | Getting market data | Data providers, caching, preprocessing, HDF5/Parquet storage |
| 3. Alternative Data | Non-traditional data | Web scraping, earnings calls, alternative data evaluation |
| 4. Alpha Factors | Predictive signals | TA-Lib, Kalman filter, wavelets, Alphalens, WorldQuant alphas |
| 5. Portfolio Optimization | Building portfolios | Mean-variance, risk parity, HRP, pyfolio tearsheets |
| 6. Trading Strategies | Classical strategies | Momentum, mean reversion, stat arb, market making |
| 7. Backtesting | Testing strategies | Event-driven backtester, walk-forward analysis |
| 8. Linear Models | Linear ML | OLS, ridge, lasso, Fama-French factor models |
| 9. Time Series Models | Time series | ARIMA, VAR, GARCH, cointegration, pairs trading |
| 10. Bayesian ML | Bayesian methods | PyMC, Bayesian Sharpe, stochastic volatility |
| 11. Tree Ensembles | Gradient boosting | Random forests, XGBoost, LightGBM, CatBoost, SHAP |
| 12. Unsupervised Learning | Clustering & PCA | Eigenportfolios, k-means, hierarchical clustering |
| 13. NLP for Trading | Text analysis | spaCy, TF-IDF, naive Bayes, sentiment analysis |
| 14. Topic Modeling | Document topics | LSI, LDA, pyLDAvis, earnings call analysis |
| 15. Word Embeddings | Semantic analysis | word2vec, GloVe, doc2vec, BERT, SEC filings |
| 16. Deep Learning | Neural networks | Feedforward NNs, PyTorch, TensorFlow, TensorBoard |
| 17. CNNs for Trading | Convolutions | 1D CNN, CNN-TA image approach, transfer learning |
| 18. RNNs for Trading | Sequence models | LSTM, GRU, stacked RNNs, sentiment classification |
| 19. Autoencoders | Feature extraction | Denoising AE, VAE, conditional AE for pricing |
| 20. GANs & Synthetic Data | Data generation | TimeGAN, synthetic financial time series |
| 21. Deep RL | RL agents | Q-learning, DQN, DDQN, PPO, trading agents |
| 22. AI-Assisted Trading | LLM-powered trading | Sentiment, news signals, AI agent portfolio mgmt |
| 23. Live Trading | Going live | Paper trading, broker integration, order management |
| 24. Risk Management | Managing risk | Position sizing, stop losses, VaR, portfolio controls |
| 25. Monitoring & Analytics | Tracking performance | Dashboards, trade logs, P&L attribution |
| 26. Real-Time Market Data | Live streaming | WebSocket feeds, order book depth, tick-to-bar aggregation |

Prerequisites

  • Python 3.11+
  • Basic Python programming knowledge (NumPy and pandas familiarity recommended)
  • Familiarity with Jupyter notebooks
  • Basic understanding of financial markets (helpful but not required)
  • GPU recommended for Parts 16–21 (deep learning and RL)
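
The version and GPU items in this list can be checked before installing anything. The following is a minimal sketch that assumes PyTorch is the framework used for the GPU check (PyTorch is listed under Part 16, but whether it is installed in your environment is an assumption). The `check_environment` function name is hypothetical and is not provided by `puffin`.

```python
import importlib.util
import sys


def check_environment(min_version: tuple[int, int] = (3, 11)) -> dict[str, bool]:
    """Report whether the interpreter and optional GPU stack look usable."""
    report = {
        # The guide asks for Python 3.11 or newer.
        "python_ok": sys.version_info >= min_version,
        # PyTorch is optional; only look for it, do not require it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "gpu_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch

            # True only when a CUDA device is visible to PyTorch.
            report["gpu_available"] = bool(torch.cuda.is_available())
        except ImportError:
            # A broken install is treated the same as a missing one.
            report["torch_installed"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

A missing GPU does not block Parts 1 through 15; it mainly affects training time in the deep learning and RL parts.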

Quick Start

# Clone the repository
git clone https://github.com/MichaelTien8901/puffin.git
cd puffin

# Install core dependencies
pip install -e .

# Install all optional dependencies (ML, NLP, AI, dashboard, dev tools)
pip install -e ".[all]"

# Copy environment template and add your API keys
cp .env.example .env
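
After copying the template, it can help to confirm that the keys you need are actually filled in. The sketch below parses a `.env` file using only the standard library; it assumes simple `KEY=VALUE` lines and is not how `puffin` itself loads configuration. The key name `EXAMPLE_API_KEY` is a placeholder, so substitute the names found in `.env.example`.

```python
from pathlib import Path


def read_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of optional quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or left empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    # EXAMPLE_API_KEY is a placeholder name, not a key defined by puffin.
    unset = missing_keys(read_env_file(), ["EXAMPLE_API_KEY"])
    print("Missing keys:", unset or "none")
```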

Project Structure

puffin/
├── docs/            # This tutorial site (Jekyll + Just the Docs)
├── puffin/          # Python package
│   ├── data/        # Data pipeline
│   ├── factors/     # Alpha factor research
│   ├── portfolio/   # Portfolio optimization
│   ├── strategies/  # Trading strategies
│   ├── backtest/    # Backtesting engine
│   ├── models/      # Linear & time series models
│   ├── ensembles/   # Tree ensemble models
│   ├── unsupervised/ # PCA, clustering
│   ├── nlp/         # NLP for trading
│   ├── deep/        # Deep learning (CNN, RNN, AE, GAN)
│   ├── rl/          # Deep reinforcement learning
│   ├── ml/          # ML utilities
│   ├── ai/          # AI-assisted trading (LLMs)
│   ├── broker/      # Live trading
│   ├── risk/        # Risk management
│   └── monitor/     # Monitoring & analytics
├── tests/           # Test suite
└── notebooks/       # Interactive Jupyter notebooks