Puffin: Algorithmic Trading Guide

A comprehensive, hands-on guide to building algorithmic trading systems — from market fundamentals through machine learning, deep learning, and AI-assisted trading. Based on Stefan Jansen’s Machine Learning for Algorithmic Trading (2nd Edition) and extended with modern LLM-powered trading.

This guide is designed to be followed sequentially. Each part builds on concepts and code from previous parts.

Start Learning — Part 1: Market Foundations

Learning Path

graph TD
    A[Part 1: Market Foundations] --> B[Part 2: Data Pipeline]
    B --> C[Part 3: Alternative Data]
    B --> D[Part 4: Alpha Factors]
    D --> E[Part 5: Portfolio Optimization]
    D --> F[Part 6: Trading Strategies]
    F --> G[Part 7: Backtesting]
    G --> H[Part 8: Linear Models]
    H --> I[Part 9: Time Series Models]
    I --> J[Part 10: Bayesian ML]
    G --> K[Part 11: Tree Ensembles]
    K --> L[Part 12: Unsupervised Learning]
    L --> M[Part 13: NLP for Trading]
    M --> N[Part 14: Topic Modeling]
    N --> O[Part 15: Word Embeddings]
    O --> P[Part 16: Deep Learning]
    P --> Q[Part 17: CNNs]
    P --> R[Part 18: RNNs]
    P --> S[Part 19: Autoencoders]
    S --> T[Part 20: GANs & Synthetic Data]
    P --> U[Part 21: Deep RL]
    U --> V[Part 22: AI-Assisted Trading]
    G --> W[Part 23: Live Trading]
    W --> X[Part 24: Risk Management]
    X --> Y[Part 25: Monitoring & Analytics]
    W --> Z[Part 26: Real-Time Market Data]

    style A fill:#2d5016,stroke:#1a3a1a,color:#e8e0d4
    style B fill:#2d5016,stroke:#1a3a1a,color:#e8e0d4
    style C fill:#2d5016,stroke:#1a3a1a,color:#e8e0d4
    style D fill:#1a3a5c,stroke:#0d2137,color:#e8e0d4
    style E fill:#1a3a5c,stroke:#0d2137,color:#e8e0d4
    style F fill:#8b4513,stroke:#5c2d0a,color:#e8e0d4
    style G fill:#8b4513,stroke:#5c2d0a,color:#e8e0d4
    style H fill:#4a2060,stroke:#2d1040,color:#e8e0d4
    style I fill:#4a2060,stroke:#2d1040,color:#e8e0d4
    style J fill:#4a2060,stroke:#2d1040,color:#e8e0d4
    style K fill:#4a2060,stroke:#2d1040,color:#e8e0d4
    style L fill:#4a2060,stroke:#2d1040,color:#e8e0d4
    style M fill:#0d4a5c,stroke:#083040,color:#e8e0d4
    style N fill:#0d4a5c,stroke:#083040,color:#e8e0d4
    style O fill:#0d4a5c,stroke:#083040,color:#e8e0d4
    style P fill:#6b2d5b,stroke:#401a38,color:#e8e0d4
    style Q fill:#6b2d5b,stroke:#401a38,color:#e8e0d4
    style R fill:#6b2d5b,stroke:#401a38,color:#e8e0d4
    style S fill:#6b2d5b,stroke:#401a38,color:#e8e0d4
    style T fill:#6b2d5b,stroke:#401a38,color:#e8e0d4
    style U fill:#6b2d5b,stroke:#401a38,color:#e8e0d4
    style V fill:#8b4513,stroke:#5c2d0a,color:#e8e0d4
    style W fill:#7a2020,stroke:#4a1010,color:#e8e0d4
    style X fill:#7a2020,stroke:#4a1010,color:#e8e0d4
    style Y fill:#7a2020,stroke:#4a1010,color:#e8e0d4
    style Z fill:#7a2020,stroke:#4a1010,color:#e8e0d4

    click A "01-market-foundations/"
    click B "02-data-pipeline/"
    click C "03-alternative-data/"
    click D "04-alpha-factors/"
    click E "05-portfolio-optimization/"
    click F "06-trading-strategies/"
    click G "07-backtesting/"
    click H "08-linear-models/"
    click I "09-time-series-models/"
    click J "10-bayesian-ml/"
    click K "11-tree-ensembles/"
    click L "12-unsupervised-learning/"
    click M "13-nlp-trading/"
    click N "14-topic-modeling/"
    click O "15-word-embeddings/"
    click P "16-deep-learning/"
    click Q "17-cnns-for-trading/"
    click R "18-rnns-for-trading/"
    click S "19-autoencoders/"
    click T "20-synthetic-data-gans/"
    click U "21-deep-rl/"
    click V "22-ai-assisted-trading/"
    click W "23-live-trading/"
    click X "24-risk-management/"
    click Y "25-monitoring-analytics/"
    click Z "26-realtime-data/"

Parts

Part	Topic	What You’ll Learn
1. Market Foundations	How markets work	Exchanges, order books, asset classes, market microstructure
2. Data Pipeline	Getting market data	Data providers, caching, preprocessing, HDF5/Parquet storage
3. Alternative Data	Non-traditional data	Web scraping, earnings calls, alternative data evaluation
4. Alpha Factors	Predictive signals	TA-Lib, Kalman filter, wavelets, Alphalens, WorldQuant alphas
5. Portfolio Optimization	Building portfolios	Mean-variance, risk parity, HRP, pyfolio tearsheets
6. Trading Strategies	Classical strategies	Momentum, mean reversion, stat arb, market making
7. Backtesting	Testing strategies	Event-driven backtester, walk-forward analysis
8. Linear Models	Linear ML	OLS, ridge, lasso, Fama-French factor models
9. Time Series Models	Time series	ARIMA, VAR, GARCH, cointegration, pairs trading
10. Bayesian ML	Bayesian methods	PyMC, Bayesian Sharpe, stochastic volatility
11. Tree Ensembles	Gradient boosting	Random forests, XGBoost, LightGBM, CatBoost, SHAP
12. Unsupervised Learning	Clustering & PCA	Eigenportfolios, k-means, hierarchical clustering
13. NLP for Trading	Text analysis	spaCy, TF-IDF, naive Bayes, sentiment analysis
14. Topic Modeling	Document topics	LSI, LDA, pyLDAvis, earnings call analysis
15. Word Embeddings	Semantic analysis	word2vec, GloVe, doc2vec, BERT, SEC filings
16. Deep Learning	Neural networks	Feedforward NNs, PyTorch, TensorFlow, TensorBoard
17. CNNs for Trading	Convolutions	1D CNN, CNN-TA image approach, transfer learning
18. RNNs for Trading	Sequence models	LSTM, GRU, stacked RNNs, sentiment classification
19. Autoencoders	Feature extraction	Denoising AE, VAE, conditional AE for pricing
20. GANs & Synthetic Data	Data generation	TimeGAN, synthetic financial time series
21. Deep RL	RL agents	Q-learning, DQN, DDQN, PPO, trading agents
22. AI-Assisted Trading	LLM-powered trading	Sentiment, news signals, AI agent portfolio mgmt
23. Live Trading	Going live	Paper trading, broker integration, order management
24. Risk Management	Managing risk	Position sizing, stop losses, VaR, portfolio controls
25. Monitoring & Analytics	Tracking performance	Dashboards, trade logs, P&L attribution
26. Real-Time Market Data	Live streaming	WebSocket feeds, order book depth, tick-to-bar aggregation

Prerequisites

Python 3.11+
Basic Python programming knowledge (NumPy and pandas familiarity recommended)
Familiarity with Jupyter notebooks
Basic understanding of financial markets (helpful but not required)
GPU recommended for Parts 16–21 (deep learning and RL)

Quick Start

# Clone the repository
git clone https://github.com/MichaelTien8901/puffin.git
cd puffin

# Install core dependencies
pip install -e .

# Install all optional dependencies (ML, NLP, AI, dashboard, dev tools)
pip install -e ".[all]"

# Copy environment template and add your API keys
cp .env.example .env

Project Structure

puffin/
├── docs/            # This tutorial site (Jekyll + Just the Docs)
├── puffin/          # Python package
│   ├── data/        # Data pipeline
│   ├── factors/     # Alpha factor research
│   ├── portfolio/   # Portfolio optimization
│   ├── strategies/  # Trading strategies
│   ├── backtest/    # Backtesting engine
│   ├── models/      # Linear & time series models
│   ├── ensembles/   # Tree ensemble models
│   ├── unsupervised/# PCA, clustering
│   ├── nlp/         # NLP for trading
│   ├── deep/        # Deep learning (CNN, RNN, AE, GAN)
│   ├── rl/          # Deep reinforcement learning
│   ├── ml/          # ML utilities
│   ├── ai/          # AI-assisted trading (LLMs)
│   ├── broker/      # Live trading
│   ├── risk/        # Risk management
│   └── monitor/     # Monitoring & analytics
├── tests/           # Test suite
└── notebooks/       # Interactive Jupyter notebooks