Diagnostics & Stationarity
Stationarity is a fundamental concept in time series analysis. A stationary time series has statistical properties (mean, variance, autocorrelation) that do not change over time. Before fitting any time series model, you must verify that your data meets the stationarity assumption.
Why Stationarity Matters
Most time series models (ARIMA, VAR) assume stationarity. Fitting these models to non-stationary data leads to spurious regressions, invalid statistical inference, and poor forecasting performance.
Non-stationary series exhibit trends, changing variance, or structural breaks. Common examples include:
- Stock prices: Exhibit random-walk behavior (non-stationary)
- Stock returns: Typically stationary (differences of log-prices)
- Volatility: May be stationary but with clustering effects
Testing for Stationarity
The Puffin library provides two main stationarity tests that are complementary:
- Augmented Dickey-Fuller (ADF) Test: Null hypothesis is non-stationarity (unit root present)
- KPSS Test: Null hypothesis is stationarity (opposite of ADF)
Using both tests together provides a more robust assessment. If ADF rejects and KPSS does not reject, the series is likely stationary. If both reject or neither rejects, the result is ambiguous.
ADF Test
import numpy as np
import pandas as pd
from puffin.models import test_stationarity, test_kpss, check_stationarity
# Generate example data
np.random.seed(42)
returns = pd.Series(np.random.randn(252)) # Stationary
prices = pd.Series(np.random.randn(252).cumsum()) # Non-stationary
# Test returns (should be stationary)
adf_result = test_stationarity(returns)
print(f"Returns - Stationary: {adf_result['is_stationary']}")
print(f"ADF p-value: {adf_result['p_value']:.4f}")
# Test prices (should be non-stationary)
price_result = test_stationarity(prices)
print(f"Prices - Stationary: {price_result['is_stationary']}")
KPSS Test
# KPSS test (null hypothesis is stationarity)
kpss_result = test_kpss(returns)
print(f"Returns - KPSS stationary: {kpss_result['is_stationary']}")
print(f"KPSS p-value: {kpss_result['p_value']:.4f}")
Comprehensive Stationarity Check
Use check_stationarity to run both tests and get a combined assessment:
# Comprehensive check using both tests
check_stationarity(returns, verbose=True)
This function runs both ADF and KPSS tests and reports whether they agree on the stationarity diagnosis.
Time Series Decomposition
Decomposing a time series into its structural components helps you understand the underlying patterns before modeling.
A time series can be decomposed into:
- Trend: Long-term movement in the level of the series
- Seasonal: Repeating patterns at fixed intervals
- Residual: Random variation after removing trend and seasonality
from puffin.models import decompose_series
# Generate seasonal data
t = np.arange(500)
trend = 0.05 * t
seasonal = 10 * np.sin(2 * np.pi * t / 52)
noise = np.random.randn(500)
series = pd.Series(trend + seasonal + noise)
series.index = pd.date_range('2020-01-01', periods=500, freq='D')
# Decompose
components = decompose_series(series, period=52)
print("Components:", components.keys())
# Components: trend, seasonal, residual, observed
The choice of decomposition method (additive vs. multiplicative) depends on whether the seasonal effect is constant (additive) or proportional to the level (multiplicative). Financial returns typically use additive decomposition.
Autocorrelation Analysis
Autocorrelation measures the correlation of a time series with its own lagged values. It is essential for:
- Identifying the order of AR and MA components in ARIMA models
- Detecting seasonality and cyclical patterns
- Checking whether model residuals are white noise
ACF and PACF
- ACF (Autocorrelation Function): Shows correlation at each lag, including indirect effects through intermediate lags
- PACF (Partial Autocorrelation Function): Shows the direct correlation at each lag, controlling for shorter lags
| Pattern | ACF | PACF | Suggested Model |
|---|---|---|---|
| Cuts off at lag q | Tails off | – | MA(q) |
| Tails off | – | Cuts off at lag p | AR(p) |
| Tails off | Tails off | – | ARMA(p,q) |
Plotting ACF and PACF
from puffin.models import plot_acf_pacf, autocorrelation
import matplotlib.pyplot as plt
# Plot ACF and PACF
fig = plot_acf_pacf(returns, nlags=20)
plt.show()
Computing Autocorrelation Values
# Get autocorrelation values
acf_values = autocorrelation(returns, nlags=40)
print(f"First-order autocorrelation: {acf_values[1]:.3f}")
A first-order autocorrelation near zero suggests that returns are not predictable from the immediately preceding value – consistent with weak-form market efficiency. Significant autocorrelation at specific lags can indicate exploitable patterns.
Practical Guidelines
Rule of thumb: If the ADF p-value is below 0.05 and the KPSS p-value is above 0.05, you can treat the series as stationary and proceed with modeling.
- Always test raw data first before differencing or transforming.
- Apply log-returns for price data – this typically produces stationary series.
- Check residuals after fitting any model to confirm they resemble white noise.
- Beware of structural breaks – a series may appear non-stationary due to regime changes rather than unit roots.
Source Code
Browse the implementation: puffin/models/