Is It Even Forecastable? Triage Before You Model
Most forecasting projects start with model selection. They should start with a cheaper question — does this series contain exploitable structure at the horizon you care about?
- #Forecastability
- #TimeSeries
- #OpenSource
Most forecasting projects start too late:
data → model search → tuning → more features → more compute → still poor results
By the time you've benchmarked Prophet, a gradient-boosted lag model, and an N-BEATS variant against one another, you've spent weeks answering the wrong question. The question that should come first is cheaper and more honest:
Does this series contain exploitable structure — and at what horizon?
A pre-modeling layer, not another forecaster
This is the idea behind dependence-forecastability, the open-source toolkit I maintain. It's deliberately not another forecasting library. It's a deterministic diagnostic you run before model search to inspect:
- Readiness — is there learnable signal at all, or are you about to fit noise?
- Informative horizons — the series might be predictable one step ahead and pure noise ten steps ahead. Those are different projects.
- Target lags & seasonality — which past values actually carry information.
- Covariate usefulness — does that external driver help, and at which lag?
- Leakage risk — the feature that looks brilliant in backtest because it quietly encodes the future.
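To make the leakage bullet concrete, here is a minimal sketch in Python with NumPy. It is not the toolkit's own check: the white-noise target and both rolling-mean features are assumptions invented for illustration. On white noise, nothing known at time t can predict y[t+1], so any feature that correlates with it must be encoding the future.

```python
import numpy as np

# Hypothetical setup: a white-noise target, so no honest feature can predict it.
rng = np.random.default_rng(0)
n = 2000
y = rng.normal(size=n)

# Forecast origin t runs over 2 .. n-2; the value to predict is y[t+1].
target = y[3:n]

# Trailing mean uses y[t-2], y[t-1], y[t]: everything is known at time t.
trailing = (y[0:n - 3] + y[1:n - 2] + y[2:n - 1]) / 3

# Centered mean uses y[t-1], y[t], y[t+1]: it silently includes the target.
centered = (y[1:n - 2] + y[2:n - 1] + y[3:n]) / 3

corr_trailing = np.corrcoef(trailing, target)[0, 1]
corr_centered = np.corrcoef(centered, target)[0, 1]

print(f"trailing mean vs y[t+1]: {corr_trailing:+.3f}")
print(f"centered mean vs y[t+1]: {corr_centered:+.3f}")
```

The trailing feature should sit near zero, while the centered one correlates strongly with the value it is supposed to forecast (about 0.58 in theory for a three-point window). In a backtest that second feature would look brilliant for exactly the wrong reason.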
Why dependence, not correlation
Linear correlation misses the structure that non-linear models exploit, and it can flag structure that isn't there. The toolkit leans on information-theoretic measures (average mutual information, transfer entropy) and applies multiple-comparison correction across lags, so you don't fool yourself by testing twenty horizons and celebrating the one that looked significant by chance.
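A lag scan with correction can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the toolkit's implementation: `binned_mi` and `lagged_ami` are hypothetical helpers, the estimator is a simple equal-frequency histogram, the null comes from shuffling (which ignores serial dependence, so surrogate-based nulls would be stricter), and the correction is plain Bonferroni.

```python
import numpy as np

def binned_mi(x, y, bins=8):
    """Plug-in mutual information (nats) using equal-frequency bins."""
    def to_bins(v):
        ranks = np.argsort(np.argsort(v))      # ranks 0 .. n-1
        return ranks * bins // len(v)          # bin index 0 .. bins-1
    joint = np.zeros((bins, bins))
    np.add.at(joint, (to_bins(x), to_bins(y)), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def lagged_ami(series, max_lag, n_perm=200, alpha=0.05, seed=0):
    """Return (lag, mi, p_value, significant) for lags 1 .. max_lag."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    threshold = alpha / max_lag                # Bonferroni across all tested lags
    out = []
    for lag in range(1, max_lag + 1):
        past, future = series[:-lag], series[lag:]
        observed = binned_mi(past, future)
        # Shuffle null: break the pairing between past and future values.
        null = np.array([binned_mi(rng.permutation(past), future)
                         for _ in range(n_perm)])
        p = float((1 + np.sum(null >= observed)) / (1 + n_perm))
        out.append((lag, observed, p, bool(p <= threshold)))
    return out

# Hypothetical demo data: an AR(1) series versus pure white noise.
rng = np.random.default_rng(1)
noise = rng.normal(size=500)
ar = np.zeros(500)
for t in range(1, 500):
    ar[t] = 0.9 * ar[t - 1] + noise[t]

ar_results = lagged_ami(ar, max_lag=5)
noise_results = lagged_ami(noise, max_lag=5)
for lag, mi, p, sig in ar_results:
    print(f"AR(1) lag {lag}: MI={mi:.3f} p={p:.4f} significant={sig}")
```

Note that the number of shuffles bounds the smallest attainable p-value at 1 / (n_perm + 1), so scanning more lags under Bonferroni needs more shuffles before any lag can pass.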
Covariates: surviving the conditioning test
Pairwise dependence alone isn't enough. A covariate can look informative simply because it shares a common driver with the target. Such a covariate makes a good negative control: it sails through naïve feature selection, and even through redundancy-aware selection. The fix is a two-step screen:
- CrossAMI asks: does this covariate contain information about the target?
- CrosspAMI asks: does that information survive after conditioning on the target's own history?
The retention ratio between the two answers the question that matters: how much of the apparent external signal is actually new. The covariate workflow then runs CrossAMI → CrosspAMI retention filter → lag-aware sparse selection → a forecast-preparation contract. In short: diagnose first, screen covariates, pick forecast-safe lags, and hand structured inputs to whatever forecaster you like (Nixtla / MLForecast, Darts, your own).
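The idea of a retention ratio can be illustrated with a rough sketch. Everything here is an assumption rather than the toolkit's API: `binned_mi`, `residualize`, and `retention` are hypothetical helpers, and conditioning is approximated by linearly regressing out the target's own lags before measuring dependence, which is a stand-in for a true conditional estimator. The simulated data pits a genuine external driver against a covariate that only echoes the target's own past.

```python
import numpy as np

def binned_mi(x, y, bins=8):
    """Plug-in mutual information (nats) using equal-frequency bins."""
    def to_bins(v):
        return np.argsort(np.argsort(v)) * bins // len(v)
    joint = np.zeros((bins, bins))
    np.add.at(joint, (to_bins(x), to_bins(y)), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def residualize(v, design):
    """Remove the part of v that is linearly explained by the design matrix."""
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef

def retention(x, y, lag, history=3, bins=8):
    """Return (raw MI, MI after partialling out y's own lags, ratio)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, start = len(y), max(lag, history)
    target = y[start:]                         # y[t]
    cov = x[start - lag:n - lag]               # x[t - lag]
    # Design matrix: intercept plus y[t-1] .. y[t-history].
    cols = [y[start - j:n - j] for j in range(1, history + 1)]
    design = np.column_stack([np.ones(n - start)] + cols)
    raw = binned_mi(cov, target, bins)
    partial = binned_mi(residualize(cov, design),
                        residualize(target, design), bins)
    return raw, partial, (partial / raw if raw > 0 else float("nan"))

# Hypothetical data: y depends on its own past and on an external driver u.
rng = np.random.default_rng(0)
n = 2000
u, e = rng.normal(size=n), rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + 0.8 * u[t - 1] + e[t]
echo = y + 0.5 * rng.normal(size=n)            # noisy copy of the target itself

echo_raw, echo_partial, echo_ratio = retention(echo, y, lag=1)
driver_raw, driver_partial, driver_ratio = retention(u, y, lag=1)
print(f"echo:   raw={echo_raw:.3f} partial={echo_partial:.3f} ratio={echo_ratio:.2f}")
print(f"driver: raw={driver_raw:.3f} partial={driver_partial:.3f} ratio={driver_ratio:.2f}")
```

In this setup the echo covariate looks like the stronger predictor on raw dependence, yet almost none of its signal survives conditioning, while the real driver keeps its information. A ratio above 1 is possible in this sketch, because removing the target's own history also removes variance that was masking the driver.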
The payoff
Forecastability triage turns the most expensive guesswork in a forecasting engagement — is this learnable, and with what? — into a fast, reproducible diagnostic. Sometimes the most valuable output is the one nobody wants to hear: this series isn't forecastable at your horizon; stop here and save the compute. Knowing that on day one is worth more than a fortnight of model search.