How to Build a Forecasting Pipeline with TimeCopilot Using Foundation Models and Automated Anomaly Detection

In this tutorial, we build an end-to-end forecasting workflow with TimeCopilot. We prepare a panel dataset containing real airline passenger data and a synthetic seasonal series with injected anomalies, then evaluate a diverse collection of statistical, foundation, and optional GPU-based forecasting models. We use rolling cross-validation and multiple error metrics to identify the strongest model, generate probabilistic forecasts with prediction intervals, visualize future trends, and detect unusual observations. Finally, we explore TimeCopilot’s optional LLM agent, which selects a forecasting model and translates its predictions into an accessible analytical response.

Installing TimeCopilot and Pinning Compatible NumPy and SciPy Versions

!pip install -q "timecopilot" "utilsforecast" "matplotlib"
!pip install -q --force-reinstall --no-deps "numpy==1.26.4" "scipy==1.13.1"
print("Setup complete. Restarting the runtime to load clean binaries...")
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

We install TimeCopilot, UtilsForecast, and Matplotlib to prepare the forecasting environment. We enforce compatible NumPy and SciPy versions to prevent binary conflicts. We then restart the Colab runtime so the updated libraries load correctly.

Loading AirPassengers Data and Building a Synthetic Anomaly Panel

import os, warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
pd.set_option("display.width", 160)
pd.set_option("display.max_columns", 30)
print("numpy:", np.__version__)
import scipy; print("scipy:", scipy.__version__)
try:
   import torch
   HAS_GPU = torch.cuda.is_available()
except Exception:
   HAS_GPU = False
print(f"GPU available: {HAS_GPU}")
df = pd.read_csv(
   "https://timecopilot.s3.amazonaws.com/public/data/air_passengers.csv",
   parse_dates=["ds"],
)
df["unique_id"] = df["unique_id"].astype(str)
rng = np.random.default_rng(7)
dates = df["ds"].unique(); n = len(dates)
synth = pd.DataFrame({
   "unique_id": "Synthetic",
   "ds": dates,
   "y": (np.linspace(50, 250, n)
         + 40 * np.sin(2 * np.pi * np.arange(n) / 12)
         + rng.normal(0, 8, n)).round(2),
})
anomaly_idx = [30, 75, 120]
synth.loc[anomaly_idx, "y"] *= 2.2
panel = pd.concat([df[["unique_id", "ds", "y"]], synth], ignore_index=True)
print("\nPanel shape:", panel.shape)
print(panel.groupby("unique_id")["y"].agg(["count", "mean", "min", "max"]))
H, FREQ = 12, "MS"

We import the required libraries, verify the environment, and detect GPU availability. We load the AirPassengers dataset and create a second synthetic seasonal series with injected spikes. We combine the two series into a panel dataset and set the forecasting horizon and monthly frequency.
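
To get a feel for how large the injected spikes are relative to the random noise, the following dependency-free sketch recomputes the noise-free part of the synthetic series at the three anomaly positions. It assumes the AirPassengers date index has 144 monthly points; `noise_free_value` and `spike_size_in_sds` are hypothetical helper names introduced only for this illustration.

```python
import math

N_OBS = 144      # assumption: AirPassengers has 144 monthly observations
NOISE_SD = 8.0   # standard deviation passed to rng.normal in the cell above
FACTOR = 2.2     # multiplier applied at each anomaly index


def noise_free_value(t, n=N_OBS):
    """Trend plus seasonal component at position t, ignoring the random noise."""
    trend = 50 + 200 * t / (n - 1)                  # same ramp as np.linspace(50, 250, n)
    seasonal = 40 * math.sin(2 * math.pi * t / 12)  # 12-month sine wave
    return trend + seasonal


def spike_size_in_sds(t):
    """Approximate size of the injected jump, in units of the noise standard deviation."""
    return noise_free_value(t) * (FACTOR - 1) / NOISE_SD


for t in (30, 75, 120):
    print(f"t={t}: base={noise_free_value(t):.1f}, jump={spike_size_in_sds(t):.1f} noise SDs")
```

Under these assumptions, each spike exceeds the noise level by more than ten standard deviations, so the injected points should stand out clearly from ordinary variation in the series.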

Configuring Statistical, Prophet, and Chronos Forecasting Models

from timecopilot.forecaster import TimeCopilotForecaster
from timecopilot.models.stats import AutoARIMA, AutoETS, SeasonalNaive, Theta
from timecopilot.models.prophet import Prophet
from timecopilot.models.foundation.chronos import Chronos
chronos_repo = "amazon/chronos-bolt-small" if HAS_GPU else "amazon/chronos-bolt-tiny"
models = [
   SeasonalNaive(), AutoETS(), AutoARIMA(), Theta(), Prophet(),
   Chronos(repo_id=chronos_repo, alias="Chronos"),
]
if HAS_GPU:
   try:
       from timecopilot.models.foundation.timesfm import TimesFM
       models.append(TimesFM(repo_id="google/timesfm-2.0-500m-pytorch", alias="TimesFM"))
   except Exception as e:
       print("Skipping TimesFM:", e)
tcf = TimeCopilotForecaster(models=models)
print("\nModels:", [getattr(m, "alias", type(m).__name__) for m in models])

We configure a diverse collection of statistical, Prophet, and Chronos forecasting models. We select the Chronos model size according to the available hardware and optionally include TimesFM when a GPU is present. We then initialize TimeCopilotForecaster to manage all models through one consistent interface.

Running Rolling Cross-Validation and Ranking Models by RMSE

print("\nRunning cross-validation (slow step: foundation weights download)...")
cv_df = tcf.cross_validation(df=panel, h=H, freq=FREQ, n_windows=3)
print(cv_df.head())
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, rmse, mape
eval_df = evaluate(cv_df.drop(columns=["cutoff"]), metrics=[mae, rmse, mape])
print("\n=== Per-series error (lower = better) ===")
print(eval_df.round(3))
model_cols = [c for c in eval_df.columns if c not in ("unique_id", "metric")]
leaderboard = (eval_df.groupby("metric")[model_cols].mean().T.sort_values("rmse"))
print("\n=== Leaderboard (mean across series) ===")
print(leaderboard.round(3))
best_model = leaderboard.index[0]
print(f"\n>>> Best model by mean RMSE: {best_model}")

We perform rolling cross-validation across three windows to measure each model’s forecasting performance. We calculate MAE, RMSE, and MAPE for every series and average each metric across the two series to form a leaderboard. Because RMSE is scale-dependent, this average weights the larger-valued series more heavily. We select the model with the lowest mean RMSE for subsequent forecasting and visualization.
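
The evaluation scheme can be sketched in plain Python to show what the three windows and the three metrics amount to. This is an illustration rather than TimeCopilot's or UtilsForecast's implementation: `rolling_windows` is a hypothetical helper, and it assumes that consecutive windows are spaced one horizon apart when no step size is given. MAPE is written as a fraction, not a percentage.

```python
import math


def rolling_windows(n_obs, h, n_windows, step_size=None):
    """Return (train_end, test_end) pairs.

    Training uses positions [0, train_end); testing uses [train_end, test_end).
    """
    step = h if step_size is None else step_size  # assumption: step defaults to the horizon
    windows = []
    for w in range(n_windows):
        test_end = n_obs - (n_windows - 1 - w) * step
        windows.append((test_end - h, test_end))
    return windows


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mape(y, y_hat):
    """Mean absolute percentage error as a fraction; undefined when an actual is 0."""
    return sum(abs(a - b) / abs(a) for a, b in zip(y, y_hat)) / len(y)


# Three 12-step test windows at the end of a 144-point series.
print(rolling_windows(144, 12, 3))  # [(108, 120), (120, 132), (132, 144)]

actual, predicted = [100, 200], [110, 180]
print(mae(actual, predicted), round(rmse(actual, predicted), 3), mape(actual, predicted))
```

Each model is refit or re-queried on the training slice of every window, so the leaderboard reflects performance on 36 held-out months per series rather than on a single split.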

Generating Probabilistic Forecasts with Prediction Intervals

fcst_df = tcf.forecast(df=panel, h=H, freq=FREQ, level=[80, 95])
print("\nForecast columns:", list(fcst_df.columns))
def plot_series(uid, point_model=best_model):
   hist = panel[panel["unique_id"] == uid]; fc = fcst_df[fcst_df["unique_id"] == uid]
   plt.figure(figsize=(11, 4)); plt.plot(hist["ds"], hist["y"], color="black", label="history")
   if point_model in fc.columns:
       plt.plot(fc["ds"], fc[point_model], color="C0", label=f"{point_model} forecast")
       lo, hi = f"{point_model}-lo-95", f"{point_model}-hi-95"
       if lo in fc.columns and hi in fc.columns:
           plt.fill_between(fc["ds"], fc[lo], fc[hi], alpha=0.25, color="C0", label="95% interval")
   plt.title(f"{uid} — {point_model}"); plt.legend(); plt.tight_layout(); plt.show()
for uid in panel["unique_id"].unique():
   plot_series(uid)

We generate 12-month probabilistic forecasts with 80% and 95% prediction intervals. We define a reusable plotting function that displays historical values, point forecasts, and uncertainty ranges. We apply this function to each series to compare its observed history with the predicted future trajectory.
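
To make the idea of a prediction interval concrete, here is a minimal sketch of one simple way such a band can be built: a seasonal naive point forecast with a symmetric normal interval derived from in-sample errors. It is not how TimeCopilot's models compute their intervals, since each model has its own method; `seasonal_naive_with_interval` is a hypothetical helper, and the normality of errors is an assumption.

```python
import math
from statistics import NormalDist


def seasonal_naive_with_interval(y, h, season=12, level=95):
    """Return a list of (lo, point, hi) tuples for the next h steps."""
    if len(y) <= season:
        raise ValueError("need more than one full season of history")
    # In-sample errors of the rule "this month equals the same month last year".
    residuals = [y[t] - y[t - season] for t in range(season, len(y))]
    sigma = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    # Two-sided normal quantile, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 200)
    out = []
    for i in range(h):
        point = y[len(y) - season + (i % season)]    # repeat the last observed season
        sigma_h = sigma * math.sqrt(i // season + 1)  # widen once per full season ahead
        out.append((point - z * sigma_h, point, point + z * sigma_h))
    return out


# Toy series: a 12-step pattern that rises by 1 each year, so every residual is 1.
history = [i % 12 + i // 12 for i in range(36)]
lo, point, hi = seasonal_naive_with_interval(history, h=12, level=95)[0]
print(point, round(hi - lo, 2))
```

A higher `level` gives a larger quantile and therefore a wider band, which is why the 95% interval plotted above always contains the 80% one.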

Detecting Anomalies Across the Forecasting Panel

print("\nRunning anomaly detection...")
anomalies_df = tcf.detect_anomalies(df=panel, h=H, freq=FREQ, level=99)
anom_cols = [c for c in anomalies_df.columns if c.endswith("-anomaly")]
if anom_cols:
   flagged = anomalies_df[anomalies_df[anom_cols].any(axis=1)]
   print(f"Flagged points (>=1 model): {len(flagged)}")
   print(flagged[["unique_id", "ds", "y"] + anom_cols].head(20).to_string(index=False))
   col = f"{best_model}-anomaly"
   if col not in anomalies_df.columns: col = anom_cols[0]
   sub = anomalies_df[anomalies_df["unique_id"] == "Synthetic"]
   pts = sub[sub[col] == True]
   plt.figure(figsize=(11, 4)); plt.plot(sub["ds"], sub["y"], color="black", label="value")
   plt.scatter(pts["ds"], pts["y"], color="red", zorder=5, label=f"anomaly ({col})")
   plt.title("Anomaly detection — Synthetic series"); plt.legend(); plt.tight_layout(); plt.show()
else:
   print(anomalies_df.head())

Interpreting Forecasts with the TimeCopilot LLM Agent

from timecopilot import TimeCopilot
if os.environ.get("OPENAI_API_KEY") or os.environ.get("ANTHROPIC_API_KEY"):
   llm = "openai:gpt-4o" if os.environ.get("OPENAI_API_KEY") else "anthropic:claude-sonnet-4-5"
   tc = TimeCopilot(llm=llm, retries=3)
   single = panel[panel["unique_id"] == "AirPassengers"]
   result = tc.forecast(df=single, freq=FREQ, h=H,
                        query="Total air passengers expected over the next 12 months, and which months peak?")
   out = result.output
   print("\n=== AGENT REPORT ===")
   print("Selected model:", out.selected_model)
   print("Beats SeasonalNaive:", out.is_better_than_seasonal_naive)
   print("Why:", out.reason_for_selection)
   print("Answer:", out.user_query_response)
   print(result.fcst_df.head())
else:
   print("\n[Agent section skipped] No LLM key. Everything above ran key-free.")
print("\nDone. ✅")

We detect anomalies across the panel and visualize the flagged observations in the synthetic series. We optionally initialize the TimeCopilot LLM agent when an OpenAI or Anthropic API key is available. We use the agent to select a model, evaluate it against SeasonalNaive, and explain the forecast in response to a practical question.
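
The tutorial does not describe how the anomaly flags are computed internally. One common scheme, sketched below purely as an assumption, is to compare each observation with a model prediction and flag it when it falls outside a prediction band at the chosen level. `flag_anomalies` and the fixed `sigma` are hypothetical and introduced only for illustration.

```python
from statistics import NormalDist


def flag_anomalies(actual, predicted, sigma, level=99):
    """Flag observations that fall outside a symmetric normal band around the prediction."""
    z = NormalDist().inv_cdf(0.5 + level / 200)  # about 2.58 for a 99% band
    flags = []
    for y, y_hat in zip(actual, predicted):
        lo, hi = y_hat - z * sigma, y_hat + z * sigma
        flags.append(y < lo or y > hi)
    return flags


# With sigma=8 the 99% band is roughly +/- 20.6 around each prediction,
# so only the value 220 is flagged.
print(flag_anomalies([100, 101, 220, 99], [100, 100, 100, 100], sigma=8))
```

Raising `level` widens the band and flags fewer points, which is the trade-off behind the `level=99` setting used in the detection cell.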

Conclusion

We built a unified TimeCopilot pipeline that runs from data preparation through model evaluation, probabilistic forecasting, visualization, anomaly detection, and agent-driven interpretation. We compared traditional statistical methods with foundation models under one rolling cross-validation setup and selected the model with the lowest mean RMSE. We also quantified forecast uncertainty with 80% and 95% prediction intervals and flagged abnormal observations across both series. With the optional LLM agent, the same workflow yields numerical forecasts together with a plain-language answer to a forecasting question.



The post How to Build a Forecasting Pipeline with TimeCopilot Using Foundation Models and Automated Anomaly Detection appeared first on MarkTechPost.