How to Build a Forecasting Pipeline with TimeCopilot Using Foundation Models and Automated Anomaly Detection

Source: MarkTechPost

In this tutorial, we build an end-to-end forecasting workflow with TimeCopilot. We prepare a panel dataset containing real airline passenger data and a synthetic seasonal series with injected anomalies, then evaluate a diverse collection of statistical, foundation, and optional GPU-based forecasting models. We use rolling cross-validation and multiple error metrics to identify the strongest model, generate probabilistic forecasts with prediction intervals, visualize future trends, and detect unusual observations. Finally, we explore TimeCopilot’s optional LLM agent, which selects a forecasting model and translates its predictions into an accessible analytical response.

Installing TimeCopilot and Pinning Compatible NumPy and SciPy Versions

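```python
# Install TimeCopilot plus the evaluation and plotting helpers
!pip install -q "timecopilot" "utilsforecast" "matplotlib"
# Force known-compatible NumPy/SciPy wheels to avoid binary conflicts
!pip install -q --force-reinstall --no-deps "numpy==1.26.4" "scipy==1.13.1"

print("Setup complete. Restarting the runtime to load clean binaries...")
import IPython
# Shut down the kernel so Colab restarts it with the pinned libraries
IPython.Application.instance().kernel.do_shutdown(True)
```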

We install TimeCopilot, UtilsForecast, and Matplotlib to prepare the forecasting environment, then force-reinstall pinned NumPy and SciPy versions to prevent binary conflicts. We restart the Colab runtime afterwards because modules already imported into the running kernel keep their old binaries; only a fresh process picks up the pinned versions.
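After the restart, it can be worth confirming that the pins actually took effect before running anything heavy. The sketch below is a hypothetical helper, `check_pins`, which is not part of TimeCopilot; it reads installed versions with the standard library's `importlib.metadata` and reports every package whose version differs from the expected pin.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Return {package: (expected, installed)} for every pin that does not match.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# Run in the first cell after the restart; an empty dict means the pins took effect.
print(check_pins({"numpy": "1.26.4", "scipy": "1.13.1"}))
```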

Loading AirPassengers Data and Building a Synthetic Anomaly Panel

```python
import os, warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

warnings.filterwarnings("ignore")
pd.set_option("display.width", 160)
pd.set_option("display.max_columns", 30)

# Confirm the pinned versions loaded after the restart
print("numpy:", np.__version__)
import scipy; print("scipy:", scipy.__version__)

# Detect a CUDA GPU; fall back to CPU if torch is missing or fails
try:
    import torch
    HAS_GPU = torch.cuda.is_available()
except Exception:
    HAS_GPU = False
print(f"GPU available: {HAS_GPU}")

# Real series: monthly AirPassengers in long format (unique_id, ds, y)
df = pd.read_csv(
    "https://timecopilot.s3.amazonaws.com/public/data/air_passengers.csv",
    parse_dates=["ds"],
)
df["unique_id"] = df["unique_id"].astype(str)

# Synthetic series on the same dates: linear trend + 12-month seasonality + noise
rng = np.random.default_rng(7)
dates = df["ds"].unique(); n = len(dates)
synth = pd.DataFrame({
    "unique_id": "Synthetic",
    "ds": dates,
    "y": (np.linspace(50, 250, n)
          + 40 * np.sin(2 * np.pi * np.arange(n) / 12)
          + rng.normal(0, 8, n)).round(2),
})
# Inject three spikes by multiplying selected points by 2.2
anomaly_idx = [30, 75, 120]
synth.loc[anomaly_idx, "y"] *= 2.2

# Stack both series into one panel
panel = pd.concat([df[["unique_id", "ds", "y"]], synth], ignore_index=True)
print("\nPanel shape:", panel.shape)
print(panel.groupby("unique_id")["y"].agg(["count", "mean", "min", "max"]))

# Forecast horizon (12 months) and month-start frequency
H, FREQ = 12, "MS"
```

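We import the required libraries, print the NumPy and SciPy versions to confirm the pins, and detect GPU availability. We load the AirPassengers dataset and build a second series on the same monthly dates from a linear trend, 12-month seasonality, and Gaussian noise, then multiply three points by 2.2 to inject spikes. We stack the two series into a long-format panel and set a 12-month forecasting horizon with month-start ("MS") frequency.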
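The panel uses the long format visible in the code above: one row per (`unique_id`, `ds`) pair, with the target in `y`. As a rough pre-flight check before forecasting, the hypothetical `validate_panel` helper below (our own illustration, not a TimeCopilot function) counts duplicate and missing timestamps per series, assuming each series should sit on a regular month-start grid.

```python
import pandas as pd


def validate_panel(panel, freq="MS"):
    """Count duplicate and missing timestamps per series in a long-format panel.

    Expects columns unique_id, ds (datetime) and y, like the tutorial's panel.
    """
    report = {}
    for uid, g in panel.groupby("unique_id"):
        ds = pd.to_datetime(g["ds"])
        # Every timestamp the series should have between its first and last date
        expected = pd.date_range(ds.min(), ds.max(), freq=freq)
        report[uid] = {
            "duplicates": int(ds.duplicated().sum()),
            "missing": int(len(expected.difference(pd.DatetimeIndex(ds)))),
        }
    return report


# Toy panel: series A skips March, series B repeats a date.
toy = pd.DataFrame({
    "unique_id": ["A", "A", "A", "B", "B"],
    "ds": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-04-01",
                          "2020-01-01", "2020-01-01"]),
    "y": [1.0, 2.0, 3.0, 4.0, 5.0],
})
print(validate_panel(toy))
```

On the tutorial's data it could be called as `validate_panel(panel, FREQ)`; any nonzero count is worth fixing before cross-validation.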

Configuring Statistical, Prophet, and Chronos Forecasting Models

```python
from timecopilot.forecaster import TimeCopilotForecaster
from timecopilot.models.stats import AutoARIMA, AutoETS, SeasonalNaive, Theta
from timecopilot.models.prophet import Prophet
from timecopilot.models.foundation.chronos import Chronos

# Use the larger Chronos-Bolt checkpoint only when a GPU is available
chronos_repo = "amazon/chronos-bolt-small" if HAS_GPU else "amazon/chronos-bolt-tiny"

models = [
    SeasonalNaive(), AutoETS(), AutoARIMA(), Theta(), Prophet(),
    Chronos(repo_id=chronos_repo, alias="Chronos"),
]

# TimesFM is optional: add it only on GPU, and skip it if it cannot be imported or built
if HAS_GPU:
    try:
        from timecopilot.models.foundation.timesfm import TimesFM
        models.append(TimesFM(repo_id="google/timesfm-2.0-500m-pytorch", alias="TimesFM"))
    except Exception as e:
        print("Skipping TimesFM:", e)

# One forecaster object wraps every model behind the same API
tcf = TimeCopilotForecaster(models=models)
print("\nModels:", [getattr(m, "alias", type(m).__name__) for m in models])
```

We configure a diverse collection of statistical, Prophet, and Chronos forecasting models. We select the Chronos model size according to the available hardware and optionally include TimesFM when a GPU is present. We then initialize TimeCopilotForecaster to manage all models through one consistent interface.
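The benefit of a single forecaster object is that every model exposes the same call, so a wrapper can loop over them and collect comparable outputs. The sketch below illustrates that pattern with a hypothetical `Forecaster` protocol, two toy baselines written in NumPy, and an illustrative `forecast_all` function; it is not TimeCopilot's internal design. The `SeasonalNaive` here repeats the last 12 observed values, the usual definition of a seasonal naive forecast for monthly data.

```python
from typing import Protocol

import numpy as np


class Forecaster(Protocol):
    """Hypothetical shared interface: every model has an alias and a forecast method."""

    alias: str

    def forecast(self, y: np.ndarray, h: int) -> np.ndarray: ...


class Naive:
    alias = "Naive"

    def forecast(self, y: np.ndarray, h: int) -> np.ndarray:
        # Repeat the last observation for every future step.
        return np.full(h, y[-1], dtype=float)


class SeasonalNaive:
    alias = "SeasonalNaive"

    def __init__(self, season_length: int = 12):
        self.season_length = season_length

    def forecast(self, y: np.ndarray, h: int) -> np.ndarray:
        # Each step copies the value one season earlier; cycle if h exceeds a season.
        return np.resize(y[-self.season_length:], h).astype(float)


def forecast_all(models: list[Forecaster], y, h: int) -> dict[str, np.ndarray]:
    """Call each model through the same interface and key the results by alias."""
    y = np.asarray(y, dtype=float)
    return {m.alias: m.forecast(y, h) for m in models}


history = np.arange(24, dtype=float)  # two seasons of toy monthly data
out = forecast_all([Naive(), SeasonalNaive(12)], history, h=3)
print(out)
```

Because the wrapper only relies on `alias` and `forecast`, adding a model means appending one object to the list, which mirrors how TimesFM is appended conditionally in the tutorial.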

Running Rolling Cross-Validation and Ranking Models by RMSE

```python
print("\nRunning cross-validation (slow step: foundation weights download)...")
# Three rolling windows, each forecasting the H = 12 months after its cutoff
cv_df = tcf.cross_validation(df=panel, h=H, freq=FREQ, n_windows=3)
print(cv_df.head())

from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, rmse, mape

# Dropping "cutoff" pools all windows: one score per series and metric
eval_df = evaluate(cv_df.drop(columns=["cutoff"]), metrics=[mae, rmse, mape])
print("\n=== Per-series error (lower = better) ===")
print(eval_df.round(3))

# Average each metric across series, then rank models by RMSE
model_cols = [c for c in eval_df.columns if c not in ("unique_id", "metric")]
leaderboard = eval_df.groupby("metric")[model_cols].mean().T.sort_values("rmse")
print("\n=== Leaderboard (mean across series) ===")
print(leaderboard.round(3))

best_model = leaderboard.index[0]
print(f"\n>>> Best model by mean RMSE: {best_model}")
```

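We perform rolling cross-validation over three windows, each forecasting the 12 months after a cutoff from the data up to that cutoff, to measure each model's out-of-sample performance. We calculate MAE, RMSE, and MAPE for every series and average them into a leaderboard, then select the model with the lowest mean RMSE for the forecasting and visualization steps that follow. Because RMSE is scale-dependent, this mean gives more weight to the series with larger values.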
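Rolling cross-validation can be sketched without any forecasting library. The version below assumes non-overlapping windows whose step equals the horizon and whose last window ends at the final observation; TimeCopilot's exact windowing may differ. As in the tutorial, where `cutoff` is dropped before calling `evaluate`, errors from all windows are pooled before each metric is computed, and MAPE is returned as a fraction. All function names here are hypothetical.

```python
import numpy as np


def rolling_cutoffs(n, h, n_windows, step=None):
    """Training lengths per window, assuming the last window ends at the final point."""
    step = h if step is None else step
    return [n - h - step * (n_windows - 1 - w) for w in range(n_windows)]


def cross_validate(y, h, n_windows, forecast_fn):
    """Pool errors from every rolling window, then compute MAE, RMSE and MAPE once."""
    y = np.asarray(y, dtype=float)
    actuals, preds = [], []
    for cutoff in rolling_cutoffs(len(y), h, n_windows):
        train, test = y[:cutoff], y[cutoff:cutoff + h]
        actuals.append(test)
        preds.append(np.asarray(forecast_fn(train, h), dtype=float))
    a, p = np.concatenate(actuals), np.concatenate(preds)
    err = a - p
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mape": float(np.mean(np.abs(err / a))),  # fraction; undefined if an actual is 0
    }


def seasonal_naive(train, h, season_length=12):
    # Repeat the last observed season to cover the horizon.
    return np.resize(train[-season_length:], h)


# Toy series: exactly periodic, so seasonal naive should score (almost) perfectly.
y = 100 + 10 * np.sin(2 * np.pi * np.arange(144) / 12)
print(rolling_cutoffs(len(y), h=12, n_windows=3))  # [108, 120, 132]
print(cross_validate(y, h=12, n_windows=3, forecast_fn=seasonal_naive))
```

With 144 monthly points, `h=12`, and three windows, the cutoffs fall after 108, 120, and 132 observations, so each window adds one year of history. Pooling matters for RMSE: the square root of the pooled mean squared error is generally not equal to the mean of per-window RMSE values.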

Generating Probabilistic Forecasts with Prediction Intervals

```python
# 12-month forecasts with 80% and 95% prediction intervals for every model
fcst_df = tcf.forecast(df=panel, h=H, freq=FREQ, level=[80, 95])
print("\nForecast columns:", list(fcst_df.columns))

def plot_series(uid, point_model=best_model):
    """Plot one series' history with a model's point forecast and 95% band."""
    hist = panel[panel["unique_id"] == uid]
    fc = fcst_df[fcst_df["unique_id"] == uid]
    plt.figure(figsize=(11, 4))
    plt.plot(hist["ds"], hist["y"], color="black", label="history")
    if point_model in fc.columns:
        plt.plot(fc["ds"], fc[point_model], color="C0", label=f"{point_model} forecast")
        # Interval columns are named "<model>-lo-<level>" and "<model>-hi-<level>"
        lo, hi = f"{point_model}-lo-95", f"{point_model}-hi-95"
        if lo in fc.columns and hi in fc.columns:
            plt.fill_between(fc["ds"], fc[lo], fc[hi], alpha=0.25, color="C0", label="95% interval")
    plt.title(f"{uid} — {point_model}")
    plt.legend()
    plt.tight_layout()
    plt.show()

for uid in panel["unique_id"].unique():
    plot_series(uid)
```

We generate 12-month probabilistic forecasts with 80% and 95% prediction intervals. We define a reusable plotting function that displays historical values, point forecasts, and uncertainty ranges. We apply this function to each series to compare its observed history with the predicted future trajectory.
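Assuming the interval columns follow the usual convention of central prediction intervals, a level maps to a pair of quantiles: 95 corresponds to the 2.5% and 97.5% quantiles. A useful sanity check on held-out data is empirical coverage, the share of actual values that land inside the band, which should sit near the nominal level when the intervals are well calibrated. The two helpers below are a NumPy sketch of both ideas, not TimeCopilot functions.

```python
import numpy as np


def level_to_quantiles(level):
    """Lower and upper quantiles of a central interval, e.g. 95 -> about (0.025, 0.975)."""
    alpha = 1 - level / 100
    return alpha / 2, 1 - alpha / 2


def empirical_coverage(y, lo, hi):
    """Share of actual values inside [lo, hi]; lo and hi may be arrays or scalars."""
    y, lo, hi = (np.asarray(a, dtype=float) for a in (y, lo, hi))
    return float(np.mean((y >= lo) & (y <= hi)))


# Build a 95% band from simulated forecast draws, then score it on fresh data
# drawn from the same distribution, so coverage should land close to 0.95.
rng = np.random.default_rng(0)
q_lo, q_hi = level_to_quantiles(95)
lo, hi = np.quantile(rng.normal(size=10_000), [q_lo, q_hi])
print(round(empirical_coverage(rng.normal(size=10_000), lo, hi), 3))
```

Given held-out actuals and the `-lo-95`/`-hi-95` columns for the same dates, `empirical_coverage` gives a quick read on whether a model's 95% band is too narrow (coverage well below 0.95) or too wide.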

Detecting Anomalies Across the Forecasting Panel

```python
print("\nRunning anomaly detection...")
anomalies_df = tcf.detect_anomalies(df=panel, h=H, freq=FREQ, level=99)

# Each model is expected to contribute a boolean "<model>-anomaly" column
anom_cols = [c for c in anomalies_df.columns if c.endswith("-anomaly")]
if anom_cols:
    # Points flagged by at least one model
    flagged = anomalies_df[anomalies_df[anom_cols].any(axis=1)]
    print(f"Flagged points (>=1 model): {len(flagged)}")
    print(flagged[["unique_id", "ds", "y"] + anom_cols].head(20).to_string(index=False))

    # Plot the best model's flags on the synthetic series (fall back to the first model)
    col = f"{best_model}-anomaly"
    if col not in anomalies_df.columns:
        col = anom_cols[0]
    sub = anomalies_df[anomalies_df["unique_id"] == "Synthetic"]
    pts = sub[sub[col] == True]
    plt.figure(figsize=(11, 4))
    plt.plot(sub["ds"], sub["y"], color="black", label="value")
    plt.scatter(pts["ds"], pts["y"], color="red", zorder=5, label=f"anomaly ({col})")
    plt.title("Anomaly detection — Synthetic series")
    plt.legend()
    plt.tight_layout()
    plt.show()
else:
    print(anomalies_df.head())
```
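We have not inspected how `detect_anomalies` is implemented, so the following is a hedged illustration of a common forecast-based rule rather than TimeCopilot's code: flag an observation when it falls outside the model's prediction interval at the chosen level. The demo rebuilds a series like the synthetic one above (same trend, seasonality, noise scale, spike positions, and 2.2 factor) and, as an assumption for illustration, uses the noise-free signal plus or minus a two-sided 99% normal band as a stand-in for a model's interval. `flag_outside_interval` is a hypothetical name.

```python
import numpy as np


def flag_outside_interval(y, lo, hi):
    """True where an observation falls below lo or above hi."""
    y, lo, hi = (np.asarray(a, dtype=float) for a in (y, lo, hi))
    return (y < lo) | (y > hi)


# Rebuild a series like the tutorial's synthetic one (trend + 12-month season + noise).
rng = np.random.default_rng(7)
n = 144
t = np.arange(n)
signal = np.linspace(50, 250, n) + 40 * np.sin(2 * np.pi * t / 12)
y = signal + rng.normal(0, 8, n)
y[[30, 75, 120]] *= 2.2  # same spike positions and factor as the tutorial

# Stand-in for a model's level-99 interval (an assumption for illustration):
# the noise-free signal +/- 2.576 noise standard deviations.
z, sigma = 2.576, 8.0
mask = flag_outside_interval(y, signal - z * sigma, signal + z * sigma)
print(np.flatnonzero(mask))
```

The three injected spikes are flagged. With roughly 140 ordinary points and a 99% band, one or two ordinary points may also be flagged by chance, which is worth keeping in mind when reading the `Flagged points` count.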

Interpreting Forecasts with the TimeCopilot LLM Agent

```python
from timecopilot import TimeCopilot

# The agent needs an LLM; it runs only if an OpenAI or Anthropic key is set
if os.environ.get("OPENAI_API_KEY") or os.environ.get("ANTHROPIC_API_KEY"):
    llm = "openai:gpt-4o" if os.environ.get("OPENAI_API_KEY") else "anthropic:claude-sonnet-4-5"
    tc = TimeCopilot(llm=llm, retries=3)
    single = panel[panel["unique_id"] == "AirPassengers"]
    result = tc.forecast(
        df=single, freq=FREQ, h=H,
        query="Total air passengers expected over the next 12 months, and which months peak?",
    )
    out = result.output
    print("\n=== AGENT REPORT ===")
    print("Selected model:", out.selected_model)
    print("Beats SeasonalNaive:", out.is_better_than_seasonal_naive)
    print("Why:", out.reason_for_selection)
    print("Answer:", out.user_query_response)
    print(result.fcst_df.head())
else:
    print("\n[Agent section skipped] No LLM key. Everything above ran key-free.")

print("\nDone. ✅")
```

We detect anomalies across the panel at the 99% level and visualize the flagged observations in the synthetic series, where the three injected spikes should stand out. We optionally initialize the TimeCopilot LLM agent when an OpenAI or Anthropic API key is available. We use the agent to select a model, check whether it beats the SeasonalNaive baseline, and answer a practical question about the forecast in plain language.
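The agent's natural-language answer can be cross-checked directly against a forecast frame. The hypothetical `summarize_forecast` helper below (not a TimeCopilot function) sums a model's point forecasts over the horizon and returns the highest-valued months; the toy frame and its `ModelA` column are made-up values for illustration only. On the tutorial's outputs it could be called as `summarize_forecast(fcst_df[fcst_df["unique_id"] == "AirPassengers"], best_model)`.

```python
import pandas as pd


def summarize_forecast(fc, model_col, top_k=3):
    """Sum a model's point forecasts and return its top_k months in date order."""
    total = float(fc[model_col].sum())
    peaks = fc.nlargest(top_k, model_col)[["ds", model_col]].sort_values("ds")
    return total, peaks


# Made-up 12-month forecast for illustration only (not tutorial output).
toy = pd.DataFrame({
    "unique_id": "Example",
    "ds": pd.date_range("2025-01-01", periods=12, freq="MS"),
    "ModelA": [10, 12, 15, 20, 25, 30, 35, 33, 28, 20, 14, 11],
})
total, peaks = summarize_forecast(toy, "ModelA", top_k=2)
print(total)  # 253.0
print(peaks)
```

A deterministic summary like this makes it easy to spot when an LLM-written answer misstates a total or picks the wrong peak months.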

Conclusion

In conclusion, we created a unified TimeCopilot pipeline that takes us from data preparation to model evaluation, probabilistic forecasting, visualization, anomaly detection, and agent-driven interpretation. We compared traditional statistical methods with modern foundation models within a consistent cross-validation framework and selected the best-performing approach by mean RMSE. We also quantified forecast uncertainty through prediction intervals and identified abnormal observations across multiple time series. By combining automated forecasting with an optional LLM agent, we produced both cross-validated numerical forecasts and clear, decision-oriented insights within a single workflow.



Sana Hassan

Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.