Tutorials

SARIMAX Time Series Forecasting with Statsmodels

Forecasting is useful when you need to plan inventory, staffing, or budgets ahead of demand changes. [SARIMA and SARIMAX models](/tutorials/statsmodels-arima-models) are popular because they capture trend, autoregression, moving-average effects, and seasonality in one model family.

`SARIMA` models univariate seasonal time series, while `SARIMAX` extends this framework and can also include exogenous predictors (`X`) when needed.

## Preparing time series data

import requests
import pandas as pd

# Download once and persist locally for the rest of the tutorial
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
response = requests.get(url, timeout=30)
response.raise_for_status()

with open("airline-passengers.csv", "w", encoding="utf-8") as f:
    f.write(response.text)

# Parse dates and set a monthly index with explicit frequency
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

print(series.head())
print(series.index.freq)
Month
1949-01-01    112
1949-02-01    118
1949-03-01    132
1949-04-01    129
1949-05-01    121
Freq: MS, Name: Passengers, dtype: int64
<MonthBegin>
This block downloads and saves `airline-passengers.csv` once, then prepares the monthly time series index used in later examples. Setting an explicit monthly frequency is important so statsmodels handles forecast steps on the right calendar spacing.

## Defining SARIMAX parameters

import pandas as pd

# Reload prepared series and define non-seasonal + seasonal orders
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

order = (1, 1, 1)
seasonal_order = (1, 1, 1, 12)

print("order:", order)
print("seasonal_order:", seasonal_order)
order: (1, 1, 1)
seasonal_order: (1, 1, 1, 12)
`order=(p,d,q)` controls non-seasonal ARIMA terms, and `seasonal_order=(P,D,Q,s)` controls seasonal terms with seasonal period `s=12` for monthly data. Stating these values explicitly makes model structure clear before fitting.

## Fitting the model

import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Build and fit SARIMAX on the full historical series
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

model = SARIMAX(
    series,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12),
    enforce_stationarity=False,
    enforce_invertibility=False,
)
results = model.fit(disp=False)
print(results.summary())
                                     SARIMAX Results                                      
==========================================================================================
Dep. Variable:                         Passengers   No. Observations:                  144
Model:             SARIMAX(1, 1, 1)x(1, 1, 1, 12)   Log Likelihood                -456.103
Date:                            Wed, 11 Mar 2026   AIC                            922.205
Time:                                    16:47:56   BIC                            936.016
Sample:                                01-01-1949   HQIC                           927.812
                                     - 12-01-1960                                         
Covariance Type:                              opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -0.2298      0.401     -0.573      0.567      -1.016       0.557
ma.L1         -0.0987      0.374     -0.264      0.792      -0.832       0.634
ar.S.L12      -0.5460      0.299     -1.825      0.068      -1.133       0.041
ma.S.L12       0.3959      0.352      1.125      0.261      -0.294       1.086
sigma2       140.2945     17.997      7.795      0.000     105.020     175.569
===================================================================================
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):                 5.42
Prob(Q):                              0.95   Prob(JB):                         0.07
Heteroskedasticity (H):               2.51   Skew:                             0.12
Prob(H) (two-sided):                  0.01   Kurtosis:                         4.03
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
This fits the SARIMAX model and prints coefficient estimates and fit diagnostics for interpretation. Reviewing this summary helps you validate whether the chosen order looks reasonable before using forecasts operationally.

## Forecasting and visualization

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Split into train/test so forecast quality can be checked on unseen data
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

train = series.iloc[:-12]
test = series.iloc[-12:]

model = SARIMAX(
    train,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12),
    enforce_stationarity=False,
    enforce_invertibility=False,
)
results = model.fit(disp=False)

# Forecast the holdout window with 95% confidence intervals
forecast = results.get_forecast(steps=12)
forecast_df = forecast.summary_frame(alpha=0.05)
forecast_df.index = test.index

plt.figure(figsize=(10, 5))
plt.plot(train.index, train.values, label="Train")
plt.plot(test.index, test.values, label="Actual", color="black")
plt.plot(forecast_df.index, forecast_df["mean"], label="Forecast", color="blue")
plt.fill_between(
    forecast_df.index,
    forecast_df["mean_ci_lower"],
    forecast_df["mean_ci_upper"],
    color="blue",
    alpha=0.2,
    label="95% CI",
)
plt.title("SARIMAX Forecast vs Actual")
plt.xlabel("Month")
plt.ylabel("Passengers")
plt.legend()
plt.tight_layout()
plt.show()

print(forecast_df[["mean", "mean_ci_lower", "mean_ci_upper"]])
Passengers        mean  mean_ci_lower  mean_ci_upper
Month                                               
1960-01-01  423.220775     402.813162     443.628389
1960-02-01  406.433566     380.708655     432.158478
1960-03-01  467.547433     436.021002     499.073865
1960-04-01  457.478940     421.736394     493.221486
1960-05-01  480.937601     441.101984     520.773218
1960-06-01  534.599304     491.215120     577.983488
1960-07-01  609.414970     562.670581     656.159359
1960-08-01  621.009678     571.171892     670.847464
1960-09-01  523.363515     470.592559     576.134470
1960-10-01  468.643694     413.104946     524.182443
1960-11-01  423.339480     365.158953     481.520006
1960-12-01  465.073605     404.369009     525.778201
This block performs a train-test forecast, plots predictions against actual values, and includes confidence intervals for uncertainty-aware planning. Comparing forecast and actual curves is a practical check of whether the model is good enough for decision-making.