Forecasting is useful when you need to plan inventory, staffing, or budgets ahead of demand changes. [SARIMA and SARIMAX models](/tutorials/statsmodels-arima-models) are popular because they capture trend, autoregression, moving-average effects, and seasonality in one model family. `SARIMA` models univariate seasonal time series, while `SARIMAX` extends this framework and can also include exogenous predictors (`X`) when needed.

## Preparing time series data

```python
import requests
import pandas as pd

# Download once and persist locally for the rest of the tutorial
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
response = requests.get(url, timeout=30)
response.raise_for_status()
with open("airline-passengers.csv", "w", encoding="utf-8") as f:
    f.write(response.text)

# Parse dates and set a monthly index with explicit frequency
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")
print(series.head())
print(series.index.freq)
```

This block downloads and saves `airline-passengers.csv` once, then prepares the monthly time series index used in later examples. Setting an explicit monthly frequency is important so statsmodels handles forecast steps on the right calendar spacing.

## Defining SARIMAX parameters

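To build some intuition for the `d` and `D` entries in the orders defined in this section, the sketch below applies one regular difference and one seasonal difference (lag 12) by hand. It uses a synthetic monthly series rather than the airline data, and the variable names are assumptions chosen for illustration. SARIMAX applies the equivalent differencing internally when `d=1` and `D=1`.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (illustrative only, not the airline data):
# a linear trend plus a pattern that repeats exactly every 12 months.
index = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
synthetic = pd.Series(100.0 + 2.0 * t + 10.0 * np.sin(2 * np.pi * t / 12), index=index)

# d=1: one regular difference turns the linear trend into a constant
regular_diff = synthetic.diff(1)

# D=1 with s=12: a lag-12 difference removes the repeating yearly pattern
both_diff = regular_diff.diff(12).dropna()

print("observations before:", len(synthetic))
print("observations after: ", len(both_diff))
print("largest remaining value:", both_diff.abs().max())
```

After both differences, only floating-point noise is left, because this synthetic series contains nothing but trend and seasonality. Each difference also costs observations: one for the regular difference and twelve for the seasonal one.
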
```python
import pandas as pd

# Reload prepared series and define non-seasonal + seasonal orders
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

order = (1, 1, 1)
seasonal_order = (1, 1, 1, 12)
print("order:", order)
print("seasonal_order:", seasonal_order)
```

`order=(p,d,q)` controls non-seasonal ARIMA terms, and `seasonal_order=(P,D,Q,s)` controls seasonal terms with seasonal period `s=12` for monthly data. Stating these values explicitly makes model structure clear before fitting.

## Fitting the model

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Build and fit SARIMAX on the full historical series
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

model = SARIMAX(
    series,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12),
    enforce_stationarity=False,
    enforce_invertibility=False,
)
results = model.fit(disp=False)
print(results.summary())
```

This fits the SARIMAX model and prints coefficient estimates and fit diagnostics for interpretation. Reviewing this summary helps you validate whether the chosen order looks reasonable before using forecasts operationally.

## Forecasting and visualization

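The forecasting code in this section requests intervals with `alpha=0.05`, which corresponds to 95% coverage. As a rough illustration of what that parameter means, the sketch below builds a two-sided interval from a forecast mean and standard error, assuming normally distributed forecast errors. The helper name `normal_interval` and the input numbers are hypothetical.

```python
from statistics import NormalDist


def normal_interval(mean, std_error, alpha=0.05):
    """Return (lower, upper) for a two-sided normal interval at level 1 - alpha."""
    # For alpha=0.05 this quantile is roughly 1.96
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mean - z * std_error, mean + z * std_error


# Hypothetical forecast mean and standard error, for illustration only
lower, upper = normal_interval(mean=450.0, std_error=20.0, alpha=0.05)
print(round(lower, 1), round(upper, 1))

# A smaller alpha asks for higher coverage, so the interval widens
lower_99, upper_99 = normal_interval(mean=450.0, std_error=20.0, alpha=0.01)
print(round(lower_99, 1), round(upper_99, 1))
```
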
```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Split into train/test so forecast quality can be checked on unseen data
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")
train = series.iloc[:-12]
test = series.iloc[-12:]

model = SARIMAX(
    train,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12),
    enforce_stationarity=False,
    enforce_invertibility=False,
)
results = model.fit(disp=False)

# Forecast the holdout window with 95% confidence intervals
forecast = results.get_forecast(steps=12)
forecast_df = forecast.summary_frame(alpha=0.05)
forecast_df.index = test.index

plt.figure(figsize=(10, 5))
plt.plot(train.index, train.values, label="Train")
plt.plot(test.index, test.values, label="Actual", color="black")
plt.plot(forecast_df.index, forecast_df["mean"], label="Forecast", color="blue")
plt.fill_between(
    forecast_df.index,
    forecast_df["mean_ci_lower"],
    forecast_df["mean_ci_upper"],
    color="blue",
    alpha=0.2,
    label="95% CI",
)
plt.title("SARIMAX Forecast vs Actual")
plt.xlabel("Month")
plt.ylabel("Passengers")
plt.legend()
plt.tight_layout()
plt.show()

print(forecast_df[["mean", "mean_ci_lower", "mean_ci_upper"]])
```

This block performs a train-test forecast, plots predictions against actual values, and includes confidence intervals for uncertainty-aware planning. Comparing forecast and actual curves is a practical check of whether the model is good enough for decision-making.
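
A visual comparison can be complemented with numeric error metrics. The sketch below computes MAE, RMSE, and MAPE. The helper name `forecast_errors` and the sample values are hypothetical; with the holdout forecast above, the inputs would be `test` and `forecast_df["mean"]`.

```python
import numpy as np


def forecast_errors(actual, predicted):
    """Return MAE, RMSE, and MAPE (in percent) for two equal-length sequences."""
    # Convert to plain arrays so values are compared by position, not by index label
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = a - p
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        # MAPE is undefined when any actual value is zero
        "mape": float(np.mean(np.abs(err / a)) * 100),
    }


# Hypothetical values, for illustration only
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]
metrics = forecast_errors(actual, predicted)
print({name: round(value, 2) for name, value in metrics.items()})
```

MAE and RMSE are in passenger units, while MAPE is scale-free, which makes it easier to compare across series of different sizes.
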