Many real-world decisions depend on how a metric evolves over time: demand, temperature, sales, traffic, or usage. **[ARIMA](/tutorials/statsmodels-sarimax-time-series-forecasting)** models are a foundational forecasting method because they combine trend handling, autoregressive behavior, and moving-average error structure in one interpretable framework.

In this tutorial, you will build an ARIMA forecasting workflow in Python using `statsmodels`.

## What ARIMA means

ARIMA stands for:

- **AR (Autoregressive)**: use past values of the series
- **I (Integrated)**: difference the series to make it more stationary
- **MA (Moving Average)**: use past forecast errors

An ARIMA model is written as `ARIMA(p, d, q)`:

- `p`: number of autoregressive lags
- `d`: number of differences
- `q`: number of moving-average lags

## Creating the dataset
```python
import requests
import pandas as pd

# Download source file once so later blocks can focus on modeling steps
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv"
response = requests.get(url, timeout=30)
response.raise_for_status()
with open("daily-min-temperatures.csv", "w", encoding="utf-8") as f:
    f.write(response.text)

# Parse dates and set a daily frequency-aware index
df = pd.read_csv("daily-min-temperatures.csv")
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")
series = df["Temp"].asfreq("D")

print(series.head())
print(series.index.freq)
```

This block downloads the dataset once, stores it locally, parses the date index, and creates a daily time series used by the remaining examples. Keeping a local copy avoids repeated network calls and makes the tutorial easier to rerun.

## Visualizing the time series
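
One thing to look for in the plot: `asfreq("D")` reindexes the data onto a full daily calendar, so any date absent from the source file becomes `NaN` and shows up as a break in the line. A minimal sketch on a made-up three-row series (the dates and values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy data: 2020-01-03 is deliberately missing
toy = pd.Series(
    [10.0, 11.0, 13.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04"]),
)

# asfreq("D") fills in the missing calendar day with NaN
daily = toy.asfreq("D")

n_missing = int(daily.isna().sum())
missing_dates = daily[daily.isna()].index.tolist()

print(daily)
print("Missing days:", n_missing)
```

Running the same `isna().sum()` check on `series` shows whether the real data contains such gaps.
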
```python
import pandas as pd
import matplotlib.pyplot as plt

# Load prepared series for a quick visual inspection
df = pd.read_csv("daily-min-temperatures.csv")
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")
series = df["Temp"].asfreq("D")

plt.figure(figsize=(10, 4))
plt.plot(series.index, series.values, linewidth=1)
plt.title("Daily Minimum Temperatures")
plt.xlabel("Date")
plt.ylabel("Temperature")
plt.tight_layout()
plt.show()
```

Plotting first helps you quickly inspect trend, variability, and potential seasonality before selecting ARIMA parameters. This initial check reduces guesswork when deciding whether differencing is needed.

## Differencing to reduce non-stationarity
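
To see what first differencing does, here is a minimal sketch on a made-up series with a steady upward trend (the values are illustrative). After `diff()`, the trend collapses into a roughly constant level:

```python
import pandas as pd

# Hypothetical series: linear trend of +2 per step plus a small wiggle
trend = pd.Series([10.0, 12.5, 14.0, 16.5, 18.0, 20.5])

# First difference: x_t - x_{t-1}; the first value has no predecessor, so it is NaN
diffed = trend.diff().dropna()

print(diffed.tolist())
print("mean step:", diffed.mean())
```

The differenced values hover around the per-step increase instead of climbing, which is the behavior `d=1` relies on.
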
```python
import pandas as pd
import matplotlib.pyplot as plt

# Create first-differenced series (x_t - x_{t-1})
df = pd.read_csv("daily-min-temperatures.csv")
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")
series = df["Temp"].asfreq("D")
diff_series = series.diff().dropna()

plt.figure(figsize=(10, 4))
plt.plot(diff_series.index, diff_series.values, linewidth=1, color="tab:orange")
plt.title("First-Differenced Temperature Series")
plt.xlabel("Date")
plt.ylabel("Differenced Temperature")
plt.tight_layout()
plt.show()
```

First differencing (`d=1`) is a common way to stabilize the mean level, so that the stationarity assumption behind the AR and MA terms is more reasonable. A more stable series generally leads to better-behaved parameter estimates.

## Fitting an ARIMA model
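
For reference, the `ARIMA(2, 1, 2)` specification fitted in this section can be written with the lag operator `L` (so `L y_t = y_{t-1}`). This is the standard textbook form, assuming no constant term; the symbols `\phi_i`, `\theta_j`, and `\varepsilon_t` are notation introduced here rather than taken from the tutorial:

```latex
(1 - \phi_1 L - \phi_2 L^2)\,(1 - L)\,y_t
  = (1 + \theta_1 L + \theta_2 L^2)\,\varepsilon_t

% Equivalently, with the differenced series w_t = y_t - y_{t-1}:
w_t = \phi_1 w_{t-1} + \phi_2 w_{t-2}
      + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}
```

The two `\phi` terms correspond to `p=2`, the single `(1 - L)` factor to `d=1`, and the two `\theta` terms to `q=2`.
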
```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Fit ARIMA with chosen (p, d, q) values
df = pd.read_csv("daily-min-temperatures.csv")
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")
series = df["Temp"].asfreq("D")

model = ARIMA(series, order=(2, 1, 2))
results = model.fit()
print(results.summary())
```

This fits an `ARIMA(2, 1, 2)` model and prints parameter estimates and diagnostic statistics to assess model behavior. The summary is where you inspect coefficient significance and residual diagnostics before trusting forecasts.

## Train-test forecasting
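
The holdout idea used in this section can be sketched on a small made-up daily series: the last few observations are set aside, and every training date comes before every test date. The series and the holdout size of 3 are illustrative assumptions:

```python
import pandas as pd

# Hypothetical 10-day series
idx = pd.date_range("2021-01-01", periods=10, freq="D")
toy = pd.Series(range(10), index=idx, dtype="float64")

holdout = 3  # illustrative size; the tutorial holds out 365 days
toy_train = toy.iloc[:-holdout]
toy_test = toy.iloc[-holdout:]

# A chronological split never shuffles: training ends before testing starts
assert toy_train.index.max() < toy_test.index.min()

print(len(toy_train), len(toy_test))
print(toy_train.index.max().date(), "->", toy_test.index.min().date())
```

Slicing by position with `iloc` keeps the time order intact, which a random split would not.
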
```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hold out the last year to evaluate forecasting performance
df = pd.read_csv("daily-min-temperatures.csv")
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")
series = df["Temp"].asfreq("D")
train = series.iloc[:-365]
test = series.iloc[-365:]

model = ARIMA(train, order=(2, 1, 2))
results = model.fit()

forecast_res = results.get_forecast(steps=len(test))
forecast_df = forecast_res.summary_frame(alpha=0.05)
forecast_df.index = test.index
forecast_df["actual"] = test.values
print(forecast_df[["mean", "mean_ci_lower", "mean_ci_upper", "actual"]].head())
```

This example trains on historical data, forecasts the holdout period, and returns both point forecasts and confidence intervals. Using a holdout period gives a more honest estimate of real-world forecast behavior than evaluating on training data.

## Visualizing forecast vs actual
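
Alongside the plot, forecast and actual values can be compared numerically. A minimal sketch using mean absolute error (MAE) and root mean squared error (RMSE); these two metrics and the toy numbers are assumptions chosen for illustration, since the tutorial itself does not name an error metric:

```python
import numpy as np

# Hypothetical actual and forecast values (not from the temperature data)
actual = np.array([12.0, 14.0, 13.0, 15.0])
forecast = np.array([11.0, 14.0, 15.0, 13.0])

errors = forecast - actual  # signed miss per step

mae = np.mean(np.abs(errors))         # average size of the miss
rmse = np.sqrt(np.mean(errors ** 2))  # penalizes large misses more heavily

print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")
```

With the objects from the train-test block, the same calculation would use `forecast_df["mean"]` and `forecast_df["actual"]` in place of the toy arrays.
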
```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

# Refit on train, forecast test, then compare visually
df = pd.read_csv("daily-min-temperatures.csv")
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")
series = df["Temp"].asfreq("D")
train = series.iloc[:-365]
test = series.iloc[-365:]

model = ARIMA(train, order=(2, 1, 2))
results = model.fit()

forecast_res = results.get_forecast(steps=len(test))
forecast_df = forecast_res.summary_frame(alpha=0.05)
forecast_df.index = test.index

plt.figure(figsize=(11, 5))
plt.plot(train.index[-500:], train.values[-500:], label="Train (recent)", alpha=0.7)
plt.plot(test.index, test.values, label="Actual", color="black")
plt.plot(forecast_df.index, forecast_df["mean"], label="Forecast", color="tab:blue")
plt.fill_between(
    forecast_df.index,
    forecast_df["mean_ci_lower"],
    forecast_df["mean_ci_upper"],
    color="tab:blue",
    alpha=0.2,
    label="95% CI",
)
plt.title("ARIMA Forecast vs Actual")
plt.xlabel("Date")
plt.ylabel("Temperature")
plt.legend()
plt.tight_layout()
plt.show()
```

Overlaying forecast and actual values helps you evaluate practical forecast quality and whether uncertainty intervals are realistic. The chart also makes bias and under/over-shoot patterns easier to spot than tables alone.
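
The two checks mentioned above, whether the intervals are realistic and whether the forecast is biased, can also be put into numbers. A minimal sketch, assuming a frame with the same column names as `forecast_df` from the train-test block (`mean`, `mean_ci_lower`, `mean_ci_upper`, `actual`); the values are made up:

```python
import pandas as pd

# Hypothetical forecast frame mirroring the columns of forecast_df
toy = pd.DataFrame(
    {
        "mean": [10.0, 10.0, 10.0, 10.0, 10.0],
        "mean_ci_lower": [8.0, 7.0, 6.0, 5.0, 4.0],
        "mean_ci_upper": [12.0, 13.0, 14.0, 15.0, 16.0],
        "actual": [11.0, 14.0, 9.0, 12.0, 10.0],
    }
)

# Coverage: share of actual values that fall inside the interval
inside = toy["actual"].between(toy["mean_ci_lower"], toy["mean_ci_upper"])
coverage = inside.mean()

# Bias: average signed error (positive means the forecast runs high)
bias = (toy["mean"] - toy["actual"]).mean()

print(f"coverage: {coverage:.2f}")
print(f"bias: {bias:.2f}")
```

For a 95% interval, coverage on a long holdout would ideally land near 0.95; a value well below that suggests the intervals are too narrow, and a bias far from zero suggests systematic over- or under-shooting.
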