Most real outcomes depend on several factors at once. **[Multiple linear regression](/tutorials/statsmodels-linear-regression)** helps you estimate how each predictor relates to the target while controlling for the other predictors. In this tutorial, you will build a model with multiple predictors and interpret partial effects.

## Preparing multiple predictors

```python
import requests
import pandas as pd

# Download once so every later example can reuse the same data file
url = "https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/mtcars.csv"
response = requests.get(url, timeout=30)
response.raise_for_status()

# Save a local CSV copy for the rest of this tutorial
with open("mtcars.csv", "w", encoding="utf-8") as f:
    f.write(response.text)

df = pd.read_csv("mtcars.csv")
features = ["hp", "wt", "qsec"]

# X contains multiple predictors; y is the target to predict
X = df[features]
y = df["mpg"]
print(X.head())
print(y.head())
```

This code downloads and saves `mtcars.csv` once, then creates a feature matrix (`X`) with multiple columns and a target vector (`y`). Splitting predictors and target clearly at the start makes later model code easier to reason about and helps prevent accidental leakage of target information into the features.

## Building the feature matrix and adding an intercept
```python
import pandas as pd
import statsmodels.api as sm

# Load the source data and build the model matrix with an intercept
df = pd.read_csv("mtcars.csv")
X = sm.add_constant(df[["hp", "wt", "qsec"]])
y = df["mpg"]
print(X.head())
```

The added constant gives the model an intercept term; with several features, the fitted surface is a plane (more generally, a hyperplane) rather than a single line. With the intercept in place, OLS can estimate each coefficient as a partial effect while accounting for the other predictors.

## Fitting the regression model
```python
import pandas as pd
import statsmodels.api as sm

# Fit OLS with three predictors at once
df = pd.read_csv("mtcars.csv")
X = sm.add_constant(df[["hp", "wt", "qsec"]])
y = df["mpg"]
model = sm.OLS(y, X).fit()
print(model.summary())
```

This fits the full multiple linear regression model and prints inference statistics for each predictor. Looking at the full summary lets you evaluate both overall model quality and whether individual variables contribute useful signal.

## Interpreting coefficients and significance
```python
import pandas as pd
import statsmodels.api as sm

# Refit the model and extract key inference fields into one table
df = pd.read_csv("mtcars.csv")
X = sm.add_constant(df[["hp", "wt", "qsec"]])
y = df["mpg"]
model = sm.OLS(y, X).fit()

summary_table = pd.DataFrame(
    {
        "coefficient": model.params,
        "p_value": model.pvalues,
        "ci_lower": model.conf_int()[0],
        "ci_upper": model.conf_int()[1],
    }
)
print(summary_table)
```

Each coefficient is a **partial effect**: the expected change in `mpg` for a one-unit increase in that feature while the other predictors are held fixed. This interpretation is the main reason to use multiple regression, since it separates the contributions of correlated real-world factors.

## Detecting multicollinearity with VIF

When predictors are strongly correlated with each other, coefficient estimates can become unstable. The variance inflation factor (VIF) helps detect this.
```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Compute VIF on the same design matrix used for modeling;
# the 'const' row is not meaningful on its own and can be ignored
df = pd.read_csv("mtcars.csv")
X = sm.add_constant(df[["hp", "wt", "qsec"]])

vif = pd.DataFrame(
    {
        "feature": X.columns,
        "VIF": [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    }
)
print(vif)
```

This block calculates a VIF value for each predictor so you can identify potential multicollinearity before interpreting coefficients. Checking VIF first helps you avoid overconfident conclusions from unstable coefficient estimates when predictors overlap heavily.