2 Multiple linear regression

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection (only needed on older matplotlib)
import statsmodels.api as sm
import statsmodels.formula.api as smf

Ozone concentration

ozone = pd.read_csv("../donnees/ozone.txt", header=0, sep=";")

# 3D scatter plot of O3 against the two regressors T12 and Vx
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(ozone.T12, ozone.Vx, ozone.O3)
ax.set_xlabel('T12') ; ax.set_ylabel('Vx') ; ax.set_zlabel('O3')
fig.tight_layout()

# Ordinary least squares fit of O3 on T12 and Vx
reg = smf.ols('O3 ~ T12+Vx', data=ozone).fit()
reg.summary()

OLS Regression Results
Dep. Variable:	O3	R-squared:	0.525
Model:	OLS	Adj. R-squared:	0.505
Method:	Least Squares	F-statistic:	25.96
Date:	Fri, 31 Jan 2025	Prob (F-statistic):	2.54e-08
Time:	17:30:04	Log-Likelihood:	-210.53
No. Observations:	50	AIC:	427.1
Df Residuals:	47	BIC:	432.8
Df Model:	2
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
Intercept	35.4530	10.745	3.300	0.002	13.838	57.068
T12	2.5380	0.515	4.927	0.000	1.502	3.574
Vx	0.8736	0.177	4.931	0.000	0.517	1.230

Omnibus:	0.280	Durbin-Watson:	1.678
Prob(Omnibus):	0.869	Jarque-Bera (JB):	0.331
Skew:	0.165	Prob(JB):	0.848
Kurtosis:	2.777	Cond. No.	94.4

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
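
The headline figures in this summary are linked by simple formulas, which gives a quick way to sanity-check the output. The sketch below recomputes the adjusted R-squared, the F-statistic and the t-statistic for T12 from the rounded values printed above (50 observations, 2 regressors). The variable names are illustrative, and differences in the last digit come from rounding in the printed table.

```python
# Values copied from the summary above (rounded as printed).
n, p = 50, 2            # observations, regressors (intercept excluded)
r2 = 0.525              # R-squared
coef_t12, se_t12 = 2.5380, 0.515

# Adjusted R-squared penalises R-squared for the number of regressors.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Global F-statistic: explained share per regressor over residual share
# per residual degree of freedom.
f_stat = (r2 / p) / ((1 - r2) / (n - p - 1))

# t-statistic of a coefficient: estimate divided by its standard error.
t_t12 = coef_t12 / se_t12

print(round(adj_r2, 3), round(f_stat, 2), round(t_t12, 3))
```

The three results should agree with the reported 0.505, 25.96 and 4.927 up to rounding of the inputs.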

Eucalyptus height

eucalyptus = pd.read_csv("../donnees/eucalyptus.txt", header=0, sep=";")
# Scatter plot of height (ht) against circumference (circ)
fig = plt.figure()
plt.plot(eucalyptus.circ, eucalyptus.ht, '+k')
plt.ylabel('ht') ; plt.xlabel('circ')
fig.tight_layout()

# OLS fit of ht on circ and its square root
reg = smf.ols('ht ~ circ+np.sqrt(circ)', data=eucalyptus).fit()
reg.summary()

OLS Regression Results
Dep. Variable:	ht	R-squared:	0.792
Model:	OLS	Adj. R-squared:	0.792
Method:	Least Squares	F-statistic:	2718.
Date:	Fri, 31 Jan 2025	Prob (F-statistic):	0.00
Time:	17:30:04	Log-Likelihood:	-2208.5
No. Observations:	1429	AIC:	4423.
Df Residuals:	1426	BIC:	4439.
Df Model:	2
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
Intercept	-24.3520	2.614	-9.314	0.000	-29.481	-19.223
circ	-0.4829	0.058	-8.336	0.000	-0.597	-0.369
np.sqrt(circ)	9.9869	0.780	12.798	0.000	8.456	11.518

Omnibus:	3.015	Durbin-Watson:	0.947
Prob(Omnibus):	0.221	Jarque-Bera (JB):	2.897
Skew:	-0.097	Prob(JB):	0.235
Kurtosis:	3.103	Cond. No.	4.41e+03

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 4.41e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
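
Note [2] is not surprising for this model: over a limited range, circ and np.sqrt(circ) are almost linearly related, so the columns of the design matrix are close to collinear. Below is a minimal sketch of the effect, using NumPy on a hypothetical grid of circumferences. The range 30 to 80 is an assumption for illustration and is not taken from the data.

```python
import numpy as np

# Hypothetical circumference values; the real ones would be eucalyptus.circ.
circ = np.linspace(30.0, 80.0, 200)

# Design matrix of ht ~ circ + np.sqrt(circ): intercept, circ, sqrt(circ).
X = np.column_stack([np.ones_like(circ), circ, np.sqrt(circ)])

# 2-norm condition number: largest singular value over the smallest one.
cond_raw = np.linalg.cond(X)

# Correlation between the two regressors on this range.
corr = np.corrcoef(circ, np.sqrt(circ))[0, 1]

print(f"condition number: {cond_raw:.0f}, corr(circ, sqrt(circ)): {corr:.4f}")
```

Near-collinearity of this kind mainly inflates the standard errors of the individual coefficients. In the summary above both slopes still have large t-statistics, so the warning can reasonably be read as informational here.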

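# Grid of 100 evenly spaced circumferences spanning the observed range
grille = pd.DataFrame({'circ': np.linspace(eucalyptus.circ.min(),
                                           eucalyptus.circ.max(), 100)})
# Predicted mean height at each grid point
calculprev = reg.get_prediction(grille)
prev = calculprev.predicted_mean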

# Observations with the fitted curve overlaid
fig = plt.figure()
plt.plot(eucalyptus.circ, eucalyptus.ht, '+k')
plt.ylabel('ht') ; plt.xlabel('circ')
plt.plot(grille.circ, prev, '-', lw=1)
fig.tight_layout()
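
The curve drawn above is the fitted equation evaluated on the grid. As a sketch, a single prediction can be reproduced by hand from the rounded coefficients in the summary table. The function name predict_ht and the value circ = 50 are illustrative choices, not part of the original code.

```python
import math

# Rounded coefficients copied from the regression summary above.
INTERCEPT, B_CIRC, B_SQRT = -24.3520, -0.4829, 9.9869

def predict_ht(circ):
    """Predicted height for a given circumference, from the fitted equation."""
    return INTERCEPT + B_CIRC * circ + B_SQRT * math.sqrt(circ)

# Hypothetical circumference, chosen only to illustrate the calculation.
print(round(predict_ht(50), 2))
```

Because the coefficients are rounded to four decimals, the result may differ slightly from the corresponding entry of predicted_mean.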
