Python Data Analysis
Python Data Analysis Basics 50 - Inductive reasoning, deductive reasoning, simple linear regression example (mtcars), reading values from the keyboard
코딩탕탕
2022. 11. 15. 15:49
# Build simple and multiple regression models on the mtcars dataset using ols()
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api
plt.rc('font', family = 'malgun gothic')
import seaborn as sns
import statsmodels.formula.api as smf
mtcars = statsmodels.api.datasets.get_rdataset('mtcars').data
print(mtcars.head(3))
# print(mtcars.corr())
print(np.corrcoef(mtcars.hp,mtcars.mpg)[0,1]) # -0.7761683718265864
print(np.corrcoef(mtcars.wt,mtcars.mpg)[0,1]) # -0.8676593765172281
# Simple linear regression: mtcars.hp (feature, x), mtcars.mpg (label, y)
# Visualization
# plt.scatter(mtcars.hp, mtcars.mpg)
# # Note: numpy's polyfit() returns the slope and intercept of the fitted line.
# slope, intercept = np.polyfit(mtcars.hp, mtcars.mpg, 1)
# print('slope : {}, intercept : {}'.format(slope, intercept)) # slope : -0.06822, intercept : 30.098860
# plt.plot(mtcars.hp, slope * mtcars.hp + intercept)
# plt.xlabel('Horsepower (hp)')
# plt.ylabel('Fuel economy (mpg)')
# plt.show()
result1 = smf.ols('mpg ~ hp', data = mtcars).fit()
print(result1.summary())
print(result1.conf_int(alpha = 0.05))
print()
print(result1.summary().tables[1])
# Point prediction: coef(hp) * hp + Intercept, using the fitted coefficients from the table.
# The conf_int() bounds (e.g. -0.088895) describe uncertainty and are not the slope.
print('Predicted mpg at 110 hp:', round(-0.0682 * 110 + 30.0989, 4))
print('Predicted mpg at 50 hp:', round(-0.0682 * 50 + 30.0989, 4))
print('Predicted mpg at 200 hp:', round(-0.0682 * 200 + 30.0989, 4))
print('------------')
# Multiple linear regression: mtcars.hp and mtcars.wt (features, x), mtcars.mpg (label, y)
result2 = smf.ols(formula = 'mpg ~ hp + wt', data = mtcars).fit()
print(result2.summary())
print(result2.summary().tables[1])
# wt is recorded in units of 1000 lbs, so wt = 5 means 5,000 lbs
print('Predicted mpg at 110 hp and wt 5 (1000 lbs):', (-0.0318 * 110) + (-3.8778 * 5) + 37.2273)
print('Using predict()')
new_data = pd.DataFrame({'hp':[110, 120, 150],'wt':[5, 2, 7]})
new_pred = result2.predict(new_data)
print('Predicted mpg:', new_pred.values)
# Read values from the keyboard
new_hp = float(input('New horsepower (hp): '))
new_wt = float(input('New weight (wt, 1000 lbs): '))
new_data2 = pd.DataFrame({'hp':[new_hp],'wt':[new_wt]})
new_pred2 = result2.predict(new_data2)
print('Predicted mpg:', new_pred2.values)
<console>
                mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
Mazda RX4      21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
Mazda RX4 Wag  21.0    6  160.0  110  3.90  2.875  17.02   0   1     4     4
Datsun 710     22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
-0.7761683718265864
-0.8676593765172281
                            OLS Regression Results
==============================================================================
Dep. Variable:                    mpg   R-squared:                       0.602
Model:                            OLS   Adj. R-squared:                  0.589
Method:                 Least Squares   F-statistic:                     45.46
Date:                Tue, 15 Nov 2022   Prob (F-statistic):           1.79e-07
Time:                        16:05:23   Log-Likelihood:                -87.619
No. Observations:                  32   AIC:                             179.2
Df Residuals:                      30   BIC:                             182.2
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     30.0989      1.634     18.421      0.000      26.762      33.436
hp            -0.0682      0.010     -6.742      0.000      -0.089      -0.048
==============================================================================
Omnibus:                        3.692   Durbin-Watson:                   1.134
Prob(Omnibus):                  0.158   Jarque-Bera (JB):                2.984
Skew:                           0.747   Prob(JB):                        0.225
Kurtosis:                       2.935   Cond. No.                         386.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                   0          1
Intercept  26.761949  33.435772
hp         -0.088895  -0.047562

==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     30.0989      1.634     18.421      0.000      26.762      33.436
hp            -0.0682      0.010     -6.742      0.000      -0.089      -0.048
==============================================================================
Predicted mpg at 110 hp: 22.5969
Predicted mpg at 50 hp: 26.6889
Predicted mpg at 200 hp: 16.4589
------------
                            OLS Regression Results
==============================================================================
Dep. Variable:                    mpg   R-squared:                       0.827
Model:                            OLS   Adj. R-squared:                  0.815
Method:                 Least Squares   F-statistic:                     69.21
Date:                Tue, 15 Nov 2022   Prob (F-statistic):           9.11e-12
Time:                        16:05:23   Log-Likelihood:                -74.326
No. Observations:                  32   AIC:                             154.7
Df Residuals:                      29   BIC:                             159.0
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     37.2273      1.599     23.285      0.000      33.957      40.497
hp            -0.0318      0.009     -3.519      0.001      -0.050      -0.013
wt            -3.8778      0.633     -6.129      0.000      -5.172      -2.584
==============================================================================
Omnibus:                        5.303   Durbin-Watson:                   1.362
Prob(Omnibus):                  0.071   Jarque-Bera (JB):                4.046
Skew:                           0.855   Prob(JB):                        0.132
Kurtosis:                       3.332   Cond. No.                         588.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     37.2273      1.599     23.285      0.000      33.957      40.497
hp            -0.0318      0.009     -3.519      0.001      -0.050      -0.013
wt            -3.8778      0.633     -6.129      0.000      -5.172      -2.584
==============================================================================
Predicted mpg at 110 hp and wt 5 (1000 lbs): 14.3403
Using predict()
Predicted mpg: [14.34309224 25.65885499  5.31651287]
New horsepower (hp): 80
New weight (wt, 1000 lbs): 8
Predicted mpg: [3.66278842]
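
The script above copies coefficients by hand from the summary table and passes keyboard input straight to float(). As a supplementary sketch (not part of the original post), the following shows one way to take the slope and intercept from the fit itself and to re-prompt when the typed value is not a number. The helper names fit_line and read_float are hypothetical, the toy data is made up so that it lies exactly on mpg = 30 - 0.07 * hp (these are not mtcars values), and the keyboard replies are simulated so the block runs without user interaction. With statsmodels, the fitted values would come from result1.params rather than from a hand-written formula.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def read_float(prompt, reader=input, max_tries=3):
    """Ask until the reply parses as a float, giving up after max_tries."""
    for _ in range(max_tries):
        reply = reader(prompt)
        try:
            return float(reply)
        except ValueError:
            print(f"'{reply}' is not a number, please try again.")
    raise ValueError("no valid number entered")


# Made-up toy data (assumption): points lying exactly on mpg = 30 - 0.07 * hp.
toy_hp = [50, 100, 150, 200]
toy_mpg = [26.5, 23.0, 19.5, 16.0]

# Take the coefficients from the fit instead of retyping them from a table.
slope, intercept = fit_line(toy_hp, toy_mpg)

# Simulated keyboard session: the first reply is rejected, the second accepted.
replies = iter(["fast", "110"])
new_hp = read_float("New horsepower (hp): ", reader=lambda _prompt: next(replies))

predicted = intercept + slope * new_hp
print("Predicted mpg:", round(predicted, 2))
```

In an interactive session, calling read_float('New horsepower (hp): ') without the reader argument falls back to the built-in input(), so it can stand in for the float(input(...)) calls in the script.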