Python Data Analysis

Python Data Analysis Basics 50 - Inductive Reasoning, Deductive Reasoning, Simple Linear Regression Example (mtcars), Reading Values from the Keyboard

코딩탕탕 2022. 11. 15. 15:49

# Build simple and multiple regression models on the mtcars dataset using ols()

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Korean-capable font on Windows; only needed when plot labels contain Korean text
plt.rc('font', family='malgun gothic')

# get_rdataset() downloads the data, so an internet connection is required
mtcars = sm.datasets.get_rdataset('mtcars').data
print(mtcars.head(3))
# print(mtcars.corr())
print(np.corrcoef(mtcars.hp, mtcars.mpg)[0, 1])  # -0.7761683718265864
print(np.corrcoef(mtcars.wt, mtcars.mpg)[0, 1])  # -0.8676593765172281

# Simple linear regression: mtcars.hp (feature, x), mtcars.mpg (label, y)
# Visualization
# plt.scatter(mtcars.hp, mtcars.mpg)
# # Note: numpy's polyfit() returns the slope and intercept of the fitted line.
# slope, intercept = np.polyfit(mtcars.hp, mtcars.mpg, 1)
# print('slope : {}, intercept : {}'.format(slope, intercept)) # slope : -0.06822, intercept : 30.098860
# plt.plot(mtcars.hp, slope * mtcars.hp + intercept)
# plt.xlabel('Horsepower (hp)')
# plt.ylabel('Fuel economy (mpg)')
# plt.show()

result1 = smf.ols('mpg ~ hp', data = mtcars).fit()
print(result1.summary())
print(result1.conf_int(alpha = 0.05))
print()
print(result1.summary().tables[1])

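# Manual prediction: coef(hp) * hp + coef(Intercept).
# The slope is the value in the coef column (-0.0682). -0.088895 is the lower bound of the
# 95% confidence interval from conf_int(), not the slope; it gives 20.32 instead of about 22.60 at hp = 110.
print('Predicted mpg for hp = 110:', -0.0682 * 110 + 30.0989)
print('Predicted mpg for hp = 50:', -0.0682 * 50 + 30.0989)
print('Predicted mpg for hp = 200:', -0.0682 * 200 + 30.0989)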

print('------------')
# Multiple linear regression: mtcars.hp and mtcars.wt (features, x), mtcars.mpg (label, y)
result2 = smf.ols(formula = 'mpg ~ hp + wt', data = mtcars).fit()
print(result2.summary())
print(result2.summary().tables[1])
# wt is recorded in units of 1,000 lb, so wt = 5 means 5,000 lb (not 5 tons)
print('Predicted mpg for hp = 110, wt = 5 (5,000 lb):', (-0.0318 * 110) + (-3.8778 * 5) + 37.2273)

print('Using predict()')
new_data = pd.DataFrame({'hp':[110, 120, 150],'wt':[5, 2, 7]})
new_pred = result2.predict(new_data)
print('Predicted mpg:', new_pred.values)

# Read values from the keyboard
new_hp = float(input('New horsepower (hp): '))
new_wt = float(input('New weight (wt, 1,000 lb): '))
new_data2 = pd.DataFrame({'hp':[new_hp],'wt':[new_wt]})
new_pred2 = result2.predict(new_data2)
print('Predicted mpg:', new_pred2.values)
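
`float(input(...))` raises `ValueError` when the typed text is not a number, which ends the script before `predict()` runs. One possible way to guard against that is a prompt that retries. This is only a minimal sketch: the helper name `read_float` and its `reader` and `max_tries` parameters are hypothetical, added here so the function can be exercised without a real keyboard.

```python
from typing import Callable


def read_float(prompt: str,
               reader: Callable[[str], str] = input,
               max_tries: int = 3) -> float:
    """Ask for a number until the text parses as a float or max_tries is used up."""
    for _ in range(max_tries):
        text = reader(prompt)
        try:
            return float(text)
        except ValueError:
            # Not a number: report it and ask again.
            print(f"'{text}' is not a number, please try again.")
    raise ValueError(f"no valid number entered after {max_tries} tries")
```

In the script above it could stand in for the two `float(input(...))` calls, for example `new_hp = read_float('New horsepower (hp): ')`.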


<console>
                mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
Mazda RX4      21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
Mazda RX4 Wag  21.0    6  160.0  110  3.90  2.875  17.02   0   1     4     4
Datsun 710     22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
-0.7761683718265864
-0.8676593765172281
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    mpg   R-squared:                       0.602
Model:                            OLS   Adj. R-squared:                  0.589
Method:                 Least Squares   F-statistic:                     45.46
Date:                Tue, 15 Nov 2022   Prob (F-statistic):           1.79e-07
Time:                        16:05:23   Log-Likelihood:                -87.619
No. Observations:                  32   AIC:                             179.2
Df Residuals:                      30   BIC:                             182.2
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     30.0989      1.634     18.421      0.000      26.762      33.436
hp            -0.0682      0.010     -6.742      0.000      -0.089      -0.048
==============================================================================
Omnibus:                        3.692   Durbin-Watson:                   1.134
Prob(Omnibus):                  0.158   Jarque-Bera (JB):                2.984
Skew:                           0.747   Prob(JB):                        0.225
Kurtosis:                       2.935   Cond. No.                         386.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                   0          1
Intercept  26.761949  33.435772
hp         -0.088895  -0.047562

==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     30.0989      1.634     18.421      0.000      26.762      33.436
hp            -0.0682      0.010     -6.742      0.000      -0.089      -0.048
==============================================================================
Predicted mpg for hp = 110: 20.32045
Predicted mpg for hp = 50: 25.65415
Predicted mpg for hp = 200: 12.3199
------------
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    mpg   R-squared:                       0.827
Model:                            OLS   Adj. R-squared:                  0.815
Method:                 Least Squares   F-statistic:                     69.21
Date:                Tue, 15 Nov 2022   Prob (F-statistic):           9.11e-12
Time:                        16:05:23   Log-Likelihood:                -74.326
No. Observations:                  32   AIC:                             154.7
Df Residuals:                      29   BIC:                             159.0
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     37.2273      1.599     23.285      0.000      33.957      40.497
hp            -0.0318      0.009     -3.519      0.001      -0.050      -0.013
wt            -3.8778      0.633     -6.129      0.000      -5.172      -2.584
==============================================================================
Omnibus:                        5.303   Durbin-Watson:                   1.362
Prob(Omnibus):                  0.071   Jarque-Bera (JB):                4.046
Skew:                           0.855   Prob(JB):                        0.132
Kurtosis:                       3.332   Cond. No.                         588.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     37.2273      1.599     23.285      0.000      33.957      40.497
hp            -0.0318      0.009     -3.519      0.001      -0.050      -0.013
wt            -3.8778      0.633     -6.129      0.000      -5.172      -2.584
==============================================================================
Predicted mpg for hp = 110, wt = 5 (5,000 lb): 14.3403
Using predict()
Predicted mpg: [14.34309224 25.65885499  5.31651287]
New horsepower (hp): 80
New weight (wt, 1,000 lb): 8
Predicted mpg: [3.66278842]
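
The numbers in the `mpg ~ hp` summary are related to each other, and a short sketch may make that easier to see: the point prediction uses the `coef` column, the interval printed by `conf_int()` is roughly `coef ± t * std err`, and for a one-predictor model R-squared is the squared correlation. The coefficient values below are copied from the printed table (so they are rounded). The critical value 2.042 is an assumption taken from a Student's t table for 30 residual degrees of freedom and does not appear in the output. The function names are hypothetical.

```python
from typing import Tuple

# Values copied from the OLS summary for mpg ~ hp printed above (rounded).
INTERCEPT = 30.0989
SLOPE_HP = -0.0682
STD_ERR_HP = 0.010
CORR_HP_MPG = -0.7761683718265864

# Assumption: two-sided 95% critical value of Student's t with 30 residual
# degrees of freedom, taken from a t table rather than from the output.
T_CRIT_DF30 = 2.042


def predict_mpg(hp: float) -> float:
    """Point prediction: intercept + slope * hp."""
    return INTERCEPT + SLOPE_HP * hp


def slope_interval() -> Tuple[float, float]:
    """Approximate 95% confidence interval for the slope: coef -/+ t * std err."""
    half_width = T_CRIT_DF30 * STD_ERR_HP
    return SLOPE_HP - half_width, SLOPE_HP + half_width


if __name__ == "__main__":
    for hp in (50, 110, 200):
        print(f"hp = {hp}: predicted mpg = {predict_mpg(hp):.2f}")
    low, high = slope_interval()
    print(f"95% CI for the hp slope: [{low:.3f}, {high:.3f}]")
    # In a one-predictor model, R-squared equals the squared correlation.
    print(f"r squared = {CORR_HP_MPG ** 2:.3f}")
```

With the rounded inputs, the interval rounds to [-0.089, -0.048] and the squared correlation to 0.602, in line with the `[0.025 0.975]` columns and the `R-squared` entry of the summary. In a live session, `result1.params` holds the unrounded coefficients and avoids retyping them.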

Visualization using mtcars.hp and mtcars.mpg
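
The straight line drawn through such a scatter plot is the least-squares fit, which is why `np.polyfit(x, y, 1)` and `ols('mpg ~ hp')` report the same slope and intercept. A minimal pure-Python sketch of the closed-form calculation follows. The function name `fit_line` is hypothetical, and the sample points are made up for illustration rather than taken from mtcars.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, with at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("all x values are identical, the slope is undefined")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Made-up points that lie exactly on one downward-sloping line.
    hp = [100.0, 200.0, 300.0]
    mpg = [25.0, 20.0, 15.0]
    slope, intercept = fit_line(hp, mpg)
    print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

Passing `mtcars.hp` and `mtcars.mpg` to the same function should give values close to the slope and intercept reported by `polyfit()` in the script.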