Logistic Regression Example (Eating-Out Data)

Python Data Analysis 2022. 11. 17. 17:46

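# [Logistic classification problem 1]
# Q1] The data show eating-out behavior by income level. Eating out on a weekend evening is coded as 1, and not eating out as 0.
# Run a logistic regression on the data below to determine whether income level affects eating out.
# When an income level (a positive integer) is entered from the keyboard, print the predicted eating-out classification.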

import statsmodels.formula.api as smf
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

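df2 = pd.read_csv('../testdata/eat_out.csv')
# Keep weekend rows only: 요일 (day of week) is '토' (Saturday) or '일' (Sunday)
df = df2.loc[(df2['요일'] == '토') | (df2['요일'] == '일')]
print(df.head(2), df.shape) # (21, 3)
print(df['외식유무'].unique()) # [0 1]
# 외식유무 (ate out or not): dependent variable; the rest are independent variables (the model below uses 소득수준, income level)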

# train / test split == 7 : 3
train, test = train_test_split(df, test_size = 0.3, random_state=42) # fix random_state so the random split is reproducible
print(train.shape, test.shape) # (14, 3) (7, 3)

model = smf.logit(formula = '외식유무 ~ 소득수준', data = train).fit() # logit() takes no family argument (that belongs to glm)
print(model.summary())

print('Predicted :', np.around(model.predict(test)[:10].values)) # round predicted probabilities to 0 or 1
print('Actual :', test['외식유무'][:10].values)

# Accuracy
conf_mat = model.pred_table() # confusion matrix on the training data (threshold 0.5)
print('conf_mat : \n', conf_mat)
print('Classification accuracy :', (conf_mat[0][0] + conf_mat[1][1]) / len(train)) # training accuracy
from sklearn.metrics import accuracy_score
pred = model.predict(test)
print('Classification accuracy :', accuracy_score(test['외식유무'], np.around(pred))) # test accuracy: (actual, predicted)

print('\nClassify a new value')
new_input_data = pd.DataFrame({'소득수준':[int(input('Income level : '))]})
print('Eats out :', np.rint(model.predict(new_input_data)))
print('Eats out' if np.rint(model.predict(new_input_data))[0] == 1 else 'Does not eat out')
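
The problem statement asks for a positive integer, but the script passes whatever is typed straight to int(). A minimal sketch of input checking follows, assuming we want to reject zero, negative, and non-numeric input and ask again. The helper names parse_income and read_income are hypothetical and not part of the original script.

```python
def parse_income(raw):
    """Convert keyboard text to a positive integer, or raise ValueError."""
    value = int(raw.strip())  # int() raises ValueError for text such as 'abc' or '6.5'
    if value <= 0:
        raise ValueError("income level must be a positive integer")
    return value


def read_income(prompt="Income level : ", reader=input):
    """Keep asking until a valid positive integer is entered.

    `reader` defaults to the built-in input(); it is a parameter only so
    the function can be exercised without a keyboard.
    """
    while True:
        try:
            return parse_income(reader(prompt))
        except ValueError as err:
            print(f"Invalid input ({err}); please try again.")
```

With these helpers, the DataFrame for prediction could be built as pd.DataFrame({'소득수준': [read_income()]}) instead of calling int(input(...)) directly.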




<console>
  요일  외식유무  소득수준
0  토     0    57
1  토     0    39 (21, 3)
[0 1]
(14, 3) (7, 3)

Optimization terminated successfully.
         Current function value: 0.132728
         Iterations 11

                           Logit Regression Results                           
==============================================================================
Dep. Variable:                   외식유무   No. Observations:                   14
Model:                          Logit   Df Residuals:                       12
Method:                           MLE   Df Model:                            1
Date:                Thu, 17 Nov 2022   Pseudo R-squ.:                  0.8056
Time:                        18:10:43   Log-Likelihood:                -1.8582
converged:                       True   LL-Null:                       -9.5607
Covariance Type:            nonrobust   LLR p-value:                 8.676e-05
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    -27.1642     27.013     -1.006      0.315     -80.109      25.780
소득수준           0.6049      0.617      0.981      0.327      -0.604       1.814
==============================================================================

Possibly complete quasi-separation: A fraction 0.29 of observations can be
perfectly predicted. This might indicate that there is complete
quasi-separation. In this case some parameters will not be identified.
Predicted : [1. 1. 1. 0. 1. 1. 1.]
Actual : [0 1 0 0 1 1 1]
conf_mat : 
 [[7. 1.]
 [1. 5.]]
Classification accuracy : 0.8571428571428571
Classification accuracy : 0.7142857142857143

Classify a new value
Income level : 60
Eats out : 0    1.0
dtype: float64
Eats out
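
The numbers in the console output can be checked by hand. The sketch below uses only the standard library and plugs the printed coefficients (Intercept -27.1642, 소득수준 0.6049) into the logistic function p = 1 / (1 + exp(-(b0 + b1 * x))). The coefficients are rounded as printed in the summary, so the results are approximate, and the function names are hypothetical helpers for illustration.

```python
import math

INTERCEPT = -27.1642  # "Intercept" row of the summary table (rounded as printed)
SLOPE = 0.6049        # "소득수준" (income level) row of the summary table


def eat_out_probability(income):
    """Logistic function applied to the fitted linear predictor."""
    z = INTERCEPT + SLOPE * income
    return 1.0 / (1.0 + math.exp(-z))


def decision_boundary():
    """Income level at which the predicted probability is exactly 0.5 (z = 0)."""
    return -INTERCEPT / SLOPE


def accuracy_from_table(table):
    """Share of correct predictions in a 2x2 confusion matrix (diagonal / total)."""
    correct = table[0][0] + table[1][1]
    total = sum(sum(row) for row in table)
    return correct / total


def accuracy_from_labels(actual, predicted):
    """Share of positions where the predicted label equals the actual label."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


print("P(eat out | income=60):", eat_out_probability(60))
print("Decision boundary:", decision_boundary())
print("Train accuracy:", accuracy_from_table([[7, 1], [1, 5]]))
print("Test accuracy:", accuracy_from_labels([0, 1, 0, 0, 1, 1, 1],
                                             [1, 1, 1, 0, 1, 1, 1]))
```

With these rounded coefficients the boundary falls at an income level of about 44.9, so an input of 60 is classified as 1 (eats out), which matches the console result. The two accuracy figures correspond to 12 correct out of 14 training rows and 5 correct out of 7 test rows.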
