  • Python Data Analysis Basics 60 - Ensemble Learning
    Python Data Analysis 2022. 11. 22. 10:45

     

    Ensemble Learning: combine several individually trained models, aggregate their predictions, and output a single final classification result.
    Common approaches are voting, bagging, and boosting. The example below demonstrates voting; short sketches of hard voting and of bagging/boosting follow the console output.

     

    # Ensemble Learning: combine several individual models and aggregate their predictions into a final classification result.
    # Approaches include voting, bagging, and boosting.
    # Uses the breast_cancer dataset.
    # Builds a voting classifier from LogisticRegression, DecisionTreeClassifier, and KNeighborsClassifier.
    
    import pandas as pd
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score
    # sklearn dataset loaders return a Bunch of NumPy arrays, not a DataFrame (pass as_frame=True to get one)
    
    cancer = load_breast_cancer()
    data_df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
    print(data_df.head(2))
    
    # train / test split
    x_train, x_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=1)
    print(x_train.shape, x_test.shape, y_train.shape, y_test.shape) # (455, 30) (114, 30) (455,) (114,)
    print(x_train[:3])
    print(y_train[:3], set(y_train)) # {0, 1}  0: malignant, 1: benign
    
    # Ensemble model(VotingClassifier) : LogisticRegression + KNN + DecisionTreeClassifier
    logi_regression = LogisticRegression()  # default max_iter may trigger a ConvergenceWarning on this unscaled data
    knn = KNeighborsClassifier(n_neighbors=3)
    demodel = DecisionTreeClassifier()
    
    # voting='soft' averages the models' predict_proba outputs and picks the class with the highest mean probability
    voting_model = VotingClassifier(estimators=[('LR', logi_regression), ('KNN', knn), ('Decision', demodel)],
                                    voting='soft')
    classifiers = [logi_regression, knn, demodel]
    
    # Train and evaluate each individual model
    for classifier in classifiers:
        classifier.fit(x_train, y_train)
        pred = classifier.predict(x_test)
        class_name = classifier.__class__.__name__
        print('{0} accuracy : {1:.4f}'.format(class_name, accuracy_score(y_test, pred)))
    
    # Train and evaluate the ensemble model
    voting_model.fit(x_train, y_train)
    vpred = voting_model.predict(x_test)
    print('Ensemble model accuracy : {0:.4f}'.format(accuracy_score(y_test, vpred)))
    
    
    
    <console>
       mean radius  mean texture  ...  worst symmetry  worst fractal dimension
    0        17.99         10.38  ...          0.4601                  0.11890
    1        20.57         17.77  ...          0.2750                  0.08902
    
    [2 rows x 30 columns]
    (455, 30) (114, 30) (455,) (114,)
    [[1.799e+01 2.066e+01 1.178e+02 9.917e+02 1.036e-01 1.304e-01 1.201e-01
      8.824e-02 1.992e-01 6.069e-02 4.537e-01 8.733e-01 3.061e+00 4.981e+01
      7.231e-03 2.772e-02 2.509e-02 1.480e-02 1.414e-02 3.336e-03 2.108e+01
      2.541e+01 1.381e+02 1.349e+03 1.482e-01 3.735e-01 3.301e-01 1.974e-01
      3.060e-01 8.503e-02]
     [2.029e+01 1.434e+01 1.351e+02 1.297e+03 1.003e-01 1.328e-01 1.980e-01
      1.043e-01 1.809e-01 5.883e-02 7.572e-01 7.813e-01 5.438e+00 9.444e+01
      1.149e-02 2.461e-02 5.688e-02 1.885e-02 1.756e-02 5.115e-03 2.254e+01
      1.667e+01 1.522e+02 1.575e+03 1.374e-01 2.050e-01 4.000e-01 1.625e-01
      2.364e-01 7.678e-02]
     [9.000e+00 1.440e+01 5.636e+01 2.463e+02 7.005e-02 3.116e-02 3.681e-03
      3.472e-03 1.788e-01 6.833e-02 1.746e-01 1.305e+00 1.144e+00 9.789e+00
      7.389e-03 4.883e-03 3.681e-03 3.472e-03 2.701e-02 2.153e-03 9.699e+00
      2.007e+01 6.090e+01 2.855e+02 9.861e-02 5.232e-02 1.472e-02 1.389e-02
      2.991e-01 7.804e-02]]
    [0 0 1] {0, 1}
    
    LogisticRegression accuracy : 0.9474
    KNeighborsClassifier accuracy : 0.9211
    DecisionTreeClassifier accuracy : 0.9386
    
    Ensemble model accuracy : 0.9474
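
    For comparison, the same three estimators can be combined with voting='hard', which takes a majority vote over the predicted class labels instead of averaging probabilities. The following is a minimal self-contained sketch that reuses the same dataset and split as above; its accuracy is not shown here and may differ slightly from the soft-voting run.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    cancer = load_breast_cancer()
    x_train, x_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=1)

    # voting='hard': each base model casts one vote for a class label and the majority wins,
    # so predict_proba support is not required of the base models
    hard_voting = VotingClassifier(
        estimators=[('LR', LogisticRegression()),
                    ('KNN', KNeighborsClassifier(n_neighbors=3)),
                    ('Decision', DecisionTreeClassifier())],
        voting='hard')
    hard_voting.fit(x_train, y_train)
    hpred = hard_voting.predict(x_test)
    print('Hard voting accuracy : {0:.4f}'.format(accuracy_score(y_test, hpred)))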

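    The introduction also mentions bagging and boosting. As a hedged sketch of those two approaches on the same dataset: bagging trains copies of one base model on bootstrap samples and aggregates their votes (RandomForestClassifier is the classic bagging-of-trees variant), while boosting trains models sequentially so that each one focuses on the errors of its predecessors. The n_estimators values below are illustrative defaults, not tuned choices.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, GradientBoostingClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    cancer = load_breast_cancer()
    x_train, x_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=1)

    # bagging: one base estimator, many bootstrap resamples of the training set
    bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=1)
    # random forest: bagging of decision trees plus random feature subsets at each split
    rforest = RandomForestClassifier(n_estimators=100, random_state=1)
    # boosting: models are added one at a time, each correcting the previous ensemble's mistakes
    boosting = GradientBoostingClassifier(n_estimators=100, random_state=1)

    for model in [bagging, rforest, boosting]:
        model.fit(x_train, y_train)
        pred = model.predict(x_test)
        print('{0} accuracy : {1:.4f}'.format(model.__class__.__name__, accuracy_score(y_test, pred)))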
