  • Python Data Analysis Basics 74 - Clustering - Hierarchical Cluster Analysis - data(iris)
    Python Data Analysis 2022. 11. 25. 16:09

     

    Hierarchical cluster analysis

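    Hierarchical cluster analysis starts from the individual observations that are closest to each other and merges them step by step, building a tree-shaped hierarchy. Each merge reduces the number of clusters by one. Its advantage is that the way the clusters form can be traced exactly. Its drawback is that it becomes hard to apply to large data sets, since n observations require n(n-1)/2 pairwise distances.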
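The merging process described above can be sketched in plain Python. This is only an illustrative, unoptimized sketch: the function name `agglomerate`, the toy points, and the choice of single linkage (distance between the closest members of two clusters) are assumptions made for the example, not part of the original post.

```python
from itertools import combinations
from math import dist


def agglomerate(points):
    """Naive single-linkage agglomeration; returns the merge history."""
    clusters = [[i] for i in range(len(points))]  # every point starts as its own cluster
    history = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest to each other
        for a, b in combinations(range(len(clusters)), 2):
            d = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Replace the two merged clusters with their union: one cluster fewer per step
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history


# Toy data (hypothetical values)
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 0.5)]
history = agglomerate(points)
for left, right, d in history:
    print(left, '+', right, 'merged at distance', d)
```

Four points produce exactly three merges, which matches the 149 rows that 150 iris samples produce in the linkage output further down.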

     

    Methods: single linkage, complete linkage, average linkage, centroid linkage, and Ward's method
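As a rough illustration of how these methods differ, the sketch below runs SciPy's `linkage` on the same toy data with each method and prints the height of the final merge. The method strings `'single'`, `'complete'`, `'average'`, `'centroid'` and `'ward'` are SciPy's names for the five methods listed above. The data values and the `final_heights` dictionary are made up for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Toy 1-D observations (illustrative values, not from the iris data)
X = np.array([[0.0], [1.0], [3.0], [7.0]])

final_heights = {}
for method in ['single', 'complete', 'average', 'centroid', 'ward']:
    Z = linkage(X, method=method)      # raw observations, Euclidean metric by default
    final_heights[method] = Z[-1, 2]   # distance at which the last two clusters merge
    print(f'{method:>8}: {final_heights[method]:.3f}')
```

Single linkage uses the closest pair between two clusters and complete linkage the farthest pair, so on the same data the single-linkage merge heights are never larger than the complete-linkage ones.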

     

     

     

    # Clustering with the iris dataset
    
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from scipy.spatial.distance import pdist, squareform
    from scipy.cluster.hierarchy import linkage, dendrogram
    
    # Korean-capable font (Windows); only needed when plot labels contain Hangul
    plt.rc('font', family = 'malgun gothic')
    
    iris = load_iris()
    iris_df = pd.DataFrame(iris.data, columns = iris.feature_names)
    print(iris_df.head(3))
    print()
    
    # Pairwise Euclidean distances between all samples, using the two sepal columns only.
    # To try it on the first five rows: iris_df.loc[0:4, [...]]
    dist_vec = pdist(iris_df.loc[:, ['sepal length (cm)', 'sepal width (cm)']], metric='euclidean')
    print('dist_vec :', dist_vec)
    print()
    
    # squareform turns the condensed distance vector into a square matrix,
    # which is easier to read as a DataFrame.
    row_dist = pd.DataFrame(squareform(dist_vec))
    print(row_dist)
    
    # linkage takes the condensed distance vector from pdist (not the square matrix).
    row_clusters = linkage(dist_vec, method='complete')
    print('row_clusters :', row_clusters)
    # Column labels: cluster id 1, cluster id 2, distance, number of members
    df = pd.DataFrame(row_clusters, columns=['군집id1', '군집id2', '거리', '멤버수'])
    print(df)
    
    # Visualize row_clusters as a dendrogram
    row_dend = dendrogram(row_clusters)
    plt.ylabel('Euclidean distance')
    plt.show()
    
    
    
    <console>
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
    0                5.1               3.5                1.4               0.2
    1                4.9               3.0                1.4               0.2
    2                4.7               3.2                1.3               0.2
    
    dist_vec : [0.53851648 0.5        0.64031242 ... 0.5        0.6        0.5       ]
    
              0         1         2    ...       147       148       149
    0    0.000000  0.538516  0.500000  ...  1.486607  1.104536  0.943398
    1    0.538516  0.000000  0.282843  ...  1.600000  1.360147  1.000000
    2    0.500000  0.282843  0.000000  ...  1.811077  1.513275  1.216553
    3    0.640312  0.316228  0.141421  ...  1.902630  1.627882  1.303840
    4    0.141421  0.608276  0.500000  ...  1.615549  1.216553  1.081665
    ..        ...       ...       ...  ...       ...       ...       ...
    145  1.676305  1.800000  2.009975  ...  0.200000  0.640312  0.800000
    146  1.562050  1.486607  1.746425  ...  0.538516  0.905539  0.640312
    147  1.486607  1.600000  1.811077  ...  0.000000  0.500000  0.600000
    148  1.104536  1.360147  1.513275  ...  0.500000  0.000000  0.500000
    149  0.943398  1.000000  1.216553  ...  0.600000  0.500000  0.000000
    
    [150 rows x 150 columns]
    row_clusters : [[0.00000000e+00 1.70000000e+01 0.00000000e+00 2.00000000e+00]
     [2.00000000e+00 2.90000000e+01 0.00000000e+00 2.00000000e+00]
     [5.00000000e+00 1.60000000e+01 0.00000000e+00 2.00000000e+00]
     [1.10000000e+01 2.40000000e+01 0.00000000e+00 2.00000000e+00]
     [7.00000000e+00 2.60000000e+01 0.00000000e+00 2.00000000e+00]
     [9.00000000e+00 3.40000000e+01 0.00000000e+00 2.00000000e+00]
     [1.20000000e+01 4.50000000e+01 0.00000000e+00 2.00000000e+00]
     [4.00000000e+01 4.30000000e+01 0.00000000e+00 2.00000000e+00]
     [2.00000000e+01 3.10000000e+01 0.00000000e+00 2.00000000e+00]
     [1.90000000e+01 4.40000000e+01 0.00000000e+00 2.00000000e+00]
     [4.60000000e+01 1.59000000e+02 0.00000000e+00 3.00000000e+00]
     [6.60000000e+01 8.80000000e+01 0.00000000e+00 2.00000000e+00]
     [5.50000000e+01 9.90000000e+01 0.00000000e+00 2.00000000e+00]
     [6.70000000e+01 8.20000000e+01 0.00000000e+00 2.00000000e+00]
     [1.01000000e+02 1.63000000e+02 0.00000000e+00 3.00000000e+00]
     [1.42000000e+02 1.64000000e+02 0.00000000e+00 4.00000000e+00]
     [6.10000000e+01 1.49000000e+02 0.00000000e+00 2.00000000e+00]
     [8.00000000e+01 8.10000000e+01 0.00000000e+00 2.00000000e+00]
     [5.60000000e+01 1.00000000e+02 0.00000000e+00 2.00000000e+00]
     [9.10000000e+01 1.27000000e+02 0.00000000e+00 2.00000000e+00]
     [7.10000000e+01 7.30000000e+01 0.00000000e+00 2.00000000e+00]
     [5.10000000e+01 1.15000000e+02 0.00000000e+00 2.00000000e+00]
     [1.04000000e+02 1.16000000e+02 0.00000000e+00 2.00000000e+00]
     [1.47000000e+02 1.72000000e+02 0.00000000e+00 3.00000000e+00]
     [6.50000000e+01 8.60000000e+01 0.00000000e+00 2.00000000e+00]
     [1.40000000e+02 1.74000000e+02 0.00000000e+00 3.00000000e+00]
     [7.70000000e+01 1.45000000e+02 0.00000000e+00 2.00000000e+00]
     [1.24000000e+02 1.44000000e+02 0.00000000e+00 2.00000000e+00]
     [1.28000000e+02 1.32000000e+02 0.00000000e+00 2.00000000e+00]
     [5.20000000e+01 1.39000000e+02 0.00000000e+00 2.00000000e+00]
     [1.41000000e+02 1.79000000e+02 0.00000000e+00 3.00000000e+00]
     [7.20000000e+01 1.46000000e+02 0.00000000e+00 2.00000000e+00]
     [6.20000000e+01 1.19000000e+02 0.00000000e+00 2.00000000e+00]
     [1.00000000e+00 2.50000000e+01 1.00000000e-01 2.00000000e+00]
     [4.00000000e+00 3.70000000e+01 1.00000000e-01 2.00000000e+00]
     [1.50000000e+02 1.57000000e+02 1.00000000e-01 4.00000000e+00]
     [2.10000000e+01 1.60000000e+02 1.00000000e-01 4.00000000e+00]
     [3.90000000e+01 1.54000000e+02 1.00000000e-01 3.00000000e+00]
     [2.30000000e+01 4.90000000e+01 1.00000000e-01 2.00000000e+00]
     [9.40000000e+01 1.21000000e+02 1.00000000e-01 2.00000000e+00]
     [1.14000000e+02 1.62000000e+02 1.00000000e-01 3.00000000e+00]
     [7.90000000e+01 9.20000000e+01 1.00000000e-01 2.00000000e+00]
     [1.38000000e+02 1.66000000e+02 1.00000000e-01 3.00000000e+00]
     [1.36000000e+02 1.48000000e+02 1.00000000e-01 2.00000000e+00]
     [6.90000000e+01 8.90000000e+01 1.00000000e-01 2.00000000e+00]
     [6.30000000e+01 7.80000000e+01 1.00000000e-01 2.00000000e+00]
     [1.26000000e+02 1.33000000e+02 1.00000000e-01 2.00000000e+00]
     [9.70000000e+01 1.03000000e+02 1.00000000e-01 2.00000000e+00]
     [1.10000000e+02 1.71000000e+02 1.00000000e-01 3.00000000e+00]
     [7.50000000e+01 1.73000000e+02 1.00000000e-01 4.00000000e+00]
     [1.12000000e+02 1.76000000e+02 1.00000000e-01 3.00000000e+00]
     [5.00000000e+01 1.20000000e+02 1.00000000e-01 2.00000000e+00]
     [5.40000000e+01 1.78000000e+02 1.00000000e-01 3.00000000e+00]
     [3.00000000e+00 4.70000000e+01 1.00000000e-01 2.00000000e+00]
     [8.00000000e+00 3.80000000e+01 1.00000000e-01 2.00000000e+00]
     [3.00000000e+01 1.56000000e+02 1.00000000e-01 3.00000000e+00]
     [2.70000000e+01 2.80000000e+01 1.00000000e-01 2.00000000e+00]
     [6.40000000e+01 1.61000000e+02 1.00000000e-01 3.00000000e+00]
     [9.50000000e+01 9.60000000e+01 1.00000000e-01 2.00000000e+00]
     [5.70000000e+01 1.06000000e+02 1.00000000e-01 2.00000000e+00]
     [5.30000000e+01 1.67000000e+02 1.00000000e-01 3.00000000e+00]
     [1.00000000e+01 4.80000000e+01 1.00000000e-01 2.00000000e+00]
     [1.11000000e+02 1.23000000e+02 1.00000000e-01 2.00000000e+00]
     [1.02000000e+02 1.29000000e+02 1.00000000e-01 2.00000000e+00]
     [1.05000000e+02 1.35000000e+02 1.00000000e-01 2.00000000e+00]
     [3.50000000e+01 1.88000000e+02 1.41421356e-01 3.00000000e+00]
     [1.65000000e+02 1.90000000e+02 1.41421356e-01 7.00000000e+00]
     [8.30000000e+01 1.70000000e+02 1.41421356e-01 3.00000000e+00]
     [1.43000000e+02 1.77000000e+02 1.41421356e-01 3.00000000e+00]
     [6.80000000e+01 8.70000000e+01 1.41421356e-01 2.00000000e+00]
     [3.60000000e+01 1.58000000e+02 1.41421356e-01 3.00000000e+00]
     [1.85000000e+02 1.87000000e+02 1.41421356e-01 7.00000000e+00]
     [1.55000000e+02 1.83000000e+02 1.41421356e-01 4.00000000e+00]
     [9.00000000e+01 1.94000000e+02 1.41421356e-01 3.00000000e+00]
     [1.13000000e+02 1.91000000e+02 1.41421356e-01 3.00000000e+00]
     [1.68000000e+02 1.93000000e+02 1.41421356e-01 4.00000000e+00]
     [1.69000000e+02 1.95000000e+02 1.41421356e-01 4.00000000e+00]
     [1.37000000e+02 1.98000000e+02 1.41421356e-01 4.00000000e+00]
     [5.80000000e+01 1.99000000e+02 1.41421356e-01 5.00000000e+00]
     [1.75000000e+02 2.00000000e+02 1.41421356e-01 6.00000000e+00]
     [7.40000000e+01 2.02000000e+02 1.41421356e-01 4.00000000e+00]
     [1.80000000e+02 2.01000000e+02 1.41421356e-01 5.00000000e+00]
     [1.96000000e+02 1.97000000e+02 1.41421356e-01 4.00000000e+00]
     [1.30000000e+01 2.04000000e+02 1.41421356e-01 3.00000000e+00]
     [1.51000000e+02 2.03000000e+02 1.41421356e-01 4.00000000e+00]
     [2.07000000e+02 2.08000000e+02 1.41421356e-01 5.00000000e+00]
     [1.07000000e+02 1.30000000e+02 1.41421356e-01 2.00000000e+00]
     [1.34000000e+02 2.17000000e+02 2.00000000e-01 4.00000000e+00]
     [1.18000000e+02 1.22000000e+02 2.00000000e-01 2.00000000e+00]
     [6.00000000e+00 2.20000000e+01 2.00000000e-01 2.00000000e+00]
     [1.17000000e+02 1.31000000e+02 2.00000000e-01 2.00000000e+00]
     [9.80000000e+01 2.09000000e+02 2.23606798e-01 3.00000000e+00]
     [1.92000000e+02 2.26000000e+02 2.23606798e-01 7.00000000e+00]
     [7.00000000e+01 8.50000000e+01 2.23606798e-01 2.00000000e+00]
     [1.40000000e+01 1.80000000e+01 2.23606798e-01 2.00000000e+00]
     [1.52000000e+02 2.11000000e+02 2.23606798e-01 4.00000000e+00]
     [1.89000000e+02 2.16000000e+02 2.23606798e-01 9.00000000e+00]
     [9.30000000e+01 2.41000000e+02 2.23606798e-01 4.00000000e+00]
     [2.12000000e+02 2.30000000e+02 2.23606798e-01 6.00000000e+00]
     [2.05000000e+02 2.22000000e+02 2.23606798e-01 7.00000000e+00]
     [2.06000000e+02 2.21000000e+02 2.23606798e-01 9.00000000e+00]
     [1.25000000e+02 2.13000000e+02 2.23606798e-01 3.00000000e+00]
     [1.84000000e+02 1.86000000e+02 2.82842712e-01 6.00000000e+00]
     [1.50000000e+01 3.30000000e+01 2.82842712e-01 2.00000000e+00]
     [1.53000000e+02 2.39000000e+02 2.82842712e-01 4.00000000e+00]
     [4.20000000e+01 2.34000000e+02 3.00000000e-01 5.00000000e+00]
     [2.28000000e+02 2.29000000e+02 3.00000000e-01 1.10000000e+01]
     [2.10000000e+02 2.23000000e+02 3.00000000e-01 6.00000000e+00]
     [1.81000000e+02 2.19000000e+02 3.16227766e-01 4.00000000e+00]
     [7.60000000e+01 1.08000000e+02 3.16227766e-01 2.00000000e+00]
     [2.18000000e+02 2.31000000e+02 3.16227766e-01 8.00000000e+00]
     [8.40000000e+01 2.35000000e+02 3.16227766e-01 6.00000000e+00]
     [2.32000000e+02 2.48000000e+02 3.16227766e-01 1.00000000e+01]
     [2.24000000e+02 2.46000000e+02 3.16227766e-01 1.20000000e+01]
     [2.15000000e+02 2.50000000e+02 3.60555128e-01 1.20000000e+01]
     [2.25000000e+02 2.27000000e+02 3.60555128e-01 8.00000000e+00]
     [2.14000000e+02 2.38000000e+02 4.12310563e-01 4.00000000e+00]
     [1.82000000e+02 2.58000000e+02 4.24264069e-01 6.00000000e+00]
     [3.20000000e+01 2.45000000e+02 4.47213595e-01 5.00000000e+00]
     [2.37000000e+02 2.42000000e+02 4.47213595e-01 1.10000000e+01]
     [2.33000000e+02 2.55000000e+02 4.47213595e-01 8.00000000e+00]
     [5.90000000e+01 2.47000000e+02 4.47213595e-01 5.00000000e+00]
     [2.36000000e+02 2.51000000e+02 4.47213595e-01 5.00000000e+00]
     [2.56000000e+02 2.60000000e+02 5.38516481e-01 1.90000000e+01]
     [2.20000000e+02 2.64000000e+02 5.83095189e-01 1.50000000e+01]
     [2.52000000e+02 2.68000000e+02 5.83095189e-01 1.10000000e+01]
     [2.61000000e+02 2.63000000e+02 5.83095189e-01 1.80000000e+01]
     [4.10000000e+01 6.00000000e+01 5.83095189e-01 2.00000000e+00]
     [2.43000000e+02 2.65000000e+02 6.00000000e-01 1.00000000e+01]
     [2.44000000e+02 2.53000000e+02 6.00000000e-01 4.00000000e+00]
     [2.62000000e+02 2.69000000e+02 6.32455532e-01 2.10000000e+01]
     [2.49000000e+02 2.70000000e+02 7.00000000e-01 1.50000000e+01]
     [2.57000000e+02 2.71000000e+02 7.07106781e-01 1.10000000e+01]
     [2.54000000e+02 2.81000000e+02 7.28010989e-01 1.90000000e+01]
     [1.09000000e+02 2.40000000e+02 7.28010989e-01 3.00000000e+00]
     [2.66000000e+02 2.72000000e+02 7.81024968e-01 9.00000000e+00]
     [2.59000000e+02 2.73000000e+02 8.00000000e-01 2.10000000e+01]
     [2.78000000e+02 2.80000000e+02 8.24621125e-01 3.10000000e+01]
     [2.74000000e+02 2.75000000e+02 9.21954446e-01 2.60000000e+01]
     [2.76000000e+02 2.82000000e+02 1.00000000e+00 2.90000000e+01]
     [2.86000000e+02 2.87000000e+02 1.14017543e+00 5.20000000e+01]
     [2.84000000e+02 2.85000000e+02 1.21655251e+00 1.20000000e+01]
     [2.79000000e+02 2.88000000e+02 1.38924440e+00 3.00000000e+01]
     [2.77000000e+02 2.89000000e+02 1.39283883e+00 3.10000000e+01]
     [2.67000000e+02 2.90000000e+02 1.41421356e+00 5.80000000e+01]
     [2.83000000e+02 2.93000000e+02 1.64924225e+00 5.00000000e+01]
     [2.92000000e+02 2.94000000e+02 2.25610283e+00 8.80000000e+01]
     [2.95000000e+02 2.96000000e+02 2.70739727e+00 1.38000000e+02]
     [2.91000000e+02 2.97000000e+02 3.71618084e+00 1.50000000e+02]]
         군집id1  군집id2        거리    멤버수
    0      0.0   17.0  0.000000    2.0
    1      2.0   29.0  0.000000    2.0
    2      5.0   16.0  0.000000    2.0
    3     11.0   24.0  0.000000    2.0
    4      7.0   26.0  0.000000    2.0
    ..     ...    ...       ...    ...
    144  267.0  290.0  1.414214   58.0
    145  283.0  293.0  1.649242   50.0
    146  292.0  294.0  2.256103   88.0
    147  295.0  296.0  2.707397  138.0
    148  291.0  297.0  3.716181  150.0
    
    [149 rows x 4 columns]

    Visualization of cluster formation (dendrogram)
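The linkage matrix printed above can be read row by row: with n samples, an id below n is an original sample, and an id of n + i refers to the cluster created in row i. For instance, the row `[46, 159, 0, 3]` in the console output merges sample 46 with cluster 159, that is, the cluster formed in row 9 (159 - 150) from samples 19 and 44. The sketch below shows this on a small made-up data set and then cuts the tree into flat clusters with SciPy's `fcluster`. The data values, the helper `describe`, and the choice of three clusters are assumptions for illustration only.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D observations (hypothetical values, not from the iris data)
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 0.5], [10.0, 0.0]])
n = len(X)

# Same pipeline as in the post: condensed distances, then complete linkage
Z = linkage(pdist(X, metric='euclidean'), method='complete')


def describe(idx):
    """Translate a linkage id into a readable name."""
    idx = int(idx)
    return f'sample {idx}' if idx < n else f'cluster from row {idx - n}'


for row, (a, b, d, size) in enumerate(Z):
    print(f'row {row}: {describe(a)} + {describe(b)} '
          f'at distance {d:.3f} -> cluster {n + row} with {int(size)} members')

# Cut the tree so that at most 3 flat clusters remain
labels = fcluster(Z, t=3, criterion='maxclust')
print('labels :', labels)
```

The same `fcluster` call could be applied to `row_clusters` from the post to assign each of the 150 iris samples to a flat cluster.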
