
Python Data Analysis Basics 74 - Clustering - Hierarchical Cluster Analysis - data(iris)

코딩탕탕 2022. 11. 25. 16:09

 

Hierarchical Cluster Analysis

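This method starts from the objects that are closest to each other, based on the distances between individual objects, and merges them step by step to build a tree-shaped hierarchy; the number of clusters decreases by one with each merge. Its advantage is that the process by which clusters form can be traced exactly. Its drawback is that it becomes hard to apply to large datasets, because the number of pairwise distances grows quadratically with the number of observations.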
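The merging process described above can be sketched in a few lines. This is a deliberately naive illustration, not how scipy implements it: the function name `naive_agglomerative`, the four sample points, and the choice of single linkage (distance between the closest members of two clusters) are all assumptions made for the example.

```python
import numpy as np

def naive_agglomerative(points):
    """Repeatedly merge the two closest clusters (single linkage).

    Returns a list of (merged members, merge distance, clusters remaining).
    Hypothetical helper for illustration only.
    """
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    history = []
    while len(clusters) > 1:
        best = None
        # Scan every pair of clusters for the smallest single-linkage distance
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        # Drop the two merged clusters and append their union
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        history.append((sorted(merged), float(d), len(clusters)))
    return history

# Made-up sample points: two near the origin, two further to the right
points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 3.0]])
history = naive_agglomerative(points)
for members, dist, n_left in history:
    print(members, round(dist, 3), n_left)
```

Each pass through the loop removes one cluster, so n points always produce n - 1 merges. The nested scan over all pairs is also why this approach gets expensive as the data grows.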

 

Methods: single linkage, complete linkage, average linkage, centroid linkage, and Ward's method
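As a rough way to see how these methods differ, the sketch below runs scipy's `linkage` once per method on the same data and prints the height of the final merge. Using only the first two iris features and comparing the last merge height are choices made for this example, not something the original post does.

```python
from sklearn.datasets import load_iris
from scipy.cluster.hierarchy import linkage

# First two features only: sepal length and sepal width
X = load_iris().data[:, :2]

methods = ['single', 'complete', 'average', 'centroid', 'ward']
final_heights = {}
for m in methods:
    # Raw observations are passed here; linkage uses Euclidean distance by default,
    # which is what the centroid and Ward methods require
    Z = linkage(X, method=m)
    final_heights[m] = Z[-1, 2]  # column 2 holds the merge distance
    print(f'{m:9s} last merge height = {Z[-1, 2]:.3f}')
```

The methods differ only in how the distance between two clusters is defined, so the same data can produce trees with quite different heights and shapes.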

 

 

 

# Clustering with the iris dataset

import pandas as pd
import matplotlib.pyplot as plt
plt.rc('font', family = 'malgun gothic') # Malgun Gothic (a Korean font shipped with Windows) so that Korean plot labels render correctly
from sklearn.datasets import load_iris
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, dendrogram


iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns = iris.feature_names)
print(iris_df.head(3))
print()
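# To experiment with only the first five rows, use this line instead:
# dist_vec = pdist(iris_df.loc[0:4, ['sepal length (cm)', 'sepal width (cm)']], metric='euclidean')
# Condensed vector of pairwise Euclidean distances, computed from sepal length and sepal width only
dist_vec = pdist(iris_df.loc[:, ['sepal length (cm)', 'sepal width (cm)']], metric='euclidean')
print('dist_vec :', dist_vec)
print()
row_dist = pd.DataFrame(squareform(dist_vec))
print(row_dist) # squareform expands the condensed vector into a square matrix; putting it in a DataFrame makes it easier to read.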

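row_clusters = linkage(dist_vec, method='complete') # linkage takes the pairwise distances in condensed form (dist_vec), not the square matrix.
print('row_clusters :', row_clusters)
# Column names in English: cluster id 1, cluster id 2, distance, number of members
df = pd.DataFrame(row_clusters, columns=['군집id1', '군집id2', '거리', '멤버수'])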
print(df)

# Visualize row_clusters as a dendrogram
row_dend = dendrogram(row_clusters)
plt.ylabel('유클리드 거리') # y-axis label: "Euclidean distance"
plt.show()



<console>
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2

dist_vec : [0.53851648 0.5        0.64031242 ... 0.5        0.6        0.5       ]

          0         1         2    ...       147       148       149
0    0.000000  0.538516  0.500000  ...  1.486607  1.104536  0.943398
1    0.538516  0.000000  0.282843  ...  1.600000  1.360147  1.000000
2    0.500000  0.282843  0.000000  ...  1.811077  1.513275  1.216553
3    0.640312  0.316228  0.141421  ...  1.902630  1.627882  1.303840
4    0.141421  0.608276  0.500000  ...  1.615549  1.216553  1.081665
..        ...       ...       ...  ...       ...       ...       ...
145  1.676305  1.800000  2.009975  ...  0.200000  0.640312  0.800000
146  1.562050  1.486607  1.746425  ...  0.538516  0.905539  0.640312
147  1.486607  1.600000  1.811077  ...  0.000000  0.500000  0.600000
148  1.104536  1.360147  1.513275  ...  0.500000  0.000000  0.500000
149  0.943398  1.000000  1.216553  ...  0.600000  0.500000  0.000000

[150 rows x 150 columns]
row_clusters : [[0.00000000e+00 1.70000000e+01 0.00000000e+00 2.00000000e+00]
 [2.00000000e+00 2.90000000e+01 0.00000000e+00 2.00000000e+00]
 [5.00000000e+00 1.60000000e+01 0.00000000e+00 2.00000000e+00]
 [1.10000000e+01 2.40000000e+01 0.00000000e+00 2.00000000e+00]
 [7.00000000e+00 2.60000000e+01 0.00000000e+00 2.00000000e+00]
 [9.00000000e+00 3.40000000e+01 0.00000000e+00 2.00000000e+00]
 [1.20000000e+01 4.50000000e+01 0.00000000e+00 2.00000000e+00]
 [4.00000000e+01 4.30000000e+01 0.00000000e+00 2.00000000e+00]
 [2.00000000e+01 3.10000000e+01 0.00000000e+00 2.00000000e+00]
 [1.90000000e+01 4.40000000e+01 0.00000000e+00 2.00000000e+00]
 [4.60000000e+01 1.59000000e+02 0.00000000e+00 3.00000000e+00]
 [6.60000000e+01 8.80000000e+01 0.00000000e+00 2.00000000e+00]
 [5.50000000e+01 9.90000000e+01 0.00000000e+00 2.00000000e+00]
 [6.70000000e+01 8.20000000e+01 0.00000000e+00 2.00000000e+00]
 [1.01000000e+02 1.63000000e+02 0.00000000e+00 3.00000000e+00]
 [1.42000000e+02 1.64000000e+02 0.00000000e+00 4.00000000e+00]
 [6.10000000e+01 1.49000000e+02 0.00000000e+00 2.00000000e+00]
 [8.00000000e+01 8.10000000e+01 0.00000000e+00 2.00000000e+00]
 [5.60000000e+01 1.00000000e+02 0.00000000e+00 2.00000000e+00]
 [9.10000000e+01 1.27000000e+02 0.00000000e+00 2.00000000e+00]
 [7.10000000e+01 7.30000000e+01 0.00000000e+00 2.00000000e+00]
 [5.10000000e+01 1.15000000e+02 0.00000000e+00 2.00000000e+00]
 [1.04000000e+02 1.16000000e+02 0.00000000e+00 2.00000000e+00]
 [1.47000000e+02 1.72000000e+02 0.00000000e+00 3.00000000e+00]
 [6.50000000e+01 8.60000000e+01 0.00000000e+00 2.00000000e+00]
 [1.40000000e+02 1.74000000e+02 0.00000000e+00 3.00000000e+00]
 [7.70000000e+01 1.45000000e+02 0.00000000e+00 2.00000000e+00]
 [1.24000000e+02 1.44000000e+02 0.00000000e+00 2.00000000e+00]
 [1.28000000e+02 1.32000000e+02 0.00000000e+00 2.00000000e+00]
 [5.20000000e+01 1.39000000e+02 0.00000000e+00 2.00000000e+00]
 [1.41000000e+02 1.79000000e+02 0.00000000e+00 3.00000000e+00]
 [7.20000000e+01 1.46000000e+02 0.00000000e+00 2.00000000e+00]
 [6.20000000e+01 1.19000000e+02 0.00000000e+00 2.00000000e+00]
 [1.00000000e+00 2.50000000e+01 1.00000000e-01 2.00000000e+00]
 [4.00000000e+00 3.70000000e+01 1.00000000e-01 2.00000000e+00]
 [1.50000000e+02 1.57000000e+02 1.00000000e-01 4.00000000e+00]
 [2.10000000e+01 1.60000000e+02 1.00000000e-01 4.00000000e+00]
 [3.90000000e+01 1.54000000e+02 1.00000000e-01 3.00000000e+00]
 [2.30000000e+01 4.90000000e+01 1.00000000e-01 2.00000000e+00]
 [9.40000000e+01 1.21000000e+02 1.00000000e-01 2.00000000e+00]
 [1.14000000e+02 1.62000000e+02 1.00000000e-01 3.00000000e+00]
 [7.90000000e+01 9.20000000e+01 1.00000000e-01 2.00000000e+00]
 [1.38000000e+02 1.66000000e+02 1.00000000e-01 3.00000000e+00]
 [1.36000000e+02 1.48000000e+02 1.00000000e-01 2.00000000e+00]
 [6.90000000e+01 8.90000000e+01 1.00000000e-01 2.00000000e+00]
 [6.30000000e+01 7.80000000e+01 1.00000000e-01 2.00000000e+00]
 [1.26000000e+02 1.33000000e+02 1.00000000e-01 2.00000000e+00]
 [9.70000000e+01 1.03000000e+02 1.00000000e-01 2.00000000e+00]
 [1.10000000e+02 1.71000000e+02 1.00000000e-01 3.00000000e+00]
 [7.50000000e+01 1.73000000e+02 1.00000000e-01 4.00000000e+00]
 [1.12000000e+02 1.76000000e+02 1.00000000e-01 3.00000000e+00]
 [5.00000000e+01 1.20000000e+02 1.00000000e-01 2.00000000e+00]
 [5.40000000e+01 1.78000000e+02 1.00000000e-01 3.00000000e+00]
 [3.00000000e+00 4.70000000e+01 1.00000000e-01 2.00000000e+00]
 [8.00000000e+00 3.80000000e+01 1.00000000e-01 2.00000000e+00]
 [3.00000000e+01 1.56000000e+02 1.00000000e-01 3.00000000e+00]
 [2.70000000e+01 2.80000000e+01 1.00000000e-01 2.00000000e+00]
 [6.40000000e+01 1.61000000e+02 1.00000000e-01 3.00000000e+00]
 [9.50000000e+01 9.60000000e+01 1.00000000e-01 2.00000000e+00]
 [5.70000000e+01 1.06000000e+02 1.00000000e-01 2.00000000e+00]
 [5.30000000e+01 1.67000000e+02 1.00000000e-01 3.00000000e+00]
 [1.00000000e+01 4.80000000e+01 1.00000000e-01 2.00000000e+00]
 [1.11000000e+02 1.23000000e+02 1.00000000e-01 2.00000000e+00]
 [1.02000000e+02 1.29000000e+02 1.00000000e-01 2.00000000e+00]
 [1.05000000e+02 1.35000000e+02 1.00000000e-01 2.00000000e+00]
 [3.50000000e+01 1.88000000e+02 1.41421356e-01 3.00000000e+00]
 [1.65000000e+02 1.90000000e+02 1.41421356e-01 7.00000000e+00]
 [8.30000000e+01 1.70000000e+02 1.41421356e-01 3.00000000e+00]
 [1.43000000e+02 1.77000000e+02 1.41421356e-01 3.00000000e+00]
 [6.80000000e+01 8.70000000e+01 1.41421356e-01 2.00000000e+00]
 [3.60000000e+01 1.58000000e+02 1.41421356e-01 3.00000000e+00]
 [1.85000000e+02 1.87000000e+02 1.41421356e-01 7.00000000e+00]
 [1.55000000e+02 1.83000000e+02 1.41421356e-01 4.00000000e+00]
 [9.00000000e+01 1.94000000e+02 1.41421356e-01 3.00000000e+00]
 [1.13000000e+02 1.91000000e+02 1.41421356e-01 3.00000000e+00]
 [1.68000000e+02 1.93000000e+02 1.41421356e-01 4.00000000e+00]
 [1.69000000e+02 1.95000000e+02 1.41421356e-01 4.00000000e+00]
 [1.37000000e+02 1.98000000e+02 1.41421356e-01 4.00000000e+00]
 [5.80000000e+01 1.99000000e+02 1.41421356e-01 5.00000000e+00]
 [1.75000000e+02 2.00000000e+02 1.41421356e-01 6.00000000e+00]
 [7.40000000e+01 2.02000000e+02 1.41421356e-01 4.00000000e+00]
 [1.80000000e+02 2.01000000e+02 1.41421356e-01 5.00000000e+00]
 [1.96000000e+02 1.97000000e+02 1.41421356e-01 4.00000000e+00]
 [1.30000000e+01 2.04000000e+02 1.41421356e-01 3.00000000e+00]
 [1.51000000e+02 2.03000000e+02 1.41421356e-01 4.00000000e+00]
 [2.07000000e+02 2.08000000e+02 1.41421356e-01 5.00000000e+00]
 [1.07000000e+02 1.30000000e+02 1.41421356e-01 2.00000000e+00]
 [1.34000000e+02 2.17000000e+02 2.00000000e-01 4.00000000e+00]
 [1.18000000e+02 1.22000000e+02 2.00000000e-01 2.00000000e+00]
 [6.00000000e+00 2.20000000e+01 2.00000000e-01 2.00000000e+00]
 [1.17000000e+02 1.31000000e+02 2.00000000e-01 2.00000000e+00]
 [9.80000000e+01 2.09000000e+02 2.23606798e-01 3.00000000e+00]
 [1.92000000e+02 2.26000000e+02 2.23606798e-01 7.00000000e+00]
 [7.00000000e+01 8.50000000e+01 2.23606798e-01 2.00000000e+00]
 [1.40000000e+01 1.80000000e+01 2.23606798e-01 2.00000000e+00]
 [1.52000000e+02 2.11000000e+02 2.23606798e-01 4.00000000e+00]
 [1.89000000e+02 2.16000000e+02 2.23606798e-01 9.00000000e+00]
 [9.30000000e+01 2.41000000e+02 2.23606798e-01 4.00000000e+00]
 [2.12000000e+02 2.30000000e+02 2.23606798e-01 6.00000000e+00]
 [2.05000000e+02 2.22000000e+02 2.23606798e-01 7.00000000e+00]
 [2.06000000e+02 2.21000000e+02 2.23606798e-01 9.00000000e+00]
 [1.25000000e+02 2.13000000e+02 2.23606798e-01 3.00000000e+00]
 [1.84000000e+02 1.86000000e+02 2.82842712e-01 6.00000000e+00]
 [1.50000000e+01 3.30000000e+01 2.82842712e-01 2.00000000e+00]
 [1.53000000e+02 2.39000000e+02 2.82842712e-01 4.00000000e+00]
 [4.20000000e+01 2.34000000e+02 3.00000000e-01 5.00000000e+00]
 [2.28000000e+02 2.29000000e+02 3.00000000e-01 1.10000000e+01]
 [2.10000000e+02 2.23000000e+02 3.00000000e-01 6.00000000e+00]
 [1.81000000e+02 2.19000000e+02 3.16227766e-01 4.00000000e+00]
 [7.60000000e+01 1.08000000e+02 3.16227766e-01 2.00000000e+00]
 [2.18000000e+02 2.31000000e+02 3.16227766e-01 8.00000000e+00]
 [8.40000000e+01 2.35000000e+02 3.16227766e-01 6.00000000e+00]
 [2.32000000e+02 2.48000000e+02 3.16227766e-01 1.00000000e+01]
 [2.24000000e+02 2.46000000e+02 3.16227766e-01 1.20000000e+01]
 [2.15000000e+02 2.50000000e+02 3.60555128e-01 1.20000000e+01]
 [2.25000000e+02 2.27000000e+02 3.60555128e-01 8.00000000e+00]
 [2.14000000e+02 2.38000000e+02 4.12310563e-01 4.00000000e+00]
 [1.82000000e+02 2.58000000e+02 4.24264069e-01 6.00000000e+00]
 [3.20000000e+01 2.45000000e+02 4.47213595e-01 5.00000000e+00]
 [2.37000000e+02 2.42000000e+02 4.47213595e-01 1.10000000e+01]
 [2.33000000e+02 2.55000000e+02 4.47213595e-01 8.00000000e+00]
 [5.90000000e+01 2.47000000e+02 4.47213595e-01 5.00000000e+00]
 [2.36000000e+02 2.51000000e+02 4.47213595e-01 5.00000000e+00]
 [2.56000000e+02 2.60000000e+02 5.38516481e-01 1.90000000e+01]
 [2.20000000e+02 2.64000000e+02 5.83095189e-01 1.50000000e+01]
 [2.52000000e+02 2.68000000e+02 5.83095189e-01 1.10000000e+01]
 [2.61000000e+02 2.63000000e+02 5.83095189e-01 1.80000000e+01]
 [4.10000000e+01 6.00000000e+01 5.83095189e-01 2.00000000e+00]
 [2.43000000e+02 2.65000000e+02 6.00000000e-01 1.00000000e+01]
 [2.44000000e+02 2.53000000e+02 6.00000000e-01 4.00000000e+00]
 [2.62000000e+02 2.69000000e+02 6.32455532e-01 2.10000000e+01]
 [2.49000000e+02 2.70000000e+02 7.00000000e-01 1.50000000e+01]
 [2.57000000e+02 2.71000000e+02 7.07106781e-01 1.10000000e+01]
 [2.54000000e+02 2.81000000e+02 7.28010989e-01 1.90000000e+01]
 [1.09000000e+02 2.40000000e+02 7.28010989e-01 3.00000000e+00]
 [2.66000000e+02 2.72000000e+02 7.81024968e-01 9.00000000e+00]
 [2.59000000e+02 2.73000000e+02 8.00000000e-01 2.10000000e+01]
 [2.78000000e+02 2.80000000e+02 8.24621125e-01 3.10000000e+01]
 [2.74000000e+02 2.75000000e+02 9.21954446e-01 2.60000000e+01]
 [2.76000000e+02 2.82000000e+02 1.00000000e+00 2.90000000e+01]
 [2.86000000e+02 2.87000000e+02 1.14017543e+00 5.20000000e+01]
 [2.84000000e+02 2.85000000e+02 1.21655251e+00 1.20000000e+01]
 [2.79000000e+02 2.88000000e+02 1.38924440e+00 3.00000000e+01]
 [2.77000000e+02 2.89000000e+02 1.39283883e+00 3.10000000e+01]
 [2.67000000e+02 2.90000000e+02 1.41421356e+00 5.80000000e+01]
 [2.83000000e+02 2.93000000e+02 1.64924225e+00 5.00000000e+01]
 [2.92000000e+02 2.94000000e+02 2.25610283e+00 8.80000000e+01]
 [2.95000000e+02 2.96000000e+02 2.70739727e+00 1.38000000e+02]
 [2.91000000e+02 2.97000000e+02 3.71618084e+00 1.50000000e+02]]
     군집id1  군집id2        거리    멤버수
0      0.0   17.0  0.000000    2.0
1      2.0   29.0  0.000000    2.0
2      5.0   16.0  0.000000    2.0
3     11.0   24.0  0.000000    2.0
4      7.0   26.0  0.000000    2.0
..     ...    ...       ...    ...
144  267.0  290.0  1.414214   58.0
145  283.0  293.0  1.649242   50.0
146  292.0  294.0  2.256103   88.0
147  295.0  296.0  2.707397  138.0
148  291.0  297.0  3.716181  150.0

[149 rows x 4 columns]
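Reading the linkage matrix takes a little care: with n = 150 observations, ids 0 to 149 are original rows, and an id of 150 or more refers to the cluster created in row (id - 150) of the matrix. In the printed output above, for example, the row `[46, 159, 0, 3]` merges observation 46 with cluster 159, which is the cluster formed in row 9 (observations 19 and 44), giving 3 members. The sketch below expands any cluster id into its original observations; the helper name `members` is made up for this example.

```python
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist
from sklearn.datasets import load_iris

X = load_iris().data[:, :2]  # sepal length and sepal width, as in the post
Z = linkage(pdist(X, metric='euclidean'), method='complete')
n = X.shape[0]

def members(cluster_id, Z, n):
    """Return the original observation indices inside a cluster id (hypothetical helper)."""
    cluster_id = int(cluster_id)
    if cluster_id < n:
        return [cluster_id]  # an original observation
    # ids >= n point back to the row where that cluster was created
    left, right = Z[cluster_id - n, 0], Z[cluster_id - n, 1]
    return members(left, Z, n) + members(right, Z, n)

# The last row creates the root cluster, which should contain every observation
root = members(n + len(Z) - 1, Z, n)
print(len(root))

# The fourth column of each row should equal the size of the cluster it creates
sizes_match = all(len(members(n + i, Z, n)) == int(Z[i, 3]) for i in range(len(Z)))
print(sizes_match)
```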

Visualization of cluster formation
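A dendrogram shows the whole merge history, but in practice the tree is often cut to obtain a flat cluster label for each observation. The sketch below uses scipy's `fcluster` for that purpose. The choice of three clusters and the `maxclust` criterion are assumptions made for the example and are not part of the original post.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from sklearn.datasets import load_iris

X = load_iris().data[:, :2]  # sepal length and sepal width, as in the post
Z = linkage(pdist(X, metric='euclidean'), method='complete')

# Cut the tree so that at most 3 flat clusters remain (3 is an assumed value)
labels = fcluster(Z, t=3, criterion='maxclust')

# Labels start at 1, so drop the unused 0 bin when counting members per cluster
print(np.bincount(labels)[1:])
```

Cutting by distance instead is also possible with `criterion='distance'`, where `t` is read as a height on the dendrogram's y-axis.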