Python Data Analysis
Python Data Analysis Basics 74 - Clustering - Hierarchical Cluster Analysis - data(iris)
코딩탕탕
2022. 11. 25. 16:09
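Hierarchical cluster analysis
Hierarchical cluster analysis starts with every object as its own cluster. It then repeatedly merges the two closest clusters, based on the distances between individual objects, and so builds a tree-shaped hierarchy. Each merge reduces the number of clusters by one. The advantage is that the whole merging process can be traced exactly: every step is recorded and can be drawn as a dendrogram. The disadvantage is that the method is hard to apply to large datasets, because the pairwise distance matrix alone needs O(n²) memory and the merging needs at least as much time.
Linkage methods (how the distance between two clusters is measured): single linkage (closest pair of members), complete linkage (farthest pair), average linkage (mean over all pairs), centroid linkage (distance between cluster centroids), and Ward's method (smallest increase in within-cluster sum of squares).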
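The merge loop is easier to follow with a minimal from-scratch sketch of agglomerative clustering that uses complete linkage, the method used in the iris code below. The helper `naive_complete_linkage` and the five toy points are illustrative assumptions and are not part of the original post. The last lines only check that SciPy's `linkage` reports the same merge heights for the same data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def naive_complete_linkage(X):
    """Merge the two closest clusters until one remains (complete linkage).

    Returns a list of (members_a, members_b, distance, new_size) tuples.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between individual points.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = [[i] for i in range(len(X))]  # every point starts as its own cluster
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between the farthest pair of members.
                d = D[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        history.append((clusters[a], clusters[b], d, len(merged)))
        del clusters[b]       # b > a, so index a is unaffected
        clusters[a] = merged  # the cluster count drops by one
        print(f"merged at {d:.3f} -> {len(clusters)} clusters left")
    return history


X = np.array([[0, 0], [0, 1], [4, 0], [4, 3], [10, 0]])
history = naive_complete_linkage(X)
heights = [h[2] for h in history]

# SciPy's linkage should report the same merge heights for this data.
Z = linkage(X, method='complete')
print(np.allclose(heights, Z[:, 2]))
```

The nested loops rescan every pair of clusters at every step. As a result, this naive version slows down quickly as n grows, which is the scalability drawback described above.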
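To see how the choice of linkage changes the result, you can run SciPy's `linkage` with each method on the two sepal features used below and compare the final merge heights. This sketch assumes SciPy's method names `'single'`, `'complete'`, `'average'`, `'centroid'` and `'ward'`. It passes the raw observations so that `linkage` computes the Euclidean distances itself, which `'centroid'` and `'ward'` require.

```python
import numpy as np
from sklearn.datasets import load_iris
from scipy.cluster.hierarchy import linkage

X = load_iris().data[:, :2]  # sepal length and sepal width only

results = {}
for method in ['single', 'complete', 'average', 'centroid', 'ward']:
    # Raw observations in, Euclidean distances computed inside linkage;
    # 'centroid' and 'ward' are only meaningful with Euclidean distances.
    Z = linkage(X, method=method, metric='euclidean')
    results[method] = Z
    monotonic = bool(np.all(np.diff(Z[:, 2]) >= 0))
    print(f"{method:>8}: final merge height = {Z[-1, 2]:.3f}, monotonic heights: {monotonic}")
```

Centroid linkage is the only one of the five that can produce inversions, where a later merge occurs at a lower height. For that reason the monotonicity flag is printed instead of assumed.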
# Clustering the iris dataset
import pandas as pd
import matplotlib.pyplot as plt
plt.rc('font', family='malgun gothic')  # Windows font so the Korean axis label renders
from sklearn.datasets import load_iris
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, dendrogram
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns = iris.feature_names)
print(iris_df.head(3))
print()
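# To try only the first five rows (.loc is label-inclusive, so 0:4 selects rows 0-4):
# dist_vec = pdist(iris_df.loc[0:4, ['sepal length (cm)', 'sepal width (cm)']], metric='euclidean')
# Euclidean distances between every pair of the 150 samples, using only sepal length and sepal width
dist_vec = pdist(iris_df.loc[:, ['sepal length (cm)', 'sepal width (cm)']], metric='euclidean')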
print('dist_vec :', dist_vec)
print()
row_dist = pd.DataFrame(squareform(dist_vec))
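print(row_dist)  # squareform expands the condensed vector to a 150 x 150 matrix; a DataFrame makes it easier to read
row_clusters = linkage(dist_vec, method='complete')  # pass the condensed distance vector (not the square matrix); 'complete' = complete linkage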
print('row_clusters :', row_clusters)
# Column labels: cluster id 1, cluster id 2, merge distance, number of members
df = pd.DataFrame(row_clusters, columns=['군집id1', '군집id2', '거리', '멤버수'])
print(df)
# Visualize row_clusters as a dendrogram
row_dend = dendrogram(row_clusters)
plt.ylabel('유클리드 거리')  # "Euclidean distance"
plt.show()
<console>
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
dist_vec : [0.53851648 0.5 0.64031242 ... 0.5 0.6 0.5 ]
0 1 2 ... 147 148 149
0 0.000000 0.538516 0.500000 ... 1.486607 1.104536 0.943398
1 0.538516 0.000000 0.282843 ... 1.600000 1.360147 1.000000
2 0.500000 0.282843 0.000000 ... 1.811077 1.513275 1.216553
3 0.640312 0.316228 0.141421 ... 1.902630 1.627882 1.303840
4 0.141421 0.608276 0.500000 ... 1.615549 1.216553 1.081665
.. ... ... ... ... ... ... ...
145 1.676305 1.800000 2.009975 ... 0.200000 0.640312 0.800000
146 1.562050 1.486607 1.746425 ... 0.538516 0.905539 0.640312
147 1.486607 1.600000 1.811077 ... 0.000000 0.500000 0.600000
148 1.104536 1.360147 1.513275 ... 0.500000 0.000000 0.500000
149 0.943398 1.000000 1.216553 ... 0.600000 0.500000 0.000000
[150 rows x 150 columns]
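`pdist` returns a condensed vector that holds each unordered pair once, so 150 samples give 150 × 149 / 2 = 11,175 distances. `squareform` expands that vector into the symmetric, zero-diagonal 150 × 150 matrix printed above. The sketch below checks this relationship. `condensed_index` is a hypothetical helper name for the standard upper-triangle index formula and is not a SciPy function.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from scipy.spatial.distance import pdist, squareform

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
X = iris_df.loc[:, ['sepal length (cm)', 'sepal width (cm)']].to_numpy()

dist_vec = pdist(X, metric='euclidean')
n = len(X)
print(len(dist_vec), n * (n - 1) // 2)  # one entry per unordered pair


def condensed_index(i, j, n):
    """Position of pair (i, j), i != j, in the condensed vector (row-major upper triangle)."""
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)


row_dist = squareform(dist_vec)  # 150 x 150, symmetric, zero diagonal
print(dist_vec[condensed_index(2, 3, n)], row_dist[2, 3])

# squareform also converts back: square matrix -> condensed vector.
print(np.array_equal(squareform(row_dist), dist_vec))
```

This is also why `linkage` should receive `dist_vec` rather than `row_dist`. A 2-D array is interpreted as raw observations (here, 150 samples with 150 features), and SciPy issues a `ClusterWarning` when its input looks like an uncondensed distance matrix.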
row_clusters : [[0.00000000e+00 1.70000000e+01 0.00000000e+00 2.00000000e+00]
[2.00000000e+00 2.90000000e+01 0.00000000e+00 2.00000000e+00]
[5.00000000e+00 1.60000000e+01 0.00000000e+00 2.00000000e+00]
[1.10000000e+01 2.40000000e+01 0.00000000e+00 2.00000000e+00]
[7.00000000e+00 2.60000000e+01 0.00000000e+00 2.00000000e+00]
[9.00000000e+00 3.40000000e+01 0.00000000e+00 2.00000000e+00]
[1.20000000e+01 4.50000000e+01 0.00000000e+00 2.00000000e+00]
[4.00000000e+01 4.30000000e+01 0.00000000e+00 2.00000000e+00]
[2.00000000e+01 3.10000000e+01 0.00000000e+00 2.00000000e+00]
[1.90000000e+01 4.40000000e+01 0.00000000e+00 2.00000000e+00]
[4.60000000e+01 1.59000000e+02 0.00000000e+00 3.00000000e+00]
[6.60000000e+01 8.80000000e+01 0.00000000e+00 2.00000000e+00]
[5.50000000e+01 9.90000000e+01 0.00000000e+00 2.00000000e+00]
[6.70000000e+01 8.20000000e+01 0.00000000e+00 2.00000000e+00]
[1.01000000e+02 1.63000000e+02 0.00000000e+00 3.00000000e+00]
[1.42000000e+02 1.64000000e+02 0.00000000e+00 4.00000000e+00]
[6.10000000e+01 1.49000000e+02 0.00000000e+00 2.00000000e+00]
[8.00000000e+01 8.10000000e+01 0.00000000e+00 2.00000000e+00]
[5.60000000e+01 1.00000000e+02 0.00000000e+00 2.00000000e+00]
[9.10000000e+01 1.27000000e+02 0.00000000e+00 2.00000000e+00]
[7.10000000e+01 7.30000000e+01 0.00000000e+00 2.00000000e+00]
[5.10000000e+01 1.15000000e+02 0.00000000e+00 2.00000000e+00]
[1.04000000e+02 1.16000000e+02 0.00000000e+00 2.00000000e+00]
[1.47000000e+02 1.72000000e+02 0.00000000e+00 3.00000000e+00]
[6.50000000e+01 8.60000000e+01 0.00000000e+00 2.00000000e+00]
[1.40000000e+02 1.74000000e+02 0.00000000e+00 3.00000000e+00]
[7.70000000e+01 1.45000000e+02 0.00000000e+00 2.00000000e+00]
[1.24000000e+02 1.44000000e+02 0.00000000e+00 2.00000000e+00]
[1.28000000e+02 1.32000000e+02 0.00000000e+00 2.00000000e+00]
[5.20000000e+01 1.39000000e+02 0.00000000e+00 2.00000000e+00]
[1.41000000e+02 1.79000000e+02 0.00000000e+00 3.00000000e+00]
[7.20000000e+01 1.46000000e+02 0.00000000e+00 2.00000000e+00]
[6.20000000e+01 1.19000000e+02 0.00000000e+00 2.00000000e+00]
[1.00000000e+00 2.50000000e+01 1.00000000e-01 2.00000000e+00]
[4.00000000e+00 3.70000000e+01 1.00000000e-01 2.00000000e+00]
[1.50000000e+02 1.57000000e+02 1.00000000e-01 4.00000000e+00]
[2.10000000e+01 1.60000000e+02 1.00000000e-01 4.00000000e+00]
[3.90000000e+01 1.54000000e+02 1.00000000e-01 3.00000000e+00]
[2.30000000e+01 4.90000000e+01 1.00000000e-01 2.00000000e+00]
[9.40000000e+01 1.21000000e+02 1.00000000e-01 2.00000000e+00]
[1.14000000e+02 1.62000000e+02 1.00000000e-01 3.00000000e+00]
[7.90000000e+01 9.20000000e+01 1.00000000e-01 2.00000000e+00]
[1.38000000e+02 1.66000000e+02 1.00000000e-01 3.00000000e+00]
[1.36000000e+02 1.48000000e+02 1.00000000e-01 2.00000000e+00]
[6.90000000e+01 8.90000000e+01 1.00000000e-01 2.00000000e+00]
[6.30000000e+01 7.80000000e+01 1.00000000e-01 2.00000000e+00]
[1.26000000e+02 1.33000000e+02 1.00000000e-01 2.00000000e+00]
[9.70000000e+01 1.03000000e+02 1.00000000e-01 2.00000000e+00]
[1.10000000e+02 1.71000000e+02 1.00000000e-01 3.00000000e+00]
[7.50000000e+01 1.73000000e+02 1.00000000e-01 4.00000000e+00]
[1.12000000e+02 1.76000000e+02 1.00000000e-01 3.00000000e+00]
[5.00000000e+01 1.20000000e+02 1.00000000e-01 2.00000000e+00]
[5.40000000e+01 1.78000000e+02 1.00000000e-01 3.00000000e+00]
[3.00000000e+00 4.70000000e+01 1.00000000e-01 2.00000000e+00]
[8.00000000e+00 3.80000000e+01 1.00000000e-01 2.00000000e+00]
[3.00000000e+01 1.56000000e+02 1.00000000e-01 3.00000000e+00]
[2.70000000e+01 2.80000000e+01 1.00000000e-01 2.00000000e+00]
[6.40000000e+01 1.61000000e+02 1.00000000e-01 3.00000000e+00]
[9.50000000e+01 9.60000000e+01 1.00000000e-01 2.00000000e+00]
[5.70000000e+01 1.06000000e+02 1.00000000e-01 2.00000000e+00]
[5.30000000e+01 1.67000000e+02 1.00000000e-01 3.00000000e+00]
[1.00000000e+01 4.80000000e+01 1.00000000e-01 2.00000000e+00]
[1.11000000e+02 1.23000000e+02 1.00000000e-01 2.00000000e+00]
[1.02000000e+02 1.29000000e+02 1.00000000e-01 2.00000000e+00]
[1.05000000e+02 1.35000000e+02 1.00000000e-01 2.00000000e+00]
[3.50000000e+01 1.88000000e+02 1.41421356e-01 3.00000000e+00]
[1.65000000e+02 1.90000000e+02 1.41421356e-01 7.00000000e+00]
[8.30000000e+01 1.70000000e+02 1.41421356e-01 3.00000000e+00]
[1.43000000e+02 1.77000000e+02 1.41421356e-01 3.00000000e+00]
[6.80000000e+01 8.70000000e+01 1.41421356e-01 2.00000000e+00]
[3.60000000e+01 1.58000000e+02 1.41421356e-01 3.00000000e+00]
[1.85000000e+02 1.87000000e+02 1.41421356e-01 7.00000000e+00]
[1.55000000e+02 1.83000000e+02 1.41421356e-01 4.00000000e+00]
[9.00000000e+01 1.94000000e+02 1.41421356e-01 3.00000000e+00]
[1.13000000e+02 1.91000000e+02 1.41421356e-01 3.00000000e+00]
[1.68000000e+02 1.93000000e+02 1.41421356e-01 4.00000000e+00]
[1.69000000e+02 1.95000000e+02 1.41421356e-01 4.00000000e+00]
[1.37000000e+02 1.98000000e+02 1.41421356e-01 4.00000000e+00]
[5.80000000e+01 1.99000000e+02 1.41421356e-01 5.00000000e+00]
[1.75000000e+02 2.00000000e+02 1.41421356e-01 6.00000000e+00]
[7.40000000e+01 2.02000000e+02 1.41421356e-01 4.00000000e+00]
[1.80000000e+02 2.01000000e+02 1.41421356e-01 5.00000000e+00]
[1.96000000e+02 1.97000000e+02 1.41421356e-01 4.00000000e+00]
[1.30000000e+01 2.04000000e+02 1.41421356e-01 3.00000000e+00]
[1.51000000e+02 2.03000000e+02 1.41421356e-01 4.00000000e+00]
[2.07000000e+02 2.08000000e+02 1.41421356e-01 5.00000000e+00]
[1.07000000e+02 1.30000000e+02 1.41421356e-01 2.00000000e+00]
[1.34000000e+02 2.17000000e+02 2.00000000e-01 4.00000000e+00]
[1.18000000e+02 1.22000000e+02 2.00000000e-01 2.00000000e+00]
[6.00000000e+00 2.20000000e+01 2.00000000e-01 2.00000000e+00]
[1.17000000e+02 1.31000000e+02 2.00000000e-01 2.00000000e+00]
[9.80000000e+01 2.09000000e+02 2.23606798e-01 3.00000000e+00]
[1.92000000e+02 2.26000000e+02 2.23606798e-01 7.00000000e+00]
[7.00000000e+01 8.50000000e+01 2.23606798e-01 2.00000000e+00]
[1.40000000e+01 1.80000000e+01 2.23606798e-01 2.00000000e+00]
[1.52000000e+02 2.11000000e+02 2.23606798e-01 4.00000000e+00]
[1.89000000e+02 2.16000000e+02 2.23606798e-01 9.00000000e+00]
[9.30000000e+01 2.41000000e+02 2.23606798e-01 4.00000000e+00]
[2.12000000e+02 2.30000000e+02 2.23606798e-01 6.00000000e+00]
[2.05000000e+02 2.22000000e+02 2.23606798e-01 7.00000000e+00]
[2.06000000e+02 2.21000000e+02 2.23606798e-01 9.00000000e+00]
[1.25000000e+02 2.13000000e+02 2.23606798e-01 3.00000000e+00]
[1.84000000e+02 1.86000000e+02 2.82842712e-01 6.00000000e+00]
[1.50000000e+01 3.30000000e+01 2.82842712e-01 2.00000000e+00]
[1.53000000e+02 2.39000000e+02 2.82842712e-01 4.00000000e+00]
[4.20000000e+01 2.34000000e+02 3.00000000e-01 5.00000000e+00]
[2.28000000e+02 2.29000000e+02 3.00000000e-01 1.10000000e+01]
[2.10000000e+02 2.23000000e+02 3.00000000e-01 6.00000000e+00]
[1.81000000e+02 2.19000000e+02 3.16227766e-01 4.00000000e+00]
[7.60000000e+01 1.08000000e+02 3.16227766e-01 2.00000000e+00]
[2.18000000e+02 2.31000000e+02 3.16227766e-01 8.00000000e+00]
[8.40000000e+01 2.35000000e+02 3.16227766e-01 6.00000000e+00]
[2.32000000e+02 2.48000000e+02 3.16227766e-01 1.00000000e+01]
[2.24000000e+02 2.46000000e+02 3.16227766e-01 1.20000000e+01]
[2.15000000e+02 2.50000000e+02 3.60555128e-01 1.20000000e+01]
[2.25000000e+02 2.27000000e+02 3.60555128e-01 8.00000000e+00]
[2.14000000e+02 2.38000000e+02 4.12310563e-01 4.00000000e+00]
[1.82000000e+02 2.58000000e+02 4.24264069e-01 6.00000000e+00]
[3.20000000e+01 2.45000000e+02 4.47213595e-01 5.00000000e+00]
[2.37000000e+02 2.42000000e+02 4.47213595e-01 1.10000000e+01]
[2.33000000e+02 2.55000000e+02 4.47213595e-01 8.00000000e+00]
[5.90000000e+01 2.47000000e+02 4.47213595e-01 5.00000000e+00]
[2.36000000e+02 2.51000000e+02 4.47213595e-01 5.00000000e+00]
[2.56000000e+02 2.60000000e+02 5.38516481e-01 1.90000000e+01]
[2.20000000e+02 2.64000000e+02 5.83095189e-01 1.50000000e+01]
[2.52000000e+02 2.68000000e+02 5.83095189e-01 1.10000000e+01]
[2.61000000e+02 2.63000000e+02 5.83095189e-01 1.80000000e+01]
[4.10000000e+01 6.00000000e+01 5.83095189e-01 2.00000000e+00]
[2.43000000e+02 2.65000000e+02 6.00000000e-01 1.00000000e+01]
[2.44000000e+02 2.53000000e+02 6.00000000e-01 4.00000000e+00]
[2.62000000e+02 2.69000000e+02 6.32455532e-01 2.10000000e+01]
[2.49000000e+02 2.70000000e+02 7.00000000e-01 1.50000000e+01]
[2.57000000e+02 2.71000000e+02 7.07106781e-01 1.10000000e+01]
[2.54000000e+02 2.81000000e+02 7.28010989e-01 1.90000000e+01]
[1.09000000e+02 2.40000000e+02 7.28010989e-01 3.00000000e+00]
[2.66000000e+02 2.72000000e+02 7.81024968e-01 9.00000000e+00]
[2.59000000e+02 2.73000000e+02 8.00000000e-01 2.10000000e+01]
[2.78000000e+02 2.80000000e+02 8.24621125e-01 3.10000000e+01]
[2.74000000e+02 2.75000000e+02 9.21954446e-01 2.60000000e+01]
[2.76000000e+02 2.82000000e+02 1.00000000e+00 2.90000000e+01]
[2.86000000e+02 2.87000000e+02 1.14017543e+00 5.20000000e+01]
[2.84000000e+02 2.85000000e+02 1.21655251e+00 1.20000000e+01]
[2.79000000e+02 2.88000000e+02 1.38924440e+00 3.00000000e+01]
[2.77000000e+02 2.89000000e+02 1.39283883e+00 3.10000000e+01]
[2.67000000e+02 2.90000000e+02 1.41421356e+00 5.80000000e+01]
[2.83000000e+02 2.93000000e+02 1.64924225e+00 5.00000000e+01]
[2.92000000e+02 2.94000000e+02 2.25610283e+00 8.80000000e+01]
[2.95000000e+02 2.96000000e+02 2.70739727e+00 1.38000000e+02]
[2.91000000e+02 2.97000000e+02 3.71618084e+00 1.50000000e+02]]
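Each row of `row_clusters` records one merge: the two cluster ids, the merge distance, and the size of the new cluster. Ids below n = 150 are original samples, and id `150 + k` is the cluster created in row k. For example, row 10 above is `[46, 159, 0, 3]`. Id 159 is the cluster from row 9 (`[19, 44]`), so the new cluster contains samples 19, 44 and 46. The recursive helper `members` below is an illustrative sketch, not a SciPy API, that expands any id into its samples. Rows tied at the same distance (the many 0.0 rows here) might be ordered differently on other SciPy versions, so specific row numbers are not guaranteed.

```python
from sklearn.datasets import load_iris
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

X = load_iris().data[:, :2]  # sepal length, sepal width
Z = linkage(pdist(X, metric='euclidean'), method='complete')
n = len(X)


def members(cluster_id, Z, n):
    """Original sample indices in cluster_id.

    Ids < n are single samples; id n + k is the cluster created in row k of Z.
    """
    if cluster_id < n:
        return [int(cluster_id)]
    left, right = Z[int(cluster_id) - n, :2]
    return members(left, Z, n) + members(right, Z, n)


print(Z.shape)  # (n - 1, 4): one row per merge
k = 10
print(Z[k], sorted(members(n + k, Z, n)))
```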
군집id1 군집id2 거리 멤버수
0 0.0 17.0 0.000000 2.0
1 2.0 29.0 0.000000 2.0
2 5.0 16.0 0.000000 2.0
3 11.0 24.0 0.000000 2.0
4 7.0 26.0 0.000000 2.0
.. ... ... ... ...
144 267.0 290.0 1.414214 58.0
145 283.0 293.0 1.649242 50.0
146 292.0 294.0 2.256103 88.0
147 295.0 296.0 2.707397 138.0
148 291.0 297.0 3.716181 150.0
[149 rows x 4 columns]
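The dendrogram becomes a flat clustering only after it is cut at some height. The following minimal sketch assumes SciPy's `fcluster` and the same complete-linkage tree. It shows two options: request at most three clusters, or cut halfway between the third-to-last and second-to-last merge heights (2.256 and 2.707 in the table above). The comparison with the species labels goes beyond the original post and is included only to show how to read the result.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

iris = load_iris()
X = iris.data[:, :2]  # sepal length, sepal width (same features as above)
Z = linkage(pdist(X, metric='euclidean'), method='complete')

# Option 1: ask for at most 3 flat clusters.
labels = fcluster(Z, t=3, criterion='maxclust')

# Option 2: cut between the 3rd-to-last and 2nd-to-last merge heights,
# which leaves exactly 3 clusters when those two heights differ.
cut_height = (Z[-3, 2] + Z[-2, 2]) / 2
labels_cut = fcluster(Z, t=cut_height, criterion='distance')
print('clusters at cut height:', np.unique(labels_cut))

# Cross-tabulate cluster labels against the true species.
species = pd.Categorical.from_codes(iris.target, iris.target_names)
table = pd.crosstab(labels, species, rownames=['cluster'], colnames=['species'])
print(table)
```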