Python Data Analysis Basics 74 - Clustering - Hierarchical Cluster Analysis - data(iris)
Category: Python Data Analysis | 2022. 11. 25. 16:09
Hierarchical cluster analysis
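Hierarchical (agglomerative) clustering starts from the individual observations and, based on the distances between them, repeatedly merges the closest ones, building a tree-shaped hierarchy. Every merge reduces the number of clusters by one. Its advantage is that the whole process by which the clusters form can be traced exactly (for example, in a dendrogram). Its drawback is that it scales poorly to large datasets, because it works from all n(n-1)/2 pairwise distances between the observations.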
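To make the "merge the closest pair, one fewer cluster each time" process concrete, here is a deliberately naive sketch of agglomerative clustering with complete linkage (the criterion used in the iris code below). It is an illustration under simplifying assumptions, not how scipy implements it: it rescans every pair of clusters at each step (roughly O(n³)), and the toy data and the helper name `naive_complete_linkage` are hypothetical. The last line checks that its merge heights agree with those from `scipy.cluster.hierarchy.linkage`.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage


def naive_complete_linkage(X):
    """Hypothetical helper: complete-linkage agglomerative clustering, done the slow way.

    Returns the merge heights in the order the merges happen.
    """
    D = squareform(pdist(X))                   # full n x n Euclidean distance matrix
    clusters = [[i] for i in range(len(X))]    # start: every point is its own cluster
    heights = []
    while len(clusters) > 1:
        best = None
        # scan every pair of clusters for the smallest complete-linkage distance
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].max()  # farthest pair of points
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into cluster a
        del clusters[b]                          # b > a, so index a is unaffected
        heights.append(d)
        print(f"merged at height {d:.3f}: {len(clusters)} clusters left")
    return heights


rng = np.random.default_rng(0)                 # hypothetical toy data: 8 random 2-D points
X = rng.normal(size=(8, 2))

heights = naive_complete_linkage(X)
scipy_heights = linkage(pdist(X), method="complete")[:, 2]
print("matches scipy:", np.allclose(heights, scipy_heights))
```

Each pass merges exactly one pair, so n points always produce n - 1 merges; that is why the linkage matrix for the 150 iris rows has 149 rows.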
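Methods (linkage criteria): single linkage (closest pair of points), complete linkage (farthest pair), average linkage (mean over all pairs), centroid linkage (distance between cluster centroids), and Ward's method (smallest increase in the within-cluster sum of squares).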
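The linkage criteria differ only in how they measure the distance between two clusters. The sketch below computes each criterion by hand for two tiny hypothetical clusters A and B, then compares it with the height of the final merge that scipy's `linkage` reports for the same four points (the within-cluster pairs are 1.0 apart, so they merge first and the last row is the A-B merge). Two assumptions to flag: the helper `between` is hypothetical, and for Ward's method it uses the height scipy reports for Euclidean input, taken here to be sqrt(2 * n_A * n_B / (n_A + n_B)) times the centroid distance, i.e. sqrt(2 x the increase in within-cluster sum of squares). Scipy's centroid and Ward methods are only meaningful on Euclidean distances, which is what `pdist` computes by default.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist
from scipy.cluster.hierarchy import linkage

# two small, well-separated toy clusters (hypothetical data)
A = np.array([[0.0, 0.0], [0.0, 1.0]])
B = np.array([[3.0, 0.0], [4.0, 0.0]])


def between(A, B, method):
    """Hypothetical helper: distance between clusters A and B under one linkage criterion."""
    D = cdist(A, B)                         # all pairwise Euclidean distances between A and B
    if method == "single":                  # closest pair
        return D.min()
    if method == "complete":                # farthest pair
        return D.max()
    if method == "average":                 # mean over all pairs
        return D.mean()
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    if method == "centroid":                # distance between the centroids
        return np.linalg.norm(ca - cb)
    if method == "ward":                    # assumed scipy Ward height: sqrt(2 * delta SSE)
        na, nb = len(A), len(B)
        return np.sqrt(2 * na * nb / (na + nb)) * np.linalg.norm(ca - cb)
    raise ValueError(f"unknown method: {method}")


X = np.vstack([A, B])
for method in ["single", "complete", "average", "centroid", "ward"]:
    manual = between(A, B, method)
    # the last row of the linkage matrix is the merge of A with B
    from_scipy = linkage(pdist(X), method=method)[-1, 2]
    print(f"{method:>8}: manual={manual:.4f}  scipy={from_scipy:.4f}")
```

Single linkage tends to chain clusters together through close individual points, while complete linkage and Ward's method favor compact, similarly sized clusters; the same data can therefore yield quite different trees depending on the method.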
```python
# Hierarchical clustering on the iris dataset
import pandas as pd
import matplotlib.pyplot as plt
# plt.rc('font', family='malgun gothic')  # Korean font on Windows; only needed for Korean labels
from sklearn.datasets import load_iris
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, dendrogram

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
print(iris_df.head(3))
print()

# To try it on the first five rows only:
# dist_vec = pdist(iris_df.loc[0:4, ['sepal length (cm)', 'sepal width (cm)']], metric='euclidean')
# Pairwise Euclidean distances on the two sepal features, as a condensed (1-D) vector
dist_vec = pdist(iris_df.loc[:, ['sepal length (cm)', 'sepal width (cm)']], metric='euclidean')
print('dist_vec :', dist_vec)
print()

# squareform expands the condensed vector into a full n x n matrix;
# wrapping it in a DataFrame makes it easy to read
row_dist = pd.DataFrame(squareform(dist_vec))
print(row_dist)

# linkage takes the condensed pairwise-distance vector (not the square matrix)
row_clusters = linkage(dist_vec, method='complete')
print('row_clusters :', row_clusters)

df = pd.DataFrame(row_clusters, columns=['cluster_id1', 'cluster_id2', 'distance', 'n_members'])
print(df)

# Visualize row_clusters as a dendrogram
row_dend = dendrogram(row_clusters)
plt.ylabel('Euclidean distance')
plt.show()
```

Console output:

```
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2

dist_vec : [0.53851648 0.5        0.64031242 ... 0.5        0.6        0.5       ]

            0         1         2  ...       147       148       149
0    0.000000  0.538516  0.500000  ...  1.486607  1.104536  0.943398
1    0.538516  0.000000  0.282843  ...  1.600000  1.360147  1.000000
2    0.500000  0.282843  0.000000  ...  1.811077  1.513275  1.216553
3    0.640312  0.316228  0.141421  ...  1.902630  1.627882  1.303840
4    0.141421  0.608276  0.500000  ...  1.615549  1.216553  1.081665
..        ...       ...       ...  ...       ...       ...       ...
145  1.676305  1.800000  2.009975  ...  0.200000  0.640312  0.800000
146  1.562050  1.486607  1.746425  ...  0.538516  0.905539  0.640312
147  1.486607  1.600000  1.811077  ...  0.000000  0.500000  0.600000
148  1.104536  1.360147  1.513275  ...  0.500000  0.000000  0.500000
149  0.943398  1.000000  1.216553  ...  0.600000  0.500000  0.000000

[150 rows x 150 columns]
row_clusters : [[0.00000000e+00 1.70000000e+01 0.00000000e+00 2.00000000e+00]
 [2.00000000e+00 2.90000000e+01 0.00000000e+00 2.00000000e+00]
 [5.00000000e+00 1.60000000e+01 0.00000000e+00 2.00000000e+00]
 [1.10000000e+01 2.40000000e+01 0.00000000e+00 2.00000000e+00]
 [7.00000000e+00 2.60000000e+01 0.00000000e+00 2.00000000e+00]
 [9.00000000e+00 3.40000000e+01 0.00000000e+00 2.00000000e+00]
 [1.20000000e+01 4.50000000e+01 0.00000000e+00 2.00000000e+00]
 [4.00000000e+01 4.30000000e+01 0.00000000e+00 2.00000000e+00]
 [2.00000000e+01 3.10000000e+01 0.00000000e+00 2.00000000e+00]
 [1.90000000e+01 4.40000000e+01 0.00000000e+00 2.00000000e+00]
 [4.60000000e+01 1.59000000e+02 0.00000000e+00 3.00000000e+00]
 [6.60000000e+01 8.80000000e+01 0.00000000e+00 2.00000000e+00]
 [5.50000000e+01 9.90000000e+01 0.00000000e+00 2.00000000e+00]
 [6.70000000e+01 8.20000000e+01 0.00000000e+00 2.00000000e+00]
 [1.01000000e+02 1.63000000e+02 0.00000000e+00 3.00000000e+00]
 [1.42000000e+02 1.64000000e+02 0.00000000e+00 4.00000000e+00]
 [6.10000000e+01 1.49000000e+02 0.00000000e+00 2.00000000e+00]
 [8.00000000e+01 8.10000000e+01 0.00000000e+00 2.00000000e+00]
 [5.60000000e+01 1.00000000e+02 0.00000000e+00 2.00000000e+00]
 [9.10000000e+01 1.27000000e+02 0.00000000e+00 2.00000000e+00]
 [7.10000000e+01 7.30000000e+01 0.00000000e+00 2.00000000e+00]
 [5.10000000e+01 1.15000000e+02 0.00000000e+00 2.00000000e+00]
 [1.04000000e+02 1.16000000e+02 0.00000000e+00 2.00000000e+00]
 [1.47000000e+02 1.72000000e+02 0.00000000e+00 3.00000000e+00]
 [6.50000000e+01 8.60000000e+01 0.00000000e+00 2.00000000e+00]
 [1.40000000e+02 1.74000000e+02 0.00000000e+00 3.00000000e+00]
 [7.70000000e+01 1.45000000e+02 0.00000000e+00 2.00000000e+00]
 [1.24000000e+02 1.44000000e+02 0.00000000e+00 2.00000000e+00]
 [1.28000000e+02 1.32000000e+02 0.00000000e+00 2.00000000e+00]
 [5.20000000e+01 1.39000000e+02 0.00000000e+00 2.00000000e+00]
 [1.41000000e+02 1.79000000e+02 0.00000000e+00 3.00000000e+00]
 [7.20000000e+01 1.46000000e+02 0.00000000e+00 2.00000000e+00]
 [6.20000000e+01 1.19000000e+02 0.00000000e+00 2.00000000e+00]
 [1.00000000e+00 2.50000000e+01 1.00000000e-01 2.00000000e+00]
 [4.00000000e+00 3.70000000e+01 1.00000000e-01 2.00000000e+00]
 [1.50000000e+02 1.57000000e+02 1.00000000e-01 4.00000000e+00]
 [2.10000000e+01 1.60000000e+02 1.00000000e-01 4.00000000e+00]
 [3.90000000e+01 1.54000000e+02 1.00000000e-01 3.00000000e+00]
 [2.30000000e+01 4.90000000e+01 1.00000000e-01 2.00000000e+00]
 [9.40000000e+01 1.21000000e+02 1.00000000e-01 2.00000000e+00]
 [1.14000000e+02 1.62000000e+02 1.00000000e-01 3.00000000e+00]
 [7.90000000e+01 9.20000000e+01 1.00000000e-01 2.00000000e+00]
 [1.38000000e+02 1.66000000e+02 1.00000000e-01 3.00000000e+00]
 [1.36000000e+02 1.48000000e+02 1.00000000e-01 2.00000000e+00]
 [6.90000000e+01 8.90000000e+01 1.00000000e-01 2.00000000e+00]
 [6.30000000e+01 7.80000000e+01 1.00000000e-01 2.00000000e+00]
 [1.26000000e+02 1.33000000e+02 1.00000000e-01 2.00000000e+00]
 [9.70000000e+01 1.03000000e+02 1.00000000e-01 2.00000000e+00]
 [1.10000000e+02 1.71000000e+02 1.00000000e-01 3.00000000e+00]
 [7.50000000e+01 1.73000000e+02 1.00000000e-01 4.00000000e+00]
 [1.12000000e+02 1.76000000e+02 1.00000000e-01 3.00000000e+00]
 [5.00000000e+01 1.20000000e+02 1.00000000e-01 2.00000000e+00]
 [5.40000000e+01 1.78000000e+02 1.00000000e-01 3.00000000e+00]
 [3.00000000e+00 4.70000000e+01 1.00000000e-01 2.00000000e+00]
 [8.00000000e+00 3.80000000e+01 1.00000000e-01 2.00000000e+00]
 [3.00000000e+01 1.56000000e+02 1.00000000e-01 3.00000000e+00]
 [2.70000000e+01 2.80000000e+01 1.00000000e-01 2.00000000e+00]
 [6.40000000e+01 1.61000000e+02 1.00000000e-01 3.00000000e+00]
 [9.50000000e+01 9.60000000e+01 1.00000000e-01 2.00000000e+00]
 [5.70000000e+01 1.06000000e+02 1.00000000e-01 2.00000000e+00]
 [5.30000000e+01 1.67000000e+02 1.00000000e-01 3.00000000e+00]
 [1.00000000e+01 4.80000000e+01 1.00000000e-01 2.00000000e+00]
 [1.11000000e+02 1.23000000e+02 1.00000000e-01 2.00000000e+00]
 [1.02000000e+02 1.29000000e+02 1.00000000e-01 2.00000000e+00]
 [1.05000000e+02 1.35000000e+02 1.00000000e-01 2.00000000e+00]
 [3.50000000e+01 1.88000000e+02 1.41421356e-01 3.00000000e+00]
 [1.65000000e+02 1.90000000e+02 1.41421356e-01 7.00000000e+00]
 [8.30000000e+01 1.70000000e+02 1.41421356e-01 3.00000000e+00]
 [1.43000000e+02 1.77000000e+02 1.41421356e-01 3.00000000e+00]
 [6.80000000e+01 8.70000000e+01 1.41421356e-01 2.00000000e+00]
 [3.60000000e+01 1.58000000e+02 1.41421356e-01 3.00000000e+00]
 [1.85000000e+02 1.87000000e+02 1.41421356e-01 7.00000000e+00]
 [1.55000000e+02 1.83000000e+02 1.41421356e-01 4.00000000e+00]
 [9.00000000e+01 1.94000000e+02 1.41421356e-01 3.00000000e+00]
 [1.13000000e+02 1.91000000e+02 1.41421356e-01 3.00000000e+00]
 [1.68000000e+02 1.93000000e+02 1.41421356e-01 4.00000000e+00]
 [1.69000000e+02 1.95000000e+02 1.41421356e-01 4.00000000e+00]
 [1.37000000e+02 1.98000000e+02 1.41421356e-01 4.00000000e+00]
 [5.80000000e+01 1.99000000e+02 1.41421356e-01 5.00000000e+00]
 [1.75000000e+02 2.00000000e+02 1.41421356e-01 6.00000000e+00]
 [7.40000000e+01 2.02000000e+02 1.41421356e-01 4.00000000e+00]
 [1.80000000e+02 2.01000000e+02 1.41421356e-01 5.00000000e+00]
 [1.96000000e+02 1.97000000e+02 1.41421356e-01 4.00000000e+00]
 [1.30000000e+01 2.04000000e+02 1.41421356e-01 3.00000000e+00]
 [1.51000000e+02 2.03000000e+02 1.41421356e-01 4.00000000e+00]
 [2.07000000e+02 2.08000000e+02 1.41421356e-01 5.00000000e+00]
 [1.07000000e+02 1.30000000e+02 1.41421356e-01 2.00000000e+00]
 [1.34000000e+02 2.17000000e+02 2.00000000e-01 4.00000000e+00]
 [1.18000000e+02 1.22000000e+02 2.00000000e-01 2.00000000e+00]
 [6.00000000e+00 2.20000000e+01 2.00000000e-01 2.00000000e+00]
 [1.17000000e+02 1.31000000e+02 2.00000000e-01 2.00000000e+00]
 [9.80000000e+01 2.09000000e+02 2.23606798e-01 3.00000000e+00]
 [1.92000000e+02 2.26000000e+02 2.23606798e-01 7.00000000e+00]
 [7.00000000e+01 8.50000000e+01 2.23606798e-01 2.00000000e+00]
 [1.40000000e+01 1.80000000e+01 2.23606798e-01 2.00000000e+00]
 [1.52000000e+02 2.11000000e+02 2.23606798e-01 4.00000000e+00]
 [1.89000000e+02 2.16000000e+02 2.23606798e-01 9.00000000e+00]
 [9.30000000e+01 2.41000000e+02 2.23606798e-01 4.00000000e+00]
 [2.12000000e+02 2.30000000e+02 2.23606798e-01 6.00000000e+00]
 [2.05000000e+02 2.22000000e+02 2.23606798e-01 7.00000000e+00]
 [2.06000000e+02 2.21000000e+02 2.23606798e-01 9.00000000e+00]
 [1.25000000e+02 2.13000000e+02 2.23606798e-01 3.00000000e+00]
 [1.84000000e+02 1.86000000e+02 2.82842712e-01 6.00000000e+00]
 [1.50000000e+01 3.30000000e+01 2.82842712e-01 2.00000000e+00]
 [1.53000000e+02 2.39000000e+02 2.82842712e-01 4.00000000e+00]
 [4.20000000e+01 2.34000000e+02 3.00000000e-01 5.00000000e+00]
 [2.28000000e+02 2.29000000e+02 3.00000000e-01 1.10000000e+01]
 [2.10000000e+02 2.23000000e+02 3.00000000e-01 6.00000000e+00]
 [1.81000000e+02 2.19000000e+02 3.16227766e-01 4.00000000e+00]
 [7.60000000e+01 1.08000000e+02 3.16227766e-01 2.00000000e+00]
 [2.18000000e+02 2.31000000e+02 3.16227766e-01 8.00000000e+00]
 [8.40000000e+01 2.35000000e+02 3.16227766e-01 6.00000000e+00]
 [2.32000000e+02 2.48000000e+02 3.16227766e-01 1.00000000e+01]
 [2.24000000e+02 2.46000000e+02 3.16227766e-01 1.20000000e+01]
 [2.15000000e+02 2.50000000e+02 3.60555128e-01 1.20000000e+01]
 [2.25000000e+02 2.27000000e+02 3.60555128e-01 8.00000000e+00]
 [2.14000000e+02 2.38000000e+02 4.12310563e-01 4.00000000e+00]
 [1.82000000e+02 2.58000000e+02 4.24264069e-01 6.00000000e+00]
 [3.20000000e+01 2.45000000e+02 4.47213595e-01 5.00000000e+00]
 [2.37000000e+02 2.42000000e+02 4.47213595e-01 1.10000000e+01]
 [2.33000000e+02 2.55000000e+02 4.47213595e-01 8.00000000e+00]
 [5.90000000e+01 2.47000000e+02 4.47213595e-01 5.00000000e+00]
 [2.36000000e+02 2.51000000e+02 4.47213595e-01 5.00000000e+00]
 [2.56000000e+02 2.60000000e+02 5.38516481e-01 1.90000000e+01]
 [2.20000000e+02 2.64000000e+02 5.83095189e-01 1.50000000e+01]
 [2.52000000e+02 2.68000000e+02 5.83095189e-01 1.10000000e+01]
 [2.61000000e+02 2.63000000e+02 5.83095189e-01 1.80000000e+01]
 [4.10000000e+01 6.00000000e+01 5.83095189e-01 2.00000000e+00]
 [2.43000000e+02 2.65000000e+02 6.00000000e-01 1.00000000e+01]
 [2.44000000e+02 2.53000000e+02 6.00000000e-01 4.00000000e+00]
 [2.62000000e+02 2.69000000e+02 6.32455532e-01 2.10000000e+01]
 [2.49000000e+02 2.70000000e+02 7.00000000e-01 1.50000000e+01]
 [2.57000000e+02 2.71000000e+02 7.07106781e-01 1.10000000e+01]
 [2.54000000e+02 2.81000000e+02 7.28010989e-01 1.90000000e+01]
 [1.09000000e+02 2.40000000e+02 7.28010989e-01 3.00000000e+00]
 [2.66000000e+02 2.72000000e+02 7.81024968e-01 9.00000000e+00]
 [2.59000000e+02 2.73000000e+02 8.00000000e-01 2.10000000e+01]
 [2.78000000e+02 2.80000000e+02 8.24621125e-01 3.10000000e+01]
 [2.74000000e+02 2.75000000e+02 9.21954446e-01 2.60000000e+01]
 [2.76000000e+02 2.82000000e+02 1.00000000e+00 2.90000000e+01]
 [2.86000000e+02 2.87000000e+02 1.14017543e+00 5.20000000e+01]
 [2.84000000e+02 2.85000000e+02 1.21655251e+00 1.20000000e+01]
 [2.79000000e+02 2.88000000e+02 1.38924440e+00 3.00000000e+01]
 [2.77000000e+02 2.89000000e+02 1.39283883e+00 3.10000000e+01]
 [2.67000000e+02 2.90000000e+02 1.41421356e+00 5.80000000e+01]
 [2.83000000e+02 2.93000000e+02 1.64924225e+00 5.00000000e+01]
 [2.92000000e+02 2.94000000e+02 2.25610283e+00 8.80000000e+01]
 [2.95000000e+02 2.96000000e+02 2.70739727e+00 1.38000000e+02]
 [2.91000000e+02 2.97000000e+02 3.71618084e+00 1.50000000e+02]]
     cluster_id1  cluster_id2  distance  n_members
0            0.0         17.0  0.000000        2.0
1            2.0         29.0  0.000000        2.0
2            5.0         16.0  0.000000        2.0
3           11.0         24.0  0.000000        2.0
4            7.0         26.0  0.000000        2.0
..           ...          ...       ...        ...
144        267.0        290.0  1.414214       58.0
145        283.0        293.0  1.649242       50.0
146        292.0        294.0  2.256103       88.0
147        295.0        296.0  2.707397      138.0
148        291.0        297.0  3.716181      150.0

[149 rows x 4 columns]
```
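The linkage matrix printed above is easier to read once you know scipy's id convention: ids 0 to n-1 (here 0 to 149) are the original rows, and the cluster created in row i of the matrix gets id n + i. For example, row 10 merges point 46 with cluster 159, which is the cluster formed in row 9 (points 19 and 44), giving 3 members. The sketch below, assuming the same two sepal features and complete linkage as the code above, recovers the members of any cluster id (the helper `members` is hypothetical), then uses `fcluster` with `criterion='maxclust'` to cut the tree into at most three flat clusters and cross-tabulates them against the species labels. Because only the sepal features are used, the clusters need not line up with the species.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# rebuild the same linkage matrix as above (sepal length/width, complete linkage)
iris = load_iris()
X = iris.data[:, :2]                      # columns 0 and 1: sepal length, sepal width
Z = linkage(pdist(X, metric="euclidean"), method="complete")
n = len(X)


def members(cluster_id, Z, n):
    """Hypothetical helper: original row indices contained in a cluster id of Z.

    Ids below n are single observations; id n + i is the cluster created in row i.
    """
    if cluster_id < n:
        return [int(cluster_id)]
    left, right = Z[int(cluster_id) - n, :2]
    return members(left, Z, n) + members(right, Z, n)


print("row 10:", Z[10], "->", sorted(members(n + 10, Z, n)))
# the final row (id 2n - 2) contains every observation
print("members of the root:", len(members(2 * n - 2, Z, n)))

# cut the tree into at most 3 flat clusters and compare with the species labels
labels = fcluster(Z, t=3, criterion="maxclust")
print(pd.crosstab(labels, iris.target_names[iris.target],
                  rownames=["cluster"], colnames=["species"]))
```

`fcluster` can also cut at a fixed height (`criterion='distance'`), which corresponds to drawing a horizontal line across the dendrogram at that Euclidean distance.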