Fix a number of clusters K. We want to partition our observations into K clusters such that any two observations in the same cluster are, in some sense, "more similar" to each other than to observations in different clusters. In the last post, on k-nearest neighbors, we represented each observation as a vector and defined a distance between such vectors; recall that a small distance corresponds to similarity.
For each cluster, we introduce a vector \mu_k that is associated with it. Our aim is to minimize
S = \sum_{k=1}^{K} \sum_{n=1}^{N}Z_{nk} \|X_n - \mu_k\|^2,
where Z_{nk} = 1 if observation n belongs to cluster k and Z_{nk} = 0 otherwise.
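As a concrete illustration, here is a minimal NumPy sketch of how S can be computed from the assignments and the cluster vectors (the array names X, Z, mu and the function name objective are assumptions made for this example, not part of the post):

```python
import numpy as np

def objective(X, Z, mu):
    """S = sum_k sum_n Z[n, k] * ||X[n] - mu[k]||^2.

    X  : (N, d) array of observations
    Z  : (N, K) 0/1 assignment matrix (exactly one 1 per row)
    mu : (K, d) array of cluster vectors
    """
    diff = X[:, None, :] - mu[None, :, :]    # shape (N, K, d)
    sq_dist = np.sum(diff ** 2, axis=2)      # shape (N, K): ||X_n - mu_k||^2
    return np.sum(Z * sq_dist)
```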
Note that for each n exactly one Z_{nk} equals 1 and the others equal 0, so S contains at most N non-zero terms. We use an iterative method with two steps: minimize over the Z_{nk} with the \mu_k fixed, then minimize over the \mu_k with the Z_{nk} fixed. The algorithm goes as follows.
k-means algorithm:
(1) Choose \{\mu_k\}_{k=1}^K randomly; a common choice is to pick K of the observations.
(2) Step 1: For each n, set Z_{nk} = 1 for the single k such that \|X_n - \mu_k\| = \min_{1 \le j \le K} \|X_n - \mu_j\|, and Z_{nj} = 0 for all other j.
Step 2: Choose \{\mu_k\}_{k=1}^{K} minimizing S: since S is quadratic in \mu_k, we take the gradient with respect to \mu_k and set it to zero,
-2 \sum_{n=1}^{N} Z_{nk} (X_n - \mu_k) = 0,
which is equivalent to
\mu_k = \dfrac{\sum_{n=1}^{N} Z_{nk} X_n }{\sum_{n=1}^{N} Z_{nk}} \,\,\,(centroid).
(3) Repeat Step 1 and Step 2 until S converges.
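The two steps translate almost directly into code. Below is a minimal NumPy sketch of the algorithm above, storing the assignments Z_{nk} as one integer label per observation; the function name kmeans and the parameters max_iter, tol and seed are assumptions, not part of the original description:

```python
import numpy as np

def kmeans(X, K, max_iter=100, tol=1e-9, seed=None):
    """Minimal sketch of the two-step k-means iteration described above."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    N = X.shape[0]

    # (1) Initialize each mu_k as a randomly chosen observation
    mu = X[rng.choice(N, size=K, replace=False)].copy()

    prev_S = np.inf
    for _ in range(max_iter):
        # Step 1: assign each X_n to its nearest cluster vector
        sq_dist = np.sum((X[:, None, :] - mu[None, :, :]) ** 2, axis=2)  # (N, K)
        labels = np.argmin(sq_dist, axis=1)

        # Step 2: replace each mu_k by the centroid of its assigned observations
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:            # leave mu_k unchanged if cluster k is empty
                mu[k] = members.mean(axis=0)

        # (3) Stop once S no longer decreases
        S = np.sum(sq_dist[np.arange(N), labels])
        if prev_S - S < tol:
            break
        prev_S = S

    return mu, labels, S
```

For example, mu, labels, S = kmeans(X, K=3) on an (N, d) array X returns the K cluster vectors, one label per observation, and the final value of S.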