Counterexamples to some implications between modes of convergence in probability

k-means algorithm

Fix a $K$. We want to partition our observations into $K$ clusters so that any two observations in the same cluster are, in some sense, "more similar" to each other than to observations in different clusters. In the last post, on $k$-nearest neighbors, we represented each observation as a vector and defined a distance between these vectors. Moreover, we saw that a small distance corresponds to similarity.

For each cluster $k$, we introduce an associated center vector $\mu_k$. Our aim is to minimize
$$S = \sum_{k=1}^{K} \sum_{n=1}^{N}Z_{nk} \|X_n - \mu_k\|^2,$$
where $Z_{nk} = 1$ if observation $n$ belongs to cluster $k$ and $Z_{nk} = 0$ otherwise.
Note that for each $n$ exactly one $Z_{nk}$ equals one and the others equal zero, so $S$ contains at most $N$ non-zero terms. We minimize $S$ by an iterative method with two steps: minimize over the $Z_{nk}$, then minimize over the $\mu_k$. The algorithm goes as follows.
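As a small sketch, the objective $S$ can be evaluated directly from its definition (this assumes NumPy; the array names `X`, `Z`, `mu` are illustrative, not from the post):

```python
import numpy as np

def objective(X, Z, mu):
    """Distortion S = sum_k sum_n Z[n, k] * ||X[n] - mu[k]||^2.

    X  : (N, d) array of observations
    Z  : (N, K) 0/1 assignment matrix with exactly one 1 per row
    mu : (K, d) array of cluster centers
    """
    # Squared distance from every point to every center, shape (N, K).
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # Each row of Z selects exactly one of the K distances.
    return (Z * d2).sum()
```

Since each row of `Z` has a single 1, the product `Z * d2` keeps one squared distance per observation, matching the "at most $N$ non-zero terms" remark above.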

k-means algorithm:
(1) Initialize $\{\mu_k \}_{k=1}^K$ randomly, often by setting them equal to $K$ of the observations.
(2) Step 1: For each $n$, set $Z_{nk} = 1$ for the single $k$ such that $\|X_n- \mu_k\| = \min_{1 \le j \le K} \| X_n-\mu_j\|$ (breaking ties arbitrarily), and $Z_{nj} = 0$ for $j \neq k$.
Step 2: With the $Z_{nk}$ fixed, choose $\{\mu_k\}_{k=1}^{K}$ minimizing $S$: since $S$ is a quadratic function of $\mu_k$, we set its gradient with respect to $\mu_k$ to zero,
$$\sum_{n=1}^{N} 2 Z_{nk}(X_n - \mu_k) = 0,$$
which is equivalent to
$$ \mu_k = \dfrac{\sum_{n=1}^{N} Z_{nk} X_n }{\sum_{n=1}^{N} Z_{nk}} \quad \text{(the centroid of cluster } k\text{)}.$$
(3) Repeat Step 1 and Step 2 until $S$ stops decreasing. Each step can only decrease $S$, and there are finitely many possible assignments, so $S$ converges after finitely many iterations, though possibly to a local rather than global minimum.
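The whole iteration above can be sketched as follows. This is a minimal illustration assuming NumPy; the function name and the empty-cluster handling are my own choices, not from the post:

```python
import numpy as np

def k_means(X, K, n_iter=100, seed=0):
    """Two-step k-means iteration on rows of X (shape (N, d))."""
    rng = np.random.default_rng(seed)
    # (1) Initialize the centers as K distinct randomly chosen observations.
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Step 1: assign every point to its nearest center
        # (labels[n] = k plays the role of Z_{nk} = 1).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stable, so S can no longer decrease
        labels = new_labels
        # Step 2: move each center to the centroid of its assigned points.
        for k in range(K):
            pts = X[labels == k]
            if len(pts):  # keep the old center if the cluster is empty
                mu[k] = pts.mean(axis=0)
    return mu, labels
```

Because the objective never increases and the assignments are stable at termination, the returned `labels` are exactly the nearest-center assignments for the returned `mu`; depending on the random initialization, the result may still be a local minimum of $S$.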


Remarks