
Gaussian Mixture Model for Clustering (EM Approach)

The $k$-means algorithm has the disadvantage that it assigns each observation to exactly one cluster (a hard assignment). What happens, for example, if we have to group an article $X$ about the vitamins in some kind of food; which cluster should we assign it to, science or food? That is when we need to give our observations some probabilistic structure, for instance, $P(X \in \text{science cluster}) = P(X \in \text{food cluster}) = 50\%.$

From that motivation, we assume that our data is generated by mixing $K$ normal distributions together (each cluster is normally distributed), where the $k$-th cluster has the density of a multivariate normal distribution with mean $\mu_k$ and covariance matrix $\Sigma_k$, denoted $N(X|\mu_k, \Sigma_k)$. Let $Z$ be the random variable indicating the cluster that a data point belongs to. From that, we have
$$f(X|Z = k) = N(X | \mu_k, \Sigma_k),$$
and
$$p(Z = k) = \pi_k \quad \forall\, k, \qquad \sum_{k=1}^{K} \pi_k = 1.$$
The law of total probability tells us that
$$f(X) = \sum_{k=1}^{K} f(X| Z=k) P(Z=k) = \sum_{k=1}^{K} \pi_k  N(X | \mu_k, \Sigma_k).$$
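To make the generative story concrete, here is a minimal sampling sketch (assuming NumPy; the two-component parameters are invented purely for illustration): draw $Z$ from the categorical distribution $(\pi_1,\dots,\pi_K)$, then draw $X$ from the Gaussian of the chosen cluster.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-component mixture in R^2 (illustrative values only)
pi = np.array([0.4, 0.6])                        # mixing coefficients pi_k
mus = np.array([[0.0, 0.0], [4.0, 4.0]])         # component means mu_k
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])  # covariance matrices Sigma_k

def sample_gmm(n):
    """Draw n points: Z ~ Categorical(pi), then X | Z = k ~ N(mu_k, Sigma_k)."""
    Z = rng.choice(len(pi), size=n, p=pi)
    X = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in Z])
    return X, Z

X, Z_true = sample_gmm(500)
```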
What we care about here is the probability that observation $n$ belongs to cluster $k$, which can be computed by Bayes' rule:
$$z_{nk}:= P(Z=k|X_n) = \dfrac{f(X_n|Z=k)P(Z=k)}{\sum_{j=1}^{K} f(X_n|Z=j)P(Z=j) } = \dfrac{\pi_k N(X_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K} \pi_j N(X_n|\mu_j,\Sigma_j) }.$$
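This formula translates directly into code. A sketch of this step, reusing the parameter arrays from the sampling example above (SciPy's `multivariate_normal.pdf` evaluates the density $N(X_n|\mu_k,\Sigma_k)$ for all rows of `X` at once):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, mus, Sigmas):
    """z[n, k] = P(Z = k | X_n), computed by Bayes' rule."""
    # Unnormalized posteriors: pi_k * N(X_n | mu_k, Sigma_k)
    z = np.column_stack([
        pi[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
        for k in range(len(pi))
    ])
    return z / z.sum(axis=1, keepdims=True)  # normalize over k for each n
```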
Thus, we need to estimate $\{\pi_j, \mu_j, \Sigma_j\}_{j=1}^K$, and here we will use the maximum likelihood approach. The log-likelihood function based on $N$ observations is
$$ L = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k N(X_n|\mu_k,\Sigma_k).
$$
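This is also the quantity we monitor for convergence later. A sketch of its computation (the inner sum over $k$ is done in log space with SciPy's `logsumexp`, a standard trick to avoid numerical underflow when the densities are tiny):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def log_likelihood(X, pi, mus, Sigmas):
    """L = sum_n log sum_k pi_k N(X_n | mu_k, Sigma_k)."""
    log_terms = np.column_stack([
        np.log(pi[k]) + multivariate_normal.logpdf(X, mean=mus[k], cov=Sigmas[k])
        for k in range(len(pi))
    ])
    return logsumexp(log_terms, axis=1).sum()
```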
Taking the gradient with respect to $\mu_k$ and setting it to zero, we get
$$0 = -\sum_{n=1}^{N}  \dfrac{\pi_k N(X_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K} \pi_j N(X_n|\mu_j,\Sigma_j) } \Sigma_k^{-1} (X_n - \mu_k).
$$
Writing this in terms of $z_{nk}$, multiplying through by $\Sigma_k$, and solving for $\mu_k$, we obtain
$$\mu_k = \dfrac{ \sum_{n=1}^{N} z_{nk} X_n}{\sum_{n=1}^{N} z_{nk}} .$$
Similarly, taking the gradient with respect to $\Sigma_k^{-1}$ and proceeding as in the usual maximum likelihood derivation for a single Gaussian, we get
$$\Sigma_k = \dfrac{\sum_{n=1}^{N} z_{nk}(X_n-\mu_k)^T (X_n-\mu_k)}{\sum_{n=1}^{N} z_{nk}}.$$
Finally, to maximize with respect to $\pi_k$, note the constraint $\sum_k \pi_k = 1$, so we have to use the method of Lagrange multipliers. Taking the gradient with respect to $\pi_k$ of
$$ L + \lambda(\sum_{k=1}^{K}\pi_k - 1),$$
we get
$$\sum_{n=1}^{N} \dfrac{N(X_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K} \pi_j N(X_n|\mu_j,\Sigma_j)} +\lambda = 0, \,\,\, \forall \, k.$$
Multiplying both sides of the $k$-th equation by $\pi_k$ and summing over $k$ (using $\sum_{k} z_{nk} = 1$ for every $n$), we find $\lambda = -N$; substituting back gives $\sum_{n=1}^{N} z_{nk} - \pi_k N = 0$. Hence,
$$\pi_k = \dfrac{\sum_{n=1}^{N} z_{nk}}{N}.$$
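The three update formulas combine into a single vectorized M-step; a sketch assuming NumPy, where `z` is the $N \times K$ responsibility matrix from the E-step (the covariance update forms the weighted outer products $\sum_n z_{nk}(X_n-\mu_k)^T(X_n-\mu_k)$, matching the row-vector convention above):

```python
import numpy as np

def m_step(X, z):
    """Re-estimate (pi, mus, Sigmas) from the responsibility matrix z."""
    N, d = X.shape
    Nk = z.sum(axis=0)                             # effective cluster sizes
    pi = Nk / N                                    # pi_k = (sum_n z_nk) / N
    mus = (z.T @ X) / Nk[:, None]                  # weighted means
    Sigmas = np.empty((len(Nk), d, d))
    for k in range(len(Nk)):
        D = X - mus[k]                             # centered data
        Sigmas[k] = (z[:, k] * D.T) @ D / Nk[k]    # weighted covariance
    return pi, mus, Sigmas
```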
Hence we can calculate $z_{nk}$ if $\{\pi_j, \mu_j, \Sigma_j\}_{j=1}^K$ are known, and vice versa. We can summarize the EM algorithm for the Gaussian Mixture Model as follows:
(1) Choose initial values $\{\pi_j, \mu_j, \Sigma_j\}_{j=1}^K$.
(2) Step E (Expectation): Calculate $$z_{nk} = \dfrac{\pi_k N(X_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K} \pi_j N(X_n|\mu_j,\Sigma_j) }\,\,\,\, \forall \,\, n,k.$$
(3) Step M (Maximization): Calculate the new means, covariance matrices, and coefficients $\pi_k$:
$$\mu_k = \dfrac{ \sum_{n=1}^{N} z_{nk} X_n}{\sum_{n=1}^{N} z_{nk}} .$$
$$\Sigma_k = \dfrac{\sum_{n=1}^{N} z_{nk}(X_n-\mu_k)^T (X_n-\mu_k)}{\sum_{n=1}^{N} z_{nk}}.$$
$$\pi_k = \dfrac{\sum_{n=1}^{N} z_{nk}}{N}.$$
(4) Repeat Step E and Step M until convergence (e.g., until the log-likelihood stops increasing noticeably).
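Putting the sketches above together, a minimal EM loop might look as follows (not a production implementation: it omits covariance regularization and multiple restarts, and simply initializes the means from random data points):

```python
import numpy as np

def fit_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    """Fit a K-component GMM by EM, using responsibilities / m_step /
    log_likelihood from the sketches above."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # (1) Initialization: uniform weights, random means, identity covariances
    pi = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)]
    Sigmas = np.array([np.eye(d) for _ in range(K)])
    prev = -np.inf
    for _ in range(n_iter):
        z = responsibilities(X, pi, mus, Sigmas)   # (2) E-step
        pi, mus, Sigmas = m_step(X, z)             # (3) M-step
        ll = log_likelihood(X, pi, mus, Sigmas)
        if ll - prev < tol:                        # (4) convergence check
            break
        prev = ll
    return pi, mus, Sigmas, responsibilities(X, pi, mus, Sigmas)
```

On the synthetic data sampled earlier, `fit_gmm(X, K=2)` should recover mixing weights close to $(0.4, 0.6)$ and means near the true $\mu_k$, up to a permutation of the cluster labels.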

Remarks