
Gaussian Mixture Model for Clustering (EM Approach)

The $k$-means algorithm has the disadvantage that it assigns each observation to exactly one cluster (a hard assignment). What happens, for example, if we have to group an article $X$ about the vitamins in some kind of food; which cluster should we assign it to, science or food? That is when we need to give our observations some probabilistic structure, for instance, $P(X \in \text{science cluster}) = P(X \in \text{food cluster}) = 50\%.$

From that motivation, we assume that our data is generated by mixing $K$ normal distributions together (each cluster is normally distributed), where the $k$-th cluster has the density of a multivariate normal distribution with mean $\mu_k$ and covariance matrix $\Sigma_k$, denoted $N(X|\mu_k, \Sigma_k)$. Let $Z$ be the random variable indicating the cluster that a data point belongs to. From that, we have
$$f(X|Z = k) = N(X | \mu_k, \Sigma_k),$$
and
$$p(Z = k) = \pi_k \quad \forall\, k, \qquad \sum_{k=1}^{K} \pi_k = 1.$$
The law of total probability tells us that
$$f(X) = \sum_{k=1}^{K} f(X| Z=k) P(Z=k) = \sum_{k=1}^{K} \pi_k  N(X | \mu_k, \Sigma_k).$$
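To make the generative story concrete, here is a minimal sampling sketch (assuming NumPy; the two-component parameters are invented purely for illustration): draw $Z$ from the categorical distribution $(\pi_1,\dots,\pi_K)$, then draw $X$ from the Gaussian of the chosen cluster.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-component mixture in R^2 (illustrative values only)
pi = np.array([0.4, 0.6])                        # mixing coefficients pi_k
mus = np.array([[0.0, 0.0], [4.0, 4.0]])         # component means mu_k
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])  # covariance matrices Sigma_k

def sample_gmm(n):
    """Draw n points: Z ~ Categorical(pi), then X | Z = k ~ N(mu_k, Sigma_k)."""
    Z = rng.choice(len(pi), size=n, p=pi)
    X = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in Z])
    return X, Z

X, Z_true = sample_gmm(500)
```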
What we care about here is the probability that observation $n$ belongs to cluster $k$, which can be computed by Bayes' rule:
$$z_{nk}:= P(Z=k|X_n) = \dfrac{f(X_n|Z=k)P(Z=k)}{\sum_{j=1}^{K} f(X_n|Z=j)P(Z=j) } = \dfrac{\pi_k N(X_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K} \pi_j N(X_n|\mu_j,\Sigma_j) }.$$
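This formula translates directly into code. A sketch of this step, reusing the parameter arrays from the sampling example above (SciPy's `multivariate_normal.pdf` evaluates the density $N(X_n|\mu_k,\Sigma_k)$ for all rows of `X` at once):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, mus, Sigmas):
    """z[n, k] = P(Z = k | X_n), computed by Bayes' rule."""
    # Unnormalized posteriors: pi_k * N(X_n | mu_k, Sigma_k)
    z = np.column_stack([
        pi[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
        for k in range(len(pi))
    ])
    return z / z.sum(axis=1, keepdims=True)  # normalize over k for each n
```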
Thus, we need to estimate $\{\pi_j, \mu_j, \Sigma_j\}_{j=1}^K$, and here we will use the maximum likelihood approach. The log-likelihood function based on $N$ observations is
$$ L = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k N(X_n|\mu_k,\Sigma_k).
$$
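This is also the quantity we monitor for convergence later. A sketch of its computation (the inner sum over $k$ is done in log space with SciPy's `logsumexp`, a standard trick to avoid numerical underflow when the densities are tiny):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def log_likelihood(X, pi, mus, Sigmas):
    """L = sum_n log sum_k pi_k N(X_n | mu_k, Sigma_k)."""
    log_terms = np.column_stack([
        np.log(pi[k]) + multivariate_normal.logpdf(X, mean=mus[k], cov=Sigmas[k])
        for k in range(len(pi))
    ])
    return logsumexp(log_terms, axis=1).sum()
```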
Taking the gradient with respect to $\mu_k$ and setting it to zero, we get
$$0 = -\sum_{n=1}^{N}  \dfrac{\pi_k N(X_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K} \pi_j N(X_n|\mu_j,\Sigma_j) } \Sigma_k^{-1} (X_n - \mu_k).
$$
Writing this in terms of $z_{nk}$, multiplying through by $\Sigma_k$, and solving for $\mu_k$, we obtain
$$\mu_k = \dfrac{ \sum_{n=1}^{N} z_{nk} X_n}{\sum_{n=1}^{N} z_{nk}} .$$
Similarly, taking the gradient with respect to $\Sigma_k^{-1}$ and proceeding as in the usual maximum likelihood derivation for a single Gaussian, we get
$$\Sigma_k = \dfrac{\sum_{n=1}^{N} z_{nk}(X_n-\mu_k)^T (X_n-\mu_k)}{\sum_{n=1}^{N} z_{nk}}.$$
Finally, to maximize with respect to $\pi_k$, note the constraint $\sum_k \pi_k = 1$, so we have to use the method of Lagrange multipliers. Taking the gradient with respect to $\pi_k$ of
$$ L + \lambda(\sum_{k=1}^{K}\pi_k - 1),$$
we get
$$\sum_{n=1}^{N} \dfrac{N(X_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K} \pi_j N(X_n|\mu_j,\Sigma_j)} +\lambda = 0, \,\,\, \forall \, k.$$
Multiplying both sides of the $k$-th equation by $\pi_k$ and summing over $k$ (using $\sum_{k} z_{nk} = 1$ for every $n$), we find $\lambda = -N$; substituting back gives $\sum_{n=1}^{N} z_{nk} - \pi_k N = 0$. Hence,
$$\pi_k = \dfrac{\sum_{n=1}^{N} z_{nk}}{N}.$$
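The three update formulas combine into a single vectorized M-step; a sketch assuming NumPy, where `z` is the $N \times K$ responsibility matrix from the E-step (the covariance update forms the weighted outer products $\sum_n z_{nk}(X_n-\mu_k)^T(X_n-\mu_k)$, matching the row-vector convention above):

```python
import numpy as np

def m_step(X, z):
    """Re-estimate (pi, mus, Sigmas) from the responsibility matrix z."""
    N, d = X.shape
    Nk = z.sum(axis=0)                             # effective cluster sizes
    pi = Nk / N                                    # pi_k = (sum_n z_nk) / N
    mus = (z.T @ X) / Nk[:, None]                  # weighted means
    Sigmas = np.empty((len(Nk), d, d))
    for k in range(len(Nk)):
        D = X - mus[k]                             # centered data
        Sigmas[k] = (z[:, k] * D.T) @ D / Nk[k]    # weighted covariance
    return pi, mus, Sigmas
```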
Hence we can calculate $z_{nk}$ if $\{\pi_j, \mu_j, \Sigma_j\}_{j=1}^K$ are known, and vice versa. We can summarize the EM algorithm for the Gaussian Mixture Model as follows:
(1) Choose initial values $\{\pi_j, \mu_j, \Sigma_j\}_{j=1}^K$.
(2) Step E (Expectation): Calculate $$z_{nk} = \dfrac{\pi_k N(X_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K} \pi_j N(X_n|\mu_j,\Sigma_j) }\,\,\,\, \forall \,\, n,k.$$
(3) Step M (Maximization): Calculate the new means, covariance matrices, and coefficients $\pi_k$:
$$\mu_k = \dfrac{ \sum_{n=1}^{N} z_{nk} X_n}{\sum_{n=1}^{N} z_{nk}} .$$
$$\Sigma_k = \dfrac{\sum_{n=1}^{N} z_{nk}(X_n-\mu_k)^T (X_n-\mu_k)}{\sum_{n=1}^{N} z_{nk}}.$$
$$\pi_k = \dfrac{\sum_{n=1}^{N} z_{nk}}{N}.$$
(4) Repeat Step E and Step M until convergence (e.g., until the log-likelihood stops increasing noticeably).
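Putting the sketches above together, a minimal EM loop might look as follows (not a production implementation: it omits covariance regularization and multiple restarts, and simply initializes the means from random data points):

```python
import numpy as np

def fit_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    """Fit a K-component GMM by EM, using responsibilities / m_step /
    log_likelihood from the sketches above."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # (1) Initialization: uniform weights, random means, identity covariances
    pi = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)]
    Sigmas = np.array([np.eye(d) for _ in range(K)])
    prev = -np.inf
    for _ in range(n_iter):
        z = responsibilities(X, pi, mus, Sigmas)   # (2) E-step
        pi, mus, Sigmas = m_step(X, z)             # (3) M-step
        ll = log_likelihood(X, pi, mus, Sigmas)
        if ll - prev < tol:                        # (4) convergence check
            break
        prev = ll
    return pi, mus, Sigmas, responsibilities(X, pi, mus, Sigmas)
```

On the synthetic data sampled earlier, `fit_gmm(X, K=2)` should recover mixing weights close to $(0.4, 0.6)$ and means near the true $\mu_k$, up to a permutation of the cluster labels.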

Remarks