
Gaussian Mixture Model for Clustering (EM Approach)

The k-means algorithm has the disadvantage that it assigns each observation to a single, hard cluster. What happens, for example, if we have to group an article X about the vitamins in some kind of food: which cluster should we assign it to, science or food? That is when we need to give our observations some probabilistic structure, for instance P(X \in \text{science cluster}) = P(X \in \text{food cluster}) = 50\%.

From that motivation, we assume that our data is generated from a mixture of K normal distributions (each cluster is normally distributed), the k-th cluster having the density of a multivariate normal distribution with mean \mu_k and covariance matrix \Sigma_k, denoted N(X|\mu_k, \Sigma_k). Let Z be the random variable indicating the cluster that an observation belongs to. From this, we have
f(X|Z = k) = N(X | \mu_k, \Sigma_k),
and
p(Z = k) = \pi_k \quad \forall\, k, \qquad \sum_{k=1}^{K} \pi_k = 1.
The law of total probability tells us that
f(X) = \sum_{k=1}^{K} f(X| Z=k) P(Z=k) = \sum_{k=1}^{K} \pi_k  N(X | \mu_k, \Sigma_k).
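To make the generative story concrete, here is a minimal NumPy/SciPy sketch that samples from such a mixture and evaluates f(X); all parameter values are made up for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Illustrative 2-component mixture in 2D (all parameter values are made up).
pis = np.array([0.4, 0.6])                       # mixing weights, sum to 1
mus = np.array([[0.0, 0.0], [3.0, 3.0]])         # component means
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])  # component covariances

# Generative process: draw Z ~ Categorical(pi), then X | Z = k ~ N(mu_k, Sigma_k).
Z = rng.choice(2, size=500, p=pis)
X = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in Z])

# Mixture density f(x) = sum_k pi_k N(x | mu_k, Sigma_k).
def mixture_pdf(x):
    return sum(p * multivariate_normal(m, S).pdf(x)
               for p, m, S in zip(pis, mus, Sigmas))
```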
What we care about here is the probability that observation n belongs to cluster k, which can be computed by Bayes' rule:
z_{nk}:= P(Z=k|X_n) = \dfrac{f(X_n|Z=k)P(Z=k)}{\sum_{j=1}^{K} f(X_n|Z=j)P(Z=j) }
= \dfrac{\pi_k N(X_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K} \pi_j N(X_n|\mu_j,\Sigma_j) }.
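In code, computing these responsibilities amounts to normalizing the weighted component densities row by row; a minimal sketch (the function name responsibilities and the (N, K) array layout are my own choices, not from the post):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pis, mus, Sigmas):
    """z[n, k] = P(Z = k | X_n) via Bayes' rule.

    X: (N, d) data; pis: (K,) weights; mus: (K, d) means; Sigmas: (K, d, d).
    """
    K = len(pis)
    # Numerator pi_k * N(X_n | mu_k, Sigma_k) for every n and k.
    num = np.column_stack([
        pis[k] * multivariate_normal(mus[k], Sigmas[k]).pdf(X)
        for k in range(K)
    ])                                            # shape (N, K)
    return num / num.sum(axis=1, keepdims=True)   # normalize each row
```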
Thus, we need to find \{\pi_j, \mu_j, \Sigma_j\}_{j=1}^K, and here we will use the maximum likelihood approach. The log-likelihood function based on N observations is
L = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k N(X_n|\mu_k,\Sigma_k).
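Evaluating L naively can underflow when the densities are tiny, so in practice the inner sum is usually computed in log space; a minimal sketch using SciPy's logsumexp (the helper name log_likelihood is my own):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def log_likelihood(X, pis, mus, Sigmas):
    """L = sum_n log sum_k pi_k N(X_n | mu_k, Sigma_k), computed stably."""
    # log_probs[n, k] = log pi_k + log N(X_n | mu_k, Sigma_k)
    log_probs = np.column_stack([
        np.log(pis[k]) + multivariate_normal(mus[k], Sigmas[k]).logpdf(X)
        for k in range(len(pis))
    ])
    return logsumexp(log_probs, axis=1).sum()
```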
Taking the gradient with respect to \mu_k and setting it to zero, we get
0 = -\sum_{n=1}^{N}  \dfrac{\pi_k N(X_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K} \pi_j N(X_n|\mu_j,\Sigma_j) } \Sigma_k^{-1} (X_n - \mu_k).
Writing this in the notation of z_{nk} and solving for \mu_k,
\mu_k = \dfrac{ \sum_{n=1}^{N} z_{nk} X_n}{\sum_{n=1}^{N} z_{nk}} .
Similarly, taking the gradient with respect to \Sigma_k^{-1} and proceeding as in the previous post (here each X_n is treated as a row vector), we obtain
\Sigma_k = \dfrac{\sum_{n=1}^{N} z_{nk}(X_n-\mu_k)^T (X_n-\mu_k)}{\sum_{n=1}^{N} z_{nk}}.
Finally, to maximize with respect to \pi_k, note that \sum_k \pi_k = 1, so we have to use the Lagrange multiplier method. Taking the gradient with respect to \pi_k of
L + \lambda(\sum_{k=1}^{K}\pi_k - 1),
we get
\sum_{n=1}^{N} \dfrac{N(X_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K} \pi_j N(X_n|\mu_j,\Sigma_j)} +\lambda = 0, \,\,\, \forall \, k.
Multiplying both sides of the k-th equation by \pi_k and then summing over k, we find that \lambda = -N, and therefore \sum_{n=1}^{N} z_{nk} - \pi_k N = 0. Hence,
\pi_k = \dfrac{\sum_{n=1}^{N} z_{nk}}{N}.
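These three closed-form updates are just weighted averages, so they vectorize nicely; a minimal NumPy sketch (the helper name m_step and the (N, K) layout of z are my own conventions):

```python
import numpy as np

def m_step(X, z):
    """Re-estimate parameters from responsibilities z of shape (N, K)."""
    N = X.shape[0]
    Nk = z.sum(axis=0)                    # effective counts per cluster, shape (K,)
    mus = (z.T @ X) / Nk[:, None]         # weighted means, shape (K, d)
    Sigmas = []
    for k in range(z.shape[1]):
        d = X - mus[k]                    # deviations from mean k, shape (N, d)
        Sigmas.append((z[:, k, None] * d).T @ d / Nk[k])  # weighted covariance
    pis = Nk / N                          # mixing coefficients
    return pis, mus, np.array(Sigmas)
```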
Hence we can calculate z_{nk} when \{\pi_j, \mu_j, \Sigma_j\}_{j=1}^K are known, and vice versa. We can summarize the EM algorithm for the Gaussian Mixture Model as follows:
(1) Choose initial values \{\pi_j, \mu_j, \Sigma_j\}_{j=1}^K.
(2) E-step (Expectation): calculate z_{nk} = \dfrac{\pi_k N(X_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K} \pi_j N(X_n|\mu_j,\Sigma_j) } \quad \forall\, n, k.
(3) M-step (Maximization): calculate the new means, covariance matrices, and mixing coefficients \pi_k:
\mu_k = \dfrac{ \sum_{n=1}^{N} z_{nk} X_n}{\sum_{n=1}^{N} z_{nk}} .
\Sigma_k = \dfrac{\sum_{n=1}^{N} z_{nk}(X_n-\mu_k)^T (X_n-\mu_k)}{\sum_{n=1}^{N} z_{nk}}.
\pi_k = \dfrac{\sum_{n=1}^{N} z_{nk}}{N}.
(4) Repeat the E-step and the M-step until convergence (e.g., until the log-likelihood stops increasing).
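Putting the pieces together, a minimal EM loop reusing the hypothetical responsibilities, m_step, and log_likelihood helpers sketched above might look like:

```python
import numpy as np

def fit_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    """Fit a K-component GMM to X of shape (N, d) by EM (minimal sketch)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # (1) Initialize: uniform weights, K random data points as means, identity covariances.
    pis = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)]
    Sigmas = np.array([np.eye(d)] * K)

    prev = -np.inf
    for _ in range(n_iter):
        z = responsibilities(X, pis, mus, Sigmas)   # (2) E-step
        pis, mus, Sigmas = m_step(X, z)             # (3) M-step
        ll = log_likelihood(X, pis, mus, Sigmas)    # (4) convergence check
        if ll - prev < tol:
            break
        prev = ll
    return pis, mus, Sigmas, z

# Usage on the synthetic data X generated earlier:
# pis_hat, mus_hat, Sigmas_hat, z = fit_gmm(X, K=2)
```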

Remarks