
Maximum Likelihood for Multivariate Normal Distribution

In this post, we derive the Maximum Likelihood Estimator (MLE) for the multivariate normal distribution.

Firstly, recall that the multivariate normal distribution $N(\mu, \Sigma)$, where $\mu \in \mathbb{R}^d$ and $\Sigma \in \text{Mat}(d, \mathbb{R})$ is symmetric and positive definite, has the density function
$$f(x) = \dfrac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\dfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right), \,\,\,\,\, x\in \mathbb{R}^{d}.$$
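The density above can be evaluated directly with NumPy. This is a minimal sketch (the function name `mvn_density` is ours, not a library API); using `np.linalg.solve` avoids forming $\Sigma^{-1}$ explicitly:

```python
import numpy as np

def mvn_density(x, mu, Sigma):
    """Density of N(mu, Sigma) at x; x, mu are 1-D arrays, Sigma is (d, d)."""
    d = mu.shape[0]
    diff = x - mu
    # quadratic form (x-mu)^T Sigma^{-1} (x-mu), via a linear solve
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm_const = (2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5
    return np.exp(-0.5 * quad) / norm_const
```

For $d = 1$, $\mu = 0$, $\Sigma = 1$ this reduces to the standard normal density, so `mvn_density` at $x = 0$ returns $1/\sqrt{2\pi}$.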

Let $x_1, x_2, \dots, x_n$ be i.i.d. samples drawn from $N(\mu,\Sigma)$. The log-likelihood function is
$$L = \dfrac{n}{2} \log |\Sigma^{-1}| -\dfrac{1}{2} \sum_{k=1}^{n} (x_k-\mu)^T \Sigma^{-1} (x_k-\mu) + \text{const}.$$
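In code, the log-likelihood (including the constant $-\frac{nd}{2}\log 2\pi$, and using $\frac{n}{2}\log|\Sigma^{-1}| = -\frac{n}{2}\log|\Sigma|$) can be sketched as follows; `log_likelihood` is a name we introduce here:

```python
import numpy as np

def log_likelihood(X, mu, Sigma):
    """Log-likelihood of N(mu, Sigma) for the rows of the (n, d) matrix X."""
    n, d = X.shape
    diffs = X - mu
    # sum_k (x_k - mu)^T Sigma^{-1} (x_k - mu)
    quad = np.einsum('ij,jk,ik->', diffs, np.linalg.inv(Sigma), diffs)
    _, logdet = np.linalg.slogdet(Sigma)   # log |Sigma|, numerically stable
    return -0.5 * n * logdet - 0.5 * quad - 0.5 * n * d * np.log(2 * np.pi)
```

As a sanity check, a single observation $x_1 = 0$ under the standard normal ($d = 1$) gives $L = -\frac{1}{2}\log 2\pi$.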
First, take the derivative with respect to $\mu$:
$$D_{\mu} L =  -\dfrac{1}{2} \sum_{k=1}^{n} D_{\mu}((x_k-\mu)^T \Sigma^{-1} (x_k-\mu)).$$
Using the chain rule and the fact that $\Sigma^{-1}$ is also positive definite (so it has a symmetric square root $\Sigma^{-1/2}$), we have $ D_{\mu}((x_k-\mu)^T \Sigma^{-1} (x_k-\mu)) = D_{\mu} \| \Sigma^{-1/2}(x_k-\mu) \|^2 = 2(\mu - x_k)^T \Sigma^{-1}$. Hence
$$D_{\mu} L =  \sum_{k=1}^{n} (x_k-\mu)^T \Sigma^{-1} .$$
Hence, $D_{\mu} L = 0 \Leftrightarrow \mu = \dfrac{1}{n} \sum_{k=1}^{n} x_k = \overline{x}$. Moreover, the Hessian of $L$ with respect to $\mu$ is $-n\Sigma^{-1}$, which is negative definite. So $\hat{\mu}= \overline{x}$ is the MLE for $\mu$.
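We can check numerically that the sample mean maximizes the $\mu$-dependent part of $L$. This is a toy sketch (sample size, seed, and the assumption that $\Sigma$ is known are all arbitrary choices made here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))   # toy sample with true mean 0
Sigma = np.eye(3)               # treat Sigma as known for this check

def mu_part(mu):
    """The mu-dependent part of L: -1/2 sum_k (x_k-mu)^T Sigma^{-1} (x_k-mu)."""
    diffs = X - mu
    return -0.5 * np.einsum('ij,jk,ik->', diffs, np.linalg.inv(Sigma), diffs)

mu_hat = X.mean(axis=0)         # the candidate MLE derived above
```

Any perturbation of `mu_hat` strictly lowers `mu_part`, since $L$ is strictly concave in $\mu$ with its unique maximum at $\overline{x}$.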

For $\Sigma$, it is easier to differentiate with respect to $\Sigma^{-1}$. Note that for every invertible symmetric matrix $A$, $D_A |A| = |A| A^{-1}$, so $D_A \log |A| = A^{-1}$. Since a scalar equals its own trace and $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$,
$$ \sum_{k=1}^{n} (x_k-\mu)^T \Sigma^{-1}(x_k-\mu) = \sum_{k=1}^{n} \operatorname{Tr}\big( (x_k-\mu)^T \Sigma^{-1}(x_k-\mu)\big) = \sum_{k=1}^{n} \operatorname{Tr}\big( \Sigma^{-1}(x_k-\mu)(x_k-\mu)^T\big).$$
Combined with $\dfrac{\partial}{\partial A} \operatorname{Tr}(AB) = B^T$, which equals $B$ here because $(x_k-\mu)(x_k-\mu)^T$ is symmetric, we get
$$D_{\Sigma^{-1}} L = \dfrac{n}{2} \Sigma - \dfrac{1}{2} \sum_{k=1}^{n} (x_k-\mu)(x_k-\mu)^T=0.$$
Substituting $\hat{\mu} = \overline{x}$ for $\mu$, the MLE for $\Sigma$ is
$$\hat{\Sigma}= \dfrac{1}{n}  \sum_{k=1}^{n} (x_k-\hat{\mu})(x_k-\hat{\mu})^T = \dfrac{1}{n}  \sum_{k=1}^{n} (x_k-\overline{x})(x_k-\overline{x})^T.$$
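$\hat{\Sigma}$ is the biased (divide-by-$n$) sample covariance, so it should agree with NumPy's covariance routine when `bias=True`. A quick sketch with arbitrary toy data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))   # toy sample, n = 200, d = 2

x_bar = X.mean(axis=0)
diffs = X - x_bar
# (1/n) sum_k (x_k - x_bar)(x_k - x_bar)^T, as one matrix product
Sigma_hat = diffs.T @ diffs / len(X)
```

Note that `np.cov` divides by $n-1$ by default; `bias=True` switches it to the MLE's divide-by-$n$ normalization.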

