In this post, we derive the maximum likelihood estimator (MLE) for the multivariate normal distribution.
First, recall that the multivariate normal distribution $N(\mu, \Sigma)$, where $\mu \in \mathbb{R}^d$ and $\Sigma \in \text{Mat}(d, \mathbb{R})$ is symmetric and positive definite, has the density function
$$f(x) = \dfrac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\dfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right), \,\,\,\,\, x\in \mathbb{R}^{d}.$$
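As a quick sanity check of this formula, the sketch below (assuming NumPy and SciPy are available; the parameter values are arbitrary illustrations) evaluates the density directly and compares it against `scipy.stats.multivariate_normal`:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary illustrative parameters (d = 2)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])  # symmetric positive definite

def mvn_density(x, mu, Sigma):
    """Density f(x) of N(mu, Sigma), computed directly from the formula above."""
    d = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5
    quad = diff @ np.linalg.solve(Sigma, diff)  # (x-mu)^T Sigma^{-1} (x-mu)
    return np.exp(-0.5 * quad) / norm_const

x = np.array([0.5, -1.0])
manual = mvn_density(x, mu, Sigma)
reference = multivariate_normal(mu, Sigma).pdf(x)
assert np.isclose(manual, reference)
```

Using `np.linalg.solve` rather than explicitly inverting $\Sigma$ is the numerically preferred way to evaluate the quadratic form.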
Let $x_1, x_2, \dots, x_n$ be an i.i.d. sample from $N(\mu,\Sigma)$; the log-likelihood function is
$$L = \dfrac{n}{2} \log |\Sigma^{-1}| - \dfrac{1}{2} \sum_{k=1}^{n} (x_k-\mu)^T \Sigma^{-1} (x_k-\mu) + \text{const}.$$
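The constant here is $-\frac{nd}{2}\log(2\pi)$, which does not depend on the parameters. A sketch (assuming NumPy and SciPy; the sample is simulated for illustration) evaluating the full log-likelihood and checking it against the sum of SciPy's `logpdf` values:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=100)  # n = 100 samples

def log_likelihood(X, mu, Sigma):
    """Full log-likelihood; the 'const' term in the text is -nd/2 * log(2*pi)."""
    n, d = X.shape
    Sinv = np.linalg.inv(Sigma)
    diffs = X - mu
    quad = np.einsum('ki,ij,kj->', diffs, Sinv, diffs)  # sum of quadratic forms
    return (n / 2) * np.log(np.linalg.det(Sinv)) - 0.5 * quad \
           - (n * d / 2) * np.log(2 * np.pi)

L_manual = log_likelihood(X, mu, Sigma)
L_scipy = multivariate_normal(mu, Sigma).logpdf(X).sum()
assert np.isclose(L_manual, L_scipy)
```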
Then, taking the derivative with respect to $\mu$,
$$D_{\mu} L = -\dfrac{1}{2} \sum_{k=1}^{n} D_{\mu}((x_k-\mu)^T \Sigma^{-1} (x_k-\mu)).$$
Using the chain rule and the fact that $\Sigma^{-1}$ is also positive definite, we have $ D_{\mu}((x_k-\mu)^T \Sigma^{-1} (x_k-\mu)) = D_{\mu} \| \Sigma^{-1/2}(x_k-\mu) \|^2 = 2(\mu - x_k)^T \Sigma^{-1}$. Hence
$$D_{\mu} L = \sum_{k=1}^{n} (x_k-\mu)^T \Sigma^{-1} .$$
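This gradient formula can be corroborated numerically with a central finite-difference check on the $\mu$-dependent part of $L$ (a sketch assuming NumPy; the sample and parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.5, 1.5])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
Sinv = np.linalg.inv(Sigma)
X = rng.multivariate_normal(mu, Sigma, size=50)

def L_mu_part(m):
    """The mu-dependent part of the log-likelihood: -1/2 sum of quadratic forms."""
    diffs = X - m
    return -0.5 * np.einsum('ki,ij,kj->', diffs, Sinv, diffs)

# Analytic gradient: sum_k (x_k - mu)^T Sigma^{-1}, each row of (X-mu) @ Sinv
analytic = ((X - mu) @ Sinv).sum(axis=0)

# Central finite differences along each coordinate direction
eps = 1e-6
numeric = np.array([(L_mu_part(mu + eps * e) - L_mu_part(mu - eps * e)) / (2 * eps)
                    for e in np.eye(2)])
assert np.allclose(analytic, numeric, atol=1e-4)
```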
Hence, $D_{\mu} L = 0 \Leftrightarrow \mu = \dfrac{1}{n} \sum_{k=1}^{n} x_k = \overline{x}$. Moreover, the Hessian of $L$ with respect to $\mu$ is $-n\Sigma^{-1}$, which is negative definite. So $\hat{\mu} = \overline{x}$ is the MLE for $\mu$.
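A numerical sketch of this conclusion (assuming NumPy; the parameters are arbitrary): the gradient vanishes at the sample mean, and perturbing away from it strictly decreases the $\mu$-dependent part of $L$.

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.5]])
Sinv = np.linalg.inv(Sigma)
X = rng.multivariate_normal([3.0, -1.0], Sigma, size=200)

mu_hat = X.mean(axis=0)  # the MLE: sample mean x-bar

# Gradient sum_k (x_k - mu)^T Sigma^{-1} vanishes at mu = x-bar
grad_at_mle = ((X - mu_hat) @ Sinv).sum(axis=0)
assert np.allclose(grad_at_mle, 0.0, atol=1e-10)

def quad_term(m):
    """The mu-dependent part of L; maximized at the sample mean."""
    diffs = X - m
    return -0.5 * np.einsum('ki,ij,kj->', diffs, Sinv, diffs)

# Any perturbation away from x-bar strictly decreases this term
assert quad_term(mu_hat) > quad_term(mu_hat + np.array([0.1, -0.2]))
```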
For $\Sigma$, we take the derivative with respect to $\Sigma^{-1}$. Note that for every invertible matrix $A$, $D_A |A| = |A| (A^{-1})^T$ (Jacobi's formula), so $D_A \log|A| = A^{-1}$ when $A$ is symmetric, and
$$ \sum_{k=1}^{n} (x_k-\mu)^T \Sigma^{-1}(x_k-\mu) = \sum_{k=1}^{n}Tr( (x_k-\mu)^T \Sigma^{-1}(x_k-\mu)) = \sum_{k=1}^{n}Tr( \Sigma^{-1}(x_k-\mu)(x_k-\mu)^T)$$
and $\dfrac{\partial}{\partial A} Tr(AB) = B^T$ (which equals $B$ here, since each $(x_k-\mu)(x_k-\mu)^T$ is symmetric). We obtain
$$D_{\Sigma^{-1}} L = \dfrac{n}{2} \Sigma - \dfrac{1}{2} \sum_{k=1}^{n} (x_k-\mu)(x_k-\mu)^T = 0.$$
So, substituting $\hat{\mu} = \overline{x}$, the MLE for $\Sigma$ is
$$\hat{\Sigma} = \dfrac{1}{n} \sum_{k=1}^{n} (x_k-\hat{\mu})(x_k-\hat{\mu})^T = \dfrac{1}{n} \sum_{k=1}^{n} (x_k-\overline{x})(x_k-\overline{x})^T.$$
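The two estimators are easy to compute in practice. A sketch (assuming NumPy; the true parameters are arbitrary) computing $\hat{\Sigma}$ directly and checking it against `np.cov` with `bias=True`, which uses the same $1/n$ normalization as the MLE:

```python
import numpy as np

rng = np.random.default_rng(3)
mu_true = np.array([0.0, 2.0])
Sigma_true = np.array([[1.0, 0.6],
                       [0.6, 2.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=100_000)

x_bar = X.mean(axis=0)                 # MLE for mu
diffs = X - x_bar
Sigma_hat = diffs.T @ diffs / len(X)   # 1/n sum (x_k - x_bar)(x_k - x_bar)^T

# np.cov with bias=True uses the same 1/n normalization as the MLE
assert np.allclose(Sigma_hat, np.cov(X.T, bias=True))

# With many samples, the MLE is close to the true covariance
assert np.allclose(Sigma_hat, Sigma_true, atol=0.05)
```

Note that $\hat{\Sigma}$ is a biased estimator of $\Sigma$; the unbiased sample covariance (the `np.cov` default) divides by $n-1$ instead of $n$.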