In this post, we will derive the Maximum Likelihood Estimator (MLE) for the multivariate normal distribution.
Firstly, recall that the multivariate normal distribution N(\mu, \Sigma), where \mu \in \mathbb{R}^d and \Sigma \in \text{Mat}(d, \mathbb{R}) is symmetric and positive definite, has the density function
f(x) = \dfrac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\dfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right), \,\,\,\,\, x\in \mathbb{R}^{d}.
Let x_1, x_2, \dots, x_n be generated independently from N(\mu, \Sigma); the log-likelihood function is
L = \dfrac{n}{2} \log |\Sigma^{-1}| - \dfrac{1}{2} \sum_{k=1}^{n} (x_k-\mu)^T \Sigma^{-1} (x_k-\mu) + \text{const}.
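As a quick sanity check, we can evaluate this log-likelihood numerically (a minimal NumPy sketch; the variable names and the chosen \mu, \Sigma are ours) and confirm that, with the constant made explicit, it agrees with summing the log of the density f over the sample:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 500
mu = np.array([1.0, -2.0, 0.5])
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)          # symmetric positive definite
X = rng.multivariate_normal(mu, Sigma, size=n)

Sigma_inv = np.linalg.inv(Sigma)

def log_density(x):
    # log f(x) from the density formula above
    diff = x - mu
    return (-0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * diff @ Sigma_inv @ diff)

# Compact log-likelihood, with const = -(nd/2) log(2*pi) written out
diffs = X - mu
quad = np.einsum('ki,ij,kj->', diffs, Sigma_inv, diffs)
L = (0.5 * n * np.log(np.linalg.det(Sigma_inv)) - 0.5 * quad
     - 0.5 * n * d * np.log(2 * np.pi))

assert np.isclose(L, sum(log_density(x) for x in X))
```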
Then, taking the derivative with respect to \mu,
D_{\mu} L = -\dfrac{1}{2} \sum_{k=1}^{n} D_{\mu}((x_k-\mu)^T \Sigma^{-1} (x_k-\mu)).
Using the chain rule and the fact that \Sigma^{-1} is also positive definite, we have D_{\mu}((x_k-\mu)^T \Sigma^{-1} (x_k-\mu)) = D_{\mu} \| \Sigma^{-1/2}(x_k-\mu) \|^2 = 2(\mu - x_k)^T \Sigma^{-1}. Hence
D_{\mu} L = \sum_{k=1}^{n} (x_k-\mu)^T \Sigma^{-1} .
Hence, D_{\mu} L = 0 \Leftrightarrow \mu = \dfrac{1}{n} \sum_{k=1}^{n} x_k = \overline{x}. Moreover, the Hessian matrix of L with respect to \mu is -n\Sigma^{-1}, which is negative definite. So \hat{\mu} = \overline{x} is the MLE for \mu.
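Since the log-likelihood is strictly concave in \mu, any perturbation away from the sample mean strictly decreases it; we can observe this numerically (a sketch with illustrative names and a fixed, known \Sigma of our choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 2, 200
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])   # fixed, known covariance
X = rng.multivariate_normal([0.5, -1.0], Sigma, size=n)
Sigma_inv = np.linalg.inv(Sigma)

def loglik_mu(mu):
    # log-likelihood as a function of mu (terms constant in mu dropped)
    diffs = X - mu
    return -0.5 * np.einsum('ki,ij,kj->', diffs, Sigma_inv, diffs)

mu_hat = X.mean(axis=0)   # the claimed MLE: the sample mean

# Every random perturbation of mu_hat lowers the log-likelihood
for _ in range(100):
    mu_other = mu_hat + 0.1 * rng.standard_normal(d)
    assert loglik_mu(mu_other) < loglik_mu(mu_hat)
```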
For \Sigma, we take the derivative with respect to \Sigma^{-1} instead. Note that for any symmetric invertible matrix A, D_A |A| = |A| A^{-1} (in general it is |A| (A^{-1})^T), that
\sum_{k=1}^{n} (x_k-\mu)^T \Sigma^{-1}(x_k-\mu) = \sum_{k=1}^{n}\operatorname{Tr}( (x_k-\mu)^T \Sigma^{-1}(x_k-\mu)) = \sum_{k=1}^{n}\operatorname{Tr}( \Sigma^{-1}(x_k-\mu)(x_k-\mu)^T),
and that \dfrac{\partial}{\partial A} \operatorname{Tr}(AB) = B^T, which equals B here since each (x_k-\mu)(x_k-\mu)^T is symmetric. Setting the derivative to zero,
D_{\Sigma^{-1}} L = \dfrac{n}{2} \Sigma - \dfrac{1}{2} \sum_{k=1}^{n} (x_k-\mu)(x_k-\mu)^T = 0.
So, substituting \hat{\mu} = \overline{x} for \mu, the MLE for \Sigma is
\hat{\Sigma}= \dfrac{1}{n} \sum_{k=1}^{n} (x_k-\hat{\mu})(x_k-\hat{\mu})^T = \dfrac{1}{n} \sum_{k=1}^{n} (x_k-\overline{x})(x_k-\overline{x})^T.
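Putting both estimators together, we can check numerically that the pair (\hat{\mu}, \hat{\Sigma}) really does maximize the full log-likelihood: it beats nearby symmetric perturbations of \hat{\Sigma} and also beats the true parameters that generated the data (a NumPy sketch; the test data and names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 1000
mu_true = np.array([1.0, 0.0, -1.0])
A = rng.standard_normal((d, d))
Sigma_true = A @ A.T + d * np.eye(d)     # symmetric positive definite
X = rng.multivariate_normal(mu_true, Sigma_true, size=n)

def loglik(mu, Sigma):
    # Full multivariate normal log-likelihood of the sample X
    Sigma_inv = np.linalg.inv(Sigma)
    diffs = X - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('ki,ij,kj->', diffs, Sigma_inv, diffs)
    return -0.5 * n * logdet - 0.5 * quad - 0.5 * n * d * np.log(2 * np.pi)

# Closed-form MLEs from the derivation above
mu_hat = X.mean(axis=0)
diffs = X - mu_hat
Sigma_hat = diffs.T @ diffs / n

best = loglik(mu_hat, Sigma_hat)
assert best >= loglik(mu_true, Sigma_true)

# Small symmetric perturbations of Sigma_hat never improve the likelihood
for _ in range(50):
    B = 0.05 * rng.standard_normal((d, d))
    perturb = (B + B.T) / 2              # keep the candidate symmetric
    assert loglik(mu_hat, Sigma_hat + perturb) <= best
```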