Maximum Likelihood for Multivariate Normal Distribution

In this post, we derive the Maximum Likelihood Estimator (MLE) for the multivariate normal distribution.

Firstly, recall that the multivariate normal distribution N(\mu, \Sigma), where \mu \in \mathbb{R}^d and \Sigma \in \text{Mat}(d, \mathbb{R}) is symmetric and positive definite, has the density function
f(x) = \dfrac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\dfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right), \,\,\,\,\, x\in \mathbb{R}^{d}.
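As a sanity check on the formula, here is a minimal numpy sketch (the helper names `mvn_pdf` and `norm1d` are mine, not from the post): for a diagonal \Sigma the density must factor into a product of univariate normal densities.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at x, computed straight from the formula above."""
    d = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5
    quad = diff @ np.linalg.solve(Sigma, diff)  # (x-mu)^T Sigma^{-1} (x-mu)
    return np.exp(-0.5 * quad) / norm_const

def norm1d(t, m, s2):
    """Univariate normal density N(m, s2) at t."""
    return np.exp(-(t - m) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

# For diagonal Sigma, the multivariate density factorizes coordinate-wise.
mu = np.array([1.0, -2.0])
Sigma = np.diag([4.0, 9.0])
x = np.array([0.5, 0.0])

assert np.isclose(mvn_pdf(x, mu, Sigma),
                  norm1d(x[0], mu[0], 4.0) * norm1d(x[1], mu[1], 9.0))
```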


Let x_1,x_2,\dots, x_n be an i.i.d. sample drawn from N(\mu,\Sigma); the log-likelihood function is
L = \dfrac{n}{2} \log |\Sigma^{-1}| -\dfrac{1}{2} \sum_{k=1}^{n} (x_k-\mu)^T \Sigma^{-1} (x_k-\mu)+\text{const}.

Taking the derivative with respect to \mu gives
D_{\mu} L =  -\dfrac{1}{2} \sum_{k=1}^{n} D_{\mu}((x_k-\mu)^T \Sigma^{-1} (x_k-\mu)).

Using the chain rule and the fact that \Sigma^{-1} is also positive definite, we have D_{\mu}((x_k-\mu)^T \Sigma^{-1} (x_k-\mu)) = D_{\mu} \| \Sigma^{-1/2}(x_k-\mu) \|^2 = 2(\mu - x_k)^T \Sigma^{-1}. Hence
D_{\mu} L =  \sum_{k=1}^{n} (x_k-\mu)^T \Sigma^{-1} .

Hence, D_{\mu} L = 0 \Leftrightarrow \mu = \dfrac{1}{n} \sum_{k=1}^{n} x_k = \overline{x}. Moreover, the Hessian matrix of L with respect to \mu is -n\Sigma^{-1}, which is negative definite. So \hat{\mu}= \overline{x} is the MLE for \mu.
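We can illustrate this numerically: for a fixed \Sigma, the log-likelihood evaluated at the sample mean should dominate its value at any other candidate for \mu. A minimal numpy sketch (the function `log_lik` and the simulation parameters are mine, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_lik(mu, Sigma, X):
    """Log-likelihood of N(mu, Sigma) for the rows of X, up to the additive constant."""
    n = X.shape[0]
    diffs = X - mu
    # sum_k (x_k - mu)^T Sigma^{-1} (x_k - mu)
    quad = np.einsum('ni,ij,nj->', diffs, np.linalg.inv(Sigma), diffs)
    return 0.5 * n * np.log(np.linalg.det(np.linalg.inv(Sigma))) - 0.5 * quad

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal([1.0, -1.0], Sigma, size=200)
mu_hat = X.mean(axis=0)  # the MLE \bar{x}

# The sample mean beats every perturbed candidate (L is strictly concave in mu).
for _ in range(20):
    other = mu_hat + rng.normal(scale=0.5, size=2)
    assert log_lik(mu_hat, Sigma, X) >= log_lik(other, Sigma, X)
```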

For \Sigma, take the derivative with respect to \Sigma^{-1}. Noting that for every invertible matrix A, D_A |A| = |A| (A^{-1})^T, and
\sum_{k=1}^{n} (x_k-\mu)^T \Sigma^{-1}(x_k-\mu) =   \sum_{k=1}^{n}Tr( (x_k-\mu)^T \Sigma^{-1}(x_k-\mu)) =   \sum_{k=1}^{n}Tr( \Sigma^{-1}(x_k-\mu)(x_k-\mu)^T)

and \dfrac{\partial}{\partial A} Tr(AB) = B^T (here B = (x_k-\mu)(x_k-\mu)^T is symmetric, so B^T = B), we have
D_{\Sigma^{-1}} L = \dfrac{n}{2} \Sigma - \dfrac{1}{2} \sum_{k=1}^{n} (x_k-\mu)(x_k-\mu)^T=0.

So, substituting the MLE \hat{\mu} = \overline{x} for \mu, the MLE for \Sigma is
\hat{\Sigma}= \dfrac{1}{n}  \sum_{k=1}^{n} (x_k-\overline{x})(x_k-\overline{x})^T.
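Note that \hat{\Sigma} divides by n, not n-1, so it is the biased sample covariance. A quick numpy check (simulation parameters are mine) confirms it agrees with `np.cov` when the biased normalization is requested:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.0], [1.0, 2.0]], size=500)

xbar = X.mean(axis=0)
# The MLE: average of outer products of centered samples (divides by n, not n-1).
Sigma_hat = (X - xbar).T @ (X - xbar) / X.shape[0]

# Agrees with numpy's biased sample covariance.
assert np.allclose(Sigma_hat, np.cov(X.T, bias=True))
```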



Remarks