Given the data set $x_1, \ldots, x_n \in \mathbb{R}^d$, the parameters $\theta = (\mu, \Sigma, \gamma, \lambda, \chi, \psi)$ are estimated by maximizing the log-likelihood
$$\log L(\theta) = \sum_{i=1}^{n} \log f_X(x_i; \theta).$$
It is hard to obtain the optimal solution of this objective function directly when the data dimension exceeds three. The main idea of the EM algorithm is to optimize the following augmented log-likelihood function instead:
$$\tilde{\ell}(\theta; x_1, \ldots, x_n, w_1, \ldots, w_n) = \sum_{i=1}^{n} \log f_{X, W}(x_i, w_i; \theta),$$
where the latent mixing variables $w_1, \ldots, w_n$ are treated as missing data.
By the mean-variance mixture definition of generalized hyperbolic distributions, the augmented log-likelihood function can be rewritten as
$$\tilde{\ell}(\theta; x_1, \ldots, x_n, w_1, \ldots, w_n) = \sum_{i=1}^{n} \log f_{X \mid W}(x_i \mid w_i; \mu, \Sigma, \gamma) + \sum_{i=1}^{n} \log h_W(w_i; \lambda, \chi, \psi),$$
where $h_W$ denotes the generalized inverse Gaussian density of the mixing variables,
$$h_W(w; \lambda, \chi, \psi) = \frac{(\psi/\chi)^{\lambda/2}}{2 K_\lambda(\sqrt{\chi\psi})}\, w^{\lambda - 1} \exp\!\left(-\frac{1}{2}\left(\frac{\chi}{w} + \psi w\right)\right), \qquad w > 0.$$
The conditional normal density is
$$f_{X \mid W}(x \mid w; \mu, \Sigma, \gamma) = \frac{1}{(2\pi w)^{d/2}\, |\Sigma|^{1/2}} \exp\!\left(-\frac{(x - \mu - w\gamma)^{\top} \Sigma^{-1} (x - \mu - w\gamma)}{2w}\right),$$
with $d$ denoting the data dimension, i.e. $X_i \mid W_i = w_i \sim N_d(\mu + w_i\gamma,\, w_i\Sigma)$.
Thus, the maximization of the augmented likelihood separates into two independent problems:
- $L_1(\mu, \Sigma, \gamma)$ for the location, scale, and skewness parameters;
- $L_2(\lambda, \chi, \psi)$ for the mixing parameters.
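To make the separation concrete, the following is a minimal sketch of the complete-data log-likelihood split into these two parts; the function name, its signature, and the use of SciPy are illustrative assumptions rather than anything prescribed above.

```python
import numpy as np
from scipy.special import kv
from scipy.stats import multivariate_normal

def augmented_loglik(x, w, mu, Sigma, gamma, lam, chi, psi):
    """Complete-data log-likelihood split into L1 (mu, Sigma, gamma) and
    L2 (lam, chi, psi).  Assumes chi > 0 and psi > 0; names are illustrative."""
    n, _ = x.shape
    # L1: conditional normal part, X_i | W_i = w_i ~ N(mu + w_i * gamma, w_i * Sigma)
    L1 = sum(
        multivariate_normal.logpdf(x[i], mean=mu + w[i] * gamma, cov=w[i] * Sigma)
        for i in range(n)
    )
    # L2: GIG(lam, chi, psi) log-density of the mixing variables w_1, ..., w_n
    log_norm = 0.5 * lam * np.log(psi / chi) - np.log(2.0 * kv(lam, np.sqrt(chi * psi)))
    L2 = np.sum(log_norm + (lam - 1.0) * np.log(w) - 0.5 * (chi / w + psi * w))
    return L1, L2
```

Only the first term depends on $(\mu, \Sigma, \gamma)$ and only the second on $(\lambda, \chi, \psi)$, which is exactly what allows the two maximizations to be carried out separately.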
From the factorization above, $L_1$ and $L_2$ can be maximized separately. Maximization of $L_1$ with respect to $\mu$, $\gamma$, and $\Sigma$ amounts to setting the corresponding partial derivatives to zero. This reduces to solving for $\mu$ and $\gamma$ from a pair of linear equations (given the latent weights), from which closed-form updates for $\mu$, $\gamma$, and $\Sigma$ follow; maximization of $L_2$ with respect to $\lambda$, $\chi$, and $\psi$ has no closed form and leads to root-solving equations.
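For concreteness, the closed-form maximizers take the following form in the standard NMVM treatment, written in terms of the E-step weights $\delta_i = E[W_i^{-1} \mid x_i]$ and $\eta_i = E[W_i \mid x_i]$ (this notation is assumed here; the conditional expectations themselves are introduced below):
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{\delta} = \frac{1}{n}\sum_{i=1}^{n} \delta_i, \qquad \bar{\eta} = \frac{1}{n}\sum_{i=1}^{n} \eta_i,$$
$$\gamma = \frac{n^{-1}\sum_{i=1}^{n} \delta_i (\bar{x} - x_i)}{\bar{\delta}\,\bar{\eta} - 1}, \qquad \mu = \frac{n^{-1}\sum_{i=1}^{n} \delta_i x_i - \gamma}{\bar{\delta}}, \qquad \Sigma = \frac{1}{n}\sum_{i=1}^{n} \delta_i (x_i - \mu)(x_i - \mu)^{\top} - \bar{\eta}\,\gamma\gamma^{\top}.$$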
Several special cases are of particular interest:
- Normal Inverse Gaussian (NIG): $\lambda = -0.5$.
- Variance Gamma (VG): $\chi = 0$, $\lambda > 0$.
- Skew-t: $\lambda = -\nu/2$, $\chi = \nu$.
Each case yields simplifications in the root-solving equations.
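A small helper along these lines can encode the constraints; the function and its return convention are hypothetical, and the value $\psi = 0$ in the skew-t case is the usual limiting convention rather than something stated above.

```python
def gig_constraints(family, nu=None, lam=None):
    """Constrained GIG mixing parameters (lambda, chi, psi) for common special
    cases; None marks a component that stays free and is estimated in the M-step.
    This helper and its conventions are illustrative."""
    if family == "NIG":      # Normal Inverse Gaussian: lambda fixed at -1/2
        return (-0.5, None, None)
    if family == "VG":       # Variance Gamma: chi = 0, lambda > 0 remains free
        return (lam, 0.0, None)
    if family == "skew-t":   # Skew-t: lambda = -nu/2, chi = nu (psi -> 0 is the usual limit)
        return (-nu / 2.0, float(nu), 0.0)
    raise ValueError(f"unknown family: {family}")
```

For example, `gig_constraints("NIG")` returns `(-0.5, None, None)`, so only $\chi$ and $\psi$ remain to be estimated.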
Introduce the conditional expectations
$$\delta_i = E\!\left[W_i^{-1} \mid x_i\right], \qquad \eta_i = E\!\left[W_i \mid x_i\right], \qquad \xi_i = E\!\left[\log W_i \mid x_i\right],$$
evaluated at the current parameter estimates.
These expectations are expressed in terms of modified Bessel functions $K_\lambda$, because the conditional distribution of $W_i$ given $X_i = x_i$ is again a generalized inverse Gaussian distribution.
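A sketch of how these conditional expectations can be computed with SciPy's Bessel function `kv`, assuming the standard result that $W_i \mid X_i = x_i$ is again generalized inverse Gaussian with shifted parameters; the helper names and the finite-difference step for $E[\log W_i]$ are illustrative choices.

```python
import numpy as np
from scipy.special import kv

def gig_moments(lam, chi, psi, eps=1e-5):
    """E[W], E[1/W], E[log W] for W ~ GIG(lam, chi, psi) via ratios of modified
    Bessel functions.  Assumes chi > 0 and psi > 0; limiting cases (e.g. VG,
    skew-t) need separate handling.  The derivative in the Bessel order for
    E[log W] is taken by a central finite difference."""
    s = np.sqrt(chi * psi)
    r = np.sqrt(chi / psi)
    eta = r * kv(lam + 1.0, s) / kv(lam, s)               # E[W]
    delta = kv(lam - 1.0, s) / (r * kv(lam, s))           # E[1/W]
    dlogK = (np.log(kv(lam + eps, s)) - np.log(kv(lam - eps, s))) / (2.0 * eps)
    xi = np.log(r) + dlogK                                # E[log W]
    return eta, delta, xi

def conditional_weights(x, mu, Sigma, gamma, lam, chi, psi):
    """Conditional expectations of W_i, 1/W_i, log W_i given x_i, using the
    standard result that W_i | X_i = x_i is GIG with shifted parameters."""
    d = x.shape[1]
    Sinv = np.linalg.inv(Sigma)
    rho = np.einsum("ij,jk,ik->i", x - mu, Sinv, x - mu)  # Mahalanobis distances
    lam_post = lam - d / 2.0
    psi_post = psi + gamma @ Sinv @ gamma
    return np.array([gig_moments(lam_post, chi + r, psi_post) for r in rho])
```

The returned array stacks $(\eta_i, \delta_i, \xi_i)$ row by row, which is the form needed for the M-step updates.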
Update rules at iteration $k$ are obtained by substituting these conditional expectations, evaluated at the current parameter values, into the maximizers of $L_1$ and $L_2$.
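A corresponding sketch of one M-step, assuming the standard closed-form maximizers for $(\mu, \gamma, \Sigma)$ shown earlier and a direct numerical maximization of the expected $L_2$ for $(\lambda, \chi, \psi)$; in practice the latter is often handled by dedicated root-finding, and all names here are illustrative.

```python
import numpy as np
from scipy.special import kv
from scipy.optimize import minimize

def m_step(x, delta, eta, xi, lam0, chi0, psi0):
    """One M-step of the NMVM EM algorithm (a sketch): closed-form updates for
    (mu, gamma, Sigma) from the expected L1, and a numerical maximisation of the
    expected L2 for the GIG parameters.  All names are illustrative."""
    n, _ = x.shape
    dbar, ebar = delta.mean(), eta.mean()
    xbar = x.mean(axis=0)
    # Closed-form maximisers of the expected L1 (standard NMVM formulas)
    gamma = (delta[:, None] * (xbar - x)).mean(axis=0) / (dbar * ebar - 1.0)
    mu = ((delta[:, None] * x).mean(axis=0) - gamma) / dbar
    xc = x - mu
    Sigma = (delta[:, None, None] * np.einsum("ij,ik->ijk", xc, xc)).mean(axis=0) \
        - ebar * np.outer(gamma, gamma)
    # Expected L2 as a function of the mixing parameters, maximised numerically
    def neg_q2(p):
        lam, chi, psi = p
        if chi <= 0.0 or psi <= 0.0:
            return np.inf
        s = np.sqrt(chi * psi)
        log_norm = 0.5 * lam * np.log(psi / chi) - np.log(2.0 * kv(lam, s))
        return -(np.sum((lam - 1.0) * xi - 0.5 * chi * delta - 0.5 * psi * eta) + n * log_norm)
    res = minimize(neg_q2, x0=np.array([lam0, chi0, psi0]), method="Nelder-Mead")
    lam, chi, psi = res.x
    return mu, Sigma, gamma, lam, chi, psi
```

In the special cases listed above, the corresponding components of $(\lambda, \chi, \psi)$ would simply be held fixed inside `neg_q2`.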
To address the identification problem, a normalization is imposed: rescaling the mixing variable $W$ by a constant while rescaling $\Sigma$ and $\gamma$ in the opposite direction leaves the distribution of $X$ unchanged, so the overall scale must be fixed by convention.
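The exact normalization is not reproduced above; as one common convention (an assumption, not a prescription of the text), the following sketch rescales $\Sigma$ to unit determinant and compensates in $\gamma$, $\chi$, and $\psi$ so that the fitted distribution is unchanged.

```python
import numpy as np

def normalize_scale(Sigma, gamma, chi, psi):
    """Resolve the scale non-identifiability by fixing det(Sigma) = 1 and
    compensating in gamma and the GIG parameters; this particular convention
    is an assumption, and others (e.g. E[W] = 1) appear in the literature."""
    d = Sigma.shape[0]
    c = np.linalg.det(Sigma) ** (1.0 / d)
    return Sigma / c, gamma / c, chi * c, psi / c
```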
- Iterative "online" updating of $\mu, \gamma, \Sigma$ accelerates convergence.
- This EM algorithm applies to GH, NIG, VG, and skew-t as special cases of the NMVM framework.