EM: an information geometric view (Amari, Neural Computation 7, 1994)
M: Model-Manifold: all distributions realizable by the architecture.
D: Data-Manifold: all distributions consistent with the observed data; its degrees of freedom are due to the hidden states.
Model and data distributions lie in a dually flat space.
Iterate between the e-step and the m-step:
- e-projection: project from the current distribution in the model-manifold onto the data-manifold.
- m-projection: project from the current distribution in the data-manifold onto the model-manifold.
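Written out as formulas, the two projections look as follows. This is a sketch in notation that is not in the notes above: q denotes a point in D, p a point in M, t the iteration index, x the observed and z the hidden variables (all of these symbols are assumptions introduced here).

```latex
\begin{align*}
\mathrm{KL}(q \,\|\, p) &= \sum_{x,z} q(x,z) \log \frac{q(x,z)}{p(x,z)} \\
\text{e-step:}\quad q_{t+1} &= \arg\min_{q \in D} \mathrm{KL}(q \,\|\, p_t) \\
\text{m-step:}\quad p_{t+1} &= \arg\min_{p \in M} \mathrm{KL}(q_{t+1} \,\|\, p)
\end{align*}
```

Both steps minimize the same divergence, once over its first argument and once over its second, so the divergence between the two manifolds cannot increase from one iteration to the next.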
Divergence measure between distributions, uniquely determined by the dually flat structure: KL divergence (asymmetric, hence not a true distance).
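The alternating projections can be tried numerically on a toy problem. The following is a minimal sketch, not taken from the cited paper: it assumes a hypothetical latent-class model with one hidden binary state z and two observed binary features x1, x2 that are independent given z. The names `model_joint`, `e_projection`, `m_projection`, and `em`, the parameters `pi`, `a`, `b`, and the table `r` of observed frequencies are all illustrative assumptions.

```python
import numpy as np

def kl(q, p):
    """KL(q || p) for arrays of equal shape; entries with q == 0 contribute 0."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def model_joint(pi, a, b):
    """Point in M: p(z, x1, x2) = pi[z] * Bern(x1; a[z]) * Bern(x2; b[z])."""
    px1 = np.stack([1 - a, a], axis=1)  # indexed [z, x1]
    px2 = np.stack([1 - b, b], axis=1)  # indexed [z, x2]
    return pi[:, None, None] * px1[:, :, None] * px2[:, None, :]

def e_projection(p, r):
    """Project p (in M) onto D: keep p(z | x), impose the observed marginal r(x)."""
    posterior = p / p.sum(axis=0, keepdims=True)  # p(z | x1, x2)
    return posterior * r[None, :, :]

def m_projection(q):
    """Project q (in D) onto M by matching the model parameters to q's moments."""
    pi = q.sum(axis=(1, 2))             # q(z)
    a = q[:, 1, :].sum(axis=1) / pi     # q(x1 = 1 | z)
    b = q[:, :, 1].sum(axis=1) / pi     # q(x2 = 1 | z)
    return pi, a, b

def em(r, pi, a, b, n_iter=50):
    """Alternate e- and m-projections; record KL(q || p) after each m-step."""
    history = []
    for _ in range(n_iter):
        q = e_projection(model_joint(pi, a, b), r)
        pi, a, b = m_projection(q)
        history.append(kl(q, model_joint(pi, a, b)))
    return pi, a, b, history

if __name__ == "__main__":
    # Assumed empirical distribution over (x1, x2); rows index x1, columns x2.
    r = np.array([[0.4, 0.1], [0.1, 0.4]])
    pi, a, b, history = em(r, np.array([0.5, 0.5]),
                           np.array([0.3, 0.6]), np.array([0.4, 0.7]))
    print("pi:", pi, "a:", a, "b:", b)
    print("first and last KL:", history[0], history[-1])
```

The recorded divergences should form a non-increasing sequence, which is the geometric reason the iteration converges. The starting parameters must lie strictly between 0 and 1 so that every posterior is well defined.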