Both GANs and VAEs have been remarkably effective at modeling images, and the learned latent representations often correspond to interesting, semantically-meaningful representations of the observed data. In contrast, GANs and VAEs have been less successful at modeling natural language, but for different reasons.
- GANs have difficulty dealing with discrete output spaces (such as natural language) as the resulting objective is no longer differentiable with respect to the generator.
- VAEs can deal with discrete output spaces, but when a powerful model (e.g. LSTM) is used as a generator, the model learns to ignore the latent variable and simply becomes a language model.