GAN
KL Divergence, Jensen-Shannon Divergence and f-divergence
Given two probability distributions $p$ and $q$,
$$D_{KL}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$$
KL divergence measures how “far” $q$ is from $p$, although $D_{KL}$ is not a true metric.
- $D_{KL}(p \,\|\, q) \geq 0$, with equality iff $p = q$. Can be proven using Jensen’s inequality.
- Asymmetric, i.e. $D_{KL}(p \,\|\, q) \neq D_{KL}(q \,\|\, p)$ in general.
- Minimising KL divergence to the empirical distribution is equivalent to maximising likelihood.
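The properties above can be checked numerically. The following is a minimal sketch for discrete distributions, using only the standard library; the helper name `kl_divergence` and the example probabilities are illustrative assumptions, not part of the original notes. The last part checks the identity behind the maximum-likelihood bullet: the average negative log-likelihood under $q$ equals the entropy of the empirical distribution plus the KL divergence to $q$, and the entropy term does not depend on $q$.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions given as lists of probabilities."""
    # Terms with p_i == 0 contribute 0 by convention.
    # Assumes q_i > 0 wherever p_i > 0 (otherwise the divergence is infinite).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative (made-up) distributions over three outcomes.
p = [0.5, 0.4, 0.1]
q = [0.3, 0.3, 0.4]

kl_pq = kl_divergence(p, q)  # non-negative
kl_qp = kl_divergence(q, p)  # generally different from kl_pq
kl_pp = kl_divergence(p, p)  # zero when the distributions match

# Link to maximum likelihood: build an empirical distribution from counts.
counts = [5, 4, 1]
n = sum(counts)
p_hat = [c / n for c in counts]

# Average negative log-likelihood of the data under model q.
avg_nll = -sum(c * math.log(qi) for c, qi in zip(counts, q)) / n
# Entropy of the empirical distribution (independent of q).
entropy = -sum(pi * math.log(pi) for pi in p_hat if pi > 0)

print(kl_pq, kl_qp, kl_pp)
print(avg_nll, entropy + kl_divergence(p_hat, q))
```

Since `entropy` is fixed by the data, choosing $q$ to minimise `kl_divergence(p_hat, q)` is the same as choosing $q$ to minimise `avg_nll`.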
Jensen-Shannon divergence
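$$D_{JS}(p \,\|\, q) = \frac{1}{2} D_{KL}(p \,\|\, m) + \frac{1}{2} D_{KL}(q \,\|\, m), \quad m = \frac{1}{2}(p + q)$$
JS divergence is symmetric and smoother than KL divergence: it stays finite (at most $\log 2$) even when $p$ and $q$ have disjoint supports.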
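As a quick illustration, here is a minimal sketch of JS divergence for discrete distributions, assuming the standard definition via the mixture $m = \frac{1}{2}(p + q)$. The function names and example values are hypothetical and chosen only for this example.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions; terms with p_i == 0 are skipped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Illustrative (made-up) distributions.
p = [0.5, 0.4, 0.1]
q = [0.3, 0.3, 0.4]

js_pq = js_divergence(p, q)
js_qp = js_divergence(q, p)  # same value: JS is symmetric

# Disjoint supports: KL(p || q) would be infinite, JS reaches its maximum log 2.
js_disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])

print(js_pq, js_qp, js_disjoint)
```

The mixture $m$ is positive wherever $p$ or $q$ is, which is why the two KL terms inside never blow up.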
A GAN consists of two models:
- Discriminator (or critic) $D$ is a comparison model that estimates the probability $D(x)$ that a sample $x$ comes from the true distribution $p_{data}$.
- Generator $G$ outputs samples $G(z)$, given a latent variable $z \sim p_z$, that are as close as possible to the true distribution. Informally, the generator’s job is to trick the critic into assigning a high probability to the synthetic output.
Our goal is to play an adversarial game between $D$ and $G$, where the generator tries to create images that match the true distribution as closely as possible, and the critic becomes better at detecting any errors the generator makes by classifying the generated samples.
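We can compare the two distributions by computing the density ratio $r(x) = \frac{p_{data}(x)}{p_g(x)}$, where $p_g$ is the distribution of generated samples, and converting the problem into binary classification. Labelling real samples $y = 1$ and generated samples $y = 0$, with equal class priors, $r(x) = \frac{p(x \mid y = 1)}{p(x \mid y = 0)} = \frac{p(y = 1 \mid x)}{p(y = 0 \mid x)} = \frac{D(x)}{1 - D(x)}$. Using cross-entropy loss, the objective becomes
$$V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$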
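The density ratio trick can be illustrated on a toy discrete example. This sketch assumes equal numbers of real and generated samples (equal class priors), in which case the Bayes-optimal classifier is $D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$. The names `p_data`, `p_g` and `d_star` and the probabilities are illustrative assumptions.

```python
# Illustrative (made-up) distributions over three outcomes.
p_data = [0.5, 0.4, 0.1]  # "true" distribution
p_g = [0.3, 0.3, 0.4]     # generator's distribution

# Bayes-optimal classifier output p(y = 1 | x) under equal class priors.
d_star = [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

# Recover the density ratio from the classifier alone: r(x) = D(x) / (1 - D(x)).
ratio_from_classifier = [d / (1 - d) for d in d_star]

# Direct density ratio, for comparison.
true_ratio = [pd / pg for pd, pg in zip(p_data, p_g)]

print(ratio_from_classifier)
print(true_ratio)
```

In practice neither density is available in closed form, so the classifier is learned from samples and its output stands in for the ratio.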
The discriminator maximises the probability of classifying real samples correctly, $\mathbb{E}_{x \sim p_{data}}[\log D(x)]$, and the probability of detecting a fake sample, $\mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$. The generator’s objective is to minimise the probability that the discriminator detects its samples as fake, $\mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$.
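Together this turns into a minimax game:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$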
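To make the two sides of the game concrete, here is a minimal sketch that estimates the value function from discriminator outputs on a batch of real and generated samples. The function name `value_function` and the output values are hypothetical; a real GAN would produce them with neural networks and update both models by gradient steps.

```python
import math

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G).

    d_real: discriminator outputs D(x) on real samples.
    d_fake: discriminator outputs D(G(z)) on generated samples.
    All outputs are assumed to lie strictly between 0 and 1.
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical outputs: the discriminator separates real from fake well.
v_confident = value_function([0.9, 0.8, 0.95], [0.1, 0.2, 0.05])

# Hypothetical outputs: the generator fools the discriminator, which can
# only guess 0.5 everywhere. Then V = log(0.5) + log(0.5) = -log 4.
v_fooled = value_function([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])

print(v_confident, v_fooled)
```

The discriminator pushes this value up (towards 0), while the generator pushes it down by making `d_fake` larger; only the second term depends on the generator.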