Generative Adversarial Networks

KL Divergence, Jensen-Shannon Divergence and f-divergence

Given two probability distributions p and q,

KL divergence $D_{KL} (p ∥ q)$ measures how “far” q is from p, although $D_{KL}$ is not a metric: it is not symmetric and does not satisfy the triangle inequality.
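$D_{KL} (p ∥ q) = \sum_{k=1}^{K} p_{k} \log \frac{p_{k}}{q_{k}} = \underbrace{\sum_{k=1}^{K} p_{k} \log p_{k}}_{- H (p)} \underbrace{- \sum_{k=1}^{K} p_{k} \log q_{k}}_{H_{ce} (p, q)}$
For continuous densities the sum becomes an integral: $D_{KL} (p ∥ q) = \int p (x) \log \frac{p (x)}{q (x)} d x$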

$D_{KL} \geq 0$ with equality iff p=q. Can be proven using Jensen’s inequality.

Asymmetric, i.e. in general $D_{KL} (p ∥ q) \neq D_{KL} (q ∥ p)$

Minimising KL divergence to the empirical distribution is equivalent to maximising likelihood.
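The properties listed above can be checked numerically. The following is a minimal sketch, assuming NumPy and two made-up three-outcome distributions `p` and `q` (the values and the `counts` array are hypothetical, chosen only for illustration). It evaluates the discrete KL divergence in both directions, the entropy/cross-entropy decomposition, and the relation to average log-likelihood for an empirical distribution.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete D_KL(p || q) in nats; terms with p_k = 0 contribute zero."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical three-outcome distributions.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])

kl_pq = kl_divergence(p, q)  # D_KL(p || q)
kl_qp = kl_divergence(q, p)  # D_KL(q || p), generally a different number

# Decomposition: D_KL(p || q) = -H(p) + H_ce(p, q).
neg_entropy = float(np.sum(p * np.log(p)))
cross_entropy = float(-np.sum(p * np.log(q)))

# Empirical distribution from hypothetical counts. D_KL(p_emp || q) equals
# -H(p_emp) minus the average log-likelihood of q, and -H(p_emp) does not
# depend on q, so minimising the KL term maximises the likelihood.
counts = np.array([6, 3, 1])
p_emp = counts / counts.sum()
avg_log_lik = float(np.sum(p_emp * np.log(q)))
neg_entropy_emp = float(np.sum(p_emp * np.log(p_emp)))
kl_emp = kl_divergence(p_emp, q)

print(kl_pq, kl_qp)
print(np.isclose(kl_pq, neg_entropy + cross_entropy))
print(np.isclose(kl_emp, neg_entropy_emp - avg_log_lik))
```

Masking out zero-probability outcomes of `p` follows the usual convention $0 \log 0 = 0$; the sketch does not guard against $q_{k} = 0$ where $p_{k} > 0$, in which case the divergence is infinite.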

Jensen-Shannon divergence
$D_{JS} (p ∥ q) = \frac{1}{2} D_{KL} (p ∥ \frac{p + q}{2}) + \frac{1}{2} D_{KL} (q ∥ \frac{p + q}{2})$
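JS divergence is symmetric and smoother than KL divergence: it is bounded above by $\log 2$ and stays finite even when p and q have disjoint support.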
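As an illustration, here is a small sketch of the JS divergence for discrete distributions, assuming NumPy. The helper names and the example distributions are hypothetical. It checks symmetry and evaluates the divergence for two distributions with disjoint support, where KL divergence would be infinite.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete D_KL(p || q) in nats; terms with p_k = 0 contribute zero."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_divergence(p, q):
    """D_JS(p || q): average KL divergence of p and q to their mixture."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Overlapping distributions: the value is the same in both directions.
a = np.array([0.7, 0.2, 0.1])
b = np.array([0.4, 0.4, 0.2])
js_ab = js_divergence(a, b)
js_ba = js_divergence(b, a)

# Disjoint support: JS stays finite and reaches its maximum, log 2.
js_disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])

print(js_ab, js_ba)
print(js_disjoint, np.log(2))
```

The mixture $\frac{p + q}{2}$ is positive wherever p or q is, which is why neither KL term can blow up.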

A GAN consists of two models:

Discriminator or critic $D_{ϕ} : X \to Δ$ is a comparison model that estimates the probability that a sample $x$ was drawn from the true distribution $p^{*}$ rather than produced by the generator.
Generator $G_{θ} : Z \to X$ maps a latent variable $z$ to a sample that should be as close as possible to the true distribution. Informally, the generator’s job is to trick the critic into assigning a high probability to the synthetic output.

Our goal is to play the adversarial game between $D_{ϕ}$ and $G_{θ}$ , where the generator tries to create images that match the true distribution as closely as possible, and the critic becomes better at detecting the generator’s errors by classifying the generated samples.

We can compare the two distributions by computing the density ratio $r (x) = \frac{p ^{*} ( x )}{q _{θ} ( x )}$ , and converting the problem into binary classification: with equal class priors, $\frac{p ^{*} ( x )}{q _{θ} ( x )} = \frac{D ( x )}{1 - D ( x )}$ , where $D (x)$ is the probability that $x$ is real. Using cross-entropy loss, the objective becomes

$V (q_{θ}, p^{*}) = \arg\max_{ϕ} E_{p (x ∣ y) p (y)} [y \log D_{ϕ} (x) + (1 - y) \log (1 - D_{ϕ} (x))]$
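The identity $\frac{p ^{*} ( x )}{q _{θ} ( x )} = \frac{D ( x )}{1 - D ( x )}$ used in the binary-classification step can be derived from Bayes’ rule. The sketch below assumes the labelling $y = 1$ for real samples and $y = 0$ for generated ones, equal class priors $p (y = 1) = p (y = 0) = \frac{1}{2}$ , and a Bayes-optimal classifier $D (x) = p (y = 1 ∣ x)$ ; these assumptions are not spelled out in the notes above.

$D (x) = p (y = 1 ∣ x) = \frac{p (x ∣ y = 1) p (y = 1)}{p (x ∣ y = 1) p (y = 1) + p (x ∣ y = 0) p (y = 0)} = \frac{p^{*} (x)}{p^{*} (x) + q_{θ} (x)}$

$1 - D (x) = \frac{q_{θ} (x)}{p^{*} (x) + q_{θ} (x)} \quad \Rightarrow \quad \frac{D (x)}{1 - D (x)} = \frac{p^{*} (x)}{q_{θ} (x)}$

With unequal priors the ratio picks up an extra factor $\frac{p (y = 1)}{p (y = 0)}$ , so the equality holds only up to that constant.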

The optimal discriminator maximises the log-probability of correctly classifying real samples, $E_{x \sim p_{D} (x)} [\log D_{ϕ} (x)]$ , and the log-probability of detecting fake samples, $E_{z \sim p_{z}} [\log (1 - D_{ϕ} (G_{θ} (z)))]$ . The generator’s objective is to minimise the discriminator’s log-probability of detecting its samples, $E_{z \sim p_{z}} [\log (1 - D_{ϕ}^{*} (G_{θ} (z)))]$ .
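The two objectives can be written as Monte Carlo averages over a batch. This is a minimal sketch, assuming NumPy and assuming the discriminator outputs are already available as arrays of probabilities; the function names and the example arrays `d_real` and `d_fake` are hypothetical stand-ins for $D_{ϕ} (x)$ and $D_{ϕ} (G_{θ} (z))$ .

```python
import numpy as np

def discriminator_objective(d_real, d_fake):
    """Batch estimate of E[log D(x)] + E[log(1 - D(G(z)))]; the critic maximises it."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def generator_objective(d_fake):
    """Batch estimate of E[log(1 - D(G(z)))]; the generator minimises it."""
    d_fake = np.asarray(d_fake, dtype=float)
    return float(np.mean(np.log(1.0 - d_fake)))

# Hypothetical critic outputs: confident on real data, sceptical of fakes.
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.2, 0.05])
confident = discriminator_objective(d_real, d_fake)

# A critic that cannot tell the two apart outputs 0.5 everywhere: value -log 4.
fooled = discriminator_objective(np.full(3, 0.5), np.full(3, 0.5))

# The generator objective drops as the critic assigns fakes a higher probability.
g_caught = generator_objective(d_fake)
g_trusted = generator_objective(np.array([0.6, 0.7, 0.8]))

print(confident, fooled)
print(g_caught, g_trusted)
```

In practice the outputs would be clipped away from 0 and 1 (or computed from logits) to keep the logarithms finite; that detail is left out here.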

Together these turn into a minimax game:

$\arg\min_{θ} \max_{ϕ} E_{x \sim p_{D}} [\log D_{ϕ} (x)] + E_{z \sim p_{z}} [\log (1 - D_{ϕ} (G_{θ} (z)))]$
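One way to connect the minimax objective to the JS divergence defined earlier is to plug in the optimal critic $D^{*} (x) = \frac{p (x)}{p (x) + q (x)}$ . The inner maximum then equals $2 D_{JS} (p ∥ q) - \log 4$ , a standard result from the original GAN analysis. The sketch below checks this on discrete distributions, assuming NumPy; the distributions and helper names are hypothetical, and expectations are exact sums rather than sample averages.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete D_KL(p || q) in nats for strictly positive p and q."""
    return float(np.sum(p * np.log(p / q)))

def js_divergence(p, q):
    """D_JS(p || q) via the mixture m = (p + q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def value_at_optimal_critic(p, q):
    """E_p[log D*(x)] + E_q[log(1 - D*(x))] with D* = p / (p + q)."""
    d_star = p / (p + q)
    return float(np.sum(p * np.log(d_star)) + np.sum(q * np.log(1.0 - d_star)))

p_data = np.array([0.7, 0.2, 0.1])  # hypothetical data distribution
q_gen = np.array([0.4, 0.4, 0.2])   # hypothetical generator distribution

v_mismatch = value_at_optimal_critic(p_data, q_gen)
v_matched = value_at_optimal_critic(p_data, p_data)  # generator matches the data

print(v_mismatch, 2.0 * js_divergence(p_data, q_gen) - np.log(4))
print(v_matched, -np.log(4))
```

Under these assumptions the generator’s best achievable value is $- \log 4$ , reached when its distribution matches the data and the JS term vanishes.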
