Notes: BEGAN

Published: April 12, 2017

This post provides summary of the paper by Berthelot et al. 2017. They proposed a robust architecture for GAN with usual training procedure. In order to have stable convergence, they propose use to use equilibrium concept between Generator and Discriminator. The results are much imporoved in terms of both image diversity and visual quality.

paper : Berthelot et al. 2017

Proposed Method

Wasserstein distance for auto-encoders

To study the effects of matching the distribution of errors, an auto-encoder loss is approximated as normal distribution and further Wasserstein distance is computed between auto-encoder loss of real and generated samples.

Pixel-wise autoencoder training loss is given as \(L(v) = |v - D(v) |^{\eta} \\\) where \(D: \mathbb{R} ^{N_x} \rightarrow \mathbb{R}^{N_x}\) is autoencoder function, \(\eta \in \{1,2\}\) is target norm, \(v \in \mathbb{R}^{N_x}\) is a sample of dimension \(N_x\) .

Using Central limit theorem, if number of pixels are assumed to be large and per pixel loss to i.i.d. the image wise losses approximately follow normal distribution.

Wasserstein distance

Given two normal distributions \(\mu_1 = \mathcal{N}(m_1, C_1)\) and \(\mu_2 = \mathcal{N}(m_2, C_2)\) with mean \(m_{1,2} \in \mathbb{R}^p\) and covariances \(C_{1,2} \in \mathbb{R}^{p*p}\) , the distance is given as

\[W(\mu_1, \mu_2)^2 = ||m_1 - m_2||^2_2 + \text{trace}(C_1 + C_2 - 2(c_2^{1/2}C_1c_2^{1/2})^{1/2}) \\\]

when \(p=1\) , above equation reduces to

\[W(\mu_1, \mu_2)^2 = ||m_1 - m_2||^2_2 + (c_1 +c_2 - 2\sqrt{c_1c_2}) \\\]

if \(\frac{(c_1 +c_2 - 2\sqrt{c_1c_2})}{||m_1 - m_2||^2_2} \text{is constant or monotously increasing}\)

the problem is simplified as

\[W(\mu_1, \mu_2)^2 \propto |m_1 - m_2||^2_2\]

In this paper the aim is to reduce Wasserstein distance between loss distribution and not sample distribution.

GAN Objective

\(\mu_1\) be the normal distibution of the loss \(\mathcal{L}(x)\) where \(x\) are real samples

\(\mu_2\) be the normal distribution of the loss \(\mathcal{L}(G(z))\) where \(G : \mathbb{R}^{N_z} \rightarrow \mathbb{R}^{N_x}\) is the generator function and \(z \in [-1, 1]^{N_z}\) are random uniform samples

The possible solutions of maximizing eq(5)

\[W(\mu_1, \mu_2) \propto m_2 - m_1 \\ m_1 \rightarrow 0 \\ m_2 \rightarrow \infty\]

\[W(\mu_1, \mu_2) \propto m_1 - m_2 \\ m_1 \rightarrow \infty \\ m_2 \rightarrow 0\]

the choice for objective in the paper is (a)

Overall GAN objective is given as

\[\mathcal{L}_D = \mathcal{L} (x; \theta_D) - \mathcal{L}(G(z_D; \theta_G); \theta_D) \text{ for } \theta_D \\ \mathcal{L}_G = \mathcal{L}(G(z_G; \theta_G);\theta_D)\]

Equilibrium

Usually, \(G\) and \(D\) are not well balanced and \(D\) wins easily. To maintain a balance between generator and descriminator. The equilibrium condition is given as :

\[\mathbb{E} [\mathcal{L}(x)] = \mathbb{E} [\mathcal{L}(G(z))]\]

However, in general settings this is not the case and hence a new parameter is considered.

\[\gamma = \frac{\mathbb{E} [\mathcal{L}(G(z))]}{\mathbb{E} [\mathcal{L}(x)]}\]

where \(\gamma \in [0,1]\)

BEGAN with Equilibrium

\[\mathcal{L}_D = \mathcal{L} (x; \theta_D) - \mathcal{L}(G(z_D; \theta_G); \theta_D) \text{ for } \theta_D \\ \mathcal{L}_G = \mathcal{L}(G(z_G; \theta_G);\theta_D) \\ k_{t+1} = k_t + \lambda_k (\gamma \mathcal{L}(x) - \mathcal{L}(G(z_G))) \text{ for each training step } t\]

Training

both the objectives of generator and discriminator are trained simultaneously

Model

For Generator

img —> conv(3,3) —> {[conv33, conv33] —>upsample(2,2) } x 3—> conv3,3 x 2 —> fc —> emb(h)

For discriminator

emb(h) —> {conv33, conv33, subsample(2,2)} x 3 —> {conv3,3 , conv3,3} -> conv3,3(img)

Share on

Twitter Facebook Google+ LinkedIn

Abhinav Dadhich

Proposed Method

Wasserstein distance for auto-encoders

Wasserstein distance

GAN Objective

Equilibrium

BEGAN with Equilibrium

Training

Model

Share on

Leave a Comment

You May Also Enjoy

Training with AdvProp

Self Supervised Learning for Vision

Do Imagenet Models transfer well across datasets?

Pytorch Tutorial