In this article, we discuss what StyleGAN is and how it can fool you with fake images
Do You See Me?
Facebook removed about 500 million accounts in the first quarter of 2018. Social media platforms were filled with scammers and were at risk. Cyber crime spiked all of a sudden, with scammers posing as people who do not exist. So the question is, how was this made possible?
The answer is StyleGAN, which is able to create realistic images from scratch. It is an extension of the GAN algorithm, which was introduced back in 2014. Although Generative Adversarial Networks were a revolutionary change in the field of machine learning, they did have some drawbacks. For instance, some of the images they created were very blurry and did not seem realistic. One could say that they looked fake.
Images created by GAN (Source)
Style-based generative adversarial networks solved this problem by making a few modifications to the GAN architecture. We will get into the nitty-gritty of that architecture soon.
Building Blocks of StyleGAN
- ResNet Blocks (the Generator)
- AdaIN Layers
- Batch Normalization
- Leaky ReLU Activation Function
- Discriminator
ResNet Blocks (the Generator)
ResNet blocks in the generator add skip (shortcut) connections that let a layer's input bypass transformations that contribute little, effectively jumping ahead toward the output. This eases gradient flow and helps increase the accuracy of the generative model. A minimal sketch of the idea is shown below.
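Here is a minimal residual block in PyTorch. This is a generic sketch of the skip-connection idea, not StyleGAN's exact layer:

```python
# A minimal residual (ResNet) block -- illustrates the skip-connection idea,
# not StyleGAN's exact generator block.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        # The skip connection lets the input bypass the convolutions,
        # so gradients flow even when the block contributes little.
        return x + self.conv2(self.act(self.conv1(x)))

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```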
AdaIN Layers
AdaIN stands for Adaptive Instance Normalization. It normalizes the features of an image and then aligns their mean and variance with those of the style features.
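A minimal sketch of the AdaIN operation in PyTorch might look like the following; in StyleGAN the style scale and bias would come from a learned affine transform of the latent vector w (the variable names here are illustrative):

```python
# Adaptive Instance Normalization: normalize the content features per channel,
# then re-scale and shift them with the style statistics.
import torch

def adain(content, style_scale, style_bias, eps=1e-5):
    # Per-channel mean and std of the content feature map (instance statistics)
    mu = content.mean(dim=(2, 3), keepdim=True)
    sigma = content.std(dim=(2, 3), keepdim=True) + eps
    normalized = (content - mu) / sigma
    # Align the normalized features with the style's scale and bias
    return style_scale * normalized + style_bias

x = torch.randn(1, 64, 32, 32)      # content feature map
scale = torch.randn(1, 64, 1, 1)    # per-channel style scale
bias = torch.randn(1, 64, 1, 1)     # per-channel style bias
print(adain(x, scale, bias).shape)  # torch.Size([1, 64, 32, 32])
```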
Batch Normalization
Batch normalization stabilizes and accelerates the training of the deep neural network. It mitigates the vanishing gradient problem, which in turn helps the generator learn finer detail and produce more realistic images.
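For reference, this is what batch normalization looks like in PyTorch (generic usage, not a StyleGAN-specific layer):

```python
# Batch normalization: normalize each channel over the batch, then apply a
# learned scale (gamma) and shift (beta).
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=64)
x = torch.randn(8, 64, 32, 32)           # a batch of 8 feature maps
y = bn(x)                                # per-channel zero mean, unit variance
print(y.mean().item(), y.std().item())   # roughly 0 and 1
```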
Leaky ReLU Activation Function
The graph given below is that of the leaky ReLU activation function, which is used in StyleGAN.
It outputs the input unchanged when the input is positive, but when the input is negative it multiplies it by a small slope (0.01 by default; the StyleGAN paper uses 0.2). This keeps a small gradient flowing even for negative inputs, so neurons do not go dormant during training, which helps the model converge to more realistic outputs.
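In code, the activation is just the following (0.2 is the slope reported in the StyleGAN paper; PyTorch's default is 0.01):

```python
# Leaky ReLU: identity for positive inputs, a small slope for negative ones.
import torch
import torch.nn as nn

leaky = nn.LeakyReLU(negative_slope=0.2)
x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
print(leaky(x))  # tensor([-0.4000, -0.1000, 0.0000, 0.5000, 2.0000])
```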
Discriminator
The discriminator is analogous to a classification model: it differentiates images and classifies them as real or generated. For example, over the course of training the deep neural network, the generator produces faces so accurate that the discriminator classifies them as real human faces.
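The sketch below shows a toy discriminator: a small convolutional network that maps an image to a single real-versus-fake score. StyleGAN's actual discriminator is a much deeper network; this only illustrates the role it plays:

```python
# A toy discriminator: convolutions downsample the image, a linear layer
# produces one logit (real vs. fake). Sizes are illustrative only.
import torch
import torch.nn as nn

discriminator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),    # 64x64 -> 32x32
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # 32x32 -> 16x16
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(128 * 16 * 16, 1),  # single real/fake score
)

fake_batch = torch.randn(4, 3, 64, 64)
print(discriminator(fake_batch).shape)  # torch.Size([4, 1])
```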
Deep Dive Into The Architecture of StyleGAN
The above is the overall architecture of the style-based generative adversarial network. A latent code z is first normalized and then passed through a mapping network, a feed-forward stack of fully connected layers (a multilayer perceptron), which produces an intermediate latent vector w. The synthesis network then starts from a learned constant at a very low resolution and increases the resolution block by block; at each block, w controls the style of the image through the AdaIN layers. Gaussian noise is also injected between the layers to bring about stochastic variation in the fine details of the image.
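The following is a heavily simplified sketch of that data flow in PyTorch; the layer counts, sizes, and class names are illustrative assumptions, not the official implementation:

```python
# Simplified StyleGAN data flow: z -> mapping MLP -> w, and w modulates a
# synthesis block through AdaIN while per-pixel noise is added.
import torch
import torch.nn as nn

LATENT = 512

mapping = nn.Sequential(                      # mapping network f: z -> w
    *[layer for _ in range(4)                 # the paper uses 8 FC layers
      for layer in (nn.Linear(LATENT, LATENT), nn.LeakyReLU(0.2))]
)

class StyledBlock(nn.Module):
    """One synthesis step: conv -> add noise -> AdaIN driven by w."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.to_style = nn.Linear(LATENT, channels * 2)  # affine A: w -> (scale, bias)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x, w):
        x = self.act(self.conv(x))
        x = x + torch.randn_like(x) * 0.1                # per-pixel Gaussian noise (B)
        scale, bias = self.to_style(w).chunk(2, dim=1)
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = x.std(dim=(2, 3), keepdim=True) + 1e-5
        return scale[..., None, None] * (x - mu) / sigma + bias[..., None, None]

const = torch.randn(1, LATENT, 4, 4)   # stands in for the learned 4x4 constant
w = mapping(torch.randn(1, LATENT))    # z ~ N(0, I) -> intermediate latent w
out = StyledBlock(LATENT)(const, w)
print(out.shape)                       # torch.Size([1, 512, 4, 4])
```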
One disadvantage of this algorithm is that it is computationally expensive, but it yields excellent, high-quality results. Below is a sample of images generated by the model trained on the FFHQ dataset, a dataset of high-quality human face images.
Experiments conducted using StyleGAN
- Style Mixing
- Stochastic Variation
- Separation of global effects from stochasticity
Style Mixing
Two different pictures, referred to as two latent codes in the official research paper, are used as references, and the style is drawn from both: one latent code drives the coarse layers of the generator while the other drives the finer layers. By varying which code controls which layers, we can form a matrix of combinations and obtain different, unique images.
Style Mixing is done using two reference images
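A sketch of the idea, reusing the `mapping` network and styled blocks from the architecture sketch above (all names are assumptions, not the official implementation):

```python
# Style mixing: map two latent codes to w1 and w2, drive the coarse (early)
# synthesis blocks with w1 and the fine (later) blocks with w2.
import torch

def style_mix(mapping, blocks, z1, z2, crossover, x):
    w1, w2 = mapping(z1), mapping(z2)   # two intermediate latent codes
    for i, block in enumerate(blocks):
        # Blocks before the crossover take their style from w1, the rest from w2.
        w = w1 if i < crossover else w2
        x = block(x, w)
    return x
```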
Stochastic Variation
Random features such as freckles, the exact placement of hairs, wrinkles, etc. are introduced into the model through injected noise to make the images more realistic. This is called stochastic variation, and it is depicted in the picture below:
Notice the placement of hair in the above images
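A minimal sketch of a noise-injection layer, under the assumption of a learned per-channel scaling weight as described in the paper:

```python
# Noise injection: per-pixel Gaussian noise, scaled by a learned per-channel
# weight, is added to the feature map. Re-sampling the noise changes fine
# details (hair placement, freckles) but not the identity of the face.
import torch
import torch.nn as nn

class NoiseInjection(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # One learned scaling factor per feature channel
        self.weight = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        noise = torch.randn(x.shape[0], 1, x.shape[2], x.shape[3], device=x.device)
        return x + self.weight * noise

x = torch.randn(1, 64, 32, 32)
print(NoiseInjection(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```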
Separation of global effects from stochasticity
Global factors such as pose, lighting, and background are controlled by the style (the latent code), while the injected noise only affects inconsequential stochastic details. The paper shows that these global effects separate cleanly from the stochasticity, which increases the authenticity of the pictures.
We can clearly notice the change of background here
Final Thoughts
There has been a lot of advancement in the field of StyleGAN, and various improved versions have been released. The model has many pros and cons. On the positive side, it can be used to create lifelike avatars for gaming and advertisement, to model faces of different ethnicities, to augment medical imagery, and so on.
People also end up using this novel approach for unethical purposes, creating fake photographs and faces; it can even be used to create propaganda. The internet is so flooded with fake profiles that it has become difficult even for tech giants like Meta and Twitter, using state-of-the-art technology, to remove all the fake content.
References
This work is purely a result of research from the following two papers:
[1] Generative Adversarial Networks by Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
[2] A Style-Based Generator Architecture for Generative Adversarial Networks by Tero Karras, Samuli Laine, Timo Aila
[3] Pictures from pexels.com, giphy.com