In this article, we discuss what StyleGAN is and how it can fool you with fake images
Do You See Me?
Facebook removed about 500 million accounts in the first quarter of 2018. Social media platforms were filled with scammers and were at risk. Cyber crime spiked all of a sudden, with scammers posing as people who do not exist. So the question is, how was this made possible?
The answer is StyleGAN, which is able to create realistic images from scratch. It is an extension of the GAN algorithm, which was introduced back in 2014. Although Generative Adversarial Networks were a revolutionary change in the field of machine learning, they did have some drawbacks. For instance, some of the images they created were very blurry and did not seem realistic. One could say that they looked fake.
Images created by GAN (Source)
Style-based generative adversarial networks solved this problem by making a few modifications to the GAN architecture. We will get into the nitty-gritty of that architecture soon.
Building Blocks of StyleGAN
- ResNet Blocks (the Generator)
- AdaIN Layers
- Batch Normalization
- Leaky ReLU Activation Function
- Discriminator
ResNet Blocks (the Generator)
ResNet blocks in the generator add skip (shortcut) connections that let a layer's input bypass transformations that contribute little, effectively jumping ahead toward the output. This eases gradient flow and helps increase the accuracy of the generative model. A minimal sketch of the idea is shown below.
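Here is a minimal residual block in PyTorch. This is a generic sketch of the skip-connection idea, not StyleGAN's exact layer:

```python
# A minimal residual (ResNet) block -- illustrates the skip-connection idea,
# not StyleGAN's exact generator block.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        # The skip connection lets the input bypass the convolutions,
        # so gradients flow even when the block contributes little.
        return x + self.conv2(self.act(self.conv1(x)))

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```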
AdaIN Layers
AdaIN stands for Adaptive Instance Normalization. It normalizes the features of an image and then aligns their mean and variance with those of the style features.
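A minimal sketch of the AdaIN operation in PyTorch might look like the following; in StyleGAN the style scale and bias would come from a learned affine transform of the latent vector w (the variable names here are illustrative):

```python
# Adaptive Instance Normalization: normalize the content features per channel,
# then re-scale and shift them with the style statistics.
import torch

def adain(content, style_scale, style_bias, eps=1e-5):
    # Per-channel mean and std of the content feature map (instance statistics)
    mu = content.mean(dim=(2, 3), keepdim=True)
    sigma = content.std(dim=(2, 3), keepdim=True) + eps
    normalized = (content - mu) / sigma
    # Align the normalized features with the style's scale and bias
    return style_scale * normalized + style_bias

x = torch.randn(1, 64, 32, 32)      # content feature map
scale = torch.randn(1, 64, 1, 1)    # per-channel style scale
bias = torch.randn(1, 64, 1, 1)     # per-channel style bias
print(adain(x, scale, bias).shape)  # torch.Size([1, 64, 32, 32])
```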
Batch Normalization
Batch normalization stabilizes and accelerates the training of the deep neural network. It mitigates the vanishing gradient problem, which in turn helps the generator learn finer detail and produce more realistic images.
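For reference, this is what batch normalization looks like in PyTorch (generic usage, not a StyleGAN-specific layer):

```python
# Batch normalization: normalize each channel over the batch, then apply a
# learned scale (gamma) and shift (beta).
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=64)
x = torch.randn(8, 64, 32, 32)           # a batch of 8 feature maps
y = bn(x)                                # per-channel zero mean, unit variance
print(y.mean().item(), y.std().item())   # roughly 0 and 1
```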
Leaky ReLU Activation Function
The graph given below is that of the leaky ReLU activation function, which is used in StyleGAN.
It outputs the input unchanged when the input is positive, but when the input is negative it multiplies it by a small slope (0.01 by default; the StyleGAN paper uses 0.2). This keeps a small gradient flowing even for negative inputs, so neurons do not go dormant during training, which helps the model converge to more realistic outputs.
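In code, the activation is just the following (0.2 is the slope reported in the StyleGAN paper; PyTorch's default is 0.01):

```python
# Leaky ReLU: identity for positive inputs, a small slope for negative ones.
import torch
import torch.nn as nn

leaky = nn.LeakyReLU(negative_slope=0.2)
x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
print(leaky(x))  # tensor([-0.4000, -0.1000, 0.0000, 0.5000, 2.0000])
```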
Discriminator
The discriminator is analogous to a classification model: it differentiates images and classifies them as real or generated. For example, over the course of training the deep neural network, the generator produces faces so accurate that the discriminator classifies them as real human faces.
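The sketch below shows a toy discriminator: a small convolutional network that maps an image to a single real-versus-fake score. StyleGAN's actual discriminator is a much deeper network; this only illustrates the role it plays:

```python
# A toy discriminator: convolutions downsample the image, a linear layer
# produces one logit (real vs. fake). Sizes are illustrative only.
import torch
import torch.nn as nn

discriminator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),    # 64x64 -> 32x32
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # 32x32 -> 16x16
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(128 * 16 * 16, 1),  # single real/fake score
)

fake_batch = torch.randn(4, 3, 64, 64)
print(discriminator(fake_batch).shape)  # torch.Size([4, 1])
```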
Deep Dive Into The Architecture of StyleGAN
The above is the overall architecture of the style-based generative adversarial network. A latent code z is first normalized and then passed through a mapping network, a feed-forward stack of fully connected layers (a multilayer perceptron), which produces an intermediate latent vector w. The synthesis network then starts from a learned constant at a very low resolution and increases the resolution block by block; at each block, w controls the style of the image through the AdaIN layers. Gaussian noise is also injected between the layers to bring about stochastic variation in the fine details of the image.
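The following is a heavily simplified sketch of that data flow in PyTorch; the layer counts, sizes, and class names are illustrative assumptions, not the official implementation:

```python
# Simplified StyleGAN data flow: z -> mapping MLP -> w, and w modulates a
# synthesis block through AdaIN while per-pixel noise is added.
import torch
import torch.nn as nn

LATENT = 512

mapping = nn.Sequential(                      # mapping network f: z -> w
    *[layer for _ in range(4)                 # the paper uses 8 FC layers
      for layer in (nn.Linear(LATENT, LATENT), nn.LeakyReLU(0.2))]
)

class StyledBlock(nn.Module):
    """One synthesis step: conv -> add noise -> AdaIN driven by w."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.to_style = nn.Linear(LATENT, channels * 2)  # affine A: w -> (scale, bias)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x, w):
        x = self.act(self.conv(x))
        x = x + torch.randn_like(x) * 0.1                # per-pixel Gaussian noise (B)
        scale, bias = self.to_style(w).chunk(2, dim=1)
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = x.std(dim=(2, 3), keepdim=True) + 1e-5
        return scale[..., None, None] * (x - mu) / sigma + bias[..., None, None]

const = torch.randn(1, LATENT, 4, 4)   # stands in for the learned 4x4 constant
w = mapping(torch.randn(1, LATENT))    # z ~ N(0, I) -> intermediate latent w
out = StyledBlock(LATENT)(const, w)
print(out.shape)                       # torch.Size([1, 512, 4, 4])
```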
One disadvantage of this algorithm is that it is computationally expensive, but it yields excellent, high-quality results. Below is a sample of images generated by the model trained on the FFHQ dataset, a dataset of high-quality human face images.
Experiments conducted using StyleGAN
- Style Mixing
- Stochastic Variation
- Separation of global effects from stochasticity
Style Mixing
Two different pictures, referred to as two latent codes in the official research paper, are used as references, and the style is drawn from both: one latent code drives the coarse layers of the generator while the other drives the finer layers. By varying which code controls which layers, we can form a matrix of combinations and obtain different, unique images.
Style Mixing is done using two reference images
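A sketch of the idea, reusing the `mapping` network and styled blocks from the architecture sketch above (all names are assumptions, not the official implementation):

```python
# Style mixing: map two latent codes to w1 and w2, drive the coarse (early)
# synthesis blocks with w1 and the fine (later) blocks with w2.
import torch

def style_mix(mapping, blocks, z1, z2, crossover, x):
    w1, w2 = mapping(z1), mapping(z2)   # two intermediate latent codes
    for i, block in enumerate(blocks):
        # Blocks before the crossover take their style from w1, the rest from w2.
        w = w1 if i < crossover else w2
        x = block(x, w)
    return x
```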
Stochastic Variation
Random features such as freckles, the exact placement of hairs, wrinkles, etc. are introduced into the model through injected noise to make the images more realistic. This is called stochastic variation, and it is depicted in the picture below:
Notice the placement of hair in the above images
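A minimal sketch of a noise-injection layer, under the assumption of a learned per-channel scaling weight as described in the paper:

```python
# Noise injection: per-pixel Gaussian noise, scaled by a learned per-channel
# weight, is added to the feature map. Re-sampling the noise changes fine
# details (hair placement, freckles) but not the identity of the face.
import torch
import torch.nn as nn

class NoiseInjection(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # One learned scaling factor per feature channel
        self.weight = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        noise = torch.randn(x.shape[0], 1, x.shape[2], x.shape[3], device=x.device)
        return x + self.weight * noise

x = torch.randn(1, 64, 32, 32)
print(NoiseInjection(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```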
Separation of global effects from stochasticity
Global factors such as pose, lighting, and background are controlled by the style (the latent code), while the injected noise only affects inconsequential stochastic details. The paper shows that these global effects separate cleanly from the stochasticity, which increases the authenticity of the pictures.
We can clearly notice the change of background here
Final Thoughts
There has been a lot of advancement in the field of StyleGAN, and various improved versions have been released. The model has many pros and cons. On the positive side, it can be used to create lifelike avatars for gaming and advertisement, to model faces of different ethnicities, to augment medical imagery, and so on.
People also end up using this novel approach for unethical purposes, creating fake photographs and faces; it can even be used to create propaganda. The internet is so flooded with fake profiles that it has become difficult even for tech giants like Meta and Twitter, using state-of-the-art technology, to remove all the fake content.
References
This work is purely a result of research from the following two papers:
[1] Generative Adversarial Networks by Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
[2] A Style-Based Generator Architecture for Generative Adversarial Networks by Tero Karras, Samuli Laine, Timo Aila
[3] Pictures from pexels.com, giphy.com