Image augmentations used during training are critical to the generalisation performance of image classifiers. As a result, a considerable amount of research has concentrated on determining the best augmentation policy for a given task. RandAugment, a simple random augmentation policy, recently outperformed more sophisticated approaches. Only Adversarial AutoAugment, a method based on adversarial training, has proven to be superior to RandAugment.
In this blog, we will argue that random augmentations remain competitive when compared with an optimal adversarial approach and with simple curricula, and we hypothesise that Adversarial AutoAugment's success is due to the stochasticity of its policy controller network, which introduces a mild form of curriculum.
Introduction
The data augmentations used during training can have a significant impact on the performance of a deep vision model. For natural images, these augmentations can be grouped into geometric (e.g., horizontal flips, translations), colour (e.g., histogram equalisation), and distortion (e.g., random cropping, or the random grey-painting of image regions, usually referred to as Cutout).
A significant amount of research has gone into determining the best augmentation policy, i.e., the type and magnitude of the augmentation to apply at a given point in training. Proposed techniques include reinforcement learning, Bayesian optimisation, and evolutionary algorithms, to name a few. While these strategies outperformed earlier hand-crafted augmentation baselines, they were eventually surpassed by RandAugment, a simple random augmentation strategy. The lone exception is Adversarial AutoAugment, which employs an adversarial strategy to guide the search for the best augmentation policy and substantially outperforms RandAugment. However, it is unclear how adversarial the policies used by Adversarial AutoAugment actually are, so the question remains: do adversarial augmentations genuinely help?
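As a quick illustration of these three families (not the exact transforms used by any of the methods discussed below), here is a minimal sketch using torchvision; the specific operations and parameter values are illustrative choices:

```python
import torchvision.transforms as T

augment = T.Compose([
    # Geometric: horizontal flips and small translations
    T.RandomHorizontalFlip(p=0.5),
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    # Colour: histogram equalisation
    T.RandomEqualize(p=0.5),
    # Distortion: random cropping with padding
    T.RandomCrop(32, padding=4),
    T.ToTensor(),
    # Cutout-style grey painting of a random image region (operates on tensors)
    T.RandomErasing(p=0.5, scale=(0.02, 0.1), value=0.5),
])
```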
Image Augmentation Using an Adversarial Approach
The primary idea of Adversarial AutoAugment is to discover augmentations that result in "hard" samples, as measured by the training loss. The premise is that learning from hard examples should push the model to learn more robust features. The optimisation goal is cast as a min-max problem:

min_w max_{τ ∈ S} E_{(x,y)∼D} [ L(f_w(τ(x)), y) ]

where D is the training set, x is a sample with label y, w are the model parameters, f_w(·) is the model output, L(·) is the loss function, and S is the set of all available augmentation policies (as in AutoAugment). The inner maximisation (i.e., finding the most adversarial policies) is solved by introducing a controller augmentation policy network that, given the losses produced by the main model, is trained to output probabilities p_1, . . . , p_|S| for the policies τ_1, . . . , τ_|S|, so that the maximisation is solved in expectation:

min_w max_θ E_{(x,y)∼D} E_{τ∼P(·, θ)} [ L(f_w(τ(x)), y) ]

where P(·, θ) is the distribution over policies defined by the output of the controller policy network with parameters θ.
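To make the expectation-based inner maximisation concrete, here is a minimal, hedged sketch in PyTorch: a tiny controller outputs a distribution over the |S| candidate policies and is updated, REINFORCE-style, to assign more probability to policies that produced a higher training loss. The class and function names are illustrative and this is not the exact AdvAA implementation.

```python
import torch
import torch.nn as nn

class PolicyController(nn.Module):
    """Stand-in controller: one logit per candidate policy tau_1 ... tau_|S|."""
    def __init__(self, num_policies: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_policies))

    def distribution(self) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.logits)

def controller_update(controller, ctrl_opt, sampled_idx, observed_losses):
    """sampled_idx: indices of the policies sampled this step;
    observed_losses: training losses the main model incurred under those policies."""
    dist = controller.distribution()
    log_p = dist.log_prob(sampled_idx)        # log P(tau | theta)
    reward = observed_losses.detach()         # higher loss = better for the adversary
    ctrl_loss = -(log_p * reward).mean()      # gradient ascent on the expected loss
    ctrl_opt.zero_grad()
    ctrl_loss.backward()
    ctrl_opt.step()
```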
An Adversarial Strategy
For computational reasons, the number of magnitudes is set to 5 rather than 10, and the operation search space is limited to 15 operations. We use a batch size of 128 and a multiplicity of M = 1 or M = 2, meaning we augment each batch with one or two operations. Training lasts 200 epochs, and we report the best test accuracy over all training epochs, averaged across five runs. We use a warm-up of 10 epochs with no augmentation, the Nesterov optimiser, a batch-norm momentum of 0.1, and a weight decay of 5e-4.
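For reference, the setup above can be restated as a config plus optimiser construction. The learning rate, momentum value, and model are not specified in the text, so the values below are placeholders or explicit assumptions:

```python
from dataclasses import dataclass
import torch

@dataclass
class TrainConfig:
    num_ops: int = 15            # reduced operation search space
    num_magnitudes: int = 5      # 5 magnitude levels instead of 10
    batch_size: int = 128
    multiplicity: int = 2        # M = 1 or M = 2
    epochs: int = 200
    warmup_epochs: int = 10      # no augmentation during warm-up
    bn_momentum: float = 0.1
    weight_decay: float = 5e-4
    lr: float = 0.1              # assumption: not stated in the text

cfg = TrainConfig()
model = torch.nn.Linear(3 * 32 * 32, 10)   # placeholder for the actual vision model
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=cfg.lr,
    momentum=0.9,                # assumption: Nesterov requires non-zero momentum
    nesterov=True,
    weight_decay=cfg.weight_decay,
)
```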
Removing the Stochasticity from the Controller Policy Network's Training
Let τ* = τ_j be the exact solution to the inner maximisation problem, for some j ≤ |S|.
TrueAdv is an experiment in which the empirical loss is evaluated on the complete training set for every policy at the end of each epoch, in order to determine the true τ*. In other words, we evaluate all N training points under each policy τ ∈ S.
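The selection step just described could look like the following sketch, where `policies` is the set S of candidate policies as callables, `train_loader` iterates over the full training set, and the function and variable names are illustrative rather than the exact implementation:

```python
import torch

@torch.no_grad()
def select_true_adversarial_policy(model, loss_fn, train_loader, policies):
    """At the end of an epoch, evaluate every policy on the whole training
    set and return the one with the highest empirical loss (tau*)."""
    model.eval()
    worst_loss, tau_star = float("-inf"), None
    for tau in policies:                       # every policy tau in S
        total, count = 0.0, 0
        for x, y in train_loader:              # every training point
            total += loss_fn(model(tau(x)), y).item() * x.size(0)
            count += x.size(0)
        mean_loss = total / count
        if mean_loss > worst_loss:             # exact inner maximisation
            worst_loss, tau_star = mean_loss, tau
    return tau_star, worst_loss
```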
TrueAdv turns out to be detrimental to generalisation: it reduces test set accuracy across all data sets and multiplicities examined. Using random augmentations is not only cheaper computationally, it also generalises better. This raises the question: why don't the learned policies exhibit the same behaviour, given that they attempt to solve an objective whose exact solution degrades generalisation performance?
The answer is that the training of the policy network is stochastic.
The differences in policy selection are clear:
- TrueAdv relies on only three operations (Rotation, Inversion, and Brightness) more than half of the time, whereas AdvAA's probability of sampling any given operation remains fairly uniform (some milder differences appear after epoch 300);
- TrueAdv uses the highest magnitudes (8-9) 77% of the time, whereas AdvAA uses them only 30% of the time.
Figure: A comparison between AdvAA and TrueAdv.
The graph above shows the percentage of times each operation or magnitude was used during training. For TrueAdv, we show the average over all epochs, and magnitudes are binned by increasing strength (i.e., bin 8-9 is the strongest). Notice in (a) how AdvAA maintains an almost uniform sampling of operations across epochs, whereas TrueAdv relies on fewer operations far more often. Also, observe in (b) how AdvAA selects the two highest magnitudes (8-9) less than 30% of the time, but TrueAdv does so 77% of the time. This disparity would be even larger if we only looked at the first 200 epochs (the starting point for TrueAdv).
These findings imply that AdvAA does not truly maximise the inner objective, but rather samples policies fairly uniformly. A curriculum effect might also play a role, although the probability of sampling harsh policies remains very modest compared to TrueAdv.
Conclusion
In this blog, we discussed how, when selecting training augmentations, solving the adversarial objective exactly degrades test set accuracy, even compared to the baseline. We argued that the success of Adversarial AutoAugment must instead be due to the stochasticity of its policy controller network and the mild form of curriculum it induces.
The idea of reverting to milder augmentations towards the end of training has proven intriguing, and future work will investigate how this strategy can be refined to eventually outperform random policies while automating as many of the curriculum's hyperparameters as possible.