Network trained are not able to provide better performance on clean test data. Even with fine-tuning larger models perform slightly better but smaller models are not performing well. This paper
Adversarial Propagation (AdvProp)
A method that learns from both clean images as well as adversarially modified images. Since clean images are derived from a different distribution as compared to adversarial images, the model needs to also use the batch statistics according to image source. If not, the model may not be effective in extracting accurate features.
In case of data perturbed adversarially, the batch statistics can change due different distributions of the data.
BatchNorm(BN), present widely in modern architectures, captures data statistics by using mean and variance of the input mini-batch. BN for adversarial perturbations may not provide disentangled mean and variance. As an example in the following figure from the paper, right hand graph shows different distributions of clean and adversarial data. But if we use BN in the model as described in the left flow chart the resulting distributions statistics will be the solid line(right plot).
Source: paper:
An Auxillary BN can be used to disentangle these distributions and provide individual statistics during training. And during inference, usual primary BN only is used. This is shown in the following figure.
The training of these algo is performed as per the following psuedocode:
Results
Performance of this method results in increase in accuracy of efficient-nets by upto 0.7% on Imagenet dataset
Similarly, comparing the Adversarial propagation wrt Adversarial training(using combined data distribution) of the model, relative improvements of efficientnets is as :
Smaller size efficientnets are shown to perform significantly better than the larger ones, specifically B5 and above.
Self-supervised learning consists of a learning framework designed to learn representation of data using pretext tasks. A pretext task is supervised learning setting created automatically from the input such that the cost of label is free. Read more in How? section.
Leave a Comment