Adversarial Attack with FGSM (Fast Gradient Sign Method)
Last modified: 2023-08-22
An adversarial attack is a technique for fooling a neural network, causing a classification model to misclassify its input. FGSM is a white-box attack: we need access to the model's architecture (and its gradients) to carry it out.
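Concretely, FGSM nudges the input in the direction of the sign of the gradient of the loss with respect to that input:

$$x_{\text{adv}} = x + \epsilon \cdot \operatorname{sign}\left(\nabla_x J(\theta, x, y)\right)$$

where x is the original input, y its label, J the loss function, θ the model parameters, and ε (epsilon) the size of the perturbation. The code in the sections below implements exactly this update.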
Create Adversarial Examples against ResNet
Reference: PyTorch Docs
It is recommended to use an environment suited to machine learning work, such as Google Colaboratory or a Jupyter Notebook.
1. Import Modules
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, models, transforms
import numpy as np
from PIL import Image
2. Load ResNet Model
We load ResNet50 pretrained on ImageNet. Any other variant (ResNet18, ResNet34, etc.) would work just as well.
model = models.resnet50(pretrained=True)
model.eval()
torch.manual_seed(42)
use_cuda = True
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Device: ", device)
3. Load/Preprocess Image
We use an image of a fluffy Samoyed dog.
wget https://github.com/pytorch/hub/raw/master/images/dog.jpg
Then we need to preprocess it. First, open the downloaded image with Pillow, then define the preprocessing transforms.
orig_img = Image.open("dog.jpg")
# Define the transforms which preprocess the original image
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
orig_img_tensor = preprocess(orig_img)
# Prepend one dimension to the tensor for inference
orig_img_batch = orig_img_tensor.unsqueeze(0)
# Move the image and the model to the device
orig_img_batch = orig_img_batch.to(device)
model = model.to(device)
4. Load ImageNet Classes
We use the ImageNet class labels. They will be used to check which label the model assigns to the original image and to the adversarial images.
wget https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt
Then read this text file and assign its contents to labels.
with open("imagenet_classes.txt", "r") as f:
    labels = [s.strip() for s in f.readlines()]
5. Initial Prediction
Before creating adversarial examples, we check which classes and probabilities the ResNet model assigns to the original image.
pred = model(orig_img_batch)
probs = F.softmax(pred[0], dim=0)
probs_top5, idx_top5 = torch.topk(probs, 5)
print("The top 5 labels of highly probabilies:")
for i in range(probs_top5.size(0)):
print(f"{labels[idx_top5[i]]}: {probs_top5[i].item()*100:.2f}%")
# Extract the top probability and index (target) for use in the next sections
target_prob = probs_top5[0]
target_idx = idx_top5[0]
The top 5 labels and probabilities should look like the following.
Top 5 labels and their probabilities:
Samoyed: 87.33%
Pomeranian: 3.03%
white wolf: 1.97%
keeshond: 1.11%
Eskimo dog: 0.92%
As expected, the ResNet model classified the original image as Samoyed with 87.33% confidence.
6. Define Function to Denormalize
Create a function that denormalizes an input batch. The original image must be denormalized back to the [0, 1] range before the FGSM perturbation is applied, and this function handles that.
def denorm(batch, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
    if isinstance(mean, list):
        mean = torch.tensor(mean).to(device)
    if isinstance(std, list):
        std = torch.tensor(std).to(device)
    return batch * std.view(1, -1, 1, 1) + mean.view(1, -1, 1, 1)
7. Calculate Perturbations
This step is the core of the adversarial attack: it calculates the sign of the gradients backpropagated to the input image. In the next section, this sign is used to perturb the input in the direction that maximizes the loss.
def calc_perturbations(image, target):
    image.requires_grad = True
    # Predict the original image
    pred = model(image)
    # ResNet outputs raw logits, so use the cross-entropy loss
    loss = F.cross_entropy(pred, target)
    model.zero_grad()
    loss.backward()
    gradient = image.grad.data
    signed_grad = gradient.sign()
    return signed_grad
# Pass the target index as a 1-D tensor on the same device as the input
perturbations = calc_perturbations(orig_img_batch, target_idx.unsqueeze(0))
8. Start Creating Adversarial Examples
Now generate adversarial examples for each epsilon.
Each adversarial image is generated by adding the product of epsilon and the perturbations to the original image data.
Generally, the higher the value of epsilon, the more the model's prediction accuracy drops.
epsilons = [0, .01, .05, .1, .2]
adv_examples = []
for eps in epsilons:
    orig_img_batch_denorm = denorm(orig_img_batch)
    adv_img = orig_img_batch_denorm + eps * perturbations
    adv_img = torch.clamp(adv_img, 0, 1)
    # Normalize the adversarial image
    adv_img_norm = transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))(adv_img)
    # Predict the adversarial example
    adv_pred = model(adv_img_norm)
    adv_probs = F.softmax(adv_pred[0], dim=0)
    adv_probs_top5, adv_idx_top5 = torch.topk(adv_probs, 5)
    print("-"*28 + f"Eps {eps}" + "-"*28)
    for i in range(adv_probs_top5.size(0)):
        print(f"{labels[adv_idx_top5[i]]}: {adv_probs_top5[i]*100:.2f}%")
    print()
    # Convert the adversarial example to a NumPy array so that it can be saved later
    adv_ex = adv_img.squeeze().detach().cpu().numpy()
    adv_examples.append((labels[adv_idx_top5[0]], adv_probs_top5[0], adv_ex))
The output should look like the following.
----------------------------Eps 0----------------------------
Samoyed: 87.33%
Pomeranian: 3.03%
white wolf: 1.97%
keeshond: 1.11%
Eskimo dog: 0.92%
----------------------------Eps 0.01----------------------------
West Highland white terrier: 43.36%
Scotch terrier: 8.47%
wallaby: 7.29%
cairn: 4.53%
Angora: 1.87%
----------------------------Eps 0.05----------------------------
West Highland white terrier: 92.15%
cairn: 1.28%
Angora: 1.16%
Scotch terrier: 1.06%
Maltese dog: 0.66%
----------------------------Eps 0.1----------------------------
West Highland white terrier: 97.47%
Scotch terrier: 0.57%
cairn: 0.31%
Angora: 0.17%
Maltese dog: 0.15%
----------------------------Eps 0.2----------------------------
West Highland white terrier: 50.01%
white wolf: 12.23%
ice bear: 8.72%
Arctic fox: 3.96%
Samoyed: 2.19%
Notice that from epsilon 0.01 onward, the adversarial images were no longer classified as Samoyed but as other labels, such as West Highland white terrier.
In short, we succeeded in fooling the model's predictions by modifying the original image.
9. Plot the Result
Although this section is optional, we can plot the result above.
import matplotlib.pyplot as plt
cnt = 0
plt.figure(figsize=(28, 10))
for i, eps in enumerate(epsilons):
    cnt += 1
    plt.subplot(1, len(adv_examples), cnt)
    plt.xticks([])
    plt.yticks([])
    label, prob, img = adv_examples[i]
    plt.title(f"Eps {eps}\nClass: {label}\nConfidence: {prob*100:.2f}%", fontsize=14)
    # Convert from (C, H, W) to (H, W, C) for matplotlib
    plt.imshow(np.transpose(img, (1, 2, 0)))
plt.show()
We should see that the noise becomes more visible as epsilon increases.
However, to the human eye, these images still look like a Samoyed no matter how you look at them.
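This can be quantified: FGSM changes each pixel by at most epsilon. A minimal check, reusing adv_img from the last loop iteration (eps = 0.2):
# The largest per-pixel change is bounded by epsilon
diff = (adv_img - denorm(orig_img_batch)).abs()
print(f"max per-pixel change: {diff.max().item():.4f}")  # at most 0.2 here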
10. Save the Adversarial Examples
Finally, we save the generated adversarial images.
Create a new folder to store all the adversarial images to be downloaded.
mkdir fake_dogs
Now save the images. We can use them to fool ResNet models.
# Save adversarial images
from torchvision.utils import save_image
for i, eps in enumerate(epsilons):
    label, prob, ex = adv_examples[i]
    ex_tensor = torch.from_numpy(ex).clone()
    save_image(ex_tensor, f"fake_dogs/fake_dog_eps{eps}.png")
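As a quick sanity check, we can reload one of the saved images and confirm that it is still misclassified. This is a minimal sketch that assumes model, preprocess, labels, and device from the earlier steps are still defined:
# Reload a saved adversarial image and classify it again
fake_dog = Image.open("fake_dogs/fake_dog_eps0.1.png").convert("RGB")
fake_batch = preprocess(fake_dog).unsqueeze(0).to(device)
with torch.no_grad():
    fake_pred = model(fake_batch)
fake_probs = F.softmax(fake_pred[0], dim=0)
top_prob, top_idx = torch.max(fake_probs, 0)
print(f"{labels[top_idx]}: {top_prob.item()*100:.2f}%")
Note that saving to PNG and re-applying the Resize/CenterCrop transforms slightly alters the pixels, so the printed confidence may differ a little from the values seen earlier.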
Create Adversarial Examples against MobileNetV2
Reference: TensorFlow Docs
1. Load Pretrained Model (MobileNetV2)
import tensorflow as tf
pretrained_model = tf.keras.applications.MobileNetV2(include_top=True, weights='imagenet')
pretrained_model.trainable = False
# ImageNet labels
decode_predictions = tf.keras.applications.mobilenet_v2.decode_predictions
2. Prepare Original Image
First, we create helper functions to preprocess the image and to get the predicted label.
# Helper function to preprocess the image so that it can be fed into MobileNetV2
def preprocess(image):
    image = tf.cast(image, tf.float32)
    image = tf.image.resize(image, (224, 224))
    image = tf.keras.applications.mobilenet_v2.preprocess_input(image)
    image = image[None, ...]
    return image
# Helper function to extract labels from probability vector
def get_imagenet_label(probs):
    return decode_predictions(probs, top=1)[0][0]
Then load the original image and preprocess it.
orig_image_path = tf.keras.utils.get_file('YellowLabradorLooking_new.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg')
orig_image_raw = tf.io.read_file(orig_image_path)
orig_image = tf.image.decode_image(orig_image_raw)
orig_image = preprocess(orig_image)
orig_image_probs = pretrained_model.predict(orig_image)
To get the label of the image that the model predicted, execute the following code.
_, orig_image_class, orig_class_confidence = get_imagenet_label(orig_image_probs)
print(f"class: {orig_image_class}")
print(f"confidence: {orig_class_confidence}")
# The output
# class: Labrador_retriever
# confidence: 0.418184757232666
3. Create Adversarial Image with FGSM
From here, we create an adversarial image to fool the MobileNetV2 model. The following code creates the perturbations used to modify the original image.
# Instantiate a function that computes the crossentropy loss between labels and predictions.
loss_obj = tf.keras.losses.CategoricalCrossentropy()
def create_adversarial_pattern(input_image, input_label):
    # The gradient tape records the operations which are executed inside it.
    with tf.GradientTape() as tape:
        tape.watch(input_image)
        prediction = pretrained_model(input_image)
        loss = loss_obj(input_label, prediction)
    # Get the gradients of the loss w.r.t. (with respect to) the input image.
    gradient = tape.gradient(loss, input_image)
    # Get the sign of the gradients to create the perturbation.
    signed_grad = tf.sign(gradient)
    return signed_grad
# The index of the label for labrador retriever
target_label_idx = 208
orig_label = tf.one_hot(target_label_idx, orig_image_probs.shape[-1])
orig_label = tf.reshape(orig_label, (1, orig_image_probs.shape[-1]))
perturbations = create_adversarial_pattern(orig_image, orig_label)
Now create adversarial examples and have the model classify them while increasing epsilon.
# Epsilons are the perturbation magnitudes (small numbers)
epsilons = [0, 0.01, 0.1, 0.15]
for i, eps in enumerate(epsilons):
    adv_image = orig_image + eps*perturbations
    adv_image = tf.clip_by_value(adv_image, -1, 1)
    # Predict the label and the confidence for the adversarial image
    _, label, confidence = get_imagenet_label(pretrained_model.predict(adv_image))
    print(f"predicted label: {label}")
    print(f"confidence: {confidence*100:.2f}%")
    print("-"*128)
The output should look like the following.
1/1 [==============================] - 0s 25ms/step
predicted label: Labrador_retriever
confidence: 41.82%
--------------------------------------------------------------------------------------------------------------------------------
1/1 [==============================] - 0s 27ms/step
predicted label: Saluki
confidence: 13.08%
--------------------------------------------------------------------------------------------------------------------------------
1/1 [==============================] - 0s 24ms/step
predicted label: Weimaraner
confidence: 15.13%
--------------------------------------------------------------------------------------------------------------------------------
1/1 [==============================] - 0s 26ms/step
predicted label: Weimaraner
confidence: 16.58%
--------------------------------------------------------------------------------------------------------------------------------
As shown above, the adversarial examples were assigned labels different from the prediction for the original image (Labrador retriever).
To display the final adversarial image, rescale it from the [-1, 1] range used by MobileNetV2 back to [0, 1] and pass it to matplotlib.
import matplotlib.pyplot as plt
plt.imshow(adv_image[0] * 0.5 + 0.5)
4. Save/Load the Adversarial Image
We can save the generated adversarial image as below.
tf.keras.utils.save_img("fake.png", adv_image[0])
To load this image, use Pillow.
from PIL import Image
fake_img = Image.open("fake.png")
fake_img
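To confirm that the saved image still fools the model, we can feed it back in. This is a minimal sketch that assumes pretrained_model, preprocess, and get_imagenet_label from the earlier steps are still in scope:
import numpy as np
# Convert the reloaded image to an array and classify it again
fake_array = np.array(fake_img)  # (H, W, C) uint8
fake_batch = preprocess(fake_array)
_, fake_label, fake_confidence = get_imagenet_label(pretrained_model.predict(fake_batch))
print(f"predicted label: {fake_label}")
print(f"confidence: {fake_confidence*100:.2f}%")
Because the image was quantized to 8-bit when saved, the confidence may differ slightly from the value printed in the previous step.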