
Refine README #16

Merged
merged 16 commits into from
Dec 5, 2024
7 changes: 7 additions & 0 deletions Dockerfile
@@ -0,0 +1,7 @@
FROM python:3.12-slim

WORKDIR /app

COPY . /app/

RUN pip install -e .
Binary file added FGSM_panda.png
103 changes: 89 additions & 14 deletions README.md
@@ -2,11 +2,17 @@
A library for conducting adversarial attacks on PyTorch image classifier models.

## Overview
Adversarial Attack is a Python library that provides a simple API and CLI for conducting adversarial attacks on PyTorch image classifier models. The library supports both standard and targeted attacks using the Fast Gradient Sign Method (FGSM) algorithm (https://arxiv.org/abs/1412.6572).
Adversarial Attack is a Python library that provides a simple API and CLI for conducting adversarial Fast Gradient Sign Method (FGSM) (https://arxiv.org/abs/1412.6572) attacks on PyTorch image classifier models.

Given a pre-trained PyTorch model and an input image, the library generates an adversarial image that is misclassified by the model but looks almost identical to the original image.
The FGSM paper demonstrates that it is possible to generate adversarial examples by adding small perturbations to the input image that are imperceptible to the human eye but cause the model to misclassify the image.
This is accomplished by taking the gradient of the loss function with respect to the input image and then adding a small perturbation in the direction that increases the loss the most.
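As a rough illustration (not the library's actual implementation), the core of a single FGSM step can be sketched in PyTorch as follows; `model`, `image`, and `label` are assumed to be an already-loaded classifier, a preprocessed input tensor, and a tensor containing the true class index (e.g., `torch.tensor([class_idx])`):

```python
import torch.nn.functional as F

def fgsm_step(model, image, label, epsilon=1e-3):
    """One FGSM step: perturb the image in the direction that increases the loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)  # loss w.r.t. the true class
    loss.backward()                              # gradient of the loss w.r.t. the input pixels
    # Step each pixel by +epsilon in the direction of the gradient's sign,
    # i.e. the direction that increases the loss the most.
    # (The library additionally clamps the perturbed tensor to keep values in range.)
    return (image + epsilon * image.grad.sign()).detach()
```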

The library comes with a set of pre-trained PyTorch models (e.g., ResNet18, ResNet50) and utility functions for loading images, preprocessing images. However users can also use their own models and images but must include their own preprocessing and loading steps (see *Running via API* section).
![FGSM panda example from the original paper](FGSM_panda.png)
The above image (taken from the FGSM paper) illustrates the result of an FGSM attack that tricks the model into classifying a panda as a gibbon.

The library implements both a standard FGSM attack and a targeted FGSM attack. The standard attack aims to generate an adversarial image that is misclassified as any incorrect category, while the targeted attack aims to generate an adversarial image that is misclassified as a specific target category.
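A minimal sketch of the targeted variant (same assumptions as the sketch above, with `target_label` holding the desired target class index) differs only in the loss term and the sign of the step:

```python
import torch.nn.functional as F

def targeted_fgsm_step(model, image, target_label, epsilon=1e-3):
    """One targeted FGSM step: nudge the image toward a chosen target class."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), target_label)  # loss w.r.t. the *target* class
    loss.backward()
    # Subtract the signed gradient to decrease the loss for the target class,
    # pushing the prediction toward it.
    return (image - epsilon * image.grad.sign()).detach()
```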

The library comes with a set of pre-trained PyTorch models (e.g., ResNet18, ResNet50) and utility functions for loading and preprocessing images. However, users can also supply their own models and images, provided they include their own preprocessing and loading steps (see the *Running via API* section).
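For users bringing their own models and images, a typical torchvision preprocessing pipeline for ImageNet-trained ResNets looks like the sketch below. This is illustrative only and is not necessarily identical to the library's built-in utilities:

```python
from PIL import Image
import torchvision.transforms as T

# Standard ImageNet preprocessing for ResNet-style models (illustrative values).
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("path/to/your_image.jpg").convert("RGB")
tensor = preprocess(image).unsqueeze(0)  # add a batch dimension: (1, 3, 224, 224)
```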

## Installation
Adversarial Attack can be installed by first cloning the repository and then installing dependencies using pip. It is recommended to use a virtual environment when installing dependencies.
@@ -34,17 +40,17 @@ python -m adversarial_attack --model <MODEL_NAME> --mode <MODE> --image <IMAGE_P
```
### Parameters:

- `--model, -m`: The model to attack (e.g., `resnet18`, `resnet50`).
- `--mode`: The type of attack:
- `--model, -m`: The model to attack (e.g., `resnet18`, `resnet50`) (required).
- `--mode`: The type of attack (optional):
- `standard`: Standard FGSM attack.
- `targeted`: Targeted FGSM attack (default).
- `--image, -i`: Path to the input image to attack.
- `--category-truth, -c`: The true class label of the image (e.g., `cat`).
- `--image, -i`: Path to the input image to attack (required).
- `--category-truth, -c`: The true class label of the image (e.g., `cat`) (required).
- `--category-target, -ct`: The target class label for the targeted attack (only required for targeted mode).
- `--epsilon, -eps`: The epsilon value for the attack (default: `1.0e-3`).
- `--max-iterations, -it`: Maximum number of iterations for the FGSM attack (default: `50`).
- `--output, -o`: Path to save the resulting adversarial image.
- `--log, -l`: Log level (e.g., `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`).
- `--epsilon, -eps`: The epsilon value for the attack (optional, default: `1.0e-3`).
- `--max-iterations, -it`: Maximum number of iterations for the FGSM attack (optional, default: `50`).
- `--output, -o`: Path to save the resulting adversarial image (optional).
- `--log, -l`: Log level (e.g., `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`) (optional).


## Running via API
@@ -86,12 +92,81 @@ if result_image is not None:
result_image.save('path/to/output.jpg')
```

## Examples
## Running CLI via Docker

The library can also be run via Docker. To build the Docker image, run the following command:

```bash
docker build -t adversarial_attack .
```

The latest version of the library is available on Docker Hub. To pull the image, run the following command:

```bash
docker pull tomcarter23/adversarial_attack
```

To run the Docker container and CLI, use the following command as an example:

```bash
docker run -v /path/to/images:/app/images -v /path/to/output:/app/output tomcarter23/adversarial_attack python -m adversarial_attack --model resnet50 --mode targeted --image ./images/goldfish.JPEG --category-truth goldfish --category-target hare --epsilon 1.0e-3 --max-iterations 50 --output ./output/adversarial_goldfish.JPEG --log DEBUG

```

The command mounts the local `/path/to/images` directory to `/app/images` in the container and the local `/path/to/output` directory to `/app/output`. It then runs the adversarial attack on the `goldfish.JPEG` image in the mounted images directory and saves the resulting adversarial image as `adversarial_goldfish.JPEG` in the mounted output directory.

## Example

The `sample_images/imagenet` directory contains a set of example images from the ILSVRC2012 ImageNet validation dataset, which comes from the same dataset that the pre-trained models were trained on.
The images are named according to their true class label (e.g., `lawn_mower_ILSVRC2012_val_00020327.JPEG`), where the true class label is the part of the filename before the `ILSVRC2012` identifier.
The class labels used by the provided models do not contain underscores (e.g., `lawn mower`), so this format should be used when passing these sample images for testing.
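For example, the expected label for a sample image can be recovered from its filename with a small helper like this (a hypothetical snippet, shown only to illustrate the naming convention):

```python
from pathlib import Path

def label_from_filename(path: str) -> str:
    """Recover the true class label from a sample image filename."""
    name = Path(path).name                       # "lawn_mower_ILSVRC2012_val_00020327.JPEG"
    prefix = name.split("ILSVRC2012")[0]         # "lawn_mower_"
    return prefix.rstrip("_").replace("_", " ")  # "lawn mower"

print(label_from_filename("sample_images/imagenet/lawn_mower_ILSVRC2012_val_00020327.JPEG"))
```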

The following command demonstrates how to run a targeted adversarial attack on a ResNet50 model using the `hare` image from the `sample_images` directory.
The target category is `goldfish`, and the epsilon value is `1.0e-3` with a maximum of `50` iterations. The resulting adversarial image is saved as `output_images/hare_to_goldfish.JPEG`.

```bash
python -m adversarial_attack --model resnet50 --mode targeted --image sample_images/imagenet/hare_ILSVRC2012_val_00004064.JPEG --category-truth hare --category-target goldfish --epsilon 1.0e-3 --max-iterations 50 --output output_images/hare_to_goldfish.JPEG --log DEBUG
```

The following table shows the original and perturbed images generated by the above command:


| Original Image | Perturbed Image |
|----------------------------------------------------------------------|-------------------------------------------------|
| Classification: `hare` | Classification: `goldfish` |
| Confidence: 0.44 | Confidence: 0.25 |
| ![Image 1](sample_images/imagenet/hare_ILSVRC2012_val_00004064.JPEG) | ![Image 2](output_images/hare_to_goldfish.JPEG) |

As can be seen, the perturbed image is misclassified by the model as the target category `goldfish`, yet it looks almost identical to the original image, demonstrating the effectiveness of the adversarial attack and the limitations of the model.

NOTE: The perturbed image is a cropped version of the original image due to the preprocessing steps needed to pass the image through the model. The perturbed image could be upscaled using information from the original image; this is left as future work.

## Testing

### Unit Tests

The library comes with a set of unit tests that can be run using the following command:

```bash
pytest tests/unit -v
```

These unit tests cover the core functionality of the library, including loading models, images, and performing adversarial attacks.

The tests use mocking where appropriate to isolate the testing of individual components of the library.

### End-to-End Tests


The library also comes with a set of end-to-end tests that can be run using the following command:

```bash
pytest tests/e2e -v
```

The end-to-end tests are broken down into two categories: standard and targeted attacks.

The standard attack tests measure the success rate of the standard FGSM attack on the set of `sample_images` for the default models `resnet50`, `resnet101`, and `resnet152`. We observe a success rate of `96.30%` across all 27 tests.

The targeted attack tests measure the success rate of the targeted FGSM attack on the set of `sample_images` for the default models `resnet50`, `resnet101`, and `resnet152`. For each image and model, the targeted attack is run using each of the remaining 8 categories represented in the `sample_images` directory as the target category. We observe a success rate of `89.63%` across all 270 tests.

Failures for each type of test typically occur when the model's original prediction does not match the true category of the image. When this occurs, performing an attack is pointless and the attack is counted as a failure.
2 changes: 1 addition & 1 deletion adversarial_attack/__main__.py
@@ -94,7 +94,7 @@ def main():
)
parser.add_argument(
"--log",
default="WARNING",
default="INFO",
help=f"Set the logging level. Available options: {list(logging._nameToLevel.keys())}",
)

6 changes: 3 additions & 3 deletions adversarial_attack/api.py
@@ -50,9 +50,9 @@ def perform_attack(

if results is not None:
new_image, orig_pred_idx, new_pred_idx = results
print("Adversarial attack succeeded!")
print(f"Original Prediction: {categories[orig_pred_idx]}")
print(f"New Prediction: {categories[new_pred_idx]}")
logger.info("Adversarial attack succeeded!")
logger.info(f"Original Prediction: {categories[orig_pred_idx]}")
logger.info(f"New Prediction: {categories[new_pred_idx]}")

return new_image

21 changes: 15 additions & 6 deletions adversarial_attack/fgsm.py
@@ -78,12 +78,15 @@ def standard_attack(

for i in range(max_iter):
model.zero_grad()
grad = compute_gradient(
model=model, input=adv_tensor, target=torch.tensor([orig_pred_idx])
)
grad = compute_gradient(model=model, input=adv_tensor, target=torch.tensor([orig_pred_idx]))
adv_tensor = torch.clamp(adv_tensor + epsilon * grad.sign(), -2, 2)
new_pred_idx = model(adv_tensor).argmax()
new_output = model(adv_tensor)
new_pred_idx = new_output.argmax()
logger.debug(
f"attack iteration {i}, current prediction: {new_pred_idx}, current max probability: {torch.nn.functional.softmax(new_output, dim=1).max()}"
)
if orig_pred_idx != new_pred_idx:
logger.info(f"Standard attack successful.")
return adv_tensor, orig_pred_idx, new_pred_idx

logger.warning(
@@ -124,6 +127,7 @@ def targeted_attack(

orig_pred_idx: int = orig_pred.argmax().item()
truth_idx: int = truth.item()
target_idx: int = target.item()

if orig_pred_idx != truth_idx:
logger.warning(
@@ -141,8 +145,13 @@
model.zero_grad()
grad = compute_gradient(model=model, input=adv_tensor, target=target)
adv_tensor = torch.clamp(adv_tensor - epsilon * grad.sign(), -2, 2)
new_pred_idx = model(adv_tensor).argmax().item()
if orig_pred_idx != new_pred_idx:
new_output = model(adv_tensor)
new_pred_idx = new_output.argmax(dim=1).item()
logger.debug(
f"Attack iteration {i}, target: {target_idx}, current prediction: {new_pred_idx}, current max probability: {torch.nn.functional.softmax(new_output, dim=1).max()}"
)
if new_pred_idx == target_idx:
logger.info(f"Targeted attack successful.")
return adv_tensor, orig_pred_idx, new_pred_idx

logger.warning(
Binary file added output_images/hare_to_goldfish.JPEG
12 changes: 12 additions & 0 deletions tests/e2e/test_end2end.py
@@ -64,6 +64,12 @@ def test_perform_attack_standard():
max_iter=10,
)

if result is not None:
# if attack is successful, the new prediction should be different from the true category
# feed the adversarial image to the model and get the new prediction
new_pred_idx = model(result).argmax().item()
assert new_pred_idx != categories.index(true_category), "Attack should change the model prediction."

total_tests += 1
if result is not None:
success_count += 1
@@ -151,6 +157,12 @@ def test_perform_attack_targeted():
max_iter=10,
)

if result is not None:
# if attack is successful, the new prediction should be the target category
# feed the adversarial image to the model and get the new prediction
new_pred_idx = model(result).argmax().item()
assert new_pred_idx == categories.index(target_category), "Attack should change the model prediction to target."

total_tests += 1
if result is not None:
success_count += 1