Sanity checks #8

Merged
martinmatak merged 12 commits into master from sanity-checks on Apr 12, 2019
Conversation

martinmatak
Collaborator

Three tests introduced (all passing); a rough sketch of the first two is shown after the list:

GIVEN an NN and fixed hyperparameters for the FGSM attack
WHEN the attack is executed twice against the NN
THEN the results should be exactly the same

GIVEN the same random seed and hyperparameters
WHEN two neural networks are trained
THEN they should be exactly the same

GIVEN a sufficient number of training epochs
WHEN two neural networks with the same architecture are trained (using different seeds)
THEN they should have similar accuracy
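
A minimal sketch of how the first two checks might be expressed (the helpers `train_model`, `load_test_data`, and `fgsm_attack` are assumptions, not the project's actual API):

```python
import numpy as np

def test_fgsm_attack_is_deterministic():
    # GIVEN a trained NN and fixed hyperparameters for the FGSM attack
    model = train_model(seed=0, epochs=10)        # hypothetical training helper
    x_test, y_test = load_test_data()             # hypothetical data loader

    # WHEN the attack is executed twice against the same NN
    adv_1 = fgsm_attack(model, x_test, eps=0.1)   # hypothetical attack wrapper
    adv_2 = fgsm_attack(model, x_test, eps=0.1)

    # THEN the results should be exactly the same
    assert np.array_equal(adv_1, adv_2)

def test_training_is_deterministic():
    # GIVEN the same random seed and hyperparameters
    model_1 = train_model(seed=42, epochs=10)
    model_2 = train_model(seed=42, epochs=10)

    # THEN they should be exactly the same (identical weights; assumes a Keras-style model)
    for w_1, w_2 in zip(model_1.get_weights(), model_2.get_weights()):
        assert np.array_equal(w_1, w_2)
```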

@zvonimir
Member

zvonimir commented Apr 4, 2019

These are not all the tests we discussed, right?

@martinmatak
Collaborator Author

martinmatak commented Apr 4, 2019 via email

@zvonimir
Member

zvonimir commented Apr 4, 2019

GIVEN fixed hyperparameters for the FGSM attack and a sufficient number of training epochs
WHEN two neural networks with the same architecture are trained (using different seeds)
THEN the results of the attacks should be completely (or almost) the same

@martinmatak
Collaborator Author

@zvonimir Thank you, I updated the PR based on your comment.

I wasn't certain how to precisely assert "THEN results of attacks should be completely (or almost) the same", so I did the following (a rough sketch of these checks is shown after the list):

  • accuracies of NNs before the attack must be similar [3% diff allowed]
  • accuracies of NNs after the attack must be similar [3% diff allowed]
  • perturbations introduced must be similar (i.e. mean, std dev, min and max of differences between adv samples and legit samples) [1% diff allowed]
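
Roughly, the checks reduce to assertions like these (a sketch only; `acc_legit_*`, `acc_adv_*`, `adv_*`, and `x_test` stand for values the actual tests compute):

```python
import numpy as np

ACC_TOL = 0.03    # 3% difference allowed between accuracies
PERT_TOL = 0.01   # 1% difference allowed between perturbation statistics

# accuracies of the NNs on legit samples must be similar
assert abs(acc_legit_1 - acc_legit_2) <= ACC_TOL
# accuracies of the NNs on their respective adv samples must be similar
assert abs(acc_adv_1 - acc_adv_2) <= ACC_TOL

# perturbation statistics (mean, std dev, min, max of adv - legit) must be similar
pert_1 = adv_1 - x_test
pert_2 = adv_2 - x_test
for stat in (np.mean, np.std, np.min, np.max):
    assert abs(stat(pert_1) - stat(pert_2)) <= PERT_TOL
```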

Should I add something else? What do you think?

@zvonimir
Member

zvonimir commented Apr 4, 2019

Why would accuracies of NNs change before and after the attack? I mean, your target NNs remain constant. So I don't get that part.

Yep, perturbations should be the same. Meaning that the generated pairs of adversarial images should be (almost) identical. So not just average diff and so on, but the actual adversarial images should be the same. Georg mentioned opening a few pairs of images in Photoshop and doing a diff there to make sure they are the same.

@martinmatak
Collaborator Author

> Why would accuracies of NNs change before and after the attack? I mean, your target NNs remain constant. So I don't get that part.

Sorry, I didn't express myself precisely enough. What I meant is the following:

  • Accuracy of NN_1 and NN_2 measured on legit samples should be similar
  • Accuracy of NN_1 measured on adv samples crafted for NN_1 should be similar to accuracy of NN_2 measured on adv samples crafted for NN_2

> Meaning that the generated pairs of adversarial images should be (almost) identical. So not just average diff and so on, but the actual adversarial images should be the same. Georg mentioned opening a few pairs of images in Photoshop and doing a diff there to make sure they are the same.

I added plotting of samples (image below).

Four columns in the image below represent the following:

  1. original sample
  2. adv sample for NN_1
  3. adv sample for NN_2
  4. absolute difference between the adv samples from columns 2 and 3

[image: mnist-fashion-result]
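
A grid like the one above could be produced with something along these lines (a matplotlib sketch under assumed variable names: `x`, `adv_1`, `adv_2` are the original and the two adversarial sample arrays):

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_comparison(x, adv_1, adv_2, n_rows=5):
    # one row per sample; columns: original, adv for NN_1, adv for NN_2, |adv_1 - adv_2|
    fig, axes = plt.subplots(n_rows, 4, figsize=(8, 2 * n_rows))
    for i in range(n_rows):
        images = [x[i], adv_1[i], adv_2[i], np.abs(adv_1[i] - adv_2[i])]
        for ax, img in zip(axes[i], images):
            ax.imshow(img.squeeze(), cmap='gray')
            ax.axis('off')
    plt.show()
```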

Regarding completely identical (pixel values of) adversarial samples: they occur only when the attack is executed against the same NN twice (or against two NNs with the same weights, i.e. trained with the same seed etc., which is effectively the same NN, as verified in https://github.com/soarlab/AAQNN/pull/8/files#diff-d0b33b1baec7d17a5a87a9ce85c0f612).

This is verified with this assertion:

assert np.array_equal(adv_1, adv_2)

and I added plotting of that (image below). The columns represent the same values as in the previous image.

[image: same-images]

Do you maybe have any other idea for a sanity check? To me the attack seems good for our use case.

@martinmatak
Collaborator Author

martinmatak commented Apr 5, 2019

Now that I think about it, it might be the case that the perturbation introduced by this attack is always of the same size, because FGSM just shifts each pixel by eps according to the sign of the gradient.
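
That matches the FGSM step itself: every pixel moves by exactly eps (before clipping), so the perturbation magnitude is fixed by eps and does not depend on which network produced the gradient. A sketch of that step, assuming the gradient of the loss w.r.t. the input is already available:

```python
import numpy as np

def fgsm_perturb(x, grad, eps):
    # FGSM: x_adv = x + eps * sign(grad_x loss); before clipping, |x_adv - x| == eps
    # for every pixel, regardless of which model produced the gradient.
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)
```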

Nevertheless, if we measure robustness per quantization level, it's still a suitable attack.

I believe an optimization-based approach would be more informative regarding the needed perturbation, i.e. results could vary depending on the quantization. For instance, #7.

@zvonimir
Member

zvonimir commented Apr 5, 2019

I think we should maybe move this exchange to email so that Georg can participate as well. Could you please summarize all this in an email to Georg and me? Thanks!

@martinmatak
Collaborator Author

@zvonimir can I merge this branch?

@zvonimir
Member

Yes, please go ahead and merge.

@martinmatak martinmatak merged commit c39bf5d into master Apr 12, 2019
@martinmatak martinmatak deleted the sanity-checks branch April 12, 2019 09:18