add Min-K%++ attack #19

Merged: 2 commits into iamgroot42:main, Apr 5, 2024
Conversation

@zjysteven (Contributor)

Hi @iamgroot42, first of all, thanks for putting up this unified benchmark.

Overview

This pull request adds our proposed method, Min-K%++, which outperforms other reference-free methods and performs on par with the Ref method on MIMIR. See the Experiments section below for details.

Changed files

We follow the contribution instructions to create attacks/mink_plus_plus.py and register the method in attacks/all_attacks.py and attacks/utils.py. Since our method needs the full probability distribution over the vocabulary, we modify the get_probabilities function in models.py with an additional argument, return_all_probs, which makes it also return the full distribution; a sketch follows.
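
For reference, a minimal standalone sketch of what the return_all_probs extension might look like. Apart from the argument name and the call shape visible in the review thread below, the signature and tensor names here are our illustration, not the repo's exact code:

import torch
import torch.nn.functional as F

# Sketch only: the actual method in models.py is a class method taking
# (self, document, tokens, ...); this standalone version shows the idea.
def get_probabilities(logits: torch.Tensor, input_ids: torch.Tensor,
                      return_all_probs: bool = False):
    shift_logits = logits[..., :-1, :].contiguous()        # predictions for positions 1..T
    shift_labels = input_ids[..., 1:].contiguous()         # tokens actually observed
    all_log_probs = F.log_softmax(shift_logits, dim=-1)    # (T-1, vocab_size)
    target_log_probs = all_log_probs.gather(
        -1, shift_labels.unsqueeze(-1)).squeeze(-1)        # (T-1,)
    if return_all_probs:
        return target_log_probs, all_log_probs
    return target_log_probs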

Meanwhile, we change run.py accordingly to include our Min-K%++ attack. Additionally, we let both Min-K% and Min-K%++ run over a list of k values, so that the two methods can be compared at their respective best-performing k, i.e., by their upper bounds; see the sketch below.
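
For context, here is a hedged sketch of the Min-K%++ score with the k-sweep built in, following the scoring rule described in the paper; tensor names and the default k values are illustrative, not MIMIR's exact code:

import torch

# Min-K%++ sketch: normalize each token's log-prob by the mean and std of
# log-probs under the model's next-token distribution, then average the
# lowest k% of the normalized scores for each k in a list.
def mink_plus_plus(target_log_probs, all_log_probs, ks=(0.1, 0.2, 0.3)):
    probs = all_log_probs.exp()                                  # (T, V) softmax
    mu = (probs * all_log_probs).sum(dim=-1)                     # E[log p], shape (T,)
    sigma = ((probs * all_log_probs.square()).sum(dim=-1)
             - mu.square()).clamp(min=0).sqrt()                  # std of log p
    token_scores = (target_log_probs - mu) / sigma               # normalized per token
    return {k: token_scores.topk(max(1, int(k * len(token_scores))),
                                 largest=False).values.mean().item()
            for k in ks}

Taking each method's best score across the swept k values then gives the upper-bound comparison described above.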

Lastly, we update the README to incorporate Min-K%++.

Experiments

[Image: results table comparing Min-K%++ with the other attacks in MIMIR]
I'm putting this results table here in case MIMIR wants to add a leaderboard or similar. We use non-deduped Pythia models and the default ngram_13_0.8 setting; the data splits are fetched from the cache_100_200_1000_512 folder on Hugging Face.

Let us know if there are any issues. Otherwise, we look forward to Min-K%++ being integrated into the MIMIR benchmark.

@iamgroot42 (Owner)

Thanks a lot for the PR, @zjysteven! I have a couple of suggestions, which I have added via comments. Submitting such a well-structured PR is already contribution enough, so I would understand if you would rather have us make these changes, but it would be great if you could make them yourself.

@zjysteven (Contributor, Author)

Yes, I can make further changes. Somehow I'm not seeing your comments, though; could you point me to them?

# Reuse cached (log-)probabilities when available; otherwise recompute them.
target_prob, all_prob = (
    (probs, all_probs)
    if (probs is not None and all_probs is not None)
    else self.model.get_probabilities(document, tokens=tokens, return_all_probs=True)
)

@iamgroot42 (Owner)

Is there a reason for computing softmax and log_softmax separately? Given access to the latter (which is the default right now), an exp() operation should give you the desired softmax outputs for use in Min-K%++.

@zjysteven (Contributor, Author) commented Apr 4, 2024

I see. Earlier I misread and thought that we only had softmax, and since log_softmax is numerically more stable than log(softmax), I computed them separately (but yes, as you pointed out, what we already have is log_softmax). I will make the change.
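
A small illustration of the stability point, not taken from the repo: with extreme logits, log(softmax(x)) underflows where log_softmax stays finite.

import torch
import torch.nn.functional as F

logits = torch.tensor([[100.0, 0.0, -100.0]])
print(F.log_softmax(logits, dim=-1))           # ≈ [[0., -100., -200.]]
print(torch.log(F.softmax(logits, dim=-1)))    # ≈ [[0., -100., -inf]] (underflow)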

@zjysteven (Contributor, Author)

I do want to clarify here that while target_prob holds the log probability of each input token, all_prob holds the log probabilities of every token in the vocabulary at each position (the whole categorical distribution); see the illustration below.
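
Concretely, the shapes relate as follows (made-up tensors and sizes, purely for illustration):

import torch

seq_len, vocab_size = 8, 50304   # illustrative sizes
all_prob = torch.randn(seq_len, vocab_size).log_softmax(dim=-1)         # (seq_len, vocab_size)
token_ids = torch.randint(vocab_size, (seq_len,))
target_prob = all_prob.gather(-1, token_ids.unsqueeze(-1)).squeeze(-1)  # (seq_len,)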

run.py (outdated)
loss=loss,
)
sample_information[attack].append(score)
if attack in [AllAttacks.MIN_K, AllAttacks.MIN_K_PLUS_PLUS]:

@iamgroot42 (Owner)

Our intent with the structure was to have as few attack-specific checks as possible. We plan on adding attack-specific hyper-parameters to the config file, so that any changes an attack requires here can be achieved through different config files rather than hard-coded hyper-parameters and switch-cases in the code!
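
A hypothetical illustration of that direction; the eventual MIMIR config schema may well look different:

# Hypothetical attack config: hyper-parameters live in the config, so run.py
# can stay generic and free of attack-specific branches.
config = {
    "attacks": {
        "min_k":   {"k": [0.1, 0.2, 0.3, 0.4, 0.5]},
        "min_k++": {"k": [0.1, 0.2, 0.3, 0.4, 0.5]},
    }
}
for attack_name, params in config["attacks"].items():
    for k in params["k"]:
        pass  # instantiate and run the attack with these hyper-parameters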

@zjysteven (Contributor, Author)

That's also a valid point. I will just remove this modification so that run.py follows the current design.

@iamgroot42 (Owner)

Thanks! They should be visible now.

@zjysteven (Contributor, Author) commented Apr 4, 2024

I've made the changes accordingly in the latest commit. @iamgroot42, would you take another look?

@@ -116,7 +113,6 @@ def get_probabilities(self,
         if no_grads:
             logits = logits.cpu()
         shift_logits = logits[..., :-1, :].contiguous()
-        probabilities = torch.nn.functional.softmax(shift_logits, dim=-1)

@zjysteven (Contributor, Author)

As suggested above, we don't need to call softmax separately, given that log_softmax is computed anyway; the attack can recover softmax outputs downstream.
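
Where softmax outputs are still needed (e.g., for the mean/std terms in Min-K%++), they can be recovered from the log_softmax output with a single exp(); a quick sanity check:

import torch

log_probs = torch.randn(4, 10).log_softmax(dim=-1)
probs = log_probs.exp()                      # same as softmax of the original logits
assert torch.allclose(probs.sum(dim=-1), torch.ones(4))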

@iamgroot42 (Owner)

Thanks for making the changes! I will run our standard workflow and merge it.

@iamgroot42 merged commit f50d0b3 into iamgroot42:main on Apr 5, 2024 (1 check passed).