add Min-K%++ attack #19
Conversation
Thanks a lot for the PR, @zjysteven! I have a couple of suggestions that I have added via comments. Submitting such a well-structured PR is already a big help, and I would understand if you would rather have us make these changes, but it would be great if you could make them yourself.

Yes, I can make further changes. Somehow I'm not seeing your comments. Would you please point me to them?
```python
target_prob, all_prob = (
    (probs, all_probs)
    if (probs is not None and all_probs is not None)
    else self.model.get_probabilities(document, tokens=tokens, return_all_probs=True)
)
```
Is there a reason for computing `softmax` and `log_softmax` separately? Given access to the latter (which is the default right now), an `exp()` operation should give you the desired softmax outputs for use in Min-K%++.
I see. Earlier I misread and thought that we had softmax outputs, and since `log_softmax` is numerically more stable than `log(softmax)` I computed them separately (but yes, as you pointed out, what we already have is `log_softmax`). I will make the change.
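For what it's worth, the equivalence is easy to verify: `exp` of a `log_softmax` output reproduces `softmax` exactly, whereas `log(softmax(x))` can underflow when probabilities are tiny. A standalone sketch (dummy shapes, not code from this PR):

```python
import torch

x = torch.randn(4, 32000)                 # dummy logits, (batch, vocab)
log_probs = torch.log_softmax(x, dim=-1)  # numerically stable log-probabilities
probs = log_probs.exp()                   # recovers softmax(x) without a second pass
assert torch.allclose(probs, torch.softmax(x, dim=-1))
```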
I do want to clarify here that while `target_prob` is the log probability of each input token, `all_prob` holds the log probabilities of every token in the vocabulary (the whole categorical distribution).
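To make that distinction concrete, here is a minimal sketch of how the two quantities relate; the shapes and the label tensor are placeholders, not the PR's actual code:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 32000)          # (seq_len, vocab), dummy values
labels = torch.randint(0, 32000, (8,))  # hypothetical observed token ids

# all_prob: full log-distribution at each position; target_prob: the entry
# of that distribution corresponding to the token actually observed.
all_prob = F.log_softmax(logits, dim=-1)
target_prob = all_prob.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
```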
run.py (Outdated)
```python
    loss=loss,
)
sample_information[attack].append(score)
if attack in [AllAttacks.MIN_K, AllAttacks.MIN_K_PLUS_PLUS]:
```
Our intent with the structure was to have as few attack-specific checks as possible. We plan on adding attack-specific hyper-parameters to the config file, so that any changes required for an attack can be achieved via different config files rather than hard-coded hyper-parameters and switch-cases in the code!
Also a valid point. I will just remove this modification so `run.py` follows the current design.
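As a rough illustration of that config-driven design (the `attack_params` key and every name below are hypothetical, not MIMIR's actual schema):

```python
# Hypothetical sketch: per-attack hyper-parameters live in the experiment
# config, so the run loop stays free of attack-specific branches.
config = {
    "attack_params": {
        "min_k": {"k": [0.1, 0.2, 0.3]},
        "min_k++": {"k": [0.1, 0.2, 0.3]},
    }
}

def run_attack(attack: str, score_fn, **inputs):
    params = config["attack_params"].get(attack, {})  # no switch-case needed
    return score_fn(**inputs, **params)
```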
Thanks! They should be visible now.

I've made changes accordingly in the latest commit. @iamgroot42 Would you take another look?
```diff
@@ -116,7 +113,6 @@ def get_probabilities(self,
         if no_grads:
             logits = logits.cpu()
         shift_logits = logits[..., :-1, :].contiguous()
-        probabilities = torch.nn.functional.softmax(shift_logits, dim=-1)
```
As suggested here, we don't need to separately call `softmax` given that `log_softmax` will be computed anyway.
Thanks for making the changes! Will run our standard workflow and merge it.
Hi @iamgroot42, first of all, thanks for putting up this unified benchmark.
Overview
This pull request adds our proposed method, Min-K%++, which outperforms other reference-free methods and performs on par with the Ref method on MIMIR. See the Experiments section below for details.
Changed files
We follow the instructions to create `attacks/mink_plus_plus.py` and register the method in `attacks/all_attacks.py` and `attacks/utils.py`. Our method needs the full probabilities over the vocabulary, so we modify the `get_probabilities` function in `models.py` with an additional argument `return_all_probs` to return the full probabilities.

Meanwhile, we change `run.py` accordingly to include our Min-K%++ attack. Additionally, we let both Min-K% and Min-K%++ run over a list of `k` values, which allows the two methods to be compared by their upper bounds.

Lastly, we update the README to incorporate Min-K%++.
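For reference, the per-token score behind Min-K%++ normalizes each token's log probability by the mean and standard deviation of the log probabilities under the model's full next-token distribution, then averages the lowest k% of scores. A minimal sketch of that computation (the function name and tensor shapes are illustrative, not the exact code in `attacks/mink_plus_plus.py`):

```python
import torch

def mink_plus_plus_score(target_logprobs: torch.Tensor,
                         all_logprobs: torch.Tensor,
                         k: float = 0.2) -> float:
    # target_logprobs: (seq_len,) log-probability of each observed token
    # all_logprobs:    (seq_len, vocab) log-probabilities over the full vocabulary
    probs = all_logprobs.exp()                       # softmax recovered via exp(log_softmax)
    mu = (probs * all_logprobs).sum(dim=-1)          # E[log p(z)] at each position
    var = (probs * all_logprobs.square()).sum(dim=-1) - mu.square()
    sigma = var.clamp(min=1e-12).sqrt()              # std of log p(z) at each position
    scores = (target_logprobs - mu) / sigma          # normalized per-token scores
    num = max(1, int(k * scores.numel()))
    return scores.topk(num, largest=False).values.mean().item()  # mean of lowest k%
```

The classic Min-K% score is the same bottom-k% aggregation applied directly to `target_logprobs` without the normalization, which is why running both attacks over the same list of `k` values gives a clean comparison.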
Experiments
I'm putting this result table here in case MIMIR wants to have a leaderboard or something. We are using non-deduped Pythia models and the default `ngram_13_0.8` setting. The data splits are fetched from the `cache_100_200_1000_512` folder on Hugging Face.

Let us know if there are any issues. Otherwise, we look forward to Min-K%++ being integrated into the MIMIR benchmark.