
how to load and use Transformers version model #1

Open
schmidtj3 opened this issue Jun 5, 2024 · 7 comments

@schmidtj3

@robinzixuan Hello authors,
I came across this arXiv paper, which mentions the use of this model, and I would like to know how to use it to reproduce the retrieval results in the paper.

Specifically, I'm looking at the magicslabnu/OutEffHop_bert_base model card (the one used in the paper?) on the Hugging Face Hub. Could you provide instructions on how to load and use this model with the Transformers package, and how to reproduce the results in the above-mentioned paper?

Thank you!

schmidtj3 changed the title from "how to load Transformer model" to "how to load and use Transformers version model" on Jun 5, 2024
@robinzixuan
Contributor

Thanks for your message. We will upload the model file to Hugging Face next week, so you will be able to use it from there shortly. However, you can already reproduce the results now. As we mention in the paper, attention is a special case of the Hopfield layer, and the BERT model is built on the attention architecture, so you can simply change the vanilla softmax to softmax_1 in the BERT model to get the OutEffHop version of BERT. After that, you can reproduce the results from the Hugging Face checkpoints (load them into the changed architecture). If you have more questions, feel free to contact me directly at [email protected]
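For reference, a minimal sketch of softmax_1 as described above (assuming the usual definition: a softmax whose denominator gains an extra +1 term, equivalent to appending a constant zero logit); the max-subtraction is only for numerical stability:

```python
import torch

def softmax_1(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """softmax_1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)).

    Unlike the vanilla softmax, the outputs can sum to less than 1,
    letting an attention head assign (near-)zero total weight.
    """
    # Subtract m = max(x, 0); multiplying the numerator and denominator
    # by exp(m) shows this equals the definition above.
    m = x.max(dim=dim, keepdim=True).values.clamp(min=0)
    ex = torch.exp(x - m)
    return ex / (torch.exp(-m) + ex.sum(dim=dim, keepdim=True))
```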

robinzixuan self-assigned this on Jun 5, 2024
@schmidtj3
Author


Thank you for your response! Could you also list the steps to load the model from the Hugging Face Hub? And just to confirm: does this mean the model weights that reproduce the retrieval results in the paper will be uploaded to the Hugging Face Hub next week?
Thank you!

@robinzixuan
Contributor

Sorry for the confusion; the model weights are already on Hugging Face. But as you know, if we load the weights directly, Transformers will give us the vanilla version of the model (BERT), so we have to change the code for that.
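For example (a minimal sketch using the magicslabnu/OutEffHop_bert_base id from the question above), loading the checkpoint is the standard Transformers call, but on its own it gives back the vanilla architecture:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# This alone yields a vanilla BERT; the attention softmax still has
# to be swapped for softmax_1 afterwards (see the code further down).
tokenizer = AutoTokenizer.from_pretrained("magicslabnu/OutEffHop_bert_base")
model = AutoModelForMaskedLM.from_pretrained("magicslabnu/OutEffHop_bert_base")
```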

@robinzixuan
Contributor

modeling_bert.py.zip
You can use this file directly for the OutEffHop version of the BERT model. In our experiments, we used hooks to replace the softmax with softmax_1; you can do the same.
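(For readers without the attachment: one simple alternative, not the authors' exact hook mechanism, is to temporarily monkey-patch torch.nn.functional.softmax, since modeling_bert resolves that function through the module attribute at call time. A sketch, reusing softmax_1 from the earlier snippet:)

```python
import contextlib
import torch.nn.functional as F

@contextlib.contextmanager
def patched_softmax(softmax_fn):
    """Temporarily route F.softmax to softmax_fn.

    Caveats: the patch is process-wide while active, and softmax_fn
    must accept the (input, dim=...) call used inside modeling_bert.
    """
    original = F.softmax
    F.softmax = softmax_fn
    try:
        yield
    finally:
        F.softmax = original

# Usage, with softmax_1 defined as in the earlier sketch:
# with patched_softmax(softmax_1):
#     outputs = model(**tokenizer("hello world", return_tensors="pt"))
```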

@robinzixuan
Contributor

Sorry for the delay in the Hugging Face update; I have my qualifying exam this week. For now, here is the loading-and-replacement code we use:
```python
if model_args.model_name_or_path:
    torch_dtype = (
        model_args.torch_dtype
        if model_args.torch_dtype in ["auto", None]
        else getattr(torch, model_args.torch_dtype)
    )
    model = AutoModelForMaskedLM.from_pretrained(
        model_args.model_name_or_path,
        from_tf=bool(".ckpt" in model_args.model_name_or_path),
        config=config,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        token=model_args.token,
        trust_remote_code=model_args.trust_remote_code,
        torch_dtype=torch_dtype,
        low_cpu_mem_usage=model_args.low_cpu_mem_usage,
    )
else:
    logger.info("Training new model from scratch")
    model = AutoModelForMaskedLM.from_config(config, trust_remote_code=model_args.trust_remote_code)

# >> replace Self-attention module with ours
# NOTE: currently assumes BERT
for layer_idx in range(len(model.bert.encoder.layer)):
    old_self = model.bert.encoder.layer[layer_idx].attention.self
    print("----------------------------------------------------------")
    print("Inside BERT custom attention")
    print("----------------------------------------------------------")
    new_self = BertUnpadSelfAttentionWithExtras(
        config,
        position_embedding_type=None,
        softmax_fn=SOFTMAX_MAPPING["softmax1"],
        ssm_eps=None,
        tau=None,
        max_seq_length=data_args.max_seq_length,
        skip_attn=False,
        fine_tuning=False,
    )

    # copy loaded weights
    if model_args.model_name_or_path is not None:
        new_self.load_state_dict(old_self.state_dict(), strict=False)
    model.bert.encoder.layer[layer_idx].attention.self = new_self
print(model)
```

@schmidtj3
Author


@robinzixuan Thanks for including these implementations!
Would it be possible to also provide the fine-tuned weights (HR w/ training in the paper) for reproducing the retrieval results?

@robinzixuan
Contributor

I think you can find the related code in the theory verification part of the repository.
