
how to load and use Transformers version model #1

Open
schmidtj3 opened this issue Jun 5, 2024 · 7 comments

@schmidtj3

@robinzixuan Hello authors,
I came across this arXiv paper, which mentions the use of this model, and I would like to know how to use it to reproduce the retrieval results in the paper.

Specifically, I'm looking at the magicslabnu/OutEffHop_bert_base model card (the one used in the paper?) on the Hugging Face Hub. Could you provide instructions on how to load and use this model with the Transformers package, and how to reproduce the results in the above-mentioned paper?

Thank you!

schmidtj3 changed the title from "how to load Transformer model" to "how to load and use Transformers version model" on Jun 5, 2024
@robinzixuan
Contributor

Thanks for your message. We will upload the model file to Hugging Face next week, so you will be able to use it from there shortly. However, you can already reproduce the results now. As we mention in the paper, attention is a special case of the Hopfield layer, and the BERT model is built on the attention architecture, so you can simply change the vanilla softmax to softmax_1 in the BERT model to get the OutEffHop version of BERT. After that, you can reproduce the results from the Hugging Face checkpoints (load them into the changed architecture). If you have more questions, feel free to contact me directly at [email protected]
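For reference, a minimal sketch of softmax_1 as described above (assuming the usual definition: a softmax whose denominator gains an extra +1 term, equivalent to appending a constant zero logit); the max-subtraction is only for numerical stability:

```python
import torch

def softmax_1(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """softmax_1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)).

    Unlike the vanilla softmax, the outputs can sum to less than 1,
    letting an attention head assign (near-)zero total weight.
    """
    # Subtract m = max(x, 0); multiplying the numerator and denominator
    # by exp(m) shows this equals the definition above.
    m = x.max(dim=dim, keepdim=True).values.clamp(min=0)
    ex = torch.exp(x - m)
    return ex / (torch.exp(-m) + ex.sum(dim=dim, keepdim=True))
```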

robinzixuan self-assigned this on Jun 5, 2024
@schmidtj3
Author


Thank you for your response! Could you also list the steps to load the model from the Hugging Face Hub? And just to confirm: does this mean the model weights that reproduce the retrieval results in the paper will be uploaded to the Hugging Face Hub next week?
Thank you!

@robinzixuan
Contributor

Sorry for the confusion; the model weights are already on Hugging Face. But as you know, if we load the weights directly, Transformers will give us the vanilla version of the model (BERT), so we have to change the code for that.
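For example (a minimal sketch using the magicslabnu/OutEffHop_bert_base id from the question above), loading the checkpoint is the standard Transformers call, but on its own it gives back the vanilla architecture:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# This alone yields a vanilla BERT; the attention softmax still has
# to be swapped for softmax_1 afterwards (see the code further down).
tokenizer = AutoTokenizer.from_pretrained("magicslabnu/OutEffHop_bert_base")
model = AutoModelForMaskedLM.from_pretrained("magicslabnu/OutEffHop_bert_base")
```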

@robinzixuan
Contributor

modeling_bert.py.zip
You can use this file directly for the OutEffHop version of the BERT model. In our experiments, we used hooks to replace the softmax with softmax_1; you can do the same.
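(For readers without the attachment: one simple alternative, not the authors' exact hook mechanism, is to temporarily monkey-patch torch.nn.functional.softmax, since modeling_bert resolves that function through the module attribute at call time. A sketch, reusing softmax_1 from the earlier snippet:)

```python
import contextlib
import torch.nn.functional as F

@contextlib.contextmanager
def patched_softmax(softmax_fn):
    """Temporarily route F.softmax to softmax_fn.

    Caveats: the patch is process-wide while active, and softmax_fn
    must accept the (input, dim=...) call used inside modeling_bert.
    """
    original = F.softmax
    F.softmax = softmax_fn
    try:
        yield
    finally:
        F.softmax = original

# Usage, with softmax_1 defined as in the earlier sketch:
# with patched_softmax(softmax_1):
#     outputs = model(**tokenizer("hello world", return_tensors="pt"))
```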

@robinzixuan
Contributor

Sorry for the delay in the Hugging Face update; I have my qualifying exam this week. For now, here is the loading-and-replacement code we use:
```python
if model_args.model_name_or_path:
    torch_dtype = (
        model_args.torch_dtype
        if model_args.torch_dtype in ["auto", None]
        else getattr(torch, model_args.torch_dtype)
    )
    model = AutoModelForMaskedLM.from_pretrained(
        model_args.model_name_or_path,
        from_tf=bool(".ckpt" in model_args.model_name_or_path),
        config=config,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        token=model_args.token,
        trust_remote_code=model_args.trust_remote_code,
        torch_dtype=torch_dtype,
        low_cpu_mem_usage=model_args.low_cpu_mem_usage,
    )
else:
    logger.info("Training new model from scratch")
    model = AutoModelForMaskedLM.from_config(config, trust_remote_code=model_args.trust_remote_code)

# >> replace Self-attention module with ours
# NOTE: currently assumes BERT
for layer_idx in range(len(model.bert.encoder.layer)):
    old_self = model.bert.encoder.layer[layer_idx].attention.self
    print("----------------------------------------------------------")
    print("Inside BERT custom attention")
    print("----------------------------------------------------------")
    new_self = BertUnpadSelfAttentionWithExtras(
        config,
        position_embedding_type=None,
        softmax_fn=SOFTMAX_MAPPING["softmax1"],
        ssm_eps=None,
        tau=None,
        max_seq_length=data_args.max_seq_length,
        skip_attn=False,
        fine_tuning=False,
    )

    # copy loaded weights
    if model_args.model_name_or_path is not None:
        new_self.load_state_dict(old_self.state_dict(), strict=False)
    model.bert.encoder.layer[layer_idx].attention.self = new_self
print(model)
```

@schmidtj3
Author


@robinzixuan Thanks for including these implementations!
Would it be possible to also provide the fine-tuned weights (HR w/ training in the paper) for reproducing the retrieval results?

@robinzixuan
Contributor

I think you can find the related code in the theory verification part of the repository.
