BiLLM

BiLLM is a tool for converting LLMs from uni-directional to bi-directional for tasks such as classification and sentence embeddings. It is compatible with 🤗 transformers.

Supported Models

LLaMA
Mistral
Qwen2
OpenELM

Usage

python -m pip install -U billm
Specify the start index for bi-directional layers via export BiLLM_START_INDEX={layer_index}. If not specified, the default is 0, i.e., all layers are bi-directional. If set to -1, BiLLM is disabled.
Import LLMs from BiLLM and initialize them as usual with transformers.

- from transformers import (
-    LlamaModel,
-    LlamaForCausalLM,
-    LlamaForSequenceClassification,
-    MistralModel,
-    MistralForCausalLM,
-    MistralForSequenceClassification,
-    Qwen2Model,
-    Qwen2ForCausalLM,
-    Qwen2ForSequenceClassification
- )

+ from billm import (
+    LlamaModel,
+    LlamaForCausalLM,
+    LlamaForSequenceClassification,
+    LlamaForTokenClassification,
+    MistralModel,
+    MistralForCausalLM,
+    MistralForSequenceClassification,
+    MistralForTokenClassification,
+    Qwen2Model,
+    Qwen2ForCausalLM,
+    Qwen2ForSequenceClassification,
+    Qwen2ForTokenClassification,
+    OpenELMModel,
+    OpenELMForCausalLM,
+    OpenELMForSequenceClassification,
+    OpenELMForTokenClassification
+ )
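The start-index setting described above can also be applied from Python instead of the shell. The following is a minimal sketch, assuming that BiLLM reads BiLLM_START_INDEX when it is imported (so the variable must be set first) and that layers at or after the start index are the bi-directional ones. The helper bidirectional_layers is hypothetical and is not part of BiLLM; it only illustrates the documented values 0, a positive index, and -1.

```python
import os
from typing import List

# Set the variable before importing billm (assumption: it is read at import time).
os.environ["BiLLM_START_INDEX"] = "0"

# The model classes would then be imported as usual, e.g.:
# from billm import MistralForTokenClassification


def bidirectional_layers(num_layers: int, start_index: int) -> List[int]:
    """Hypothetical helper: indices of layers assumed to be bi-directional.

    Follows the documented semantics: 0 means all layers, -1 disables BiLLM.
    The rule "index >= start_index" is an assumption made for illustration.
    """
    if start_index == -1:
        return []  # BiLLM disabled: every layer stays uni-directional
    if start_index < -1:
        raise ValueError("only -1 is documented as a negative value")
    return list(range(start_index, num_layers))


start = int(os.environ.get("BiLLM_START_INDEX", "0"))
layers = bidirectional_layers(num_layers=4, start_index=start)
```

With the default of 0, every layer of the hypothetical 4-layer model is listed; a start index of 2 would list only the last two layers.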

Examples

NER

Training:

$ cd examples
$ WANDB_MODE=disabled BiLLM_START_INDEX=0 CUDA_VISIBLE_DEVICES=3 python billm_ner.py \
--model_name_or_path mistralai/Mistral-7B-v0.1 \
--dataset_name_or_path conll2003 \
--push_to_hub 0

Inference:

from transformers import AutoTokenizer, pipeline
from peft import PeftModel, PeftConfig
from billm import MistralForTokenClassification


label2id = {'O': 0, 'B-PER': 1, 'I-PER': 2, 'B-ORG': 3, 'I-ORG': 4, 'B-LOC': 5, 'I-LOC': 6, 'B-MISC': 7, 'I-MISC': 8}
id2label = {v: k for k, v in label2id.items()}
model_id = 'WhereIsAI/billm-mistral-7b-conll03-ner'
tokenizer = AutoTokenizer.from_pretrained(model_id)
peft_config = PeftConfig.from_pretrained(model_id)
model = MistralForTokenClassification.from_pretrained(
    peft_config.base_model_name_or_path,
    num_labels=len(label2id), id2label=id2label, label2id=label2id
)
model = PeftModel.from_pretrained(model, model_id)
# Merge the adapter weights into the base model and unload the adapter; this is necessary for inference
model = model.merge_and_unload()

token_classifier = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
sentence = "I live in Hong Kong. I am a student at Hong Kong PolyU."
tokens = token_classifier(sentence)
print(tokens)
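With aggregation_strategy="simple", the token-classification pipeline returns a list of dictionaries with keys such as entity_group, word, start, and end. The sketch below shows one way to group those results by entity type. The helper group_entities is hypothetical, and the sample list is a hand-written placeholder in the pipeline's output shape (score omitted), not actual output from the model above.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(results: List[dict]) -> Dict[str, List[str]]:
    """Hypothetical helper: collect entity surface forms by entity type."""
    grouped = defaultdict(list)
    for item in results:
        grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)


sentence = "I live in Hong Kong. I am a student at Hong Kong PolyU."

# Hand-written placeholder, NOT real model output: one LOC span whose
# character offsets are computed from the sentence itself.
span_start = sentence.index("Hong Kong")
sample = [
    {
        "entity_group": "LOC",
        "word": "Hong Kong",
        "start": span_start,
        "end": span_start + len("Hong Kong"),
    }
]

grouped = group_entities(sample)
```

In practice, the list returned by token_classifier(sentence) would be passed to group_entities in place of the placeholder.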

Sentence Embeddings

For sentence embeddings, refer to AnglE: https://github.com/SeanLee97/AnglE

Citation

If you use this toolkit in your work, please cite the relevant paper below:

For sentence embeddings modeling:

@inproceedings{li2024bellm,
    title = "BeLLM: Backward Dependency Enhanced Large Language Model for Sentence Embeddings",
    author = "Li, Xianming and Li, Jing",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics",
    year = "2024",
    publisher = "Association for Computational Linguistics"
}

For other tasks:

@article{li2023label,
  title={Label supervised llama finetuning},
  author={Li, Zongxi and Li, Xianming and Liu, Yuzhang and Xie, Haoran and Li, Jing and Wang, Fu-lee and Li, Qing and Zhong, Xiaoqin},
  journal={arXiv preprint arXiv:2310.01208},
  year={2023}
}
