diff --git a/README.md b/README.md
index 8b3fbf0..c7c8f5c 100644
--- a/README.md
+++ b/README.md
@@ -93,7 +93,9 @@ LLM-based models:
 python -m pip install -U angle-emb
 ```

-### ⌛ Load BERT-based Model
+### ⌛ Infer BERT-based Model
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1QJcA2Mvive4pBxWweTpZz9OgwvE42eJZ?usp=sharing)
+

 1) **With Prompts**: You can specify a prompt with `prompt=YOUR_PROMPT` in `encode` method.
 If set a prompt, the inputs should be a list of dict or a single dict with key `text`, where `text` is the placeholder in the prompt for the input text. You can use other placeholder names. We provide a set of predefined prompts in `Prompts` class, you can check them via `Prompts.list_prompts()`.
@@ -137,27 +139,88 @@ for i, dv1 in enumerate(doc_vecs):
 ```

-### ⌛ Load LLM-based Models
+### ⌛ Infer LLM-based Models
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1QJcA2Mvive4pBxWweTpZz9OgwvE42eJZ?usp=sharing)

 If the pretrained weight is a LoRA-based model, you need to specify the backbone via `model_name_or_path` and specify the LoRA path via the `pretrained_lora_path` in `from_pretrained` method.

 ```python
 from angle_emb import AnglE, Prompts
+from angle_emb.utils import cosine_similarity

 angle = AnglE.from_pretrained('NousResearch/Llama-2-7b-hf',
                               pretrained_lora_path='SeanLee97/angle-llama-7b-nli-v2',
                               pooling_strategy='last',
                               is_llm=True,
-                              torch_dtype='float16')
+                              torch_dtype='float16').cuda()
 print('All predefined prompts:', Prompts.list_prompts())
-vec = angle.encode({'text': 'hello world'}, to_numpy=True, prompt=Prompts.A)
-print(vec)
-vecs = angle.encode([{'text': 'hello world1'}, {'text': 'hello world2'}], to_numpy=True, prompt=Prompts.A)
-print(vecs)
+doc_vecs = angle.encode([
+    'The weather is great!',
+    'The weather is very good!',
+    'i am going to bed'
+], prompt=Prompts.A)
+
+for i, dv1 in enumerate(doc_vecs):
+    for dv2 in doc_vecs[i+1:]:
+        print(cosine_similarity(dv1, dv2))
+```
+
+
+### ⌛ Infer BiLLM-based Models
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1QJcA2Mvive4pBxWweTpZz9OgwvE42eJZ?usp=sharing)
+
+Specify `apply_billm` and `billm_model_class` to load and infer BiLLM models.
+
+
+```python
+from angle_emb import AnglE, Prompts
+from angle_emb.utils import cosine_similarity
+
+# specify `apply_billm` and `billm_model_class` to load billm models
+angle = AnglE.from_pretrained('NousResearch/Llama-2-7b-hf',
+                              pretrained_lora_path='SeanLee97/bellm-llama-7b-nli',
+                              pooling_strategy='last',
+                              is_llm=True,
+                              apply_billm=True,
+                              billm_model_class='LlamaForCausalLM',
+                              torch_dtype='float16').cuda()
+
+doc_vecs = angle.encode([
+    'The weather is great!',
+    'The weather is very good!',
+    'i am going to bed'
+], prompt='The representative word for sentence {text} is:"')
+
+for i, dv1 in enumerate(doc_vecs):
+    for dv2 in doc_vecs[i+1:]:
+        print(cosine_similarity(dv1, dv2))
+```
+
+### ⌛ Infer Espresso/Matryoshka Models
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1QJcA2Mvive4pBxWweTpZz9OgwvE42eJZ?usp=sharing)
+
+Specify `layer_index` and `embedding_size` to truncate embeddings.
+
+
+```python
+from angle_emb import AnglE
+from angle_emb.utils import cosine_similarity
+
+
+angle = AnglE.from_pretrained('mixedbread-ai/mxbai-embed-2d-large-v1', pooling_strategy='cls').cuda()
+# specify layer_index and embedding size to truncate embeddings
+doc_vecs = angle.encode([
+    'The weather is great!',
+    'The weather is very good!',
+    'i am going to bed'
+], layer_index=22, embedding_size=768)
+
+for i, dv1 in enumerate(doc_vecs):
+    for dv2 in doc_vecs[i+1:]:
+        print(cosine_similarity(dv1, dv2))
 ```

-### ⌛ Load Third-party Models
+### ⌛ Infer Third-party Models

 You can load any transformer-based third-party models such as `mixedbread-ai/mxbai-embed-large-v1`, `sentence-transformers/all-MiniLM-L6-v2`, and `BAAI/bge-large-en-v1.5` using `angle_emb`.

diff --git a/docs/notes/pretrained_models.rst b/docs/notes/pretrained_models.rst
index a4dd39c..aa0ef4a 100644
--- a/docs/notes/pretrained_models.rst
+++ b/docs/notes/pretrained_models.rst
@@ -24,9 +24,9 @@ LLM-based models:
 +------------------------------------+-----------------------------+------------------+--------------------------+------------------+---------------------------------+
 | 🤗 HF (lora weight)                | Backbone                    | Max Tokens       | Prompts                  | Pooling Strategy | Scenario                        |
 +====================================+=============================+==================+==========================+==================+=================================+
-| `SeanLee97/angle-llama-13b-nli`_   | NousResearch/Llama-2-13b-hf | 4096             | ``Prompts.A``            | last token       | English, Similarity Measurement |
+| `SeanLee97/angle-llama-13b-nli`_   | NousResearch/Llama-2-13b-hf | 4096             | ``Prompts.A``            | last             | English, Similarity Measurement |
 +------------------------------------+-----------------------------+------------------+--------------------------+------------------+---------------------------------+
-| `SeanLee97/angle-llama-7b-nli-v2`_ | NousResearch/Llama-2-7b-hf  | 4096             | ``Prompts.A``            | last token       | English, Similarity Measurement |
+| `SeanLee97/angle-llama-7b-nli-v2`_ | NousResearch/Llama-2-7b-hf  | 4096             | ``Prompts.A``            | last             | English, Similarity Measurement |
 +------------------------------------+-----------------------------+------------------+--------------------------+------------------+---------------------------------+

 .. _SeanLee97/angle-llama-13b-nli: https://huggingface.co/SeanLee97/angle-llama-13b-nli
diff --git a/docs/notes/quickstart.rst b/docs/notes/quickstart.rst
index 6d1012a..348d9fe 100644
--- a/docs/notes/quickstart.rst
+++ b/docs/notes/quickstart.rst
@@ -14,7 +14,7 @@ A few steps to get started with AnglE:

 Other installation methods, please refer to the `Installation` section.

-⌛ Load BERT-based Model
+⌛ Infer BERT-based Model
 ------------------------------------

 1) **With Prompts**: You can specify a prompt with `prompt=YOUR_PROMPT` in `encode` method.
@@ -65,7 +65,7 @@ You can use other placeholder names. We provide a set of predefined prompts in `


-⌛ Load LLM-based Models
+⌛ Infer LLM-based Models
 ------------------------------------

 If the pretrained weight is a LoRA-based model, you need to specify the backbone via `model_name_or_path` and specify the LoRA path via the `pretrained_lora_path` in `from_pretrained` method.
@@ -78,7 +78,7 @@ If the pretrained weight is a LoRA-based model, you need to specify the backbone
                                   pretrained_lora_path='SeanLee97/angle-llama-7b-nli-v2',
                                   pooling_strategy='last',
                                   is_llm=True,
-                                  torch_dtype='float16')
+                                  torch_dtype='float16').cuda()

     print('All predefined prompts:', Prompts.list_prompts())
     vec = angle.encode({'text': 'hello world'}, to_numpy=True, prompt=Prompts.A)
@@ -86,3 +86,57 @@ If the pretrained weight is a LoRA-based model, you need to specify the backbone

     vecs = angle.encode([{'text': 'hello world1'}, {'text': 'hello world2'}], to_numpy=True, prompt=Prompts.A)
     print(vecs)
+
+⌛ Infer BiLLM-based Models
+------------------------------------
+
+Specify `apply_billm` and `billm_model_class` to load and infer BiLLM models.
+
+.. code-block:: python
+
+    from angle_emb import AnglE, Prompts
+    from angle_emb.utils import cosine_similarity
+
+    # specify `apply_billm` and `billm_model_class` to load billm models
+    angle = AnglE.from_pretrained('NousResearch/Llama-2-7b-hf',
+                                  pretrained_lora_path='SeanLee97/bellm-llama-7b-nli',
+                                  pooling_strategy='last',
+                                  is_llm=True,
+                                  apply_billm=True,
+                                  billm_model_class='LlamaForCausalLM',
+                                  torch_dtype='float16').cuda()
+
+    doc_vecs = angle.encode([
+        'The weather is great!',
+        'The weather is very good!',
+        'i am going to bed'
+    ], prompt='The representative word for sentence {text} is:"')
+
+    for i, dv1 in enumerate(doc_vecs):
+        for dv2 in doc_vecs[i+1:]:
+            print(cosine_similarity(dv1, dv2))
+
+
+
+⌛ Infer Espresso/Matryoshka Models
+------------------------------------
+
+Specify `layer_index` and `embedding_size` to truncate embeddings.
+
+.. code-block:: python
+
+    from angle_emb import AnglE
+    from angle_emb.utils import cosine_similarity
+
+
+    angle = AnglE.from_pretrained('mixedbread-ai/mxbai-embed-2d-large-v1', pooling_strategy='cls').cuda()
+    # specify layer_index and embedding_size to truncate embeddings
+    doc_vecs = angle.encode([
+        'The weather is great!',
+        'The weather is very good!',
+        'i am going to bed'
+    ], layer_index=22, embedding_size=768)
+
+    for i, dv1 in enumerate(doc_vecs):
+        for dv2 in doc_vecs[i+1:]:
+            print(cosine_similarity(dv1, dv2))
\ No newline at end of file
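
The README hunk above also renames the "Infer Third-party Models" section, but its usage example sits outside the diff's context lines. For reference only, a minimal sketch of that usage following the same `AnglE.from_pretrained`/`encode` pattern as the snippets in this patch; the model name and the `pooling_strategy='cls'` choice are illustrative assumptions, not part of the patch:

```python
from angle_emb import AnglE
from angle_emb.utils import cosine_similarity

# Assumed example: load a third-party embedding model by its Hugging Face name;
# 'cls' pooling mirrors the mxbai example shown earlier in the diff.
angle = AnglE.from_pretrained('mixedbread-ai/mxbai-embed-large-v1',
                              pooling_strategy='cls').cuda()

doc_vecs = angle.encode([
    'The weather is great!',
    'The weather is very good!',
    'i am going to bed'
])

for i, dv1 in enumerate(doc_vecs):
    for dv2 in doc_vecs[i+1:]:
        print(cosine_similarity(dv1, dv2))
```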