Merge pull request #72 from SeanLee97/feature/espresso
Feature/espresso
SeanLee97 authored May 21, 2024
2 parents 75e3ce9 + 7ad543d commit 08984f3
Showing 20 changed files with 1,139 additions and 513 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pytest.yml
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11"]
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
Expand Down
292 changes: 137 additions & 155 deletions README.md

Large diffs are not rendered by default.

28 changes: 3 additions & 25 deletions README_2DMSE.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,38 +2,16 @@

> Paper: https://arxiv.org/abs/2402.14776
# Usage
"🪆 2D Matryoshka Sentence Embeddings" has been renamed to ☕️ "ESE: Espresso Sentence Embeddings".

**⚠️ This document is a work in progress!**


Example:

```bash
WANDB_MODE=disabled CUDA_VISIBLE_DEVICES=0 angle-trainer \
--model_name_or_path WhereIsAI/UAE-Large-V1 \
--train_name_or_path data.jsonl --save_dir ckpts/custom-UAE-2dmse \
--w2 20.0 --w1 1. --w3 1. --angle_tau 20.0 --learning_rate 1e-5 --maxlen 128 \
--workers 16 \
--pooling_strategy all \
--epochs 1 \
--batch_size 16 \
--apply_tdmse 1 \
--fixed_teacher_name_or_path WhereIsAI/UAE-Large-V1 \
--logging_steps 1000 \
--warmup_steps 100 \
--is_llm 0 \
--save_steps 1000 --seed -1 --gradient_accumulation_steps 6 --fp16 1
```

The `--apply_tdmse 1` is required.
Please find the documentation in [☕️ Espresso](README_ESE.md)


# Citation

```bibtex
@article{li20242d,
title={2D Matryoshka Sentence Embeddings},
title={ESE: Espresso Sentence Embeddings},
author={Xianming Li and Zongxi Li and Jing Li and Haoran Xie and Qing Li},
journal={arXiv preprint arXiv:2402.14776},
year={2024}
Expand Down
49 changes: 49 additions & 0 deletions README_ESE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
# Espresso Sentence Embeddings (previously known as 2DMSE)

> Paper: https://arxiv.org/abs/2402.14776
## Abstract

High-quality sentence embeddings are fundamental in many natural language processing (NLP) tasks, such as semantic textual similarity (STS) and retrieval-augmented generation (RAG).
Nevertheless, most existing methods leverage fixed-length embeddings from full-layer language models, which lack the scalability to accommodate the diverse available resources across various applications.
To address this gap, we propose a novel sentence embedding model, Espresso Sentence Embeddings (ESE), with two learning processes.
First, the **learn-to-express** process encodes more salient representations to lower layers.
Second, the **learn-to-compress** process compacts essential features into the initial dimensions using Principal Component Analysis (PCA).
This way, ESE can scale model depth via the former process and embedding size via the latter.
Extensive experiments on STS and RAG suggest that ESE can effectively produce high-quality embeddings with less model depth and embedding size, enhancing embedding inference efficiency.
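The **learn-to-compress** idea can be illustrated with a small, standalone PCA sketch in plain numpy (this is not the authors' implementation; the shapes and data below are made up for the example):

```python
import numpy as np

def pca_compress(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Project embeddings onto their top-k principal components,
    so the most salient variance lands in the first k dimensions."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # SVD of the centered matrix; rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

# Toy example: 8 "sentence embeddings" of size 16 compressed to 4 dims.
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
compact = pca_compress(emb, 4)
```

After compression, the leading dimensions carry the most variance, which is what lets ESE-style embeddings be truncated with little quality loss.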

## How to train

To enable Espresso Sentence Embeddings (ESE), please specify `--apply_ese 1` and configure the ESE hyperparameters via `--ese_kl_temperature <float>` and `--ese_compression_size <int>`.

Here is a training example:

```bash
WANDB_MODE=disabled CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 -m angle_emb.angle_trainer \
--model_name_or_path WhereIsAI/UAE-Large-V1 \
--train_name_or_path SeanLee97/nli_for_simcse --save_dir ckpts/UAE-Large-Espresso \
--ibn_w 10.0 --cosine_w 0. --angle_w 1.0 --angle_tau 20.0 --learning_rate 1e-6 --maxlen 75 \
--workers 16 \
--pooling_strategy cls \
--epochs 1 \
--batch_size 128 \
--logging_steps 100 \
--warmup_steps 200 \
--save_steps 1000 \
--fp16 1 \
--gradient_accumulation_steps 4 \
--apply_ese 1 \
--ese_compression_size 128 \
--ese_kl_temperature 1.0
```
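At inference time, embeddings trained this way are typically shrunk by keeping only the first N dimensions and renormalizing. A minimal numpy sketch of that truncation step (model loading is omitted; `768` and `128` are illustrative sizes, with random vectors standing in for real embeddings):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, size: int) -> np.ndarray:
    """Keep only the first `size` dimensions and re-normalize,
    mirroring how ESE packs essential features up front."""
    small = vec[:size]
    return small / np.linalg.norm(small)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for two full-size (768-d) sentence embeddings.
rng = np.random.default_rng(42)
full_a = rng.normal(size=768)
full_b = rng.normal(size=768)

# Compare similarity at full size vs. a compressed 128-d size.
sim_full = cosine(full_a, full_b)
sim_128 = cosine(truncate_embedding(full_a, 128),
                 truncate_embedding(full_b, 128))
```

With a trained ESE model, `--ese_compression_size 128` above is what encourages the first 128 dimensions to preserve similarity rankings under this kind of truncation.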

## Citation

```bibtex
@article{li20242d,
title={ESE: Espresso Sentence Embeddings},
author={Xianming Li and Zongxi Li and Jing Li and Haoran Xie and Qing Li},
journal={arXiv preprint arXiv:2402.14776},
year={2024}
}
```