diff --git a/README.md b/README.md
index 00ca765..083500a 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,140 @@
-# LLM_Judge_ku
\ No newline at end of file
+# LLM Judge
+
+This package provides Vicuna-Japanese questions and prompts for evaluating your models with LLM-as-a-judge.
+To automate the evaluation process, we prompt strong LLMs like GPT-4 to act as judges and assess the quality of the models' responses.
+
+## Contents
+- [Install](#install)
+- [Review Pre-Generated Model Answers and Judgments](#review-pre-generated-model-answers-and-judgments)
+
+## Install
+```
+git clone https://github.com/hitoshizuku7/LLM_Judge_ku.git
+cd LLM_Judge_ku
+pip install -e .
+pip install openai anthropic ray
+cd fastchat/llm_judge
+```
+
+### Evaluate a model on jp-bench (Vicuna-Japanese)
+
+#### Step 1. Generate model answers to jp-bench questions
+```
+python gen_model_answer.py \
+--base_model [MODEL-PATH] \
+--lora_model [LORA-PATH] \
+--model-id [MODEL-ID] \
+--with_prompt \
+--gpus [GPU_Num] \
+--max_new_tokens [NUM of NEW TOKENS] \
+--benchmark jp_bench
+```
+Arguments:
+ - `[MODEL-PATH]` is the path to the weights, which can be a local folder or a Hugging Face repo ID.
+ - `[LORA-PATH]` is the path to the LoRA weights, if needed.
+ - `[MODEL-ID]` is a name you give to the model.
+ - `[GPU_Num]` specifies which GPU(s) to use.
+
+e.g.,
+```
+python gen_model_answer.py \
+--model-path rinna/japanese-gpt-neox-3.6b-instruction-ppo \
+--model-id rinna-3.6b-ppo \
+--with_prompt \
+--gpus 0 \
+--max_new_tokens 2048 \
+--benchmark jp_bench
+```
+The answers will be saved to `data/jp_bench/model_answer/[MODEL-ID].jsonl`.
+
+You can also specify `--num-gpus-per-model` for model parallelism (needed for large 65B models) and `--num-gpus-total` to parallelize answer generation across multiple GPUs.
+
+#### Step 2. Generate GPT-4 judgments
+There are several ways to use GPT-4 as a judge, such as pairwise win-rate comparison and single-answer grading.
+As in MT-bench, we recommend single-answer grading as the default mode.
+This mode asks GPT-4 to grade the model's answer directly, without a pairwise comparison.
+For each turn, GPT-4 gives a score out of 10; we then compute the average score over all turns.
+
+```
+OPENAI_API_KEY=[YOUR-KEY] python -B gen_judgment.py \
+--bench-name "jp_bench" \
+--mode [pairwise-all, single, pairwise-baseline] \
+--model-list [LIST-OF-MODEL-ID] \
+--parallel [num-concurrent-api-call]
+```
+
+e.g.,
+```
+OPENAI_API_KEY=[YOUR-KEY] python -B gen_judgment.py \
+--bench-name "jp_bench" \
+--mode single \
+--model-list rinna-3.6b rinna-3.6b-ppo \
+--parallel 2
+```
+The judgments will be saved to `data/jp_bench/model_judgment/gpt-4_single.jsonl`.
+
+#### Step 3. Show jp-bench scores
+
+- Show the scores for selected models
+  ```
+  python show_result.py \
+  --bench-name "jp_bench" \
+  --mode single \
+  --model-list rinna-3.6b rinna-3.6b-ppo
+  ```
+- Show all scores
+  ```
+  python show_result.py
+  ```
+
+---
+
+### Other grading options
+Besides score-based single-answer grading, we also support two additional grading options based on win rates (a sketch of how such win rates can be tallied is shown after this list):
+- `pairwise-baseline`: run pairwise comparison against a baseline model.
+- `pairwise-all`: run pairwise comparison between all model pairs on all questions.
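+
+For both pairwise modes, a model's win rate is simply the fraction of its comparisons that the judge decides in its favor, with ties counted as half a win. `show_result.py` aggregates this for you; the snippet below is only a minimal sketch of the idea, and it assumes, purely for illustration, that each judgment record is a JSON line with `model_1`, `model_2`, and `winner` fields (the actual field names in the judgment file may differ).
+
+```python
+import json
+from collections import defaultdict
+
+def win_rates(judgment_path):
+    """Tally hypothetical pairwise judgment records into per-model win rates."""
+    wins = defaultdict(float)   # wins per model (ties count as 0.5)
+    games = defaultdict(int)    # comparisons per model
+    with open(judgment_path, encoding="utf-8") as f:
+        for line in f:
+            record = json.loads(line)  # assumed shape: {"model_1": ..., "model_2": ..., "winner": ...}
+            m1, m2, winner = record["model_1"], record["model_2"], record["winner"]
+            games[m1] += 1
+            games[m2] += 1
+            if winner == "model_1":
+                wins[m1] += 1
+            elif winner == "model_2":
+                wins[m2] += 1
+            else:  # tie or inconclusive judgment: split the point
+                wins[m1] += 0.5
+                wins[m2] += 0.5
+    return {model: wins[model] / games[model] for model in games}
+
+if __name__ == "__main__":
+    print(win_rates("data/jp_bench/model_judgment/gpt-4_pair.jsonl"))
+```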
+
+#### Option 2: Pairwise comparison against a baseline (default: gpt-3.5-turbo)
+
+- Generate GPT-4 judgments
+```
+OPENAI_API_KEY=[YOUR-KEY] python -B gen_judgment.py \
+--bench-name "jp_bench" \
+--mode pairwise-baseline \
+--model-list rinna-3.6b rinna-3.6b-ppo \
+--parallel 2
+```
+The judgments will be saved to `data/jp_bench/model_judgment/gpt-4_pair.jsonl`.
+
+- Show results
+```
+python show_result.py \
+--bench-name "jp_bench" \
+--mode pairwise-baseline
+```
+
+#### Option 3: Run GPT-4 judge with all pair comparisons
+
+Another option is to run pairwise comparisons over all possible pairs.
+This can become expensive as the number of models grows, but it gives you more comprehensive information.
+
+```
+OPENAI_API_KEY=[YOUR-KEY] python -B gen_judgment.py \
+--bench-name "jp_bench" \
+--mode pairwise-all \
+--model-list [LIST-OF-MODEL-ID] \
+--parallel [num-concurrent-api-call]
+```
+
+```
+python show_result.py \
+--bench-name "jp_bench" \
+--mode pairwise-all
+```
+
+## Sample Outputs
+```
+Question:
+```
diff --git a/fastchat/llm_judge/README.md b/fastchat/llm_judge/README.md
deleted file mode 100644
index ebd0c7d..0000000
--- a/fastchat/llm_judge/README.md
+++ /dev/null
@@ -1,174 +0,0 @@
-# LLM Judge
-
-In this package, you can use Vicuna-Japanese questions and prompts to evaluate your models with LLM-as-a-judge.
-To automate the evaluation process, we prompt strong LLMs like GPT-4 to act as judges and assess the quality of the models' responses.
-
-## Contents
-- [Install](#install)
-- [Review Pre-Generated Model Answers and Judgments](#review-pre-generated-model-answers-and-judgments)
-## Install
-```
-git clone https://github.com/hitoshizuku7/LLM_Judge_ku.git
-cd LLM_Judge_ku
-pip install -e .
-pip install openai anthropic ray
-cd fastchat/llm_judge
-```
-
-
-### Evaluate a model on jp-bench (Vicuna-Japanese)
-
-#### Step 1. Generate model answers to jp-bench questions
-```
-python gen_model_answer.py \
---base_model [MODEL-PATH] \
---lora_model [LORA-PATH] \
---model-id [MODEL-ID] \
---with_prompt \
---gpus [GPU_Num] \
---max_new_tokens [NUM of NEW TOKENS] \
---benchmark jp_bench
-```
-Arguments:
- - `[MODEL-PATH]` is the path to the weights, which can be a local folder or a Hugging Face repo ID.
 - - `[LORA-PATH]` is the path to the lora weights if needed.
 - - `[MODEL-ID]` is a name you give to the model.
 - - `[GPU_Num]` denotes which GPU you decide to use
-
-
-e.g.,
-```
-python gen_model_answer.py \
---model-path rinna/japanese-gpt-neox-3.6b-instruction-ppo \
---model-id rinna-3.6b-ppo \
---with_prompt \
---gpus 0 \
---max_new_tokens 2048 \
---benchmark jp_bench
-```
-The answers will be saved to `data/jp_bench/model_answer/[MODEL-ID].jsonl`.
-
-You can also specify `--num-gpus-per-model` for model parallelism (needed for large 65B models) and `--num-gpus-total` to parallelize answer generation with multiple GPUs.
-
-#### Step 2. Generate GPT-4 judgments
-There are several options to use GPT-4 as a judge, such as pairwise winrate and single-answer grading.
-In MT-bench, we recommond single-answer grading as the default mode.
-This mode asks GPT-4 to grade and give a score to model's answer directly without pairwise comparison.
-For each turn, GPT-4 will give a score on a scale of 10. We then compute the average score on all turns.
-
-```
-OPENAI_API_KEY=[YOUR-KEY] python -B gen_judgment.py \
---bench-name "jp_bench" \
---mode [pairwise-all, single, pairwise-baseline] \
---model-list [LIST-OF-MODEL-ID] \
---parallel [num-concurrent-api-call]
-```
-
-e.g.,
-```
-OPENAI_API_KEY=[YOUR-KEY] python -B gen_judgment.py \
---bench-name "jp_bench" \
---mode single \
---model-list rinna-3.6b rinna-3.6b-ppo \
---parallel 2
-```
-The judgments will be saved to `data/jp_bench/model_judgment/gpt-4_single.jsonl`
-
-#### Step 3. Show jp-bench scores
-
-- Show the scores for selected models
- ```
- python show_result.py \
- --bench-name "jp_bench" \
- --mode single \
- --model-listrinna-3.6b rinna-3.6b-ppo
- ```
-- Show all scores
- ```
- python show_result.py
- ```
-
----
-
-### Other grading options
-Besides score-based single-answer grading, we also support two additional grading options based on win rates:
-- `pariwise-baseline`: run pairwise comparison against a baseline model.
-- `pairwise-all`: run pairwise comparison between all model pairs on all questions.
-
-#### Option 2: pairwise comparison against a baseline (default: gpt-3.5-turbo)
-
-- Generate GPT-4 judgments
-```
-OPENAI_API_KEY=[YOUR-KEY] python -B gen_judgment.py \
---bench-name "jp_bench" \
---mode pairwise-baseline \
---model-list rinna-3.6b rinna-3.6b-ppo \
---parallel 2
-```
-The judgments will be saved to `data/jp_bench/model_judgment/gpt-4_pair.jsonl`
-
-- Show results
-```
-python show_result.py \
---bench-name "jp_bench" \
---mode pairwise-baseline
-```
-
-#### Option 3: Run GPT-4 judge with all pair comparisons
-
-Another option is to run pairwise comparisons on all possible pairs.
-This could be more expensive when #models increases, but it gives you a more comprehensive information.
-
-```
-OPENAI_API_KEY=[YOUR-KEY] python -B gen_judgment.py \
---bench-name "jp_bench" \
---mode pairwise-all \
---model-list [LIST-OF-MODEL-ID] \
---parallel [num-concurrent-api-call]
-```
-
-```
-python show_result.py \
---bench-name "jp_bench" \
---mode pairwise-all
-```
-
-
-## Agreement Computation
-We released 3.3K human annotations for model responses generated by 6 models in response to 80 MT-bench questions. The dataset is available at [lmsys/mt_bench_human_judgments](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments).
-You can use this data to compute the agreement between human and GPT-4.
-
-### Download data
-
-```
-wget https://huggingface.co/datasets/lmsys/mt_bench_human_judgments/resolve/main/human_judgments.json
-wget https://huggingface.co/datasets/lmsys/mt_bench_human_judgments/resolve/main/gpt4_pair_judgments.json
-```
-
-### Compute the agreement between human and GPT-4
-
-```
-python compute_agreement.py --judges gpt4-pair human --votefiles human_judgments.json gpt4_pair_judgments.json
-```
-
-## Release Plan
-Our current release contains:
-- The MT-bench questions, prompts, pre-generated answers, and pre-generated judgments.
-- The 3K expert-level human annotations.
-
-The next release will include:
-- 30K arena conversations with human votes
-
-## Citation
-
-If you find the repository helpful for your study, please consider citing the following [paper](https://arxiv.org/abs/2306.05685): "Judging LLM-as-a-judge with MT-Bench and Chatbot Arena":
-```
-@misc{zheng2023judging,
-  title={Judging LLM-as-a-judge with MT-Bench and Chatbot Arena},
-  author={Lianmin Zheng and Wei-Lin Chiang and Ying Sheng and Siyuan Zhuang and Zhanghao Wu and Yonghao Zhuang and Zi Lin and Zhuohan Li and Dacheng Li and Eric. P Xing and Hao Zhang and Joseph E. Gonzalez and Ion Stoica},
-  year={2023},
-  eprint={2306.05685},
-  archivePrefix={arXiv},
-  primaryClass={cs.CL}
-}
-```