From 62392aebae989efe6ad0529819e08d04cac69078 Mon Sep 17 00:00:00 2001
From: Daoyuan Chen <67475544+yxdyc@users.noreply.github.com>
Date: Thu, 26 Dec 2024 12:21:23 +0800
Subject: [PATCH] clearly point out the DJ format

---
 .../post_tuning_dialog/README.md | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/tools/fmt_conversion/post_tuning_dialog/README.md b/tools/fmt_conversion/post_tuning_dialog/README.md
index 5b88bceae..e3317ec21 100644
--- a/tools/fmt_conversion/post_tuning_dialog/README.md
+++ b/tools/fmt_conversion/post_tuning_dialog/README.md
@@ -76,4 +76,21 @@ For post tuning formats, we mainly consider 4 formats to support [ModelScope-Swi
 }
 ```
 
-In Data-Juicer, we pre-set fields to align with the last Query-Response format, which serves as our intermediate format for post-tuning dialog datasets. Correspondingly, we provide several tools to convert datasets in other formats to Query-Response format and vice versa.
+In Data-Juicer, we pre-set fields to align with the last two formats (Alpaca and Query-Response), which serves as our intermediate format for post-tuning dialog datasets. Correspondingly, we provide several tools to convert datasets in other formats to Query-Response format and vice versa.
+
+- DJ default format for post-tuning OPs:
+
+```python
+{
+  "system": "",
+  "instruction": "",
+  "query": "",
+  "response": "",
+  "history": [
+    [
+      "",
+      ""
+    ]
+  ]
+}
+```
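
For readers of this patch, the layout it documents can be sketched in plain Python. This is a minimal illustration, not Data-Juicer code: `make_dj_sample` and `is_dj_sample` are hypothetical helpers, and reading each `history` entry as a `[query, response]` pair from an earlier turn is an assumption, since the template only shows two empty strings.

```python
import json

# Field names taken from the template added by the patch.
DJ_TEXT_FIELDS = ("system", "instruction", "query", "response")


def make_dj_sample(query, response, system="", instruction="", history=None):
    """Build one sample in the DJ default layout (hypothetical helper)."""
    return {
        "system": system,
        "instruction": instruction,
        "query": query,
        "response": response,
        # Assumption: each history entry is a [query, response] pair.
        "history": [list(pair) for pair in (history or [])],
    }


def is_dj_sample(sample):
    """Check that a dict has the five fields with the expected shapes."""
    if not isinstance(sample, dict):
        return False
    if not all(isinstance(sample.get(key), str) for key in DJ_TEXT_FIELDS):
        return False
    history = sample.get("history")
    if not isinstance(history, list):
        return False
    return all(
        isinstance(turn, list)
        and len(turn) == 2
        and all(isinstance(text, str) for text in turn)
        for turn in history
    )


sample = make_dj_sample(
    query="And in Fahrenheit?",
    response="About 212 degrees.",
    system="You are a helpful assistant.",
    history=[("At what temperature does water boil?",
              "100 degrees Celsius at sea level.")],
)
print(json.dumps(sample, indent=2))
print(is_dj_sample(sample))
```

A check like `is_dj_sample` could be run over a converted dataset before handing it to post-tuning OPs; the exact validation Data-Juicer performs, if any, is not stated in the patch.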