How to build a custom data set for Question Answering #2207
Hi community,
Replies: 1 comment
Hi @gabriead! To train an extractive QA model, you need, for each question, a context that contains the answer and the exact position of the answer within that context. You would therefore need to map your question-answer pairs to documents that contain the answers and extract each answer's position.

However, you can use your data as-is for open-domain evaluation, since that does not require the exact position of an answer. This way, you can check whether existing models are already good enough for your use case, in which case you don't need to train a custom model. See this blog post for more information on evaluation.

As to how many labels are needed for reasonable training: this depends heavily on your domain and on how much your use case diverges from SQuAD. We have seen that models trained on SQuAD show very strong general question answering capabilities. Therefore, we'd recommend trying one of the off-the-shelf models before adapting them to your domain.
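To make the mapping step concrete, here is a minimal sketch of how question-answer pairs could be turned into SQuAD-style records with character offsets. It assumes your training pipeline accepts SQuAD v1.1-style JSON and that each answer appears verbatim in its context. The helper names (`find_answer_start`, `to_squad_record`) and the sample text are made up for illustration and are not part of any library.

```python
import json
from typing import List, Optional, Tuple


def find_answer_start(context: str, answer: str) -> Optional[int]:
    """Return the character offset of the first exact match, or None if absent."""
    pos = context.find(answer)
    return pos if pos >= 0 else None


def to_squad_record(
    qa_pairs: List[Tuple[str, str]], context: str, title: str = "my-doc"
) -> Tuple[dict, List[str]]:
    """Build one SQuAD-style record; collect questions whose answer is not in the context."""
    qas = []
    skipped = []
    for i, (question, answer) in enumerate(qa_pairs):
        start = find_answer_start(context, answer)
        if start is None:
            # No exact match: this pair needs manual review or another document.
            skipped.append(question)
            continue
        qas.append(
            {
                "id": f"{title}-{i}",
                "question": question,
                "answers": [{"text": answer, "answer_start": start}],
            }
        )
    record = {"title": title, "paragraphs": [{"context": context, "qas": qas}]}
    return record, skipped


# Illustrative data only (hypothetical document and questions).
context = (
    "The warranty covers parts and labor for two years. "
    "Claims must be filed within 30 days."
)
pairs = [
    ("How long is the warranty?", "two years"),
    ("When must claims be filed?", "within 30 days"),
    ("Who handles claims?", "the support team"),  # not in the context
]

record, skipped = to_squad_record(pairs, context)
dataset = {"version": "custom", "data": [record]}
print(json.dumps(dataset, indent=2))
print("Skipped (answer not found verbatim):", skipped)
```

Exact string matching is the simplest option and only picks the first occurrence. If your answers are paraphrased or appear several times in a document, you would need fuzzy matching or manual annotation instead.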