Question about the data #16
Comments
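Hello, the number of sentences is simply the number of data instances: the test set has 172448 lines, so it has 172448 sentences. Also, for training I removed the instances whose entity pairs also appear in the test set, which left 522611 training examples.
OK, understood, thanks!
I'm not sure about the exactly identical lines; I obtained the data by processing the NYT10 data directly.
Could you release the source code used to process the data?
Can you please share the processed data? Many thanks.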
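Your paper states that the training set has 522611 sentences and the test set has 172448 sentences. However, in the data.zip you released, the test set has 172448 lines but only 61707 sentences after deduplication. The training set has 570088 lines and 368099 sentences after deduplication, and even deduplicating jointly on sentence + entity pair + relation gives 510415, not 522611.
What went wrong here? What does "number of sentences" in your paper refer to?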
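For anyone trying to reproduce these counts, here is a minimal sketch of the three granularities discussed in this thread (raw rows, unique sentences, unique sentence + entity pair + relation) and of the entity-pair filter the maintainer describes. The `Instance` layout, the function names, and the toy rows are hypothetical; the actual column layout of data.zip is not given here, and whether the filter treats entity pairs as ordered is an assumption.

```python
from typing import Iterable, List, NamedTuple


class Instance(NamedTuple):
    # Hypothetical record layout; the real columns in data.zip may differ.
    head: str
    tail: str
    relation: str
    sentence: str


def count_levels(instances: Iterable[Instance]) -> dict:
    """Count rows, unique sentences, and unique (pair, relation, sentence) rows."""
    rows = list(instances)
    return {
        "rows": len(rows),
        "unique_sentences": len({r.sentence for r in rows}),
        "unique_full": len(set(rows)),
    }


def drop_test_pairs(train: Iterable[Instance], test: Iterable[Instance]) -> List[Instance]:
    """Keep only training rows whose (head, tail) pair never occurs in the test set.

    Assumption: pairs are compared as ordered (head, tail) tuples.
    """
    test_pairs = {(r.head, r.tail) for r in test}
    return [r for r in train if (r.head, r.tail) not in test_pairs]


# Toy data, made up purely for illustration.
train = [
    Instance("a", "b", "r1", "s1"),
    Instance("a", "b", "r1", "s1"),  # exact duplicate row
    Instance("a", "c", "r2", "s1"),  # same sentence, different entity pair
    Instance("d", "e", "NA", "s2"),  # entity pair also present in test
]
test = [Instance("d", "e", "NA", "s3")]

print(count_levels(train))
print(len(drop_test_pairs(train, test)))
```

On the toy data the row count, the unique-sentence count, and the fully deduplicated count all differ, which is the same kind of gap reported above: one sentence can appear on several rows when it contains more than one entity pair.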