Dataset split question #21

Open

cqcl1 opened this issue Feb 17, 2023 · 1 comment

cqcl1 commented Feb 17, 2023

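The dataset comes with dev, train, and test files, and test has no labels. Which file is the labeled test set used to evaluate test results? Is the dev file the validation set? What does evaluate.py do? Is it used both for evaluating test-set results and for predicting on data with unknown labels?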

Owner

gaohongkui commented Mar 23, 2023 (edited)

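Hello, I have updated this part of the logic, which was confusing.
dev.json is the validation set; it is the basis for selecting the model with the best F1.
For a labeled test set, you can set the test set file in the train_config configuration and switch run_type to eval. The relevant logic is: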

GlobalPointer_pytorch/train.py

Lines 291 to 295 in 64d5cb8

    
           elif config["run_type"] == "eval": 
        
               # 此处的 eval 是为了评估测试集的 p r f1（如果测试集有标签的情况），无标签预测使用 evaluate.py 
        
               model = load_model() 
        
               test_dataloader = data_generator(data_type="test") 
        
               valid(model, test_dataloader)

evaluate.py is for the final unlabeled dataset; in other words, it runs prediction.
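To make the distinction concrete, here is a minimal sketch of the two paths: scoring predictions against gold labels (what the eval branch does for a labeled test set) versus simply returning predictions (the evaluate.py use case). Everything in it is an assumption for illustration: `prf1`, `run`, the `"predict"` mode name, the sample layout, and the toy predictor are hypothetical and not taken from the repository, whose own metric code may count matches differently.

```python
def prf1(pred_sets, gold_sets):
    """Micro precision/recall/F1 over per-sample sets of (start, end, label)."""
    tp = sum(len(p & g) for p, g in zip(pred_sets, gold_sets))
    n_pred = sum(len(p) for p in pred_sets)
    n_gold = sum(len(g) for g in gold_sets)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def run(run_type, predict, samples):
    """Hypothetical dispatcher: score when labels exist, otherwise just predict."""
    preds = [set(predict(s["text"])) for s in samples]
    if run_type == "eval":
        # Labeled test set: compare predictions with the gold entities.
        golds = [set(s["entities"]) for s in samples]
        return prf1(preds, golds)
    # Unlabeled data: nothing to score against, so return the predictions.
    return preds


def toy_predict(text):
    """Stand-in predictor (assumption): always returns the same two spans."""
    return [(0, 2, "PER"), (5, 7, "LOC")]


labeled = [{"text": "example", "entities": [(0, 2, "PER"), (8, 9, "ORG")]}]
unlabeled = [{"text": "example"}]

scores = run("eval", toy_predict, labeled)
predictions = run("predict", toy_predict, unlabeled)
```

In this sketch the labeled path returns a `(precision, recall, f1)` tuple, while the unlabeled path returns one set of predicted spans per sample, which is why the two cases need separate entry points.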
