Rank | TeamName | Organization | Acc-score |
---|---|---|---|
1 | HUMAN | NLPCC | 86.33 |
2 | BIT-Event | Beijing Institute of Technology | 79.13 |
3 | happy | Northeastern University | 77.98 |
4 | Real | Waseda University | 77.40 |
5 | Alphonse | Harbin Institute of Technology, Shenzhen | 76.22 |
6 | ZM | CUG | 76.01 |
7 | The Brainiacs | 中国人民大学 | 75.00 |
8 | guagua | Tianjin University | 74.95 |
9 | JHX | China University of Geosciences (Beijing) | 74.91 |
10 | CC | Renmin University | 74.88 |
11 | CUGB1012 | China University of Geosciences | 74.65 |
12 | ZZN | 中国地质大学 | 74.53 |
13 | 52wtq | NanChang Hangkong University | 74.25 |
14 | buaa_iip | Beihang University | 73.28 |
15 | IE_X | 北京理工大学 | 57.15 |
16 | BaselineSystem | NLPCC | 70.52 |
Rank | TeamName | Organization | Acc-score |
---|---|---|---|
1 | HUMAN | NLPCC | 86.33 |
2 | BIT-Event | Beijing Institute of Technology | 79.10 |
3 | happy | Northeastern University | 76.50 |
4 | guagua | Tianjin University | 76.08 |
5 | Alphonse | Harbin Institute of Technology, Shenzhen | 76.05 |
6 | CC | Renmin University | 74.72 |
7 | The Brainiacs | 中国人民大学 | 74.42 |
8 | CUGB1012 | China University of Geosciences | 74.35 |
9 | 52wtq | NanChang Hangkong University | 73.73 |
10 | JHX | China University of Geosciences (Beijing) | 73.68 |
11 | Real | Waseda University | 73.38 |
12 | buaa_iip | Beihang University | 73.18 |
13 | IE_X | 北京理工大学 | 67.22 |
14 | BaselineSystem | NLPCC | 70.85 |
Rank | TeamName | Organization | Acc-score |
---|---|---|---|
1 | HUMAN | NLPCC | 86.33 |
2 | BIT-Event | Beijing Institute of Technology | 79.12 |
3 | happy | Northeastern University | 77.24 |
4 | Alphonse | Harbin Institute of Technology, Shenzhen | 76.14 |
5 | guagua | Tianjin University | 75.52 |
6 | Real | Waseda University | 75.39 |
7 | CC | Renmin University | 74.80 |
8 | The Brainiacs | 中国人民大学 | 74.71 |
9 | CUGB1012 | China University of Geosciences | 74.50 |
10 | JHX | China University of Geosciences (Beijing) | 74.30 |
11 | 52wtq | NanChang Hangkong University | 73.99 |
12 | ZM | CUG | 73.43 |
13 | buaa_iip | Beihang University | 73.23 |
14 | ZZN | 中国地质大学 | 72.69 |
15 | IE_X | 北京理工大学 | 62.19 |
16 | BaselineSystem | NLPCC | 70.69 |
Sub-events identification is a very fundamental problem in the field of information extraction, especially in emergency situations (e.g., terrorist attacks). It is challenging for two reasons:
- data confusing and imbalance. Events usually evolve rapidly and successive sub-events occur. Only a few target sub-events data need to be identifid from the large volume of events related data.
- low resource. Usually only a limited amount of labelled seed data is given for learning and more annotating datasets are expensive and time consuming.
However, the existing works cannot fully meet the requirements, and thus better few shot learning and data selection models for sub-event identification are crucial.
The goal of this task is to build an IE system (Information Extraction system) that can quickly adapt to a new occurring sub-event. Specifically, there are two settings of this task:
- Given a large number of event-related corpus and a few labelled seed data, the task aims to build an IE system which may identify the target sub-events.
- Besides the machine learning model designing, annotating data selected from the unlabeled corpus is also allowed, but the size of the labelled data from the unlabeled corpus is fixed. How to select the best data to annotate and supply training dataset is also an important step in this task.
The task settings are very practical, thus the proposed solutions may generalize well in real world applications.
Note:
- Three sub-event would be presented in the task and will be released with the seed dataset.
- Human annotation and correction are allowed for training dataset which is composed of seedset and data annotated by participants from unlabeled corpus.
- The size of data annotated by participants may not exceed 100 per sub-event.
All corpus provided are obtained from comments (generally 8 to 120 characters long). The corpus are split into three parts, i.e., unlabeled dataset, seed dataset and testing dataset. The labelled seed dataset(100 samples per event) and unlabeled dataset(100K for 3 events) are released to participants to construct their own training set and developing set, and the testing dataset(around 2k per event) is used for final evaluation.
More details about these three datasets are as follows:
- Unlabelled dateset: totally 100,000 samples related to the three sub-events.
- Seed dataset: 100 labeled samples per sub-event.
- Test dataset : 2000 labeled samples per sub-event.
Tip: Testset will be released untill 2021/6/5.
For submission, please write the prediction result into a single file and email it to Xingyu Bai [email protected]
There are two settings for this evaluation task and the final evaluation is the average accuracy of two settings;
- few sample problem setting: the size of training data may not exceed 100 per sub-event, human annotation is not allowed.
- data selection problem setting: human annotation is allowed. The size of training data which is composed of seedset and data annotated from unlabeled corpus may not exceed 200.
The format of submission file should be the same as the format of given submission-format.txt file under taskdata folder. To be specific, each sample in the test dataset is labelled by 3, 2, 1 and 0, which represent Trade War, Tokyo Olympic Games, COVID-19 and negative samples, respectively.
An eval.py script is provided to calculate the accuracy and verify prediction format.
This task will award prizes for top 3 teams. Winners will get the award certificates issued by NLPCC and CCF Technical Committee on Chinese Information Technology.
First prize: 5000 RMB + award certificate
Second prize: 3000 RMB + award certificate
Third prize: 1000 RMB + award certificate
Further arrangement and baseline system will be published in https://github.com/IIGROUP/AutoIE2
Xuefeng Yang
email: [email protected]
Weigang Guo
email: [email protected]
Xingyu Bai (Tsinghua University, Shenzhen International Graduate School)
email: [email protected]
Yujiu Yang (Tsinghua University, Shenzhen International Graduate School)
email: [email protected]