TREC-COVID Dataset Term Project for CmpE493 Information Retrieval
data :
relevance-data :
Relevance data columns: 1st column is the topic-id, 2nd column is not used, 3rd column is the document-id, 4th column is the relevance judgment (0, 1, or 2).
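The four-column layout described above can be read with a few lines of Python. This is a minimal sketch, assuming whitespace-separated columns; the function name `parse_qrels` and the sample lines are made up for illustration and are not part of the project files.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse relevance lines into {topic_id: {doc_id: relevance}}.

    Assumes four whitespace-separated columns per line:
    topic-id, an unused column, document-id, relevance (0/1/2).
    """
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic_id, _unused, doc_id, relevance = parts
        qrels[topic_id][doc_id] = int(relevance)
    return dict(qrels)


# Hypothetical sample lines, not taken from the real relevance file.
sample = [
    "1 0 docA 2",
    "1 0 docB 0",
    "2 0 docC 1",
]
qrels = parse_qrels(sample)
```

For a real file, the same function can be called on an open file handle, since iterating over a file yields its lines.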
Each line of the results file has six fields, in this order: query-id, Q0, document-id, rank, score, STANDARD. The field query-id is an alphanumeric sequence that identifies the query. The second field, with the value "Q0", is currently ignored by trec_eval; just include it in the file. The field document-id is an alphanumeric sequence that identifies the retrieved document. The field rank is an integer giving the document's position in the ranking; this field is also ignored by trec_eval. The field score is an integer or float value indicating the degree of similarity between the document and the query, so the most relevant documents have the highest scores. The last field, with the value "STANDARD", is used only to identify the run (this name is also shown in the output); you can use any alphanumeric sequence.
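As an illustration of the field layout described above, the following sketch turns per-query document scores into result lines. The function name `format_run`, the four-decimal score format, and the example scores are assumptions made for this example only.

```python
def format_run(scores_by_query, run_name="STANDARD"):
    """Build result lines: query-id Q0 document-id rank score run-name.

    scores_by_query maps query-id -> {document-id: score}.
    Documents are ranked by descending score within each query.
    """
    lines = []
    for query_id, doc_scores in scores_by_query.items():
        ranked = sorted(doc_scores.items(), key=lambda item: item[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            lines.append(f"{query_id} Q0 {doc_id} {rank} {score:.4f} {run_name}")
    return lines


# Hypothetical scores for one query.
example_scores = {"1": {"docA": 0.87, "docB": 1.52}}
run_lines = format_run(example_scores)
```

Writing the file is then a matter of joining `run_lines` with newlines and saving the result.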
We will use Mean Average Precision (MAP), Normalized Discounted Cumulative Gain (NDCG), and Precision at the top 10 results (P@10) for evaluation. You should use the official evaluation tool available at
Some relevant papers are provided below, and more papers are available at https://ir. I also suggest that you look for other relevant publications on the Web.
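To make the three evaluation measures named above concrete, here is a rough per-query sketch. It assumes a document counts as relevant when its judgment is greater than 0, and it uses the relevance grade as the gain with a log2(rank + 1) discount for NDCG. The official tool may differ in details, so it should still be the source of any reported numbers; the function names are illustrative only.

```python
import math


def precision_at_k(ranked_docs, relevance, k=10):
    """Fraction of the top-k ranked documents that are relevant (grade > 0)."""
    top = ranked_docs[:k]
    return sum(1 for doc in top if relevance.get(doc, 0) > 0) / k


def average_precision(ranked_docs, relevance):
    """Sum of precision at each relevant hit, divided by the number of relevant docs."""
    num_relevant = sum(1 for grade in relevance.values() if grade > 0)
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if relevance.get(doc, 0) > 0:
            hits += 1
            total += hits / rank
    return total / num_relevant


def ndcg(ranked_docs, relevance):
    """DCG of the ranking divided by the DCG of an ideal ordering."""
    def dcg(gains):
        return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

    gains = [relevance.get(doc, 0) for doc in ranked_docs]
    ideal_dcg = dcg(sorted(relevance.values(), reverse=True))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_average_precision(runs, qrels):
    """MAP: mean of average_precision over all queries that have judgments."""
    values = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(values) / len(values) if values else 0.0
```

Here `ranked_docs` is a list of document-ids in ranked order for one query, and `relevance` maps document-id to its judgment for that query.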
Shared Task for COVID-19.” Journal of the American Medical Informatics Association (2020). Available at
Test Collection.” ACM SIGIR Forum (2020). Available at covidSubmit/papers/Forum_TRECCOVID1.pdf.
TREC-COVID Information Retrieval Challenge.” medRxiv (2020). Available at https://
for the COVID-19 Open Research Dataset.” Proceedings of the First Workshop on Scholarly Document Processing. 2020. Available at 2020.sdp-1.5/.
Wang, Lucy Lu, et al. ”CORD-19: The Covid-19 Open Research Dataset.” ArXiv (2020).
Esteva, Andre, et al. ”Co-search: Covid-19 information retrieval with semantic search, question answering, and abstractive summarization.” arXiv preprint arXiv:2006.09595 (2020).
Available at