Methodology for automated COVID-19 literature search to identify AI-related articles.
We retrieve articles from the CORD-19 open research dataset, available at:
COVID-19 Open Research Dataset (CORD-19). 2020. Version 2020-08-04. Retrieved from https://www.semanticscholar.org/cord19/. Accessed 2020-08-04. doi:10.5281/zenodo.3715505
The query used by CORD-19 to retrieve results is as follows:
"COVID-19"[All Fields] OR ("coronavirus"[MeSH Terms] OR "coronavirus"[All Fields]) OR
"Corona virus"[All Fields] OR "2019-nCoV"[All Fields] OR "SARS-CoV"[All Fields] OR
"MERS-CoV"[All Fields] OR "Severe Acute Respiratory Syndrome"[All Fields] OR
"Middle East Respiratory Syndrome"[All Fields]
The latest metadataset can be accessed here. A historical dataset archive is available here. The preprint describing the dataset is available here.
We kept only articles containing either a title and an abstract, and only articles with a publication year of 2019 or 2020. We removed articles with a duplicate dataset ID, SHA, DOI, PubMed ID, URL, S2 ID, Arxiv ID, PMC ID, or WHO Covidence ID (these fields are described here). For the time period between 1st January 2020 and 1st August 2020, the search retrieved 36,292 COVID-related articles and 1,305 AI-related articles.