Summarise the number of words in each section of articles submitted to bioRxiv.
After data cleaning, a total of 42,348 papers submitted to bioRxiv before Oct 15, 2019 were analyzed.
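
As a rough illustration of the per-section counting, here is a minimal Python sketch. It assumes each article is already available as a mapping from section name to plain text. That layout, the `SECTIONS` list, and the helper names `count_words` and `section_word_counts` are hypothetical and may not match how the crawler in this repo stores or processes its data.

```python
import re

# Hypothetical section names, taken from the headings used in this summary.
SECTIONS = ["ABSTRACT", "INTRODUCTION", "METHOD", "RESULT", "DISCUSSION"]


def count_words(text):
    """Count whitespace-separated tokens containing at least one word character."""
    return sum(1 for token in text.split() if re.search(r"\w", token))


def section_word_counts(article):
    """Return {section: word count}; sections missing from the article count as 0."""
    return {name: count_words(article.get(name, "")) for name in SECTIONS}


# Toy article (made-up text, not taken from the bioRxiv data).
example = {
    "ABSTRACT": "We analyse word counts in bioRxiv preprints.",
    "METHOD": "Sections were split by heading - then counted.",
}
counts = section_word_counts(example)
print(counts)
```

Tokens made only of punctuation (such as a lone dash) are skipped, so the count is closer to what a journal word limit would measure.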
- ABSTRACT
[Blue vertical dashed lines mark word counts from 150 to 400 in steps of 50. Clear peaks appear at these lines.]
- It seems many authors trimmed their abstracts to meet journal word limits before submission.
- INTRODUCTION
- METHOD
- RESULT
- DISCUSSION
- Number of REFERENCE
- All sections together
[x-axis was truncated at 50000]
Using multiple linear regression, all sections except ABSTRACT had an effect on the number of REFERENCE. As expected, the length of DISCUSSION had the largest effect on the number of REFERENCE.
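
To show the kind of model this describes, here is a minimal sketch of a multiple linear regression fitted by ordinary least squares in plain Python. It is not the analysis code used for the results above. The data are synthetic, the coefficients in `TRUE_COEF` are arbitrary values chosen for the demonstration (not estimates from the bioRxiv data), and the helper names `solve_linear_system` and `fit_ols` are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients via the normal equations; intercept comes first."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


PREDICTORS = ["ABSTRACT", "INTRODUCTION", "METHOD", "RESULT", "DISCUSSION"]
# Arbitrary demo coefficients: intercept, then one per predictor (assumption).
TRUE_COEF = [5.0, 0.0, 3.0, 1.0, 2.0, 6.0]

# Synthetic section lengths in hundreds of words, with a fixed seed.
rng = random.Random(0)
rows = [[rng.uniform(1.0, 10.0) for _ in PREDICTORS] for _ in range(50)]
# Noise-free response, so the fit should recover TRUE_COEF almost exactly.
y = [TRUE_COEF[0] + sum(c * v for c, v in zip(TRUE_COEF[1:], r)) for r in rows]

coef = fit_ols(rows, y)
for name, value in zip(["intercept"] + PREDICTORS, coef):
    print(f"{name:>12}: {value:.3f}")
```

With real data the response would be noisy, and judging which sections matter would also need standard errors or p-values, which this sketch does not compute.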
https://github.com/Yiguan/crawl_bioRxiv2/blob/master/bioData_clean.txt