PubMed Central (PMC) Automatic PDF/XML Downloader

Bulk-downloads full-text PDFs (or any other source files) of PubMed Central OA (Open Access) articles.

How to use

Put the identifiers of the documents you want into the file ids.txt, then run the main.sh script. It will do the following:

Download the document index file from PMC
For each id, check whether it is available; if it is, download the sources from PMC's FTP server
Extract the archives
Move and rename the PDFs (or any other source type, if you change the suffix constant in copy_rename.sh, e.g. to nxml)
Remove the downloaded files
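
The availability check in the second step could look roughly like the sketch below. It is not the code in download_ids.sh. It assumes that the index is PMC's oa_file_list.csv, that the first comma-separated field of each row is a package path ending in PMC<id>.tar.gz, and that ids.txt holds bare numeric ids, one per line. The function name lookup_package is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the package path for one id, or return a
# non-zero status if the id is not listed in the index.
# Assumption: field 1 of each index row is a path such as
# oa_package/08/e0/PMC13900.tar.gz
lookup_package() {
    id=$1
    index=$2
    awk -F, -v name="PMC${id}.tar.gz" '
        {
            # Compare only the last path component, so that id 13900
            # does not match PMC139001.tar.gz.
            n = split($1, parts, "/")
            if (parts[n] == name) { print $1; found = 1; exit }
        }
        END { exit !found }
    ' "$index"
}

# Usage (illustrative):
#   while read -r id; do
#       lookup_package "$id" oa_file_list.csv ||
#           echo "PMC$id: not available" >&2
#   done < ids.txt
```

Matching on the file name rather than grepping the whole row avoids false hits when one id is a prefix of another.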

Your files will be in the pdf folder, named PMC${ID}.${SUFFIX}.
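
As an illustration of this naming scheme, here is a minimal sketch of a move-and-rename step. It is not the content of copy_rename.sh. The function name copy_renamed is hypothetical, and its suffix argument stands in for the script's suffix constant. It also assumes that an extracted package may contain several files with the same suffix, and simply takes the first one in sorted order.

```shell
#!/bin/sh
# Hypothetical helper: copy one file with the given suffix out of an
# extracted package directory and name it PMC<id>.<suffix>.
# Assumption: if several files match, the first in sorted order is used.
# File names containing newlines are not handled.
copy_renamed() {
    src_dir=$1
    id=$2
    suffix=$3
    dest_dir=$4
    first=$(find "$src_dir" -type f -name "*.${suffix}" | sort | head -n 1)
    # Fail if the package has no file of the requested type.
    [ -n "$first" ] || return 1
    mkdir -p "$dest_dir"
    cp "$first" "${dest_dir}/PMC${id}.${suffix}"
}

# Usage (illustrative): copy_renamed PMC13900 13900 pdf pdf
```

Passing nxml instead of pdf as the third argument would collect the XML sources under the same naming scheme.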

Depending on the number of requested ids, this process may take a while and may require several GB of disk space.
