PDF2TXT converts either a single .pdf file or every .pdf file in a given directory to .txt files.
To install PDF2TXT, activate a Python 3 virtual environment and clone the repository:
git clone https://github.com/NLPatVCU/PDF2TXT.git
You will also need to install Milvus and the Haystack framework:
pip3 install pymilvus==1.0.0
pip3 install farm-haystack==1.0.0
If you experience any difficulties installing Haystack, see its repository: https://github.com/deepset-ai/haystack
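If the installation seems to have gone wrong, a quick check of the installed versions can help narrow it down. The following is a minimal sketch, not part of PDF2TXT: it assumes Python 3.8 or later (for importlib.metadata) and compares the two pinned versions from the pip3 commands above against whatever is present in the active environment. The names REQUIRED and check_requirements are hypothetical, chosen only for this example.

```python
from importlib import metadata

# Pinned versions copied from the pip3 install commands in this README.
REQUIRED = {"pymilvus": "1.0.0", "farm-haystack": "1.0.0"}


def check_requirements(required=REQUIRED):
    """Return {package: problem}; an empty dict means every pin is satisfied."""
    problems = {}
    for name, wanted in required.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is missing from the active environment.
            problems[name] = "not installed"
            continue
        if found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    for name, problem in check_requirements().items():
        print(f"{name}: {problem}")
```

Run it inside the same virtual environment you installed into; no output means both packages were found at the expected versions.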
To convert a single file, run:
python3 pdf2txt.py -f <input_file_path>
To convert an entire directory, run:
python3 pdf2txt.py -d <input_directory_path>
To write the output files into a specific directory, append the following to either command:
-o <output_directory_path>
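The options above can also be combined from a script. The sketch below is an illustration rather than part of PDF2TXT: it uses only the -f, -d and -o flags documented here, and it assumes that pdf2txt.py is in the current working directory. The helper names build_command and convert, and the example paths, are hypothetical.

```python
import subprocess
import sys


def build_command(input_path, is_directory=False, output_dir=None, script="pdf2txt.py"):
    """Build the argument list for one pdf2txt.py invocation."""
    # -d converts every .pdf in a directory, -f converts a single file.
    mode_flag = "-d" if is_directory else "-f"
    command = [sys.executable, script, mode_flag, str(input_path)]
    if output_dir is not None:
        # -o is appended after the input argument, as described above.
        command += ["-o", str(output_dir)]
    return command


def convert(input_path, is_directory=False, output_dir=None):
    """Run pdf2txt.py and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(input_path, is_directory, output_dir), check=True)


if __name__ == "__main__":
    # Hypothetical paths; replace them with your own.
    print(" ".join(build_command("papers", is_directory=True, output_dir="converted")))
```

Using sys.executable keeps the call inside the active virtual environment instead of relying on whichever python3 is first on the PATH.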
This package is licensed under the GNU General Public License.