
PDF2TXT

PDF2TXT converts either a single .pdf file or every .pdf file in a given directory to .txt files.


Installation

To install PDF2TXT, first activate a Python 3 virtual environment, then clone the repository:

git clone https://github.com/NLPatVCU/PDF2TXT.git

You also need to install the Haystack framework (farm-haystack) and the Milvus Python client (pymilvus):

pip3 install pymilvus==1.0.0
pip3 install farm-haystack==1.0.0
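After installing, you may want to confirm that the pinned versions are the ones actually present in the active environment. The sketch below is an optional helper, not part of PDF2TXT; it assumes Python 3.8+ (for the standard-library importlib.metadata module), and the function name check_pins is hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Version pins taken from the installation commands above.
EXPECTED = {"pymilvus": "1.0.0", "farm-haystack": "1.0.0"}


def check_pins(expected: dict[str, str]) -> dict[str, tuple[str | None, bool]]:
    """Map each package name to (installed version or None, matches pin?)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins(EXPECTED).items():
        status = "ok" if ok else "mismatch"
        print(f"{name}: expected {EXPECTED[name]}, found {installed} ({status})")
```

Running it inside the virtual environment prints one line per package; a "mismatch" line means the pip install above did not take effect in that environment.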

If you run into problems installing Haystack, see its repository: https://github.com/deepset-ai/haystack

Use

To convert a single file, run:

python3 pdf2txt.py -f <input_file_path>

To convert an entire directory, run:

python3 pdf2txt.py -d <input_directory_path>

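To write the output files to a specific directory, append the -o option to either command:

-o <output_directory_path>

For example:

python3 pdf2txt.py -d <input_directory_path> -o <output_directory_path>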
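The -f, -d and -o flags described above could be modelled with Python's argparse as in the sketch below. This is a hypothetical illustration, not code taken from pdf2txt.py: the function names build_parser and plan_outputs are invented, and it assumes that each .txt file keeps its PDF's base name, that it is written next to the input when -o is omitted, and that directory mode only picks up files directly inside the directory whose suffix is .pdf (case-insensitive). The text extraction itself is left out.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags: exactly one of -f / -d.
    parser = argparse.ArgumentParser(description="Convert .pdf files to .txt files.")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", dest="file", help="path to a single .pdf file")
    source.add_argument("-d", dest="directory", help="directory containing .pdf files")
    parser.add_argument("-o", dest="output_dir", help="directory for the .txt files")
    return parser


def plan_outputs(args: argparse.Namespace) -> list[tuple[Path, Path]]:
    """Return (input .pdf, output .txt) pairs for the parsed arguments."""
    if args.file:
        pdfs = [Path(args.file)]
    else:
        # Assumption: only files directly inside the directory, matched by suffix.
        pdfs = sorted(
            p for p in Path(args.directory).iterdir()
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
    pairs = []
    for pdf in pdfs:
        # Assumption: default output location is the folder holding the input PDF.
        out_dir = Path(args.output_dir) if args.output_dir else pdf.parent
        pairs.append((pdf, out_dir / (pdf.stem + ".txt")))
    return pairs


if __name__ == "__main__":
    for pdf, txt in plan_outputs(build_parser().parse_args()):
        print(f"{pdf} -> {txt}")
```

Using a mutually exclusive group makes argparse reject a command line that passes both -f and -d, or neither, before any file is touched.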

License

This package is licensed under the GNU General Public License.

Acknowledgments