XML to PDF Converter

Pitanga Innovare edited this page Dec 12, 2024 · 19 revisions

Description

The pdf_generator script converts SciELO XML files into PDF articles using Python, producing one PDF per XML file. It uses the lxml and python-docx libraries to extract the article data and populate a structured DOCX layout, then calls LibreOffice to convert the resulting DOCX file into a PDF document. For questions or contributions, contact the maintainers at SciELO.
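The final DOCX-to-PDF step can be reproduced by hand with LibreOffice's headless mode. The following is a minimal sketch, not the script's actual implementation: the function names are hypothetical, and it assumes the LibreOffice launcher is available on the PATH as `soffice`. Whether pdf_generator invokes LibreOffice with exactly these flags is an assumption.

```python
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir=None, soffice="soffice"):
    """Return a LibreOffice command line that converts a DOCX file to PDF.

    Hypothetical helper: `soffice` is assumed to be on the PATH.
    """
    docx_path = Path(docx_path)
    # By default, write the PDF next to the DOCX file.
    out_dir = Path(out_dir) if out_dir is not None else docx_path.parent
    return [
        soffice,
        "--headless",            # run without opening a window
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_docx_to_pdf(docx_path, out_dir=None):
    """Run the conversion and return the path where the PDF is expected."""
    cmd = build_convert_command(docx_path, out_dir)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    target_dir = Path(cmd[cmd.index("--outdir") + 1])
    return target_dir / (Path(docx_path).stem + ".pdf")
```

Splitting command construction from execution makes it possible to inspect or log the command before LibreOffice is started.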

Installation

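Install packtools, which provides the pdf_generator script, using one of the methods below. LibreOffice must also be installed, since it performs the DOCX-to-PDF conversion.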

Using pip

pip install packtools

From source

git clone https://github.com/scieloorg/packtools.git
cd packtools
python setup.py install

Usage

To use the utility, provide the path to the SciELO XML file and the output path for the PDF file. Optionally, provide a DOCX layout file for custom formatting. You can find a default layout file here. This file contains a set of predefined DOCX styles used to format the article content.

Command-Line Arguments

usage: pdf_generator [-h] -i PATH_TO_READ [-l LAYOUT] -o PATH_TO_WRITE

Convert XML file from SciELO format to PDF format.

optional arguments:
  -h, --help            show this help message and exit
  -i PATH_TO_READ, --xml_scielo PATH_TO_READ
                        Path for reading the SciELO XML file.
  -l LAYOUT, --layout LAYOUT
                        Path for reading the DOCX layout file.
  -o PATH_TO_WRITE, --pdf PATH_TO_WRITE
                        Path for writing the PDF file.
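Although the help text lists every flag under "optional arguments", the usage line shows -i and -o without brackets, which is how argparse renders required options. The sketch below is a hypothetical reconstruction of a parser that would produce a similar help message. It is not taken from the packtools source, and the use of `metavar` and `required=True` is an assumption inferred from the usage line.

```python
import argparse


def build_parser():
    """Build a parser resembling the pdf_generator CLI (assumed, not the real source)."""
    parser = argparse.ArgumentParser(
        prog="pdf_generator",
        description="Convert XML file from SciELO format to PDF format.",
    )
    parser.add_argument(
        "-i", "--xml_scielo",
        metavar="PATH_TO_READ",
        required=True,  # shown without brackets in the usage line
        help="Path for reading the SciELO XML file.",
    )
    parser.add_argument(
        "-l", "--layout",
        default=None,  # optional: no custom layout unless given
        help="Path for reading the DOCX layout file.",
    )
    parser.add_argument(
        "-o", "--pdf",
        metavar="PATH_TO_WRITE",
        required=True,
        help="Path for writing the PDF file.",
    )
    return parser


args = build_parser().parse_args(["-i", "article.xml", "-o", "article.pdf"])
print(args.xml_scielo, args.layout, args.pdf)
```

With a parser like this, omitting -i or -o makes argparse exit with an error, while omitting -l leaves the layout unset.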

Example

pdf_generator -i path/to/article.xml -o path/to/article.pdf

If you have a custom DOCX layout file, you can include it as follows:

pdf_generator -i path/to/article.xml -l path/to/layout.docx -o path/to/article.pdf

Output (the first line is Portuguese for "Intermediate document saved at path/to/article.docx"):

Documento intermediário salvo em path/to/article.docx
convert /home/user/article.docx as a Writer document -> /home/user/article.pdf using filter : writer_pdf_Export
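Because each XML file maps to a single PDF, converting a whole directory means running the command once per file. The following is a minimal sketch of such a batch run. The helper names and directory layout are hypothetical, and it assumes the pdf_generator command is available on the PATH after installation.

```python
import subprocess
from pathlib import Path


def build_commands(xml_dir, pdf_dir, layout=None):
    """Return one pdf_generator command per XML file found in xml_dir."""
    commands = []
    for xml_path in sorted(Path(xml_dir).glob("*.xml")):
        # article.xml -> <pdf_dir>/article.pdf
        pdf_path = Path(pdf_dir) / (xml_path.stem + ".pdf")
        cmd = ["pdf_generator", "-i", str(xml_path)]
        if layout is not None:
            cmd += ["-l", str(layout)]  # optional custom DOCX layout
        cmd += ["-o", str(pdf_path)]
        commands.append(cmd)
    return commands


def convert_all(xml_dir, pdf_dir, layout=None):
    """Run pdf_generator for every XML file, stopping at the first failure."""
    Path(pdf_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(xml_dir, pdf_dir, layout):
        subprocess.run(cmd, check=True)
```

Calling `convert_all("xml", "pdf")` would then write one PDF into `pdf` for each XML file in `xml`.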

Screenshots

Figure 1. XML file used as input.

Figure 2. PDF file generated using the pdf_generator utility.
