XML to PDF Converter

Roberta Takenaka edited this page Dec 18, 2024 · 19 revisions

Description (MVP1)

pdf_generator is a command-line utility that converts SciELO XML files into PDF articles.

Technologies

  • Python 3.x
  • lxml
  • python-docx
  • LibreOffice

Features

  • Single Language Support: Generates PDFs in a single language for consistency.
  • Basic Tables: Supports tables occupying a single column, without merged cells.
  • Two-Column Layout: Adopts a standard two-column layout for text.
  • Section Styling: Automatically styles sections according to their hierarchy.
  • Simple Citations: Formats citations in a clean and straightforward style.
  • Headers and Footers:
    • First Page Header: Includes the journal name and the article DOI.
    • Subsequent Pages Header: Includes the journal name and the short article title.
    • Footers: Include page numbers, issue details, and “cite as” (on the first page only).
  • Intermediate Formats: Supports generation of .docx intermediate files.
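The header fields listed above come from the article metadata in the XML file. As a rough illustration only, the sketch below reads a journal name and a DOI with Python's standard xml.etree module. It assumes JATS-style element names (journal-title, and article-id with pub-id-type="doi"); the sample document and the header_fields helper are hypothetical and are not taken from the pdf_generator source, which lists lxml among its technologies.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed input; real SciELO XML files carry far more metadata.
SAMPLE_XML = """<article>
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Example Journal</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.0000/example.doi</article-id>
    </article-meta>
  </front>
</article>"""


def header_fields(xml_text):
    """Return the journal name and DOI that a first-page header would show."""
    root = ET.fromstring(xml_text)
    # findtext returns None when the element is missing.
    journal = root.findtext(".//journal-title")
    doi = root.findtext(".//article-id[@pub-id-type='doi']")
    return {"journal": journal, "doi": doi}


print(header_fields(SAMPLE_XML))
```

A missing element yields None rather than an error, so a caller can decide whether to leave that part of the header empty.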

Future versions

  • Web Interface version
  • Library version
  • New document structures
  • New pdf templates

Prerequisites

The XML to PDF converter requires LibreOffice version 24.2, which you can download from the LibreOffice website.
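Before running the converter, it can help to confirm that LibreOffice is reachable from the command line. The following is a minimal sketch, assuming the launcher is named soffice or libreoffice and is on PATH; both names and the two helper functions are assumptions for illustration, and the launcher name or location may differ on your system (notably on Windows).

```python
import shutil
import subprocess


def find_libreoffice():
    """Return the path of the LibreOffice launcher, or None if it is not on PATH."""
    # "soffice" and "libreoffice" are common launcher names; adjust if yours differs.
    for name in ("soffice", "libreoffice"):
        path = shutil.which(name)
        if path:
            return path
    return None


def libreoffice_version(executable):
    """Return the text printed by `<executable> --version`, or None on failure."""
    try:
        result = subprocess.run(
            [executable, "--version"], capture_output=True, text=True, timeout=60
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip()


exe = find_libreoffice()
if exe is None:
    print("LibreOffice was not found on PATH.")
else:
    print(libreoffice_version(exe))
```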

Installation

Install packtools following the steps for your operating system:

Linux

python3 -m venv .venv
source .venv/bin/activate
pip install "packtools>=4.10.0"

Windows

Create a new folder named scielo-packtools-v4.x

md scielo-packtools-v4.x

Access the folder named scielo-packtools-v4.x

cd scielo-packtools-v4.x

Create a virtual environment named env

python3 -m venv env

Activate the virtual environment

env\Scripts\activate

Install packtools

pip install "packtools>=4.10.0"

Deactivate the virtual environment

deactivate
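To confirm that the active environment holds a packtools release at or above 4.10.0, a short check along these lines may be useful. It is only a sketch: parse_version and meets_minimum are hypothetical helpers that handle plain numeric versions such as 4.10.0, and they compare the parts as numbers so that 4.10.0 correctly ranks above 4.9.9.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '4.10.0' into (4, 10, 0); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad short versions such as '4.10' so comparisons stay consistent.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum="4.10.0"):
    """Return True when `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


try:
    installed = version("packtools")
except PackageNotFoundError:
    print("packtools is not installed in this environment.")
else:
    status = "OK" if meets_minimum(installed) else "older than 4.10.0"
    print(installed, status)
```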

Usage

Access the folder named scielo-packtools-v4.x

cd scielo-packtools-v4.x

Activate the virtual environment (the command below is for Windows; on Linux, run source .venv/bin/activate instead)

env\Scripts\activate

To use the utility, you need to provide the path to the XML file and the desired output path for the PDF file. Optionally, you can provide a DOCX layout file for custom formatting. You can find a default layout file here. This file contains a set of predefined DOCX styles used to format the article content.

Command-Line Arguments

usage: pdf_generator [-h] -i PATH_TO_READ [-l LAYOUT] -o PATH_TO_WRITE

Convert XML file from SciELO format to PDF format.

optional arguments:
  -h, --help            show this help message and exit
  -i PATH_TO_READ, --xml_scielo PATH_TO_READ
                        Path for reading the SciELO XML file.
  -l LAYOUT, --layout LAYOUT
                        Path for reading the DOCX layout file.
  -o PATH_TO_WRITE, --pdf PATH_TO_WRITE
                        Path for writing the PDF file.
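The usage line shows -i and -o without square brackets, which means both are required even though the help text groups them under "optional arguments"; only -l can be omitted. The sketch below rebuilds an equivalent interface with argparse to make that explicit. It mirrors the help text above and is not the utility's actual source code; build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented pdf_generator options."""
    parser = argparse.ArgumentParser(
        prog="pdf_generator",
        description="Convert XML file from SciELO format to PDF format.",
    )
    parser.add_argument("-i", "--xml_scielo", metavar="PATH_TO_READ", required=True,
                        help="Path for reading the SciELO XML file.")
    # The layout is the only option that may be left out.
    parser.add_argument("-l", "--layout", metavar="LAYOUT",
                        help="Path for reading the DOCX layout file.")
    parser.add_argument("-o", "--pdf", metavar="PATH_TO_WRITE", required=True,
                        help="Path for writing the PDF file.")
    return parser


args = build_parser().parse_args(["-i", "article.xml", "-o", "article.pdf"])
print(args.xml_scielo, args.layout, args.pdf)
```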

Example

pdf_generator -i path/to/article.xml -o path/to/article.pdf

If you have a custom DOCX layout file, you can include it as follows:

pdf_generator -i path/to/article.xml -l path/to/layout.docx -o path/to/article.pdf
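When several articles need converting, the same command can be driven from a short script. The sketch below assumes pdf_generator is available on PATH (for example, inside the activated virtual environment) and uses only the documented -i, -l and -o flags; build_command and convert_folder are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def build_command(xml_path, pdf_path, layout=None):
    """Build the pdf_generator argument list from the documented flags."""
    command = ["pdf_generator", "-i", str(xml_path)]
    if layout is not None:
        command += ["-l", str(layout)]
    command += ["-o", str(pdf_path)]
    return command


def convert_folder(folder, layout=None):
    """Convert every .xml file in `folder`, writing each PDF next to its source."""
    for xml_path in sorted(Path(folder).glob("*.xml")):
        pdf_path = xml_path.with_suffix(".pdf")
        # check=True raises CalledProcessError if pdf_generator exits with an error.
        subprocess.run(build_command(xml_path, pdf_path, layout), check=True)


print(build_command("path/to/article.xml", "path/to/article.pdf"))
```

Calling convert_folder("path/to/articles") would then run one conversion per XML file and stop at the first failure.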

Output (the first line is in Portuguese and means "Intermediate document saved at path/to/article.docx"):

Documento intermediário salvo em path/to/article.docx
convert /home/user/article.docx as a Writer document -> /home/user/article.pdf using filter : writer_pdf_Export
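The second output line resembles what LibreOffice prints during a headless conversion, which suggests the intermediate .docx file is handed to LibreOffice to produce the PDF. As an illustration of that step only, and not necessarily how pdf_generator invokes it, the sketch below builds an equivalent command from LibreOffice's --headless, --convert-to and --outdir options. The soffice launcher name and the build_soffice_command helper are assumptions.

```python
from pathlib import Path


def build_soffice_command(docx_path, executable="soffice"):
    """Build a LibreOffice command that converts a .docx file to PDF in place."""
    docx_path = Path(docx_path)
    return [
        executable,
        "--headless",                              # run without opening a window
        "--convert-to", "pdf:writer_pdf_Export",   # filter named in the log above
        "--outdir", str(docx_path.parent),         # write the PDF beside the .docx
        str(docx_path),
    ]


print(build_soffice_command("path/to/article.docx"))
```

The resulting list can be passed to subprocess.run to perform the conversion on a machine where LibreOffice is installed.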

Screenshots

Figure 1. XML file used as input.

Figure 2. PDF file generated using the pdf_generator utility.