A tool to convert PowerPoint .pptx files into Markdown.
Preserved formats:
- Titles. Custom table of contents with fuzzy matching is supported.
- Lists with arbitrary depth.
- Text with bold, italic, color, and hyperlinks.
- Pictures. They are extracted into image files, and their relative paths are inserted.
- Tables with merged cells.
- Top-to-bottom then left-to-right block order.
Supported output:
- Markdown
- TiddlyWiki's wikitext
- Madoko
- Quarto
Please star this repo if you like it!
You need a Python version later than 3.10 and pip installed on your system. Then run in the terminal:
pip install pptx2md
Once it is installed, run pptx2md [pptx filename] to convert a pptx file into Markdown.
The default output filename is out.md, and any extracted pictures (which are inserted into the .md file) are placed in the /img/ folder.
Note: older .ppt files are not supported; convert them to the newer .pptx format first.
Upgrade & Remove:
pip install --upgrade pptx2md
pip uninstall pptx2md
By default, this tool parses all pptx titles into level-1 Markdown headings. To get a hierarchical table of contents, put your predefined title list in a file and pass it with the -t argument.
This is a sample title file (titles.txt):
Heading 1
  Heading 1.1
    Heading 1.1.1
  Heading 1.2
  Heading 1.3
Heading 2
  Heading 2.1
  Heading 2.2
    Heading 2.2.1
    Heading 2.2.2
  Heading 2.3
Heading 3
The first line that begins with spaces is treated as a second-level heading, and its number of leading spaces sets the indent unit. In this example, Heading 1.1 is output as ## Heading 1.1. Because it has two leading spaces, the indent unit is 2, so Heading 1.1.1, with 4 spaces, is output as ### Heading 1.1.1. Heading texts are matched using fuzzy matching; unmatched pptx titles are treated as the deepest heading level.
Use it with pptx2md [filename] -t titles.txt.
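The indentation rule described above can be sketched as follows. This is a hypothetical illustration, not pptx2md's own parser: parse_title_file and to_markdown_headings are made-up names, and the sketch assumes the title file is indented with spaces only.

```python
def parse_title_file(text: str) -> list[tuple[int, str]]:
    """Return (heading_level, title) pairs for a title list like titles.txt.

    Hypothetical sketch: the first indented line fixes the indent unit,
    and every multiple of that unit adds one heading level.
    """
    entries: list[tuple[int, str]] = []
    unit = 0  # indent unit in spaces; stays 0 until the first indented line
    for line in text.splitlines():
        title = line.strip()
        if not title:
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip(" "))
        if indent and not unit:
            unit = indent  # first indented line defines the unit
        level = 1 + (indent // unit if unit else 0)
        entries.append((level, title))
    return entries


def to_markdown_headings(entries: list[tuple[int, str]]) -> list[str]:
    """Render (level, title) pairs as Markdown headings."""
    return ["#" * level + " " + title for level, title in entries]
```

With the sample file above, "  Heading 1.1" becomes "## Heading 1.1" and "    Heading 1.1.1" becomes "### Heading 1.1.1".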
- -t [filename]: provide the title file
- -o [filename]: path of the output file
- -i [path]: directory of the extracted pictures
- --image-width [width]: the maximum width of the pictures, in px. If set, images are inserted as HTML img tags.
- --disable-image: disable image extraction
- --disable-escaping: do not attempt to escape special characters
- --disable-notes: do not add presenter notes
- --disable-wmf: keep WMF-formatted images untouched (avoids exceptions under Linux)
- --disable-color: disable color tags in HTML
- --enable-slides: delineate slides with \n---\n; this can help if you want to convert pptx slides to Markdown slides
- --try-multi-column: try to detect multi-column slides (very slow)
- --min-block-size [size]: the minimum number of characters for a text block to be output
- --wiki / --mdk: if you happen to be using TiddlyWiki or Madoko, output the corresponding markup language
- --qmd: output the qmd markup language used for Quarto-powered presentations
- --page [number]: only convert the specified page
- --keep-similar-titles: keep similar titles and add "(cont.)" to repeated slide titles
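To drive the command line tool from a script, the flags listed above can be assembled into an argument list and passed to subprocess. This is a minimal sketch: build_pptx2md_command and run_pptx2md are hypothetical helpers, only a subset of the flags is covered, and pptx2md is assumed to be installed and on PATH.

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_pptx2md_command(
    pptx: Path,
    titles: Optional[Path] = None,
    output: Optional[Path] = None,
    image_dir: Optional[Path] = None,
    image_width: Optional[int] = None,
    enable_slides: bool = False,
    disable_notes: bool = False,
) -> list[str]:
    """Build an argv list for the pptx2md CLI (hypothetical helper, subset of flags)."""
    cmd = ["pptx2md", str(pptx)]
    if titles is not None:
        cmd += ["-t", str(titles)]
    if output is not None:
        cmd += ["-o", str(output)]
    if image_dir is not None:
        cmd += ["-i", str(image_dir)]
    if image_width is not None:
        cmd += ["--image-width", str(image_width)]
    if enable_slides:
        cmd.append("--enable-slides")
    if disable_notes:
        cmd.append("--disable-notes")
    return cmd


def run_pptx2md(cmd: list[str]) -> None:
    """Run the command; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

For example, run_pptx2md(build_pptx2md_command(Path("deck.pptx"), titles=Path("titles.txt"))) corresponds to typing pptx2md deck.pptx -t titles.txt. Passing a list instead of a shell string avoids quoting problems with filenames that contain spaces.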
Note: if needed, install wand for a better chance of successfully converting WMF images.
Data Link Layer Design Issues
Services Provided to the Network Layer
Framing
Error Control & Flow Control
Error Detection and Correction
Error Correcting Code (ECC)
Error Detecting Code
Elementary Data Link Protocols
Sliding Window Protocols
One-Bit Sliding Window Protocol
Protocol Using Go Back N
Using Selective Repeat
Performance of Sliding Window Protocols
Example Data Link Protocols
PPP
[Figure: top, the title list file content; bottom, the generated table of contents; left, the source pptx file; right, the generated Markdown file (rendered by Madoko).]
You can also use pptx2md programmatically in your Python code:
from pptx2md import convert, ConversionConfig
from pathlib import Path

# Basic usage
convert(
    ConversionConfig(
        pptx_path=Path('presentation.pptx'),
        output_path=Path('output.md'),
        image_dir=Path('img'),
        disable_notes=True
    )
)
The ConversionConfig class accepts the same parameters as the command line arguments:
- pptx_path: Path to the input PPTX file (required)
- output_path: Path for the output Markdown file (required)
- image_dir: Directory for extracted images (required)
- title_path: Path to custom titles file
- image_width: Maximum width for images, in px
- disable_image: Skip image extraction
- disable_escaping: Skip escaping special characters
- disable_notes: Skip presenter notes
- disable_wmf: Skip WMF image conversion
- disable_color: Skip color tags in HTML
- enable_slides: Add slide delimiters
- try_multi_column: Attempt to detect multi-column slides
- min_block_size: Minimum text block size
- wiki: Output in TiddlyWiki format
- mdk: Output in Madoko format
- qmd: Output in Quarto format
- page: Convert only the specified page number
- keep_similar_titles: Keep similar titles with "(cont.)" suffix
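Because convert and ConversionConfig take plain paths, converting a whole folder of decks is a short loop. The sketch below is a hedged example, not part of pptx2md: plan_outputs and convert_all are hypothetical names, and giving each deck its own image folder is an assumption made to keep extracted pictures from different decks apart. pptx2md is imported inside convert_all so the path planning can be used on its own.

```python
from collections.abc import Iterable
from pathlib import Path


def plan_outputs(pptx_files: Iterable[Path], out_dir: Path) -> list[tuple[Path, Path, Path]]:
    """Map each deck to (pptx_path, output_path, image_dir), in sorted order."""
    plans = []
    for pptx in sorted(pptx_files):
        # Assumption: one image folder per deck, named after the deck.
        plans.append((pptx, out_dir / f"{pptx.stem}.md", out_dir / "img" / pptx.stem))
    return plans


def convert_all(src_dir: Path, out_dir: Path) -> None:
    """Convert every .pptx file in src_dir with the documented Python API."""
    # Imported lazily so plan_outputs works even without pptx2md installed.
    from pptx2md import ConversionConfig, convert

    for pptx, md, img in plan_outputs(src_dir.glob("*.pptx"), out_dir):
        img.mkdir(parents=True, exist_ok=True)  # also creates out_dir
        convert(ConversionConfig(pptx_path=pptx, output_path=md, image_dir=img))
```

Calling convert_all(Path("decks"), Path("out")) would write out/<name>.md for each deck, with its pictures under out/img/<name>.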
- Text blocks are identified in two ways:
  - Paragraphs marked as "body" placeholders in the slide
  - Text shapes containing more than the minimum block size (configurable)
- Lists are generated when paragraphs in a block have different indentation levels
- Single-level paragraphs are output as regular text blocks
- Multi-column layouts can be detected with the --try-multi-column flag
- Grouped shapes are recursively flattened to process their contents
- Shapes are processed in top-to-bottom, left-to-right order
- When using custom titles:
  - Fuzzy matching is used to match slide titles with the provided title list
  - Matching score must be > 92 for a match to be accepted
  - Unmatched titles default to the deepest header level
  - Similar titles (matching score > 92) are omitted by default unless --keep-similar-titles is used
- Text formatting is preserved through Markdown syntax:
  - Bold text from PPT is converted to **bold**
  - Italic text is converted to _italic_
  - Hyperlinks are preserved as [text](url)
- Color handling:
  - Theme colors marked as "Accent 1-6" are preserved
  - RGB colors are converted to HTML color codes
  - Dark theme colors are converted to bold text
  - Color tags can be disabled with --disable-color
- Images:
  - Extracted to the specified image directory
  - WMF images are converted to PNG when possible
  - Image width can be constrained with --image-width
  - HTML img tags are used when a width is specified
- Tables:
  - Merged cells are supported
  - Complex formatting within cells is preserved
- Special characters are escaped by default (can be disabled with --disable-escaping)
- Presenter notes are included unless disabled with --disable-notes
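The custom-title rules above (accept a match only when the score exceeds 92, otherwise fall back to the deepest level) can be illustrated with a small sketch. It uses Python's difflib.SequenceMatcher, scaled to 0 to 100, as a stand-in scorer; this is an assumption, since pptx2md's own fuzzy scorer may produce different numbers, so the 92 cut-off will not behave identically. similarity and assign_level are hypothetical names.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Score two titles on a 0-100 scale (difflib stand-in, not pptx2md's scorer)."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def assign_level(slide_title: str, known: list[tuple[int, str]], deepest: int,
                 threshold: float = 92.0) -> int:
    """Return the heading level of the best title-list match scoring > threshold.

    known holds (level, title) pairs from the title file. A slide title with
    no match above the threshold falls back to the deepest level.
    """
    best_level, best_score = deepest, threshold
    for level, title in known:
        score = similarity(slide_title, title)
        if score > best_score:  # strictly greater, mirroring "> 92"
            best_level, best_score = level, score
    return best_level
```

With this scorer, "Sliding Window Protocol" still matches "Sliding Window Protocols" (about 97.9), while "Heading 1" against "Heading 1.1" scores 90 and is rejected, so near-miss numbering does not collapse distinct headings.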