Add mdoc reader
This change introduces a reader for mdoc, a roff-derived semantic markup
language for manual pages. The two relevant contemporary implementations
of mdoc for manual pages are mandoc (https://mandoc.bsd.lv/), which
implements the language from scratch in C, and groff
(https://www.gnu.org/software/groff/), which implements it as roff macros.

mdoc has a lot of semantics specific to technical manuals that aren't
representable in Pandoc's AST. Taking a cue from mandoc's HTML output, I
encode many mdoc elements as Code or Span inlines with classes named for
the mdoc macro that produced them.
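To make that encoding concrete, here is a minimal Haskell sketch. It uses a cut-down stand-in for pandoc-types' `Inline` (String instead of Text, only four constructors) so it runs with base alone. The helpers `macroCode` and `macroSpan` are hypothetical, and which macros the reader actually maps to Code versus Span is not shown in this commit.

```haskell
-- Stand-in for pandoc-types' Attr: (identifier, classes, key-value pairs).
type Attr = (String, [String], [(String, String)])

-- Stand-in for a few of pandoc-types' Inline constructors.
data Inline
  = Str String
  | Space
  | Code Attr String
  | Span Attr [Inline]
  deriving (Show, Eq)

-- Hypothetical helper: wrap a macro's literal text as Code whose only
-- class is the macro name (e.g. .Nm ls -> Code with class "Nm").
macroCode :: String -> String -> Inline
macroCode macro txt = Code ("", [macro], []) txt

-- Hypothetical helper: wrap already-parsed inline content as a Span
-- whose only class is the macro name (e.g. .Em -> class "Em").
macroSpan :: String -> [Inline] -> Inline
macroSpan macro ils = Span ("", [macro], []) ils

main :: IO ()
main = do
  print (macroCode "Nm" "ls")
  print (macroCode "Fl" "-l")  -- .Fl l renders as "-l"
  print (macroSpan "Em" [Str "very", Space, Str "important"])
```

Keeping the macro name as a class means a writer or filter can still tell, say, a command name from a flag even though both end up as Code.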

Much like web browsers with HTML, mandoc attempts to produce best-effort
output given all kinds of weird and crappy mdoc input. Part of the
reason it can do this is that it uses a very accommodating parse tree
and stateful output routines specialized to each output mode; when it
encounters a macro it wasn't expecting, it can simply give up on
whatever it was outputting and output something else. I've built as
much of that flexibility as I reasonably could into the mdoc reader, but I
don't know how to make it as flexible as mandoc.
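In a recursive-descent parser, some of that "give up and output something else" behaviour can be approximated by letting unexpected input end the current construct instead of failing. A minimal sketch, assuming lines have already been split into macros and text by a hypothetical lexer; the `Line`/`Block` types, the implicit list close on `.Sh`, and the skip-unknown-macro policy are all illustrative, not the reader's actual design.

```haskell
-- Lines as a hypothetical lexer might deliver them.
data Line = Macro String [String] | Text String
  deriving (Show, Eq)

data Block
  = Header String   -- from .Sh
  | Para [String]   -- a run of text lines
  | List [String]   -- from .Bl/.It/.El, one string per item
  deriving (Show, Eq)

blocks :: [Line] -> [Block]
blocks [] = []
blocks (Macro "Sh" args : rest) = Header (unwords args) : blocks rest
blocks (Macro "Bl" _ : rest) = listItems [] rest
blocks (Text t : rest) =
  let (ts, rest') = textRun rest
  in Para (t : ts) : blocks rest'
blocks (Macro _ _ : rest) = blocks rest  -- unsupported macro: skip it

-- Split off the leading run of text lines.
textRun :: [Line] -> ([String], [Line])
textRun (Text t : rest) = let (ts, rest') = textRun rest in (t : ts, rest')
textRun rest = ([], rest)

-- Collect .It items until .El. Anything else closes the list early
-- (best effort) and is re-parsed as ordinary input.
listItems :: [String] -> [Line] -> [Block]
listItems items (Macro "It" args : rest) =
  let (ts, rest') = textRun rest
  in listItems (items ++ [unwords (args ++ ts)]) rest'
listItems items (Macro "El" _ : rest) = List items : blocks rest
listItems items rest = List items : blocks rest  -- implicit close

main :: IO ()
main = mapM_ print (blocks sample)
  where
    sample =
      [ Macro "Sh" ["OPTIONS"]
      , Macro "Bl" ["-tag"]
      , Macro "It" ["-v"]
      , Text "Be verbose."
      , Macro "Sh" ["SEE", "ALSO"]  -- no .El: the list is closed implicitly
      , Text "See the mandoc documentation."
      ]
```

The trade-off is that malformed input yields a plausible document rather than an error, at the cost of silently guessing where the author meant a construct to end.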

This branch has been developed almost exclusively against mandoc's
documentation and implementation of mdoc as a reference, and the
real-world manual pages it has been tested against come from the OpenBSD
base system. Of the ~3500 mdoc-format manuals shipped with a fresh OpenBSD
install, 17 cause the mdoc reader to exit with a parse error. Any
further chasing of edge cases is deferred to future work.

Many of the tests in test/Tests/Readers/Mdoc.hs are derived directly
from mandoc's extensive regression tests.

[API change] Adds readMdoc to the public API
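With readMdoc exported, it can be driven like any other pandoc reader. A sketch assuming a pandoc build that includes this commit; the page below is a made-up minimal example, while `runIO`, `handleError`, `def` and `writeNative` are pandoc's existing API.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Text.Pandoc
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- A small, well-formed mdoc page: prologue (.Dd/.Dt/.Os) plus two sections.
page :: T.Text
page = T.unlines
  [ ".Dd December 5, 2024"
  , ".Dt HELLO 1"
  , ".Os"
  , ".Sh NAME"
  , ".Nm hello"
  , ".Nd print a greeting"
  , ".Sh DESCRIPTION"
  , "The"
  , ".Nm"
  , "utility prints a greeting."
  ]

main :: IO ()
main = do
  -- Parse with default reader options and dump the AST in native form.
  result <- runIO $ do
    doc <- readMdoc def page
    writeNative def doc
  native <- handleError result
  TIO.putStrLn native
```

Because the reader is registered under the name "mdoc", the command-line equivalent should be `pandoc -f mdoc -t native hello.1`.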
silby committed Dec 5, 2024
1 parent 27c54af commit 827c40a
Showing 13 changed files with 2,378 additions and 2 deletions.
5 changes: 5 additions & 0 deletions pandoc.cabal
@@ -596,6 +596,7 @@ library
Text.Pandoc.Readers.EPUB,
Text.Pandoc.Readers.Muse,
Text.Pandoc.Readers.Man,
+Text.Pandoc.Readers.Mdoc,
Text.Pandoc.Readers.FB2,
Text.Pandoc.Readers.DokuWiki,
Text.Pandoc.Readers.Ipynb,
@@ -707,6 +708,9 @@ library
Text.Pandoc.Readers.LaTeX.Parsing,
Text.Pandoc.Readers.LaTeX.SIunitx,
Text.Pandoc.Readers.LaTeX.Table,
+Text.Pandoc.Readers.Mdoc.Lex,
+Text.Pandoc.Readers.Mdoc.Macros,
+Text.Pandoc.Readers.Mdoc.Standards,
Text.Pandoc.Readers.Typst.Parsing,
Text.Pandoc.Readers.Typst.Math,
Text.Pandoc.Readers.ODT.Base,
@@ -831,6 +835,7 @@ test-suite test-pandoc
Tests.Readers.Muse
Tests.Readers.Creole
Tests.Readers.Man
+Tests.Readers.Mdoc
Tests.Readers.FB2
Tests.Readers.DokuWiki
Tests.Writers.Native
3 changes: 3 additions & 0 deletions src/Text/Pandoc/Readers.hs
@@ -51,6 +51,7 @@ module Text.Pandoc.Readers
, readEPUB
, readMuse
, readMan
+, readMdoc
, readFB2
, readIpynb
, readCSV
@@ -106,6 +107,7 @@ import Text.Pandoc.Readers.TWiki
import Text.Pandoc.Readers.Txt2Tags
import Text.Pandoc.Readers.Vimwiki
import Text.Pandoc.Readers.Man
+import Text.Pandoc.Readers.Mdoc
import Text.Pandoc.Readers.CSV
import Text.Pandoc.Readers.CslJson
import Text.Pandoc.Readers.BibTeX
@@ -168,6 +170,7 @@ readers = [("native" , TextReader readNative)
,("rtf" , TextReader readRTF)
,("typst" , TextReader readTypst)
,("djot" , TextReader readDjot)
+,("mdoc" , TextReader readMdoc)
]

-- | Retrieve reader, extensions based on format spec (format+extensions).
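The `readers` table is an association list from format name to reader, so adding the "mdoc" entry is what makes name-based lookup find the new reader. A simplified sketch of that lookup: `readerNames` is a hypothetical stand-in that maps names to strings rather than to real `Reader` values, and `baseFormat` only strips `+ext`/`-ext` modifiers; pandoc's own format-spec handling does more.

```haskell
-- Hypothetical stand-in for pandoc's readers table (names -> labels).
readerNames :: [(String, String)]
readerNames =
  [ ("man",  "readMan")
  , ("djot", "readDjot")
  , ("mdoc", "readMdoc")
  ]

-- Drop "+ext"/"-ext" modifiers from a spec such as "mdoc+ext".
baseFormat :: String -> String
baseFormat = takeWhile (`notElem` "+-")

-- Resolve a format spec to a reader by its base name.
lookupReader :: String -> Maybe String
lookupReader spec = lookup (baseFormat spec) readerNames

main :: IO ()
main = do
  print (lookupReader "mdoc")
  print (lookupReader "mdoc+ext")
  print (lookupReader "troff")
```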