
wmk

wmk is a flexible and versatile static site generator written in Python.

Main features

The following features are present in several static site generators (SSGs); you might almost call them standard:

  • Markdown or HTML content with YAML metadata in the frontmatter.
  • Support for themes.
  • Sass/SCSS support (via either libsass or Dart Sass).
  • Can generate a search index for use by lunr.js.
  • Shortcodes for more expressive and extensible content.

The following features are among the ones that set wmk apart:

  • By default, the content is rendered using Mako, a template system which makes all the resources of Python easily available to you. However, Jinja2 templates are also supported if you prefer them.
  • "Stand-alone" templates – i.e. templates that are not used for presenting markdown-based content – are also rendered if present. This can e.g. be used for list pages or content based on external sources (such as a database).
  • Additional data for the site may be loaded from separate YAML files, or even (with a small amount of Python/Mako code) from other data sources such as CSV files, SQL databases or REST/GraphQL APIs.
  • The shortcode system is quite powerful and flexible. For instance, among the default shortcodes are an image thumbnailer and a page list component. A shortcode is just a template, so you can easily build your own.
  • Optional support for the powerful Pandoc document converter, for the entire site or on a page-by-page basis. This gives you access to such features as LaTeX math markup and academic citations, as well as to Pandoc's well-designed filter system for extending markdown. Pandoc also enables you to export your content to other formats (such as PDF) in addition to HTML, if you so wish.
  • Also via Pandoc, support for several non-markdown input formats for content, namely LaTeX, Org, RST, Textile, Djot, Typst, man, JATS, TEI, Docbook, RTF, DOCX, ODT and EPUB.

The only major feature that wmk is missing compared to some other SSGs is tight integration with a JavaScript assets pipeline and interaction layer. Although wmk allows you to configure virtually any asset processing you like, the lack of such integration means that if your site relies on React, Vue or similar, other options are probably more convenient.

That exception aside, wmk is suitable for building any small or medium-sized static website (up to a couple of thousand pages, depending on the content).

Installation

Method 1: git + pip

Clone this repo into your chosen location ($myrepo) and install the necessary Python modules into a virtual environment:

cd $myrepo
python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt

After that, either put $myrepo/bin into your $PATH or create a symlink from somewhere in your $PATH to $myrepo/bin/wmk.

Required software (aside from Python, of course):

  • rsync (for static file copying).
  • For wmk watch functionality (as well as watch-serve), you need either inotifywait or fswatch to be installed and in your $PATH. If both are available, the former is preferred.

wmk requires a Unix-like environment. In particular, bash must be installed in /bin/bash, and the directory separator is assumed to be /.

Method 2: Homebrew

If you are on macOS and already have Homebrew, this is the easiest installation method.

First add the tap to your repositories:

brew tap bk/wmk

Then install wmk from it:

brew install --build-from-source wmk

Method 3: Docker

If you are neither on a modern Linux system nor on macOS with Homebrew, it may be a better option for you to run wmk via Docker. In that case, after cloning the repo (or simply copying the Dockerfile from it), you can give the command

docker build -t wmk .

in the directory containing the Dockerfile, in order to build an image called wmk. You can then run the various wmk subcommands via Docker, for instance

docker run --rm --volume $(pwd):/data --user $(id -u):$(id -g) wmk b .

to build the wmk project in the current directory, or

docker run --rm -i -t --volume $(pwd):/data --user $(id -u):$(id -g) -p 7007:7007 wmk ws . -i 0.0.0.0

to watch for changes in the current directory and run a webserver for the built files.

Obviously, such commands can be unwieldy, so if you run them regularly you may want to create aliases or wrappers for them.

Usage

The wmk command structure is wmk <action> <base_directory>. The base directory is of course the directory containing the source files for the site. (They are actually in subdirectories such as templates, content, etc.; see the "File organization" section below.)

  • wmk info $basedir: Shows the real path to the location of wmk.py and of the content base directory. For example, wmk info . shows this for the current directory. Synonyms for info are env and debug.

  • wmk init $basedir: In a folder which contains content/ (with markdown or HTML files) but no wmk_config.yaml, creates some initial templates as well as a sample wmk_config.yaml, thus making it quicker for you to start a new project.

  • wmk build $basedir [-q|--quick]: Compiles/copies files into $basedir/htdocs. If -q or --quick is specified as the third argument, only files considered to have changed, based on timestamp checking, are processed. Synonyms for build are run, b and r.

  • wmk watch $basedir: Watches for changes in the source directories inside $basedir and recompiles if changes are detected. (Note that build is not performed automatically before setting up file watching, so you may want to run that first.) A synonym for watch is w.

  • wmk serve $basedir [-p|--port <portnum>] [-i|--ip <ip-addr>]: Serves the files in $basedir/htdocs on http://127.0.0.1:7007/ by default. The port and IP address can be changed with the -p and -i switches, or configured via wmk_config.yaml (see the "Configuration file" section). Synonyms for serve are srv and s.

  • wmk watch-serve $basedir [-p|--port <portnum>] [-i|--ip <ip-addr>]: Combines watch and serve in one command. Synonym: ws.

  • wmk clear-cache $basedir: Removes the HTML rendering cache, which is a SQLite file in $basedir/tmp/. This should only be necessary in case of changed shortcodes or shortcode dependencies. Note that the cache can be disabled in wmk_config.yaml by setting use_cache to false, or on a file-by-file basis via a frontmatter setting (no_cache). A synonym for clear-cache is c.

  • wmk preview $basedir $filename: Prints (to stdout) the HTML which the given file will be converted to, before it is passed to the template and before any post-processing. Here $filename is the name of a file relative to the content subdirectory of $basedir. Example: wmk preview . index.md.

  • wmk admin $basedir: Build the site and then start wmkAdmin, which must have been installed beforehand into the admin subdirectory of the $basedir (or into the subdirectory specified with wmk admin $basedir $subdir). The subdirectory may be a symbolic link pointing to a central instance. wmkAdmin allows you to manage the content of the site via a web interface. It is not designed to allow you to install or modify themes or perform tasks that require more technical knowledge, and works best for a standard site based on Markdown or HTML files in the content directory.

  • wmk repl $basedir: Launch a Python shell (ipython, bpython or python3, in order of preference) with the wmk environment loaded and with the $basedir as current working directory. Useful for examining wmk's view of the site content or debugging MDContent filtering methods. For these purposes, from wmk import get_content_info, followed by content = get_content_info('.') is often a good start.

  • wmk pip <pip-command>: Run pip in the virtual environment used by wmk. Mainly useful for installing or upgrading Python modules that you want to use in Python files belonging to your projects.

  • wmk homedir: Outputs the path to wmk's installation directory. May be useful in shell scripts.

File organization

Inside a given working directory, wmk assumes the following subdirectories for content and output. They will be created if they do not exist:

  • htdocs: The output directory. Rendered, processed or copied content is placed here, and wmk serve will serve files from this directory.

  • templates: Mako templates (or Jinja2 templates if jinja2_templates is set to true in wmk_config.yaml). Templates with the extension .mhtml (.html if Jinja2 templates are being used) are rendered directly into htdocs as .html files, or with another extension if the filename ends with .$ext.mhtml (.$ext.html for Jinja2), where $ext is a string of 2-4 alphanumeric characters. A template is skipped, however, if its filename starts with a dot or underscore or contains the string base, or if it is inside a subdirectory named base. For details on the context variables received by such stand-alone templates, see the "Context variables" section below.

  • content: typically markdown (*.md) and/or HTML (*.html) content with YAML metadata, although other formats are also supported. For a full list, see the "Input formats" section below.

    • Markdown (or other supported content) will be converted into HTML and then "wrapped" in a layout using the template specified in the metadata, or md_base.mhtml by default.
    • HTML files inside content are assumed to be fragments rather than complete documents. Accordingly, they will be wrapped in a layout just like the converted markdown. In general, such content is treated just like markdown files except that the markdown-to-html conversion step is skipped. For instance, shortcodes can be used normally, although they may not work as expected if they return markdown rather than HTML. (Complete HTML documents are best placed in static rather than content).
    • The YAML metadata may be (a) at the top of the md/html document itself, inside a frontmatter block delimited by ---; (b) in a separate file with the same filename as the content file, but with an extra .yaml extension added; or (c) it may be in index.yaml files which are inherited by subdirectories and the files contained in them. For details, see the "Site, page and nav variables" section below.
    • The target filename will be index.html in a directory corresponding to the basename of the source file, unless pretty_path in the metadata is false or the name of the file itself is index.md or index.html. In those cases the relative path remains the same, except that the extension is of course changed to .html if the source is a markdown file.
    • The processed content will be passed to the template as a string in the context variable CONTENT, along with other metadata.
    • A YAML datasource can be specified in the metadata block as LOAD; the data in this file will be added to the context. For further details on the context variables, see the "Context variables" section.
    • Files with extensions other than .md, .html or .yaml (or those of the other recognized input formats) will be copied directly over to the appropriate subdirectory of the htdocs directory. This enables "bundling", i.e. keeping images and "attachments" together with related markdown files.
  • data: YAML files for additional metadata. May be referenced in frontmatter data or used by templates. Other data files (CSV, SQLite, etc.) should typically also be placed here.

  • py: Directory for Python files. This directory is automatically added to the front of sys.path before Mako or Jinja2 is initialized, meaning that templates can import modules placed here. Implicit imports (for Mako only) are possible by setting mako_imports in the config file (see the "Configuration file" section). There are also two special files that may be placed here: wmk_autoload.py in your project, and wmk_theme_autoload.py in the theme's py/ directory. If one or both of these is present, wmk imports a dict named autoload from them. This means that you can assign PREPROCESS and POSTPROCESS page actions by name (i.e. keys in the autoload dict) rather than as function references, which in turn makes it possible to specify them in the frontmatter directly rather than having to do it via a shortcode. (For more on PRE- and POSTPROCESS, see the "Site, page and nav variables" section.)

  • assets: Assets for an asset pipeline. The only default handling of assets involves compiling SCSS/Sass files in the subdirectory scss. They will be compiled to CSS, which is placed in the target directory htdocs/css. Other asset handling can be configured via settings in the configuration file, e.g. assets_commands and assets_fingerprinting; these are described in more detail in the "Configuration file" section. Also take note of the fingerprint template filter, described in the "Template filters" section.

  • static: Static files. Everything in here will be rsynced directly over to htdocs.
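The naming rules above for stand-alone templates and for rendered content files can be summarized in a few lines of code. This is a minimal sketch under stated assumptions, not wmk's implementation: the helper names standalone_output and content_target are hypothetical, content_target assumes slugify_dirs is false (so the directory name is simply the source file's basename), and it treats any source file named index.* like index.md.

```python
import posixpath
import re

# Hypothetical helpers mirroring the documented naming rules; not wmk's own code.
# Matches a stem ending in a 2-4 character alphanumeric "inner" extension,
# e.g. "feed.xml" (from feed.xml.mhtml).
INNER_EXT = re.compile(r"\.[A-Za-z0-9]{2,4}$")


def standalone_output(relpath, jinja2=False):
    """Output path (relative to htdocs) for a stand-alone template, or None if skipped."""
    suffix = ".html" if jinja2 else ".mhtml"
    dirname, filename = posixpath.split(relpath)
    if not filename.endswith(suffix):
        return None
    # Skipped: names starting with '.' or '_', names containing 'base',
    # and anything inside a subdirectory named 'base'.
    if filename.startswith((".", "_")) or "base" in filename:
        return None
    if "base" in dirname.split("/"):
        return None
    stem = filename[: -len(suffix)]
    # Keep an explicit inner extension; otherwise the output is .html.
    out = stem if INNER_EXT.search(stem) else stem + ".html"
    return posixpath.join(dirname, out)


def content_target(relpath, pretty_path=True):
    """Output path for a content file, assuming slugify_dirs is false."""
    dirname, filename = posixpath.split(relpath)
    stem = posixpath.splitext(filename)[0]
    if not pretty_path or stem == "index":
        # Same relative path, with the extension changed to .html.
        return posixpath.join(dirname, stem + ".html")
    # Pretty path: a directory named after the source file, containing index.html.
    return posixpath.join(dirname, stem, "index.html")


print(standalone_output("feed.xml.mhtml"))    # feed.xml
print(standalone_output("blog/_item.mhtml"))  # None
print(content_target("blog/my-post.md"))      # blog/my-post/index.html
print(content_target("blog/index.md"))        # blog/index.html
```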

Input formats

The format of the files in the content/ directory is determined on the basis of their file extension. The following extensions are recognized by default:

  • .md, .mdwn, .mdown, .markdown, .gfm, .mmd: Markdown files. If Pandoc is being used, the input formats for .gfm and .mmd will be assumed to be gfm (GitHub-flavored markdown) and markdown_mmd (MultiMarkdown), respectively. Note, however, that non-YAML metadata given in MultiMarkdown format is currently not picked up automatically in .mmd files.

  • .htm, .html: HTML. These are typically not standalone HTML documents but will be "wrapped" by the configured layout template. Like other input files, they may have a YAML frontmatter block.

  • .tex: LaTeX format. Currently ConTeXt is not supported.

  • .org: Org-mode format.

  • .rst: reStructuredText format (RST).

  • .pod (as well as .pm and .pl): Perl POD (Plain Old Documentation).

  • .textile: Textile markup format.

  • .dj: The Djot lightweight markup format.

  • .man: Roff man format.

  • .rtf: Rich Text Format (RTF).

  • .typ: Typst format.

  • .jats, .xml: The XML-based JATS (Journal Article Tag Suite) format.

  • .docbook: The XML-based DocBook format.

  • .tei: The Simple variant of the XML-based TEI (Text Encoding Initiative) format.

  • .docx: MS Word DOCX a.k.a. "Office Open XML" format.

  • .odt: OpenDocument Text format.

  • .epub: The EPUB e-book format.

Pandoc is turned on automatically for all non-markdown, non-HTML formats in the above list. In order to use such content, a sufficiently recent version of Pandoc must therefore be installed. Full support for the above list requires Pandoc version 3.6.2 or later.

The list of input formats and how they are handled is configurable through the content_extensions setting in the config file. See the "Configuration file" section below for details.

Note: The three formats JATS, DocBook and TEI are all XML-based. Files in all three formats would therefore often use the generic .xml extension. However, wmk currently assumes that .xml implies that the JATS format is intended. If you want to force wmk to handle a file with that extension as DocBook or TEI, you would have to add an external YAML metadata file with pandoc_input_format set to the appropriate value.
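As an illustration of how the extension list and the pandoc_input_format override interact, here is a rough sketch. The function resolve_input and the PANDOC_FORMATS table are hypothetical; the format names are Pandoc's own reader names, but wmk's actual defaults are defined by its content_extensions setting and may differ in detail. Only a frontmatter-level pandoc_input_format override is considered here.

```python
import posixpath

# Hypothetical extension table modeled on the "Input formats" list.
# wmk's real defaults come from its content_extensions setting.
PANDOC_FORMATS = {
    ".tex": "latex", ".org": "org", ".rst": "rst",
    ".pod": "pod", ".pm": "pod", ".pl": "pod",
    ".textile": "textile", ".dj": "djot", ".man": "man",
    ".rtf": "rtf", ".typ": "typst",
    ".jats": "jats", ".xml": "jats",  # .xml is assumed to be JATS
    ".docbook": "docbook", ".tei": "tei",
    ".docx": "docx", ".odt": "odt", ".epub": "epub",
}
MARKDOWN_EXTS = {".md", ".mdwn", ".mdown", ".markdown", ".gfm", ".mmd"}
HTML_EXTS = {".htm", ".html"}


def resolve_input(filename, meta=None, pandoc=False):
    """Return (input_format, use_pandoc) for a content file."""
    meta = meta or {}
    ext = posixpath.splitext(filename)[1].lower()
    if ext in HTML_EXTS:
        return "html", False  # HTML fragments skip the conversion step
    if ext in MARKDOWN_EXTS:
        use_pandoc = meta.get("pandoc", pandoc)  # frontmatter may toggle Pandoc
        default = {".gfm": "gfm", ".mmd": "markdown_mmd"}.get(ext, "markdown")
        return meta.get("pandoc_input_format", default), use_pandoc
    if ext in PANDOC_FORMATS:
        # Pandoc is switched on automatically; metadata (e.g. an external
        # YAML file) may override the format, say to read .xml as DocBook.
        return meta.get("pandoc_input_format", PANDOC_FORMATS[ext]), True
    return None, False  # not content: such files are copied as-is
```

For example, resolve_input("article.xml", {"pandoc_input_format": "docbook"}) yields ("docbook", True), which corresponds to the external-YAML workaround described in the note above.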

In-file YAML frontmatter is supported for all of the above except for the three binary formats DOCX, ODT and EPUB. Of course, metadata from an associated external YAML file or inherited metadata applies in all cases. In addition, the "native" metadata seen by Pandoc for most of the formats (more precisely all non-markdown, non-HTML formats other than Textile, which uses YAML frontmatter natively) will be used as a fallback source of in-file metadata, although this is limited to specific standard keys such as title, author and date.

Note that although other input formats are supported, the canonical format is still markdown. Unless there is a special reason to do otherwise it is the most sensible and efficient choice for websites generated using wmk.

A few gotchas

When creating a website with wmk, you might want to keep the following things in mind lest they surprise you:

  • The order of operations is as follows: (1) Copy files from static/; (2) run asset pipeline; (3) render standalone templates from templates; (4) render markdown content from content. As a consequence, later steps may overwrite files placed by earlier steps. This is intentional but definitely something to keep in mind.

  • For the build (a.k.a. run) and watch actions, when -q or --quick is specified as a modifier, wmk.py uses timestamps to prevent unnecessary re-rendering of templates, markdown files and SCSS sources. The check is rather primitive and does not take account of such things as shortcodes or changed dependencies in the template chain. As a rule, --quick is therefore not recommended unless you are working on a small, self-contained set of content files.

  • If templates or shortcodes have been changed, it may sometimes be necessary to clear out the page rendering cache with wmk c. During development you may want to add use_cache: no to the wmk_config.yaml file. Also, some pages should never be cached, in which case it is a good idea to add no_cache: true to their frontmatter.

  • If files are removed from source directories the corresponding files in htdocs/ will not disappear automatically. You have to clear them out manually – or simply remove the entire directory and regenerate.
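The limitation of --quick noted above follows from comparing only the timestamps of a source file and its output. A minimal sketch of such a check, assuming a simple "source newer than target" rule (the function needs_rebuild is hypothetical, not part of wmk):

```python
import os


def needs_rebuild(source, target):
    """Naive timestamp check in the spirit of --quick (a sketch, not wmk's code).

    Only the source file's own mtime is compared, so a change to a template,
    shortcode or other dependency does not trigger a rebuild.
    """
    if not os.path.exists(target):
        return True  # never built before
    return os.path.getmtime(source) > os.path.getmtime(target)
```

Because nothing outside the source file is consulted, editing a layout template leaves every page looking "up to date", which is exactly why a full build is the safer default.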

Context variables

The Mako/Jinja2 templates, whether they are stand-alone or being used to render markdown (or other) content, receive the following context variables:

  • DATADIR: The full path to the data directory.
  • WEBROOT: The full path to the htdocs directory.
  • CONTENTDIR: The full path to the content directory.
  • TEMPLATES: A list of all templates which will potentially be rendered as stand-alone. Each item in the list contains the keys src (relative path to the source template), src_path (full path to the source template), target (full path of the file to be written), and url (relative url to the file to be written).
  • MDCONTENT: An MDContentList representing all the content files which will potentially be rendered by a template. Each item in the list contains the keys source_file and source_file_short (the full and truncated paths to the source, respectively), target (html file to be written), template (filename of the template which will be used for rendering), data (most of the context variables seen by this content), doc (the raw content document source), and url (the SELF_URL value for this content; see below). Note that MDCONTENT is not available inside shortcodes. An MDContentList is a list object with some convenience methods for filtering and sorting. It will be described further later on.
  • Whatever is defined under template_context in the wmk_config.yaml file (see the "Configuration file" section below).
  • SELF_URL: The relative path to the HTML file which the output of the template will be written to.
  • SELF_TEMPLATE: The path to the current template file (from the template root).
  • ASSETS_MAP: A map of fingerprinted assets (such as javascript or css files), used by the fingerprint template filter.
  • LOADER: The template loader/env. In the case of Mako, this is a TemplateLookup object; in the case of Jinja2 this is an Environment object with a FileSystemLoader loader.
  • site: A dict-like object containing the variables specified under the site key in wmk_config.yaml.
  • CACHE: An ordinary dictionary object, intended for use by templates as a simple shared in-memory cache.

In the case of Jinja2 templates, three extra context variables are available:

  • mako_lookup: A Mako TemplateLookup instance which makes it possible to call Mako templates from a Jinja2 template.
  • get_context: A function returning all context variables as a dict.
  • import: An alias for importlib.import_module, which can thus be used to import a Python module into a Jinja template as the value of a variable, e.g. {% set utils = import('my_utils') %}. The main intent is to make code inside the project py/ subdirectory as easily available in Jinja templates as it is in Mako templates.

When templates are rendering markdown (or other) content, they additionally get the following context variables:

  • CONTENT: The rendered HTML produced from the source document.
  • RAW_CONTENT: The original source document.
  • SELF_FULL_PATH: The full filesystem path to the source document file.
  • SELF_SHORT_PATH: The path to the source document file relative to the content directory.
  • MTIME: A datetime object representing the modification time for the source file.
  • DATE: A datetime object representing the first value found in the YAML frontmatter among the keys date, pubdate, modified_date, expire_date and created_date (in that order), or the MTIME value as a fallback. Since this is guaranteed to be present, it is natural to use it for sorting and generic display purposes.
  • RENDERER: A callable which enables a template to render markdown in wmk's own environment. This is mainly so that it is possible to support shortcodes which depend on other markdown content which itself may contain shortcodes. The callable receives a dict containing the keys doc (the markdown) and data (the context variables) and returns rendered HTML.
  • page: A dict-like object containing the variables defined in the YAML meta section at the top of the markdown file, in index.yaml files in the markdown file directory and its parent directories inside content, and possibly in YAML files from the data directory loaded via the LOAD directive in the metadata.
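The fallback rule for DATE can be sketched as follows. This is a hypothetical helper, not wmk's code; it assumes that values are either date/datetime objects (as produced by a YAML parser) or ISO 8601 strings, and that an unparseable value is skipped so that the next key is tried.

```python
from datetime import date, datetime, time

# Order of precedence as listed for the DATE context variable.
DATE_KEYS = ("date", "pubdate", "modified_date", "expire_date", "created_date")


def resolve_date(page, mtime):
    """First date-like frontmatter value, else MTIME (a sketch)."""
    for key in DATE_KEYS:
        value = page.get(key)
        if value is None:
            continue
        if isinstance(value, datetime):  # check before date: datetime is a subclass
            return value
        if isinstance(value, date):  # a bare YAML date such as 2024-03-01
            return datetime.combine(value, time())
        if isinstance(value, str):
            try:
                return datetime.fromisoformat(value)
            except ValueError:
                continue  # assumption: unparseable values are skipped
    return mtime
```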

For further details on context variables set in the document frontmatter and in index.yaml files, see the "Site, page and nav variables" section below.

Configuration file

A config file, $basedir/wmk_config.yaml, can be used to configure many aspects of how wmk operates. The name of the file may be changed by setting the environment variable WMK_CONFIG, which should contain a filename without a leading directory path.

The configuration file must exist (but may be empty). If it specifies a theme and a file named wmk_config.yaml (regardless of the WMK_CONFIG environment variable setting) exists in the theme directory, then any settings in that file will be merged with the main config – unless ignore_theme_conf is true.

It is also possible to split the configuration file up into several smaller files. These are placed in the wmk_config.d/ directory (inside the base directory). The filename of each yaml file in that directory (minus the .yaml extension) is treated as a key and the contents as its value. Subdirectories can be used to represent a nested structure. For instance, the file wmk_config.d/site/colors/darkmode.yaml would contain the settings that will be visible to templates as the site.colors.darkmode variable. Note that the WMK_CONFIG environment variable affects the name of the directory looked for; setting it to myconf.yaml would e.g. mean that wmk will inspect myconf.d/ for extra configuration settings instead of wmk_config.d/ (although this does not apply to themes, whose configuration file/directory name is fixed).
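The mapping from files in the configuration directory to configuration keys can be illustrated with a small sketch. The helpers config_dir_name and nest_config_files are hypothetical; reading and parsing the YAML files is left out (the function receives already-parsed values), and the sketch makes no attempt to reproduce how wmk merges the result with the main configuration file or resolves clashing keys.

```python
from pathlib import PurePosixPath


def config_dir_name(config_filename="wmk_config.yaml"):
    """wmk_config.yaml -> wmk_config.d; myconf.yaml -> myconf.d."""
    return PurePosixPath(config_filename).stem + ".d"


def nest_config_files(files):
    """Build a nested dict from the files of a config directory.

    `files` maps paths relative to that directory (e.g. "site/colors/darkmode.yaml")
    to their already-parsed YAML contents.
    """
    conf = {}
    for relpath, value in sorted(files.items()):
        *parents, leaf = PurePosixPath(relpath).with_suffix("").parts
        node = conf
        for key in parents:
            node = node.setdefault(key, {})  # subdirectory -> nested dict
        node[leaf] = value  # filename minus .yaml -> key
    return conf


conf = nest_config_files({"site/colors/darkmode.yaml": {"background": "#111"}})
print(conf["site"]["colors"]["darkmode"])  # {'background': '#111'}
```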

Currently there is support for the following settings:

  • template_context: Default values for the context passed to templates. This should be a dict.

  • site: Values for common information relating to the website. These are also added to the template context under the key site. They are often used by templates and themes to affect the look and feel of the website. For further discussion, see the "Site, page and nav variables" section below.

  • render_drafts: Normally, content files with draft set to a true value in the metadata section will be skipped during rendering. This can be turned off (so that the draft status flag is ignored) by setting render_drafts to True in the config file.

  • markdown_extensions: A list of extensions to enable for markdown processing by Python-Markdown. The default is ['extra', 'sane_lists']. If you specify third-party extensions here, you have to install them into the Python virtual environment first. Obviously, this has no effect if pandoc is true. May be set or overridden through frontmatter variables.

  • markdown_extension_configs: Settings for your markdown extensions. May be set in the config file or in the frontmatter. For convenience, there are special frontmatter settings for two extensions, namely for toc and wikilinks:

    • The toc boolean setting will turn the toc extension off if set to False and on if set to True, regardless of its presence in markdown_extensions.
    • If toc is in markdown_extensions (or has been turned on via the toc boolean), then the toc_depth frontmatter variable will affect the configuration of the extension regardless of the markdown_extension_configs setting.
    • If wikilinks is in markdown_extensions then the options specified in the wikilinks frontmatter setting will be passed on to the extension. Example: wikilinks: {'base_url': '/somewhere'}.
  • pandoc: Normally Python-Markdown is used for markdown processing, but if this boolean setting is true, then Pandoc via Pypandoc is used by default instead. This can be turned off or on through frontmatter variables as well. Another config setting which affects whether Pandoc is used is content_extensions, for which see below.

  • pandoc_filters, pandoc_options: Lists of filters and options for Pandoc. Has no effect unless pandoc is true. May be set or overridden through frontmatter variables.

  • pandoc_input_format: Which input format to assume for Pandoc; has no effect unless pandoc is true. The default value is markdown. If set, the value should be a markdown subvariant for markdown-like content, i.e. one of markdown (pandoc-flavoured), gfm (github-flavoured), markdown_mmd (MultiMarkdown), markdown_phpextra, markdown_strict, commonmark, or commonmark_x. As for other supported input formats, there is little reason to set pandoc_input_format explicitly for them, since they have no variants in the relevant sense, and the right format is picked based on the file extension. May be set or overridden through frontmatter variables.

  • pandoc_output_format: Output format for Pandoc; has no effect unless pandoc is true. This should be an HTML variant, i.e. either html, html5 or html4, or alternatively one of the HTML-based slide formats, i.e. s5, slideous, slidy, dzslides or revealjs. Chunked HTML (new in Pandoc 3) is not supported. May be set or overridden through frontmatter variables.

  • pandoc_extra_formats, pandoc_extra_formats_settings: If pandoc is True, then pandoc_extra_formats in the frontmatter can be used to convert to other formats than HTML, for instance PDF or MS Word (docx). pandoc_extra_formats is a dict where each key is a format name (e.g. pdf) and its value is the output filename relative to the web root (e.g. subdir/myfile.pdf). The special value auto indicates that the name of the output file should be based on that of the source file but with the file extension replaced by the name of the format. For instance, a source file named subdir/index.md (relative to the content directory) maps to an output file named subdir/index.pdf (relative to the web root directory) if the output format is pdf, and so on. pandoc_extra_formats_settings, if present, contains any special settings for the conversion in the form of a dict where each key is a format name and its value is either a dict with the keys extra_args and/or filters, or a list (which then is interpreted as the value of the extra_args setting).

  • slugify_dirs: Affects the names of directories created in htdocs because of the pretty_path setting. If true (which is the default), the name will be identical to the slug of the source file. If explicitly set to false, then the directory name will be the same as the basename of the source file, almost regardless of the characters in the filename.

  • use_cache: boolean, True by default. If you set this to False, the rendering cache will be disabled. This is useful for small and medium-sized projects where the final HTML output often depends on factors other than the content file alone. Note that caching for a specific file can be turned off by putting no_cache: true in the frontmatter.

  • cache_mtime_matters: boolean, False by default. Normally only the body of the markdown file and a few selected processing settings make up the cache key. If, on the other hand, this setting is True (either in the configuration file or in the frontmatter), then the modification time of the markdown file affects the cache key, so touching the file is sufficient for refreshing its cache entry.

  • use_sass: A boolean indicating whether to handle Sass/SCSS files in assets/scss automatically. True by default.

  • use_dart_sass: By default, Sass/SCSS is handled by libsass. If use_dart_sass is true, Dart Sass is used instead. This requires it to be installed as an external command.

  • dart_sass_bin: This can be set to point to the location of the sass executable that will be run when use_dart_sass is true. Normally not needed unless sass is not in your PATH or you want to add parameters (or use a specific version).

  • sass_output_style: The output style for Sass/SCSS rendering. This should be one of compact, compressed, expanded or nested. The default is expanded. Has no effect if use_sass is false. When using Dart Sass, compact and nested have the same effect as expanded.

  • assets_map: An assets map is a mapping from filenames or aliases to names of files containing a hash identifier (under the webroot). A typical entry might thus map from /css/style.css to /css/style.1234abcdef56.css. The value of this setting is either a dict or the name of a JSON or YAML file (inside the data directory) containing the mapping. It will be available to templates as ASSETS_MAP.

  • assets_fingerprinting: A boolean indicating whether to automatically fingerprint assets files (i.e. add hash indicators to their names). If true, any fingerprinted files will be added to the ASSETS_MAP template variable.

  • assets_fingerprinting_conf: A dict where the keys are subdirectories of the webroot, e.g. js or img/icons, and the values are dicts containing the keys pattern and (optionally) exclude. These are regular expressions indicating which files to fingerprint under these directories. The filename is fingerprinted if it matches pattern but does not match exclude. (The default value of exclude looks for files that appear to have been fingerprinted already and thus does not normally need to be set). The default value of this setting is a simple setup for the js and css subdirectories of the webroot.

  • assets_commands: A list of arbitrary commands to run at the assets compilation stage (just before Sass/SCSS files in assets/scss are processed, assuming use_sass is not false). The commands are run in order inside the base directory of the site. Example: ['bin/fetch_external_assets.sh', 'node esbuild.mjs'].

  • lunr_index: If this is True, a search index for lunr.js is written as a file named idx.json in the root of the htdocs/ directory. Basic information about each page (title and summary) is additionally written to idx.summaries.json.

  • lunr_index_fields: The default fields for generating the lunr search index are title and body. Additional fields and their weight can be configured through this variable. For instance {"title": 10, "tags": 5, "body": 1}. Aside from body, the fields are assumed to be attributes of page.

  • lunr_languages: A two-letter language code or a list of such codes, indicating which language(s) to use for stemming when building a Lunr index. The default language is en. For more on this, see the "Site search" section below.

  • http: This is a dict for configuring the address used for wmk serve. It may contain either or both of two keys: port (default: 7007) and ip (default: 127.0.0.1). Can also be set directly via command line options.

  • output_directory: Normally the output will be written to the directory htdocs inside the basedir, but this can be overridden by setting this configuration variable. The value should be a relative path that does not start with / or ., e.g. site or public.

  • mako_imports: A list of Python statements to add to the top of each generated Mako template module file. Generally these are import statements.

  • theme: This is the name of a subdirectory of the directory $basedir/themes (or a symlink placed there) in which to look for extra static, assets, py and template directories. Note that neither the content nor the data directory of a theme will be used by wmk. A theme-provided template may be rendered as a stand-alone page, but only if no local template overrides it (i.e. has the same relative path). Mako's internal template lookup will similarly first look for referenced components in the normal template directory before looking in the theme directory. Configuration settings from wmk_config.yaml in the theme directory will be used as long as they do not conflict with those in the main config file.

  • ignore_theme_conf: If set to true in the main configuration file, this tells wmk to ignore any settings in wmk_config.yaml in the theme directory.

  • extra_template_dirs: A list of directories in which to look for template files. These are placed after both $basedir/templates and theme-provided templates in the template engine search path. This makes it possible to build up a library of components which can be easily used on multiple sites and across different themes.

  • jinja2_templates: If this boolean setting is true, it indicates that the template files in the template directory (and supplied by the theme, or otherwise in the template engine search path) are to be interpreted by Jinja2 rather than Mako. Note that Jinja2 templates used standalone or as layout templates for Markdown content should have the extension .html rather than .mhtml.

  • redirects: If this is True or a string pointing to a YAML file in the data/ directory (whose default name is redirects.yaml), then wmk will write HTML stubs containing <meta http-equiv="refresh" ...> in the indicated locations. The YAML file contains a list of entries with the keys from and to. The former is a path under htdocs/ or a list of such paths, while to is the absolute or relative URL to redirect to.

  • content_extensions: Customize which file extensions are handled inside the content/ directory. May be a list (e.g. ['.md', '.html']) or a dict. The value for each key in the dict should itself be a dict where the following keys have an effect: pandoc (boolean), pandoc_input_format (string), is_binary (boolean), raw (boolean), pandoc_binary_format (string). See the value of DEFAULT_CONTENT_EXTENSIONS in wmk.py for details.

  • mdcontent_json: This option may specify the name of a JSON file to which to write the entire MDCONTENT object in serialized form, along with the environment variables for each page. The destination file may be in htdocs/, data/ or tmp/. If the file path does not start with one of these, data/ is assumed. The specified (or implied) directory must exist.

  • init_commands: A list of arbitrary commands to run at the very beginning of processing, just after theme settings have been loaded and the Python search path configured. They are run in order inside the base directory of the site.

  • cleanup_commands: A list of arbitrary commands to run at the very end of wmk processing. The commands are run in order inside the base directory of the site.
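The interplay between pattern and exclude in assets_fingerprinting_conf can be illustrated in plain Python. This is a minimal sketch, not wmk's implementation: the hash algorithm (SHA-1 truncated to 8 hex digits), the name.hash.ext naming scheme, the example regexes, the helper name fingerprint_name and the exact shape of the ASSETS_MAP-like mapping are all assumptions.

```python
import hashlib
import re
from typing import Optional

# Hypothetical settings in the shape of assets_fingerprinting_conf:
# webroot subdirectory -> regexes selecting which files to fingerprint.
CONF = {
    "js": {"pattern": r"\.js$", "exclude": r"\.[0-9a-f]{8}\.js$"},
    "css": {"pattern": r"\.css$", "exclude": r"\.[0-9a-f]{8}\.css$"},
}

def fingerprint_name(subdir: str, filename: str, content: bytes) -> Optional[str]:
    """Return a fingerprinted filename, or None if the file is left alone."""
    rules = CONF.get(subdir)
    if rules is None or not re.search(rules["pattern"], filename):
        return None
    exclude = rules.get("exclude")
    if exclude and re.search(exclude, filename):
        return None  # appears to be fingerprinted already
    digest = hashlib.sha1(content).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"

# Build an ASSETS_MAP-like mapping from original URL to fingerprinted URL.
ASSETS_MAP = {}
for subdir, name, data in [("js", "app.js", b"console.log(1);"),
                           ("css", "site.css", b"body{margin:0}")]:
    new_name = fingerprint_name(subdir, name, data)
    if new_name:
        ASSETS_MAP[f"/{subdir}/{name}"] = f"/{subdir}/{new_name}"
```

Because the hash depends only on the file content, an unchanged file keeps its name between builds, while any change produces a new URL that browsers cannot have cached.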

A note on Pandoc

Pandoc's variant of markdown is very featureful and sophisticated, but since its use in wmk involves spawning an external process for each content file being converted, it is quite a bit slower than Python-Markdown. Therefore, it is only recommended if you really do need it. Often, even if you do, it can be turned on for individual pages or site sections rather than for the entire site. (Of course, if you are working with non-markdown, non-HTML input content, using Pandoc is unavoidable.)

If you decide to use Pandoc for a medium or large site (or if you have a significant amount of non-markdown content), it is recommended to turn on the use_cache setting in the configuration file. When doing this, be aware that content which is sensitive to changes outside the content file itself will need to be marked as non-cacheable by adding no_cache: true to the frontmatter. For instance, if you call the pagelist() shortcode in a page, you would normally want to mark the file in this way.

The markdown_extensions setting will of course not affect Pandoc, but there is one extension which is partially emulated in wmk's Pandoc setup, namely toc.

If the toc frontmatter variable is true and the string [TOC] is present as a separate line in a document which is to be processed by pandoc, then it will be asked to generate a table of contents which will be placed in the indicated location, just like the toc extension for Python-Markdown does. The toc_depth setting (whose default value is 3) is respected as well, although only in its integer form and not as a range (such as "2-4"). This applies not only to markdown documents but also to the non-markdown formats handled by Pandoc.
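The decision described above can be sketched as a small function that turns a page's toc and toc_depth settings into Pandoc's real --toc and --toc-depth command-line options. This is an illustration under assumptions: the helper name is hypothetical, the sketch simply falls back to the default depth of 3 for anything that is not an integer (what wmk itself does with a range such as "2-4" is not specified here), and the step that moves the generated table of contents to the [TOC] marker is not shown.

```python
def pandoc_toc_args(source: str, page: dict) -> list:
    """Return extra Pandoc arguments for a table of contents, or [] if none is wanted."""
    # [TOC] must appear on a line of its own.
    has_marker = any(line.strip() == "[TOC]" for line in source.splitlines())
    if not (page.get("toc") and has_marker):
        return []
    depth = page.get("toc_depth", 3)
    if not isinstance(depth, int):
        depth = 3  # assumption: range forms such as "2-4" are not supported
    return ["--toc", f"--toc-depth={depth}"]
```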

Available themes

There are several wmk themes available:

Shortcodes

A shortcode consists of an opening tag, {{<, followed by any number of whitespace characters, followed by a string representing the "short version" of the content, followed by any number of whitespace characters and the closing tag >}}.
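To make the syntax concrete, here is a simplified regular expression that finds shortcode calls, including ones spanning several lines. It is a sketch, not wmk's actual pattern: it assumes the body always looks like a call name(...) and uses only ASCII whitespace, so a call that starts with a non-breaking space (the escaping trick mentioned further down) is not matched; it would also be confused by a string argument containing ") >}}".

```python
import re

# {{<, optional ASCII whitespace, a call such as name(...) (may span lines),
# optional ASCII whitespace, >}}
SHORTCODE_RE = re.compile(r"\{\{<[ \t\r\n]*([A-Za-z_]\w*\(.*?\))[ \t\r\n]*>\}\}", re.DOTALL)

def find_shortcodes(text: str) -> list:
    """Return the bodies of all shortcode calls in text, in order of appearance."""
    return [m.group(1) for m in SHORTCODE_RE.finditer(text)]
```

For example, find_shortcodes("{{< csv_table('expenses_2021.csv') >}}") yields the single body csv_table('expenses_2021.csv').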

A typical use case is to easily embed content from external sites into your markdown (or other) content. More advanced possibilities include formatting a table containing data from a CSV file or generating a cropped and scaled thumbnail image.

Shortcodes are normally implemented as Mako components named <shortcode>.mc in the shortcodes subdirectory of templates (or of some other directory in your template search path, e.g. themes/<my-theme>/templates/shortcodes). If jinja2_templates is set to true, however, the shortcode templates are in Jinja2 format instead, and use the .jc extension rather than .mc.

The shortcode itself looks like a function call. Note that positional arguments can only be used if the component has an appropriate <%page> block declaring the expected arguments.

The shortcode component will have access to a context composed of (1) the parameters directly specified in the shortcode call; (2) the information from the metadata block of the markdown file in which it appears; (3) a counter variable, nth, indicating the number of invocations of that kind of shortcode in that markdown document; and (4) the global template variables.

Shortcodes are applied before the content document is converted to HTML, so it is possible to replace a shortcode with markdown content which will then be processed normally. Note, however, that this may lead to undesirable results when you use such shortcodes in a non-markdown content document.

A consequence of this is that shortcodes do not have direct access to (1) the list of files to be processed, i.e. MDCONTENT, or (2) the rendered HTML (including the parts supplied by the Mako template). A shortcode which needs either of these must place a (potential) placeholder in the markdown source as well as a callback in page.POSTPROCESS. Each callback in this list will be called just before the generated HTML is written to htdocs/ (or, in the case of a cached page, after document conversion but right before the Mako layout template is called), receiving the full HTML as a first argument followed by the rest of the context for the page. Examples of such shortcodes are linkto and pagelist, described below. (For more on page.POSTPROCESS and page.PREPROCESS, see the "Site, page and nav variables" section below).
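The placeholder-plus-callback pattern can be simulated in plain Python. In this sketch a Python function stands in for the Mako shortcode component, and the shortcode name, the placeholder token and the keyword-argument calling convention are hypothetical; the callback accepts arbitrary extra arguments because only the HTML is guaranteed to come first.

```python
import re

PLACEHOLDER = "%%HEADING_COUNT%%"  # hypothetical marker; must not occur in normal content

def heading_count_shortcode(page: dict) -> str:
    """Stand-in for a shortcode: emit a placeholder and register a callback."""
    def fill_in(html, *args, **kwargs):
        # Runs on the final HTML, so it also sees headings produced by the layout.
        count = len(re.findall(r"<h2\b", html))
        return html.replace(PLACEHOLDER, str(count))
    page.setdefault("POSTPROCESS", []).append(fill_in)
    return PLACEHOLDER

def run_postprocess(html: str, page: dict) -> str:
    """Apply every registered callback in order, as happens just before writing."""
    for func in page.get("POSTPROCESS", []):
        html = func(html, page=page)
    return html
```

The shortcode itself cannot know the answer at expansion time; it can only reserve a spot, which the callback fills in once the complete page exists.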

Here is an example of a simple shortcode call in markdown content:

### Yearly expenses

{{< csv_table('expenses_2021.csv') >}}

Here is an example csv_table.mc Mako component that might handle the above shortcode call:

<%page args="csvfile, delimiter=',', caption=None"/>
<%! import os, csv %>
<%
info = []
with open(os.path.join(context.get('DATADIR'), csvfile.strip('/')), newline='') as f:
    info = list(csv.DictReader(f, delimiter=delimiter))
if not info:
    return ''
keys = info[0].keys()
%>
<table class="csv-table">
  % if caption:
    <caption>${ caption }</caption>
  % endif
  <thead>
    <tr>
      % for k in keys:
        <th>${ k }</th>
      % endfor
    </tr>
  </thead>
  <tbody>
    % for row in info:
      <tr>
        % for k in keys:
          <td>${ row[k] }</td>
        % endfor
      </tr>
    % endfor
  </tbody>
</table>

Note that if Jinja2 templates are being used, positional arguments are not supported except in built-in shortcodes, so the shortcode call in the Markdown in the above example would have to be changed to csv_table(csvfile='expenses_2021.csv') or similar.

Shortcodes can take up more than one line if desired, for instance:

{{< figure(
      src="/img/2021/11/crocodile-or-alligator.jpg",
      caption="""
Although they appear similar, **crocodiles** and **alligators** differ in easy-to-spot ways:

- crocodiles have narrower and longer heads;
- their snouts are more V-shaped;
- also, crocodiles have a protruding tooth, visible when their mouth is closed.
""") >}}

In this example, the caption contains markdown which would be converted to HTML by the shortcode component (assuming we're dealing with the default figure shortcode).

Note that shortcodes are expanded even inside code blocks, so if you need to show examples of shortcode usage in your content, they must be escaped in some way in such contexts. One relatively painless way is to put a non-breaking space character after the opening tag {{< instead of a space.

Default shortcodes

The following default shortcodes are provided by the wmk installation:

  • figure: An image wrapped in a <figure> tag. Accepts the following arguments: src (the image path or URL), img_link, link_target, caption, figtitle, alt, credit (image attribution), credit_link, width, height, resize. Except for src, all arguments are optional. The caption and credit will be treated as markdown. If resize is True and width and height have been provided, then a resized version of the image is used instead of the original via the resize_image shortcode (the details can be controlled by specifying a dict representing resize_image arguments rather than a boolean; see below).

  • gist: A Github gist. Two arguments, both required: username and gist_id.

  • include: Insert the contents of the named file at this point. One required argument: filename. Optional argument: fallback (which defaults to the empty string), indicating what to show if the file is not found. The file must be inside the content directory (CONTENTDIR), otherwise it will not be read. The path is interpreted as relative to the directory in which the content file is placed. A path starting with / is taken to start at CONTENTDIR. Nested includes are possible but the paths of sub-includes are interpreted relative to the original directory (rather than the directory in which the included file has been placed). Note that include() is always handled before other shortcodes.

  • linkto: Links to the first matching (markdown-based) page. The first parameter, page, specifies the page which is to be linked to. This is either (a) a simple string representing a slug, title, (partial) path/filename or (partial) URL; or (b) a match_expr in the form of a dict or list which will be passed to page_match() with a limit of 1. Optional arguments: label (the link text; the default is the title of the matching page); ordering, passed to page_match() if applicable; fallback, the text to be shown if no matching page is found: (LINKTO: page not found) by default; the boolean unique, which if set to True causes a fatal error to be raised if multiple pages are found to match; and link_attr, which is a string to insert into the <a> tag (by default class="linkto"). A query string or anchor ID fragment for the link can be added via link_append, e.g. link_append='#section2' or link_append='?q=searchstring'. If the boolean parameter url_only is True, then the output will not be a link but only the URL (including link_append, if any).

  • pagelist: Runs a page_match() and lists the found pages. Required argument: match_expr. Optional arguments: exclude_expr, ordering, limit, template, fallback, template_args, sql_match. exclude_expr is a match expression which serves to exclude pages from the list found using the match_expr. For instance, pagelist({'has_tag': True}, exclude_expr={'has_tag': 'private'}) finds all tagged pages except those that have the tag private. The default way of representing the found pages is a simple unordered list of links to them, using the page titles as the link text. If nothing is found, a string specified in the fallback parameter (by default an empty string) replaces the shortcode call. The formatting of the list can be changed by pointing to a Mako template using the template argument, which will receive the argument pagelist (an MDContentList of found pages), as well as template_args, if any. The template will only be called if something is found. If sql_match is True, match_expr, ordering and limit will be passed to page_match_sql() (as where_clause, order_by, and limit, respectively) rather than to page_match().

  • resize_image: Scales and crops images to a specified size. Required arguments: path, width, height. Optional arguments: op ('fit_width', 'fit_height', 'fit', 'fill'; the last is the default), format ('jpg' or 'png'; default is 'jpg'), quality (default 0.75 and applies only to jpegs), focal_point (default center; only used for op='fill'). Returns a path under /resized_images/ (possibly prefixed with the value of site.leading_path) pointing to the resized version of the image. The filename incorporates a SHA1 hash, so repeated requests for the same resize operation are only performed once. The source path is taken to be relative to the WEBROOT, i.e. the project htdocs directory.

  • template: The first argument (template) is either the filename of a template or literal template source code. The heuristic used to distinguish between these two cases is simply that filenames are assumed never to contain whitespace while source code always does. In either case, the template is called and its output inserted into the content document. The boolean argument is_jinja (default False) can be used to indicate that the given template source code is to be handled by Jinja2; otherwise Mako is assumed. For template files, however, the currently active engine as determined by the jinja2_templates setting is always used, regardless of the is_jinja parameter. Any additional arguments are passed directly on to the template (which will also see the normal template context for the shortcode itself).

  • twitter: A tweet. Takes a tweet_id, which may be a Twitter status URL or the last part (i.e. the actual ID) of the URL.

  • var: The value of a variable, e.g. "page.title" or "site.description". One required argument: varname. Optional argument: default (which defaults to the empty string), indicating what to show if the variable is not available.

  • vimeo: A Vimeo video. One required argument: id. Optional arguments: css_class, autoplay, dnt (do not track), muted, title.

  • youtube: A YouTube video. One required argument: id. Optional arguments: css_class, autoplay, title, nowrap, nocookie, width, height.

  • wp: A link to Wikipedia. One required argument: title. Optional arguments: label, lang. Example: {{< wp('L.L. Zamenhof', lang='eo') >}}.
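The caching behaviour of resize_image follows from deriving the output filename from a SHA1 hash of the request. The sketch below shows the idea; the exact string that is hashed, the parameter order and the helper name resized_path are assumptions, and the actual image processing is omitted. Two identical requests yield the same path, so the second one can be skipped if the file already exists.

```python
import hashlib

def resized_path(path, width, height, op="fill", fmt="jpg", quality=0.75,
                 focal_point=None, leading_path=""):
    """Compute a deterministic output URL for a resize request (illustrative only)."""
    # Every parameter that affects the output must be part of the hashed key.
    key = f"{path}|{width}|{height}|{op}|{fmt}|{quality}|{focal_point}"
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return f"{leading_path}/resized_images/{digest}.{fmt}"
```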

Template library

It is generally up to the site or theme author to define any needed Mako/Jinja templates. In rare cases, however, the templates are general enough that it may be natural to distribute them with wmk itself in the form of a Mako template library located under /lib/.

seo.mc

The template /lib/seo.mc makes it easier to format metadata for use in the <head> section of a base template. It is used in something like the following way:

<%namespace import="seo" file="/lib/seo.mc" />
% if page:
  ${ seo(site, page, url=SELF_URL, title=self.page_title) }
% else:
  ${ seo(site, page=None, url=SELF_URL, title=self.page_title,
         img=self.attr.main_image) }
% endif

This will add common meta tags (including basic OpenGraph and JSON-LD information). By default, it also adds a <title> tag. For further details regarding the functionality, see the template file itself.

atom_xml.mc

The template /lib/atom_xml.mc can be used to facilitate the creation of an Atom feed for the website. Set site.base_url to a valid URL and set site.atom_feed to a true value. Then create a file named atom.xml.mhtml in the template root, containing something like the following:

<%namespace name="atom" file="/lib/atom_xml.mc" />\
${ atom.feed(contentlist=MDCONTENT.sorted_by_date()) }\

There are several optional parameters (with_img, get_img, with_summary, get_summary, pubdate_attr, updated_attr, with_full_text, limit) for tweaking the output.

sitemap_xml.mc

Similarly, /lib/sitemap_xml.mc can be used to create a sitemap.xml file. Set site.enable_sitemap to a true value and ensure that site.base_url is present. Then create a file named sitemap.xml.mhtml in the template root, with the following content:

<%namespace import="sitemap" file="/lib/sitemap_xml.mc" />\
${ sitemap(contentlist=MDCONTENT) }\

Usage in Jinja templates

No Jinja version of these components has been created, but the Mako version can be called from a Jinja2 template using code such as the following:

{% set seo = mako_lookup.get_template("/lib/seo.mc").get_def("seo") %}
{{ seo.render(site, page, url=SELF_URL, title=page.title) |safe }}

Site, page and nav variables

When a markdown file (or other supported content) is rendered, the Mako template receives a number of context variables as partly described above. A few of these variables, such as MDTEMPLATES and DATADIR are set directly by wmk (see above). Others are user-configured either (1) in wmk_config.yaml (the contents of the site object and potentially additional "global" variables in template_context); or (2) the cascade of index.yaml files in the content directory and its subdirectories along with the YAML frontmatter of the markdown file itself, the result of which is placed in the page object.

When gathering the content of the page variable, wmk will start by looking for index.yaml files in each parent directory of the markdown file in question, starting at the root of the content directory and moving down towards the file's own directory, at each step extending and potentially overriding the data gathered at previous stages. Only then will the YAML in the frontmatter of the file itself be parsed and added to the page data.

The file-specific frontmatter may be in the content file itself, or it may be in a separate YAML file with the same name as the content file but with an extra .yaml extension. For instance, if the content filename is important.md, then the YAML file would be named important.md.yaml. If both in-file and external frontmatter are present, the two will be merged, with the in-file values "winning" in case of conflict.

At any point, a data source in this cascade may specify an extra YAML file using the special LOAD variable. This file will then be loaded as well and subsequently treated as if the data in it had been specified directly at the start of the file containing the LOAD directive.
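The merge order described above can be sketched with already-parsed YAML data. This is a simplified model under stated assumptions: it uses a shallow dict.update at each step (whether nested dicts are merged more deeply is not stated here), it ignores LOAD, and the function and parameter names are hypothetical.

```python
import posixpath

def page_data(content_path, index_yaml, external_yaml, infile_meta):
    """Merge frontmatter sources for content_path; later sources win.

    index_yaml: {directory: dict} of parsed index.yaml files ('' is the content root).
    external_yaml: parsed '<file>.yaml' data, or None.
    infile_meta: parsed frontmatter from the content file itself.
    """
    parts = [p for p in posixpath.dirname(content_path).split("/") if p]
    # '', 'blog', 'blog/2024', ...: the content root first, then each subdirectory.
    dirs = [""] + ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    merged = {}
    for d in dirs:
        merged.update(index_yaml.get(d, {}))
    merged.update(external_yaml or {})
    merged.update(infile_meta)  # in-file values win over the external file
    return merged
```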

Which variables are defined and used by templates is very much up to the user, although a few of them have a predefined meaning to wmk itself. To make it easier to switch between different themes, however, it is suggested to stick to the following meaning of some of the variables:

The variables site and page are dicts with a thin convenience layer on top which makes it possible to reference subkeys belonging to them in templates using dot notation rather than subscripts. For instance, if page has a dict variable named foo, then a template could contain a fragment such as ${ page.foo.bar or 'splat' } -- even if the foo dict does not contain a key named bar. Without this syntactic sugar you would have to write something much more defensive and long-winded such as ${ page.foo.bar if page.foo and 'bar' in page.foo else 'splat' }.
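The idea behind this convenience layer can be shown with a small dict subclass. This is a minimal sketch, not wmk's own class (the name DotDict is hypothetical): attribute access falls back to key lookup, nested dicts are wrapped on the way out, and a missing key yields an empty, and therefore falsy, wrapper so that chained lookups followed by "or" work.

```python
class DotDict(dict):
    """dict whose keys can also be read as attributes; missing keys give an empty DotDict."""

    def __getattr__(self, name):
        if name.startswith("__"):
            raise AttributeError(name)  # keep copy/pickle protocol lookups working
        value = self.get(name)
        if value is None:
            return DotDict()  # empty, hence falsy
        if isinstance(value, dict) and not isinstance(value, DotDict):
            return DotDict(value)
        return value
```

With this sketch, DotDict({"foo": {}}).foo.bar or 'splat' evaluates to 'splat'. Two trade-offs are worth noting: keys that clash with dict methods (such as items) are shadowed by those methods, and an empty result would print as {} if rendered directly.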

The nav variable

The nav variable is an easy way of configuring a navigation tree for websites with content that has a hierarchical structure, such as a typical documentation site. It is set via the nav key in the wmk_config.yaml file and is represented in templates as a Nav object.

A Nav instance is a list-like object with two types of entries: links and sections. A link is just a title and a URL. A section has a title and a list of links or sections (possibly nested). It may or may not have a url as well.

Each item has a parent (with the nav itself as the top level parent) and a level (starting from 0 for the immediate children of the nav). The nav has a homepage attribute which by default is the first local link in the nav. Each local link has previous and next attributes. Each section has children. There are other attributes but these are the basics.

Manually configured

A typical explicit nav setting looks something like this:

nav:
    - Home: /
    - 'User Guide [url=/guide/]':
        - Lorem:
            - Ipsum: /guide/ipsum/
            - Eu fuit: /guide/mageisse/
        - Dolor sit amet: /guide/concupescit/
    - Resources:
        - Community: 'https://example.com/'
        - Source code: 'https://github.com/example/com/'
    - About:
        - License: /about/license/
        - History: /about/history/

A manually configured nav setting of this kind is only necessary if you want to link to something outside of the site from the nav (as in the above example). Otherwise, it depends on the kind of content you have whether a manually defined or an automatically generated nav would be more appropriate to your use case.

Automatically generated

A nav object can also be generated by wmk from the frontmatter of the content files. In order for this to happen two conditions must be met:

  1. The value of nav in wmk_config.yaml is set to auto.

  2. Each page that is to appear in the navigation tree must have at least the key nav_section in its frontmatter. To determine ordering, nav_order or (equivalently) weight may also be specified; and if necessary the page title may be overridden in the nav by setting the nav_title attribute.

The nav_section value Root is special. Pages assigned to that section are placed directly at the front of the nav structure. For many sites, you would simply place this in the index.yaml file at the root of your content directory.

Other sections are simply grouped by their nav_section values. Please note that these values are case-sensitive.

Within each section the link items are ordered by their nav_order/weight value, which should be an integer. If two or more items have the same ordering number, they are ordered by nav_title/title.

The sections themselves are ordered within the nav by the lowest nav_order/weight value of the pages assigned to them. Sections with the same ordering number are sorted alphabetically.

A page may be excluded from the nav (even if it has a nav_section) by setting its nav_exclude to a true value.

The pages inside each section may be nested to an arbitrary depth by using the nav_parent (or parent) variable in the frontmatter of the subpages. The value of this is normally the nav_title/title (case-insensitive) of the parent page. However, if more than one page in the same section has the same title, then one may disambiguate by specifying the slug or (in extreme cases) the id of the target page instead.
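The grouping and ordering rules above can be sketched as follows, returning a plain list instead of a Nav object. This is a flat illustration only: nesting via nav_parent is not handled, the function name is hypothetical, and sorting pages without nav_order/weight last is an assumption the text does not specify.

```python
from collections import defaultdict

DEFAULT_ORDER = 10**6  # assumption: pages without nav_order/weight sort last

def build_auto_nav(pages):
    """Group pages by nav_section and order them; nav_parent nesting is ignored here."""
    sections = defaultdict(list)
    for p in pages:
        if not p.get("nav_section") or p.get("nav_exclude"):
            continue
        order = p.get("nav_order", p.get("weight", DEFAULT_ORDER))
        title = p.get("nav_title") or p.get("title", "")
        sections[p["nav_section"]].append((order, title, p["url"]))

    def sort_links(items):
        # By order, then by title when orders are equal.
        return [(title, url) for order, title, url in sorted(items, key=lambda i: (i[0], i[1]))]

    nav = sort_links(sections.pop("Root", []))  # Root links come first, ungrouped
    # Sections: by their lowest order value, then alphabetically by (case-sensitive) name.
    ordered = sorted(sections.items(), key=lambda kv: (min(i[0] for i in kv[1]), kv[0]))
    for name, items in ordered:
        nav.append((name, sort_links(items)))
    return nav
```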

The TOC variable

When a page is rendered, the generated HTML is examined and a simple table of contents object constructed, which will be available to templates as TOC. It contains a list of the top-level headings (i.e. H1 headings, or H2 headings if no H1 headings are present, etc.), with lower-level headings hierarchically arranged in its children. Other attributes are url and title. TOC.item_count contains the heading count (regardless of nesting).

The TOC variable can e.g. be used by the page template to show a table of contents elsewhere on the page.

The table of contents object is not constructed unless each heading has an id attribute. When using the default Python-Markdown, this means that the toc extension must be active.
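A heading tree of this kind can be built with the standard library's html.parser. The sketch below is an illustration, not wmk's TOC class: the class and function names and the dict-based node shape are assumptions. Headings without an id are skipped, and each heading is nested under the nearest preceding heading of a higher level, so H2 headings become top-level entries when no H1 is present.

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collect (level, id, text) for every h1-h6 element that carries an id."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # [level, id, text parts] while inside a heading

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            attr = dict(attrs)
            if attr.get("id"):
                self._current = [int(tag[1]), attr["id"], []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == f"h{self._current[0]}":
            level, hid, parts = self._current
            self.headings.append((level, hid, "".join(parts).strip()))
            self._current = None

def build_toc(html):
    """Return (top-level nodes, heading count) for the given HTML."""
    collector = HeadingCollector()
    collector.feed(html)
    root = {"level": 0, "children": []}
    stack = [root]
    for level, hid, text in collector.headings:
        node = {"level": level, "title": text, "url": f"#{hid}", "children": []}
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"], len(collector.headings)
```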

System variables

The following frontmatter variables affect the operation of wmk itself, rather than being exclusively used by templates.

Templates

Note that a variable called something like page.foo below is referenced as such in templates but specified in YAML frontmatter simply as foo: somevalue.

  • page.template specifies the template which will render the content.

  • page.layout is used by several other static site generators. For compatibility with them, this variable is supported as a fallback synonym for template. It only has an effect if template has not been specified explicitly anywhere in the cascade of frontmatter data sources.

For both template and layout, the .mhtml (or .html in the case of Jinja2) extension of the template may be omitted. If the template value appears to have no extension, .mhtml or .html (depending on the template engine) is assumed; but if the intended template file has a different extension, then it must of course be specified.

Likewise, a leading base/ directory may be omitted when specifying template or layout. For instance, a layout value of post would find the template file base/post.mhtml unless a post.mhtml file exists in the template root somewhere in the template search path.

If neither template nor layout has been specified and no default_template setting is found in wmk_config.yaml, the default template name for markdown files is md_base.mhtml (or md_base.html if Jinja2 templates have been selected).

The special template/layout value __empty__ (case-insensitive) indicates that no base template should be applied to the given content file.
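The lookup rules above can be condensed into a sketch. The helper name and the existing parameter (a set of template paths standing in for the template search path) are assumptions made for illustration, and returning the unresolved name when nothing is found simply leaves error reporting to the template engine.

```python
import os

def resolve_template(page, existing, jinja=False, default_template=None):
    """Return the template path for a page, or None for __empty__ (illustrative)."""
    ext = ".html" if jinja else ".mhtml"
    name = page.get("template") or page.get("layout") or default_template
    if not name:
        return "md_base" + ext
    if name.lower() == "__empty__":
        return None  # no base template at all
    if not os.path.splitext(name)[1]:
        name += ext  # no extension given: assume the engine's default
    if name in existing:
        return name
    if not name.startswith("base/") and "base/" + name in existing:
        return "base/" + name  # a leading base/ may be omitted
    return name  # let the template engine report a missing template
```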

Taxonomy handling

A taxonomy is a classification of pieces of content for the purpose of grouping them together. Common taxonomy types are tags, categories, sections and article authors. However, the taxonomy that is appropriate to a particular website mainly depends on the content. On a site with book reviews you would have genres, book authors and publishers, on a movie site you would have genres and actors, and so on. Each set of frontmatter classifiers (e.g. the single classifier tag or the list ['tag', 'tags']) used in a taxonomy may be called a term. Each term may have several values, and each value represents a list of content items associated with it.

Before version 1.13 of wmk, taxonomies had to be handled by templates, and this is still the best way to do it if you want a form of presentation which is tailored to a particular term. However, as a consequence, themes had to be designed around specific taxonomies, typically tags, categories, or sections. In other words, the presentation of taxonomies was not primarily content-driven.

From version 1.13 it is therefore possible to specify the taxonomy criteria directly in the front matter of the main content page for the corresponding term. Here is an example based on a movie site, for the term director. The content file might be named directors/index.md:

---
title: Directors
date: 2024-11-01
template: base/taxonomy/list.mhtml
TAXONOMY:
  taxon: ['director', 'directors']
  order: name
  detail_template: base/taxonomy/detail.mhtml
  list_settings:
    pagination: true
    per_page: 24
  detail_settings:
    biographies: directors.yaml
    item_template: lib/movie_teaser.mc
---

Below is a list of the directors of the movies
that have been covered on this website.

Click on the name of a director to see a short biography
and an overview of their movies.

The frontmatter variable page.TAXONOMY triggers the special processing of the page, provided that it contains at least the subkeys taxon and detail_template. This special processing consists of the following:

  1. wmk fetches a list of values for the term specified in taxon using the taxonomy_info() method of MDCONTENT. This will be added to the template context as TAXONS.

  2. For each value in the list, wmk renders the template detail_template with the same context, except that the two keys TAXON (the value) and TAXON_INDEX (the 0-based index of the value in the list) are added. (If no detail_template is specified, then the template for the page is used.) Each TAXON has items which represent the pages tagged with that director, and the main job of the detail page is to show a list of them to the user. The result is written to a destination file whose name is based on the destination of the rendered Markdown content plus the slug of the string identifying the value (e.g. directors/orson-welles/index.html in this example). The target url will be available as TAXON['url'] (and thus also under the key 'url' for each item in TAXONS).

  3. wmk resumes normal operation by calling the main template with the modified template context as well as the content from the markdown file, and writes the result to the target file.
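Step 2 can be sketched as computing one destination and one template context per value. The sketch assumes that each entry in TAXONS is a dict with a name key, uses a naive ASCII-only slugify (a real implementation would need transliteration for names like "Agnès Varda"), and uses hypothetical function names; it does not reproduce the output of taxonomy_info(). All URLs are filled in first so that every detail page sees complete TAXONS data.

```python
import posixpath
import re

def slugify(text):
    """Naive ASCII slug: lowercase, runs of other characters collapsed to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def taxonomy_detail_jobs(page_dest, page_url, taxons, context):
    """Return (destination, context) pairs, one per detail page.

    page_dest: output path of the list page, e.g. 'directors/index.html'.
    taxons: list of dicts with at least a 'name' key (assumed shape).
    """
    base_dir = posixpath.dirname(page_dest)
    base_url = page_url.rstrip("/")
    for taxon in taxons:  # first pass: every entry in TAXONS gets its url
        taxon["url"] = f"{base_url}/{slugify(taxon['name'])}/"
    jobs = []
    for i, taxon in enumerate(taxons):
        dest = posixpath.join(base_dir, slugify(taxon["name"]), "index.html")
        jobs.append((dest, dict(context, TAXONS=taxons, TAXON=taxon, TAXON_INDEX=i)))
    return jobs
```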

Please note that the settings in list_settings and detail_settings in the example above are merely for the purposes of illustration. Whether any of them are actually supported is entirely up to the template or theme author. The only subvariables used by wmk itself are taxon, order (if present), and detail_template (if present).

Variables affecting rendering

  • page.slug: If the value of slug is nonempty and consists exclusively of lowercase alphanumeric characters, underscores and hyphens (i.e. matches the regular expression ^[a-z0-9_-]+$), then this will be used instead of the basename of the markdown file to determine where to write the output. If a slug variable is missing, one will be automatically added by wmk based on the basename of the current content file (as well as, in the case of index.* files, their proximate directory). Note that autogenerated slugs do not affect the location of the destination file. Slugs are not necessarily unique, but page.id values are – see below.

  • page.pretty_path: If this is true, the basename of the markdown filename (or the slug) will become a directory name and the HTML output will be written to index.html inside that directory. By default it is false for files named index.md or index.html and true for all other files. If the filename contains symbols that do not match the character class [\w.,=-], then it will be "slugified" before final processing (although this only works for languages using the Latin alphabet).

  • page.do_not_render: Tells wmk not to write the output of this template to a file in htdocs. All other processing will be done, so the gathered information can be used by templates for various purposes. (This is similar to the headless setting in Hugo).

  • page.draft: If this is true, it prevents further processing of the markdown file unless render_drafts has been set to true in the config file.

  • page.no_cache: If this is true, the rendering cache will not be used for this file. (See also the use_cache setting in the configuration file).

  • page.markdown_extensions, page.markdown_extension_configs, page.pandoc, page.pandoc_filters, page.pandoc_options, page.pandoc_input_format, page.pandoc_output_format: See the description of these options in the section on the configuration file, above.

  • page.POSTPROCESS: This contains a list of processing instructions which are called on the rendered HTML just before writing it to the output directory. Each instruction is either a function (placed into POSTPROCESS by a shortcode) or a string (possibly specified in the frontmatter). If the latter, it points to a function entry in the autoload dict imported from either the project's py/wmk_autoload.py file or the theme's py/wmk_theme_autoload.py file. In either case, the function receives the html as the first argument while the rest of the arguments constitute the template context. It should return the processed html.

  • page.PREPROCESS: This is analogous to page.POSTPROCESS, except that the instructions in the list are applied to the markdown (or other content document) just before converting it to HTML. The function receives two arguments: the document text and the page object. It should return the altered document source. Note that this happens before shortcodes have been expanded, so (unlike page.POSTPROCESS) such actions cannot be added via shortcode.
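Here is a minimal sketch of a py/wmk_autoload.py file showing both kinds of processing instructions. The function names, and what they do (adding target="_blank" to external links and expanding a %%TITLE%% placeholder), are hypothetical. Only the autoload dict and the argument order follow the description above. The POSTPROCESS function accepts the template context as any extra positional or keyword arguments, because exactly how the context is passed is not spelled out here.

```python
# py/wmk_autoload.py -- hypothetical example functions for POSTPROCESS/PREPROCESS
import re


def add_external_link_targets(html, *args, **kwargs):
    """POSTPROCESS: open absolute http(s) links in a new tab.

    Receives the rendered HTML first; the template context (the remaining
    arguments) is accepted but not used. Only matches links where href is
    the first attribute of the <a> tag.
    """
    return re.sub(r'<a href="(https?://[^"]+)"',
                  r'<a rel="noopener" target="_blank" href="\1"',
                  html)


def expand_title_placeholder(doc, page):
    """PREPROCESS: replace %%TITLE%% in the source with the page title."""
    return doc.replace('%%TITLE%%', str(page.get('title', '')))


# Entries here can be referred to by name (as strings) in the frontmatter.
autoload = {
    'external_link_targets': add_external_link_targets,
    'expand_title_placeholder': expand_title_placeholder,
}
```

A page could then presumably request the first function by listing its name in the frontmatter, e.g. POSTPROCESS: [external_link_targets].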

Note that if two files in the same directory have the same slug, they may both be rendered to the same output file; it is unpredictable which of them will be written last (and thus "win the race"). The same kind of conflict may arise between a slug and a filename, or even between two filenames containing non-ASCII characters. It is up to the content author to avoid this; wmk does nothing to prevent it.
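Since wmk does not guard against such collisions, a pre-build check can be useful. The sketch below is a simplified, hypothetical model of how the output location follows from the page.slug and page.pretty_path rules above. It ignores the slugification of unusual filenames and any other details of wmk's actual implementation; output_path and find_collisions are illustrative names only.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

SLUG_RE = re.compile(r'^[a-z0-9_-]+$')


def output_path(source, slug=None, pretty_path=None):
    """Approximate the output file (relative to htdocs) for a content file."""
    src = PurePosixPath(source)
    stem = src.stem
    if slug and SLUG_RE.match(slug):
        stem = slug                      # a valid explicit slug replaces the basename
    if pretty_path is None:
        pretty_path = src.stem != 'index'  # default: False only for index.* files
    if pretty_path:
        return str(src.parent / stem / 'index.html')
    return str(src.parent / (stem + '.html'))


def find_collisions(entries):
    """entries: (source_path, explicit_slug_or_None) pairs.

    Returns {output_path: [sources...]} for every output written more than once.
    """
    targets = defaultdict(list)
    for source, slug in entries:
        targets[output_path(source, slug)].append(source)
    return {target: srcs for target, srcs in targets.items() if len(srcs) > 1}
```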

Standard variables and their recommended meaning

The following variables are not used directly by wmk but affect templates in different ways. This is a list of recommendations rather than rules that must necessarily be followed.

Typical site variables

Site variables are the key-value pairs under site: in wmk_config.yaml.

  • site.title: Name or title of the site.

  • site.lang: Language code, e.g. 'en' or 'en-us'. Used e.g. for translations by some themes.

  • site.locale: Locale code, e.g. 'en_US.utf8'. Used when sorting MDCONTENT by name or title.

  • site.tagline: Subtitle or slogan.

  • site.description: Site description.

  • site.author: Main author/proprietor of the site. Depending on the site templates (or the theme), may be a string or a dict with keys such as "name", "email", etc.

  • site.base_url: The protocol and hostname of the site (perhaps followed by a directory path if site.leading_path is not being used). Normally without a trailing slash.

  • site.leading_path: If the web pages built by wmk are not at the root of the website but in a subdirectory, this is the appropriate prefix path. Normally without a trailing slash.

  • site.build_time: This is automatically added to the site variable by wmk. It is a datetime object indicating when the rendering phase of the current run started.

  • site.lunr_search: A boolean automatically added to the site variable. It is true when lunr_index is true in the configuration file.

Templates or themes may be configurable through various site variables, e.g. site.paginate for number of items per page in listings or site.mainfont for configuring the font family.

Classic meta tags

These variables mostly relate to the text content and affect the metadata section of the <head> of the HTML page.

  • page.title: The title of the page, typically placed in the <title> tag in the <head> and used as a heading on the page. Normally the title should not be repeated as a header in the body of the markdown file. Most markdown documents should have a title. If it is not explicitly specified, the title will be generated automatically from the filename.

  • page.slug: See above. If it is missing, the slug is created from the title.

  • page.id: This is guaranteed to be unique at rendering time. If it is present but not unique, then "-1", "-2", etc., will be appended as necessary. If it is not explicitly specified, then it is generated by slugifying the full path to the source markdown file (relative to the content directory). For instance, blog/2022/09/The letter Þ in Old English.md will become the ID blog-2022-09-the-letter-th-in-old-english.

  • page.description: Affects the <meta name="description" ...> tag in the <head> of the page. The variable summary (see later) may also be used as fallback here.

  • page.keywords: Affects the <meta name="keywords" ...> tag in the <head> of the page. This may be either a list or a string (where items are separated with commas).

  • page.robots: Instructions for Google and other search engines relating to this content (e.g. noindex, nofollow) should be placed in this variable.

  • page.author: The name of the author (if there is only one). May lead to a <meta name="author" ...> tag in the <head> as well as appear in the body of the rendered HTML file. Some themes may expect this to be a dict with keys such as name, email, image, etc.

  • page.authors: If there are many authors they may be specified here as a list. It is up to the template how to handle it if both author and authors are specified, but one way is to add the author to the authors unless already present in the list.

  • page.summary: This may affect the <meta name="description" ...> tag as a fallback if no description is provided, but its main purpose is for list pages with article teasers and similar content. If it is initially not present but page.generate_summary is True, then it will be generated from the body of the page, as follows: (1) if the HTML comment <!--more--> is present in the body, then any non-heading content before that will be used as the summary; (2) otherwise the first paragraph of the body will be used. In either case, if the autogenerated summary is longer than 300 characters, then it is truncated so as to be shorter than that (this maximum length is configurable with page.summary_max_length). Autogenerated summaries will contain neither HTML tags nor Markdown markup; if this is desired, the summary must be explicitly added to the frontmatter.
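The two steps described for page.id, deriving an ID from the source path and making IDs unique, can be sketched as follows. The transliteration table and the choice to let the first occurrence keep the unsuffixed ID are assumptions, not wmk's actual code; id_from_path and make_unique are hypothetical names.

```python
import re
import unicodedata

# Hypothetical extra transliterations for letters that NFKD cannot decompose.
EXTRA_TRANSLIT = {'Þ': 'Th', 'þ': 'th', 'Ð': 'D', 'ð': 'd',
                  'Æ': 'Ae', 'æ': 'ae', 'ß': 'ss'}


def id_from_path(relpath):
    """Slugify a content path (relative to the content directory) into an ID."""
    stem = re.sub(r'\.[^./]+$', '', relpath)          # drop the file extension
    for char, repl in EXTRA_TRANSLIT.items():
        stem = stem.replace(char, repl)
    # Decompose accented letters and drop the combining marks.
    ascii_text = unicodedata.normalize('NFKD', stem).encode('ascii', 'ignore').decode('ascii')
    return re.sub(r'[^a-z0-9]+', '-', ascii_text.lower()).strip('-')


def make_unique(ids):
    """Append -1, -2, ... to repeated IDs; the first occurrence keeps its ID."""
    seen = set()
    result = []
    for base in ids:
        candidate, n = base, 0
        while candidate in seen:
            n += 1
            candidate = f'{base}-{n}'
        seen.add(candidate)
        result.append(candidate)
    return result
```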

Note that this is by no means an exhaustive list of variables likely to affect the <head> part of the generated HTML. For instance, several other variables may affect meta tags used for sharing on social media. One of the more common ones is probably page.image (described below). In any case, the list of supported frontmatter attributes and how they are interpreted is for the most part up to the theme or template author.
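A rough model of the rules for autogenerating page.summary might look like this. It is a hypothetical sketch that works on markdown text with a deliberately crude markup stripper (it would, for instance, also remove underscores inside words); wmk's own generate_summary may differ in the details.

```python
import re

MORE = '<!--more-->'


def _strip_markup(text):
    """Very crude removal of HTML tags and common inline markdown."""
    text = re.sub(r'<[^>]+>', '', text)                     # HTML tags
    text = re.sub(r'!?\[([^\]]*)\]\([^)]*\)', r'\1', text)  # links/images -> their text
    text = re.sub(r'[*_`]+', '', text)                      # emphasis, code ticks
    return ' '.join(text.split())                           # collapse whitespace


def generate_summary(body, max_length=300):
    if MORE in body:
        before = body.split(MORE, 1)[0]
        # Use all non-heading content before the marker.
        text = '\n'.join(ln for ln in before.splitlines()
                         if not ln.lstrip().startswith('#'))
    else:
        # Otherwise use the first paragraph that is not a heading.
        paragraphs = [p for p in re.split(r'\n\s*\n', body.strip())
                      if p.strip() and not p.lstrip().startswith('#')]
        text = paragraphs[0] if paragraphs else ''
    text = _strip_markup(text)
    if len(text) >= max_length:
        # Keep the result strictly shorter than max_length, cutting at a word boundary.
        cut = text[:max_length - 1]
        if text[len(cut)] != ' ' and ' ' in cut:
            cut = cut.rsplit(' ', 1)[0]
        text = cut.rstrip()
    return text
```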

Dates

Dates and datetimes should normally be in a format conformant with or similar to ISO 8601, e.g. 2021-09-19 and 2021-09-19T09:19:21+00:00. The T may be replaced with a space and the time zone may be omitted (localtime is assumed). If the datetime string contains hours it should also contain minutes, but seconds may be omitted. If these rules are followed, the following variables are converted to date or datetime objects (depending on the length of the string) before they are passed on to templates.

  • page.date: A generic date or datetime associated with the document.

  • page.pubdate: The date/datetime when first published. Currently wmk does not skip rendering files with date or pubdate in the future, but it may do so in a later version.

  • page.modified_date: The last-modified date/datetime. Note that wmk will also add the variable MTIME, which is the modification time of the file containing the markdown source, so this information can be inferred from that if this variable is not explicitly specified.

  • page.created_date: The date the document was first created.

  • page.expire_date: The date from which the document should no longer be published. Similarly to pubdate, this currently has no direct effect on how wmk builds and renders the site but may do so in a later version.

  • page.auto_date: If this is True and no page.date is present (or rather the field specified in page.auto_date_field, which defaults to date), then wmk tries to find an ISO date in the source filename or its directory path. In this context, that means a group of 4+2+2 digits with a separator which may be either -, _, or /: e.g. posts/2024-05-13-find-the-fish.md or diary/2024/02/19/spam.org. If a date is found, then page.date is set accordingly. (You would normally set auto_date in an index.yaml file so as to affect all content files in that directory and its subdirectories.)
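The date rules above can be approximated with the Python standard library as follows. Both functions are hypothetical illustrations rather than wmk's own parse_dates code: parse_frontmatter_date accepts the ISO 8601-like forms described above (returning a date for 10-character strings and a datetime otherwise), and date_from_path mimics the auto_date filename check.

```python
import re
from datetime import date, datetime

ISO_LIKE = re.compile(
    r'^\d{4}-\d{2}-\d{2}'              # date part
    r'(?:[T ]\d{2}:\d{2}(?::\d{2})?'   # optional time: hours and minutes, seconds optional
    r'(?:[+-]\d{2}:\d{2})?)?$'         # optional UTC offset
)
AUTO_DATE = re.compile(r'(\d{4})[-_/](\d{2})[-_/](\d{2})')


def parse_frontmatter_date(value):
    """Convert an ISO-like string to date/datetime; leave anything else as-is."""
    if not isinstance(value, str) or not ISO_LIKE.match(value.strip()):
        return value
    value = value.strip()
    try:
        if len(value) == 10:
            return date.fromisoformat(value)
        return datetime.fromisoformat(value)  # naive result = local time assumed
    except ValueError:                         # e.g. month 13
        return value


def date_from_path(path):
    """Return the first valid 4+2+2-digit date found in a path, or None."""
    for m in AUTO_DATE.finditer(path):
        try:
            return date(*map(int, m.groups()))
        except ValueError:
            continue
    return None
```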

See also the description of the DATE and MTIME context variables above.

Media content

  • page.image: The main image associated with the document. Affects the og:image meta tag in HTML output and may be used for both teasers and content rendering.

  • page.images: A list of images associated with the document. If image is not specified, the main image will be taken to be the first in the list.

  • page.audio: A list of audio files/urls associated with this document.

  • page.videos: A list of video files/urls associated with this document.

  • page.attachments: A list of attachments (e.g. PDF files) associated with this document.

Taxonomy

See also the description of page.TAXONOMY above. The following are terms commonly used for taxonomy purposes:

  • page.section: One of a quite small number of sections on the site, often corresponding to the leading subdirectory in content. E.g. "blog", "docs", "products".

  • page.categories: A list of broad categories the page belongs to. E.g. "Art", "Science", "Food". The first-named category may be regarded as the primary one.

  • page.tags: A list of tags relevant to the content of the page. E.g. "quantum physics", "knitting", "Italian food".

  • page.weight: A measure of importance attached to a page and used as an ordering key for a list of pages. This should be a positive integer. The list is normally ascending, i.e. with the lower numbers at the top. (Pages may of course be ordered by other criteria, e.g. by pubdate).

Template filters

In addition to the built-in template filters provided by Mako or Jinja2 respectively, the following filters are by default made available in templates:

  • date: date formatting using strftime. By default, the format '%c' is used. A different format is specified using the fmt parameter, e.g.: ${ page.pubdate | date(fmt=site.date_format) }.

  • date_to_iso: Format a datetime as ISO 8601 (or similarly, depending on parameters). The parameters are sep (the separator between the date part and the time part; by default 'T', but a space is sensible as well); upto (by default 'sec', but 'day', 'hour' and 'frac' are also acceptable values); and with_tz (by default False).

  • date_to_rfc822: Format a datetime as RFC 822 (a common datetime format in email headers and some types of XML documents).

  • date_short: E.g. "7 Nov 2022".

  • date_short_us: E.g. "Nov 7th, 2022".

  • date_long: E.g. "7 November 2022".

  • date_long_us: E.g. "November 7th, 2022".

  • slugify: Turns a string into a slug. Only works for strings in the Latin alphabet.

  • markdownify: Convert markdown to HTML. It is possible to specify custom extensions using the extensions argument.

  • truncate: Convert markdown/html to plaintext and return the first length characters (default: 200), with an ellipsis (default: "…") appended if any shortening has taken place.

  • truncate_words: Convert markdown/html to plaintext and return the first length words (default: 25), with an ellipsis (default: "…") appended if any shortening has taken place.

  • p_unwrap: Remove a wrapping <p> tag if and only if there is only one paragraph of text. Suitable for short pieces of text to which a markdownify filter has previously been applied. Example: <h1>${ page.title | markdownify,p_unwrap }</h1>.

  • strip_html: Remove any markdown/html markup from the text. Paragraphs will not be preserved.

  • cleanurl: Remove trailing 'index.html' from URLs.

  • url: Unless the given path already starts with '/', '.' or 'http', prefix it with the first defined value among site.leading_path, site.base_url, and a literal /. Append a / unless the path already ends with one or seems to end with a file extension. Calls cleanurl on the result.

  • to_json: converts the given data structure to JSON. Note that this should not normally be used as a string filter (i.e. ${ value | to_json }) but directly as a function, like this: ${ to_json(value) }.

  • fingerprint: Replace an unadorned path to an assets file with its fingerprinted (i.e. versioned) equivalent. Example: ${ 'js/site.js' | url, fingerprint }. Uses the corresponding entry from the ASSETS_MAP context variable if it is available but otherwise proceeds to do the fingerprinting itself.
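To predict what the url and cleanurl filters produce, here is a simplified reimplementation of the rules described above, written as plain Python functions. It is a sketch for illustration only; in particular, the regular expression used to decide whether a path "seems to end with a file extension" is an assumption.

```python
import re


def cleanurl(url):
    """Drop a trailing 'index.html' from a URL."""
    if url.endswith('index.html'):
        return url[:-len('index.html')]
    return url


def url(path, site=None):
    site = site or {}
    if not path.startswith(('/', '.', 'http')):
        # First defined of leading_path, base_url, or a literal '/'.
        prefix = site.get('leading_path') or site.get('base_url') or '/'
        path = prefix.rstrip('/') + '/' + path
    last_segment = path.rsplit('/', 1)[-1]
    if not path.endswith('/') and not re.search(r'\.\w+$', last_segment):
        path += '/'   # no trailing slash and no apparent file extension
    return cleanurl(path)
```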

If you wish to provide additional filters in Mako without having to explicitly define or import them in templates, the best way of doing this is to add them via the mako_imports setting in wmk_config.yaml (see above). There is currently no easy way to do this if Jinja2 templates are being used, however.

Please note that in order to avoid conflicts with the above filters you should not place a file named wmk_mako_filters.py or wmk_jinja2_extras.py in your py/ directories.

Working with lists of pages

Templates which render a list of content files (e.g. a list of blog posts or pages belonging to a category) will need to filter or sort MDCONTENT accordingly. In order to make this easier, MDCONTENT is wrapped in a list-like object called MDContentList, which has the following methods:

General searching/filtering

Each of the following methods returns a new MDContentList containing those entries for which the predicate (pred) is True.

  • match_entry(self, pred): The pred (i.e. predicate) is a callable which receives the full information on each entry in the MDContentList and returns True or False.

  • match_ctx(self, pred): The pred receives the context for each entry and returns a boolean.

  • match_page(self, pred): The pred receives the page object for each entry and returns a boolean.

  • match_doc(self, pred): The pred receives the markdown body for each entry and returns a boolean.

  • url_match(self, url_pred): The pred receives the url (relative to htdocs) for each entry and returns a boolean.

  • path_match(self, src_pred): The pred receives the path to the source document for each entry and returns a boolean.
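The predicates passed to these methods are ordinary callables. Below are a few hypothetical examples, one per kind of input; the frontmatter keys type and tags are used purely for illustration, and handling tags given as a comma-separated string is a defensive assumption.

```python
def is_python_tutorial(page):
    """For match_page(): receives one entry's page variables (a dict)."""
    tags = page.get('tags') or []
    if isinstance(tags, str):  # assumption: tags might be a comma-separated string
        tags = [t.strip() for t in tags.split(',')]
    return page.get('type') == 'tutorial' and 'python' in [t.lower() for t in tags]


def is_under_docs(url):
    """For url_match(): receives the output URL relative to htdocs."""
    return ('/' + url.lstrip('/')).startswith('/docs/')


def is_org_source(src_path):
    """For path_match(): receives the path to the source document."""
    return src_path.lower().endswith('.org')


def mentions_todo(doc):
    """For match_doc(): receives the raw body of the content document."""
    return 'TODO' in doc
```

In a Mako template, such functions could be defined in a module-level <%! ... %> block and then used as MDCONTENT.match_page(is_python_tutorial) or MDCONTENT.url_match(is_under_docs).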

Specialized searching/filtering

All of these return a new MDContentList object (at least by default).

  • posts(self, ordered=True): Returns a new MDContentList with those entries which are blog posts. In practice this means those with markdown sources in the posts/ or blog/ subdirectories or those which have a page.type of "post", "blog", "blog-entry" or "blog_entry". Normally ordered by date (newest first), but this can be turned off by setting ordered to False.

  • not_posts(self): Returns a new MDContentList with "pages", i.e. those entries which are not blog posts.

  • has_slug(self, sluglist), has_id(self, idlist): Entries with specific slugs/ids.

  • in_date_range(self, start, end, date_key='DATE'): Posts/pages with a date between start and end. The key for the date field can be specified using date_key. Unless the value of date_key is DATE or MTIME, the key is looked up in the page variables of each entry.

  • has_taxonomy(self, haystack_keys, needles): A general search for entries belonging to a taxonomy group, such as category, tag, section or type. The haystack_keys are the page variables to examine, while needles is a list of the values to look for in those variables. A string value for needles is treated as a one-item list. The search is case-insensitive.

  • in_category(self, catlist): A shortcut method for self.has_taxonomy(['category', 'categories'], catlist).

  • has_tag(self, taglist): A shortcut method for self.has_taxonomy(['tag', 'tags'], taglist).

  • in_section(self, sectionlist): A shortcut method for self.has_taxonomy(['section', 'sections'], sectionlist).

  • get_used_taxonomies(self): Get a list of all known taxonomies that are actually used by items in this MDContentList (i.e. content files). These may be of two types: (1) the standard taxonomies tags, sections, categories and authors; and (2) anything defined as a TAXONOMY in the frontmatter of a page. Returns a list of dicts with the keys taxon, name, name_singular and name_plural. If the taxonomy belongs to the latter group, then order, list_url, item_url_pattern and page_id will be present as well, and name_singular/name_plural may be empty. If a standard taxonomy (e.g. tags) has been handled as a content page TAXONOMY, then the latter type takes precedence (i.e. the standard one is omitted from the list).

  • group_by(self, pred, normalize=None, keep_empty=False): Group items in an MDContentList using a given criterion. Parameters: pred is a callable receiving a content item and returning a string or a list of strings. For convenience, pred may also be specified as a string and is then interpreted as the value of the named page variable, e.g. category; normalize is an optional callable that transforms the grouping values, e.g. by truncating and lowercasing them; keep_empty should be set to True when the content items whose predicate evaluates to the empty string are to be included in the result, since they otherwise will be omitted. Returns a dict whose keys are strings and whose values are MDContentList instances.

  • taxonomy_info(self, keys, order='count', tostring=None): Returns a list of dicts, where each dict corresponds to the slugified value of any of the keys in keys. The keys in the dict are name, slug, forms (different forms of name that appear in the result, e.g. upper/lowercase), count, and items (an MDContentList object). tostring, if present, is a callable that changes non-string and non-list values into strings for the purposes of grouping. Shorthand forms for common taxonomy types are available, namely get_categories(self, order='name'), get_tags(self, order='name'), get_sections(self, order='name'), and get_authors(self, order='name', tostring=None). These look for both singular and plural forms of the given keys, e.g. ['tag', 'tags'] for get_tags().

  • page_match(self, match_expr, ordering=None, limit=None): This is actually quite a general matching method but does not require the caller to pass a predicate callable to it, which means that it can be employed in more varied contexts than the general methods described in the last section. A match_expr contains the filtering specification. It will be described further below. The ordering parameter, if specified, should be either title, slug, url or date, with an optional - in front to indicate reverse ordering. The date option for ordering may be followed by the preferred frontmatter date field after a colon, e.g. ordering='-date:modified_date' for a list with the most recently changed files at the top. The limit, if specified, obviously indicates the maximum number of pages to return.

  • page_match_sql(), get_db(), get_db_columns() – see "Searching/filtering using SQL" below.
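The grouping semantics of group_by() can be illustrated with a toy version. It is a hypothetical sketch that works on a plain list of page dicts rather than on real MDContentList entries and returns plain lists instead of MDContentList instances; treating an item whose predicate returns an empty list as having the empty string as its value is an assumption.

```python
from collections import defaultdict


def group_by(pages, pred, normalize=None, keep_empty=False):
    """Toy group_by over page dicts; returns {value: [page, ...]}."""
    if isinstance(pred, str):
        key = pred
        pred = lambda page: page.get(key, '')  # string pred = named page variable
    groups = defaultdict(list)
    for page in pages:
        values = pred(page)
        if not isinstance(values, (list, tuple)):
            values = [values]
        if not values:
            values = ['']  # assumption: an empty list counts as the empty string
        normalized = []
        for value in values:
            value = '' if value is None else str(value)
            if normalize is not None:
                value = normalize(value)
            normalized.append(value)
        for value in dict.fromkeys(normalized):  # dedupe per page, keep order
            if value == '' and not keep_empty:
                continue
            groups[value].append(page)
    return dict(groups)
```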

A match_expr for page_match() is either a dict or a list of dicts. If it is a dict, each page in the result set must match each of the attributes specified in it. If it is a list of dicts, each page in the result set must match at least one of the dicts (i.e., the returned result set contains the union of all matches from all dicts in the list). When a string or regular expression match is being performed in this process, it will be case-insensitive. The supported attributes (i.e. dict keys) are as follows:

  • title: A regular expression which will be applied to the page title.
  • slug: A regular expression which will be applied to the slug.
  • id: A string or list of strings (one of) which must match the page id exactly.
  • url: A regular expression which will be applied to the target URL.
  • path: A regular expression which will be applied to the path to the markdown source file (i.e. the source_file_short field).
  • doc: A regular expression which will be applied to the body of the markdown source document.
  • date_range: A list containing two ISO-formatted dates and optionally a date key (DATE by default); see the description of in_date_range() above.
  • has_attrs: A list of frontmatter variable names. Matching pages must have a non-empty value for each of them.
  • attrs: A dict where each key is the name of a frontmatter variable and the value is the value of that attribute. If the value is a string, it will be matched case-insensitively. All key-value pairs must match.
  • has_tag, in_section, in_category: The values are lists of tags, sections or categories, respectively, at least one of which must match (case-insensitively). See the methods with these names above.
  • is_post: If set to True, will match if the page is a blog post; if set to False will match if the page is not a blog post.
  • exclude_url: The page with this URL should be omitted from the results (normally the calling page).
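The following toy matcher illustrates how a match_expr combines conditions: all keys within one dict must match, while a list of dicts matches if any one of them does. It is a hypothetical sketch that supports only a handful of the attributes listed above, works on a simplified entry structure ({'url': ..., 'page': {...}}), and uses re.search for the regular expressions, which is an assumption about how wmk applies them.

```python
import re


def _match_one(entry, cond):
    """True if the entry satisfies every key in a single condition dict."""
    page = entry['page']
    for key, val in cond.items():
        if key == 'title':
            if not re.search(val, page.get('title', ''), re.IGNORECASE):
                return False
        elif key == 'url':
            if not re.search(val, entry['url'], re.IGNORECASE):
                return False
        elif key == 'has_attrs':
            if not all(page.get(attr) for attr in val):  # non-empty values required
                return False
        elif key == 'attrs':
            for attr, want in val.items():
                have = page.get(attr)
                if isinstance(want, str) and isinstance(have, str):
                    if have.lower() != want.lower():   # strings: case-insensitive
                        return False
                elif have != want:
                    return False
        elif key == 'exclude_url':
            if entry['url'] == val:
                return False
        else:
            raise ValueError(f'key not supported in this sketch: {key}')
    return True


def page_match(entries, match_expr):
    """A dict means 'all of these'; a list of dicts means 'any of these'."""
    conds = [match_expr] if isinstance(match_expr, dict) else match_expr
    return [e for e in entries if any(_match_one(e, c) for c in conds)]
```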

Searching/filtering using SQL

An MDContentList has three methods for examining the content using an SQLite in-memory database:

  • get_db(self): Builds a SQLite database containing a single table, content, whose structure is described below. Returns a connection to this database which can then be worked with using normal sqlite3/DBAPI methods. The database has a locale-sensitive collation called locale (which applies locale.strxfrm) and a custom function casefold (which simply applies the Python casefold string method). The row factory is sqlite3.Row, so row fields can be read using either column names or integer indices.

  • get_db_columns(self): Returns a simple list of the columns in the content table.

  • page_match_sql(self, where_clause=None, bind=None, order_by=None, limit=None, offset=None, raw_sql=None, raw_result=False, first=False): Either where_clause or raw_sql must be specified. In either case, if bind is specified, the bind variables there will be applied to the SQL upon execution. If order_by (a string), limit or offset (integers) are specified, they will be appended to the SQL before executing it against the database connection. The result will be an MDContentList unless raw_result is True, in which case it is a cursor object. (If raw_result is False but raw_sql is supplied, the column list in the SQL select statement must include source_file so as to permit the construction of an appropriate MDContentList). If first is True, only the first item from the results is returned (or None, if the results are empty).

The content table constructed by get_db() always contains the columns source_file, source_file_short, url, target, template, MTIME, DATE, doc, and rendered. In addition, it contains a column for each page metadata field that appears in any of the entries in the MDContentList in question. These columns get the page_ prefix; for instance, the title field becomes page_title.

It should be noted that all page fields added to the table will have to match the regular expression ^[a-z]\w*$. Thus, any metadata field with a key that is all uppercase, titlecased, or contains non-word characters (such as hyphens) will be omitted. Also, field names are case-sensitive in the raw metadata, but case-insensitive in the database table, so inconsistently capitalized field names may lead to unexpected results.

A field value that is not a string, integer, float, boolean, date, datetime or None will be serialized using json.dumps() with ensure_ascii set to False (for easier UTF-8 matching). Dates and datetimes are stringified. Booleans are represented as 1 or 0.

Sorting

All of these return a new MDContentList object with the entries in the specified order.

  • sorted_by(self, key, reverse=False, default_val=-1): A general sorting method. The key is the page variable to sort on, default_val is the value to assume if there is no such variable present in the entry, while reverse indicates whether the sort is to be descending (True) or ascending (False, the default).

  • sorted_by_date(self, newest_first=True, date_key='DATE'): Sorting by date, newest first by default. The date key to sort on can be specified if desired.

  • sorted_by_title(self, reverse=False): Sorting by page.title, ascending by default.

Pagination

  • paginate(self, pagesize=5, context=None): Divides the MDContentList into chunks of size pagesize and returns a tuple consisting of the chunks and a list of page_urls (one for each page, in order). If an appropriate template context is provided, pages 2 and up will be written to the webroot output directory to destination files whose names are based upon the URL for the first page (and the page number, of course). Without the context, the page_urls will be None. It is the responsibility of the calling template to check the _page variable for the current page to be rendered (this defaults to 1). Each iteration will get all chunks and must use this variable to limit itself appropriately.

Typical usage of paginate():

<%
  posts = MDCONTENT.posts()
  chunks, page_urls = posts.paginate(5, context)
  curpage = context.get('_page', 1)
%>

% for post in chunks[curpage-1]:
  ${ show_post(post) }
% endfor

% if len(chunks) > 1:
  ${ prevnext(len(chunks), curpage, page_urls) }
% endif

Render to an arbitrary file

  • write_to(self, dest, context, extra_kwargs=None, template=None): Calls a template with the MDContentList in self as the value of CHUNK and writes the result to the file named in dest. The path is relative to the webroot, and any missing directories are created. The template defaults to the calling template, and extra_kwargs may be supplied to pass additional variables to it.

Typical usage of write_to():

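<%
  if not CHUNK:
     # One way of collecting the tag names in use (see get_tags() above)
     tags = [t['name'] for t in MDCONTENT.get_tags()]
     for tag in tags:
         tagged = MDCONTENT.has_tag([tag])
         if not tagged:
             continue  # avoid potential infinite loop!
         outpath = '/tags/' + slugify(tag) + '/index.html'
         tagged.write_to(outpath, context, {'TAG': tag})
%>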

% if CHUNK:
  ${ list_tagged_pages(TAG, CHUNK) }
% else:
  ${ list_tags() }
% endif

Site search

Using Lunr

Lunr is the only search solution "natively" supported by wmk. That being said, implementing site search is not a simple matter of turning lunr indexing on. It takes a bit of work by the author of the site or theme templates, so depending on your needs it may even be easier to base your search functionality on another solution.

With lunr_index set to true (and optionally lunr_index_fields) in wmk_config.yaml, wmk will build a search index for Lunr.js and place it in idx.json in the webroot. To minimize its size, no metadata about each record is saved to the index. Instead, a simple list of pages (with title and summary) is placed in idx.summaries.json. The summary is taken from one of the frontmatter fields summary, intro or description (in order of preference) or, failing that, from the start of the page body.
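The order of preference for the summary text in idx.summaries.json can be expressed as a small function. This is a hypothetical sketch: its name and the 200-character cutoff for body text are invented, and the actual length and whitespace handling used by wmk are not specified here.

```python
def pick_summary(page, body, max_chars=200):
    """Choose the summary text for one page, following the preference order above."""
    for key in ('summary', 'intro', 'description'):
        value = page.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    # Fallback: the start of the page body, with whitespace collapsed.
    return ' '.join(body.split())[:max_chars]
```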

If lunr_languages is present in wmk_config.yaml, stemming rules for those languages will be applied when building the index. The value may be a two-letter lowercase language code (ISO 639-1) or a list of such codes. The currently accepted languages are de, da, en, fi, fr, hu, it, nl, no, pt, ro, and ru (this is the intersection of the languages supported by lunr.js and NLTK, respectively). The default language is en. Attempting to specify a non-supported language will raise an exception.

The index is built via the lunr.py module and the stemming support is provided by the Python Natural Language Toolkit.

For information about the supported syntax of the search expression, see the Lunr documentation.

Limitations of Lunr

  • Building the index does not mean that the search functionality is complete. It remains to point to lunr.js in the templates and write some javascript to interface with it and display the results. However, since every website is different, this cannot be provided by wmk directly. It is up to the template (or theme) author to actually load the index and present a search interface to the user.

  • Similarly, if a "fancy" preview of results is required which cannot be fulfilled using the information in idx.summaries.json, this must currently be solved independently by the template/theme author.

  • Note that only the raw content document is indexed, not the HTML after the markdown (or other input content) has been processed. The only exception to this is that the binary input formats (DOCX, ODT, EPUB) are converted to markdown before being indexed. The output of templates (including even text resulting from shortcodes called from the content documents) is not indexed either.

  • Because Lunr creates a single index file for the whole site, it may not be a practical option for large sites with lots of content – a realistic limit may be somewhere around 1,000 pages or so. Some other client-side search solutions break the index into smaller chunks and may therefore be a viable option for such sites.

Overview of alternative solutions

If you are looking for an alternative to lunr, the first thing to consider is whether a server-based solution is needed or whether a Javascript-based client-side solution would be enough.

If the site has a lot of text (more than 200,000 words or so) or if it needs to work even without Javascript, then a server-based solution is required. You then need to decide whether you want to self-host it or if you are ready to pay for a third-party hosted solution. Meilisearch is open source and allows for self-hosting (although a hosted solution called Meilisearch Cloud is also available), while the market leader in hosted site search is probably Algolia.

If, however, a client-side Javascript solution is sufficient, there are several alternatives to lunr that could come into consideration, e.g. Pagefind, Tinysearch, Elasticlunr or Stork.

Whichever solution is picked, you of course need to add the required HTML, CSS and Javascript to the templates for the search functionality to work. You also need to take care of updating the search index whenever the site is built.

Assuming you have opted not to use the built-in lunr support, the index creation/updating step can basically be implemented in two ways:

  1. By running after the build step has finished via a cleanup_commands entry in wmk_config.yaml. This calls a script or another external program which can update the index based on either the HTML in the output folder or the JSON file specified using the mdcontent_json configuration option.

  2. By implementing a hook function in wmk_hooks.py (or wmk_theme_hooks.py), most likely for post_build_actions() or index_content(); see the "Overriding and extending wmk via hooks" section below.

Example: Pagefind

Taking Pagefind as an example of the steps described above, you would, per their documentation, add something similar to this to your templates in an appropriate location:

<link href="/pagefind/pagefind-ui.css" rel="stylesheet">
<script src="/pagefind/pagefind-ui.js"></script>
<div id="search"></div>
<script>
    window.addEventListener('DOMContentLoaded', (event) => {
        new PagefindUI({ element: "#search", showSubResults: true });
    });
</script>

It would also be a good idea to make sure you modify all base templates so as to identify the main part of each page with the data-pagefind-body attribute and thus omit repeated elements such as navigation and footer from the index.

Finally, in order to actually create or update the search index whenever the site is built, you would need to add the following to the wmk_config.yaml file:

cleanup_commands:
  - "npx -y pagefind --site htdocs"

This obviously assumes that you have npm installed on your system.
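Alternatively, following option 2 above, the Pagefind indexing step could be run from a hook instead of cleanup_commands. Below is a minimal sketch for py/wmk_hooks.py. It assumes that the output directory is htdocs and that an __after hook receives the original return value as its only argument, as described in the hooks section. The keyword-only runner parameter exists only to make the function testable and is not part of wmk's hook contract.

```python
# py/wmk_hooks.py -- hypothetical: rebuild the Pagefind index after each build
import subprocess

PAGEFIND_CMD = ['npx', '-y', 'pagefind', '--site', 'htdocs']


def post_build_actions__after(retval, *, runner=subprocess.run):
    # Raises CalledProcessError if pagefind fails (check=True).
    runner(PAGEFIND_CMD, check=True)
    # Returning nothing leaves the original return value unchanged.
```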

Overriding and extending wmk via hooks

Much of the functionality of wmk can be changed by overriding or extending specific steps it performs. This is done by adding Python code to a file named wmk_hooks.py in the project py/ directory. Themes can do the same thing via the wmk_theme_hooks.py file in the theme's py/ directory. If both try to affect the same functionality, the project directory takes precedence.

Currently, the following defs from wmk.py can be extended by running hooks before or after them, or can be redefined entirely:

  • auto_nav_from_content
  • binary_to_markdown
  • build_lunr_index
  • copy_static_files
  • doc_with_yaml
  • fingerprint_assets
  • generate_summary
  • get_assets_map
  • get_content_extensions
  • get_content
  • get_extra_content
  • get_index_yaml_data
  • get_nav
  • get_template_lookup
  • get_template_vars
  • get_templates
  • handle_redirects
  • handle_shortcode
  • handle_taxonomy
  • index_content
  • locale_and_translation
  • lunr_summary
  • markdown_extensions_settings
  • maybe_extra_meta
  • maybe_save_mdcontent_as_json
  • pandoc_extra_formats
  • pandoc_metadata
  • parse_dates
  • post_build_actions
  • postprocess_html
  • preferred_date
  • process_assets
  • process_content_item
  • process_markdown_content
  • process_templates
  • render_markdown
  • run_init_commands
  • run_cleanup_commands
  • write_redir_file

In order to override any of these entirely, define a function of the same name in the hooks file. One may also define a function that runs before or after:

  • A function that runs before any of the above has the same name but with __before appended, e.g. index_content__before. It receives the arguments passed to the original function and can modify them and return new arguments in the form of either a two-tuple of a list and a dict (for *args and **kwargs) or a single dict (for **kwargs only). In either case, these will be passed to the affected function instead of the original arguments. If the before hook function returns nothing, the original arguments will be passed on unchanged.
  • A function that runs after any of the above has the same name but with __after appended, e.g. index_content__after. It receives the return value of the original function and can return a new value that will be returned to the caller instead. (If it returns nothing, the original return value will be returned unchanged).
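To make these calling conventions concrete, here is a minimal, self-contained sketch of how before/after hooks with the semantics just described could be wired around a function. It is not wmk's actual implementation. The wrapper name with_hooks and the hooks dict are assumptions for illustration, and so is the choice to keep positional arguments unchanged when a before hook returns only a dict.

```python
import functools


def with_hooks(func, hooks):
    """Return a wrapper around func that honours optional hooks looked up
    by name in the `hooks` dict (hypothetical, for illustration):
      NAME          replaces func entirely
      NAME__before  may return (args, kwargs), a kwargs dict, or None
      NAME__after   may return a replacement result, or None
    """
    name = func.__name__
    target = hooks.get(name, func)
    before = hooks.get(name + "__before")
    after = hooks.get(name + "__after")

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if before is not None:
            new_args = before(*args, **kwargs)
            if isinstance(new_args, tuple) and len(new_args) == 2:
                args, kwargs = new_args      # ([*args], {**kwargs})
            elif isinstance(new_args, dict):
                kwargs = new_args            # assumption: positional args kept
            # None: the original arguments pass through unchanged
        result = target(*args, **kwargs)
        if after is not None:
            new_result = after(result)
            if new_result is not None:       # None: keep original result
                result = new_result
        return result

    return wrapper
```

For example, given def greet(name, punct="!") returning "Hello, " + name + punct, a greet__before hook returning ([name.upper()], {"punct": "?"}) together with a greet__after hook appending " :)" turns greet("ada") into "Hello, ADA? :)".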

You should examine wmk's source code to make sure that any replacement function you may write is compatible with the original in terms of its parameters and possible return values. Updates to wmk may of course make it necessary to change your hook functions.
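Part of that check can be done mechanically with Python's inspect module, by comparing the parameter lists of the original function and your replacement. This is a minimal sketch; the helper name signatures_match is hypothetical, and matching signatures say nothing about return values, which you still need to verify by reading the source.

```python
import inspect


def signatures_match(original, replacement):
    """Return True if both callables declare the same parameters
    (names, kinds and default values) in the same order."""
    def params(fn):
        return [(p.name, p.kind, p.default)
                for p in inspect.signature(fn).parameters.values()]
    return params(original) == params(replacement)
```

You might run such a check from a small test script that imports both your hooks module and the function being replaced, and rerun it after upgrading wmk.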

Examples

Here is a generic get_extra_content() def which adds HTML pages fetched from a database to the "normal" content from the content/ directory:

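import re

# As in the original example, this assumes the wmk module is accessible here
# as `wmk`; _get_articles_from_database() and _munge_row() are your own code.
def get_extra_content(content, ctdir, datadir, outputdir, template_vars, conf):
    # IDs of the pages already loaded from content/
    known_ids = {item['data']['page']['id'] for item in content}
    # The database rows are HTML, not markdown
    content_extensions = {'.html': {'raw': True}}
    extpat = re.compile(r'\.html$')
    result = _get_articles_from_database()
    for i, row in enumerate(result):
        # Turn each row into metadata, document body and pseudo-file info
        meta, doc, pseudo = _munge_row(row, i, result, ctdir)
        wmk.process_content_item(
            meta, doc, content, conf, template_vars,
            ctdir, outputdir, datadir, content_extensions, known_ids,
            pseudo['root'], pseudo['fn'],
            pseudo['source_file'], pseudo['source_file_short'],
            extpat, False)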

The functions _get_articles_from_database() and _munge_row() are left as an exercise for the reader.

Here is an __after hook for maybe_extra_meta() which fetches a conference schedule (e.g. from an online calendar) if the conference_id key is present in the frontmatter. The retrieved information will then be available to the templates for that page as page.schedule.

def maybe_extra_meta__after(meta):
    if 'conference_id' in meta:
        meta['schedule'] = _get_conference_schedule(meta['conference_id'])
    return meta

A third example: Let's say you want to show information from a few RSS sources in a sidebar that will appear on several pages. In order to avoid refetching it for each page you can use something like this:

def get_template_vars__after(template_vars):
    if 'rss_sources' in template_vars:
        template_vars['rss_info'] = fetch_rss_feeds(template_vars['rss_sources'])
    return template_vars

This assumes that you set rss_sources in the template_context section of your wmk_config.yaml file.
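fetch_rss_feeds() is not part of wmk. A minimal standard-library sketch might look like this, assuming rss_sources is a list of RSS 2.0 feed URLs and that you only need the title and link of the first few items of each feed. Parsing is kept separate from fetching so that it can be tested offline, and a feed that fails to load yields an empty list rather than breaking the build.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_rss(xml_data, max_items=5):
    """Extract title/link pairs from an RSS 2.0 document (str or bytes)."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iterfind("channel/item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
        if len(items) >= max_items:
            break
    return items


def fetch_rss_feeds(sources, max_items=5, timeout=10):
    """Fetch each feed URL once and return {url: [items]}.
    Feeds that cannot be downloaded or parsed map to an empty list."""
    info = {}
    for url in sources:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                info[url] = parse_rss(resp.read(), max_items)
        except (OSError, ValueError, ET.ParseError):
            info[url] = []
    return info
```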

Incorporating external sources

A wmk-maintained website may incorporate material that does not originate as content files in the site's content/ directory. The source of the material may be a database or an external API, perhaps provided by a headless CMS such as Sanity, Directus, or DatoCMS.

In either case, there are two main approaches to integrating such content into a wmk site. The first is to use the hooks system described earlier, especially get_extra_content(). The second is to fetch the material independently of wmk (or perhaps via the init_commands that can be specified in the configuration file) and write it as a set of HTML or markdown files into content/, whereupon wmk can treat it as normal file-based content.
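For the second approach, the fetching script only needs to produce ordinary content files. Below is a minimal sketch, assuming each fetched item is a dict with hypothetical id, title, date and body keys, that the body is markdown, and that titles are unique (they become file names). Metadata values are written as JSON-quoted strings, which are also valid YAML double-quoted scalars, so no YAML library is needed. The external_id key and the content/imported directory are illustrative choices, not wmk conventions.

```python
import json
import re
import unicodedata
from pathlib import Path


def slugify(text):
    """Lower-case ASCII slug suitable for a file name."""
    ascii_text = (unicodedata.normalize("NFKD", text)
                  .encode("ascii", "ignore").decode("ascii"))
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return slug or "untitled"


def write_content_file(item, content_dir="content/imported"):
    """Write one fetched item as a markdown file with YAML frontmatter
    and return its path. An existing file with the same name is
    overwritten, so re-running the import updates changed items."""
    out_dir = Path(content_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    meta = {"title": item["title"], "date": item["date"],
            "external_id": item["id"]}
    lines = ["---"]
    for key, value in meta.items():
        lines.append(f"{key}: {json.dumps(str(value))}")
    lines += ["---", "", item["body"].rstrip(), ""]
    path = out_dir / (slugify(item["title"]) + ".md")
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Note that this sketch never deletes files, so items removed at the source would need separate housekeeping.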

Example: Import from WordPress

As an example of the latter approach, a set of scripts is available in the extras/ subdirectory to fetch and maintain content from a WordPress site.

The script wordpress2content.py uses the WordPress REST API to get posts and pages from a WordPress site and export them as content files in content/. Images and other media files from the origin's wp-content/uploads/ folder go into static/_fetched/.

This can be used either to migrate from WordPress to a static site maintained by wmk, or to use a (possibly non-public) WordPress installation as a headless CMS for external authors or non-technical users.

When used in the latter way, the helper scripts duplicate_wp_content.py and removed_wp_content.py may help with the housekeeping involved in keeping the content properly synchronized.

For further details, see the readme in the extras/ directory.