Pandoc/Quarto filter for self-citing BibTeX bibliographies.
BibTeX's documentation allows self-citing bibliographies, that is bibliography entries citing other bibliography entries in note, title or abstract fields. These aren't handled properly by Pandoc's and Quarto's internal bibliography engine, Citeproc. This filter extends Citeproc's abilities to cover self-citing bibliographies.
The filter acts as drop-in replacement for Citeproc. It still runs Citeproc in the background: bibliography style files are applied as expected.
BibTeX bibliographies can self-cite: one bibliography entry
may cite another entry. That is done in two ways: the crossref
field to cite a collection from which an entry is extracted
(see the BibTeX's documentation), or by entering
citation commands, e.g. in a note field:
@incollection{Doe:2000,
author = 'Jane Doe',
title = 'What are Fish Even Doing Down There',
crossref = 'Snow:2000',
}
@book{Snow:2010,
editor = 'Jane Snow',
title = 'Fishy Works',
note = 'Reprint of~\citet{Snow:2000}',
}
@collection{Snow:2000,
editor = 'Jane Snow',
title = 'Fishy Works',
}
LaTeX's bibliography engines (natbib
, biblatex
) handle
self-citations of both kinds.
Pandoc and Quarto can use those engines but only for PDF output. They come instead with their own engine, Citeproc, which conveniently uses citation styles files and covers all output formats.
However, Citeproc only handles crossref
self-citations. It fails
to process citation commands in bibliographies.
This filter enables Citeproc to process cite commands in the bibliography. It ensures that the self-cited entries are displayed in the document's bibliography.
Are self-citing bibliographies a good idea? It ensures consistency by avoiding multiple copies of the same data, but creates dependencies between entries. The citation sytle language doesn't seem to permit it. Be that as it may, many of us have legacy self-citing bibliographies, so we may as well handle them.
Pandoc 2.17+ or Quarto 1.4+
Note. Version 1 of this filter does not work with Pandoc 3.1.10+
and Quarto 1.4.530+. If switching from version 1 to current
version, make sure you do not call -C
or --citeproc
in Pandoc
or set citeproc: false
in Quarto. See
below for details.
This filter remplaces Citeproc.
The filter modifies the internal document representation; it can be used with many publishing systems that are based on Pandoc.
Pass the filter to pandoc via the --lua-filter
(or -L
) command
line option:
pandoc --lua-filter recursive-citeproc.lua ...
Or via a defaults file:
filters:
- recursive-citeproc.lua
Copy the file in your Pandoc user data directory to make
it available to Pandoc anywhere. Run pandoc -v
to see
where your Pandoc user data directory is.
Do not use Citeproc. Do not use the --citeproc
or -C
option in combination with this filter. If applied before the
filter, it is redundant; if after, it generates a duplicate
bibliography.
Users of Quarto can install this filter as an extension with
quarto install extension dialoa/recursive-citeproc.git
and use it by adding recursive-citeproc
to the filters
entry
in their YAML header. You should also deactivate Citeproc:
---
citeproc: false
filters:
- recursive-citeproc
---
If you use other filters and specify their order relative to Quarto, it is safer to run this filter after Quarto's own:
---
citeproc: false
filters:
- ...
- quarto
- recursive-citeproc
---
Use pandoc_args
to invoke the filter. See
the R Markdown
Cookbook
for details.
---
output:
word_document:
pandoc_args: ['--lua-filter=recursive-citeproc.lua']
---
Do not use Citeproc. Before this filter, it is redundant; after, it duplicates the bibliography.
You can specify the filter's maximum recursive depth in the document's metadata. Use 0 for infinte (default 10):
recursive-citeproc:
max-depth: 5
A max-depth
of 2, for instance, means that the filter inserts
references that are only cited by references cited in the
document's body, but not references that are only cited by
references that are themselves only cited by references cited in
the document.
If the max depth is reached before all self-recursive citations are processed, PDF output may generate an error.
To try the filter with Pandoc or Quarto, clone the directory.
Generate Pandoc outputs with make generate
. Change the output
format with make generate FORMAT=docx
. Use FORMAT=latex
for
latex outputs. You can list multiple formats, make generate FORMAT="docx pdf"
. The outputs will be in the test
folder,
named expected.<format>
.
Requires Pandoc.
As above, replacing generate
with qgenerate
.
Requires Quarto.
With Quarto installed, you can also use the Pandoc engine
embedded in Quarto: add the argument PANDOC="quarto pandoc"
to
the Pandoc commands above, e.g. make generate FORMAT=docx PANDOC="quarto pandoc"
.
Version 2 is meant to replace Citeproc. It returns the document
appended with a refs
Div containing Citeproc bibliography
output.
The filter runs Citeproc on the document and checks whether the generated bibliography contains citations. If not, it simply returns the document with bibliography.
If the bibliography contains citations, the filter recursively
runs Citeproc on those citations, generated citations, and so on
recursively until all needed citations are identified. They are
then added to the document's nocite
metadata field.
Citeproc is then run on the document, which typesets Cite elements in the document body and adds a bibliography with all needed entries to cover self-citations. However, Cite elements in the bibliography may still contain LaTeX cite commands that aren't typeset yet. To ensure these are typeset, we run Citeproc on the bibliography itself, and update the document's bibliography with the result.
The last step of the process generates a duplicate bilbiography
which we discard. There is no way around it since Pandoc 3.1.10:
if we ran Citeproc on the bibliography with
suppress-bibliography
the Cite commands couldn't be converted to
links. To ensure link-references
adds links to citations even in
the bibliography, we must leave suppress-bibliography
to false.
Version 1 of this filter was supposed to be run in combination with and before Citeproc.
It added a Citeproc-generated bibliography to the document, which
could contain Cite
elements whose
content
could contain a LaTeX citation commands, and exited with
the document's metadata key suppress-bibliography
to true
.
Citeproc running after this would:
- convert any LaTeX citation in
the
content
of Cite elements in the the bibliography. - add Links to the the
content
of Cite elements, if document's metadata keylink-references
wastrue
,
The filter's main task was to ensure that the Citeproc-generated
bibliography contained all entries cited in bibliography entries,
and entries cited in bibliography entries cited in other
bibliographies entries, and so on. That was done by generating a
the bibliography a first time, checking whether it added citations,
adding them to the metadata nocite
key and trying again until
no new citations was added or the maximal depth was reached.
Since Pandoc 3.1.10, suppress-bibliography
deactivates
link-references
. The filter would still handle self-citing
bibliographies but link-references
would have no effect:
citations would not be linked to bibliographies. To let Citeproc
link references, we would need to remove suppress-bibliography
,
but we would then get a duplicate bibliography.
The solution in version 2 was to incorporate the last Citeproc
step within the filter; we run it witout suppress-bibliography
for the references to be linked if link-references
is set and we
take out the duplicate bibliography it outputs.
Based on an idea given by John MacFarlane on the pandoc-discuss mailing list.
This pandoc Lua filter is published under the MIT license, see
file LICENSE
for details.