Commit
Compatibility Quarto, update README.md
jdutant committed Dec 16, 2024
1 parent e01e058 commit 69e631f
Showing 7 changed files with 179 additions and 98 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -32,7 +32,7 @@ jobs:
# - 2.19.2

container:
image: pandoc/latex:${{ matrix.pandoc }}
image: pandoc/core:${{ matrix.pandoc }}

steps:
- name: Checkout
207 changes: 148 additions & 59 deletions README.md
@@ -17,6 +17,20 @@ Recursive-citeproc
Overview
------------------------------------------------------------------

[BibTeX's documentation][BibTeX] allows self-citing
bibliographies, that is bibliography entries citing other
bibliography entries in note, title or abstract fields. These
aren't handled properly by [Pandoc][]'s and [Quarto][]'s internal
bibliography engine, Citeproc. This filter extends Citeproc's
abilities to cover self-citing bibliographies.

The filter acts as a drop-in replacement for Citeproc. It still runs
Citeproc in the background: bibliography style files are applied
as expected.

Background
------------------------------------------------------------------

BibTeX bibliographies can *self-cite*: one bibliography entry
may cite another entry. That is done in two ways: the `crossref`
field to cite a collection from which an entry is extracted
@@ -43,81 +57,100 @@ field to cite a collection from which an entry is extracted
LaTeX's bibliography engines (`natbib`, `biblatex`) handle
self-citations of both kinds.

[Pandoc][] and [Quarto][] can use those engines but for PDF output
only. They come instead with their own engine, [Citeproc][], which
conveniently uses [citation style files][CSLs] and covers all
output formats.
[Pandoc][] and [Quarto][] can use those engines but only for PDF
output. They come instead with their own engine, [Citeproc][],
which conveniently uses [citation style files][CSLs] and covers
all output formats.

However, Citeproc only handles `crossref` self-citations. It fails
to process citation commands in bibliographies.

This filter enables Citeproc to process cite commands in the
bibliography. It ensures that the self-cited entries are displayed
in the document's bibliography.

Are self-citing bibliographies a good idea? They ensure consistency
by avoiding multiple copies of the same data, but create
dependencies between entries. The [citation style language][CSLs]
doesn't seem to permit them. Be that as it may, many of us have
legacy self-citing bibliographies, so we may as well handle them.

However, Citeproc only handles `crossref` self-citations.
It fails to process citation commands in bibliographies.
Requirements
------------------------------------------------------------------

This filter enables Citeproc to process cite commands in
the bibliography. It ensures that the self-cited entries
are displayed in the document's bibliography.
Pandoc 2.17+ or Quarto 1.4+

Are self-citing bibliographies a good idea? They ensure
consistency by avoiding multiple copies of the same
data, but create dependencies between entries. The
[citation style language][CSLs] doesn't seem to
permit them. Be that as it may, many of us have legacy
self-citing bibliographies, so we may as well
handle them.
_Note_. Version 1 of this filter does not work with Pandoc 3.1.10+
and Quarto 1.4.530+. If switching from version 1 to the current
version, make sure you do not call `-C` or `--citeproc` in Pandoc,
and that you set `citeproc: false` in Quarto. See
[below](#how-the-filter-works) for details.

Usage
------------------------------------------------------------------

This filter replaces Citeproc.

The filter modifies the internal document representation; it can
be used with many publishing systems that are based on Pandoc.

When using several filters on a document, this filter must
be placed:
* after any filter that adds citations to the document,
* before Citeproc or Quarto

The filter must be used in combination with Citeproc.

### Plain pandoc

Pass the filter to pandoc via the `--lua-filter` (or `-L`) command
line option, followed by Citeproc (`--citeproc` or `-C`):
line option:

pandoc --lua-filter recursive-citeproc.lua -C ...
pandoc --lua-filter recursive-citeproc.lua ...

Or via a defaults file:

``` yaml
filters:
- recursive-citeproc.lua
- citeproc
```
Copy the file into your Pandoc user data directory to make
it available to Pandoc anywhere. Run `pandoc -v` to see
where your Pandoc user data directory is.

__Do not use Citeproc__. Do not use the `--citeproc` or `-C`
option in combination with this filter. If applied before the
filter, it is redundant; if after, it generates a duplicate
bibliography.

### Quarto

Users of Quarto can install this filter as an extension with

quarto install extension tarleb/recursive-citeproc.git
quarto install extension dialoa/recursive-citeproc.git

and use it by adding `recursive-citeproc` to the `filters` entry
in their YAML header, before `quarto`.
in their YAML header. You should also deactivate Citeproc:

``` yaml
---
citeproc: false
filters:
- recursive-citeproc
---
```

If you use other filters and specify their order relative to
Quarto, it is safer to run this filter after Quarto's own:

``` yaml
---
citeproc: false
filters:
- ...
- quarto
- recursive-citeproc
---
```

You must explicitly specify that the filter comes before Quarto's own;
by default Quarto runs its own (incl. Citeproc) first.

### R Markdown

Use `pandoc_args` to invoke the filter, followed by Citeproc. See
Use `pandoc_args` to invoke the filter. See
the [R Markdown
Cookbook](https://bookdown.org/yihui/rmarkdown-cookbook/lua-filters.html)
for details.
@@ -126,28 +159,32 @@ for details.
---
output:
word_document:
pandoc_args: ['--lua-filter=recursive-citeproc.lua', '--citeproc']
pandoc_args: ['--lua-filter=recursive-citeproc.lua']
---
```

__Do not use Citeproc__. Before this filter, it is redundant;
after, it duplicates the bibliography.

Options
------------------------------------------------------------------

You can specify the filter's maximum recursive depth in the
document's metadata. Use 0 for infinite (default 100):
document's metadata. Use 0 for infinite (default 10):

```
recursive-citeproc:
max-depth: 5
```

A `max-depth` of 2, for instance, means that the filter inserts
references that are only cited by references cited in the document's
body, but not references that are only cited by references that are
themselves only cited by references cited in the document.
references that are only cited by references cited in the
document's body, but not references that are only cited by
references that are themselves only cited by references cited in
the document.

If the max depth is reached before all self-recursive citations are
processed, PDF output may generate an error.
If the max depth is reached before all self-recursive citations
are processed, PDF output may generate an error.
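
The depth rule above can be modelled as a breadth-first walk over the citation graph. The following Python sketch is an illustration only, not the filter's Lua code: the function `entries_to_include` and the `cites` mapping are hypothetical names, and it assumes that entries cited in the document body count as level 1, which is one reading of the `max-depth` description.

```python
def entries_to_include(body_cites, cites, max_depth=10):
    """Return the bibliography keys included within max_depth levels.

    body_cites: keys cited in the document body (level 1, an assumption).
    cites:      hypothetical mapping from a key to the keys its entry cites.
    max_depth:  0 means no limit, mirroring the option described above.
    """
    included = set(body_cites)
    frontier = set(body_cites)
    depth = 1
    while frontier and (max_depth == 0 or depth < max_depth):
        next_frontier = set()
        for key in frontier:
            for cited in cites.get(key, ()):
                if cited not in included:
                    next_frontier.add(cited)
        included |= next_frontier
        frontier = next_frontier
        depth += 1
    return included


# Toy bibliography: a cites b, b cites c, c cites d.
bib = {"a": ["b"], "b": ["c"], "c": ["d"]}
two_levels = entries_to_include(["a"], bib, max_depth=2)
everything = entries_to_include(["a"], bib, max_depth=0)
```

With this toy data, `two_levels` holds `a` and `b` only, while `everything` holds all four keys. Tracking already included keys is what lets the walk terminate on circular citations.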

Testing
------------------------------------------------------------------
@@ -156,10 +193,11 @@ To try the filter with Pandoc or Quarto, clone the directory.

### Pandoc

Generate Pandoc outputs with `make generate`. Change the output format
with `make generate FORMAT=docx`. Use `FORMAT=latex` for latex
outputs. You can list multiple formats, `make generate FORMAT="docx pdf"`.
The outputs will be in the `test` folder, named `expected.<format>`.
Generate Pandoc outputs with `make generate`. Change the output
format with `make generate FORMAT=docx`. Use `FORMAT=latex` for
latex outputs. You can list multiple formats, `make generate
FORMAT="docx pdf"`. The outputs will be in the `test` folder,
named `expected.<format>`.

Requires [Pandoc][].

@@ -172,29 +210,80 @@ Requires [Quarto][].
### Pandoc within Quarto

With [Quarto][] installed, you can also use the Pandoc engine
embedded in Quarto: add the argument `PANDOC="quarto pandoc"` to the
Pandoc commands above, e.g. `make generate FORMAT=docx
embedded in Quarto: add the argument `PANDOC="quarto pandoc"` to
the Pandoc commands above, e.g. `make generate FORMAT=docx
PANDOC="quarto pandoc"`.


How the filter works
------------------------------------------------------------------

The filter adds a Citeproc-generated bibliography to the document,
which may contain citation commands, and sets the metadata key
`suppress-bibliography` to `true`. When Citeproc itself is run
on the result, the bibliography's citation commands are converted
to text.

The filter's main task is to ensure that its Citeproc-generated
bibliography contains all the document's citations, including
those that may only appear in the bibliography itself. To do
that, it checks whether the result of generating a bibliography
with Citeproc adds new citations. If it does, the filter
adds those new citations in the metadata `nocite` field
and tries to generate the bibliography again, and so on
until generating the bibliography doesn't produce any citation
that is not already present in the bibliography.
### Version 2.0.0+

Version 2 is meant to replace Citeproc. It returns the document
appended with a `refs` Div containing Citeproc bibliography
output.

The filter runs Citeproc on the document and checks whether the
generated bibliography contains citations. If not, it simply
returns the document with bibliography.

If the bibliography contains citations, the filter runs Citeproc
on those citations, then on any citations that this generates, and
so on until all needed citations are identified. They are
then added to the document's `nocite` metadata field.

Citeproc is then run on the document, which typesets Cite elements
in the document body and adds a bibliography with all needed
entries to cover self-citations. However, Cite elements in the
bibliography may still contain LaTeX cite commands that aren't
typeset yet. To ensure these are typeset, we run Citeproc on the
bibliography itself, and update the document's bibliography with
the result.

The last step of the process generates a duplicate bibliography
which we discard. There is no way around it since Pandoc 3.1.10:
if we ran Citeproc on the bibliography with
`suppress-bibliography` set, the Cite commands couldn't be converted
to links. To ensure `link-references` adds links to citations even
in the bibliography, we must leave `suppress-bibliography` set to
`false`.
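
To make the "typeset citations in the bibliography" step more concrete, here is a toy Python model of it. It is a sketch under stated assumptions, not the filter's implementation: the real filter works on Pandoc's document tree and calls Citeproc, whereas this example represents entries as plain strings, handles only single-key `\cite{key}` commands, and uses hypothetical names (`typeset_bibliography`, `labels`). It also leaves out the duplicate bibliography that the real step must discard.

```python
import re

# Matches simple LaTeX cite commands such as \cite{key}. Other commands
# (\citet, \citep, several keys at once) are ignored in this toy model.
CITE_COMMAND = re.compile(r"\\cite\{([^},]+)\}")


def typeset_bibliography(entries, labels):
    """Replace \\cite{key} inside each entry with its rendered label.

    entries: hypothetical mapping key -> formatted entry text.
    labels:  hypothetical mapping key -> in-text label, e.g. "Doe 2020".
    Commands with unknown keys are left untouched so they stay visible.
    """
    def render(match):
        key = match.group(1)
        return labels.get(key, match.group(0))

    return {key: CITE_COMMAND.sub(render, text) for key, text in entries.items()}


entries = {
    "doe2021": r"Doe, J. 2021. Reply. Reprinted in \cite{smith2020}.",
    "smith2020": "Smith, A., ed. 2020. Collected Papers.",
}
labels = {"doe2021": "Doe 2021", "smith2020": "Smith 2020"}
typeset = typeset_bibliography(entries, labels)
```

After the call, the `doe2021` entry reads "Reprinted in Smith 2020." while the `smith2020` entry is unchanged.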

### Version 1.0.0+

Version 1 of this filter was supposed to be run *in combination
with and before* Citeproc.

It added a Citeproc-generated bibliography to the document, which
could contain [Cite
elements](https://pandoc.org/lua-filters.html#type-cite) whose
`content` could contain LaTeX citation commands, and exited with
the document's metadata key `suppress-bibliography` set to `true`.
Citeproc running after this would:

1. convert any LaTeX citation in
the `content` of Cite elements in the bibliography,
2. add Links to the `content` of Cite elements, if the
document's metadata key `link-references` was `true`.

The filter's main task was to ensure that the Citeproc-generated
bibliography contained all entries cited in bibliography entries,
and entries cited in bibliography entries cited in other
bibliography entries, and so on. That was done by generating
the bibliography a first time, checking whether it added citations,
adding them to the metadata `nocite` key and trying again until
no new citation was added or the maximal depth was reached.

Since Pandoc 3.1.10, `suppress-bibliography` deactivates
`link-references`. The filter would still handle self-citing
bibliographies but `link-references` would have no effect:
citations would not be linked to bibliographies. To let Citeproc
link references, we would need to remove `suppress-bibliography`,
but we would then get a duplicate bibliography.

The solution in version 2 was to incorporate the last Citeproc
step within the filter; we run it without `suppress-bibliography`
for the references to be linked if `link-references` is set and we
take out the duplicate bibliography it outputs.

Credits
------------------------------------------------------------------
19 changes: 12 additions & 7 deletions _extensions/recursive-citeproc/recursive-citeproc.lua
@@ -373,9 +373,12 @@ local function log(type, text)
local level = {INFO = 0, WARNING = 1, ERROR = 2}
if level[type] == nil then type = 'ERROR' end
if level[PANDOC_STATE.verbosity] <= level[type] then
io.stderr:write(
'[' .. type .. '] '..FILTER_NAME..': '.. text .. '\n'
)
local message = '[' .. type .. '] '..FILTER_NAME..': '.. text .. '\n'
if quarto then
quarto.log.output(message)
else
io.stderr:write(message)
end
end
end

@@ -402,7 +405,7 @@ bibliographies in Pandoc and Quarto
@author Julien Dutant <[email protected]>
@copyright 2021-2024 Julien Dutant
@license MIT - see LICENSE file for details.
@release 1.2.1
@release 2.0.0
]]

local log = require('log')
@@ -414,8 +417,8 @@ local stringify = pandoc.utils.stringify

-- Pandoc 2.17 for relying on `elem:walk()`, `pandoc.Inlines`, pandoc.utils.type
PANDOC_VERSION:must_be_at_least '2.17'
-- Limit recursion depth
DEFAULT_MAX_DEPTH = 100
-- Limit recursion depth; 10 should do and avoid the appearance of freezing
DEFAULT_MAX_DEPTH = 10
-- Error messages
ERROR_MESSAGES = {
REFS_FOUND = 'I found a Div block with identifier `refs`. This probably means'
@@ -581,7 +584,9 @@ local function recursiveCiteproc(doc)
doc = runCiteproc(doc)

-- Typeset citations in the bibliography
return typesetCitationsInRefs(doc)
doc = typesetCitationsInRefs(doc)

return doc

end

2 changes: 1 addition & 1 deletion _quarto.yml
@@ -6,9 +6,9 @@ project:

bibliography: example/references.bib
csl: example/chicago-author-date-with-note.csl
citeproc: false

filters:
- quarto
- recursive-citeproc

profile: