-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(obsolete) feat: add bibliography support #200
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First thanks for putting in the effort of setting up this infra structure and integrating it with doc-gen. It does produce good results!
That being said, there are a couple of issues (apart from the Python) that need to be addressed before this can be merged. The most pressing one being, that you are effectively doing a state monad by hand. This complicates a lot of your implementation unnecessarily. If you track the state in `HtmlM, you should be able to reduce the complexity of your implementation a lot.
Besides this and other issues/questions I've marked I would again like to bring up the question: Why should we merge this with Python as is? I can only see two reasons for allowing another language into this project:
- It is currently not feasible to implement this properly otherwise. For example doing some CSS/JS is absolutely necessary. With C projects like MD4Lean it is certainly doable but not short time feasible to implement a parser and HTML generator + it relies on a C toolchain which Lean is already integrated with as of today.
- The feature is so high priority that we need to have it right now and can thus throw this principle overboard.
I do not see 1. holding here, given the fact that people have already started work on a BibTex parser. And I do also not see 2. holding here, at least I personally don't see us loosing users due to the fact that the documentation tool is not able to provide proper linkification of bibliography.
Do you have compelling arguments for why either 1 or 2 hold or a potential third reason for why we should pull in the additional dependency on Python as is?
README.md
Outdated
@@ -36,6 +36,8 @@ need to serve them from a proper http server for it to work. An easy way to do t | |||
In order to compile itself `doc-gen4` requires: | |||
- a Lean 4 or `elan` installation | |||
- a C compiler if on Linux or MacOS (on Windows it will use Lean's built-in clang compiler) | |||
- for `references.bib` parsing, an installation of `pybtex` (https://pybtex.org/) is needed, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should a) give clear instruction on what to install and b) the instructions should force the user to install one particular version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a) give clear instruction on what to install
Install the package pybtex
if you're using Linux and the package manager have this; or install python/pip and run pip install pybtex
. Is this OK?
b) the instructions should force the user to install one particular version.
I'm not sure about this, perhaps a version which is not too old should work, since we are only calling command line tools, not calling libraries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go through with the python approach (which I'm still not seeing) I would prefer to have a clear:
pip install pybtex==x.y.z
with a clear version that we have tested and can guarantee works. Doing just pip install
or using a package manager has the potential of causing behavior on other people's machines that we do not know about due to version differences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the README, now pybtex
is optional, as the built-in bibtex parser is work in progress. I hope this is OK.
(basePath / "output.tmp").toString | ||
] | ||
} | ||
-- parse the returned XML file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to write straight to stdout here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me study if it supports stdin/stdout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, the command line tool does not support reading from stdin and writing to stdout.
Maybe the python compromise here is to have doc-gen consume the json file, and let mathlib convert its bib file to json (via python) during the build? |
I think this is also OK if current mathlib build process already uses python. I can change the build script to recognize both |
This just amounts to changing the feature from "use python and it's integrated with doc-gen" to "use python and it's less integrated with doc-gen". That's not acceptable, the solution needs to be properly integrated with doc-gen and should either use only Lean or if someone has a compelling argument for why it has to be |
No, the user is not needed to use python, if they can generate /-- The structure representing a processed bibitem. -/
structure BibItem where
/-- The cite key as in the bib file. -/
citekey : String
/-- The tag generated by bib processor, e.g. `[Doe12]`.
Should be plain text and should not be escaped. -/
tag : String
/-- The HTML generated by bib processor, e.g.
`John Doe. <i>Test</i>. 2012.`-/
html : String
/-- The plain text form of `html` field. Should not be escaped. -/
plaintext : String
deriving FromJson, ToJson |
Yes that is the precise issue, the whole feature will be less integrated. There will be no trivial way to generate the |
Users could write the json file by hand if wanted. Agreed that's less convenient than doing it in Either way, I think there's a lot of good stuff in this PR that can be cleaned up and reused whichever way this python thing goes, so maybe discussion here should focus on the other technical aspects and leave the Python discussion to an issue / zulip? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I think the monadic part is almost there now, already looking much cleaner than in the first attempt!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is looking very promising now. After these couple of nitpicks the only thing that's left is to replace pybtex with the Lean Bibtex parser. Note that I did not take a look at Pybtex.lean
for that reason.
After the bibtex parser has removed the batteries dependency we can just pull it in here as a dependency already and implement the feature, then merge this PR. Once the bibtex parser gets strong enough to understand mathlib's references.bib
we should be able to simply upgrade it and mathlib gets the feature.
I should say that we need a bibtex renderer, not only parser. There are several things need to be done, for example (1) generate abbreviations from author names and year, which is not trivial (how to generate abbreviations of last names? there are a lot of corner cases); (2) generate the output from bibitem, again there are a lot of corner cases, for example if there are no authors, but only editors, then use editors instead, etc. etc... |
Oops. What's wrong with the build script? I think it has not changed since the last successful build. |
Yes but batteries changed, I fixed this on main. If you rebase this branch on top of it it should work again. |
You merged the branch, please do a rebase instead, I prefer to keep a linear history in doc-gen if possible (which it is here) |
Sorry for that. I think it's better for me to re-open a new PR when all of this are done; now some of the codes are still WIP. |
# Conflicts: # DocGen4/Output.lean # Main.lean # lakefile.lean
Just to make this clear: I will not merge code that uses |
This is not to say that this PR has generally no chance of being merged, as previously explained: All of the infrastructure that you set up here is good after the rounds of review now. However we have the capability to do BibTex things ourselves with |
I'll re-open a PR without Python alternatives when the |
I think I'll close this PR. The branch will be kept as a historical reference. |
This PR adds bibliography support to doc-gen4. It includes:
DocGen4/Output/References.lean
which contains generic function support for bibliography. It is independent of the actual implementation of bib file formatting.DocGen4/Output/Pybtex.lean
which contains an implementation of bib file formatting by calling external Python programpybtex
. In the future this can be replaced by a pure Lean implementation.DocGen4/Output/DocString.lean
which analyzes citations in markdown text, inserting and modifying citation links, and collect back references.MonadState
andmodify
.bibPrepass
and change the build file accordingly.To use it in mathlib, the
references.bib
in mathlib needs to be changed slightly, as it contains something unrecognized bypybtex
. A PR to mathlib will be open later.This PR has all commit history squashed. To see the full history, see the branch https://github.com/acmepjz/doc-gen4/tree/test2.