-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Request DOI metadata as CSL-JSON after BibTeX #1073
Conversation
62476ac
to
23adc24
Compare
I don't have any data on which format is going to be better supported by cross-ref. I am pretty sure that bibtex has been around longer? My hesitation on bringing this on in the current state (or my understanding of it at least) is that it is really hard to know what our success rate is for the DOI parsing without this in the wild - I would rather an approach that improves/iterates the existing bibtex implementation rather than fully changes the implementation to using csl. What about keeping the existing bibtex default and adding CSL content negotiation/parsing as a fallback if the bibtex fails to parse or is unavailable from doi.org? That should catch the case referenced and be a bit safer to roll out. |
Good point! I'll rebase on main with the recommended changes. |
23adc24
to
75b8a6a
Compare
} | ||
|
||
export async function parseCSLJSON(source: object[]): Promise<CSL[]> { | ||
return Promise.resolve(cleanCSL(source)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't validate the CSL, but does try to "clean" it. Should we also use Cite.async
here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So - in parseBibTeX
, Cite.async
gives us bibtex
->CSL
, but here, it would just give us CSL
-> CSL
so maybe unnecessary and just "cleaning" is all that's required?
If all that is true, then I guess the only reason to use Cite.async
is if doi.org gives an invalid CSL response...? Not totally out of the question I suppose... but then Cite.async
is another mostly unnecessary API request?
Any other nuance to this decision?
return Promise.resolve(cleanCSL(source)); | ||
} | ||
|
||
export async function getCitationRenderers(data: CSL[]): Promise<CitationRenderer> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this function creates renderers for CSL (i.e. offloads parsing)
{ | ||
'container-title': 'Monthly Weather Review', | ||
author: [ | ||
{ given: 'C. H. B.', family: 'PRIESTLEY' }, | ||
{ given: 'R. J.', family: 'TAYLOR' }, | ||
], | ||
DOI: '10.1175/1520-0493(1972)100<0081:otaosh>2.3.co;2', | ||
type: 'article-journal', | ||
issue: '2', | ||
issued: { 'date-parts': [[1972, 2]] }, | ||
page: '81-92', | ||
publisher: 'American Meteorological Society', | ||
title: 'On the Assessment of Surface Heat Flux and Evaporation Using Large-Scale Parameters', | ||
volume: '100', | ||
}, | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simplified this to the subset of equal-fields common to both resolvers. I think that's OK; things like the URL and ISBN slightly differ, but I think that's a consequence of the different metadata that's registered.
Works well in web mode, cache seems good and doesn't conflict. From my testing there is a current bug in writing out the bibtex: I think this could/should be isolated into a test around falling back to CSL and then writing bibtex for the DOI of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great - keeping bibtex as the default but adding a CSL fallback, explicitly making CSL the internal representation.
I don't have any big concerns with the changes to exported functions - there are a couple dependent packages that will need to update, but giving citation-js-utils
a major bump makes that pretty clear.
I do think we need to fix the writeBibtexFromCitationRenderers
functionality before releasing this - Probably for now just a simple function we can call if the bibtex entry is not found buried in the cite._graph
- just take the CSL and return a bibtex string with title/author/year/journal/doi filled in.
} | ||
|
||
export async function parseCSLJSON(source: object[]): Promise<CSL[]> { | ||
return Promise.resolve(cleanCSL(source)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So - in parseBibTeX
, Cite.async
gives us bibtex
->CSL
, but here, it would just give us CSL
-> CSL
so maybe unnecessary and just "cleaning" is all that's required?
If all that is true, then I guess the only reason to use Cite.async
is if doi.org gives an invalid CSL response...? Not totally out of the question I suppose... but then Cite.async
is another mostly unnecessary API request?
Any other nuance to this decision?
6f0751d
to
68538b8
Compare
Good spot on the BibTeX extraction; I'd assumed that we were using citation-js for this and hadn't noticed that instead we were pulling from the graph. This PR now uses citation-js to build a BibTeX output. I'm assuming that we don't need to do anything to tweak the citation key, as it should match the In general, I think this PR represents a hardening of our BibTeX handling in addition to now providing a fallback for bad-BibTeX responses; we don't rely on implementation details from citation-js to store the original BibTeX, for example. There may be some non-round-tripping of certain BibTeX entries, but I think that should be very uncommon; BibTeX is a core plugin for citation-js. |
Yes, even the BibTeX method doesn't need to be async. I suppose I was anticipating that we might want to support async resolvers in future. I'll move them to sync for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the bibtex writing - using citation-js
for that is much better. 🙂
I noticed a couple more things when testing today:
First, I don't think this warning needs to be raised:
I don't think the user will care about implementation details of the underlying citation data format?
Also, totally non-existent citations were still causing a crash:
This is related to return undefined
vs throwing an error
I went ahead and fixed these up in a commit: 559cbef
Hopefully that all looks ok? Otherwise, I'm happy with everything.
Co-authored-by: Franklin Koch <[email protected]>
Co-authored-by: Rowan Cockett <[email protected]>
dd9db8e
to
cec2d31
Compare
One more bug I found on
|
The bug in #1072 appears to be caused by the key passed to
citation-js
:This is not valid BibTeX, and therefore does not parse. One solution is to request CSL-JSON via content negotiation. This trades one class of bugs (invalid BibTeX markup) for another (invalid CSL structures). I think this is a better part of the problem space to find oneself in; we can more easily attempt to fix-up bad CSL than we can bad BibTeX.
Indeed, it looks like using CSL-JSON from content-negotiation isn't foolproof either; some DOIs like The Higgs Boson Paper report a
name
instead of aliteral
, or define a type ofjournal-article
instead ofarticle-journal
. It looks like crossref is returning "extended citeproc": CrossRef/rest-api-doc#580 rather than true CSL-JSON.Knowing that, we can probably do a reasonable job of coercing the result to the CSL-JSON that citation-js expects, but for now this PR does the minimum from testing the original DOI and the Higg's DOI.
I'm hoping that this change is more robust than the BibTeX approach, but we will need to try more DOIs than I've thrown at it.
Fixes #1072