
[WIP] Process bibtex #32

Draft
wants to merge 34 commits into base: process_reference

Conversation

ChasNelson1990

I've started to add the ability to use bibtex_keys and process them as part of @dalonsoa's doi & process_reference changes.

Currently this is not fully merged with @dalonsoa's ideas - I'm still trying to work out the best way to do this, because his approach would, I think, mean parsing my .bib file on every call of the processor classes (since the class is reinitialised inside process_reference()), which I want to avoid.

Any suggestions/comments/commits welcome.

@ChasNelson1990
Author

Thinking about this a bit more... I think @dalonsoa's DOI processing thoughts and my bibtex processing thoughts might both need rethinking to be compatible. Here is my understanding of the competing constraints:

  1. when working with DOIs, each new DOI requires a request to, for example, a crossref endpoint (e.g. the bibtex endpoint); this means we call our processor object once for each entry in a BIBLIOGRAPHY
  2. thus, we should cache these individual responses so we only ever make one call to the internet per DOI (currently this is done with @lru_cache)
  3. when working with bibtex, we only want to parse the .bib file once (and thus do not need to cache individual responses)
  4. the obvious way to do this is to store the parsed bib inside the processor object; however, that means we should initialise the object once for a whole BIBLIOGRAPHY
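For point 2, a minimal sketch of the @lru_cache behaviour (the real function would make a network request to the crossref bibtex endpoint; here a call counter stands in for it, just to show that repeated DOIs are served from the cache):

```python
from functools import lru_cache

calls = {"n": 0}

@lru_cache(maxsize=None)
def fetch_doi(doi: str) -> str:
    # Stand-in for the real network request to a crossref-style endpoint;
    # the counter demonstrates that repeat calls never reach the "network".
    calls["n"] += 1
    return f"@misc{{{doi}, note = {{fetched record}}}}"

fetch_doi("10.1000/x")
fetch_doi("10.1000/x")  # cache hit: the body does not run again
assert calls["n"] == 1
```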

Possible solution: instead of using lru_cache, we store each DOI response in a self.cache dictionary within the processor object (check there first, call crossref only if it's not there); then, for the bibtex case, we just store the parsed bibtex in self.cache instead?
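A rough sketch of that idea, with the network request stubbed out and all class/method names being assumptions rather than the actual r2t2 API:

```python
class DOIProcessor:
    """Per-instance cache replacing @lru_cache: check the dict first,
    only hit crossref (stubbed out below) on a miss."""

    def __init__(self):
        self.cache = {}
        self.network_calls = 0  # instrumentation only, to show caching works

    def _fetch(self, doi):
        self.network_calls += 1  # stand-in for the real crossref request
        return f"@misc{{{doi}}}"

    def process(self, doi):
        if doi not in self.cache:                  # check the cache first
            self.cache[doi] = self._fetch(doi)     # call crossref only on a miss
        return self.cache[doi]


class BibtexProcessor:
    """For bibtex, self.cache simply holds the whole .bib file,
    parsed once at initialisation."""

    def __init__(self, parsed_bib):
        self.cache = dict(parsed_bib)  # {bibtex_key: entry}

    def process(self, key):
        return self.cache[key]
```

Both classes then share the same process() interface, so the caller doesn't care which caching strategy is in play.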

@ChasNelson1990
Author

Then, thinking about the big picture: one would just call PROCESSOR[args.processor](output_format=args.something) once in __main__.py (before the writer), and the class would initialise, run through all BIBLIOGRAPHY.values(), and replace '[doi]something' or '[bibtex]something' with 'My output_formatted string'?
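The single-sweep idea could look roughly like this; the data shape ({checkpoint: [reference strings]}) and the Formatted(...) output are illustrative assumptions, not the real r2t2 data model:

```python
def process_bibliography(bibliography, output_format="plain"):
    """One pass over the whole BIBLIOGRAPHY, replacing tagged references
    with their formatted strings. Shapes and names are assumptions."""

    def fmt(ref):
        if ref.startswith("[doi]"):
            return f"Formatted({ref[5:]})"   # stand-in for a crossref lookup
        if ref.startswith("[bibtex]"):
            return f"Formatted({ref[8:]})"   # stand-in for a .bib lookup
        return ref                           # untagged references pass through

    return {k: [fmt(r) for r in refs] for k, refs in bibliography.items()}
```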

@ChasNelson1990
Author

Of course, the other way to solve this is not to use a ProcessorBase at all but rather to mirror what has been done in writers.py and just have a defined function for each registered processor.
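The writers.py-style alternative would be a decorator-based registry of plain functions, something like the following (registry name, decorator, and signatures are all assumptions about what mirroring writers.py would look like):

```python
# Registry of processor functions keyed by name, mirroring a
# register-by-decorator pattern rather than a ProcessorBase class.
PROCESSORS = {}

def register_processor(name):
    def decorator(func):
        PROCESSORS[name] = func
        return func
    return decorator

@register_processor("doi")
def process_doi(reference, cache={}):
    # The mutable default acts as a module-level cache shared across calls,
    # replacing the per-instance self.cache of the class-based design.
    if reference not in cache:
        cache[reference] = f"resolved:{reference}"  # stand-in for crossref
    return cache[reference]
```

The trade-off is that shared state (the cache, or a parsed .bib) has to live at module level or in a closure instead of on an object.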

@ChasNelson1990
Author

Also, I realise I have currently set up the args in __main__.py to assume that we only want a single processor - which is of course not always going to be the case. I'll try to fix that now.
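One way to allow multiple processors is argparse's action="append", so the flag can be repeated on the command line (the --processor flag name is an assumption about what __main__.py uses):

```python
import argparse

parser = argparse.ArgumentParser()
# action="append" collects repeated flags into a list; with default=None,
# argparse starts a fresh list on the first occurrence.
parser.add_argument("--processor", action="append", default=None,
                    help="processor to run; may be given multiple times")

args = parser.parse_args(["--processor", "doi", "--processor", "bibtex"])
assert args.processor == ["doi", "bibtex"]
```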

@codecov

codecov bot commented Sep 2, 2020

Codecov Report

Merging #32 into process_reference will increase coverage by 20.73%.
The diff coverage is 23.61%.


@@                  Coverage Diff                   @@
##           process_reference      #32       +/-   ##
======================================================
+ Coverage              29.26%   50.00%   +20.73%     
======================================================
  Files                      6        6               
  Lines                    164      224       +60     
======================================================
+ Hits                      48      112       +64     
+ Misses                   116      112        -4     
Impacted Files Coverage Δ
r2t2/__main__.py 0.00% <0.00%> (ø)
r2t2/reference_processors.py 0.00% <0.00%> (ø)
r2t2/core.py 96.42% <88.88%> (-1.45%) ⬇️
r2t2/static_parser.py 100.00% <100.00%> (+100.00%) ⬆️
r2t2/__init__.py
r2t2/writers.py 55.55% <0.00%> (+55.55%) ⬆️


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b4620ed...4b8d3c6. Read the comment docs.
