[Discussion] Source files and processing references #51

Open
dalonsoa opened this issue Sep 2, 2020 · 8 comments

Comments

@dalonsoa
Collaborator

dalonsoa commented Sep 2, 2020

The main use case of R2T2 - at least in my mind - is to annotate libraries and then retrieve the references used when running a script that uses those libraries.

Under these circumstances, two comments come to mind:

  1. It is important to note that each library should have its own references source file - at least one - and therefore the user cannot specify it when running R2T2. If (let's dream) numpy, scipy and scikit-learn adopt R2T2 and I run a script using them, then when the used references are processed, each one should be looked up in the source file of the corresponding library.

So, for this to work, we have to enable a way of adding references source files to the BIBLIOGRAPHY object, something like BIBLIOGRAPHY.add_source(path_to_source) in the __init__.py of the library. Then we can probably use the inspect module to figure out which library is adding the source.

  2. As @ChasNelson1990 has pointed out in [WIP] Process bibtex #32, loading the bibtex file each time a bibtex key needs to be processed is expensive and makes no sense. So, whenever a reference is processed, apart from looking for the full reference in the correct source, the loaded source should be cached, so that processing further references for the same library does not incur extra I/O operations.
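The caching idea in point 2 could be sketched roughly as below. This is only an illustration, not R2T2 code: `SourceCache`, `fake_loader` and `load_count` are hypothetical names, and the loader stands in for whatever BibTeX parser ends up being used.

```python
from pathlib import Path
from typing import Callable, Dict


class SourceCache:
    """Load each references source at most once and reuse the parsed result."""

    def __init__(self, loader: Callable[[Path], Dict[str, str]]):
        self._loader = loader  # hypothetical parser: path -> {key: reference}
        self._loaded: Dict[Path, Dict[str, str]] = {}
        self.load_count = 0  # only here to make the caching visible

    def get(self, source: Path, key: str) -> str:
        source = Path(source)
        if source not in self._loaded:
            # First request for this source: pay the I/O cost once.
            self._loaded[source] = self._loader(source)
            self.load_count += 1
        return self._loaded[source][key]


def fake_loader(path: Path) -> Dict[str, str]:
    # Stand-in for a real BibTeX parser, so the sketch has no dependencies.
    return {"smith2020": f"Smith (2020), from {path.name}"}


cache = SourceCache(fake_loader)
first = cache.get(Path("lib_a/refs.bib"), "smith2020")
second = cache.get(Path("lib_a/refs.bib"), "smith2020")
```

The second `get` call is served from memory, so the source is parsed only once per library regardless of how many keys are processed.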

In summary:

  • What do you think of point 1? Does it make sense?
  • I believe point 2 and @ChasNelson1990's comments in his PR make Process reference #30 more or less obsolete, and we have to come up with a better plan. Any suggestions - in addition to the options already suggested by Chas?

@ChasNelson1990

Comments on 1:

  • I agree, but I don't think we should force a name or place. In my experience, the more you enforce things, the less people will use them.
  • Question: is there anything about loading a subpackage, e.g. scikit-image.io, that would break the run-time trackers if we did have a set file name and a set location? I.e. the bib file would be in skimage's root but the package we're loading doesn't have one. Can't quite think why this would be a problem... but it just popped into my head.
  • Could we define the bibfile location in pyproject.toml or similar? In fact, should that be the place where all arguments are given, rather than as arguments to the CLI? This would bring things in line with things like pytest options, etc.

@dalonsoa
Collaborator Author

dalonsoa commented Sep 3, 2020

  • Well, all packages have an __init__.py file at the root directory, so it does not seem a lot to ask. About the name, the references source can be anywhere (as long as it is distributed with the library).
  • To be honest, I've no idea.
  • I'm afraid not. When we install a package, let's say numpy with pip, none of the pyproject.toml and similar files come with the package, so we cannot rely on any of them.
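As a rough illustration of "distributed with the library": a file that ships inside an installed package can be located relative to the package itself, which is not possible for pyproject.toml. The helper name `path_in_package` is made up for this sketch, and `json` is used only because it is always installed.

```python
import json  # any installed package works; json is only a stand-in here
from pathlib import Path
from types import ModuleType


def path_in_package(package: ModuleType, filename: str) -> Path:
    """Return the path of a file that ships inside an installed package."""
    # For a regular package, __file__ points at its __init__.py.
    return Path(package.__file__).resolve().parent / filename


decoder_path = path_in_package(json, "decoder.py")
```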

@ChasNelson1990

I thought new-style modules didn't require __init__.py files anymore?

Also, there's a difference between Python enforcing a language standard and us... Personally, I just think that enforcing things never does well... however! maybe I'm just being pessimistic and we should do it and see if anybody complains.

Fair point that the toml doesn't come down when we install!

@ChasNelson1990

*although I can't find any evidence for my first point right now...

@dalonsoa
Collaborator Author

dalonsoa commented Sep 3, 2020

Ok, let's ignore where to put it. Do we agree we need a way for each package to indicate where its reference source is? Does something like BIBLIOGRAPHY.add_source(path_to_source), called somewhere within the library code, look sensible?
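A minimal sketch of what such a call might look like, assuming the caller's top-level package can be read from the calling frame with the standard inspect module. The `Bibliography` class here is a toy stand-in, not the real R2T2 object, and the optional `package` argument is an assumption added for illustration.

```python
import inspect
from typing import Dict, List, Optional


class Bibliography:
    """Toy stand-in for R2T2's BIBLIOGRAPHY object (not the real class)."""

    def __init__(self) -> None:
        self.sources: Dict[str, List[str]] = {}

    def add_source(self, path: str, package: Optional[str] = None) -> str:
        if package is None:
            # Look one frame up to find the module that made the call.
            frame = inspect.currentframe()
            caller = frame.f_back if frame is not None else None
            module_name = caller.f_globals.get("__name__", "") if caller else ""
            # Keep only the top-level package, e.g. "fakelib" for "fakelib.io".
            package = module_name.split(".")[0] or "<unknown>"
        self.sources.setdefault(package, []).append(path)
        return package


BIBLIOGRAPHY = Bibliography()

# Simulate a call made from inside a module named "fakelib.io".
fake_module_globals = {"__name__": "fakelib.io", "BIBLIOGRAPHY": BIBLIOGRAPHY}
exec("BIBLIOGRAPHY.add_source('references.bib')", fake_module_globals)
```

Because the source is recorded under the top-level package name, a call made from a subpackage still ends up attached to the library as a whole.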

@ChasNelson1990

I'd be happy with that. It was my original plan when I started yesterday, but I decided to use CLI parameters to keep things in line with what was already being used. :-)

@dalonsoa
Collaborator Author

dalonsoa commented Sep 3, 2020

Yep, I guess things evolve with time and the complexity of the code. The CLI is still needed for users to define in what output format they want the references list, but the input of those references is up to the package using R2T2.

@ChasNelson1990

Yea, that makes sense.
