Pub speed #21
Open: cameronneylon wants to merge 16 commits into ananelson:develop from cameronneylon:pub-speed
…er testing required.
Need to add final tests to ensure we actually get the right answer, and then it would probably be a good idea to refactor the test cases for a cleaner approach and more readable code.
The test cases could still be cleaned up and iterated over more cleanly, but this works and gives the right answers.
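One way to iterate over test cases more cleanly is a table of cases driven through unittest's subTest, so each failing case is reported on its own. This is a minimal sketch only: days_between and the sample dates are hypothetical and not taken from the branch; it assumes the submission and acceptance dates are date objects, or the string "unknown" when parsing failed.

```python
import unittest
from datetime import date


def days_between(submitted, accepted):
    """Hypothetical helper: days from submission to acceptance, or None if either date is unknown."""
    if not isinstance(submitted, date) or not isinstance(accepted, date):
        return None
    return (accepted - submitted).days


# Each case: (description, submitted, accepted, expected number of days)
CASES = [
    ("same day", date(2013, 5, 1), date(2013, 5, 1), 0),
    ("across a month boundary", date(2013, 1, 30), date(2013, 2, 2), 3),
    ("leap year", date(2012, 2, 28), date(2012, 3, 1), 2),
    ("unknown submission date", "unknown", date(2013, 5, 1), None),
]


class PubSpeedTests(unittest.TestCase):
    def test_days_between(self):
        # subTest reports every failing case separately instead of stopping at the first one
        for description, submitted, accepted, expected in CASES:
            with self.subTest(description):
                self.assertEqual(days_between(submitted, accepted), expected)
```

Adding a case then means adding one tuple to CASES. The module can be run with `python -m unittest`.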
Modified models.Article to include submission and acceptance dates, and scrapers.pubmed to obtain those dates from the PubMed XML. Created a modified version of the Excel report to dump out the results, plus sh and yaml files in examples/pubspeed for testing purposes.
Running the loading and report generation from a Python script for convenience. Some modifications are still needed to the processing step, and it is somewhat unclear how the PubMed searching currently works.
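As a rough illustration of where those dates come from: PubMed records carry a History block of PubMedPubDate elements whose PubStatus attribute names the event, including "received" and "accepted". The sketch below, using only the standard library, pulls those two entries out as raw Year/Month/Day strings. The function name and sample record are hypothetical, and the actual scrapers.pubmed code may navigate the XML differently.

```python
import xml.etree.ElementTree as ET


def extract_history_dates(article_xml):
    """Hypothetical helper: map "received"/"accepted" to raw (year, month, day) strings.

    A part that is missing from the record comes back as None.
    """
    root = ET.fromstring(article_xml)
    dates = {}
    # Assumption: dates live in PubmedData/History/PubMedPubDate, keyed by PubStatus
    for pub_date in root.iter("PubMedPubDate"):
        status = pub_date.get("PubStatus")
        if status in ("received", "accepted"):
            dates[status] = tuple(pub_date.findtext(part) for part in ("Year", "Month", "Day"))
    return dates


SAMPLE = """<PubmedArticle>
  <PubmedData>
    <History>
      <PubMedPubDate PubStatus="received"><Year>2012</Year><Month>11</Month><Day>5</Day></PubMedPubDate>
      <PubMedPubDate PubStatus="accepted"><Year>2013</Year><Month>2</Month><Day>14</Day></PubMedPubDate>
      <PubMedPubDate PubStatus="pubmed"><Year>2013</Year><Month>3</Month><Day>1</Day></PubMedPubDate>
    </History>
  </PubmedData>
</PubmedArticle>"""

print(extract_history_dates(SAMPLE))
# {'received': ('2012', '11', '5'), 'accepted': ('2013', '2', '14')}
```

Keeping the values as raw strings at this stage leaves validation to a separate step, because not every record has both dates.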
…ug fixes to prevent problems
I've made a couple of changes here so that I can use the codebase for a different project. Alongside these, I've included fixes for a couple of problems I found.
In scrapers/pubmed.py, I made a series of changes.
The first is that PubMed contains invalid dates. I wrapped the date parsing in a try/except clause and set the date to "unknown" if parsing fails. I'm not sure whether this is the best approach, but it worked for what I needed. Line ~184
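A minimal sketch of that fallback, assuming the date arrives as separate Year/Month/Day strings; parse_date and UNKNOWN are illustrative names, not the branch's code. One difference from a bare `except:` is that this catches only TypeError (a missing component) and ValueError (non-numeric text or an impossible date such as 30 February), so unrelated bugs are not silently turned into "unknown".

```python
from datetime import date

UNKNOWN = "unknown"  # sentinel stored when a PubMed date cannot be parsed


def parse_date(year, month, day):
    """Hypothetical helper: build a date from PubMed's Year/Month/Day strings, or return UNKNOWN."""
    try:
        return date(int(year), int(month), int(day))
    except (TypeError, ValueError):
        # TypeError: a component is None (absent from the record)
        # ValueError: text such as "Feb" or "", or a day that does not exist in that month
        return UNKNOWN


print(parse_date("2013", "2", "14"))  # 2013-02-14
print(parse_date("2013", "2", "30"))  # unknown
print(parse_date("2013", None, "1"))  # unknown
```

Downstream code then has to handle a value that is either a date or a string. Using None as the sentinel instead would be an equally reasonable choice.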
I added end_date as a parameter to fetch_batch, passed end_date wherever fetch_batch is called, and added code to include it in the query parameters. I don't know whether this matters, but I was trying to solve a problem where searching PubMed by date seems to be broken. Lines 160-200.
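For context on why end_date may matter: the NCBI E-utilities ESearch endpoint restricts results by date through mindate and maxdate (format YYYY/MM/DD, with datetype selecting which date is used), and the documentation says the two must be given together. A query that sets only a start date may therefore not be date-limited at all. This is one possible explanation for the broken date search, not a confirmed diagnosis. The sketch builds such a URL without making a request; build_esearch_url is a hypothetical helper and does not reproduce fetch_batch.

```python
from datetime import date
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, start_date, end_date, datetype="pdat", retstart=0, retmax=100):
    """Hypothetical helper: an ESearch URL limited to the range start_date..end_date."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": datetype,  # e.g. "pdat" = publication date, "edat" = Entrez date
        # Both bounds are always sent, because ESearch expects mindate and maxdate together
        "mindate": start_date.strftime("%Y/%m/%d"),
        "maxdate": end_date.strftime("%Y/%m/%d"),
        "retstart": retstart,  # offset into the result list, for paging through a batch
        "retmax": retmax,  # maximum number of IDs returned per request
    }
    return ESEARCH_URL + "?" + urlencode(params)


print(build_esearch_url("plos one[journal]", date(2013, 1, 1), date(2013, 1, 31)))
```

Making end_date a required argument, rather than optional, is a simple way to ensure that every batch query carries a complete date range.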
I modified Article.create() to use Article.create_or_update_by_doi(), because batching by period was creating duplicate articles (the date search issue returns the same article in multiple periods). If an article has no DOI, it falls back to the old approach.
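The deduplication idea can be sketched with an in-memory store: key articles by a normalised DOI so that the same article returned in several periods is updated, not created again, and fall back to plain creation when there is no DOI. ArticleStore and normalise_doi are hypothetical stand-ins and do not reproduce the project's Article model. Normalisation relies on DOI names being case-insensitive, and it assumes some sources prefix them with a resolver URL or "doi:".

```python
class ArticleStore:
    """Hypothetical in-memory stand-in for the Article model, keyed by normalised DOI."""

    def __init__(self):
        self.by_doi = {}
        self.without_doi = []

    @staticmethod
    def normalise_doi(doi):
        # DOI names are case-insensitive; strip a few common prefixes as well
        doi = doi.strip().lower()
        for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
            if doi.startswith(prefix):
                doi = doi[len(prefix):]
        return doi

    def create(self, doi=None, **fields):
        if not doi:
            # No DOI: old behaviour, so duplicates across periods are still possible
            article = dict(fields)
            self.without_doi.append(article)
            return article
        key = self.normalise_doi(doi)
        article = self.by_doi.setdefault(key, {"doi": key})
        article.update(fields)  # a later batch updates the existing record
        return article


store = ArticleStore()
first = store.create(doi="10.1371/journal.pone.0000001", title="From period 1")
again = store.create(doi="https://doi.org/10.1371/JOURNAL.PONE.0000001", title="From period 2")
print(first is again, len(store.by_doi), first["title"])  # True 1 From period 2
```

In a database-backed model, the same effect would normally come from a lookup on a unique DOI column before inserting.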