
Pub speed #21

Open

wants to merge 16 commits into develop
Conversation

cameronneylon
Copy link
Collaborator

I've made a few changes here to enable me to use the codebase for a different project. Alongside this I've introduced a couple of fixes for problems I found.

  1. There are a bunch of new files in the examples folder. These aren't strictly necessary; it's just where I've been working, but they might be useful as examples.
  2. models.py - New fields added to the Article model to enable tracking of submit and accept dates based on PubMed information.
  3. pubspeed-excel report added for the external project.
  4. An OpenAIRE scraper and relevant tests (this was earlier work but should probably be incorporated).
  5. scrapers/pubmed.py - Various additions (see below).

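The submit and accept dates come from the `<History>` block of PubMed's efetch XML, where `PubMedPubDate` elements carry a `PubStatus` of `received` or `accepted`. A minimal stdlib sketch of that extraction (the `history_dates` helper is hypothetical, not the actual scraper code):

```python
import xml.etree.ElementTree as ET

SAMPLE = """<PubmedArticle><PubmedData><History>
  <PubMedPubDate PubStatus="received">
    <Year>2013</Year><Month>4</Month><Day>2</Day>
  </PubMedPubDate>
  <PubMedPubDate PubStatus="accepted">
    <Year>2013</Year><Month>9</Month><Day>18</Day>
  </PubMedPubDate>
</History></PubmedData></PubmedArticle>"""

def history_dates(xml_text):
    # Pull the received/accepted dates out of the PubMed <History> block;
    # these would feed the new submit/accept fields on the Article model.
    root = ET.fromstring(xml_text)
    out = {}
    for d in root.iter("PubMedPubDate"):
        status = d.get("PubStatus")
        if status in ("received", "accepted"):
            out[status] = "%s-%s-%s" % (
                d.findtext("Year"), d.findtext("Month"), d.findtext("Day"))
    return out

print(history_dates(SAMPLE))  # {'received': '2013-4-2', 'accepted': '2013-9-18'}
```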
In scrapers/pubmed.py I made a series of changes.

The first is that PubMed contains invalid dates. I wrapped date parsing in a try/except clause and set the date to unknown if parsing fails. I'm not sure whether this is the best approach, but it worked for what I needed. Line ~184.

I added end_date as a parameter for fetch_batch, included end_date wherever fetch_batch is called, and added code to incorporate it in the search parameters. I don't know whether this matters or not, but I was trying to work around a problem where date searching in PubMed seems to be broken. Lines 160-200.

Modified the Article.create() calls to use Article.create_or_update_by_doi(), because the period batching was duplicating article creation (the date-search issue returns the same article in multiple periods). If there is no DOI it falls back to the old approach.

Need to add final tests to ensure we actually get the right answer; then it would probably be a good idea to refactor the test cases for a cleaner approach and more readable code.

Could still do with cleaning up the test cases and iterating over them in a cleaner fashion, but this works and is giving the right answers.

Modified models.Article to include submit and accept dates, and scrapers.pubmed to obtain those dates from the PubMed XML. Created a modified version of the excel report to dump out results, plus sh and yaml files in examples/pubspeed for testing purposes. Running the loading and report generation from a Python script for convenience. Some modifications are still needed to the processing step, and it's somewhat unclear how the PubMed searching is currently working.