Various updates
RikDTJanssen committed Mar 10, 2023
1 parent 6f1cff3 commit 0f0a78c
Showing 2 changed files with 44 additions and 2 deletions.
12 changes: 10 additions & 2 deletions docs/ricgraph_future_work.md
@@ -1,15 +1,23 @@
## Future work

* Create an end user web interface. This interface should allow
easy and faceted browsing of Ricgraph.
You can use *Ricgraph explorer*, but it is currently very basic.
* Modify Ricgraph to allow the use of another, preferably open source, graph database engine.
This should be possible by changing small parts of the code in the file *ricgraph.py*.
* Turn Ricgraph into a web service.
* Write harvesting scripts to get information from e.g. [Zenodo](https://zenodo.org),
[ORCID](https://orcid.org), ~~[OpenAlex](https://openalex.org)~~,
[Scopus](https://www.scopus.com), [Lens](https://www.lens.org),
[OpenAIRE](https://explore.openaire.eu),
[DataCite Commons](https://commons.datacite.org),
[GitHub](https://github.com) (and other Git hosting platforms), etc.
* Function `merge_two_personroot_nodes()` in *ricgraph.py* currently uses `_graph.delete()`
from *py2neo*, but that call has the side effect of removing nodes that have more than one edge,
e.g. the organization nodes in *harvest_uustaffpages_to_ricgraph.py*
(after the call to `rcg.merge_personroots_of_two_nodes()`
and then `merge_two_personroot_nodes()`
there is only one organization node left).
It should use `_graph.separate()` instead, but the author has not got that working yet.
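
The py2neo documentation describes `delete()` as removing the nodes and relationships of a subgraph, and `separate()` as removing only its relationships. The sketch below is a pure Python toy model, not py2neo and not Ricgraph code, and it does not reproduce the problem described above. It only illustrates, under that assumption, how a node shared with another part of the graph can disappear when it belongs to the deleted subgraph. The class `ToyGraph` and the node names are hypothetical.

```python
class ToyGraph:
    """Tiny in-memory stand-in for a graph database (hypothetical, not py2neo)."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()  # each edge is a frozenset of two node names

    def add_edge(self, a, b):
        self.nodes.update((a, b))
        self.edges.add(frozenset((a, b)))

    def separate(self, edges):
        """Remove only the given edges; all nodes stay in the graph."""
        self.edges -= {frozenset(e) for e in edges}

    def delete(self, edges):
        """Remove the given edges and every node they touch."""
        doomed = set()
        for e in edges:
            doomed.update(e)
        self.separate(edges)
        self.nodes -= doomed
        # Simplification: also drop any other edge that would now dangle.
        self.edges = {e for e in self.edges if not (e & doomed)}


def example_graph():
    """Two person-root nodes that share one organization node."""
    g = ToyGraph()
    g.add_edge("person-root-A", "organization")
    g.add_edge("person-root-B", "organization")
    return g


after_delete = example_graph()
after_delete.delete([("person-root-B", "organization")])

after_separate = example_graph()
after_separate.separate([("person-root-B", "organization")])
```

In this toy model the organization node is gone after `delete()`, together with its link to `person-root-A`, while after `separate()` only the one edge has been removed.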

[Return to main README.md file](../README.md).
34 changes: 34 additions & 0 deletions docs/ricgraph_programming_examples.md
@@ -35,6 +35,40 @@ E.g., for research outputs you can adjust
the years to harvest with the parameter *PURE_RESOUT_YEARS* and the maximum number of
records to harvest with *PURE_RESOUT_MAX_RECS_TO_HARVEST*.

### Harvest of Utrecht University staff pages

There is also a script for harvesting
the [Utrecht University staff pages](https://www.uu.nl/medewerkers),
*harvest_uustaffpages_to_ricgraph.py*.
This script needs the parameter *uustaff_url* to be set in the
[Ricgraph initialization file](ricgraph_install_configure.md#ricgraph-initialization-file).

### Harvest of OpenAlex

There is also a script for harvesting
[OpenAlex](https://openalex.org), *harvest_openalex_to_ricgraph.py*.
It harvests OpenAlex Works, and by harvesting these
Works, it also harvests OpenAlex Authors.
This script needs the parameters *organization_name*, *organization_ror*
and *openalex_polite_pool_email* to be set in the
[Ricgraph initialization file](ricgraph_install_configure.md#ricgraph-initialization-file).
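
As a minimal sketch of checking these parameters before a harvest, the snippet below parses an INI-style text with Python's `configparser` and fails early if a value is missing. The section names `Organization` and `OpenAlex_harvesting`, the sample values, and the helper `read_required()` are assumptions made for illustration; consult the initialization file documentation for the actual layout.

```python
import configparser

# Hypothetical sample; section names and values are assumptions.
SAMPLE_INI = """
[Organization]
organization_name = UU
organization_ror = https://ror.org/00example

[OpenAlex_harvesting]
openalex_polite_pool_email = someone@example.org
"""


def read_required(parser, section, key):
    """Return a non-empty config value or raise a descriptive error."""
    value = parser.get(section, key, fallback="").strip()
    if not value:
        raise ValueError(f"Parameter '{key}' in section [{section}] is not set")
    return value


parser = configparser.ConfigParser()
parser.read_string(SAMPLE_INI)

organization_name = read_required(parser, "Organization", "organization_name")
organization_ror = read_required(parser, "Organization", "organization_ror")
polite_email = read_required(parser, "OpenAlex_harvesting",
                             "openalex_polite_pool_email")
```

Failing before the first request is made avoids discovering a missing parameter halfway through a long harvest.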

OpenAlex contains a lot of data, so your harvest may take a long time. You can
shorten it by adjusting the parameters *OPENALEX_RESOUT_YEARS* and
*OPENALEX_MAX_RECS_TO_HARVEST* at the start of the script, in the section
"Parameters for harvesting persons and research outputs from OpenAlex".
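
To give a feel for how a year list and a record limit bound the amount of work, here is a rough sketch that builds OpenAlex Works request URLs per year and estimates the number of pages to fetch. It is not taken from *harvest_openalex_to_ricgraph.py*: the parameter values, the ROR placeholder, the helper functions, and the reading of the record limit as a per-year cap are all assumptions. The query parameters `filter`, `per-page`, `cursor` and `mailto` are those of the public OpenAlex API. No network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical values; the real script defines its own.
OPENALEX_RESOUT_YEARS = ["2022", "2023"]
OPENALEX_MAX_RECS_TO_HARVEST = 500


def build_works_url(ror, year, email, per_page=200, cursor="*"):
    """Build a URL for Works of one institution in one publication year."""
    params = {
        "filter": f"institutions.ror:{ror},publication_year:{year}",
        "per-page": per_page,   # OpenAlex allows at most 200 per page
        "cursor": cursor,       # "*" starts cursor paging
        "mailto": email,        # identifies the caller for the polite pool
    }
    return "https://api.openalex.org/works?" + urlencode(params)


def pages_needed(max_recs, per_page=200):
    """Number of pages required to fetch max_recs records (ceiling division)."""
    return -(-max_recs // per_page)


urls = [
    build_works_url("https://ror.org/00example", year, "someone@example.org")
    for year in OPENALEX_RESOUT_YEARS
]
first_query = parse_qs(urlsplit(urls[0]).query)
total_pages = len(urls) * pages_needed(OPENALEX_MAX_RECS_TO_HARVEST)
```

Fewer years or a lower record limit directly reduce the number of pages requested, which is why those two parameters have the largest effect on run time.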

### Order of running the harvesting scripts
The order in which you run the harvesting scripts does not really matter. The author harvests
only records for Utrecht University and uses this order:
1. *harvest_pure_to_ricgraph.py* (since it has a lot of data, most of which is correct);
1. *harvest_yoda_datacite_to_ricgraph.py* (not too much data, so the harvest is fast, but it
contains several data entry errors);
1. *harvest_rsd_to_ricgraph.py* (not too much data);
1. *harvest_uustaffpages_to_ricgraph.py*;
1. *harvest_openalex_to_ricgraph.py* (a lot of data from a [number of
sources](https://docs.openalex.org/additional-help/faq#where-does-your-data-come-from)).
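
The order above can be captured in a small driver script. This is only a sketch: it assumes the harvesting scripts are in the current directory and can be started without arguments, and the helpers `build_commands()` and `run_all()` are hypothetical names, not part of Ricgraph.

```python
import subprocess
import sys

# Script names as listed above, in the author's order.
HARVEST_ORDER = [
    "harvest_pure_to_ricgraph.py",
    "harvest_yoda_datacite_to_ricgraph.py",
    "harvest_rsd_to_ricgraph.py",
    "harvest_uustaffpages_to_ricgraph.py",
    "harvest_openalex_to_ricgraph.py",
]


def build_commands(scripts, python_exe=sys.executable):
    """Return one command line (as an argument list) per script."""
    return [[python_exe, script] for script in scripts]


def run_all(scripts):
    """Run the scripts one after another; stop at the first failure."""
    for command in build_commands(scripts):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling `run_all(HARVEST_ORDER)` would start the harvests in sequence. Because `check=True` raises on a non-zero exit code, a failed harvest stops the run instead of silently continuing with the next source.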

### General program structure of a Python program using Ricgraph

```python