Various updates

UtrechtUniversity · Mar 10, 2023 · 0f0a78c
1 parent 6f1cff3
commit 0f0a78c
Showing 2 changed files with 44 additions and 2 deletions.
diff --git a/docs/ricgraph_future_work.md b/docs/ricgraph_future_work.md
@@ -1,15 +1,23 @@
 ## Future work
 
 * Create an end user web interface. This interface should allow
-  easy and faceted browsing of Ricgraph.
+  easy and faceted browsing of Ricgraph. 
+  For now you can use *Ricgraph explorer*, but it is still very basic.
 * Modify Ricgraph to allow the use of another, preferably open source graph database engine.
   It should be possible by changing minor bits of the code in file *ricgraph.py*.
 * Make a web service of Ricgraph.
 * Write harvesting scripts to get information from e.g. [Zenodo](https://zenodo.org),
-  [ORCID](https://orcid.org), [OpenAlex](https://openalex.org), 
+  [ORCID](https://orcid.org), ~~[OpenAlex](https://openalex.org)~~, 
   [Scopus](https://www.scopus.com), [Lens](https://www.lens.org),
   [OpenAIRE](https://explore.openaire.eu), 
   [DataCite Commons](https://commons.datacite.org), 
   [GitHub](https://github.com) (and other Gits), etc.  
+* Function `merge_two_personroot_nodes()` in *ricgraph.py* currently uses `_graph.delete()`
+  from *py2neo*, but that call has the side effect of also removing nodes with more than one edge,
+  e.g. the organization nodes in *harvest_uustaffpages_to_ricgraph.py*
+  (after the call to `rcg.merge_personroots_of_two_nodes()`
+  and then `merge_two_personroot_nodes()`
+  only one organization node is left).
+  It should use `_graph.separate()` instead, but the author has not got that working yet.
 
 [Return to main README.md file](../README.md).
diff --git a/docs/ricgraph_programming_examples.md b/docs/ricgraph_programming_examples.md
@@ -35,6 +35,40 @@ E.g., for research outputs you can adjust
 the years to harvest with the parameter *PURE_RESOUT_YEARS* and the maximum number of
 records to harvest with *PURE_RESOUT_MAX_RECS_TO_HARVEST*.
 
+### Harvest of Utrecht University staff pages
+
+There is also a script for harvesting
+the [Utrecht University staff pages](https://www.uu.nl/medewerkers), 
+*harvest_uustaffpages_to_ricgraph.py*.
+This script needs the parameter *uustaff_url* to be set in the
+[Ricgraph initialization file](ricgraph_install_configure.md#ricgraph-initialization-file).
+
+### Harvest of OpenAlex
+
+There is also a script for harvesting 
+[OpenAlex](https://openalex.org), *harvest_openalex_to_ricgraph.py*.
+It harvests OpenAlex Works, and by harvesting these
+Works, it also harvests OpenAlex Authors.
+This script needs the parameters *organization_name*, *organization_ror* 
+and *openalex_polite_pool_email* to be set in the
+[Ricgraph initialization file](ricgraph_install_configure.md#ricgraph-initialization-file).
+
+OpenAlex contains a lot of data, so a harvest may take a long time. You can
+reduce the harvest time by adjusting two parameters at the start of the script, in the section
+"Parameters for harvesting persons and research outputs from OpenAlex":
+*OPENALEX_RESOUT_YEARS* and *OPENALEX_MAX_RECS_TO_HARVEST*.
+
+### Order of running the harvesting scripts
+The order of running the harvesting scripts does not really matter. The author harvests
+only records for Utrecht University and uses this order:
+1. *harvest_pure_to_ricgraph.py* (since it has a lot of data which is mostly correct);
+1. *harvest_yoda_datacite_to_ricgraph.py* (not too much data, so harvest is fast, but it 
+   contains several data entry errors);
+1. *harvest_rsd_to_ricgraph.py* (not too much data);
+1. *harvest_uustaffpages_to_ricgraph.py*;
+1. *harvest_openalex_to_ricgraph.py* (a lot of data from a [number of 
+   sources](https://docs.openalex.org/additional-help/faq#where-does-your-data-come-from)). 
+
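The harvest order listed above could be automated with a small wrapper. The following is only a sketch and not part of Ricgraph: the helper names `build_commands()` and `run_all()` and the `script_dir` parameter are hypothetical, and it assumes that all harvest scripts live in one directory and are started without command line arguments.

```python
import subprocess
import sys
from pathlib import Path

# Script names and their order are taken from the list above.
HARVEST_SCRIPTS = [
    "harvest_pure_to_ricgraph.py",
    "harvest_yoda_datacite_to_ricgraph.py",
    "harvest_rsd_to_ricgraph.py",
    "harvest_uustaffpages_to_ricgraph.py",
    "harvest_openalex_to_ricgraph.py",
]


def build_commands(script_dir="."):
    """Return one command (an argument list) per harvest script, in order.

    Assumption: every script is in 'script_dir' and takes no arguments.
    """
    return [[sys.executable, str(Path(script_dir) / name)]
            for name in HARVEST_SCRIPTS]


def run_all(script_dir=".", dry_run=True):
    """Run the scripts one after another and stop at the first failure."""
    for command in build_commands(script_dir):
        if dry_run:
            # Only show what would be executed.
            print("would run:", " ".join(command))
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the wrapper only prints the commands, so the order can be checked before starting a long harvest. Since the order does not really matter, the list can be rearranged freely.
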
 ### General program structure of a Python program using Ricgraph
 
 ```python