Ricgraph scripts can be found in various places:
- Directory harvest: scripts for harvesting sources and inserting the results in Ricgraph. Documentation for these scripts.
- Directory import_export: scripts to import and export items from Ricgraph. Documentation for these scripts (this file).
- Directory enhance: scripts for finding and enriching items in Ricgraph. Documentation for these scripts (this file).
- The module code ricgraph.py can be found in directory ricgraph.
- The code for Ricgraph Explorer can be found in directory ricgraph_explorer.
- Documentation for writing your own scripts.
All code is documented, and hints on how to use it can be found in the source files.
Return to main README.md file.
To construct a Ricgraph from a csv file, use the script construct_ricgraph_from_csv.py. You can find this script in the directory import_export.
Nodes and edges are inserted with Ricgraph calls. That means that nodes are inserted in pairs, connected by an edge. Person-root nodes cannot be inserted with this script; they are created whenever necessary. If a node in the nodes import file is not connected to another node by an edge in the edges import file, it will not be created. This is due to the way Ricgraph works.
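This script differs from the scripts described in Import nodes and edges from a csv file, raw version (ricgraph_import_raw_from_csv.py) and Export nodes and edges to a csv file, raw version (ricgraph_export_raw_to_csv.py): the "raw" scripts import and export with Cypher queries, whereas this script uses Ricgraph calls.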
Usage:
  construct_ricgraph_from_csv.py [options]
Options:
  --empty_ricgraph <yes|no>
          'yes': Ricgraph will be emptied before importing.
          'no': Ricgraph will not be emptied before importing.
          If this option is not present, the script will prompt the user
          what to do.
  --filename <filename>
          Import nodes and edges from a csv file starting with <filename>.
          The file with nodes is <filename>-nodes.csv.
          The file with edges is <filename>-edges.csv.
The import file containing nodes should be a csv file. At least the following columns should be present:
- name
- category
- value
Other fields that may be present:
- All fields in parameter ricgraph_properties_additional in the Ricgraph initialization file, except the fields source_event and history_event.
The import file containing edges should be a csv file containing exactly four columns:
- name_from, value_from: the from node for the edge.
- name_to, value_to: the to node for the edge.
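As an illustration of the two file layouts described above, the following minimal sketch writes a nodes file and an edges file containing only the required columns. The helper name `write_example_import_files`, the file prefix `example`, and all sample values are made up for illustration. The node names `ORCID` and `FULL_NAME`, the category `person`, and the comma as delimiter are assumptions; check them against your own Ricgraph data before importing.

```python
import csv
from pathlib import Path

NODE_COLUMNS = ["name", "category", "value"]
EDGE_COLUMNS = ["name_from", "value_from", "name_to", "value_to"]


def write_example_import_files(prefix: str) -> tuple[Path, Path]:
    """Write <prefix>-nodes.csv and <prefix>-edges.csv with sample rows."""
    nodes_path = Path(f"{prefix}-nodes.csv")
    edges_path = Path(f"{prefix}-edges.csv")

    # Hypothetical sample data: two nodes connected by one edge.
    nodes = [
        {"name": "ORCID", "category": "person", "value": "0000-0001-2345-6789"},
        {"name": "FULL_NAME", "category": "person", "value": "Doe, Jane"},
    ]
    edges = [
        {"name_from": "ORCID", "value_from": "0000-0001-2345-6789",
         "name_to": "FULL_NAME", "value_to": "Doe, Jane"},
    ]

    with nodes_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=NODE_COLUMNS)
        writer.writeheader()
        writer.writerows(nodes)
    with edges_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=EDGE_COLUMNS)
        writer.writeheader()
        writer.writerows(edges)
    return nodes_path, edges_path


if __name__ == "__main__":
    # Creates example-nodes.csv and example-edges.csv in the current directory.
    write_example_import_files("example")
```

Both nodes appear in the edges file, so neither would be skipped for being unconnected. The resulting files could then be passed to the script with `--filename example`.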
To import nodes and edges from a csv file, use the script ricgraph_import_raw_from_csv.py. You can find this script in the directory import_export.
This is a "raw" import, because person-root nodes are also imported, as are the connections between e.g. an ORCID node and its person-root node. When you do the import, all nodes and edges are inserted directly in the graph database backend using a Cypher query. That means that there is no check at all on whether the resulting nodes and edges conform to the "Ricgraph model". The result may be a graph that is not consistent with the Ricgraph model, in which case Ricgraph Explorer may not work as expected.
This script forms a pair with Export nodes and edges to a csv file, raw version.
Usage:
  ricgraph_import_raw_from_csv.py [options]
Options:
  --empty_ricgraph <yes|no>
          'yes': Ricgraph will be emptied before importing.
          'no': Ricgraph will not be emptied before importing.
          If this option is not present, the script will prompt the user
          what to do.
  --filename <filename>
          Import nodes and edges from a csv file starting with <filename>.
          The file with nodes is <filename>-nodes.csv.
          The file with edges is <filename>-edges.csv.
The import file containing nodes should be a csv file. At least the following columns should be present:
- name
- category
- value
- _key
Other fields that may be present:
- The remaining fields in parameter ricgraph_properties_hidden in the Ricgraph initialization file.
- The fields in parameter ricgraph_properties_additional in the Ricgraph initialization file.
The import file containing edges should be a csv file containing exactly four columns:
- name_from, value_from: the from node for the edge.
- name_to, value_to: the to node for the edge.
For an example import file, export the nodes and edges in Ricgraph using Export nodes and edges to a csv file, raw version.
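Because the raw import does no checking, it can help to verify the column headers yourself before importing. The sketch below is not part of Ricgraph; the helper name `missing_columns` and the sample data are hypothetical, and a comma-delimited file with a header row is assumed.

```python
import csv
import io

# Required columns as listed above for the raw import files.
REQUIRED_NODE_COLUMNS = {"name", "category", "value", "_key"}
REQUIRED_EDGE_COLUMNS = {"name_from", "value_from", "name_to", "value_to"}


def missing_columns(csv_file, required: set[str]) -> set[str]:
    """Return the required column names that are absent from the header row."""
    reader = csv.DictReader(csv_file)
    present = set(reader.fieldnames or [])
    return required - present


# Hypothetical nodes file that lacks the _key column.
sample_nodes = io.StringIO(
    "name,category,value\n"
    "ORCID,person,0000-0001-2345-6789\n"
)
print(missing_columns(sample_nodes, REQUIRED_NODE_COLUMNS))  # prints {'_key'}
```

For a real file, pass an open file object instead of the `io.StringIO` sample, for example `open("myexport-nodes.csv", newline="", encoding="utf-8")`, where `myexport` stands for your own filename prefix.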
To export nodes and edges to a csv file, use the script ricgraph_export_raw_to_csv.py. You can find this script in the directory import_export.
This is a "raw" export, because person-root nodes are also exported, as are the connections between e.g. an ORCID node and its person-root node. The export is done using a Cypher query. When you import the export generated by this script, all nodes and edges are inserted directly in the graph database backend using a Cypher query. That means that there is no check at all on whether the resulting nodes and edges conform to the "Ricgraph model". The result may be a graph that is not consistent with the Ricgraph model, in which case Ricgraph Explorer may not work as expected.
This script forms a pair with Import nodes and edges from a csv file, raw version.
Usage:
  ricgraph_export_raw_to_csv.py [options]
Options:
  --filename <filename>
          Export all nodes and edges in Ricgraph
          to a csv file starting with <filename>.
          The file with nodes is <filename>-nodes.csv.
          The file with edges is <filename>-edges.csv.
The export file containing nodes will be a csv file. All fields in Ricgraph will be exported.
The export file containing edges will be a csv file containing exactly four columns:
- name_from, value_from: the from node for the edge.
- name_to, value_to: the to node for the edge.
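Since the edges file has a fixed four-column layout, it is easy to process outside Ricgraph. The following minimal sketch counts how many exported edges touch each node, identified by its name and value. The function name `count_edges_per_node` and the sample rows are hypothetical, and a comma-delimited file with a header row is assumed. Whether the export lists a connection once or once per direction is not stated here, so treat the counts as relative rather than absolute.

```python
import csv
import io
from collections import Counter


def count_edges_per_node(edges_file) -> Counter:
    """Count, per (name, value) node, the edge rows in which it appears."""
    degree = Counter()
    for row in csv.DictReader(edges_file):
        degree[(row["name_from"], row["value_from"])] += 1
        degree[(row["name_to"], row["value_to"])] += 1
    return degree


# Hypothetical export: one person-root node connected to two identifiers.
sample_edges = io.StringIO(
    "name_from,value_from,name_to,value_to\n"
    "person-root,abc123,ORCID,0000-0001-2345-6789\n"
    "person-root,abc123,FULL_NAME,\"Doe, Jane\"\n"
)
for node, count in count_edges_per_node(sample_edges).most_common():
    print(node, count)
```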
Count the number of organizations that contributed to a category (count_organizations_contributed_to_category.py)
To count the number of organizations that contributed to a category, use the script count_organizations_contributed_to_category.py. You can find this script in the directory import_export.
This script counts the (sub-)organizations of persons who have contributed to all nodes of a specified category (e.g., data set or software). Both a histogram and a collaboration table will be computed and written to a file. The histogram contains the count of (sub-)organizations of all nodes of the specified category. The collaboration table contains the count of (sub-)organizations who have worked together on all nodes of the specified category.
What makes this script interesting is that it also counts collaborations of sub-organizations, if you have harvested them. For example, the Research Information System Pure contains a full organization hierarchy for persons. After harvesting Pure, Ricgraph contains this organization hierarchy: not only the top level organization, such as Utrecht University, but also faculties, departments, units, and chairs. Using these sub-organizations, this script is able to show collaborations between e.g. different departments in the same organization. If you have harvested organization hierarchies from different organizations, collaborations between e.g. departments of two universities can be shown.
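To make the difference between the histogram and the collaboration table concrete, here is a minimal sketch of the two counts on made-up data. It is not the script's actual implementation: the function name `count_organizations`, the input shape (a mapping from an item to the set of (sub-)organizations of its contributors), and the sample names are all assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def count_organizations(items: dict[str, set[str]]) -> tuple[Counter, Counter]:
    """Return (histogram, collaborations) for items of one category.

    histogram counts, per organization, the items it contributed to.
    collaborations counts, per pair of organizations, the items both
    contributed to.
    """
    histogram = Counter()
    collaborations = Counter()
    for organizations in items.values():
        histogram.update(organizations)
        # Sort so that each pair has one canonical order.
        for pair in combinations(sorted(organizations), 2):
            collaborations[pair] += 1
    return histogram, collaborations


# Hypothetical data sets with the organizations of their contributors.
data_sets = {
    "data set 1": {"University A", "University A Department X"},
    "data set 2": {"University A", "University B"},
    "data set 3": {"University A Department X", "University B"},
}
histogram, collaborations = count_organizations(data_sets)
print(histogram.most_common())
print(collaborations.most_common())
```

Because sub-organizations are counted as organizations in their own right, a pair such as a department and a different university shows up in the collaboration table just like a pair of top level organizations.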
Usage:
  count_organizations_contributed_to_category.py [options]
Options:
  --sort_organization <organization name>
          Sort the collaboration table on this organization name.
          If the name has one or more spaces, enclose it with "...".
          If this option is not present, the script will prompt the user
          for a name.
  --category <category>
          Compute histogram and collaboration table for given category.
          If the name has one or more spaces, enclose it with "...".
          If this option is not present, the script will prompt the user
          for a name.
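The quoting rule for names with spaces matters when you type the command in a shell. If you build the command from Python instead, the standard library can do the quoting for you. The sketch below only constructs and prints the command; it does not run the script. The organization name and category are sample values taken from the text above and are assumed to match what is stored in your Ricgraph.

```python
import shlex

# Each list element is one argument, so embedded spaces need no manual quoting.
args = [
    "count_organizations_contributed_to_category.py",
    "--sort_organization", "Utrecht University",
    "--category", "data set",
]

# shlex.join produces a shell-safe command line. It uses single quotes,
# which a POSIX shell treats like the "..." quoting for these values.
print(shlex.join(args))
```

If you pass the list itself to `subprocess.run`, no quoting is needed at all. Depending on your setup, you may have to prepend the Python interpreter (for example `sys.executable`) to the list.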
[This is an old script; you might want to use Export nodes and edges to a csv file, raw version instead.]
There are two scripts that export person nodes to a csv file. These can be found in the directory import_export.
- export_person_identifiers.py: exports all person identifiers connected to a person-root node.
- export_person_node_properties.py: exports all node properties for every person node connected to a person-root node.
At the start of both scripts, set the parameter EXPORT_MAX_RECS to the number of records to export and the parameter EXPORT_FILENAME to the name of the export file.
With the script enrich_orcids_scopusids.py, you can enrich persons having an ORCID but no SCOPUS_AUTHOR_ID (using OpenAlex), or vice versa (using the Scopus API). Note that Scopus has a rate limit, and that you have to set some parameters in ricgraph.ini. You can find this script in the directory enhance.
With the script find_double_pids.py, you can check whether there are any personal identifiers that point to two or more different persons. You can find this script in the directory enhance.
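The idea behind such a check can be sketched in a few lines. This is not the code of find_double_pids.py: the function name `find_double_pids`, the input shape (tuples of identifier name, identifier value, and a person key), and the sample data are assumptions for illustration only.

```python
from collections import defaultdict


def find_double_pids(links):
    """Return identifiers that are linked to two or more different persons.

    links is an iterable of (identifier_name, identifier_value, person_key).
    The result maps (identifier_name, identifier_value) to the set of
    person keys it points to.
    """
    persons_per_pid = defaultdict(set)
    for name, value, person in links:
        persons_per_pid[(name, value)].add(person)
    return {
        pid: persons
        for pid, persons in persons_per_pid.items()
        if len(persons) >= 2
    }


# Hypothetical links: the same ORCID is attached to two different persons.
sample_links = [
    ("ORCID", "0000-0001-2345-6789", "person-1"),
    ("ORCID", "0000-0001-2345-6789", "person-2"),
    ("SCOPUS_AUTHOR_ID", "12345678900", "person-1"),
]
for pid, persons in find_double_pids(sample_links).items():
    print(pid, sorted(persons))
```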
To create a table of contents of the Ricgraph documentation, use the script create_toc_documentation.py. You can find this script in the directory maintenance. The table of contents will be created in the file ricgraph_toc_documentation.md.
Usage:
create_toc_documentation.py
To create an index of the Ricgraph documentation, use the script create_index_documentation.py. You can find this script in the directory maintenance. The index will be created in the file ricgraph_index_documentation.md.
Usage:
create_index_documentation.py
To create the Ricgraph REST API documentation, use the script convert_openapi_to_mddoc.py. This documentation is based on the Ricgraph OpenAPI yaml file openapi.yaml in the directory ricgraph_explorer/static. You can find this script in the directory maintenance. The REST API documentation will be created in the file ricgraph_restapi_gendoc.md.
Usage:
convert_openapi_to_mddoc.py