Rik D.T. Janssen committed Jan 10, 2023 (0 parents, commit 6c09737)
Showing 28 changed files with 3,801 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,142 @@ | ||
# ########################### | ||
# .gitignore for Ricgraph - Research in context graph. | ||
# January 4, 2022. | ||
# ########################### | ||
ricgraph.ini | ||
*.json | ||
*.csv | ||
*.xml | ||
.idea | ||
|
||
# ########################### | ||
# Default .gitignore from GitHub on January 4, 2022. | ||
# ########################### | ||
# Byte-compiled / optimized / DLL files | ||
__pycache__/ | ||
*.py[cod] | ||
*$py.class | ||
|
||
# C extensions | ||
*.so | ||
|
||
# Distribution / packaging | ||
.Python | ||
build/ | ||
develop-eggs/ | ||
dist/ | ||
downloads/ | ||
eggs/ | ||
.eggs/ | ||
lib/ | ||
lib64/ | ||
parts/ | ||
sdist/ | ||
var/ | ||
wheels/ | ||
pip-wheel-metadata/ | ||
share/python-wheels/ | ||
*.egg-info/ | ||
.installed.cfg | ||
*.egg | ||
MANIFEST | ||
|
||
# PyInstaller | ||
# Usually these files are written by a python script from a template | ||
# before PyInstaller builds the exe, so as to inject date/other infos into it. | ||
*.manifest | ||
*.spec | ||
|
||
# Installer logs | ||
pip-log.txt | ||
pip-delete-this-directory.txt | ||
|
||
# Unit test / coverage reports | ||
htmlcov/ | ||
.tox/ | ||
.nox/ | ||
.coverage | ||
.coverage.* | ||
.cache | ||
nosetests.xml | ||
coverage.xml | ||
*.cover | ||
*.py,cover | ||
.hypothesis/ | ||
.pytest_cache/ | ||
|
||
# Translations | ||
*.mo | ||
*.pot | ||
|
||
# Django stuff: | ||
*.log | ||
local_settings.py | ||
db.sqlite3 | ||
db.sqlite3-journal | ||
|
||
# Flask stuff: | ||
instance/ | ||
.webassets-cache | ||
|
||
# Scrapy stuff: | ||
.scrapy | ||
|
||
# Sphinx documentation | ||
docs/_build/ | ||
|
||
# PyBuilder | ||
target/ | ||
|
||
# Jupyter Notebook | ||
.ipynb_checkpoints | ||
|
||
# IPython | ||
profile_default/ | ||
ipython_config.py | ||
|
||
# pyenv | ||
.python-version | ||
|
||
# pipenv | ||
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. | ||
# However, in case of collaboration, if having platform-specific dependencies or dependencies | ||
# having no cross-platform support, pipenv may install dependencies that don't work, or not | ||
# install all needed dependencies. | ||
#Pipfile.lock | ||
|
||
# PEP 582; used by e.g. github.com/David-OConnor/pyflow | ||
__pypackages__/ | ||
|
||
# Celery stuff | ||
celerybeat-schedule | ||
celerybeat.pid | ||
|
||
# SageMath parsed files | ||
*.sage.py | ||
|
||
# Environments | ||
.env | ||
.venv | ||
env/ | ||
venv/ | ||
ENV/ | ||
env.bak/ | ||
venv.bak/ | ||
|
||
# Spyder project settings | ||
.spyderproject | ||
.spyproject | ||
|
||
# Rope project settings | ||
.ropeproject | ||
|
||
# mkdocs documentation | ||
/site | ||
|
||
# mypy | ||
.mypy_cache/ | ||
.dmypy.json | ||
dmypy.json | ||
|
||
# Pyre type checker | ||
.pyre/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# This CITATION.cff file was generated with cffinit. | ||
# Visit https://bit.ly/cffinit to generate yours today! | ||
|
||
cff-version: 1.2.0 | ||
title: Ricgraph - Research in context graph | ||
message: >- | ||
If you use this software, please cite it using the | ||
metadata from this file. | ||
type: software | ||
authors: | ||
- given-names: Rik D.T. | ||
family-names: Janssen | ||
orcid: 'https://orcid.org/0000-0001-9510-0802' | ||
affiliation: Utrecht University | ||
identifiers: | ||
- type: doi | ||
value: '[to follow]' | ||
abstract: >- | ||
Ricgraph (Research in context graph) is a graph | ||
with nodes (sometimes called vertices) and edges | ||
(sometimes called links) to represent objects and | ||
their relations. It can be used to store, | ||
manipulate and read metadata of any object that has | ||
a relation to another object, as long as every | ||
object can be "represented" by at least a *name* | ||
and a *value*. In Ricgraph, one node represents one | ||
object, and an edge represents the relation between | ||
two objects. Metadata of an object are stored as | ||
"properties" in a node, i.e. as information | ||
associated with a node. | ||
license: MIT | ||
commit: commit id | ||
version: '0.8' | ||
date-released: '2023-01-10' |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2023 Rik D.T. Janssen | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,142 @@ | ||
# Ricgraph - Research in context graph | ||
|
||
## What is Ricgraph? | ||
|
||
Ricgraph (Research in context graph) is a | ||
[graph](https://en.wikipedia.org/wiki/Graph_theory) with | ||
nodes (sometimes called vertices) | ||
and edges (sometimes called links) to represent objects and their relations. | ||
It can be used to store, manipulate and read metadata of any object that | ||
has a relation to another object, | ||
as long as every object can be "represented" by at least a *name* and a *value*. | ||
In Ricgraph, one node represents one object, and an edge represents the | ||
relation between two objects. | ||
It is written in Python and uses [Neo4j](https://neo4j.com) | ||
as [graph database engine](https://en.wikipedia.org/wiki/Graph_database). | ||
|
||
Metadata of an object are stored as "properties" | ||
in a node, i.e. as information associated with a node. | ||
For example, a node may store two properties, *name = PET* and | ||
*value = cat*. Another node may store *name = FULL_NAME* and *value = John Doe*. | ||
Then the edge between those two nodes means that the person with FULL_NAME John Doe | ||
has a PET which is a cat. | ||
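
The PET example above can be sketched in a few lines of plain Python. This is only an illustration of the node and edge idea, not Ricgraph's actual API: the `Node` and `Graph` classes and the `connect` method are hypothetical names, the graph is kept in memory instead of in Neo4j, and edges are assumed to be undirected.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One object, represented by at least a name and a value."""
    name: str
    value: str


@dataclass
class Graph:
    """A tiny in-memory graph: each node maps to the set of its neighbors."""
    neighbors: dict = field(default_factory=dict)

    def connect(self, a: Node, b: Node) -> None:
        # Assumption: an edge is undirected, so it is stored in both directions.
        self.neighbors.setdefault(a, set()).add(b)
        self.neighbors.setdefault(b, set()).add(a)


graph = Graph()
person = Node(name="FULL_NAME", value="John Doe")
pet = Node(name="PET", value="cat")
graph.connect(person, pet)

# Follow the single edge from the person node to read the relation.
for neighbor in graph.neighbors[person]:
    print(f"{person.value} has a {neighbor.name} which is a {neighbor.value}")
```

Making `Node` frozen keeps it hashable, so a node can be used directly as a dictionary key and as a set member.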
|
||
The philosophy of Ricgraph is that it stores metadata, not the objects the metadata | ||
refer to. To access an object, a node has a link to that object in | ||
the system it was obtained from. The objective is to get metadata from | ||
objects from a source system in a process called "harvesting". | ||
All information harvested from several source systems will be combined into one graph. | ||
To modify the metadata of an object, change it in the source system | ||
the object was harvested from, and then reharvest that source system. | ||
|
||
## Why Ricgraph? | ||
|
||
Ricgraph was developed because a university needed to show | ||
people, organizations and research outputs | ||
(e.g. books, journal articles, datasets and software) | ||
in relation to each other. This information is stored in different systems. | ||
That university needed to show research in context in a | ||
graph (hence the name). | ||
Ricgraph is able to answer questions like: | ||
|
||
* Which person has contributed to which book, journal article, dataset, | ||
software package, etc.? | ||
* Given e.g. a dataset or software package, who has contributed to it? | ||
* What identifiers does a person have (there are a lot in use at universities)? | ||
* Which persons have worked together, shown as a network? | ||
* For which organization does a person work, and hence which organizations have worked together? | ||
|
||
Ricgraph provides example code to do this. We have chosen a | ||
graph as a data structure, since it gives logical and efficient | ||
access to objects | ||
that are close to the objects they are related to. For example, | ||
starting with a person, their research outputs are only one | ||
step away by following one edge, and other contributors to a research output are | ||
again one step (edge) away. | ||
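
The "one step away" reasoning above can be illustrated with a small adjacency structure. This is a sketch under assumptions, not Ricgraph code: the graph lives in a plain dictionary, the function names `connect`, `research_outputs` and `co_contributors` are hypothetical, and the persons and outputs are made-up labels.

```python
from collections import defaultdict

# Each node maps to the set of nodes it shares an edge with.
edges = defaultdict(set)


def connect(a, b):
    """Add an undirected edge between two nodes."""
    edges[a].add(b)
    edges[b].add(a)


# Hypothetical example data.
connect("person A", "dataset X")
connect("person A", "software Y")
connect("person B", "software Y")
connect("person C", "dataset X")


def research_outputs(person):
    """One step: follow every edge that leaves the person."""
    return set(edges[person])


def co_contributors(person):
    """Two steps: person -> research output -> other contributors."""
    found = set()
    for output in edges[person]:
        found |= edges[output]
    found.discard(person)  # a person is not their own co-contributor
    return found


print(sorted(research_outputs("person A")))
print(sorted(co_contributors("person A")))
```

Both lookups only touch the neighbors of the starting node, which is the efficiency argument made in the paragraph above.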
|
||
In the remainder of this text, Ricgraph is described in the use case of | ||
showing people, organizations and research outputs in relation to each other | ||
in a university context. | ||
|
||
### Example | ||
|
||
In the figures below, nodes in green are datasets, nodes in yellow journal articles, | ||
nodes in red software and nodes in blue person identifiers. Small nodes are harvested from | ||
the data repository [Yoda](https://search.datacite.org/repositories/delft.uu), | ||
medium-sized nodes from | ||
the [Research Information System Pure](https://www.elsevier.com/solutions/pure), | ||
and large nodes from the | ||
[Research Software Directory](https://research-software-directory.org). | ||
Click the figures to enlarge. | ||
|
||
| one person with several research outputs | several persons with several research outputs | | ||
|---------------------------------------------------|------------------------------------------------------| | ||
| <img src="docs/images/rcg-all1.jpg" height="170"> | <img src="docs/images/rcg-all2-ab.jpg" height="200"> | | ||
|
||
The left figure shows that a person has 5 identifiers (blue) and 3 journal articles (yellow) | ||
from Pure, | ||
2 datasets from Yoda (green) and 1 software package from the Research Software Directory (red). | ||
*Person-root* is the central node to which everything related to a person is connected. | ||
Information from several sources is combined in one graph. | ||
The right figure shows a more extensive example. Two persons, A and B, have worked together on | ||
a software package (red), a dataset (green), and something else (grey). | ||
More examples can be found in [Ricgraph details](docs/ricgraph_details.md). | ||
|
||
## What can Ricgraph do? | ||
|
||
Some of Ricgraph's features are: | ||
|
||
* Ricgraph stores metadata of objects. | ||
The objective is to get metadata from | ||
objects from a source system in a process called "harvesting". | ||
That means that e.g. persons and publications | ||
can be harvested from one system, datasets from another system, and software from a third system. | ||
Everything found will be combined into one graph. | ||
* Ricgraph can harvest from many sources, and you can write your own | ||
harvesting scripts. Example scripts are included to | ||
harvest from the [Research Information System Pure](https://www.elsevier.com/solutions/pure), | ||
the data repository [Yoda](https://search.datacite.org/repositories/delft.uu), | ||
and the [Research Software Directory](https://research-software-directory.org). | ||
* Ricgraph can be used as an ID resolver. Given an identifier of a person, | ||
it can easily find other identifiers of that person. When harvesting from new systems | ||
yields new identifiers, | ||
they are added automatically. It can form the core engine for the Dutch | ||
[National Roadmap for Persistent | ||
Identifiers](https://www.surf.nl/en/national-roadmap-for-persistent-identifiers). | ||
* Since Ricgraph combines information from different sources in one graph, it | ||
can be used as a discoverer (an aggregated search engine), such as the | ||
[UU-discoverer](https://itforresearch.uu.nl/wiki/UU-discoverer). | ||
Also, it can be used as a core engine for the | ||
[Dutch Open Knowledge | ||
Base](https://communities.surf.nl/en/open-research-information/article/building-an-open-knowledge-base). | ||
* Ricgraph can check the consistency of information harvested. For example, ORCIDs and ISNIs | ||
are supposed to refer to one person, so every node representing such an identifier should have | ||
only one edge. This can be checked easily. | ||
An example script is included. | ||
* Ricgraph can enrich information. For example, | ||
if a person has an ORCID, but not a Scopus Author ID, | ||
[OpenAlex](https://openalex.org) can be used | ||
to find the missing ID. If something is found, it is added to the person record. | ||
An example script is included. | ||
* Ricgraph can store any number of properties in a node. | ||
It has function calls to | ||
create, read (find), update and delete (CRUD) nodes and to connect two nodes. | ||
* Ricgraph does not have an end-user web interface yet. This is future work. | ||
The graph can be explored using Bloom, | ||
see [Execute queries and visualize the result using Bloom](docs/ricgraph_neo4j_bloom_use.md). | ||
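
The consistency check mentioned in the list above (an ORCID or ISNI node should have only one edge) could look roughly like the sketch below. It is not the example script shipped with Ricgraph: the function name `find_inconsistent_identifiers`, the representation of nodes as `(name, value)` tuples, and all identifier values are assumptions made for illustration.

```python
def find_inconsistent_identifiers(edges, unique_names=("ORCID", "ISNI")):
    """Return identifier nodes that have more than one edge.

    `edges` is a list of (node, node) pairs; a node is a (name, value) tuple.
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sorted(
        node
        for node, count in degree.items()
        if node[0] in unique_names and count > 1
    )


# Hypothetical harvest result with placeholder identifier values.
edges = [
    (("ORCID", "orcid-placeholder-1"), ("person-root", "p1")),
    (("ISNI", "isni-placeholder-1"), ("person-root", "p1")),
    (("ORCID", "orcid-placeholder-2"), ("person-root", "p2")),
    (("ORCID", "orcid-placeholder-2"), ("person-root", "p3")),  # second edge
]

for name, value in find_inconsistent_identifiers(edges):
    print(f"{name} {value} is connected to more than one node")
```

Counting edges per node is enough here, because the rule in the text is stated purely in terms of the number of edges of an identifier node.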
|
||
## How can you use Ricgraph? | ||
|
||
* Read more about [Ricgraph details](docs/ricgraph_details.md), | ||
such as example graphs, person identifiers and the *person-root* node. | ||
* [Install and configure Ricgraph](docs/ricgraph_install_configure.md). | ||
* Write code, or start reusing code; | ||
see the [Ricgraph programming examples](docs/ricgraph_programming_examples.md). | ||
* Or [do a harvest for Utrecht University datasets and | ||
software](docs/ricgraph_programming_examples.md#harvest-of-utrecht-university-datasets-and-software). | ||
You will observe that the information from two sources is neatly combined into one graph. | ||
* [Execute queries and visualize the result using Bloom](docs/ricgraph_neo4j_bloom_use.md). | ||
* Of course, there is [future work to do](docs/ricgraph_future_work.md). Please let me know | ||
if you'd like to help. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
## Implementation details | ||
|
||
[Return to main README.md file](../README.md). | ||
|
||
### Person identifiers | ||
|
||
In the research world, persons can have any number of different identifiers. | ||
Some of these are standard, generally accepted and more-or-less unique identifiers | ||
over the lifetime of a person. These are called | ||
[persistent identifiers](https://en.wikipedia.org/wiki/Persistent_identifier). | ||
Others are non-unique, some are specific to an organization and some are specific to a company. | ||
Examples are: | ||
|
||
* persistent identifiers: [ORCID](https://en.wikipedia.org/wiki/ORCID), | ||
[ISNI](https://en.wikipedia.org/wiki/International_Standard_Name_Identifier); | ||
* non-unique identifiers: full name (there are persons with the same name); | ||
* organization identifiers: employee ID, email address (will change when a person leaves | ||
an organization); | ||
* company identifiers: | ||
[Scopus Author ID](https://www.scopus.com/freelookup/form/author.uri). | ||
|
||
### Person-root node in Ricgraph | ||
|
||
Ricgraph uses a special node *person-root*. This node is connected to all the different | ||
person identifiers which have been harvested. | ||
*Person-root* "represents" a person. Research outputs from a person | ||
will also be connected to this *person-root* node. | ||
The following figure shows two examples (click the figure to enlarge). | ||
|
||
| a person with a few identifiers | a person with a lot of identifiers | | ||
|-----------------------------------------------|-----------------------------------------------| | ||
| <img src="images/rcg-ids1.jpg" height="130"/> | <img src="images/rcg-ids2.jpg" height="200"/> | | ||
|
||
A person can have any number of identifiers. | ||
The person in the left figure has one *ORCID*, one *ISNI* and one *FULL_NAME*. | ||
The person in the right figure has a lot more identifiers, and some identifiers appear more than once. | ||
E.g. this person has 2 SCOPUS_AUTHOR_IDs and 2 ISNIs. | ||
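
Because every identifier of a person is connected to the same *person-root* node, resolving one identifier to the others is a two-step walk. The sketch below illustrates that idea only; it is not Ricgraph's API. The names `connect` and `other_identifiers`, the `(name, value)` tuple representation, and all identifier values are hypothetical.

```python
from collections import defaultdict

# Property names that denote person identifiers (taken from the text above).
PERSON_IDENTIFIER_NAMES = {"ORCID", "ISNI", "FULL_NAME", "SCOPUS_AUTHOR_ID"}

neighbors = defaultdict(set)


def connect(a, b):
    """Add an undirected edge between two (name, value) nodes."""
    neighbors[a].add(b)
    neighbors[b].add(a)


def other_identifiers(identifier):
    """Walk identifier -> person-root -> all other person identifiers."""
    found = set()
    for root in neighbors.get(identifier, ()):
        if root[0] != "person-root":
            continue
        for node in neighbors[root]:
            if node != identifier and node[0] in PERSON_IDENTIFIER_NAMES:
                found.add(node)
    return found


# Hypothetical person with placeholder identifier values.
root = ("person-root", "root-1")
connect(root, ("ORCID", "orcid-placeholder"))
connect(root, ("ISNI", "isni-placeholder-1"))
connect(root, ("ISNI", "isni-placeholder-2"))
connect(root, ("FULL_NAME", "John Doe"))
connect(root, ("DOI", "doi-placeholder"))  # a research output, not an identifier

for name, value in sorted(other_identifiers(("ORCID", "orcid-placeholder"))):
    print(name, value)
```

Filtering on the property name keeps research outputs such as the DOI node out of the result, even though they are attached to the same *person-root* node.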
|
||
### Research outputs connected to persons | ||
|
||
| one person with three research outputs | three persons with one research output | | ||
|-------------------------------------------------|-------------------------------------------------| | ||
| <img src="images/rcg-resout1.jpg" height="200"> | <img src="images/rcg-resout2.jpg" height="130"> | | ||
|
||
In both figures, nodes in blue are related to a person and nodes in yellow to journal articles. | ||
The person in the left figure is identified by *FULL_NAME*, *ISNI* and *ORCID*, | ||
which are connected to the *person-root* node (as in the previous section). This person | ||
has three journal articles, identified by *DOI*. These are also connected to the *person-root* node. | ||
In the right figure, there are three *person-root* nodes, representing three different persons | ||
(other nodes with person identifiers are not shown for readability). | ||
All these persons have contributed to the same research output, identified by *DOI*. | ||
|
||
### Return to main README.md file | ||
|
||
[Return to main README.md file](../README.md). |