Added support for FAIR4RS metrics #478

karacolada · 2024-01-11T15:34:12Z

Description

This PR adds support for a new set of metrics, namely the FAIR4RS (research software) metrics as laid out in the deliverable available on Zenodo.

Specifically, the changes are as follows:

added two new metric YAML files, one with metric tests for general research software, and one with CESSDA-specific metric tests
added a new GitHub harvester that uses the GitHub API to collect the necessary info for the provided repository
- added a use_github attribute to the API definition to make running this harvester optional (similar to use_datacite)
- re-generated the swagger models (see notes below)
- added a config/github.cfg file to define a GitHub API token (providing one increases the rate limit)
modified any regular expressions testing for the validity of metric (test) identifiers
added new tests to the license evaluator to implement the first FAIR4RS metric, FRSM-15
modified association between evaluator class and metric identifier (see below for details)
added documentation (see below for details)
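
As an illustration of the optional token, here is a minimal sketch of how a harvester could turn the contents of a config file into request headers. The section name `github`, the key `token`, and the function `build_github_headers` are assumptions made for this example; the real `config/github.cfg` and harvester may be organised differently.

```python
import configparser


def build_github_headers(cfg_text: str) -> dict:
    """Return request headers for the GitHub REST API.

    The section "github" and key "token" are assumptions for this sketch,
    not necessarily the layout of the real config/github.cfg.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    headers = {"Accept": "application/vnd.github+json"}
    # A missing section or key falls back to an empty token.
    token = parser.get("github", "token", fallback="").strip()
    if token:
        # Authenticated requests are subject to a higher rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Without a token the function still returns usable headers, which matches the idea that un-authenticated access is allowed but rate-limited more strictly.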

All these changes are non-breaking. Evaluators can still be associated with a single metric, but can also be modified as described below to cater to metrics from different sources. The GitHub harvester is disabled by default, so the behaviour does not change unless use_github is set.

Details and notes

To allow an evaluator class (and its test functions) to be shared by metrics with different identifiers, the following steps were taken in the modified evaluator (FAIREvaluatorLicense):

passed a list of metrics to self.set_metric() (see code)
introduced a self.metric_test_map dictionary that maps each test function to one or more metric test identifiers (see code)
modified the test definition verification in test functions (see code example)
- note that this change is necessary for any test function of an evaluator with multiple metrics associated, not just for shared test functions
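
The mapping idea can be sketched roughly as follows. This is a simplified stand-in rather than the actual `FAIREvaluatorLicense` code: the class name, the method `resolve_test_id`, and all identifiers in the dictionary are placeholders invented for the example.

```python
class LicenseEvaluatorSketch:
    """Toy evaluator whose test functions can serve several metrics."""

    def __init__(self, active_metric_tests):
        # Identifiers of the metric tests defined in the loaded metric file.
        self.active_metric_tests = set(active_metric_tests)
        # Each test function maps to every metric test identifier it can serve.
        # All identifiers here are placeholders, not real metric test names.
        self.metric_test_map = {
            "test_license_available": ["METRIC-A-1", "METRIC-B-1"],
            "test_license_file_at_root": ["METRIC-B-CUSTOM-1"],
        }

    def resolve_test_id(self, test_function_name):
        """Return the first mapped identifier present in the active metric file."""
        for test_id in self.metric_test_map.get(test_function_name, []):
            if test_id in self.active_metric_tests:
                return test_id
        # The test function has no definition in the active metric file.
        return None
```

A test function would call something like `resolve_test_id` first and skip itself when the result is `None`, which is why every test function of a multi-metric evaluator needs the check, not only the shared ones.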

This required the following changes outside the modified evaluator class:

modified FAIREvaluator.set_metric() to handle lists of metric identifiers as well as single metric strings see code
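
A minimal sketch of accepting either form, assuming the method picks the first identifier that is defined in the loaded metric file. The function name `select_metric`, its parameters, and the placeholder identifier `EXISTING-METRIC` are hypothetical; the real `set_metric()` may behave differently.

```python
def select_metric(metric_identifiers, defined_metrics):
    """Accept one identifier or a list and return the first defined one.

    Hypothetical helper for illustration, not the real FAIREvaluator.set_metric().
    """
    # Normalise a single string to a one-element list so both cases share one path.
    if isinstance(metric_identifiers, str):
        metric_identifiers = [metric_identifiers]
    for identifier in metric_identifiers:
        if identifier in defined_metrics:
            return identifier
    return None
```

Existing evaluators that pass a single string keep working unchanged, which is what makes the change non-breaking.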

The changes to the documentation are as follows:

added fuji_server/data/README.md with notes on what the data files are used for
- this is incomplete as I couldn't figure it out for every file
added section about development to README.md, covering:
- running the simpleclient for easy interaction with the tool
- a walkthrough of how the components interact
- what changes need to be made to add support for a new set of metrics
- how to update the API definition and regenerate the swagger-generated models

Notes on the changes generated by Swagger:

disregarded harvester.py which after regenerating didn't have an auth_token property and crashed some tests
re-generating added from datetime import date, datetime # noqa: F401 to every single model
fuji_server/models/any_of_fair_results_items.py was re-generated as fuji_server/models/any_of_fair_results_results_items.py with no further changes
maturity was switched from type str to type int
in fuji_server/models/identifier_included_output.py, the property object_identifier_included was added

Related issue: #ISSUE_NUMBER

Motivation and context

The changes were necessary to

accommodate the evaluation of a new set of metrics (non-FAIRsFAIR) and
implement one FAIR4RS metric looking at the license of a software repository (FRSM-15).

This PR also serves as an opportunity for feedback on the way the license evaluator was modified to allow for different metric test configurations, so that we can keep it in mind for the addition of future metrics.

How has this been tested?

Running hatch run test resulted in 18 passed tests with 9 deprecation warnings.

Screenshots (if appropriate)

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Code style update (formatting, renaming)
Refactoring (no functional changes, no api changes)
Build related changes
Documentation content changes
Other (please describe):

Checklist

I have read the contributor guide.
My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have added tests to cover my changes.
All new and existing tests passed.


huberrob · 2024-02-28T08:49:13Z

fuji_server/evaluators/fair_evaluator_license.py

    def setLicenseDataAndOutput(self):
        self.license_info = []
        specified_licenses = self.fuji.metadata_merged.get("license")
+        if specified_licenses is None:  # try GitHub data


I think it has to be tested here first whether an FRSM metric is used

Good call, I'll add that. Just a note, I've also been thinking about merging the github_data dictionary into the merged_metadata but had to focus on other aspects first - might want to consider this further down the line though.

Added in 732b48e.
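
For readers following the thread, the guard discussed here could look roughly like the sketch below. It is not the code from 732b48e: the function `pick_license_source`, its parameters, and the prefix check on the metric identifier are assumptions made for illustration.

```python
def pick_license_source(merged_metadata, github_data, metric_identifier):
    """Return license info, consulting GitHub data only for FRSM metrics.

    Hypothetical helper; the prefix check is an assumed way of detecting
    that a research software metric is being evaluated.
    """
    licenses = merged_metadata.get("license")
    # Fall back to the GitHub harvest only if nothing was found in the
    # merged metadata and an FRSM metric is in use.
    if licenses is None and metric_identifier.startswith("FRSM"):
        licenses = github_data.get("license")
    return licenses
```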

huberrob · 2024-02-28T08:53:03Z

fuji_server/helper/metadata_mapper.py

@@ -60,6 +60,7 @@ def flip_dict(dict_to_flip):
        "right_holder": {"label": "License", "sameAs": "http://purl.org/dc/terms/rightsHolder"},
        "object_size": {"label": "Object Size", "sameAs": "http://purl.org/dc/terms/extent"},
        "language": {"label": "Language", "sameAs": "http://purl.org/dc/terms/language"},
+        "license_path": {"label": "License Path", "sameAs": None},


What is the difference between license and license_path, can this be merged?

license contains the license name/identifier, so something like mit. The GitHub harvester fills it in the same way the metadata harvester does. license_path, on the other hand, is the file path to the license file in the repository, e.g. ./LICENSE.TXT. It's relevant for one of the CESSDA-specific metric tests.

I'm not sure how they might be merged since we need both pieces of information for the tests. Happy to adjust this if you have any suggestions though!
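
To make the distinction concrete, here is a small sketch with both values side by side. The dictionary contents are example values, and the root-level check is only an assumption about what a path-based test might look at; the helper `license_file_at_root` is hypothetical.

```python
from pathlib import PurePosixPath

# Example values only: an identifier and a repository-relative file path.
harvested = {"license": "mit", "license_path": "./LICENSE.TXT"}


def license_file_at_root(license_path):
    """Return True if the license file sits directly in the repository root.

    Hypothetical check; the actual CESSDA-specific test may differ.
    """
    if not license_path:
        return False
    # pathlib collapses a leading "./", so a root-level file has one path part.
    return len(PurePosixPath(license_path).parts) == 1
```

The identifier answers "which license is it?", while the path answers "where is the license file?", so a test may need either or both.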

Ah I see, this is indeed different, so let's keep it

huberrob · 2024-02-28T08:59:33Z

fuji_server/models/any_of_fair_results_results_items.py

@@ -2,6 +2,10 @@
 #
 # SPDX-License-Identifier: MIT

+# coding: utf-8
+
+from datetime import date, datetime  # noqa: F401


Is this used here?

Don't think so, it's auto-generated by swagger though (and pre-commit didn't seem to mind it).

Ah I see, this is indeed different, so let's keep it

Oops, wrong comment. As it's auto-generated and nothing complained about it, we'd better keep it ;)

github-actions · 2024-02-28T13:02:45Z

📋 Code Coverage

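| Package | Line Rate | Branch Rate | Health |
| --- | --- | --- | --- |
| . | 58% | 35% | ➖ |
| config | 100% | 100% | ✔ |
| controllers | 85% | 56% | ✔ |
| evaluators | 63% | 37% | ➖ |
| harvester | 60% | 51% | ➖ |
| helper | 66% | 55% | ➖ |
| models | 80% | 87% | ✔ |
| Summary | 70% (6807 / 9768) | 60% (2855 / 4782) | ➖ |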

github-actions · 2024-02-28T13:02:50Z

📋 Pytest Results

19 tests +1 19 ✅ +1 25s ⏱️ -1s
1 suites ±0 0 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit 732b48e. ± Comparison against base commit 708fe43.

karacolada and others added 30 commits November 14, 2023 16:54

understanding workflow

5e206f4

Merge branch 'software' of github.com:softwaresaved/fuji into software

9dab636

Merge branch 'master' into software

233805f

added microdata keywords for GitHub webpages

e4179fd

notes on data files

34daec8

debugging prints

5e4f48d

ignore google DB

1dc2315

add license_path (used in GitHub) to metadata

62d4775

data update

8a6a37a

metric version configuration added to simpleclient

4fee564

Merge branch 'pangaea-data-publisher:master' into software

3c3db7f

harvester notes

efc5c22

Merge branch 'software' of github.com:softwaresaved/fuji into software

6f9ae6d

added GitHub harvester

5909b35

ignore github cofig file from now on

f327ac1

fixed config read

096bb44

GitHub licenses recognised by evaluator

5bacfb3

FRSM metric recognition

7088573

#15 reuse metric test code

2848067

Merge branch 'pangaea-data-publisher:master' into software

389be3c

correct test mapping for license

354ac82

added domain-specific tests

4654e39

#15 stubs for new tests

27847b2

#15 implemented test txt at root

86445c0

fixed debug messages for FRSM metrics

3b087e2

#15 more specific CESSDA-1 test

e8817ac

fixed metric test score visualisation in table

93a7d95

update notes

f0d65d7

#15 implemented CESSDA-3

2a2b932

#15 CESSDA-2 fast pass if CESSDA-3

4bb7305

karacolada and others added 20 commits December 18, 2023 10:44

remove added debug prints

3a8b107

updated documentation

d837efb

Adding use_github to API call

0883b9d

swagger.json -> openapi.json

cc93f8b

Merge branch 'software' into sync-software

f023358

Merge pull request #22 from softwaresaved/sync-software

ceffe3d

Synchronise with base repo

#21 update dependencies

9e0dd68

#21 tidy up dev files

740fb49

linting checks pass

c4a822f

regenerated swagger

850ad34

Revert "regenerated swagger"

2e71b2b

This reverts commit 850ad34.

Merge branch 'pangaea-data-publisher:master' into software

dce41d6

Update github.cfg

41aba40

allow un-authenticated connection to GitHub

c57216d

added functional software eval test

7612a98

display use_github in simpleclient

1a932c0

regenerate swagger code

b2066d6

roll back changes to models/harvest.py

a893267

update docs on swagger

3218f05

index.php back to starting configuration

732e4d0

karacolada marked this pull request as ready for review January 16, 2024 11:34

karacolada mentioned this pull request Jan 16, 2024

Update docs and tidy up debug prints softwaresaved/fuji#21

Closed

huberrob reviewed Feb 28, 2024


check if FRSM metric used before accessing github_data

732b48e

huberrob merged commit b30b780 into pangaea-data-publisher:master Feb 28, 2024
4 checks passed

karacolada mentioned this pull request Apr 10, 2024

Continued extension for FAIR4RS metrics #495

Merged

14 tasks
