Commit: feat(evaluation): redid badges evaluation

amyheather committed Nov 21, 2024
1 parent 79a0201 commit 0604fc7

Showing 5 changed files with 1,394 additions and 86 deletions.
81 changes: 40 additions & 41 deletions evaluation/badges.qmd

@@ -8,7 +8,7 @@ bibliography: ../quarto_site/references.bib

This page evaluates the extent to which the author-published research artefacts meet the criteria of badges related to reproducibility from various organisations and journals.

- *Caveat: Please note that these criteria are based on available information about each badge online, and that we have likely differences in our procedure (e.g. allowed troubleshooting for execution and reproduction, not under tight time pressure to complete). Moreover, we focus only on reproduction of the discrete-event simulation, and not on other aspects of the article. We cannot guarantee that the badges below would have been awarded in practice by these journals.*
+ *Caveat: Please note that these criteria are based on available information about each badge online. Moreover, we focus only on reproduction of the discrete-event simulation, and not on other aspects of the article. We cannot guarantee that the badges below would have been awarded in practice by these journals.*

## Criteria

@@ -19,35 +19,34 @@ import pandas as pd
# Criteria and their definitions
criteria = {
-    'archive': 'Stored in a permanent archive that is publicly and openly accessible',
-    'id': 'Has a persistent identifier',
-    'license': 'Includes an open license',
-    'relevant': '''Artefacts are relevant to and contribute to the article's results''',
-    'complete': 'Complete set of materials shared (as would be needed to fully reproduce article)',
-    'structure': 'Artefacts are well structured/organised (e.g. to the extent that reuse and repurposing is facilitated, adhering to norms and standards of research community)',
-    'documentation_sufficient': 'Artefacts are sufficiently documented (i.e. to understand how it works, to enable it to be run, including package versions)',
-    'documentation_careful': 'Artefacts are carefully documented (more than sufficient - i.e. to the extent that reuse and repurposing is facilitated - e.g. changing parameters, reusing for own purpose)',
-    # This criteria is kept seperate to documentation_careful, as it specifically requires a README file
-    'documentation_readme': 'Artefacts are clearly documented and accompanied by a README file with step-by-step instructions on how to reproduce results in the manuscript',
+    'archive': 'Artefacts are archived in a repository that is: (a) public (b) guarantees persistence (c) gives a unique identifier (e.g. DOI)',
+    'licence': 'Open licence',
+    'complete': 'Complete (all relevant artefacts available)',
+    'docs1': 'Documents (a) how code is used (b) how it relates to article (c) software, systems, packages and versions',
+    'docs2': 'Documents (a) inventory of artefacts (b) sufficient description for artefacts to be exercised',
+    'relevant': 'Artefacts relevant to paper',
    'execute': 'Scripts can be successfully executed',
-    'regenerated': 'Independent party regenerated results using the authors research artefacts',
-    'hour': 'Reproduced within approximately one hour (excluding compute time)',
+    'careful': 'Artefacts are carefully documented and well-structured to the extent that reuse and repurposing is facilitated, adhering to norms and standards',
+    'reproduce': 'Reproduced results (assuming (a) acceptably similar (b) reasonable time frame (c) only minor troubleshooting)',
+    'readme': 'README file with step-by-step instructions to run analysis',
+    'dependencies': 'Dependencies (e.g. package versions) stated',
+    'correspond': 'Clear how output of analysis corresponds to article'
}
# Evaluation for this study
eval = pd.Series({
    'archive': 0,
-    'id': 0,
-    'license': 1,
-    'relevant': 1,
+    'licence': 1,
    'complete': 0,
-    'structure': 0,
-    'documentation_sufficient': 0,
-    'documentation_careful': 0,
-    'documentation_readme': 0,
+    'docs1': 0,
+    'docs2': 0,
+    'relevant': 1,
    'execute': 1,
-    'regenerated': 0,
-    'hour': 0,
+    'careful': 0,
+    'reproduce': 0,
+    'readme': 0,
+    'dependencies': 0,
+    'correspond': 0
})
# Get list of criteria met (True/False) overall
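*Note: the rest of this code cell is collapsed in the diff view. As a purely illustrative sketch (not the repository's confirmed code), the step flagged by the comment above could look like the following, assuming the 0/1 `eval` series is converted to booleans:*

```python
# Hypothetical sketch - the actual code is collapsed in this diff.
# Convert the 0/1 evaluation series into True/False per criterion,
# then list the criteria that were met in this assessment.
eval_met = eval.astype(bool)
met = list(eval_met[eval_met].index)
print(met)  # expected here: ['licence', 'relevant', 'execute']
```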
@@ -82,10 +81,10 @@ def create_criteria_list(criteria_dict):
    return(formatted_list)
# Define groups of criteria
-criteria_share_how = ['archive', 'id', 'license']
-criteria_share_what = ['relevant', 'complete']
-criteria_doc_struc = ['structure', 'documentation_sufficient', 'documentation_careful', 'documentation_readme']
-criteria_run = ['execute', 'regenerated', 'hour']
+criteria_share_how = ['archive', 'licence']
+criteria_share_what = ['complete', 'relevant']
+criteria_doc_struc = ['docs1', 'docs2', 'careful', 'readme', 'dependencies', 'correspond']
+criteria_run = ['execute', 'reproduce']
# Create text section
display(Markdown(f'''
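*Note: the body of `create_criteria_list` and the Markdown template are collapsed in the diff view. A minimal sketch of what such a helper could do is shown below; the tick/cross bullet formatting and the use of the global `eval` series are assumptions, not the repository's confirmed implementation:*

```python
# Hypothetical sketch - the real function body is collapsed in this diff.
def create_criteria_list(criteria_dict):
    '''Return a Markdown bullet list, marking each criterion as met or unmet.'''
    formatted_list = '\n'.join(
        f"* {'✅' if eval[key] else '❌'} {description}"
        for key, description in criteria_dict.items()
    )
    return(formatted_list)
```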
@@ -118,39 +117,39 @@ Criteria related to running and reproducing results -
# Full badge names
badge_names = {
    # Open objects
-    'open_acm': 'ACM "Artifacts Available"',
    'open_niso': 'NISO "Open Research Objects (ORO)"',
    'open_niso_all': 'NISO "Open Research Objects - All (ORO-A)"',
+    'open_acm': 'ACM "Artifacts Available"',
    'open_cos': 'COS "Open Code"',
    'open_ieee': 'IEEE "Code Available"',
    # Object review
    'review_acm_functional': 'ACM "Artifacts Evaluated - Functional"',
    'review_acm_reusable': 'ACM "Artifacts Evaluated - Reusable"',
    'review_ieee': 'IEEE "Code Reviewed"',
    # Results reproduced
-    'reproduce_niso': 'NISO "Results Reproduced (ROR-R)"',
    'reproduce_acm': 'ACM "Results Reproduced"',
+    'reproduce_niso': 'NISO "Results Reproduced (ROR-R)"',
    'reproduce_ieee': 'IEEE "Code Reproducible"',
    'reproduce_psy': 'Psychological Science "Computational Reproducibility"'
}
# Criteria required by each badge
badges = {
    # Open objects
-    'open_niso': ['archive', 'id', 'license'],
-    'open_niso_all': ['archive', 'id', 'license', 'complete'],
-    'open_acm': ['archive', 'id'],
-    'open_cos': ['archive', 'id', 'license', 'complete', 'documentation_sufficient'],
+    'open_acm': ['archive'],
+    'open_niso': ['archive', 'licence'],
+    'open_niso_all': ['archive', 'licence', 'complete'],
+    'open_cos': ['archive', 'licence', 'docs1'],
    'open_ieee': ['complete'],
    # Object review
-    'review_acm_functional': ['documentation_sufficient', 'relevant', 'complete', 'execute'],
-    'review_acm_reusable': ['documentation_sufficient', 'documentation_careful', 'relevant', 'complete', 'execute', 'structure'],
+    'review_acm_functional': ['docs2', 'relevant', 'complete', 'execute'],
+    'review_acm_reusable': ['docs2', 'relevant', 'complete', 'execute', 'careful'],
    'review_ieee': ['complete', 'execute'],
    # Results reproduced
-    'reproduce_niso': ['regenerated'],
-    'reproduce_acm': ['regenerated'],
-    'reproduce_ieee': ['regenerated'],
-    'reproduce_psy': ['regenerated', 'hour', 'structure', 'documentation_readme'],
+    'reproduce_acm': ['reproduce'],
+    'reproduce_niso': ['reproduce'],
+    'reproduce_ieee': ['reproduce'],
+    'reproduce_psy': ['reproduce', 'readme', 'dependencies', 'correspond']
}
# Identify which badges would be awarded based on criteria
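*Note: the award logic itself is collapsed in the diff view, but the `create_badge_callout` call below shows it produces a dict named `award`. Given the structures above, that step might be sketched as follows; the dict comprehension is an assumption, not the repository's confirmed code:*

```python
# Hypothetical sketch - a badge is awarded only if every one of its
# required criteria was met (i.e. has a value of 1 in `eval`).
award = {
    badge: all(eval[criterion] == 1 for criterion in required)
    for badge, required in badges.items()
}
```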
@@ -256,12 +255,12 @@ create_badge_callout({k: v for (k, v) in award.items() if k.startswith('reproduc

* "Open Code"

- **Institute of Electrical and Electronics Engineers (IEEE)** (@institute_of_electrical_and_electronics_engineers_ieee_about_nodate)
+ **Institute of Electrical and Electronics Engineers (IEEE)** (@institute_of_electrical_and_electronics_engineers_ieee_about_2024)

* "Code Available"
* "Code Reviewed"
* "Code Reproducible"

- **Psychological Science** (@hardwicke_transparency_2023 and @association_for_psychological_science_aps_psychological_2023)
+ **Psychological Science** (@hardwicke_transparency_2024 and @association_for_psychological_science_aps_psychological_2024)

* "Computational Reproducibility"
* "Computational Reproducibility"
2 changes: 1 addition & 1 deletion logbook/posts/2024_07_12/index.qmd

@@ -2,7 +2,7 @@
title: "Day 8"
author: "Amy Heather"
date: "2024-07-12"
-categories: [reproduce, guidelines, compendium]
+categories: [reproduce, evaluation, compendium]
bibliography: ../../../quarto_site/references.bib
---

2 changes: 1 addition & 1 deletion logbook/posts/2024_07_15/index.qmd

@@ -2,7 +2,7 @@
title: "Day 9"
author: "Amy Heather"
date: "2024-07-15"
-categories: [guidelines, compendium]
+categories: [evaluation, compendium]
bibliography: ../../../quarto_site/references.bib
---

23 changes: 23 additions & 0 deletions logbook/posts/2024_11_21/index.qmd

@@ -0,0 +1,23 @@
---
title: "Day 19"
author: "Amy Heather"
date: "2024-10-21"
categories: [evaluation]
---

::: {.callout-note}

Redid badge evaluation.

:::

## 09.53-09.57: Revisit evaluation

Revisited and revised the badge criteria to (a) bring them up to date, and (b) make sure they are *specific* to the descriptions from each badge. Hence, redoing the evaluations for all eight studies.

Notes:

* Relevant - yes, as it's the model needed, even if it is within an app
* Executes - yes, although this required significant troubleshooting, as I had to extract the model from the app code

Ran this by Tom, who agreed. Results remain the same (3 criteria met, 0 badges).