Merge pull request #259 from coderefinery/swi_march24
Updates for March 24 workshop
eglerean authored Mar 18, 2024
2 parents e8cab9c + 14e39de commit 89dabef
Showing 8 changed files with 72 additions and 56 deletions.
23 changes: 15 additions & 8 deletions content/dependencies.md
Original file line number Diff line number Diff line change
@@ -6,19 +6,20 @@

```{instructor-note}
- 10 min teaching
- 20 min exercises
- 10 min demo
```

Our codes often depend on other codes that in turn depend on other codes ...

- **Reproducibility**: We can version-control our code with Git but how should we version-control dependencies?
How can we capture and communicate dependencies?
- **Dependency hell**: Different codes on the same environment can have conflicting dependencies.

```{figure} img/python_environment.png
:alt: An image showing a mess of dependencies in a Python environment
```{figure} img/dependency.png
:alt: An image showing blocks (=codes) depending on each other for stability
:width: 60%
From [xkcd](https://xkcd.com/).
From [xkcd - dependency](https://xkcd.com/2347/). Another image that might be familiar to some of you working with Python can be found on [xkcd - superfund](https://xkcd.com/1987/).
```

````{discussion} Kitchen analogy
@@ -65,9 +66,9 @@ more reproducible it is.

---

## Exercises
## Demo

``````{challenge} (optional) Dependencies-1: Time-capsule of dependencies
``````{challenge} Dependencies-1: Time-capsule of dependencies
Situation: 5 students (A, B, C, D, E) wrote a code that depends on a couple of libraries.
They uploaded their projects to GitHub. We now travel 3 years into the future
and find their GitHub repositories and try to re-run their code before adapting
@@ -260,8 +261,8 @@ Answer in the collaborative document:
`````
``````

``````{challenge} (optional) Dependencies-2: Create a time-capsule for the future
Now it is time to create your own time-capsule and share it with the future
``````{challenge} Dependencies-2: Create a time-capsule for the future
Now we will demo creating our own time-capsule and sharing it with the future
world. If we asked you now which dependencies your project is using, what would
you answer? How would you find out? And how would you communicate this
information?
@@ -317,3 +318,9 @@ information?
````
`````
``````


```{keypoints}
- Recording dependencies with versions can make it easier for the next person to execute your code
- There are many tools to record dependencies
```
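
As a minimal sketch of the first keypoint, the following Python snippet lists every package visible to the current interpreter together with its version, in the `name==version` form used by `requirements.txt` files. The helper names `format_pins` and `installed_pins` are made up for this illustration; in practice, dedicated tools such as `pip freeze` or `conda env export` do this job more thoroughly.

```python
from importlib import metadata


def format_pins(pairs):
    """Turn (name, version) pairs into sorted, de-duplicated 'name==version' lines."""
    # Skip entries with a missing name or version (e.g. broken installations).
    unique = {(name, version) for name, version in pairs if name and version}
    ordered = sorted(unique, key=lambda pair: (pair[0].lower(), pair[1]))
    return [f"{name}=={version}" for name, version in ordered]


def installed_pins():
    """Collect pins for all distributions the running interpreter can see."""
    return format_pins(
        (dist.metadata.get("Name"), dist.metadata.get("Version"))
        for dist in metadata.distributions()
    )


if __name__ == "__main__":
    # Redirect this output to a file to keep it next to your code in Git.
    print("\n".join(installed_pins()))
```

Saving this output alongside the code gives the next person a concrete list of versions to start from, although it does not capture the Python version or system libraries.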
12 changes: 9 additions & 3 deletions content/environments.md
@@ -9,7 +9,7 @@

```{instructor-note}
- 10 min teaching/discussion
- 20 min exercise
- 10 min demo
```


@@ -18,7 +18,7 @@
Imagine if you didn't have to install things yourself, but instead you could
get a computer with the exact software for a task pre-installed? Containers
effectively do that, with various advantages and disadvantages. They are
**like an entire operating system with software installed, all in one file**,
**like an entire operating system with software installed, all in one file**.

```{figure} img/docker_meme.jpg
:alt: He said, then we will ship your machine. And that's how Docker was born.
@@ -277,4 +277,10 @@ the Docker containers through Singularity/Apptainer.
## Resources for further learning

- [Carpentries incubator lesson on Docker](https://carpentries-incubator.github.io/docker-introduction/)
- [Carpentries incubator lesson on Singularity/Apptainer](https://carpentries-incubator.github.io/singularity-introduction/)
- [Carpentries incubator lesson on Singularity/Apptainer](https://carpentries-incubator.github.io/singularity-introduction/)


```{keypoints}
- Containers can be helpful if complex setups are needed to run specific software
- They can also be helpful for prototyping without "messing up" your own computing environment, or for running software that requires a different operating system than your own
```
Binary file added content/img/dependency.png
13 changes: 5 additions & 8 deletions content/index.rst
@@ -10,19 +10,16 @@ match up? It's unpleasant for both you and science.
In this lesson we will explore different methods and tools for better
reproducibility in research software and data. We will demonstrate how version
control, workflows, containers, and package managers can be used to **record
reproducible environments and computational steps** for our future selves.
reproducible environments and computational steps** for our future selves and others.


.. admonition:: Learning outcomes

By the end of this lesson, learners should:
- be able to apply well organized directory structure for their project
- remember the FAIR principles
- understand that code can have dependencies, and know how to document them
- if a computational study contains several steps, be able to document them
- know about use cases for containers
- know the pros and cons of manual documentation vs. scripted automation vs. workflow management

- Be able to apply a well-organized directory structure to their project
- Understand that code can have dependencies, and know how to document them
- Be able to document computational steps, and have an idea of when this can be useful
- Know about use cases for containers

.. prereq::

5 changes: 4 additions & 1 deletion content/motivation.md
@@ -2,7 +2,6 @@

```{instructor-note}
- 10 min teaching/discussion
- 0 min exercises
```

```{figure} img/research_comic_phd.gif
@@ -69,3 +68,7 @@ This also means that you can think about it from the beginning of your research
- ...
- (share your experience, but constructively)
````

```{keypoints}
- Without reproducibility in scientific computing, everyone would have to start a new project / code from scratch
```
65 changes: 30 additions & 35 deletions content/organizing-projects.md
@@ -8,7 +8,7 @@
- 10 min teaching incl. discussions
```

One of the most basic steps to make your work reproducible is to organize your projects well.
One of the first steps to make your work reproducible is to organize your projects well.
Let's go over some of the basic things which people have found to work (and not to work).


@@ -57,35 +57,12 @@ project_name/
```
* Check the [Git-intro lesson](https://coderefinery.github.io/git-intro/) for a reminder.

---

## Reproducible publications

- Git can be used to collaborate on manuscripts written in, e.g., LaTeX and other text-based formats but other tools exist:
- [Overleaf](https://www.overleaf.com): an online, collaborative LaTeX editor (has Git integration)
- [Authorea](https://www.authorea.com): collaborative platform for preprints (apparently also has Git integration)
- [HackMD](https://hackmd.io/): an online collaborative Markdown editor (has Git integration)
- [Manuscripts.io](https://www.manuscripts.io/): a collaborative authoring tool that supports scientific content and reproducibility.
- Google Docs can be a good alternative

- Many tools exist to assist in making scholarly output reproducible:
- [rrtools](https://github.com/benmarwick/rrtools): instructions, templates, and functions for writing a reproducible article or report with R.
- [Jupyter Notebooks](https://jupyter.org): web-based computational environment for creating code and text based notebooks that can be used as supplementary material for articles; see also our [Jupyter lesson](https://coderefinery.github.io/jupyter/) later in this workshop.
- [Binder](https://mybinder.org): makes a repository with Jupyter notebooks available in an executable environment (discussed later in the [Jupyter lesson](https://coderefinery.github.io/jupyter/)).
- ["Research compendia"](http://inundata.org/talks/rstd19/#/): a set of good practices for
reproducible data analysis in R, but much is transferable to other languages.

```{seealso}
Do you want to practice your reproducibility skills and get inspired by working with other people's code/data? Join a [ReproHack event](https://www.reprohack.org/event/)!
```


---

## Discussion on reproducibility

````{discussion} Discuss in collaborative document or with your team members
````{discussion} Discuss in the collaborative document:
**How do you collaborate on writing academic papers?**
```
- Are you using version control for academic papers?
@@ -109,17 +86,35 @@ Do you want to practice your reproducibility skills and get inspired by working
```
````

````{discussion} Discuss in collaborative document or with your team members
```
- What tools are you using when organizing your projects?
- ...
- ...
- (share your experience)
```
````
## Some tools and templates

- [R devtools](https://devtools.r-lib.org/)
- [Python cookiecutter template](https://github.com/Materials-Data-Science-and-Informatics/fair-python-cookiecutter)
- [Reproducible research template](https://github.com/the-turing-way/reproducible-project-template) by the Turing Way

More tools and templates in [Heidi Seibold's blog](https://heidiseibold.ck.page/posts/setting-up-a-fair-and-reproducible-project).

## Reproducible publications

- Git can be used to collaborate on manuscripts written in, e.g., LaTeX and other text-based formats but other tools exist, some with Git integration:
- [Overleaf](https://www.overleaf.com) (LaTeX) or [Typst](https://typst.app/) (its own markup language): online, collaborative editors
- [Authorea](https://www.authorea.com): collaborative platform for preprints
- [HackMD](https://hackmd.io/) or [HedgeDoc](https://hedgedoc.org/): online collaborative Markdown editors
- [Manuscripts.io](https://www.manuscripts.io/): a collaborative authoring tool that supports scientific content and reproducibility.
- Google Docs can be a good alternative

- Many tools exist to assist in making scholarly output reproducible:
- [rrtools](https://github.com/benmarwick/rrtools): instructions, templates, and functions for writing a reproducible article or report with R.
- [Jupyter Notebooks](https://jupyter.org): web-based computational environment for creating code and text based notebooks that can be used as supplementary material for articles; see also our [Jupyter lesson](https://coderefinery.github.io/jupyter/) later in this workshop.
- [Binder](https://mybinder.org): makes a repository with Jupyter notebooks available in an executable environment (discussed later in the [Jupyter lesson](https://coderefinery.github.io/jupyter/)).
- ["Research compendia"](http://inundata.org/talks/rstd19/#/): a set of good practices for
reproducible data analysis in R, but much is transferable to other languages.

```{seealso}
Do you want to practice your reproducibility skills and get inspired by working with other people's code/data? Join a [ReproHack event](https://www.reprohack.org/event/)!
```

```{keypoints}
- An organized project directory structure helps with reproducibility.
- Reproducibility makes work easier for the next person working on the project - and that might be you in a few years!
```
5 changes: 4 additions & 1 deletion content/where-to-go.md
@@ -47,4 +47,7 @@ However, you will not always need all of them. As with so many things, it again
... can be very beneficial :)



```{keypoints}
- Not everything in this lesson might be useful right now, but it is good to know that these things exist if you ever get into a situation that requires such a solution.
- Caring about reproducibility makes work easier for the next person working on the project - and that might be you in a few years!
```
5 changes: 5 additions & 0 deletions content/workflow-management.md
@@ -234,3 +234,8 @@ Tools like Snakemake help us with **reproducibility** by supporting us with **au
- [Common Workflow Language](https://www.commonwl.org/)
- Many [specialized frameworks](https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems) exist.
- [Book on building reproducible analytical pipelines with R](https://raps-with-r.dev/)

```{keypoints}
- Computational steps can be recorded in many ways
- Workflow tools can help if there are many steps to be executed
```
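
To illustrate what such tools automate, here is a rough sketch of a make-style staleness check: a step is rerun only if its output is missing or older than one of its inputs. The function name `needs_rerun` is hypothetical and this is not how Snakemake is implemented; real workflow tools also track parameters, code changes, and software environments.

```python
from pathlib import Path


def needs_rerun(inputs, output):
    """Return True if `output` is missing or older than any file in `inputs`."""
    output = Path(output)
    if not output.exists():
        return True
    output_time = output.stat().st_mtime
    # Any input modified after the output was written makes the output stale.
    return any(Path(path).stat().st_mtime > output_time for path in inputs)
```

A driver script could call this check before each step and skip the ones that are up to date, which is the bookkeeping that becomes tedious to do by hand once a project has many steps.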
