This is a template repository of a containerised R workflow built on the targets framework, made portable using renv, and run manually or automatically using GitHub Actions. To use this template, click on the “Use this template” button and then select “Create a new repository”.
Check out the containerTemplateUtils package for handling common tasks related to this repo (sending emails, uploading files to AWS, etc.).
Note that git-crypt is not part of the template repo. See the EHA M&A handbook for how to add git-crypt.
Recommendations:
- One function per file in R/
- Non-function R scripts in another directory like scripts/
- Use the same names for targets and function arguments for those targets unless a function
- Nouns for targets, verbs for functions
- Use common suffixes for target types: _file for files, _raw for read-in but unprocessed data
- Use the fnmate and tflow RStudio Add-Ins to make this easy, create shortcuts for these add-ins (talk), or use the usethis package
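The “one function per file in R/” recommendation can be checked mechanically. The following is a minimal POSIX sh sketch; the helper name check_one_function_per_file is hypothetical, and it assumes functions are written as name <- function( (or name = function() starting at column 1, so it is a heuristic rather than a real R parser.

```sh
# Hypothetical helper: flag files in R/ that define more than one top-level function.
# Heuristic: counts lines that start with an R name followed by `<- function(` or `= function(`.
check_one_function_per_file() {
  dir="${1:-R}"
  status=0
  for f in "$dir"/*.R; do
    [ -e "$f" ] || continue  # the glob did not match any .R file
    n=$(grep -cE '^[A-Za-z.][A-Za-z0-9_.]* *(<-|=) *function *\(' "$f")
    if [ "$n" -gt 1 ]; then
      echo "$f defines $n functions"
      status=1
    fi
  done
  return "$status"
}
```

Running check_one_function_per_file from the project root prints each offending file and returns a non-zero status, so it could also be used as a pre-commit check.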
- Create repo from template
- Rename .Rproj file
- Streamline packages in packages.R
- Modify .gitattributes to include any files that may need encryption
- Initialize git-crypt for repo
- Add relevant environment variables to .env file
- Rename GitHub Actions workflows
- Update safe repo section of GitHub Action
- Add git-crypt key as secret variable to repo
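Two of the checklist items above (renaming the .Rproj file and renaming the workflow file) can be scripted. This is a sketch under stated assumptions: the helper name rename_template_files is hypothetical, it is run from the repository root, and the workflow is assumed to live at .github/workflows/container-workflow-template.yml. It uses plain mv; git mv would work equally well inside a git checkout. The name: field inside the YAML file still needs editing by hand.

```sh
# Hypothetical helper: rename the template's .Rproj file and workflow file.
rename_template_files() {
  new_name="$1"
  if [ -z "$new_name" ]; then
    echo "usage: rename_template_files NEW_NAME" >&2
    return 2
  fi

  # Expect exactly one .Rproj file in the repository root.
  set -- ./*.Rproj
  if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
    echo "expected exactly one .Rproj file" >&2
    return 1
  fi
  mv "$1" "./$new_name.Rproj"

  # Rename the template workflow file if it exists (assumed path).
  wf=".github/workflows/container-workflow-template.yml"
  if [ -e "$wf" ]; then
    mv "$wf" ".github/workflows/$new_name.yml"
  fi
}
```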
GitHub Actions allows you to automate, customise, and execute your research project workflows right in your GitHub repository.
In short, a GitHub Actions workflow is made up of one or more jobs. Each job is made up of steps that control the order in which actions are run to complete the job. A workflow is triggered by a specific event or run on a schedule, and it runs on a runner: a server with the GitHub Actions runner application installed, either hosted by GitHub or self-hosted on your own machines.
This whole workflow, including the event trigger and the runner it will run on, is specified in a workflow .yml file saved in the .github/workflows directory of the GitHub repository in which you want to use GitHub Actions.
This repository contains a template GitHub Actions workflow, with its corresponding .yml file, that illustrates how GitHub Actions can be used to run and maintain an R workflow that uses targets and renv.
A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.
Containers can be used within a GitHub Actions workflow and can be specified either at the job level or at the step level. If specified at the job level, all the steps within that job run inside that container. When specified at the step level, a different container can be used for each step.
The example/template workflow can be found inside the .github/workflows folder and is shown below:
```yaml
name: container-workflow-template
on:
  push:
    branches:
      - main
      - master
  pull_request:
    branches:
      - main
      - master
  workflow_dispatch:
    branches:
      - '*'
  #schedule:
  #  - cron: "0 8 * * *"
jobs:
  container-workflow-template:
    runs-on: ubuntu-latest # Run on GitHub Actions runner
    #runs-on: [self-hosted, linux, x64, onprem-aegypti] # Run the workflow on EHA aegypti runner
    #runs-on: [self-hosted, linux, x64, onprem-prospero] # Run the workflow on EHA prospero runner
    container:
      image: rocker/verse:4.1.2
    steps:
      - uses: actions/checkout@v2
      - name: Install system dependencies
        run: |
          apt-get update && apt-get install -y --no-install-recommends \
          libcurl4-openssl-dev \
          libssl-dev
      - name: Restore R packages
        run: |
          renv::restore()
        shell: Rscript {0}
      - name: Run targets workflow
        run: |
          targets::tar_make()
        shell: Rscript {0}
```
In this example, we show a data quality check workflow report for a nutrition survey of children 6-59 months old.
The trigger for GitHub Actions is specified in these lines in the workflow YAML file:
```yaml
on:
  push:
    branches:
      - main
      - master
  pull_request:
    branches:
      - main
      - master
  workflow_dispatch:
    branches:
      - '*'
  #schedule:
  #  - cron: "0 8 * * *"
```
This workflow runs automatically when there is a push or pull request event on the main (or master) branch of the repository. It can also be run manually from the GitHub Actions page, for any branch of the repository, through the workflow_dispatch specification in the workflow YAML file.
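A workflow_dispatch run can also be started from the command line with the GitHub CLI (gh), which must be installed and authenticated. The wrapper name run_workflow_on_branch is hypothetical, and the default workflow file name assumes the template's container-workflow-template.yml has not been renamed.

```sh
# Hypothetical wrapper: trigger the workflow_dispatch event on a chosen branch.
run_workflow_on_branch() {
  branch="$1"
  workflow="${2:-container-workflow-template.yml}"
  if [ -z "$branch" ]; then
    echo "usage: run_workflow_on_branch BRANCH [WORKFLOW_FILE]" >&2
    return 2
  fi
  # gh workflow run accepts a workflow file name; --ref selects the branch.
  gh workflow run "$workflow" --ref "$branch"
}
```

For example, run_workflow_on_branch my-feature-branch runs the template workflow against that branch, just like the “Run workflow” button on the Actions page.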
GitHub Actions workflows can also be scheduled to run at specific times and frequencies using the schedule specification in the workflow YAML file, which uses POSIX cron syntax. Scheduled workflows run on the latest commit on the default or base branch, and cron times are interpreted in UTC. The shortest interval at which you can run scheduled workflows is once every 5 minutes. In the example workflow, the schedule specification is set to run at 8 am (UTC) every day, but it has been commented out. If you would like to schedule your workflow runs, remove the # characters and then set the POSIX cron syntax to the frequency that you require. Note that while GitHub Actions is highly reliable, GitHub does not guarantee that a scheduled job will run if you are using GitHub-hosted runners, and jobs are less likely to run if you choose a popular run time (generally on the hour).
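To make the five cron fields concrete, this small sketch labels each field of an expression such as "0 8 * * *" and warns when the minute field is 0, following the note above about on-the-hour schedules. The helper name describe_cron and the suggested minute are hypothetical examples, not part of the template.

```sh
# Hypothetical helper: label the five fields of a cron expression.
describe_cron() {
  # Split on whitespace; set -f stops '*' from being expanded as a glob.
  set -f
  set -- $1
  set +f
  if [ "$#" -ne 5 ]; then
    echo "expected 5 fields, got $#" >&2
    return 1
  fi
  echo "minute: $1"
  echo "hour (UTC): $2"
  echo "day of month: $3"
  echo "month: $4"
  echo "day of week: $5"
  # On-the-hour times are busier on GitHub-hosted runners.
  if [ "$1" = "0" ]; then
    echo "note: runs on the hour; consider an off-peak minute such as 17" >&2
  fi
}
```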
The job for GitHub Actions is specified in these lines in the workflow YAML file:
```yaml
jobs:
  container-workflow-template:
    runs-on: ubuntu-latest # Run on GitHub Actions runner
    #runs-on: [self-hosted, linux, x64, onprem-aegypti] # Run the workflow on EHA aegypti runner
    #runs-on: [self-hosted, linux, x64, onprem-prospero] # Run the workflow on EHA prospero runner
    container:
      image: rocker/verse:4.1.2
```
The job named container-workflow-template is specified to run on runners hosted by GitHub Actions. These runners are identified by a label that specifies the operating system followed by the version. In the example workflow, the line runs-on: ubuntu-latest runs the workflow on a GitHub-hosted machine with the latest Ubuntu image that GitHub provides.
The job can also be run on a self-hosted GitHub Actions runner installed on EHA’s high performance computing machines, again using the runs-on workflow YAML specification. Labels unique to each of these runners are used to identify the specific machine to use. The syntax for specifying these runners is shown but commented out.
To make the GitHub Actions workflow more robust and reproducible, we set up a container at the job level. The container specified is a versioned R image (rocker/verse:4.1.2) that has the tidyverse and other R publishing tools installed. This image will generally be adequate for most workflows that require data wrangling and manipulation using tidyverse tools and reporting using rmarkdown. Some projects/workflows (such as those using spatial packages like sf) may benefit from a different R image, so change the container specification accordingly. To read more about available R images, see https://www.rocker-project.org/images/.
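Because the job runs inside a public image, the same pipeline can be reproduced locally. The following is a sketch assuming Docker is installed and the command is run from the project root; the helper name run_pipeline_in_container is hypothetical, and it skips the workflow's apt-get step, so any extra system libraries would need to be added to the image.

```sh
# Hypothetical helper: run renv::restore() and targets::tar_make() locally
# inside the same rocker image the GitHub Actions job uses.
run_pipeline_in_container() {
  image="${1:-rocker/verse:4.1.2}"
  # Mount the project at /project and use it as the working directory.
  docker run --rm \
    -v "$PWD":/project \
    -w /project \
    "$image" \
    Rscript -e 'renv::restore(); targets::tar_make()'
}
```

Passing a different image, for example run_pipeline_in_container rocker/geospatial:4.1.2, is a quick way to test a new image before changing the container specification in the workflow file.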
This repository has been set up as a private template repository. This means that EHA staff can use it to create new repositories with the same file structure.
This can be done as follows:
- In your GitHub account, go to the EcoHealth Alliance organisation (https://github.com/ecohealthalliance), then click on the green button labeled New.
- You will now be directed to the Create new repository page. Right at the top, you will see the Repository template heading. Click on the drop-down button right below it that says No template. You will then see all the available templates within EHA. Select the template named ecohealthalliance/container-template.
- Give your new repository a name, set the appropriate repository visibility, and then click on Create repository.
- You will now have a new repository with the same files and structure as this template repository.
- You can now make the necessary changes and additions that are specific to your workflow.
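The same steps can be done from the command line with the GitHub CLI (gh), assuming it is installed and you are authenticated with access to the EcoHealth Alliance organisation. The wrapper name create_repo_from_template is hypothetical; --template, --private, and --clone are options of gh repo create.

```sh
# Hypothetical wrapper: create a private EHA repo from the container template and clone it.
create_repo_from_template() {
  name="$1"
  if [ -z "$name" ]; then
    echo "usage: create_repo_from_template NEW_REPO_NAME" >&2
    return 2
  fi
  gh repo create "ecohealthalliance/$name" \
    --template ecohealthalliance/container-template \
    --private \
    --clone
}
```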
Your project may contain a mix of public and private content, so being able to encrypt the private contents of your project is very useful. It is recommended that you use git-crypt, which transparently encrypts selected files in the repository and can use PGP (Pretty Good Privacy) keys to give collaborators access. It takes a bit of effort to set up, but once activated it makes sharing secure and seamless. To set up PGP and git-crypt on a project based on this template, see the Encryption chapter of the EHA Modeling and Analytics Handbook.
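git-crypt decides which files to encrypt from rules in .gitattributes of the form PATTERN filter=git-crypt diff=git-crypt. The sketch below adds such a rule idempotently; the helper name add_crypt_pattern and the example pattern data/private/** are hypothetical, so adapt them to your project. It is meant to be run from the repository root.

```sh
# Hypothetical helper: mark a path pattern for git-crypt encryption.
add_crypt_pattern() {
  pattern="$1"
  if [ -z "$pattern" ]; then
    echo "usage: add_crypt_pattern PATTERN" >&2
    return 2
  fi
  rule="$pattern filter=git-crypt diff=git-crypt"
  touch .gitattributes
  # Skip if the exact rule is already present (fixed-string, whole-line match).
  if grep -qxF "$rule" .gitattributes; then
    return 0
  fi
  # Make sure the existing last line ends with a newline before appending.
  if [ -s .gitattributes ] && [ -n "$(tail -c 1 .gitattributes)" ]; then
    printf '\n' >> .gitattributes
  fi
  printf '%s\n' "$rule" >> .gitattributes
}
```

A typical sequence would be git-crypt init, then add_crypt_pattern 'data/private/**', then git-crypt add-gpg-user for each collaborator's PGP key, and finally git-crypt status to confirm which files are encrypted.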
Once you have enabled git-crypt on your project, you will need to make the following edits to the container-workflow-template.yml file to be able to perform the symmetric key decryption described here.
Here is the container-workflow-template.yml file updated to allow and perform symmetric key decryption:
```yaml
name: container-workflow-encrypted-template
on:
  push:
    branches:
      - main
      - master
  pull_request:
    branches:
      - main
      - master
  workflow_dispatch:
    branches:
      - '*'
  #schedule:
  #  - cron: "0 8 * * *"
env:
  GIT_CRYPT_KEY64: ${{ secrets.GIT_CRYPT_KEY64 }}
jobs:
  container-workflow-encrypted-template:
    runs-on: ubuntu-latest # Run on GitHub Actions runner
    #runs-on: [self-hosted, linux, x64, onprem-aegypti] # Run the workflow on EHA aegypti runner
    #runs-on: [self-hosted, linux, x64, onprem-prospero] # Run the workflow on EHA prospero runner
    container:
      image: rocker/verse:4.1.2
    steps:
      - uses: actions/checkout@v2
      - name: Install system dependencies
        run: |
          apt-get update && apt-get install -y --no-install-recommends \
          git-crypt \
          libcurl4-openssl-dev \
          libssl-dev
      - name: Decrypt repository using symmetric key
        run: |
          echo "$GIT_CRYPT_KEY64" > git_crypt_key.key64 && base64 -di git_crypt_key.key64 > git_crypt_key.key && git-crypt unlock git_crypt_key.key
          rm git_crypt_key.key git_crypt_key.key64
      - name: Restore R packages
        run: |
          renv::restore()
        shell: Rscript {0}
      - name: Run targets workflow
        run: |
          targets::tar_make()
        shell: Rscript {0}
```
Once you have edited your workflow YAML file, and before you push the changes to GitHub, you will need to add the symmetric key to your GitHub repository as a secret.
First, export the repository’s symmetric key by running this in your project directory:
```sh
git-crypt export-key git_crypt_key.key
```
git_crypt_key.key can now be used to decrypt the repository, and you can provide it to GitHub Actions as a secret environment variable (see https://docs.github.com/en/actions/security-guides/encrypted-secrets). However, since it is binary data, you will need to convert it to base64 first. So run something like:
```sh
cat git_crypt_key.key | base64 | pbcopy
```
to convert this file to base64 data and copy it to the clipboard (pbcopy is macOS-specific), then paste it into GitHub’s secret environment variable field as GIT_CRYPT_KEY64. Keep git_crypt_key.key out of version control, since anyone who has it can decrypt the repository.
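The secret only works if decoding it reproduces the key byte for byte. This sketch checks that round trip: encode_key produces the single-line base64 text to paste as GIT_CRYPT_KEY64, and decode_key repeats the decoding that the workflow's decryption step performs. The helper names are hypothetical, and the flags assume GNU coreutils base64 (as on the Ubuntu-based runner image); on macOS, base64 -i means something different.

```sh
# Hypothetical helpers for the GIT_CRYPT_KEY64 round trip (GNU coreutils base64).
encode_key() {
  # Single-line base64: tr removes the line wrapping GNU base64 adds by default.
  base64 < "$1" | tr -d '\n'
}

decode_key() {
  # Same decoding the workflow uses: -d decodes, -i ignores non-alphabet characters.
  printf '%s' "$1" | base64 -di > "$2"
}
```

To sanity-check an exported key, decode the output of encode_key git_crypt_key.key into a temporary file and compare it with the original using cmp before pasting the text into GitHub.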