
Pre-commit hooks

This repository uses the Python package pre-commit to manage pre-commit hooks. Pre-commit hooks are actions that run automatically, typically on each commit, to perform a common set of tasks. For example, a pre-commit hook might run code linting automatically and report any warnings before the code is committed, helping to ensure that all of our code meets a consistent quality standard.

Purpose

For this repository, we are using pre-commit for a number of purposes:

  • Checking for secrets being committed accidentally (note that detect-secrets uses a specific definition of a "secret", explained below);
  • Checking for any large files (over 5 MB) being committed; and
  • Cleaning Jupyter notebooks, which means removing all outputs, execution counts, Python kernels, and, for Google Colaboratory (Colab), stripping out user information.
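As an illustration of what the large-file check does, the sketch below flags any of a given set of files whose size exceeds 5 MB. The function name find_large_files is hypothetical, the hook actually configured for this repository may behave differently, and the sketch assumes "5 MB" means 5 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Assumption: "5 MB" is interpreted as 5 MiB (5 * 1024 * 1024 bytes).
MAX_BYTES = 5 * 1024 * 1024


def find_large_files(paths, max_bytes=MAX_BYTES):
    """Return the paths (as strings) of existing files larger than max_bytes."""
    large = []
    for path in map(Path, paths):
        # Missing paths (e.g. deleted files) are ignored rather than raising.
        if path.is_file() and path.stat().st_size > max_bytes:
            large.append(str(path))
    return large
```

You could call it with, for example, the staged file names reported by git diff --cached --name-only. A real pre-commit hook reports failure by exiting with a non-zero status, which is what blocks the commit.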

We have configured pre-commit to run automatically on every commit. Running on each commit means that problems are caught before they enter the repository's history, which keeps the repository in a healthy state.

⚠️ No pre-commit hooks are run on Google Colab notebooks pushed directly to GitHub. For security reasons, we strongly recommend that you download your notebook manually and commit it locally, so that the pre-commit hooks run on your changes.

Installation

Before pre-commit can run, you need to set it up on your system:

  • Install the pre-commit package into your Python environment from requirements.txt; and
  • Run pre-commit install in your terminal so that the hooks run whenever code is committed.
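To confirm that the second step worked, you can check that pre-commit install has written a hook script into the repository's .git/hooks directory. The helper below is a hypothetical convenience, not part of pre-commit, and it assumes a standard checkout where .git is a directory and git's core.hooksPath setting is not used.

```python
from pathlib import Path


def precommit_hook_installed(repo_root="."):
    """Heuristically check whether `pre-commit install` has been run.

    Assumes a standard checkout: .git is a directory and core.hooksPath is unset.
    """
    hook = Path(repo_root) / ".git" / "hooks" / "pre-commit"
    if not hook.is_file():
        return False
    # The script written by pre-commit refers to pre-commit itself; a
    # hand-written hook that does not is treated as "not installed".
    return "pre-commit" in hook.read_text(errors="replace")
```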

Using the detect-secrets pre-commit hook

⚠️ The detect-secrets package does its best to prevent accidental committing of secrets, but it can't catch everything. It doesn't replace good software development practices! See the definition of a secret for further information.

We use detect-secrets to check that no secrets are accidentally committed. This hook requires a baseline file, .secrets.baseline, in the root directory of the repository. If one is not already present, create it by running the following at the root of the repository:

detect-secrets scan > .secrets.baseline

Next, audit the baseline that has been generated by running:

detect-secrets audit .secrets.baseline

When you run this command, you'll enter an interactive console that presents each high-entropy string or other potential secret it found and asks you to confirm whether it is a real secret. Once the baseline has been audited, the hook can tell whether you later commit any new secrets and alert you accordingly.
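The baseline is a plain JSON file, so you can inspect it outside the interactive console. The sketch below lists findings that have not yet been audited. It assumes the layout used by recent detect-secrets releases (a top-level "results" object mapping file names to lists of findings, with an "is_secret" key added once a finding has been audited); the function name is hypothetical, so check your own .secrets.baseline before relying on it.

```python
import json
from pathlib import Path


def unaudited_findings(baseline_path=".secrets.baseline"):
    """Return (filename, line_number, type) for findings with no audit decision.

    Assumption: the baseline has a top-level "results" dict mapping each file
    name to a list of findings, and auditing adds an "is_secret" key.
    """
    baseline = json.loads(Path(baseline_path).read_text())
    pending = []
    for filename, findings in baseline.get("results", {}).items():
        for finding in findings:
            if "is_secret" not in finding:
                pending.append((filename, finding.get("line_number"), finding.get("type")))
    return pending
```

An empty list suggests every finding in the baseline has been reviewed.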

Definition of a "secret" according to detect-secrets

The detect-secrets documentation, as of January 2021, says it works:

...by running periodic diff outputs against heuristically crafted [regular expression] statements, to identify whether any new secret has been committed.

In other words, it scans your code changes for anything that matches one or more of its regular expression patterns. Because there is only a limited number of patterns, detect-secrets cannot detect every conceivable type of secret.
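To make this limitation concrete, here is a toy version of pattern-based scanning over the added lines of a diff. The two patterns and the scan_diff function are illustrative assumptions, not detect-secrets' actual plugins: a secret that matches neither pattern passes straight through.

```python
import re

# Illustrative patterns only; detect-secrets ships its own, larger plugin set.
PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_diff(diff_text):
    """Return (pattern_name, line) for each added diff line matching a pattern."""
    hits = []
    for line in diff_text.splitlines():
        # Only inspect added lines, skipping the "+++" file header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((name, line[1:]))
    return hits
```

Run against a diff that adds aws_key = "AKIAABCDEFGHIJKLMNOP" and db_pass = "hunter2", only the first line is reported: the password matches no pattern, so it would be committed unnoticed.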

To understand what types of secrets will be detected, read the detect-secrets caveats and the list of supported plugins. You should also give variables that hold secrets names containing words that trigger the KeywordDetector plugin; see its DENYLIST variable for the full list of words.
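The idea behind the keyword advice can be sketched as follows: flag string assignments whose variable name contains a sensitive word. The word list here is a small hypothetical subset, not the real DENYLIST, and the real KeywordDetector's matching rules are considerably more involved.

```python
import re

# Hypothetical subset of sensitive words; see detect-secrets' DENYLIST for the real list.
SENSITIVE_WORDS = ("password", "secret", "api_key", "token")

# Matches simple `name = "value"` or `name = 'value'` assignments.
ASSIGNMENT = re.compile(r"""^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*["']([^"']+)["']""")


def keyword_hits(source):
    """Return the variable names whose string assignment would be flagged."""
    flagged = []
    for line in source.splitlines():
        match = ASSIGNMENT.match(line)
        if match and any(word in match.group(1).lower() for word in SENSITIVE_WORDS):
            flagged.append(match.group(1))
    return flagged
```

With this sketch, db_password = "hunter2" is flagged but db_thing = "hunter2" is not, even though the value is identical; this is why the variable name matters.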

If pre-commit detects secrets during commit

If pre-commit detects any secrets when you try to create a commit, it will report what it found and where, so that you can check each finding.

If the detected secret is a false positive, there are two ways to resolve it and stop your commit from being blocked: inline allowlisting (recommended) or updating .secrets.baseline.

In either case, if an actual secret is detected (or a combination of actual secrets and false positives), first remove the actual secret before following either of these processes.

Inline allowlisting (recommended)

To exclude a false positive, add a pragma comment such as:

secret = "Password123"  # pragma: allowlist secret

or

#  pragma: allowlist nextline secret
secret = "Password123"
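Conceptually, these pragma comments tell the scanner to skip a line. The sketch below filters lines in that spirit; lines_to_scan is a hypothetical helper and a simplified model, not detect-secrets' own parsing, which accepts more comment syntaxes.

```python
import re

# Simplified patterns for the two pragma forms shown above.
SAME_LINE = re.compile(r"pragma:\s*allowlist\s+secret")
NEXT_LINE = re.compile(r"pragma:\s*allowlist\s+nextline\s+secret")


def lines_to_scan(source):
    """Yield (line_number, line) pairs not covered by an allowlist pragma."""
    skip_next = False
    for number, line in enumerate(source.splitlines(), start=1):
        if skip_next:
            # This line follows a "nextline" pragma, so it is exempt.
            skip_next = False
            continue
        if NEXT_LINE.search(line):
            skip_next = True
            continue
        if SAME_LINE.search(line):
            continue
        yield number, line
```

Only lines that survive this filter would be passed on to the secret-detection patterns.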

If the detected secret is actually a secret (or other sensitive information), remove the secret and re-commit; there is no need to add any pragma comments.

If your commit contains a mixture of false positives and actual secrets, remove the actual secrets first before adding pragma comments to the false positives.

Updating .secrets.baseline

To exclude a false positive, you can also update .secrets.baseline by repeating the two commands from the initial setup (scan, then audit).

During auditing, if the detected secret is actually a secret (or other sensitive information), remove the secret and re-commit. There is no need to update the .secrets.baseline file in this case.

If your commit contains a mixture of false positives and actual secrets, remove the actual secrets first before updating and auditing the .secrets.baseline file.

Keeping specific Jupyter notebook outputs

It may be necessary or useful to keep the outputs of certain Jupyter notebook cells, for example charts or graphs visualising some data. According to the nbstripout documentation, you can do this in one of two ways:

  1. Add a keep_output tag to the desired cell; or
  2. Add "keep_output": true to the desired cell's metadata.
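The keep_output rule can be modelled in a few lines of Python operating on a notebook's parsed JSON. This is a simplified sketch of the behaviour described above, not nbstripout's implementation: strip_outputs is a hypothetical name, it assumes the nbformat 4 layout, it clears execution counts on every code cell, and it ignores kernel and Colab metadata.

```python
def strip_outputs(notebook):
    """Clear code-cell outputs in place, except for cells marked keep_output.

    `notebook` is the parsed JSON of an .ipynb file (nbformat 4 layout assumed).
    """
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        metadata = cell.get("metadata", {})
        keep = bool(metadata.get("keep_output", False)) or "keep_output" in metadata.get("tags", [])
        if not keep:
            cell["outputs"] = []
        # Execution counts are cleared regardless, in this sketch.
        cell["execution_count"] = None
    return notebook
```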

In the classic Jupyter Notebook interface, you can access cell tags or metadata by enabling the "Tags" or "Edit Metadata" cell toolbar (View > Cell Toolbar > Tags; View > Cell Toolbar > Edit Metadata). For the tags approach, enter keep_output in the text field of each desired cell and press the "Add tag" button. For the metadata approach, press the "Edit Metadata" button on each desired cell and edit the metadata to look like this:

{
  "keep_output": true
}

This will tell the hook not to strip the resulting output of the desired cell(s), allowing the output(s) to be committed.
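If you cannot use the toolbar (for example, in Colab), the same metadata can be written directly into the .ipynb JSON. The helper below is hypothetical; it assumes the nbformat 4 layout and identifies cells by their zero-based index.

```python
import json
from pathlib import Path


def mark_keep_output(notebook_path, cell_indices):
    """Set "keep_output": true in the metadata of the given cells and save the file."""
    path = Path(notebook_path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    for index in cell_indices:
        cell = notebook["cells"][index]
        cell.setdefault("metadata", {})["keep_output"] = True
    # indent=1 keeps the file close to Jupyter's own formatting, limiting diff noise.
    path.write_text(json.dumps(notebook, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
```

For instance, mark_keep_output("analysis.ipynb", [3]) would mark the fourth cell of a hypothetical analysis.ipynb so that its output survives the hook.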

ℹ️ At the time of writing (March 2020), there is no way to add tags or metadata to Google Colab notebooks. If you want to keep some outputs, we strongly suggest downloading the notebook as an .ipynb file and editing its tags or metadata in Jupyter before committing.