Initial commit
hayleyschi authored Jul 5, 2024
0 parents commit d6373cb
Showing 15 changed files with 1,071 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .gitattributes
@@ -0,0 +1,3 @@
# Don't allow windows checkouts to convert `\n` to `\r\n`, as this
# breaks stuff that is meant to be run in linux-in-docker
* text=auto eol=lf
26 changes: 26 additions & 0 deletions .gitignore
@@ -0,0 +1,26 @@
# Credentials for accessing BigQuery
bq-service-account.json

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# pyenv
.python-version

# jupyter
.ipynb_checkpoints
.ipython/
.jupyter/
.local/

# sublime test/pycharm
.idea/
.DS_Store

# Emacs
*~

# Linux trash directories
.Trash-*/
117 changes: 117 additions & 0 deletions DEVELOPERS.md
@@ -0,0 +1,117 @@
# The Bennett Institute's default notebook environment


## Running Jupyter Lab

You will need to have Git and Docker installed; please see the
[`INSTALLATION_GUIDE.md`](INSTALLATION_GUIDE.md) for further details.

Windows and Linux users should double-click the `jupyter-lab` file.
Users on macOS should double-click `jupyter-lab-mac-os` instead.

This will build a Docker image with all software requirements installed,
start a new Jupyter Lab server, and then provide a link to access this
server.

The first time you run this command it may take some time to download
and install the necessary software. Subsequent runs should be much
faster.


## Adding or updating Python packages

To install a new package:

* Add it to the bottom of the `requirements.in` file.
* From the Jupyter Lab Launcher page, choose "Terminal" (in the
"Other" section).
* Run:
```sh
pip-compile -v
```
This will automatically update your `requirements.txt` file to
include the new package. (The `-v` just means "verbose", so you can
see progress, as this command can take a while to run.)
* Shut down the Jupyter server and re-run the `jupyter-lab` launcher
script.
* Docker should automatically install the new package before starting
the server.

To update an existing package the process is the same as above except
that instead of running `pip-compile -v` you should run:
```sh
pip-compile -v --upgrade-package <package_name>
```

To update _all_ packages you can run:
```sh
pip-compile -v --upgrade
```
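
As a rough illustration of the relationship between the two files: `pip-compile` reads the package names in `requirements.in` and writes pinned versions to `requirements.txt`. The sketch below checks that every name in the first appears in the second. The helper names (`package_names`, `unpinned`), the file contents, and the version number are hypothetical and are not part of this repo or of pip-tools.

```python
import re


def package_names(text):
    """Return normalised package names found in requirements-style text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and options such as -r or --hash
            continue
        # The name is everything before the first version specifier or extra
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            # Normalise: lowercase, and treat runs of -, _ and . as a single -
            names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    return names


def unpinned(requirements_in, requirements_txt):
    """Names listed in requirements.in that are missing from requirements.txt."""
    return sorted(package_names(requirements_in) - package_names(requirements_txt))


# Hypothetical file contents, inlined so the sketch runs anywhere.
example_in = "pandas\njupytext\n"
example_txt = "pandas==2.2.2\n    # via -r requirements.in\n"
print(unpinned(example_in, example_txt))  # prints ['jupytext']
```

A non-empty result would suggest that `pip-compile` has not been re-run since `requirements.in` was last edited.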


## Importing from `lib`

We used to have configuration which made Python files in the top-level
`lib` directory importable. However, this did not work reliably and users
developed a variety of different workarounds. We no longer make any
changes to Python's default import behaviour. Depending on what
workarounds you already have in place, this may make no difference to
you, or it may break your imports.

If you find your imports no longer work and you have imports of the
form:
```python
from lib import my_custom_library
```
Then you should move the `lib` directory to be inside `notebooks` and it
should work.

If your imports no longer work and they are of the form:
```python
import my_custom_library
```
Then you can move `lib/my_custom_library.py` to
`notebooks/my_custom_library.py`.
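
Both fixes rely on Python's default behaviour of searching the current working directory for modules. The sketch below emulates this, assuming the kernel's working directory is the folder containing the notebook (Jupyter's usual default). A temporary directory stands in for `notebooks`, and the module contents are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A temporary directory stands in for `notebooks/`; the module contents are
# made up for illustration.
notebooks = Path(tempfile.mkdtemp())
(notebooks / "lib").mkdir()
# An empty __init__.py is optional on modern Python (namespace packages), but
# including it keeps this sketch unambiguous.
(notebooks / "lib" / "__init__.py").write_text("")
(notebooks / "lib" / "my_custom_library.py").write_text("ANSWER = 42\n")

# Emulate a kernel started inside `notebooks/` by putting that directory at
# the front of the module search path.
sys.path.insert(0, str(notebooks))
importlib.invalidate_caches()

from lib import my_custom_library  # resolves to notebooks/lib/my_custom_library.py

print(my_custom_library.ANSWER)
```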


## Diffing notebook files

By default, changes to `.ipynb` files do not produce easily readable
diffs in GitHub. One solution is to enable the "[Rich Jupyter Notebook
Diffs][richdiff]" preview feature. You can find this by clicking your
account icon in the top right of the GitHub interface, choosing "Feature
preview", then "Rich Jupyter Notebook Diffs" and then "Enable".

[richdiff]: https://github.blog/changelog/2023-03-01-feature-preview-rich-jupyter-notebook-diffs/

Another option is to use [Jupytext][jupytext], which we have pre-added to the
list of installed packages. You can use either the `percent` or
`markdown` formats to create notebooks which have naturally readable
diffs, at the cost of not being able to save the outputs of cells within
the notebook.

[jupytext]: https://jupytext.readthedocs.io/en/latest/

To use the "paired" format in which a traditional `.ipynb` file is saved
alongside a pure-Python variant inside a `diffable_python` directory,
add a file called `jupytext.toml` to the root of your repo containing
these lines:
```toml
[formats]
"notebooks/" = "ipynb"
"notebooks/diffable_python/" = "py:percent"
```
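
To give a feel for why the `percent` format diffs well, the sketch below shows a hypothetical paired file and splits it into cells on the `# %%` markers. This is a simplified illustration only, not Jupytext's own parser, which also handles cell metadata and titles; `split_cells` and the file contents are made up for the example.

```python
# Hypothetical contents of a paired file such as
# notebooks/diffable_python/analysis.py
PERCENT_SOURCE = '''\
# %% [markdown]
# # Example analysis

# %%
import math
radius = 2

# %%
print(math.pi * radius ** 2)
'''


def split_cells(source):
    """Split percent-format text into (cell_type, body) pairs."""
    cells = []
    for line in source.splitlines():
        if line.startswith("# %%"):
            # A new cell starts at each marker; "[markdown]" flags a text cell
            kind = "markdown" if "[markdown]" in line else "code"
            cells.append((kind, []))
        elif cells:
            cells[-1][1].append(line)
    return [(kind, "\n".join(body).strip()) for kind, body in cells]


for kind, body in split_cells(PERCENT_SOURCE):
    print(f"--- {kind} cell ---")
    print(body)
```

Because each cell is plain text under a marker line, a change to one cell shows up in a diff as a change to just those lines.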

To prevent `.ipynb` files from showing in GitHub diffs, add these lines
to the bottom of the `.gitattributes` file:
```
# Don't show notebook files when diffing in GitHub
notebooks/**/*ipynb linguist-generated=true
```
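
If you would rather script this step, a minimal sketch follows. It appends the two lines only when they are not already present, so it is safe to run more than once. The `ensure_lines` helper is hypothetical, and the demo writes to a temporary file rather than a real repo.

```python
import tempfile
from pathlib import Path

LINES = [
    "# Don't show notebook files when diffing in GitHub",
    "notebooks/**/*ipynb linguist-generated=true",
]


def ensure_lines(path, lines):
    """Append any of `lines` not already in the file; return the ones added."""
    path = Path(path)
    existing = path.read_text().splitlines() if path.exists() else []
    missing = [line for line in lines if line not in existing]
    if missing:
        # newline="\n" keeps LF line endings, in keeping with `eol=lf`
        with open(path, "w", newline="\n") as handle:
            handle.write("\n".join(existing + missing) + "\n")
    return missing


# Demo against a throwaway file that starts with the repo's existing rule.
demo = Path(tempfile.mkdtemp()) / ".gitattributes"
demo.write_text("* text=auto eol=lf\n")
print(ensure_lines(demo, LINES))  # first call adds both lines
print(ensure_lines(demo, LINES))  # second call finds nothing to add
```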


## How to invite people to cite

Once a project is completed, please use the instructions [here](https://guides.github.com/activities/citable-code/) to deposit a copy of your code with Zenodo. You will need a free Zenodo account to do this. Depositing the code creates a DOI; once you have it, please add it to the README.

If there is a paper associated with this code, please change the 'how to cite' section to the citation and DOI for the paper. This allows us to build up citation credit.
35 changes: 35 additions & 0 deletions Dockerfile
@@ -0,0 +1,35 @@
# syntax=docker/dockerfile:1.2
FROM python:3.12-bookworm

# Install apt packages, using the host cache
COPY packages.txt /tmp/packages.txt
RUN --mount=target=/var/lib/apt/lists,type=cache,sharing=locked \
    --mount=target=/var/cache/apt,type=cache,sharing=locked \
    rm -f /etc/apt/apt.conf.d/docker-clean \
    && apt-get update \
    && sed 's/#.*//' /tmp/packages.txt \
        | xargs apt-get -y --no-install-recommends install

# Install Python packages, using the host cache
COPY requirements.txt /tmp/requirements.txt
RUN --mount=type=cache,target=/root/.cache \
    python -m pip install --no-deps --requirement /tmp/requirements.txt

# Without this, the Jupyter terminal defaults to /bin/sh which is much less
# usable
ENV SHELL=/bin/bash
# Jupyter writes various runtime files to $HOME so we need that to be writable
# regardless of which user we run as
ENV HOME=/tmp
# Allow Jupyter to be configured from within the workspace
ENV JUPYTER_CONFIG_DIR=/workspace/jupyter-config
# This variable is only needed for the `ebmdatalab` package:
# https://pypi.org/project/ebmdatalab/
ENV EBMDATALAB_BQ_CREDENTIALS_PATH=/workspace/bq-service-account.json

# Run any necessary post-installation tasks
COPY postinstall.sh /tmp/postinstall.sh
RUN /tmp/postinstall.sh

RUN mkdir /workspace
WORKDIR /workspace
79 changes: 79 additions & 0 deletions INSTALLATION_GUIDE.md
@@ -0,0 +1,79 @@
## Docker environment

### Why Docker?

Software engineers and developers need to collaborate on software. In our team, we use Jupyter
Notebooks to carry out research, and our work relies on existing software packages. A common problem
is that team members have different versions of these packages on their machines and work on
different operating systems, so shared code sometimes fails to run. This is
particularly a problem on Windows machines.

Docker allows you to run identical software on all platforms. It creates containers that are guaranteed
to be identical on any system that can run Docker. The exact specification of the environment is
recorded in the `Dockerfile`, and distributing this file guarantees that all team members
have the same setup. Because each container is its own entity, team members can have multiple projects
on their machine at the same time without creating clashes between different versions of a package.

### Installation

Windows and Macs have different installation processes. Regardless of machine, you will have to install
Docker and make an account on the [Docker Website](https://docs.docker.com/).

Please follow the installation instructions on the [Docker website](https://docs.docker.com/install/) to complete this step.
Docker Desktop is preferred over Docker Toolbox because it offers native support via Hyper-V containers. However, it requires
Windows 10 64-bit Pro, Enterprise, or Education (Build 15063 or later), and the Hyper-V and Containers
Windows features must be enabled. All of these are the case on our standard university laptop
installs; if Hyper-V has not been enabled, [follow the instructions here](https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/quick-start/enable-hyper-v).

Docker Toolbox runs Docker within a Linux VirtualBox virtual machine via Docker Machine, and therefore offers a functional but sub-optimal experience.



#### Windows

First install Docker Desktop onto your machine. Windows users who log into an Active Directory domain
(i.e. a network login) may find they lack permissions to start Docker correctly. If
so, follow [these instructions](https://github.com/docker/for-win/issues/785#issuecomment-344805180).

It is best to install using the default settings. You may be asked to enable Hyper-V and Containers,
which you should do. At least one user found the box already ticked on the screen but had to untick and re-tick it
before the features were enabled correctly (detailed in issue [#4](https://github.com/ebmdatalab/custom-docker/issues/4)).

When starting Docker, it can take a while to start up (up to 5 minutes). While it's doing so, an animation runs in the notification area:

![image](https://user-images.githubusercontent.com/211271/72052991-14a8c000-32be-11ea-948f-575a3c84bc3b.png)

Another notification appears when it's finished.

"Running" means there's a Docker service running on your computer, to which you can connect using the command line. You can check it's up and running by opening a Command Prompt and entering `docker info`, which should output diagnostic information.

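
The `docker info` check can also be scripted. The sketch below is a minimal example that assumes `docker info` exits with a non-zero status when the service cannot be reached; the `docker_is_running` helper is hypothetical and is not part of this repo.

```python
import shutil
import subprocess


def docker_is_running(timeout=30):
    """Return True if `docker info` succeeds, i.e. the service is reachable."""
    if shutil.which("docker") is None:
        return False  # the docker command is not on PATH
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,  # the diagnostics themselves are not needed
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("Docker is running" if docker_is_running() else "Docker is not reachable yet")
```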
To be able to access the Windows filesystem from the Docker container (and therefore do development inside Jupyter with results appearing in a place visible to Git), you must explicitly share your hard drive in the Docker settings (click system tray Docker icon; select "settings"; select "shared drives").

##### Network login issues

When installing from the office, and logged in as a network user, there have been permission problems
that have been solved by adding the special "Authenticated Users" group to the `docker-users` group, per [this comment](https://github.com/docker/for-win/issues/785#issuecomment-327237998) (screenshot of place to do it [here](https://github.com/docker/for-win/issues/785#issuecomment-344805180)).

Finally, note that when authentication changes (e.g. different logins), you sometimes have to reauthorise Docker's "Shared Drives" (click system tray Docker icon; select "settings"; select "shared drives"; click Reset credentials; retick the drive to share; Apply).

#### Macs

Follow the instructions from the Docker website. You may have to restart your computer during installation.

Once you have Docker installed, you will need to log in. Docker can be opened from the Applications folder,
and once you have logged in, you should see the Docker icon in the menu bar at the top of the screen (i.e. next to the battery icon, etc.).

![image](https://user-images.githubusercontent.com/25401512/75257439-dff4b780-57dc-11ea-9ae8-592e1570bc71.png)

Once this is running, you should be able to use Docker.

#### Gotchas

- The first time you use Docker or a new Docker template, please be aware that the build takes a long time.
It is easy to think that it has frozen, but it will take quite a while to get going.

If this is the case, look at this cat whilst you wait:

![Alt Text](https://media.giphy.com/media/vFKqnCdLPNOKc/giphy.gif)
46 changes: 46 additions & 0 deletions README.md
@@ -0,0 +1,46 @@
# The Bennett Institute's skeleton notebook environment


## Getting started with this skeleton project

This is a skeleton project for creating a reproducible, cross-platform
analysis notebook, using Docker.

Developers and analysts using this skeleton for new development should
refer to [`DEVELOPERS.md`](DEVELOPERS.md) for instructions on getting
started. Update this `README.md` so it is a suitable introduction to
your project.


## Running Jupyter Lab

You will need to have Git and Docker installed; please see the
[`INSTALLATION_GUIDE.md`](INSTALLATION_GUIDE.md) for further details.

Windows and Linux users should double-click the `jupyter-lab` file.
Users on macOS should double-click `jupyter-lab-mac-os` instead.

Note: if double-clicking the `jupyter-lab` file opens the file in VS Code, you
should instead right-click on the file and open it with Git for Windows.

This will build a Docker image with all software requirements installed,
start a new Jupyter Lab server, and then provide a link to access this
server.

The first time you run this command it may take some time to download
and install the necessary software. Subsequent runs should be much
faster.

Note: if running the command fails with:

```
docker: Error response from daemon: user declined directory sharing C:\path\to\directory
```

you should open the Docker dashboard and then, under Settings -> Resources ->
File Sharing, add the appropriate path.


## How to cite

XXX Please change to either a paper (if published) or the repo. You may find it helpful to use a Zenodo DOI (see [`DEVELOPERS.md`](DEVELOPERS.md#how-to-invite-people-to-cite) for further information).
Empty file added data/.gitkeep
Empty file.
3 changes: 3 additions & 0 deletions jupyter-config/.gitignore
@@ -0,0 +1,3 @@
# Ignore runtime-generated config
/lab
/labconfig
14 changes: 14 additions & 0 deletions jupyter-lab-mac-os
@@ -0,0 +1,14 @@
#!/bin/bash

# We want launcher shell scripts which can be directly executed from the file
# manager GUI without requiring a terminal. On Windows this requires an
# extension of ".sh", on macOS this requires either no extension or the
# extension ".command". There's no way to jointly satisfy these requirements so
# we need two launchers with different extensions, one of which just
# immediately executes the other.

# Unset CDPATH to prevent `cd` potentially behaving unexpectedly
unset CDPATH
cd "$( dirname "${BASH_SOURCE[0]}")"

exec ./jupyter-lab.sh