Welcome to the runtime repository for the SFP Cervical Biopsy challenge. This repo contains the definition of the environment where your code submissions will run. It specifies both the operating system and the Python packages that will be available to your solution.
This repository has two primary uses for competitors:
- Testing your code submission: It lets you test your
submission.zip
file with a locally running version of the container so you don't have to wait for it to process on the competition site to find programming errors. - Requesting new packages in the official runtime: It lets you test adding additional packages to the official runtime environment. The official runtime uses Python 3.8. You can then submit a PR to request compatible packages be included in the official container image.
Make sure you have the prerequisites installed.
- A clone or fork of this repository
- Docker
- At least ~10GB of free space for both the training images and the Docker container images
- GNU make (optional, but useful for using the commands in the Makefile)
- AWS CLI (Optional, but useful for the
make sample-images
command which downloads images from S3)
Additional requirements to run with GPU:
- NVIDIA drivers (we check whether you have
nvidia-smi
installed and enabled to automatically determine whether to build the cpu or gpu image) - NVIDIA Docker container runtime
To test out the full execution pipeline, run the following commands in order in the terminal. These will get the Docker images, download a few training images (3 images, ~300 MB) with which to test the execution, zip up an example submission script, and submit that submission.zip to your locally running version of the container.
make pull
make sample-images
make pack-benchmark
make test-submission
You should see output like this in the end (and find the same logs in the folder submission/log.txt
):
docker run \
\
--network none \
--mount type=bind,source=/Users/bull/code/sfp-cervical-biopsy-runtime/inference-data,target=/inference/data,readonly \
--mount type=bind,source=/Users/bull/code/sfp-cervical-biopsy-runtime/submission,target=/inference/submission \
--shm-size 8g \
925a59ad1b19
GPU unavailable; falling back to CPU.
Unpacking submission...
Archive: ./submission/submission.zip
creating: ./assets/
inflating: ./main.py
Running submission with Python
Exporting submission.csv result...
Script completed its run.
================ END ================
Running make
at the terminal will tell you all the commands available in the repository:
Settings based on your machine:
CPU_OR_GPU=cpu # Whether or not to try to build, download, and run GPU versions
SUBMISSION_IMAGE=925a59ad1b19 # ID of the image that will be used when running test-submission
Available competition images:
drivendata/sfp-competition:cpu-local (925a59ad1b19); drivendata/sfp-competition:cpu-latest (09768914d125);
Available commands:
build Builds the container locally, tagging it with cpu-local or gpu-local
debug-container Start your locally built container and open a bash shell within the running container; same as submission setup except has network access
pack-benchmark Creates a submission/submission.zip file from whatever is in the "benchmark" folder
pull Pulls the official container tagged cpu-latest or gpu-latest from Docker hub
sample-images Download the 3 sample images from inference-data/test_metadata.csv (300 MB)
test-container Ensures that your locally built container can import all the Python packages successfully when it runs
test-submission Runs container with submission/submission.zip as your submission and inference-data as the data to work with
To find out more about what these commands do, keep reading! 👀
In order to test your code submission, you will need a code submission! You will need to train your model separately before creating your submission.zip
file that will perform inference.
NOTE: You WILL implement all of your training and experiments on your machine. It is highly recommended that you use the same package versions as we do in the inference runtime definition (cpu or gpu). They can be installed with conda
.
The submission format page contains the detailed information you need to prepare your submission.
Your submission will run inside a virtual operating system within the container that Docker runs on your machine (your computer is the "host" for the container). Within that virtual operating system, /inference/data
will point to whatever is in your host machine's inference-data
folder. /inference/submission
will point to whatever is in your host machine's submission
folder.
The script to execute the submission will unzip the contents of /inference/submission/submssion.zip
into the /inference
folder. This should create a main.py
file at /inference/main.py
.
We will then run a Python process in /inference
to execute the main.py
extracted from submission.zip
. This main.py
should read the submission_format.csv
and test_metadata.csv
files from /inference/data
. On the DrivenData platform, /inference/data
will have the actual test images, and the matching submission_format.csv
and test_metadata.csv
. In this repo, the make sample-images
command will download 3 images from the training set that match the metadata and submission_format provided in this repo. You can use these 3 images to ensure your submission runs, but the metadata, submission format, and images on the DrivenData platform will be the actual test set. (You could use whatever images you want from the training set for local testing as long as they are in the inference-data
folder and the corresponding entries for those files are in inference-data/test_metadata.csv
and inference-data/submission_format.csv
.
There is an example test_metadata.csv
in this repo that is actually just three rows from the train_metadata.csv
, which is available on the competition Data Download page. You should note that the actual test_metadata.csv
in the production container does not have all of the same columns available. The details of what columns are available in the can be found on the Problem Description page.
Running this command will download three images (~300 MB) to infernce-data
. You can see which images will be downloaded by looking at test_metadata.csv
. You can run a submission against these images for testing, after you download them by running:
make sample-images
As mentioned, when you execute the container locally, we will mount two subfolders in this repository into the containter:
- the
inference-data
directory is mounted in your locally running container as a read-only directory/inference/data
- the
submission
directory is mounted in your locally running container as/inference/submission
Your submission.zip
file must exist in the submission
folder on your host machine in order to be processed when you are testing execution locally.
The make pack-benchmark
command will create a zipfile of everything in the benchmark
folder and save that to submission/submission.zip
. To prepare the example submission and put it into the submission folder, just run this command:
make pack-benchmark
When you run this in the future, you should check and remove any existing submission/submission.zip
file. The make pack-benchmark
command does not overwrite this file (so we won't accidentally lose your work).
You can execute the same containers locally that we will use on the DrivenData platform to ensure your code will run.
Make sure you have the prerequisites installed. Then, you can run the following command within the repository to download the official image:
make pull
Once you have the container image downloaded locally, you will be able to run it to see if your inference code works. You can put your submission.zip
file in the submission
folder and run the following command (or just use the sample one that was created when you ran make pack-benchmark
above):
make test-submission
This will spin up the container, mount the local folders as drives within the folder, and follow the same steps that will run on the platform to unpack your submission and run inference against what it finds in the /inference/data
folder.
When you run make test-submission
the logs will be printed to the terminal. They will also be written to the submission
folder as log.txt
. You can always review that file and copy any versions of it that you want from the submission
folder. The errors there will help you to determine what changes you need to make so your code executes successfully.
We accept contributions to add dependencies to the runtime environment. To do so, follow these steps:
- Fork this repository
- Make your changes
- Test them and commit using git
- Open a pull request to this repository
If you're new to the GitHub contribution workflow, check out this guide by GitHub.
We use conda to manage Python dependencies. Add your new dependencies to both runtime/py-cpu.yml
and runtime/py-gpu.yml
. Please also add your dependencies to runtime/tests/test-installs.py
, below the line ## ADD ADDITIONAL REQUIREMENTS BELOW HERE ##
.
Your new dependency should follow the format in the yml and be pinned to a particular version of the package and build with conda.
Please test your new dependency locally by recreating the relevant conda environment using the appropriate CPU or GPU .yml
file. Try activating that environment and loading your new dependency.
Once that works, you'll want to make sure it works within the container as well. To do so, you can run:
make test-container
Note: this will run make build
to create the new container image with your changes automatically, but you could also do it manually.
This will build a local version of the official container and then run the import tests to make sure the relevant libraries can all be successfully loaded. This must pass before you submit a pull request to our repo to update the requirements. If it does not, you'll want to figure out what else you need to make the dependencies happy.
If you have problems, the following command will run a bash shell in the container to let you interact with it. Make sure to activate the conda
environment (e.g., conda activate py-cpu
) when you start the container if you want to test the dependencies!
make debug-container
After making and testing your changes, commit your changes and push to your fork. Then, when viewing the repository on github.com, you will see a banner that lets you open the pull request. For more detailed instructions, check out GitHub's help page.
Once you open the pull request, Github Actions will automatically try building the Docker images with your changes and run the tests in runtime/tests
. These tests take ~30 minutes to run through, and may take longer if your build is queued behind others. You will see a section on the pull request page that shows the status of the tests and links to the logs.
You may be asked to submit revisions to your pull request if the tests fail, or if a DrivenData team member asks for revisions. Pull requests won't be merged until all tests pass and the team has reviewed and approved the changes.
Thanks for reading! Enjoy the competition, and hit up the forums if you have any questions!