# Detecting Gravitational Waves from BBH CBCs using Deep Neural Networks

## Problem Definition

Given strain-data segments from a network of ground-based gravitational-wave (GW) detectors, predict whether a GW signal from a binary black hole (BBH) compact binary coalescence (CBC) is present in the segment.

## ML Task

Binary classification of multivariate time-series data using state-of-the-art (SOTA) deep neural networks.

## Dataset

Source: [G2Net Gravitational Wave Detection](https://www.kaggle.com/competitions/g2net-gravitational-wave-detection/overview) (Kaggle competition)

Each data sample (a `.npy` file) contains 3 time series, one per detector; each series spans 2 seconds and is sampled at 2,048 Hz (see the loading sketch below).

`train/` - the training set files, one `.npy` file per observation; labels are provided in the file described below

`test/` - the test set files; you must predict the probability that each observation contains a gravitational wave

`training_labels.csv` - target values indicating whether the associated signal contains a gravitational wave
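
For orientation, here is a minimal sketch of loading one training sample with NumPy. The sample ID and nested folder layout below are assumptions about the Kaggle download (files bucketed by the first three characters of the ID), not something this repo defines:

```python
import numpy as np

# Hypothetical sample ID; the Kaggle download buckets files by the
# first three characters of the ID.
strain = np.load("train/0/0/0/00000e74ad.npy")

print(strain.shape)  # (3, 4096): 3 detectors x (2 s * 2,048 Hz)
```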

## Codebase

1. `src`: Scripts folder.

   - `GettingStarted.ipynb`: Start here; it walks through EDA and the modelling pipeline.

   - `configs`: Config files used by the training scripts to set parameters for dataloaders, models, etc.
     - The `train` subfolder holds the config JSON files used for experimental runs with `train.py` or `train_pl.py`. All relevant config files are read at the beginning of the training script (a hedged sketch of this step appears after this list):
       - `base.json`: basic configuration, e.g. choice of optimizer and number of training epochs
       - `optim.json`: parameters for the chosen optimizer
       - `stop_early.json`: parameters for the early-stopping criteria
       - `lr_schd.json`: parameters for the chosen learning-rate scheduler
       - the remaining JSON files correspond to the models, one file per model
     - The `sweep` subfolder contains similar JSON files, adapted to run hyperparameter sweeps with `wandb sweep`.

   - `dataloaders`: Scripts implementing helper functions to load data from the local download (see the illustrative `Dataset` sketch after this list).

   - `models`: Implementations of SOTA deep-learning models for time-series classification. The `tsai` folder vendors the source code of [tsai](https://github.com/timeseriesAI/tsai), which provides the SOTA model implementations; the `pytorch` folder holds other custom implementations, which may or may not use tsai modules.

   - `wandb_sweep.py`: Entry-point script for model training and hyperparameter tuning with W&B logging.

   - `train.py`: Entry-point script for a single training-and-evaluation run using vanilla PyTorch, with W&B logging. Produces a run directory under `results` containing test-set evaluation results and, optionally, model weights.

   - `train_pl.py`: Entry-point script for a single training-and-evaluation run using PyTorch Lightning, with W&B logging. Produces a run directory under `results` containing test-set evaluation results and, optionally, model weights.

2. `results`: Folder for organizing run results.

3. `environment.yml`: File for creating the Python environment.

4. `wandb_api_key.txt`: File holding your wandb API key for logging to your wandb dashboard (a login sketch follows this list).

   Instructions:

   1. Create a wandb account at wandb.ai.
   2. Create a new project.
   3. Copy your API key and paste it on the first line of this file.
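
As a hedged illustration of how a training script might assemble its configuration from the files in `configs/train`, here is a minimal merging sketch. The directory path, the merge strategy, and the `"inception_time"` model-config name are assumptions for illustration, not the repo's actual schema:

```python
import json
from pathlib import Path

CONFIG_DIR = Path("src/configs/train")  # assumed location of the train configs

def load_config(model_name):
    """Merge the shared JSON configs with one per-model file (file names from the list above)."""
    cfg = {}
    for name in ("base", "optim", "stop_early", "lr_schd", model_name):
        with open(CONFIG_DIR / f"{name}.json") as f:
            cfg[name] = json.load(f)
    return cfg

cfg = load_config("inception_time")  # hypothetical per-model config file
```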
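Likewise, a minimal sketch of the kind of PyTorch `Dataset` the `dataloaders` folder implements; the file layout and CSV column names are assumptions based on the Kaggle download, not the repo's actual code:

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset

class G2NetDataset(Dataset):
    """Illustrative dataset over the local download; paths and columns are assumptions."""

    def __init__(self, labels_csv="training_labels.csv", root="train"):
        self.df = pd.read_csv(labels_csv)  # assumed columns: id, target
        self.root = root

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        sid = row["id"]
        # Assumed nested layout: files bucketed by the first three characters of the ID.
        path = f"{self.root}/{sid[0]}/{sid[1]}/{sid[2]}/{sid}.npy"
        x = torch.from_numpy(np.load(path)).float()  # shape (3, 4096)
        y = torch.tensor(row["target"], dtype=torch.float32)
        return x, y
```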
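And a small sketch of how the key file can be consumed for W&B logging (`wandb.login` accepts a `key` argument; the project name below is hypothetical):

```python
import wandb

# Read the API key from the first line of the file described above and authenticate.
with open("wandb_api_key.txt") as f:
    api_key = f.readline().strip()

wandb.login(key=api_key)
run = wandb.init(project="gw-detection")  # hypothetical project name
```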
    

## Quickstart Instructions

Note: Make sure Anaconda or Miniconda is installed on your system.

1. Clone this repository.
2. In `environment.yml`, specify an appropriate environment name (first line; default: `gwsearchenv`) and path (last line); standard practice is to set the path to `<path/to/anaconda or miniconda/dir>/envs/<name_of_env>`.
3. Create the environment by running `conda env create -f environment.yml`.
4. Activate your newly created environment: `conda activate gwsearchenv`.
5. Run `GettingStarted.ipynb`.
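
Before opening the notebook, a quick sanity check like the following can confirm the environment is usable (this assumes PyTorch and wandb are pinned in `environment.yml`, which the training scripts above require):

```python
# Run inside the activated gwsearchenv environment.
import torch
import wandb

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("wandb:", wandb.__version__)
```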