Merge branch 'deep_classiflie_feat' into master
Daniel Dale committed Oct 11, 2020
2 parents 0fe43a9 + 416fc0a commit 89e4fb7
Showing 16 changed files with 235 additions and 47 deletions.
63 changes: 62 additions & 1 deletion README.md
@@ -16,6 +16,7 @@
- [Configuration](#configuration)
- [Further Research](#further-research)
- [Model Replication](#model-replication)
- [Model Replication and Exploration w/ Docker](#model-replication-and-exploration-with-docker)
- [Caveats](#caveats)
- [Citing Deep Classiflie](#citing-deep-classiflie)
- [References and Notes](#references-and-notes)
@@ -265,7 +266,7 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
```
2. install [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/download.html#anaconda-or-miniconda) if necessary. Then create and activate deep_classiflie virtual env:
```shell
conda env create -f ./deep_classiflie/utils/deep_classiflie.yml
conda env create -f ./deep_classiflie/assets/deep_classiflie.yml
conda activate deep_classiflie
```
3. clone captum and HuggingFace's transformers repos, then install the transformers package:
@@ -428,6 +429,66 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
</details>

---

### Model Replication and Exploration with Docker
<details><summary markdown="span"><strong>Instructions</strong>
</summary>


<br/>
As of this writing (2020.10.11), Docker Compose does not fully support GPU provisioning, so these instructions use the docker CLI with the --gpus flag instead.

1. Pull the image from Docker Hub
```shell
sudo docker pull speediedan/deep_classiflie:v0.1.3
```
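Optionally, confirm GPU provisioning works before proceeding. A minimal sanity check, reusing the CUDA base image this commit's Dockerfile builds from (assumes the NVIDIA container toolkit is installed on the host):
```shell
# if the NVIDIA runtime is configured correctly, nvidia-smi should list your device(s)
sudo docker run --rm --gpus all nvidia/cuda:10.2-base-ubuntu18.04 nvidia-smi
```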
2. Recursively train the model using the latest dataset.
- create a local directory to bind mount for exploring experiment output, then start the training container
```shell
mkdir /tmp/docker_experiment_output
sudo docker container run --rm -d --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments --name deep_classiflie_train deep_classiflie:v0.1.3 \
conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_train_albertbase.yaml
```
- run a tensorboard container to follow training progress (~6 hrs on a single GPU)
```shell
sudo docker container run --rm -d --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments -p 6006:6006 --workdir /experiments/deep_classiflie/logs --name deep_classiflie_tb deep_classiflie:v0.1.3 conda run -n deep_classiflie tensorboard --host 0.0.0.0 --logdir=/experiments/deep_classiflie/logs --reload_multifile=true
```
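The -p 6006:6006 mapping above publishes tensorboard on the host, so once the container is up you can browse to it directly:
```shell
# open tensorboard from the host (xdg-open is illustrative; any browser works)
xdg-open http://localhost:6006
```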
3. Use a trained checkpoint to evaluate test performance
- start the container with a local bind mount
```shell
sudo docker container run --rm -it --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments --name deep_classiflie_explore deep_classiflie:v0.1.3
```
- update the docker_test_only.yaml file, setting the desired inference checkpoint path (e.g. /experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt)
```shell
vi configs/docker_test_only.yaml
...
inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt"
...
```
- evaluate on test set
```shell
conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_test_only.yaml
```
4. Run custom predictions
- update the model checkpoint used for predictions to the one you trained
```shell
vi /home/deep_classiflie/repos/deep_classiflie/configs/docker_cust_predict.yaml
...
inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt"
...
```
- add tweets or statements for inference/interpretation as desired by modifying /home/deep_classiflie/datasets/explore_pred_interpretations.json
- generate predictions
```shell
conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_cust_predict.yaml --pred_inputs /home/deep_classiflie/datasets/explore_pred_interpretations.json
```
- review the prediction interpretation card in your local browser:
```shell
chrome /tmp/docker_experiment_output/deep_classiflie/logs/20201011203013/inference_output/example_stmt_1_0.png
```

</details>

---

### Caveats

<ul class="fnum">
55 changes: 55 additions & 0 deletions assets/Dockerfile
@@ -0,0 +1,55 @@
FROM nvidia/cuda:10.2-base-ubuntu18.04
ARG USERNAME
COPY . /tmp/build
RUN ls /tmp/build
VOLUME /experiments/${USERNAME}
# Install some basic utilities and create non-root user
RUN apt-get update && apt-get install -y \
curl \
ca-certificates \
sudo \
git \
unzip \
bzip2 \
libx11-6 \
&& apt-get -y autoremove \
&& apt-get clean autoclean \
&& rm -rf /var/lib/apt/lists/{apt,dpkg,cache,log} /var/tmp/* \
&& adduser --disabled-password --gecos '' --shell /bin/bash ${USERNAME} \
&& chown "${USERNAME}":"${USERNAME}" /home/${USERNAME} /tmp/build -R \
&& chown "${USERNAME}":"${USERNAME}" /experiments/${USERNAME} -R \
&& echo "${USERNAME} ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/90-${USERNAME}
USER ${USERNAME}
ENV HOME=/home/${USERNAME}
ENV DC_BASE="${HOME}/repos/${USERNAME}" \
PYTHONPATH="${HOME}/repos/${USERNAME}:${HOME}/repos/captum:${HOME}/repos/transformers" \
CONDA_AUTO_UPDATE_CONDA=false \
TARGET_ENV=${USERNAME}
RUN mkdir -p /home/${USERNAME}/repos /home/${USERNAME}/datasets/model_cache/${USERNAME}
# Create a docker volume for the container
WORKDIR /home/${USERNAME}/repos
RUN git clone https://github.com/pytorch/captum.git \
&& git clone https://github.com/huggingface/transformers \
&& git clone https://github.com/speediedan/deep_classiflie.git
RUN unzip /tmp/build/dc_ds.zip -d /home/${USERNAME}/datasets \
&& unzip /tmp/build/dc_model_alpha.zip -d /home/${USERNAME}/datasets/model_cache/${USERNAME}
RUN curl -sLo ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh \
&& chmod +x ~/miniconda.sh \
&& ~/miniconda.sh -b -p ~/miniconda \
&& rm ~/miniconda.sh
ENV PATH=$HOME/miniconda/bin:$PATH \
CONDA_DEFAULT_ENV=$TARGET_ENV
RUN conda update -n base -c defaults conda
RUN conda env create -f /tmp/build/deep_classiflie.yml -n ${USERNAME} \
&& conda clean -ya
WORKDIR /home/${USERNAME}/repos/transformers
RUN conda run -n ${TARGET_ENV} pip install . \
&& echo ". $HOME/miniconda/etc/profile.d/conda.sh" >> ~/.profile
# Make RUN commands use the bash shell:
SHELL ["/bin/bash", "-c"]
RUN conda init bash \
&& rm -rf /tmp/build \
&& ls $HOME \
&& env
WORKDIR $DC_BASE
ENTRYPOINT conda run -n $TARGET_ENV python ./deep_classiflie.py
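For reference, a local rebuild of the published image might look like the following sketch. The tag and build context are assumptions; per the COPY and unzip steps above, the context must contain deep_classiflie.yml, dc_ds.zip and dc_model_alpha.zip alongside this Dockerfile:
```shell
# hypothetical local build; the published equivalent is speediedan/deep_classiflie:v0.1.3
sudo docker build --build-arg USERNAME=deep_classiflie \
    -t deep_classiflie:v0.1.3 -f Dockerfile .
```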
Binary file added assets/Humor-Sans-1.0.ttf
File renamed without changes.
2 changes: 1 addition & 1 deletion assets/detailed_report.css
@@ -5,7 +5,7 @@
}
@font-face {
font-family: "Humor Sans";
src: url("../docs/assets/Humor-Sans-1.0.ttf") format('truetype');
src: url("Humor-Sans-1.0.ttf") format('truetype');
}
body {
margin: 2px;
4 changes: 4 additions & 0 deletions assets/entrypoint.sh
@@ -0,0 +1,4 @@
#!/bin/bash --login
set -e
conda activate $TARGET_ENV
exec "$@"
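The --login shebang is what lets conda activate succeed in a non-interactive container: a login shell sources ~/.profile, which the Dockerfile above configures to load conda's shell hooks. A usage sketch (the wrapped command is illustrative; how the image wires in this script isn't shown in this diff):
```shell
# any arguments are exec'd inside the activated $TARGET_ENV environment
./entrypoint.sh python deep_classiflie.py --config configs/docker_test_only.yaml
```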
3 changes: 2 additions & 1 deletion configs/cust_predict.yaml
@@ -1,5 +1,6 @@
experiment:
inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200816114940/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt"
inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200911144157/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt"
#inference_ckpt: "/home/speediedan/experiments/deep_classiflie_feat/checkpoints/20200901084410/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt"
predict_only: True
debug:
debug_enabled: False
13 changes: 13 additions & 0 deletions configs/docker_cust_predict.yaml
@@ -0,0 +1,13 @@
experiment:
inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt"
predict_only: True
debug:
debug_enabled: False
dirs:
experiments_base_dir: "/experiments" # defaults to {base_dir}/experiments
tmp_data_dir: "/home/deep_classiflie/datasets/dc_dataset_collection" # defaults to {raw_data_dir}/temp/{constants.APP_NAME}
data_source:
skip_db_refresh: True
inference:
interpret_preds: True
purge_intermediate_rpt_files: True # default is True, left here for debugging convenience
12 changes: 12 additions & 0 deletions configs/docker_test_only.yaml
@@ -0,0 +1,12 @@
experiment:
db_functionality_enabled: False # must set to True to generate reports, run dctweetbot, among other functions
inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
debug:
debug_enabled: False
dirs:
experiments_base_dir: "/experiments" # defaults to {base_dir}/experiments
tmp_data_dir: "/home/deep_classiflie/datasets/dc_dataset_collection" # defaults to {raw_data_dir}/temp/{constants.APP_NAME}
data_source:
# db_conf must be explicitly specified only in dev mode or if db_conf is in a non-default location
# db_conf: "/home/speediedan/repos/edification/deep_classiflie_db/deep_classiflie_db.yaml"
skip_db_refresh: False
20 changes: 20 additions & 0 deletions configs/docker_train_albertbase.yaml
@@ -0,0 +1,20 @@
experiment:
db_functionality_enabled: False
debug:
debug_enabled: True
use_debug_dataset: False
dirs:
experiments_base_dir: "/experiments" # defaults to {base_dir}/experiments
tmp_data_dir: "/home/deep_classiflie/datasets/dc_dataset_collection" # defaults to {raw_data_dir}/temp/{constants.APP_NAME}
data_source:
skip_db_refresh: True
trainer:
# restart_training_ckpt: "/experiments/deep_classiflie/checkpoints/20200826121309/checkpoint-0.6039-11-1236.pt"
dump_model_thaw_sched_only: False
label_smoothing_enabled: True
# histogram_vars: ['classifier.weight', 'ctxt_embed.weight', 'albert.pooler.weight']
fine_tune_scheduler:
thaw_schedule: "DeepClassiflie_thaw_schedule.yaml"
earlystopping:
patience: 4
monitor_metric: "val_loss"
1 change: 1 addition & 0 deletions dataprep/dataprep.py
@@ -113,6 +113,7 @@ def build_ds_from_db_flow(self) -> None:
f"{constants.DB_WARNING_START} Since the specified cached dataset ({self.file_suffix[1:]}) "
f"has not been found or cannot be rebuilt, instance aborting. "
f"Please see repo readme for further details.")
logger.error(f"Current config: {self.config}")
sys.exit(0)
self.db_to_pkl()
ds_dict = {"train_recs": self.dataset_conf['num_train_recs'], "val_recs": self.dataset_conf['num_val_recs'],
6 changes: 3 additions & 3 deletions deep_classiflie.py
100644 → 100755
@@ -15,11 +15,8 @@
import utils.constants as constants
from dataprep.dataprep import DatasetCollection
from utils.core_utils import create_lock_file
from utils.dc_tweetbot import DCTweetBot
from utils.dc_infsvc import DCInfSvc
from utils.envconfig import EnvConfig
from analysis.inference import Inference
from analysis.model_analysis_rpt import ModelAnalysisRpt
from training.trainer import Trainer
import faulthandler

@@ -43,6 +40,7 @@ def main() -> Optional[NoReturn]:
if not config.experiment.db_functionality_enabled:
logger.error(f"{constants.DB_WARNING_START} Model analysis reports {constants.DB_WARNING_END}")
sys.exit(0)
from analysis.model_analysis_rpt import ModelAnalysisRpt
ModelAnalysisRpt(config)
else:
core_flow(config)
@@ -51,9 +49,11 @@
def init_dc_service(config:MutableMapping, service_type: str) -> NoReturn:
if service_type == 'infsvc':
svc_name = 'inference service'
from utils.dc_infsvc import DCInfSvc
svc_module = DCInfSvc
else:
svc_name = 'tweetbot'
from utils.dc_tweetbot import DCTweetBot
svc_module = DCTweetBot
lock_file = None
try:
62 changes: 61 additions & 1 deletion docs/index.md
@@ -11,6 +11,7 @@
- [Configuration](#configuration)
- [Further Research](#further-research)
- [Model Replication](#model-replication)
- [Model Replication and Exploration w/ Docker](#model-replication-and-exploration-with-docker)
- [Caveats](#caveats)
- [Citing Deep Classiflie](#citing-deep-classiflie)
- [References and Notes](#references-and-notes)
@@ -257,7 +258,7 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
```
2. install [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/download.html#anaconda-or-miniconda) if necessary. Then create and activate deep_classiflie virtual env:
```shell
conda env create -f ./deep_classiflie/utils/deep_classiflie.yml
conda env create -f ./deep_classiflie/assets/deep_classiflie.yml
conda activate deep_classiflie
```
3. clone captum and HuggingFace's transformers repos, then install the transformers package:
@@ -420,6 +421,65 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
</details>

---
### Model Replication and Exploration with Docker
<details><summary markdown="span"><strong>Instructions</strong>
</summary>


<br/>
As of this writing (2020.10.11), Docker Compose does not fully support GPU provisioning, so these instructions use the docker CLI with the --gpus flag instead.

1. Pull the image from Docker Hub
```shell
sudo docker pull speediedan/deep_classiflie:v0.1.3
```
2. Recursively train the model using the latest dataset.
- create a local directory to bind mount for exploring experiment output, then start the training container
```shell
mkdir /tmp/docker_experiment_output
sudo docker container run --rm -d --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments --name deep_classiflie_train deep_classiflie:v0.1.3 \
conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_train_albertbase.yaml
```
- run a tensorboard container to follow training progress (~6 hrs on a single GPU)
```shell
sudo docker container run --rm -d --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments -p 6006:6006 --workdir /experiments/deep_classiflie/logs --name deep_classiflie_tb deep_classiflie:v0.1.3 conda run -n deep_classiflie tensorboard --host 0.0.0.0 --logdir=/experiments/deep_classiflie/logs --reload_multifile=true
```
3. Use a trained checkpoint to evaluate test performance
- start the container with a local bind mount
```shell
sudo docker container run --rm -it --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments --name deep_classiflie_explore deep_classiflie:v0.1.3
```
- update the docker_test_only.yaml file, setting the desired inference checkpoint path (e.g. /experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt)
```shell
vi configs/docker_test_only.yaml
...
inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt"
...
```
- evaluate on test set
```shell
conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_test_only.yaml
```
4. Run custom predictions
- update the model checkpoint used for predictions to the one you trained
```shell
vi /home/deep_classiflie/repos/deep_classiflie/configs/docker_cust_predict.yaml
...
inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt"
...
```
- add tweets or statements for inference/interpretation as desired by modifying /home/deep_classiflie/datasets/explore_pred_interpretations.json
- generate predictions
```shell
conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_cust_predict.yaml --pred_inputs /home/deep_classiflie/datasets/explore_pred_interpretations.json
```
- review the prediction interpretation card in your local browser:
```shell
chrome /tmp/docker_experiment_output/deep_classiflie/logs/20201011203013/inference_output/example_stmt_1_0.png
```

</details>

---

### Caveats
<ul class="fnum">
<li> <span class="fnum" id="ca">[a]</span> The distance threshold for filtering out "false truths" using base model embeddings matches falsehoods to their corresponding truths with high but imperfect accuracy. This fuzzy matching process will result in a modest upward performance bias in the test results. Model performance on datasets built using the noisy matching process (vs exclusively hash-based) improved by only ~2% globally with gains slightly disproportionately going to more confident buckets. This places a relatively low ceiling on the magnitude of the performance bias introduced through this filtering. The precise magnitude of this bias will be quantified in the future via one or both of the following methods <a href="#aa"></a>:</li>
37 changes: 0 additions & 37 deletions requirements.txt

This file was deleted.

2 changes: 0 additions & 2 deletions utils/Dockerfile

This file was deleted.

2 changes: 1 addition & 1 deletion utils/envconfig.py
@@ -88,7 +88,7 @@ def cfg_dirs(self) -> None:
self._config.experiment.dirs.model_cache_dir = self._config.experiment.dirs.model_cache_dir or \
cust_model_cache_dir
self._config.experiment.dirs.dcbot_log_dir = self._config.experiment.dirs.dcbot_log_dir or \
f"{self._config.experiment.dirs.experiments_base_dir}/dcbot"
f"{self._config.experiment.dirs.experiments_base_dir}/dcbot"
self._config.experiment.dirs.rpt_arc_dir = self._config.experiment.dirs.rpt_arc_dir or \
f"{self._config.experiment.dirs.base_dir}/repos/" \
f"{constants.APP_NAME}_history"
