diff --git a/README.md b/README.md
index d67e37c..27b05cd 100644
--- a/README.md
+++ b/README.md
@@ -16,6 +16,7 @@
- [Configuration](#configuration)
- [Further Research](#further-research)
- [Model Replication](#model-replication)
+- [Model Replication and Exploration w/ Docker](#model-replication-and-exploration-with-docker)
- [Caveats](#caveats)
- [Citing Deep Classiflie](#citing-deep-classiflie)
- [References and Notes](#references-and-notes)
@@ -265,7 +266,7 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
```
2. install [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/download.html#anaconda-or-miniconda) if necessary. Then create and activate deep_classiflie virtual env:
```shell
- conda env create -f ./deep_classiflie/utils/deep_classiflie.yml
+ conda env create -f ./deep_classiflie/assets/deep_classiflie.yml
conda activate deep_classiflie
```
3. clone captum and HuggingFace's transformers repos. Install transformers binaries.:
@@ -428,6 +429,66 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
---
+
+### Model Replication and Exploration with Docker
+These instructions cover replicating the latest model and exploring its predictions using the published Docker image.
+
+As of this writing (2020.10.11), Docker Compose does not fully support GPU provisioning, so the docker CLI with the --gpus flag is used here instead.
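+
+Before starting, it may be worth verifying that the host GPU is visible from within a container (a quick sanity check; assumes the NVIDIA container runtime is installed):
+```shell
+sudo docker run --rm --gpus all nvidia/cuda:10.2-base nvidia-smi
+```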
+
+1. Pull image from docker hub
+ ```shell
+ sudo docker pull speediedan/deep_classiflie:v0.1.3
+ ```
+2. Recursively train the model using the latest dataset.
+ - create a local directory to bind mount and use for exploring experiment output and start training container
+ ```shell
+ mkdir /tmp/docker_experiment_output
+ sudo docker container run --rm -d --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments --name deep_classiflie_train deep_classiflie:v0.1.3 \
+ conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_train_albertbase.yaml
+ ```
+    - run a tensorboard container to follow training progress (~6 hrs on a single GPU)
+   ```shell
+ sudo docker container run --rm -d --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments -p 6006:6006 --workdir /experiments/deep_classiflie/logs --name deep_classiflie_tb deep_classiflie:v0.1.3 conda run -n deep_classiflie tensorboard --host 0.0.0.0 --logdir=/experiments/deep_classiflie/logs --reload_multifile=true
+ ```
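+    - optionally, follow the training container's logs directly (standard docker CLI; the container name matches the one assigned above):
+   ```shell
+   sudo docker logs -f deep_classiflie_train
+   ```
+    - once the tensorboard container is up, training progress can be followed at http://localhost:6006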
+3. Use a trained checkpoint to evaluate test performance
+ - start the container with a local bind mount
+ ```shell
+ sudo docker container run --rm -it --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments --name deep_classiflie_explore deep_classiflie:v0.1.3
+ ```
+    - update the docker_test_only.yaml file, setting the desired inference checkpoint path (e.g. /experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt)
+   ```shell
+   vi configs/docker_test_only.yaml
+   ...
+   inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt"
+   ...
+   ```
+ - evaluate on test set
+ ```shell
+ conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_test_only.yaml
+ ```
+4. Run custom predictions
+    - update the model checkpoint used for predictions to the one you trained
+   ```shell
+   vi /home/deep_classiflie/repos/deep_classiflie/configs/docker_cust_predict.yaml
+   ...
+   inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt"
+   ...
+   ```
+    - add tweets or statements on which to run inference/interpretation as desired by modifying /home/deep_classiflie/datasets/explore_pred_interpretations.json
+ - generate predictions
+ ```shell
+ conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_cust_predict.yaml --pred_inputs /home/deep_classiflie/datasets/explore_pred_interpretations.json
+ ```
+    - review the generated prediction interpretation card in a local browser (the timestamped log directory will differ), e.g.:
+ ```shell
+ chrome /tmp/docker_experiment_output/deep_classiflie/logs/20201011203013/inference_output/example_stmt_1_0.png
+ ```
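+    - when finished, stop any detached containers still running (they were started with --rm, so stopping also removes them):
+   ```shell
+   sudo docker container stop deep_classiflie_train deep_classiflie_tb
+   ```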
+
+
+
+---
+
### Caveats
diff --git a/assets/Dockerfile b/assets/Dockerfile
new file mode 100644
index 0000000..13a65b5
--- /dev/null
+++ b/assets/Dockerfile
@@ -0,0 +1,55 @@
+FROM nvidia/cuda:10.2-base-ubuntu18.04
+ARG USERNAME
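+# Example build from the repo root (a sketch; assumes dc_ds.zip, dc_model_alpha.zip and
+# deep_classiflie.yml are present in the build context, which is staged at /tmp/build below):
+#   sudo docker build --build-arg USERNAME=deep_classiflie -t deep_classiflie:v0.1.3 -f assets/Dockerfile .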
+COPY . /tmp/build
+RUN ls /tmp/build
+VOLUME /experiments/${USERNAME}
+# Install some basic utilities and create non-root user
+RUN apt-get update && apt-get install -y \
+ curl \
+ ca-certificates \
+ sudo \
+ git \
+ unzip \
+ bzip2 \
+ libx11-6 \
+ && apt-get -y autoremove \
+ && apt-get clean autoclean \
+    && rm -rf /var/lib/apt/lists/* /var/tmp/* \
+ && adduser --disabled-password --gecos '' --shell /bin/bash ${USERNAME} \
+ && chown "${USERNAME}":"${USERNAME}" /home/${USERNAME} /tmp/build -R \
+ && chown "${USERNAME}":"${USERNAME}" /experiments/${USERNAME} -R \
+ && echo "${USERNAME} ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/90-${USERNAME}
+USER ${USERNAME}
+ENV HOME=/home/${USERNAME}
+ENV DC_BASE="${HOME}/repos/${USERNAME}" \
+ PYTHONPATH="${HOME}/repos/${USERNAME}:${HOME}/repos/captum:${HOME}/repos/transformers" \
+ CONDA_AUTO_UPDATE_CONDA=false \
+ TARGET_ENV=${USERNAME}
+RUN mkdir -p /home/${USERNAME}/repos /home/${USERNAME}/datasets/model_cache/${USERNAME}
+# Clone the captum, transformers and deep_classiflie repos
+WORKDIR /home/${USERNAME}/repos
+RUN git clone https://github.com/pytorch/captum.git \
+ && git clone https://github.com/huggingface/transformers \
+ && git clone https://github.com/speediedan/deep_classiflie.git
+RUN unzip /tmp/build/dc_ds.zip -d /home/${USERNAME}/datasets \
+ && unzip /tmp/build/dc_model_alpha.zip -d /home/${USERNAME}/datasets/model_cache/${USERNAME}
+RUN curl -sLo ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh \
+ && chmod +x ~/miniconda.sh \
+ && ~/miniconda.sh -b -p ~/miniconda \
+ && rm ~/miniconda.sh
+ENV PATH=$HOME/miniconda/bin:$PATH \
+ CONDA_DEFAULT_ENV=$TARGET_ENV
+RUN conda update -n base -c defaults conda
+RUN conda env create -f /tmp/build/deep_classiflie.yml -n ${USERNAME} \
+ && conda clean -ya
+WORKDIR /home/${USERNAME}/repos/transformers
+RUN conda run -n ${TARGET_ENV} pip install . \
+ && echo ". $HOME/miniconda/etc/profile.d/conda.sh" >> ~/.profile
+# Make RUN commands use the bash shell:
+SHELL ["/bin/bash", "-c"]
+RUN conda init bash \
+ && rm -rf /tmp/build \
+ && ls $HOME \
+ && env
+WORKDIR $DC_BASE
+ENTRYPOINT conda run -n $TARGET_ENV python ./deep_classiflie.py
diff --git a/assets/Humor-Sans-1.0.ttf b/assets/Humor-Sans-1.0.ttf
new file mode 100644
index 0000000..d642b12
Binary files /dev/null and b/assets/Humor-Sans-1.0.ttf differ
diff --git a/utils/deep_classiflie.yml b/assets/deep_classiflie.yml
similarity index 100%
rename from utils/deep_classiflie.yml
rename to assets/deep_classiflie.yml
diff --git a/assets/detailed_report.css b/assets/detailed_report.css
index 3e24383..7f46393 100644
--- a/assets/detailed_report.css
+++ b/assets/detailed_report.css
@@ -5,7 +5,7 @@
}
@font-face {
font-family: "Humor Sans";
-src: url("../docs/assets/Humor-Sans-1.0.ttf") format('truetype');
+src: url("Humor-Sans-1.0.ttf") format('truetype');
}
body {
margin: 2px;
diff --git a/assets/entrypoint.sh b/assets/entrypoint.sh
new file mode 100644
index 0000000..afc93c3
--- /dev/null
+++ b/assets/entrypoint.sh
@@ -0,0 +1,4 @@
+#!/bin/bash --login
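+# Alternate entrypoint helper: activate the target conda env, then exec the supplied command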
+set -e
+conda activate $TARGET_ENV
+exec "$@"
diff --git a/configs/cust_predict.yaml b/configs/cust_predict.yaml
index f959bdf..04ff61d 100644
--- a/configs/cust_predict.yaml
+++ b/configs/cust_predict.yaml
@@ -1,5 +1,6 @@
experiment:
- inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200816114940/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt"
+ inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200911144157/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt"
+ #inference_ckpt: "/home/speediedan/experiments/deep_classiflie_feat/checkpoints/20200901084410/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt"
predict_only: True
debug:
debug_enabled: False
diff --git a/configs/docker_cust_predict.yaml b/configs/docker_cust_predict.yaml
new file mode 100644
index 0000000..32167c9
--- /dev/null
+++ b/configs/docker_cust_predict.yaml
@@ -0,0 +1,13 @@
+experiment:
+ inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt"
+ predict_only: True
+ debug:
+ debug_enabled: False
+ dirs:
+ experiments_base_dir: "/experiments" # defaults to {base_dir}/experiments
+ tmp_data_dir: "/home/deep_classiflie/datasets/dc_dataset_collection" # defaults to {raw_data_dir}/temp/{constants.APP_NAME}
+data_source:
+ skip_db_refresh: True
+inference:
+ interpret_preds: True
+ purge_intermediate_rpt_files: True # default is True, left here for debugging convenience
\ No newline at end of file
diff --git a/configs/docker_test_only.yaml b/configs/docker_test_only.yaml
new file mode 100644
index 0000000..53e91dc
--- /dev/null
+++ b/configs/docker_test_only.yaml
@@ -0,0 +1,12 @@
+experiment:
+  db_functionality_enabled: False # must be set to True to generate reports, run dctweetbot, among other functions
+ inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
+ debug:
+ debug_enabled: False
+ dirs:
+ experiments_base_dir: "/experiments" # defaults to {base_dir}/experiments
+ tmp_data_dir: "/home/deep_classiflie/datasets/dc_dataset_collection" # defaults to {raw_data_dir}/temp/{constants.APP_NAME}
+data_source:
+  # db_conf must be explicitly specified only in dev mode or if db_conf is in a non-default location
+ # db_conf: "/home/speediedan/repos/edification/deep_classiflie_db/deep_classiflie_db.yaml"
+ skip_db_refresh: False
\ No newline at end of file
diff --git a/configs/docker_train_albertbase.yaml b/configs/docker_train_albertbase.yaml
new file mode 100644
index 0000000..47282dd
--- /dev/null
+++ b/configs/docker_train_albertbase.yaml
@@ -0,0 +1,20 @@
+experiment:
+ db_functionality_enabled: False
+ debug:
+ debug_enabled: True
+ use_debug_dataset: False
+ dirs:
+ experiments_base_dir: "/experiments" # defaults to {base_dir}/experiments
+ tmp_data_dir: "/home/deep_classiflie/datasets/dc_dataset_collection" # defaults to {raw_data_dir}/temp/{constants.APP_NAME}
+data_source:
+ skip_db_refresh: True
+trainer:
+ # restart_training_ckpt: "/experiments/deep_classiflie/checkpoints/20200826121309/checkpoint-0.6039-11-1236.pt"
+ dump_model_thaw_sched_only: False
+ label_smoothing_enabled: True
+ # histogram_vars: ['classifier.weight', 'ctxt_embed.weight', 'albert.pooler.weight']
+ fine_tune_scheduler:
+ thaw_schedule: "DeepClassiflie_thaw_schedule.yaml"
+ earlystopping:
+ patience: 4
+ monitor_metric: "val_loss"
diff --git a/dataprep/dataprep.py b/dataprep/dataprep.py
index 0950ae3..bae0a0d 100644
--- a/dataprep/dataprep.py
+++ b/dataprep/dataprep.py
@@ -113,6 +113,7 @@ def build_ds_from_db_flow(self) -> None:
f"{constants.DB_WARNING_START} Since the specified cached dataset ({self.file_suffix[1:]}) "
f"has not been found or cannot be rebuilt, instance aborting. "
f"Please see repo readme for further details.")
+ logger.error(f"Current config: {self.config}")
sys.exit(0)
self.db_to_pkl()
ds_dict = {"train_recs": self.dataset_conf['num_train_recs'], "val_recs": self.dataset_conf['num_val_recs'],
diff --git a/deep_classiflie.py b/deep_classiflie.py
old mode 100644
new mode 100755
index 45f2707..07732dc
--- a/deep_classiflie.py
+++ b/deep_classiflie.py
@@ -15,11 +15,8 @@
import utils.constants as constants
from dataprep.dataprep import DatasetCollection
from utils.core_utils import create_lock_file
-from utils.dc_tweetbot import DCTweetBot
-from utils.dc_infsvc import DCInfSvc
from utils.envconfig import EnvConfig
from analysis.inference import Inference
-from analysis.model_analysis_rpt import ModelAnalysisRpt
from training.trainer import Trainer
import faulthandler
@@ -43,6 +40,7 @@ def main() -> Optional[NoReturn]:
if not config.experiment.db_functionality_enabled:
logger.error(f"{constants.DB_WARNING_START} Model analysis reports {constants.DB_WARNING_END}")
sys.exit(0)
+ from analysis.model_analysis_rpt import ModelAnalysisRpt
ModelAnalysisRpt(config)
else:
core_flow(config)
@@ -51,9 +49,11 @@ def main() -> Optional[NoReturn]:
def init_dc_service(config:MutableMapping, service_type: str) -> NoReturn:
if service_type == 'infsvc':
svc_name = 'inference service'
+ from utils.dc_infsvc import DCInfSvc
svc_module = DCInfSvc
else:
svc_name = 'tweetbot'
+ from utils.dc_tweetbot import DCTweetBot
svc_module = DCTweetBot
lock_file = None
try:
diff --git a/docs/index.md b/docs/index.md
index e6966a9..57ecf9b 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -11,6 +11,7 @@
- [Configuration](#configuration)
- [Further Research](#further-research)
- [Model Replication](#model-replication)
+- [Model Replication and Exploration w/ Docker](#model-replication-and-exploration-with-docker)
- [Caveats](#caveats)
- [Citing Deep Classiflie](#citing-deep-classiflie)
- [References and Notes](#references-and-notes)
@@ -257,7 +258,7 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
```
2. install [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/download.html#anaconda-or-miniconda) if necessary. Then create and activate deep_classiflie virtual env:
```shell
- conda env create -f ./deep_classiflie/utils/deep_classiflie.yml
+ conda env create -f ./deep_classiflie/assets/deep_classiflie.yml
conda activate deep_classiflie
```
3. clone captum and HuggingFace's transformers repos. Install transformers binaries.:
@@ -420,6 +421,65 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
---
+### Model Replication and Exploration with Docker
+These instructions cover replicating the latest model and exploring its predictions using the published Docker image.
+
+As of this writing (2020.10.11), Docker Compose does not fully support GPU provisioning, so the docker CLI with the --gpus flag is used here instead.
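+
+Before starting, it may be worth verifying that the host GPU is visible from within a container (a quick sanity check; assumes the NVIDIA container runtime is installed):
+```shell
+sudo docker run --rm --gpus all nvidia/cuda:10.2-base nvidia-smi
+```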
+
+1. Pull image from docker hub
+ ```shell
+ sudo docker pull speediedan/deep_classiflie:v0.1.3
+ ```
+2. Recursively train the model using the latest dataset.
+ - create a local directory to bind mount and use for exploring experiment output and start training container
+ ```shell
+ mkdir /tmp/docker_experiment_output
+ sudo docker container run --rm -d --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments --name deep_classiflie_train deep_classiflie:v0.1.3 \
+ conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_train_albertbase.yaml
+ ```
+    - run a tensorboard container to follow training progress (~6 hrs on a single GPU)
+   ```shell
+ sudo docker container run --rm -d --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments -p 6006:6006 --workdir /experiments/deep_classiflie/logs --name deep_classiflie_tb deep_classiflie:v0.1.3 conda run -n deep_classiflie tensorboard --host 0.0.0.0 --logdir=/experiments/deep_classiflie/logs --reload_multifile=true
+ ```
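+    - optionally, follow the training container's logs directly (standard docker CLI; the container name matches the one assigned above):
+   ```shell
+   sudo docker logs -f deep_classiflie_train
+   ```
+    - once the tensorboard container is up, training progress can be followed at http://localhost:6006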
+3. Use a trained checkpoint to evaluate test performance
+ - start the container with a local bind mount
+ ```shell
+ sudo docker container run --rm -it --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments --name deep_classiflie_explore deep_classiflie:v0.1.3
+ ```
+    - update the docker_test_only.yaml file, setting the desired inference checkpoint path (e.g. /experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt)
+   ```shell
+   vi configs/docker_test_only.yaml
+   ...
+   inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt"
+   ...
+   ```
+ - evaluate on test set
+ ```shell
+ conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_test_only.yaml
+ ```
+4. Run custom predictions
+    - update the model checkpoint used for predictions to the one you trained
+   ```shell
+   vi /home/deep_classiflie/repos/deep_classiflie/configs/docker_cust_predict.yaml
+   ...
+   inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt"
+   ...
+   ```
+    - add tweets or statements on which to run inference/interpretation as desired by modifying /home/deep_classiflie/datasets/explore_pred_interpretations.json
+ - generate predictions
+ ```shell
+ conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_cust_predict.yaml --pred_inputs /home/deep_classiflie/datasets/explore_pred_interpretations.json
+ ```
+    - review the generated prediction interpretation card in a local browser (the timestamped log directory will differ), e.g.:
+ ```shell
+ chrome /tmp/docker_experiment_output/deep_classiflie/logs/20201011203013/inference_output/example_stmt_1_0.png
+ ```
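+    - when finished, stop any detached containers still running (they were started with --rm, so stopping also removes them):
+   ```shell
+   sudo docker container stop deep_classiflie_train deep_classiflie_tb
+   ```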
+
+
+
+---
+
### Caveats
- [a] The distance threshold for filtering out "false truths" using base model embeddings matches falsehoods to their corresponding truths with high but imperfect accuracy. This fuzzy matching process will result in a modest upward performance bias in the test results. Model performance on datasets built using the noisy matching process (vs exclusively hash-based) improved by only ~2% globally with gains slightly disproportionately going to more confident buckets. This places a relatively low ceiling on the magnitude of the performance bias introduced through this filtering. The precise magnitude of this bias will be quantified in the future via one or both of the following methods ↩:
diff --git a/requirements.txt b/requirements.txt
deleted file mode 100644
index fb60914..0000000
--- a/requirements.txt
+++ /dev/null
@@ -1,37 +0,0 @@
-numpy~=1.19.1
-matplotlib~=3.3.1
-typing_extensions~=3.7.4.2
-traitlets>=4.3.3
-beautifulsoup4~=4.9.1
-nbformat>=5.0.7
-nbconvert>=5.6.1
-setuptools~=49.6.0
-dotmap~=1.2.20
-psutil~=5.7.0
-tweepy~=3.9.0
-mysql-connector-python~=8.0.18
-tqdm>=4.48.2
-transformers~=3.0.2
-cudf~=0.14.0
-cuml~=0.14.0
-weasyprint~=51
-adjusttext~=0.7.3.1
-pandas~=0.25.3
-bokeh>=2.1.1
-scipy~=1.5.2
-python-dotenv~=0.14.0
-dateparser~=0.7.6
-selenium~=3.141.0
-requests~=2.24.0
-pytz~=2020.1
-filelock~=3.0.12
-packaging~=20.4
-sacremoses~=0.0.43
-tokenizers~=0.8.1rc2
-regex>=2020.7.14
-sentencepiece~=0.1.91
-six~=1.15.0
-pillow~=7.2.0
-torch~=1.6.0
-scikit-learn~=0.23.2
-urllib3~=1.25.10
\ No newline at end of file
diff --git a/utils/Dockerfile b/utils/Dockerfile
deleted file mode 100644
index 9066db2..0000000
--- a/utils/Dockerfile
+++ /dev/null
@@ -1,2 +0,0 @@
-FROM nvidia/cuda:10.2-base
-CMD nvidia-smi
\ No newline at end of file
diff --git a/utils/envconfig.py b/utils/envconfig.py
index 5e4bb16..625d420 100644
--- a/utils/envconfig.py
+++ b/utils/envconfig.py
@@ -88,7 +88,7 @@ def cfg_dirs(self) -> None:
self._config.experiment.dirs.model_cache_dir = self._config.experiment.dirs.model_cache_dir or \
cust_model_cache_dir
self._config.experiment.dirs.dcbot_log_dir = self._config.experiment.dirs.dcbot_log_dir or \
- f"{self._config.experiment.dirs.experiments_base_dir}/dcbot"
+ f"{self._config.experiment.dirs.experiments_base_dir}/dcbot"
self._config.experiment.dirs.rpt_arc_dir = self._config.experiment.dirs.rpt_arc_dir or \
f"{self._config.experiment.dirs.base_dir}/repos/" \
f"{constants.APP_NAME}_history"