Merge branch 'deep_classiflie_feat' into master
Daniel Dale committed Oct 11, 2020
2 parents 0fe43a9 + 416fc0a commit 89e4fb7
Showing 16 changed files with 235 additions and 47 deletions.
63 changes: 62 additions & 1 deletion README.md
@@ -16,6 +16,7 @@
- [Configuration](#configuration)
- [Further Research](#further-research)
- [Model Replication](#model-replication)
- [Model Replication and Exploration w/ Docker](#model-replication-and-exploration-with-docker)
- [Caveats](#caveats)
- [Citing Deep Classiflie](#citing-deep-classiflie)
- [References and Notes](#references-and-notes)
@@ -265,7 +266,7 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
```
2. install [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/download.html#anaconda-or-miniconda) if necessary. Then create and activate deep_classiflie virtual env:
```shell
conda env create -f ./deep_classiflie/utils/deep_classiflie.yml
conda env create -f ./deep_classiflie/assets/deep_classiflie.yml
conda activate deep_classiflie
```
3. clone captum and HuggingFace's transformers repos, then install the transformers package:
@@ -428,6 +429,66 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
</details>

---

### Model Replication and Exploration with Docker
<details><summary markdown="span"><strong>Instructions</strong>
</summary>


<br/>
As of this writing (2020.10.11), Docker Compose does not fully support GPU provisioning, so these instructions use the docker CLI with the --gpus flag instead.

1. Pull the image from Docker Hub
```shell
sudo docker pull speediedan/deep_classiflie:v0.1.3
```
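Optionally, confirm GPU provisioning works before proceeding. A minimal sanity check, reusing the CUDA base image this commit's Dockerfile builds from (assumes the NVIDIA container toolkit is installed on the host):
```shell
# if the NVIDIA runtime is configured correctly, nvidia-smi should list your device(s)
sudo docker run --rm --gpus all nvidia/cuda:10.2-base-ubuntu18.04 nvidia-smi
```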
2. Recursively train the model using the latest dataset.
- create a local directory to bind mount for exploring experiment output, then start the training container
```shell
mkdir /tmp/docker_experiment_output
sudo docker container run --rm -d --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments --name deep_classiflie_train deep_classiflie:v0.1.3 \
conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_train_albertbase.yaml
```
- run a tensorboard container to follow training progress (~6 hrs on a single GPU)
```shell
sudo docker container run --rm -d --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments -p 6006:6006 --workdir /experiments/deep_classiflie/logs --name deep_classiflie_tb deep_classiflie:v0.1.3 conda run -n deep_classiflie tensorboard --host 0.0.0.0 --logdir=/experiments/deep_classiflie/logs --reload_multifile=true
```
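The -p 6006:6006 mapping above publishes tensorboard on the host, so once the container is up you can browse to it directly:
```shell
# open tensorboard from the host (xdg-open is illustrative; any browser works)
xdg-open http://localhost:6006
```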
3. Use a trained checkpoint to evaluate test performance
- start the container with a local bind mount
```shell
sudo docker container run --rm -it --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments --name deep_classiflie_explore deep_classiflie:v0.1.3
```
- update the docker_test_only.yaml file, setting the desired inference checkpoint path (e.g. /experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt)
```shell
vi configs/docker_test_only.yaml
...
inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt"
...
```
- evaluate on test set
```shell
conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_test_only.yaml
```
4. Run custom predictions
- update the model checkpoint used for predictions to the one you trained
```shell
vi /home/deep_classiflie/repos/deep_classiflie/configs/docker_cust_predict.yaml
...
inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt"
...
```
- add tweets or statements for inference/interpretation as desired by modifying /home/deep_classiflie/datasets/explore_pred_interpretations.json
- generate predictions
```shell
conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_cust_predict.yaml --pred_inputs /home/deep_classiflie/datasets/explore_pred_interpretations.json
```
- review the prediction interpretation card in your local browser:
```shell
chrome /tmp/docker_experiment_output/deep_classiflie/logs/20201011203013/inference_output/example_stmt_1_0.png
```

</details>

---

### Caveats

<ul class="fnum">
55 changes: 55 additions & 0 deletions assets/Dockerfile
@@ -0,0 +1,55 @@
FROM nvidia/cuda:10.2-base-ubuntu18.04
ARG USERNAME
COPY . /tmp/build
RUN ls /tmp/build
VOLUME /experiments/${USERNAME}
# Install some basic utilities and create non-root user
RUN apt-get update && apt-get install -y \
curl \
ca-certificates \
sudo \
git \
unzip \
bzip2 \
libx11-6 \
&& apt-get -y autoremove \
&& apt-get clean autoclean \
&& rm -rf /var/lib/apt/lists/{apt,dpkg,cache,log} /var/tmp/* \
&& adduser --disabled-password --gecos '' --shell /bin/bash ${USERNAME} \
&& chown "${USERNAME}":"${USERNAME}" /home/${USERNAME} /tmp/build -R \
&& chown "${USERNAME}":"${USERNAME}" /experiments/${USERNAME} -R \
&& echo "${USERNAME} ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/90-${USERNAME}
USER ${USERNAME}
ENV HOME=/home/${USERNAME}
ENV DC_BASE="${HOME}/repos/${USERNAME}" \
PYTHONPATH="${HOME}/repos/${USERNAME}:${HOME}/repos/captum:${HOME}/repos/transformers" \
CONDA_AUTO_UPDATE_CONDA=false \
TARGET_ENV=${USERNAME}
RUN mkdir -p /home/${USERNAME}/repos /home/${USERNAME}/datasets/model_cache/${USERNAME}
# Create a docker volume for the container
WORKDIR /home/${USERNAME}/repos
RUN git clone https://github.com/pytorch/captum.git \
&& git clone https://github.com/huggingface/transformers \
&& git clone https://github.com/speediedan/deep_classiflie.git
RUN unzip /tmp/build/dc_ds.zip -d /home/${USERNAME}/datasets \
&& unzip /tmp/build/dc_model_alpha.zip -d /home/${USERNAME}/datasets/model_cache/${USERNAME}
RUN curl -sLo ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh \
&& chmod +x ~/miniconda.sh \
&& ~/miniconda.sh -b -p ~/miniconda \
&& rm ~/miniconda.sh
ENV PATH=$HOME/miniconda/bin:$PATH \
CONDA_DEFAULT_ENV=$TARGET_ENV
RUN conda update -n base -c defaults conda
RUN conda env create -f /tmp/build/deep_classiflie.yml -n ${USERNAME} \
&& conda clean -ya
WORKDIR /home/${USERNAME}/repos/transformers
RUN conda run -n ${TARGET_ENV} pip install . \
&& echo ". $HOME/miniconda/etc/profile.d/conda.sh" >> ~/.profile
# Make RUN commands use the bash shell:
SHELL ["/bin/bash", "-c"]
RUN conda init bash \
&& rm -rf /tmp/build \
&& ls $HOME \
&& env
WORKDIR $DC_BASE
ENTRYPOINT conda run -n $TARGET_ENV python ./deep_classiflie.py
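For reference, a local rebuild of the published image might look like the following sketch. The tag and build context are assumptions; per the COPY and unzip steps above, the context must contain deep_classiflie.yml, dc_ds.zip and dc_model_alpha.zip alongside this Dockerfile:
```shell
# hypothetical local build; the published equivalent is speediedan/deep_classiflie:v0.1.3
sudo docker build --build-arg USERNAME=deep_classiflie \
    -t deep_classiflie:v0.1.3 -f Dockerfile .
```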
Binary file added assets/Humor-Sans-1.0.ttf
File renamed without changes.
2 changes: 1 addition & 1 deletion assets/detailed_report.css
@@ -5,7 +5,7 @@
}
@font-face {
font-family: "Humor Sans";
src: url("../docs/assets/Humor-Sans-1.0.ttf") format('truetype');
src: url("Humor-Sans-1.0.ttf") format('truetype');
}
body {
margin: 2px;
4 changes: 4 additions & 0 deletions assets/entrypoint.sh
@@ -0,0 +1,4 @@
#!/bin/bash --login
set -e
conda activate $TARGET_ENV
exec "$@"
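The --login shebang is what lets conda activate succeed in a non-interactive container: a login shell sources ~/.profile, which the Dockerfile above configures to load conda's shell hooks. A usage sketch (the wrapped command is illustrative; how the image wires in this script isn't shown in this diff):
```shell
# any arguments are exec'd inside the activated $TARGET_ENV environment
./entrypoint.sh python deep_classiflie.py --config configs/docker_test_only.yaml
```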
3 changes: 2 additions & 1 deletion configs/cust_predict.yaml
@@ -1,5 +1,6 @@
experiment:
inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200816114940/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt"
inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200911144157/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt"
#inference_ckpt: "/home/speediedan/experiments/deep_classiflie_feat/checkpoints/20200901084410/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt"
predict_only: True
debug:
debug_enabled: False
13 changes: 13 additions & 0 deletions configs/docker_cust_predict.yaml
@@ -0,0 +1,13 @@
experiment:
inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt"
predict_only: True
debug:
debug_enabled: False
dirs:
experiments_base_dir: "/experiments" # defaults to {base_dir}/experiments
tmp_data_dir: "/home/deep_classiflie/datasets/dc_dataset_collection" # defaults to {raw_data_dir}/temp/{constants.APP_NAME}
data_source:
skip_db_refresh: True
inference:
interpret_preds: True
purge_intermediate_rpt_files: True # default is True, left here for debugging convenience
12 changes: 12 additions & 0 deletions configs/docker_test_only.yaml
@@ -0,0 +1,12 @@
experiment:
db_functionality_enabled: False # must set to True to generate reports, run dctweetbot, among other functions
inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
debug:
debug_enabled: False
dirs:
experiments_base_dir: "/experiments" # defaults to {base_dir}/experiments
tmp_data_dir: "/home/deep_classiflie/datasets/dc_dataset_collection" # defaults to {raw_data_dir}/temp/{constants.APP_NAME}
data_source:
# db_conf must be explicitly specified only in dev mode or if db_conf is in a non-default location
# db_conf: "/home/speediedan/repos/edification/deep_classiflie_db/deep_classiflie_db.yaml"
skip_db_refresh: False
20 changes: 20 additions & 0 deletions configs/docker_train_albertbase.yaml
@@ -0,0 +1,20 @@
experiment:
db_functionality_enabled: False
debug:
debug_enabled: True
use_debug_dataset: False
dirs:
experiments_base_dir: "/experiments" # defaults to {base_dir}/experiments
tmp_data_dir: "/home/deep_classiflie/datasets/dc_dataset_collection" # defaults to {raw_data_dir}/temp/{constants.APP_NAME}
data_source:
skip_db_refresh: True
trainer:
# restart_training_ckpt: "/experiments/deep_classiflie/checkpoints/20200826121309/checkpoint-0.6039-11-1236.pt"
dump_model_thaw_sched_only: False
label_smoothing_enabled: True
# histogram_vars: ['classifier.weight', 'ctxt_embed.weight', 'albert.pooler.weight']
fine_tune_scheduler:
thaw_schedule: "DeepClassiflie_thaw_schedule.yaml"
earlystopping:
patience: 4
monitor_metric: "val_loss"
1 change: 1 addition & 0 deletions dataprep/dataprep.py
@@ -113,6 +113,7 @@ def build_ds_from_db_flow(self) -> None:
f"{constants.DB_WARNING_START} Since the specified cached dataset ({self.file_suffix[1:]}) "
f"has not been found or cannot be rebuilt, instance aborting. "
f"Please see repo readme for further details.")
logger.error(f"Current config: {self.config}")
sys.exit(0)
self.db_to_pkl()
ds_dict = {"train_recs": self.dataset_conf['num_train_recs'], "val_recs": self.dataset_conf['num_val_recs'],
6 changes: 3 additions & 3 deletions deep_classiflie.py
100644 → 100755
@@ -15,11 +15,8 @@
import utils.constants as constants
from dataprep.dataprep import DatasetCollection
from utils.core_utils import create_lock_file
from utils.dc_tweetbot import DCTweetBot
from utils.dc_infsvc import DCInfSvc
from utils.envconfig import EnvConfig
from analysis.inference import Inference
from analysis.model_analysis_rpt import ModelAnalysisRpt
from training.trainer import Trainer
import faulthandler

@@ -43,6 +40,7 @@ def main() -> Optional[NoReturn]:
if not config.experiment.db_functionality_enabled:
logger.error(f"{constants.DB_WARNING_START} Model analysis reports {constants.DB_WARNING_END}")
sys.exit(0)
from analysis.model_analysis_rpt import ModelAnalysisRpt
ModelAnalysisRpt(config)
else:
core_flow(config)
@@ -51,9 +49,11 @@
def init_dc_service(config:MutableMapping, service_type: str) -> NoReturn:
if service_type == 'infsvc':
svc_name = 'inference service'
from utils.dc_infsvc import DCInfSvc
svc_module = DCInfSvc
else:
svc_name = 'tweetbot'
from utils.dc_tweetbot import DCTweetBot
svc_module = DCTweetBot
lock_file = None
try:
62 changes: 61 additions & 1 deletion docs/index.md
@@ -11,6 +11,7 @@
- [Configuration](#configuration)
- [Further Research](#further-research)
- [Model Replication](#model-replication)
- [Model Replication and Exploration w/ Docker](#model-replication-and-exploration-with-docker)
- [Caveats](#caveats)
- [Citing Deep Classiflie](#citing-deep-classiflie)
- [References and Notes](#references-and-notes)
@@ -257,7 +258,7 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
```
2. install [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/download.html#anaconda-or-miniconda) if necessary. Then create and activate deep_classiflie virtual env:
```shell
conda env create -f ./deep_classiflie/utils/deep_classiflie.yml
conda env create -f ./deep_classiflie/assets/deep_classiflie.yml
conda activate deep_classiflie
```
3. clone captum and HuggingFace's transformers repos, then install the transformers package:
@@ -420,6 +421,65 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
</details>

---
### Model Replication and Exploration with Docker
<details><summary markdown="span"><strong>Instructions</strong>
</summary>


<br/>
As of this writing (2020.10.11), Docker Compose does not fully support GPU provisioning, so these instructions use the docker CLI with the --gpus flag instead.

1. Pull the image from Docker Hub
```shell
sudo docker pull speediedan/deep_classiflie:v0.1.3
```
2. Recursively train the model using the latest dataset.
- create a local directory to bind mount for exploring experiment output, then start the training container
```shell
mkdir /tmp/docker_experiment_output
sudo docker container run --rm -d --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments --name deep_classiflie_train deep_classiflie:v0.1.3 \
conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_train_albertbase.yaml
```
- run a tensorboard container to follow training progress (~6 hrs on a single GPU)
```shell
sudo docker container run --rm -d --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments -p 6006:6006 --workdir /experiments/deep_classiflie/logs --name deep_classiflie_tb deep_classiflie:v0.1.3 conda run -n deep_classiflie tensorboard --host 0.0.0.0 --logdir=/experiments/deep_classiflie/logs --reload_multifile=true
```
3. Use a trained checkpoint to evaluate test performance
- start the container with a local bind mount
```shell
sudo docker container run --rm -it --gpus all --mount type=bind,source=/tmp/docker_experiment_output,target=/experiments --name deep_classiflie_explore deep_classiflie:v0.1.3
```
- update the docker_test_only.yaml file, setting the desired inference checkpoint path (e.g. /experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt)
```shell
vi configs/docker_test_only.yaml
...
inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt"
...
```
- evaluate on test set
```shell
conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_test_only.yaml
```
4. Run custom predictions
- update the model checkpoint used for predictions to the one you trained
```shell
vi /home/deep_classiflie/repos/deep_classiflie/configs/docker_cust_predict.yaml
...
inference_ckpt: "/experiments/deep_classiflie/checkpoints/20201010172113/checkpoint-0.5595-29-148590.pt"
...
```
- add tweets or statements for inference/interpretation as desired by modifying /home/deep_classiflie/datasets/explore_pred_interpretations.json
- generate predictions
```shell
conda run -n deep_classiflie python deep_classiflie.py --config /home/deep_classiflie/repos/deep_classiflie/configs/docker_cust_predict.yaml --pred_inputs /home/deep_classiflie/datasets/explore_pred_interpretations.json
```
- review the prediction interpretation card in your local browser:
```shell
chrome /tmp/docker_experiment_output/deep_classiflie/logs/20201011203013/inference_output/example_stmt_1_0.png
```

</details>

---

### Caveats
<ul class="fnum">
<li> <span class="fnum" id="ca">[a]</span> The distance threshold for filtering out "false truths" using base model embeddings matches falsehoods to their corresponding truths with high but imperfect accuracy. This fuzzy matching process will result in a modest upward performance bias in the test results. Model performance on datasets built using the noisy matching process (vs exclusively hash-based) improved by only ~2% globally with gains slightly disproportionately going to more confident buckets. This places a relatively low ceiling on the magnitude of the performance bias introduced through this filtering. The precise magnitude of this bias will be quantified in the future via one or both of the following methods <a href="#aa"></a>:</li>
37 changes: 0 additions & 37 deletions requirements.txt

This file was deleted.

2 changes: 0 additions & 2 deletions utils/Dockerfile

This file was deleted.

2 changes: 1 addition & 1 deletion utils/envconfig.py
@@ -88,7 +88,7 @@ def cfg_dirs(self) -> None:
self._config.experiment.dirs.model_cache_dir = self._config.experiment.dirs.model_cache_dir or \
cust_model_cache_dir
self._config.experiment.dirs.dcbot_log_dir = self._config.experiment.dirs.dcbot_log_dir or \
f"{self._config.experiment.dirs.experiments_base_dir}/dcbot"
f"{self._config.experiment.dirs.experiments_base_dir}/dcbot"
self._config.experiment.dirs.rpt_arc_dir = self._config.experiment.dirs.rpt_arc_dir or \
f"{self._config.experiment.dirs.base_dir}/repos/" \
f"{constants.APP_NAME}_history"
