added current predictions explorer among other improvements for 0.1.1 release
Daniel Dale committed Sep 12, 2020
1 parent fa6dada commit bda664a
Showing 36 changed files with 252,685 additions and 166 deletions.
59 changes: 26 additions & 33 deletions README.md
Original file line number Diff line number Diff line change
@@ -80,10 +80,11 @@ The best way to start understanding/exploring the current model is to use the ex
<details><summary markdown="span"><strong>[Current Predictions Explorer](current_explorer.html)</strong>
</summary>

Explore the current (unlabeled) predictions generated by the latest model incarnation. All statements yet to be labeled by current fact-checking sources (currently, only [Washington Post Factchecker](https://www.washingtonpost.com/graphics/politics/trump-claims-database)) are available.
Live predictions continuously added via [ipfs](https://ipfs.io). Twitter statements will be delayed by ~15 minutes to allow thread-based scoring. [Factba.se](https://factba.se) is polled for new statements every 10 minutes.
This explorer provides fact-checkers a means (one of many possible) of using current model predictions and may also help those building fact-checking systems evaluate the potential utility of integrating similar models into their systems.
Explore current predictions of the latest model. All statements that have yet to be labeled by the currently used fact-checking sources (only [Washington Post Factchecker](https://www.washingtonpost.com/graphics/politics/trump-claims-database) at present) are available.

Live predictions are continuously added via [ipfs](https://ipfs.io). Twitter statements will be delayed by ~15 minutes to allow thread-based scoring. [Factba.se](https://factba.se) is polled for new statements every 15 minutes.

This explorer provides fact-checkers a means (one of many possible) of using current model predictions and may also help those building fact-checking systems evaluate the potential utility of integrating similar models into their systems.

<img src="docs/assets/current_explorer.gif" alt="current predictions explorer" />
</details>
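The ingestion cadence described above (a ~15-minute hold on tweets so whole threads can be scored together, plus fixed-interval polling of Factba.se) can be sketched as a simple gating check. This is purely illustrative — `ready_to_score`, `next_poll`, and the constants are hypothetical names, not part of the actual daemon:

```python
from datetime import datetime, timedelta

# Tweets are held back ~15 minutes to allow thread-based scoring;
# non-tweet statements are eligible as soon as they are polled.
TWEET_SCORING_DELAY = timedelta(minutes=15)
POLL_INTERVAL = timedelta(minutes=15)

def ready_to_score(is_tweet: bool, created_at: datetime, now: datetime) -> bool:
    """Return True once a statement is eligible for model scoring."""
    if is_tweet:
        return now - created_at >= TWEET_SCORING_DELAY
    return True

def next_poll(last_poll: datetime) -> datetime:
    """Factba.se is polled on a fixed interval."""
    return last_poll + POLL_INTERVAL
```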
@@ -116,7 +117,7 @@ The entire initial Deep Classiflie system (raw dataset, model, analytics modules
</summary>

- Fine-tune a base model (currently HuggingFace's [ALBERT implementation](https://huggingface.co/transformers/model_doc/albert.html) with some minor customizations) in tandem with a simple embedding reflecting the semantic shift associated with the medium via which the statement was conveyed (i.e., for the POC, just learn the tweet vs non-tweet transformation) (using [Pytorch](https://pytorch.org/))
- Explore the latest model's training session on [tensorboard.dev](https://tensorboard.dev/experiment/rGNQpYnYSOaHb2A84xRAzw).
- Explore the latest model's training session on [tensorboard.dev](https://tensorboard.dev/experiment/Ys0KLo5nRnq0soINjyEv4A).
- N.B. neuro-symbolic methods<sup id="a6">[6](#f6)</sup> that leverage knowledge bases and integrate symbolic reasoning with connectionist methods are not used in this model. Use of these approaches may be explored in [future research](#further-research) using this framework.
</details>
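The "simple embedding reflecting the semantic shift" bullet above can be illustrated with a toy sketch. In the real model this embedding is learned jointly with ALBERT in PyTorch; the names and values below are hypothetical stand-ins:

```python
# Toy sketch: a pooled statement representation is shifted by a learned
# per-medium embedding (tweet vs. non-tweet) before classification.
MEDIUM_IDS = {"nontweet": 0, "tweet": 1}

# In the real model these would be trainable parameters; fixed toys here.
medium_embeddings = [
    [0.0, 0.0, 0.0],   # non-tweet: no shift
    [0.1, -0.2, 0.3],  # tweet: learned semantic shift
]

def apply_medium_shift(statement_vec, medium: str):
    """Add the medium embedding to a pooled statement vector."""
    shift = medium_embeddings[MEDIUM_IDS[medium]]
    return [s + d for s, d in zip(statement_vec, shift)]
```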
<details><summary markdown="span"><strong>Analysis & Reporting</strong>
@@ -140,7 +141,7 @@ The entire initial Deep Classiflie system (raw dataset, model, analytics modules
<details><summary markdown="span"><strong>Global</strong>
</summary>

Global metrics<sup id="a9">[9](#f9)</sup> summarized in the table below relate to the current model's performance on a test set comprised of ~12K statements made between 2020-04-03 and 2020-07-08:<br/>
Global metrics<sup id="a9">[9](#f9)</sup> summarized in the table below relate to the current model's performance on a test set comprised of ~13K statements made between 2020-04-03 and 2020-07-08:<br/>
<img src="docs/assets/global_metrics_summ.png" alt="Global Stat Summary" />

</details>
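One of the non-standard monitor metrics referenced elsewhere in this README (MCC) feeds summaries like the one above. A minimal reference implementation of the standard formula — not code from the repo — for readers unfamiliar with it:

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.
    Ranges from -1 (total disagreement) through 0 (chance) to +1 (perfect)."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0
```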
@@ -180,7 +181,7 @@ To minimize false positives and maximize the model's utility, the following appr
- Generate and configure thawing schedules for models.
- EarlyStopping easily configurable with multiple non-standard monitor metrics (e.g. mcc)
- Both automatic and manually-specified [stochastic weight averaging](https://pytorch.org/blog/stochastic-weight-averaging-in-pytorch/) of model checkpoints<sup id="af">[f](#cf)</sup>
- mixed-precision training via [apex](https://github.com/NVIDIA/apex)<sup id="ag">[g](#cg)</sup>
- Mixed-precision training<sup id="ag">[g](#cg)</sup>
</details>
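The stochastic weight averaging bullet above amounts to an element-wise mean over checkpoint weights. A plain-Python sketch of the idea (the actual release uses the torchcontrib SWA module on real `state_dict`s; this stand-in uses dicts of float lists):

```python
def average_checkpoints(state_dicts):
    """SWA sketch: element-wise mean of N checkpoint weight dicts."""
    n = len(state_dicts)
    return {k: [sum(vals) / n for vals in zip(*(sd[k] for sd in state_dicts))]
            for k in state_dicts[0]}
```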
<details><summary markdown="span"><strong>Analysis & reporting</strong>
</summary>
@@ -274,15 +275,9 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
cd transformers
pip install .
```
4. (temporarily required) Testing of this alpha release occurred before native AMP was integrated into Pytorch with release 1.6. As such, installing NVIDIA's apex is temporarily (as of 2020.08.18) required to replicate the model. Switching from the apex AMP API to the PyTorch-integrated one is planned as part of issue #999, which should obviate the need to install apex.
```shell
git clone https://github.com/NVIDIA/apex
cd apex
pip uninstall apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
5. [Install mariadb](https://mariadb.com/kb/en/getting-installing-and-upgrading-mariadb/) or mysql DB if necessary.
6. These are the relevant DB configuration settings used for the current release of Deep Classiflie's backend. Divergence from this configuration has not been tested and may result in unexpected behavior.
4. [Install mariadb](https://mariadb.com/kb/en/getting-installing-and-upgrading-mariadb/) or mysql DB if necessary.
5. These are the relevant DB configuration settings used for the current release of Deep Classiflie's backend. Divergence from this configuration has not been tested and may result in unexpected behavior.

```mysql
collation-server = utf8mb4_unicode_ci
@@ -291,7 +286,7 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
sql_mode = 'STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,ANSI_QUOTES'
transaction-isolation = READ-COMMITTED
```
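Since divergence from the settings above is untested, a quick sanity check can help. The helper below is hypothetical (it parses my.cnf-style lines rather than querying a live server), shown only to illustrate the idea:

```python
# The two settings the backend is most sensitive to, per the config above.
REQUIRED_SETTINGS = {
    "collation-server": "utf8mb4_unicode_ci",
    "transaction-isolation": "READ-COMMITTED",
}

def parse_cnf(text: str) -> dict:
    """Parse simple `key = value` lines from a my.cnf-style fragment."""
    out = {}
    for line in text.splitlines():
        if "=" in line:
            k, _, v = line.partition("=")
            out[k.strip()] = v.strip().strip("'")
    return out

def check_settings(cfg: dict) -> list:
    """Return required settings that are missing or mismatched."""
    return [k for k, v in REQUIRED_SETTINGS.items() if cfg.get(k) != v]
```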
7. copy/update relevant Deep Classiflie config file to $HOME dir
6. copy/update relevant Deep Classiflie config file to $HOME dir
```shell
cp ./deep_classiflie_db/db_setup/.dc_config.example ~
mv .dc_config.example .dc_config
@@ -317,33 +312,31 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
export DCDB_NAME="deep_classiflie"
```

8. execute Deep Classiflie DB backend initialization script:

<img src="docs/assets/dc_schema_build.gif" alt="Deep Classiflie logo" />

Ensure you have access to a DB user with administrator privs. "admin" in the case above.

7. execute Deep Classiflie DB backend initialization script:
```shell
cd deep_classiflie_db/db_setup
./deep_classiflie_db_setup.sh deep_classiflie
```
Ensure you have access to a DB user with administrator privs. "admin" in the case above.
<img src="docs/assets/dc_schema_build.gif" alt="Deep Classiflie logo" />

9. login to the backend db and seed historical tweets (necessary as only most recent 3200 can currently be retrieved directly from twitter)
8. login to the backend db and seed historical tweets (necessary as only most recent 3200 can currently be retrieved directly from twitter)
```mysql
mysql -u dcbot -p
use deep_classiflie
source dcbot_tweets_init_20200814.sql
source dcbot_tweets_init_20200910.sql
exit
```
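The seeding step above exists because only the most recent 3200 tweets can be retrieved directly from Twitter; subsequent API pulls then have to be merged with the seeded history without creating duplicates. A hypothetical sketch of that merge (not the repo's actual code):

```python
def merge_tweets(seeded, fetched):
    """Merge seeded historical tweets with freshly fetched ones,
    deduplicating on tweet id; fresh copies win on conflict."""
    by_id = {t["id"]: t for t in seeded}
    by_id.update({t["id"]: t for t in fetched})
    return sorted(by_id.values(), key=lambda t: t["id"])
```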

10. copy over relevant base model weights to specified model_cache_dir:
9. copy over relevant base model weights to specified model_cache_dir:
```shell
# model_cache_dir default found in configs/config_defaults.yaml
# it defaults to $HOME/datasets/model_cache/deep_classiflie/
cd {PATH_TO_DEEP_CLASSIFLIE_BASE}/deep_classiflie/assets/
cp albert-base-v2-pytorch_model.bin albert-base-v2-spiece.model {MODEL_CACHE_DIR}/
```

11. Run deep_classiflie.py with the provided config necessary to download the raw data from the relevant data sources (factba.se, twitter, washington post), execute the data processing pipeline and generate the dataset collection.
10. Run deep_classiflie.py with the provided config necessary to download the raw data from the relevant data sources (factba.se, twitter, washington post), execute the data processing pipeline and generate the dataset collection.
```shell
cd deep_classiflie
./deep_classiflie.py --config "{PATH_TO_DEEP_CLASSIFLIE_BASE}/configs/dataprep_only.yaml"
@@ -369,33 +362,33 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
2020-08-14 16:58:14,331:deep_classiflie:DEBUG: Metadata update complete, 1 record(s) affected.
...
```
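The dataprep run above executes a sequence of stages: refresh the raw sources (factba.se, twitter, washington post), run the processing pipeline, then generate the dataset collection. A minimal sketch of such an ordered stage runner (stage names are illustrative, not the repo's actual functions):

```python
def run_pipeline(stages):
    """Run ordered dataprep-style stages, collecting a simple log.
    Each stage is a (name, callable) pair executed in sequence."""
    log = []
    for name, fn in stages:
        log.append(f"start: {name}")
        fn()
        log.append(f"complete: {name}")
    return log
```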
12. Recursively train the deep classiflie POC model:
11. Recursively train the deep classiflie POC model:
```shell
cd deep_classiflie
./deep_classiflie.py --config "{PATH_TO_DEEP_CLASSIFLIE_BASE}/configs/train_albertbase.yaml"
```

13. Generate an swa checkpoint (current release was built using swa torchcontrib module but will switch to the now-integrated pytorch swa api in the next release):
12. Generate an swa checkpoint (current release was built using swa torchcontrib module but will switch to the now-integrated pytorch swa api in the next release):
```shell
cd deep_classiflie
./deep_classiflie.py --config "{PATH_TO_DEEP_CLASSIFLIE_BASE}/configs/gen_swa_ckpt.yaml"
```

14. Generate model analysis report(s) using the generated swa checkpoint:
13. Generate model analysis report(s) using the generated swa checkpoint:
```shell
# NOTE, swa checkpoint generated in previous step must be added to gen_report.yaml
cd deep_classiflie
./deep_classiflie.py --config "{PATH_TO_DEEP_CLASSIFLIE_BASE}/configs/gen_report.yaml"
```

15. Generate model analysis dashboards:
14. Generate model analysis dashboards:
```shell
# NOTE, swa checkpoint generated in previous step must be added to gen_dashboards.yaml
cd deep_classiflie
./deep_classiflie.py --config "{PATH_TO_DEEP_CLASSIFLIE_BASE}/configs/gen_dashboards.yaml"
```

16. configure jekyll static site generator to use bokeh dashboards locally:
15. configure jekyll static site generator to use bokeh dashboards locally:

```shell
```

@@ -447,8 +440,8 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
<li><span class="fnum" id="cc">[c]</span> Deep Classiflie depends upon deep_classiflie_db (initially released as a separate repository) for much of its analytical and dataset generation functionality. Depending on how Deep Classiflie evolves (e.g. as it supports distributed data stores etc.), it may make more sense to integrate deep_classiflie_db back into deep_classiflie. <a href="#ac"></a></li>
<li><span class="fnum" id="cd">[d]</span> It's notable that the model suffers a much higher FP ratio on tweets relative to non-tweets. Exploring tweet FPs, there are a number of plausible explanations for this discrepancy which could be explored in future research. <a href="#ad">↩</a></li>
<li><span class="fnum" id="ce">[e]</span> Still in early development, there are significant outstanding issues (e.g. no tests yet!) and code quality shortcomings galore, but any constructive thoughts or contributions are welcome. I'm interested in using ML to curtail disinformation, not promulgate it, so I want to be clear -- this is essentially a fancy sentence similarity system with a lot of work put into building the dataset generation and model analysis data pipelines (I have a data engineering background, not a software engineering one).<a href="#ae"></a></li>
<li><span class="fnum" id="cf">[f]</span> Current model release built/tested before swa graduated from torchcontrib to core pytorch. Next release of Deep Classiflie will use the integrated swa api.<a href="#af"></a></li>
<li><span class="fnum" id="cg">[g]</span> Current model release built/tested before AMP was integrated into core pytorch. Next release of Deep Classiflie will use the integrated AMP api.<a href="#ag"></a></li>
<li><span class="fnum" id="cf">[f]</span> Previous versions used the swa module from torchcontrib before it graduated to core pytorch.<a href="#af"></a></li>
<li><span class="fnum" id="cg">[g]</span> Previous versions used NVIDIA's native <a href="https://github.com/NVIDIA/apex">apex</a> before AMP was integrated into pytorch.<a href="#ag">↩</a></li>
<li><span class="fnum" id="ch">[h]</span> N.B. This daemon may violate Twitter's <a href="https://help.twitter.com/en/rules-and-policies/twitter-automation">policy</a> w.r.t. tweeting sensitive content if the subject's statements contain such content (no content-based filtering is included in the daemon). @DeepClassflie initially tested the Deep Classiflie twitter daemon but will post only framework-related announcements moving forward.<a href="#ah">↩</a></li>
</ul>
6 changes: 3 additions & 3 deletions analysis/captum_cust_viz.py
@@ -196,7 +196,7 @@ def fmt_notes_box(ext_rec: Tuple) -> str:
<li>prediction of whether Washington Post's Fact Checker will add this claim to its "Trump False Claims" DB</li>
<li>if claim was included in WP's Fact Checker false claims DB at time of original model training</li>
<li>accuracy estimated by sorting & bucketing the test set sigmoid outputs, averaging performance in each bucket</li>
<li>global metrics relate to the current model's performance on a test set comprised of ~12K statements made between
<li>global metrics relate to the current model's performance on a test set comprised of ~13K statements made between
{ext_rec[5][13].strftime('%Y-%m-%d')} and {ext_rec[5][14].strftime('%Y-%m-%d')}. Training, validation and test sets
are chronologically disjoint. </li>
<li>subject to interpretability filter, some subword tokens have been omitted to facilitate interpretability</li>
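The bucketed accuracy estimation described in the notes above (sort test-set sigmoid outputs, split into buckets, average performance per bucket) can be sketched as follows. This is illustrative only; the repo's actual bucketing logic may differ:

```python
def bucket_accuracy(probs, correct, n_buckets=4):
    """Sort predictions by sigmoid output, split into equal-size buckets,
    and average accuracy (0/1 correctness) within each bucket."""
    pairs = sorted(zip(probs, correct))
    size = len(pairs) // n_buckets
    accs = []
    for b in range(n_buckets):
        chunk = pairs[b * size:(b + 1) * size]
        accs.append(sum(c for _, c in chunk) / len(chunk))
    return accs
```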
@@ -259,10 +259,10 @@ def gen_pred_exp_attr_tup(datarecord: VisualizationDataRecord, ext_rec: Tuple, t


def pred_exp_attr(datarecords: List[VisualizationDataRecord], ext_recs: List[Tuple] = None, token_mask: List = None,
invert_colors: bool = False) -> Tuple[List, Tuple]:
invert_colors: bool = False, **_) -> Tuple[List, Tuple]:
global_metrics_summ = ext_recs[0][8]
pred_exp_tups = []
for i, (datarecord, ext_rec) in enumerate(zip(datarecords, ext_recs)):
pred_exp_tup = gen_pred_exp_attr_tup(datarecord, ext_rec, token_mask, invert_colors)
pred_exp_tups.append(pred_exp_tup)
return pred_exp_tups, global_metrics_summ
return pred_exp_tups, global_metrics_summ
1 change: 0 additions & 1 deletion analysis/gen_pred_explorer.py
@@ -38,7 +38,6 @@ def init_radio_groups() -> Tuple[RadioButtonGroup, ...]:


def init_explorer_divs(pred_stmt_dict: Dict) -> Tuple[Div, ...]:
# stmt_attr, word_import_html, max_word_html
default_idx = min([i for i, (b, c) in enumerate(zip(pred_stmt_dict['bucket_type'], pred_stmt_dict['tp']))
if b == 'max_acc_nontweets' and c == 1])
word_import_div = Div(text=pred_stmt_dict['pred_exp_attr_tups'][default_idx][1], height_policy='max',
8 changes: 4 additions & 4 deletions analysis/model_analysis_rpt.py
@@ -74,15 +74,15 @@ def gen_pred_exp_ds(self) -> Tuple[Dict, Tuple]:
pred_exp_tups = fetchallwrapper(self.cnxp.get_connection(), self.config.inference.sql.pred_exp_sql)
pred_exp_set = []
pred_exp_ds = OrderedDict({'bucket_type': [], 'bucket_acc': [], 'conf_percentile': [], 'pos_pred_acc': [],
'neg_pred_acc': [], 'pos_pred_ratio': [], 'neg_pred_ratio': [], 'statement_id': [],
'statement_text': [], 'tp': [], 'tn': [], 'fp': [], 'fn': []})
'neg_pred_acc': [], 'pos_pred_ratio': [], 'neg_pred_ratio': [], 'statement_id': [],
'statement_text': [], 'tp': [], 'tn': [], 'fp': [], 'fn': []})
for (bucket_type, bucket_acc, conf_percentile, pos_pred_acc, neg_pred_acc, pos_pred_ratio, neg_pred_ratio,
statement_id, statement_text, ctxt_type, tp, tn, fp, fn) in pred_exp_tups:
label = 'False' if tp == 1 or fn == 1 else 'True'
pred_exp_set.append((statement_text, ctxt_type, label))
for k, v in zip(list(pred_exp_ds.keys()), [bucket_type, bucket_acc, conf_percentile, pos_pred_acc,
neg_pred_acc, pos_pred_ratio, neg_pred_ratio, statement_id,
statement_text, tp, tn, fp, fn]):
neg_pred_acc, pos_pred_ratio, neg_pred_ratio, statement_id,
statement_text, tp, tn, fp, fn]):
pred_exp_ds[k].append(v)
pred_exp_attr_tups, global_metric_summ = Inference(self.config, pred_exp_set=pred_exp_set).init_predict()
pred_exp_ds['pred_exp_attr_tups'] = pred_exp_attr_tups
Binary file modified assets/dc_ds.zip
Binary file modified assets/dc_model_alpha.zip
2 changes: 1 addition & 1 deletion configs/config_defaults_sql.yaml
@@ -100,7 +100,7 @@ inference:
global_model_perf_cache_sql: "select * from global_model_accuracy_lookup_cache"
pred_exp_sql: "select * from pred_explr_stmts"
save_model_sql: "insert into model_metadata select * from latest_global_model_perf_summary"
save_perf_sql: "insert into local_model_perf_summary_hist select * from latest_local_model_perf_summary"
save_perf_sql: "insert ignore into local_model_perf_summary_hist select * from latest_local_model_perf_summary"
ds_md_sql: >-
select dsid, train_start_date, train_end_date from ds_metadata where ds_type='converged_filtered' order by dsid desc limit 1
save_model_rpt_sql: >-
4 changes: 2 additions & 2 deletions configs/dataprep_only.yaml
@@ -5,10 +5,10 @@ experiment:
dataprep_only: True
debug:
debug_enabled: True
use_debug_dataset: True
use_debug_dataset: False
data_source:
# db_conf must be explicitly specified only in dev mode or if db_conf is in a non-default location
db_conf: "/home/speediedan/repos/edification/deep_classiflie_db_feat/deep_classiflie_db.yaml"
# db_conf: "/home/speediedan/repos/edification/deep_classiflie_db_feat/deep_classiflie_db.yaml"
model_filter_topk: 20
filter_w_embed_cache: False
# safest way to build a new dataset is to verify backup of the previous one and remove the relevant cache softlink
2 changes: 1 addition & 1 deletion configs/gen_dashboards.yaml
@@ -1,7 +1,7 @@
experiment:
db_functionality_enabled: True # must set to True to generate reports, run dctweetbot, among other functions
# provide the generated swa checkpoint below
inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200816114940/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200911144157/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
debug:
debug_enabled: False
data_source:
4 changes: 2 additions & 2 deletions configs/gen_report.yaml
@@ -1,13 +1,13 @@
experiment:
db_functionality_enabled: True # must set to True to generate reports, run dctweetbot, among other functions
# provide the generated swa checkpoint below
inference_ckpt: "/home/speediedan/experiments/deep_classiflie_feat/checkpoints/20200901084410/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200911144157/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
debug:
debug_enabled: False
data_source:
skip_db_refresh: True
# db_conf must be explicitly specified only in dev mode or if db_conf is in a non-default location
db_conf: "/home/speediedan/repos/edification/deep_classiflie_db/deep_classiflie_db.yaml"
# db_conf: "/home/speediedan/repos/edification/deep_classiflie_db/deep_classiflie_db.yaml"
inference:
report_mode: True # set to true to enable report generation
rebuild_perf_cache: True # set True to (re)build perf cache (report_mode must also be True)