added current predictions explorer among other improvements for 0.1.1 release
Daniel Dale committed Sep 12, 2020
1 parent fa6dada commit bda664a
Showing 36 changed files with 252,685 additions and 166 deletions.
59 changes: 26 additions & 33 deletions README.md
Original file line number Diff line number Diff line change
@@ -80,10 +80,11 @@ The best way to start understanding/exploring the current model is to use the ex
<details><summary markdown="span"><strong>[Current Predictions Explorer](current_explorer.html)</strong>
</summary>

Explore the current (unlabeled) predictions generated by the latest model incarnation. All statements yet to be labeled by current fact-checking sources (currently, only [Washington Post Factchecker](https://www.washingtonpost.com/graphics/politics/trump-claims-database)) are available.
Live predictions continuously added via [ipfs](https://ipfs.io). Twitter statements will be delayed by ~15 minutes to allow thread-based scoring. [Factba.se](https://factba.se) is polled for new statements every 10 minutes.
This explorer provides fact-checkers a means (one of many possible) of using current model predictions and may also help those building fact-checking systems evaluate the potential utility of integrating similar models into their systems.
Explore current predictions of the latest model. All statements that have yet to be labeled by the currently used fact-checking sources (only [Washington Post Factchecker](https://www.washingtonpost.com/graphics/politics/trump-claims-database) at present) are available.

Live predictions are continuously added via [ipfs](https://ipfs.io). Twitter statements will be delayed by ~15 minutes to allow thread-based scoring. [Factba.se](https://factba.se) is polled for new statements every 15 minutes.

This explorer provides fact-checkers a means (one of many possible) of using current model predictions and may also help those building fact-checking systems evaluate the potential utility of integrating similar models into their systems.

<img src="docs/assets/current_explorer.gif" alt="current predictions explorer" />
</details>
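The ingestion cadence described above (a ~15-minute hold on tweets so whole threads can be scored together, plus fixed-interval polling of Factba.se) can be sketched as a simple gating check. This is purely illustrative — `ready_to_score`, `next_poll`, and the constants are hypothetical names, not part of the actual daemon:

```python
from datetime import datetime, timedelta

# Tweets are held back ~15 minutes to allow thread-based scoring;
# non-tweet statements are eligible as soon as they are polled.
TWEET_SCORING_DELAY = timedelta(minutes=15)
POLL_INTERVAL = timedelta(minutes=15)

def ready_to_score(is_tweet: bool, created_at: datetime, now: datetime) -> bool:
    """Return True once a statement is eligible for model scoring."""
    if is_tweet:
        return now - created_at >= TWEET_SCORING_DELAY
    return True

def next_poll(last_poll: datetime) -> datetime:
    """Factba.se is polled on a fixed interval."""
    return last_poll + POLL_INTERVAL
```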
@@ -116,7 +117,7 @@ The entire initial Deep Classiflie system (raw dataset, model, analytics modules
</summary>

- Fine-tune a base model (currently HuggingFace's [ALBERT implementation](https://huggingface.co/transformers/model_doc/albert.html) with some minor customizations) in tandem with a simple embedding reflecting the semantic shift associated with the medium via which the statement was conveyed (i.e., for the POC, just learn the tweet vs non-tweet transformation) (using [Pytorch](https://pytorch.org/))
- Explore the latest model's training session on [tensorboard.dev](https://tensorboard.dev/experiment/rGNQpYnYSOaHb2A84xRAzw).
- Explore the latest model's training session on [tensorboard.dev](https://tensorboard.dev/experiment/Ys0KLo5nRnq0soINjyEv4A).
- N.B. neuro-symbolic methods<sup id="a6">[6](#f6)</sup> that leverage knowledge bases and integrate symbolic reasoning with connectionist methods are not used in this model. Use of these approaches may be explored in [future research](#further-research) using this framework.
</details>
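The "simple embedding reflecting the semantic shift" bullet above can be illustrated with a toy sketch. In the real model this embedding is learned jointly with ALBERT in PyTorch; the names and values below are hypothetical stand-ins:

```python
# Toy sketch: a pooled statement representation is shifted by a learned
# per-medium embedding (tweet vs. non-tweet) before classification.
MEDIUM_IDS = {"nontweet": 0, "tweet": 1}

# In the real model these would be trainable parameters; fixed toys here.
medium_embeddings = [
    [0.0, 0.0, 0.0],   # non-tweet: no shift
    [0.1, -0.2, 0.3],  # tweet: learned semantic shift
]

def apply_medium_shift(statement_vec, medium: str):
    """Add the medium embedding to a pooled statement vector."""
    shift = medium_embeddings[MEDIUM_IDS[medium]]
    return [s + d for s, d in zip(statement_vec, shift)]
```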
<details><summary markdown="span"><strong>Analysis & Reporting</strong>
@@ -140,7 +141,7 @@ The entire initial Deep Classiflie system (raw dataset, model, analytics modules
<details><summary markdown="span"><strong>Global</strong>
</summary>

Global metrics<sup id="a9">[9](#f9)</sup> summarized in the table below relate to the current model's performance on a test set comprised of ~12K statements made between 2020-04-03 and 2020-07-08:<br/>
Global metrics<sup id="a9">[9](#f9)</sup> summarized in the table below relate to the current model's performance on a test set comprised of ~13K statements made between 2020-04-03 and 2020-07-08:<br/>
<img src="docs/assets/global_metrics_summ.png" alt="Global Stat Summary" />

</details>
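One of the non-standard monitor metrics referenced elsewhere in this README (MCC) feeds summaries like the one above. A minimal reference implementation of the standard formula — not code from the repo — for readers unfamiliar with it:

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.
    Ranges from -1 (total disagreement) through 0 (chance) to +1 (perfect)."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0
```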
@@ -180,7 +181,7 @@ To minimize false positives and maximize the model's utility, the following appr
- Generate and configure thawing schedules for models.
- EarlyStopping easily configurable with multiple non-standard monitor metrics (e.g. mcc)
- Both automatic and manually-specified [stochastic weight averaging](https://pytorch.org/blog/stochastic-weight-averaging-in-pytorch/) of model checkpoints<sup id="af">[f](#cf)</sup>
- mixed-precision training via [apex](https://github.com/NVIDIA/apex)<sup id="ag">[g](#cg)</sup>
- Mixed-precision training<sup id="ag">[g](#cg)</sup>
</details>
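The stochastic weight averaging bullet above amounts to an element-wise mean over checkpoint weights. A plain-Python sketch of the idea (the actual release uses the torchcontrib SWA module on real `state_dict`s; this stand-in uses dicts of float lists):

```python
def average_checkpoints(state_dicts):
    """SWA sketch: element-wise mean of N checkpoint weight dicts."""
    n = len(state_dicts)
    return {k: [sum(vals) / n for vals in zip(*(sd[k] for sd in state_dicts))]
            for k in state_dicts[0]}
```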
<details><summary markdown="span"><strong>Analysis & reporting</strong>
</summary>
@@ -274,15 +275,9 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
cd transformers
pip install .
```
4. (temporarily required) Testing of this alpha release occurred before native AMP was integrated into Pytorch with release 1.6. As such, installing NVIDIA's apex is temporarily (as of 2020.08.18) required to replicate the model. Switching from the apex AMP API to the PyTorch-integrated one is planned as part of issue #999, which should obviate the need to install apex.
```shell
git clone https://github.com/NVIDIA/apex
cd apex
pip uninstall apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
5. [Install mariadb](https://mariadb.com/kb/en/getting-installing-and-upgrading-mariadb/) or mysql DB if necessary.
6. These are the relevant DB configuration settings used for the current release of Deep Classiflie's backend. Divergence from this configuration has not been tested and may result in unexpected behavior.
4. [Install mariadb](https://mariadb.com/kb/en/getting-installing-and-upgrading-mariadb/) or mysql DB if necessary.
5. These are the relevant DB configuration settings used for the current release of Deep Classiflie's backend. Divergence from this configuration has not been tested and may result in unexpected behavior.

```mysql
collation-server = utf8mb4_unicode_ci
@@ -291,7 +286,7 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
sql_mode = 'STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,ANSI_QUOTES'
transaction-isolation = READ-COMMITTED
```
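Since divergence from the settings above is untested, a quick sanity check can help. The helper below is hypothetical (it parses my.cnf-style lines rather than querying a live server), shown only to illustrate the idea:

```python
# The two settings the backend is most sensitive to, per the config above.
REQUIRED_SETTINGS = {
    "collation-server": "utf8mb4_unicode_ci",
    "transaction-isolation": "READ-COMMITTED",
}

def parse_cnf(text: str) -> dict:
    """Parse simple `key = value` lines from a my.cnf-style fragment."""
    out = {}
    for line in text.splitlines():
        if "=" in line:
            k, _, v = line.partition("=")
            out[k.strip()] = v.strip().strip("'")
    return out

def check_settings(cfg: dict) -> list:
    """Return required settings that are missing or mismatched."""
    return [k for k, v in REQUIRED_SETTINGS.items() if cfg.get(k) != v]
```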
7. copy/update relevant Deep Classiflie config file to $HOME dir
6. copy/update relevant Deep Classiflie config file to $HOME dir
```shell
cp ./deep_classiflie_db/db_setup/.dc_config.example ~
mv .dc_config.example .dc_config
@@ -317,33 +312,31 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
export DCDB_NAME="deep_classiflie"
```

8. execute Deep Classiflie DB backend initialization script:

<img src="docs/assets/dc_schema_build.gif" alt="Deep Classiflie logo" />

Ensure you have access to a DB user with administrator privs. "admin" in the case above.

7. execute Deep Classiflie DB backend initialization script:
```shell
cd deep_classiflie_db/db_setup
./deep_classiflie_db_setup.sh deep_classiflie
```
Ensure you have access to a DB user with administrator privs. "admin" in the case above.
<img src="docs/assets/dc_schema_build.gif" alt="Deep Classiflie logo" />

9. login to the backend db and seed historical tweets (necessary as only most recent 3200 can currently be retrieved directly from twitter)
8. login to the backend db and seed historical tweets (necessary as only most recent 3200 can currently be retrieved directly from twitter)
```mysql
mysql -u dcbot -p
use deep_classiflie
source dcbot_tweets_init_20200814.sql
source dcbot_tweets_init_20200910.sql
exit
```
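The seeding step above exists because only the most recent 3200 tweets can be retrieved directly from Twitter; subsequent API pulls then have to be merged with the seeded history without creating duplicates. A hypothetical sketch of that merge (not the repo's actual code):

```python
def merge_tweets(seeded, fetched):
    """Merge seeded historical tweets with freshly fetched ones,
    deduplicating on tweet id; fresh copies win on conflict."""
    by_id = {t["id"]: t for t in seeded}
    by_id.update({t["id"]: t for t in fetched})
    return sorted(by_id.values(), key=lambda t: t["id"])
```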

10. copy over relevant base model weights to specified model_cache_dir:
9. copy over relevant base model weights to specified model_cache_dir:
```shell
# model_cache_dir default found in configs/config_defaults.yaml
# it defaults to $HOME/datasets/model_cache/deep_classiflie/
cd {PATH_TO_DEEP_CLASSIFLIE_BASE}/deep_classiflie/assets/
cp albert-base-v2-pytorch_model.bin albert-base-v2-spiece.model {MODEL_CACHE_DIR}/
```

11. Run deep_classiflie.py with the provided config necessary to download the raw data from the relevant data sources (factba.se, twitter, washington post), execute the data processing pipeline and generate the dataset collection.
10. Run deep_classiflie.py with the provided config necessary to download the raw data from the relevant data sources (factba.se, twitter, washington post), execute the data processing pipeline and generate the dataset collection.
```shell
cd deep_classiflie
./deep_classiflie.py --config "{PATH_TO_DEEP_CLASSIFLIE_BASE}/configs/dataprep_only.yaml"
@@ -369,33 +362,33 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
2020-08-14 16:58:14,331:deep_classiflie:DEBUG: Metadata update complete, 1 record(s) affected.
...
```
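The dataprep run above executes a sequence of stages: refresh the raw sources (factba.se, twitter, washington post), run the processing pipeline, then generate the dataset collection. A minimal sketch of such an ordered stage runner (stage names are illustrative, not the repo's actual functions):

```python
def run_pipeline(stages):
    """Run ordered dataprep-style stages, collecting a simple log.
    Each stage is a (name, callable) pair executed in sequence."""
    log = []
    for name, fn in stages:
        log.append(f"start: {name}")
        fn()
        log.append(f"complete: {name}")
    return log
```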
12. Recursively train the deep classiflie POC model:
11. Recursively train the deep classiflie POC model:
```shell
cd deep_classiflie
./deep_classiflie.py --config "{PATH_TO_DEEP_CLASSIFLIE_BASE}/configs/train_albertbase.yaml"
```

13. Generate an swa checkpoint (current release was built using swa torchcontrib module but will switch to the now-integrated pytorch swa api in the next release):
12. Generate an swa checkpoint (current release was built using swa torchcontrib module but will switch to the now-integrated pytorch swa api in the next release):
```shell
cd deep_classiflie
./deep_classiflie.py --config "{PATH_TO_DEEP_CLASSIFLIE_BASE}/configs/gen_swa_ckpt.yaml"
```

14. Generate model analysis report(s) using the generated swa checkpoint:
13. Generate model analysis report(s) using the generated swa checkpoint:
```shell
# NOTE, swa checkpoint generated in previous step must be added to gen_report.yaml
cd deep_classiflie
./deep_classiflie.py --config "{PATH_TO_DEEP_CLASSIFLIE_BASE}/configs/gen_report.yaml"
```

15. Generate model analysis dashboards:
14. Generate model analysis dashboards:
```shell
# NOTE, swa checkpoint generated in previous step must be added to gen_dashboards.yaml
cd deep_classiflie
./deep_classiflie.py --config "{PATH_TO_DEEP_CLASSIFLIE_BASE}/configs/gen_dashboards.yaml"
```

16. configure jekyll static site generator to use bokeh dashboards locally:
15. configure jekyll static site generator to use bokeh dashboards locally:

```shell
```

@@ -447,8 +440,8 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
<li><span class="fnum" id="cc">[c]</span> Deep Classiflie depends upon deep_classiflie_db (initially released as a separate repository) for much of its analytical and dataset generation functionality. Depending on how Deep Classiflie evolves (e.g. as it supports distributed data stores etc.), it may make more sense to integrate deep_classiflie_db back into deep_classiflie. <a href="#ac"></a></li>
<li><span class="fnum" id="cd">[d]</span> It's notable that the model suffers a much higher FP ratio on tweets relative to non-tweets. Exploring tweet FPs, there are a number of plausible explanations for this discrepancy which could be explored in future research. <a href="#ad">↩</a></li>
<li><span class="fnum" id="ce">[e]</span> Still in early development, there are significant outstanding issues (e.g. no tests yet!) and code quality shortcomings galore, but any constructive thoughts or contributions are welcome. I'm interested in using ML to curtail disinformation, not promulgate it, so I want to be clear -- this is essentially a fancy sentence similarity system with a lot of work put into building the dataset generation and model analysis data pipelines (I have a data engineering background, not a software engineering one).<a href="#ae"></a></li>
<li><span class="fnum" id="cf">[f]</span> Current model release built/tested before swa graduated from torchcontrib to core pytorch. Next release of Deep Classiflie will use the integrated swa api.<a href="#af"></a></li>
<li><span class="fnum" id="cg">[g]</span> Current model release built/tested before AMP was integrated into core pytorch. Next release of Deep Classiflie will use the integrated AMP api.<a href="#ag"></a></li>
<li><span class="fnum" id="cf">[f]</span> Previous versions used the swa module from torchcontrib before it graduated to core pytorch.<a href="#af"></a></li>
<li><span class="fnum" id="cg">[g]</span> Previous versions used NVIDIA's native <a href="https://github.com/NVIDIA/apex">apex</a> before AMP was integrated into pytorch.<a href="#ag">↩</a></li>
<li><span class="fnum" id="ch">[h]</span> N.B. This daemon may violate Twitter's <a href="https://help.twitter.com/en/rules-and-policies/twitter-automation">policy</a> w.r.t. tweeting sensitive content if the subject's statements contain such content (no content-based filtering is included in the daemon). @DeepClassflie initially tested the Deep Classiflie twitter daemon but will post only framework-related announcements moving forward.<a href="#ah">↩</a></li>
</ul>
6 changes: 3 additions & 3 deletions analysis/captum_cust_viz.py
@@ -196,7 +196,7 @@ def fmt_notes_box(ext_rec: Tuple) -> str:
<li>prediction of whether Washington Post's Fact Checker will add this claim to its "Trump False Claims" DB</li>
<li>if claim was included in WP's Fact Checker false claims DB at time of original model training</li>
<li>accuracy estimated by sorting & bucketing the test set sigmoid outputs, averaging performance in each bucket</li>
<li>global metrics relate to the current model's performance on a test set comprised of ~12K statements made between
<li>global metrics relate to the current model's performance on a test set comprised of ~13K statements made between
{ext_rec[5][13].strftime('%Y-%m-%d')} and {ext_rec[5][14].strftime('%Y-%m-%d')}. Training, validation and test sets
are chronologically disjoint. </li>
<li>subject to interpretability filter, some subword tokens have been omitted to facilitate interpretability</li>
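The bucketed accuracy estimation described in the notes above (sort test-set sigmoid outputs, split into buckets, average performance per bucket) can be sketched as follows. This is illustrative only; the repo's actual bucketing logic may differ:

```python
def bucket_accuracy(probs, correct, n_buckets=4):
    """Sort predictions by sigmoid output, split into equal-size buckets,
    and average accuracy (0/1 correctness) within each bucket."""
    pairs = sorted(zip(probs, correct))
    size = len(pairs) // n_buckets
    accs = []
    for b in range(n_buckets):
        chunk = pairs[b * size:(b + 1) * size]
        accs.append(sum(c for _, c in chunk) / len(chunk))
    return accs
```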
@@ -259,10 +259,10 @@ def gen_pred_exp_attr_tup(datarecord: VisualizationDataRecord, ext_rec: Tuple, t


def pred_exp_attr(datarecords: List[VisualizationDataRecord], ext_recs: List[Tuple] = None, token_mask: List = None,
invert_colors: bool = False) -> Tuple[List, Tuple]:
invert_colors: bool = False, **_) -> Tuple[List, Tuple]:
global_metrics_summ = ext_recs[0][8]
pred_exp_tups = []
for i, (datarecord, ext_rec) in enumerate(zip(datarecords, ext_recs)):
pred_exp_tup = gen_pred_exp_attr_tup(datarecord, ext_rec, token_mask, invert_colors)
pred_exp_tups.append(pred_exp_tup)
return pred_exp_tups, global_metrics_summ
return pred_exp_tups, global_metrics_summ
1 change: 0 additions & 1 deletion analysis/gen_pred_explorer.py
@@ -38,7 +38,6 @@ def init_radio_groups() -> Tuple[RadioButtonGroup, ...]:


def init_explorer_divs(pred_stmt_dict: Dict) -> Tuple[Div, ...]:
# stmt_attr, word_import_html, max_word_html
default_idx = min([i for i, (b, c) in enumerate(zip(pred_stmt_dict['bucket_type'], pred_stmt_dict['tp']))
if b == 'max_acc_nontweets' and c == 1])
word_import_div = Div(text=pred_stmt_dict['pred_exp_attr_tups'][default_idx][1], height_policy='max',
8 changes: 4 additions & 4 deletions analysis/model_analysis_rpt.py
@@ -74,15 +74,15 @@ def gen_pred_exp_ds(self) -> Tuple[Dict, Tuple]:
pred_exp_tups = fetchallwrapper(self.cnxp.get_connection(), self.config.inference.sql.pred_exp_sql)
pred_exp_set = []
pred_exp_ds = OrderedDict({'bucket_type': [], 'bucket_acc': [], 'conf_percentile': [], 'pos_pred_acc': [],
'neg_pred_acc': [], 'pos_pred_ratio': [], 'neg_pred_ratio': [], 'statement_id': [],
'statement_text': [], 'tp': [], 'tn': [], 'fp': [], 'fn': []})
'neg_pred_acc': [], 'pos_pred_ratio': [], 'neg_pred_ratio': [], 'statement_id': [],
'statement_text': [], 'tp': [], 'tn': [], 'fp': [], 'fn': []})
for (bucket_type, bucket_acc, conf_percentile, pos_pred_acc, neg_pred_acc, pos_pred_ratio, neg_pred_ratio,
statement_id, statement_text, ctxt_type, tp, tn, fp, fn) in pred_exp_tups:
label = 'False' if tp == 1 or fn == 1 else 'True'
pred_exp_set.append((statement_text, ctxt_type, label))
for k, v in zip(list(pred_exp_ds.keys()), [bucket_type, bucket_acc, conf_percentile, pos_pred_acc,
neg_pred_acc, pos_pred_ratio, neg_pred_ratio, statement_id,
statement_text, tp, tn, fp, fn]):
neg_pred_acc, pos_pred_ratio, neg_pred_ratio, statement_id,
statement_text, tp, tn, fp, fn]):
pred_exp_ds[k].append(v)
pred_exp_attr_tups, global_metric_summ = Inference(self.config, pred_exp_set=pred_exp_set).init_predict()
pred_exp_ds['pred_exp_attr_tups'] = pred_exp_attr_tups
Binary file modified assets/dc_ds.zip
Binary file modified assets/dc_model_alpha.zip
2 changes: 1 addition & 1 deletion configs/config_defaults_sql.yaml
@@ -100,7 +100,7 @@ inference:
global_model_perf_cache_sql: "select * from global_model_accuracy_lookup_cache"
pred_exp_sql: "select * from pred_explr_stmts"
save_model_sql: "insert into model_metadata select * from latest_global_model_perf_summary"
save_perf_sql: "insert into local_model_perf_summary_hist select * from latest_local_model_perf_summary"
save_perf_sql: "insert ignore into local_model_perf_summary_hist select * from latest_local_model_perf_summary"
ds_md_sql: >-
select dsid, train_start_date, train_end_date from ds_metadata where ds_type='converged_filtered' order by dsid desc limit 1
save_model_rpt_sql: >-
4 changes: 2 additions & 2 deletions configs/dataprep_only.yaml
@@ -5,10 +5,10 @@ experiment:
dataprep_only: True
debug:
debug_enabled: True
use_debug_dataset: True
use_debug_dataset: False
data_source:
# db_conf must be explicitly specified only in dev mode or if db_conf is in a non-default location
db_conf: "/home/speediedan/repos/edification/deep_classiflie_db_feat/deep_classiflie_db.yaml"
# db_conf: "/home/speediedan/repos/edification/deep_classiflie_db_feat/deep_classiflie_db.yaml"
model_filter_topk: 20
filter_w_embed_cache: False
# safest way to build a new dataset is to verify backup of the previous one and remove the relevant cache softlink
2 changes: 1 addition & 1 deletion configs/gen_dashboards.yaml
@@ -1,7 +1,7 @@
experiment:
db_functionality_enabled: True # must set to True to generate reports, run dctweetbot, among other functions
# provide the generated swa checkpoint below
inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200816114940/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200911144157/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
debug:
debug_enabled: False
data_source:
4 changes: 2 additions & 2 deletions configs/gen_report.yaml
@@ -1,13 +1,13 @@
experiment:
db_functionality_enabled: True # must set to True to generate reports, run dctweetbot, among other functions
# provide the generated swa checkpoint below
inference_ckpt: "/home/speediedan/experiments/deep_classiflie_feat/checkpoints/20200901084410/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200911144157/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
debug:
debug_enabled: False
data_source:
skip_db_refresh: True
# db_conf must be explicitly specified only in dev mode or if db_conf is in a non-default location
db_conf: "/home/speediedan/repos/edification/deep_classiflie_db/deep_classiflie_db.yaml"
# db_conf: "/home/speediedan/repos/edification/deep_classiflie_db/deep_classiflie_db.yaml"
inference:
report_mode: True # set to true to enable report generation
rebuild_perf_cache: True # set True to (re)build perf cache (report_mode must also be True)