Saving `is_training_set_available` in `sys_info` during `get_overall_statistics()` #453

OscarWang114 · 2022-09-06T03:58:34Z

Although this issue occurs in the web interface, I'm writing it here as it's mainly SDK-related.

Problem

In the web interface:

[0]   File "/Users/oscar/opt/anaconda3/envs/exb/lib/python3.9/site-packages/explainaboard/processors/processor.py", line 252, in perform_analyses
[0]     my_analysis.perform(
[0]   File "/Users/oscar/opt/anaconda3/envs/exb/lib/python3.9/site-packages/explainaboard/analysis/analyses.py", line 191, in perform
[0]     raise RuntimeError(f"bucket analysis: feature {self.feature} not found.")

In SDK:

The function _gen_cases_and_stats() in conditional_generation.py (called by processor.py’s get_overall_statistics()) skips saving require_training_set=True example-level features. However, these skipped feature names are saved in sys_info.analysis_levels[0].

This causes perform() in BucketAnalysis in analyses.py to attempt to look up these features and throw the above error as the features cannot be found in the actual cases (since they are skipped).
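To make the mismatch concrete, here is a minimal standalone sketch of the failure mode. It does not use the explainaboard API: `FeatureSpec`, `gen_cases`, `bucket_analysis`, and the feature names `text_length` and `num_oov` are all hypothetical stand-ins, and the feature values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class FeatureSpec:
    """Hypothetical stand-in for a feature declared in an analysis level."""

    name: str
    require_training_set: bool = False


def gen_cases(specs, samples, training_stats=None):
    """Build one case (a dict of feature values) per sample.

    Mirrors the behavior described above: a feature that needs the training
    set is skipped when no training statistics are available.
    """
    cases = []
    for sample in samples:
        case = {}
        for spec in specs:
            if spec.require_training_set and training_stats is None:
                continue  # the value is never stored in the case
            case[spec.name] = len(sample.split())  # placeholder feature value
        cases.append(case)
    return cases


def bucket_analysis(feature, cases):
    """Fail like the traceback when a declared feature is absent from the cases."""
    if any(feature not in case for case in cases):
        raise RuntimeError(f"bucket analysis: feature {feature} not found.")
    return sorted(case[feature] for case in cases)


specs = [
    FeatureSpec("text_length"),
    FeatureSpec("num_oov", require_training_set=True),
]
cases = gen_cases(specs, ["a b c", "d e"])  # no training statistics passed
declared = [spec.name for spec in specs]  # still lists "num_oov"
```

Here `declared` still contains `num_oov` even though no case stores it, so calling `bucket_analysis("num_oov", cases)` raises the `RuntimeError`, while `bucket_analysis("text_length", cases)` succeeds.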

Quick fix

Set skip_failed_analyses=True.

Long-term solution

Following up on #410, we should save a flag like is_training_set_available in sys_info. If set to false, we should skip the require_training_set=True features during bucket analysis.
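A rough sketch of how such a flag could be used, assuming a simplified data model. `SysInfo`, `Feature`, and `features_to_bucket` below are hypothetical and only illustrate the idea; they are not the actual explainaboard classes.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical feature declaration."""

    name: str
    require_training_set: bool = False


@dataclass
class SysInfo:
    """Hypothetical, simplified stand-in for sys_info."""

    features: list = field(default_factory=list)
    # Proposed flag, saved when the overall statistics are computed.
    is_training_set_available: bool = False


def features_to_bucket(sys_info):
    """Return the feature names that bucket analysis should attempt."""
    return [
        feature.name
        for feature in sys_info.features
        if sys_info.is_training_set_available or not feature.require_training_set
    ]


features = [Feature("text_length"), Feature("num_oov", require_training_set=True)]
without_train = SysInfo(features=features, is_training_set_available=False)
with_train = SysInfo(features=features, is_training_set_available=True)
```

With the flag set to false, `features_to_bucket(without_train)` only yields `text_length`, so the training-set-dependent feature is never looked up; with the flag set to true, both features are analyzed.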

odashi · 2022-09-06T23:33:26Z

@OscarWang114 Thanks for reporting the issue!

First, could skip_failed_analyses=True in Processor.process be a quick fix, or does it not satisfy the use case?

I also agree with having more specific control around feature groups (in this case, train-only or not). Is the flag name just is_trainint_set rather than is_training_set_available?

OscarWang114 · 2022-09-06T23:38:50Z

@odashi Thanks! Yes, skip_failed_analyses=True is a valid quick fix; I updated the issue description. And thanks for catching the typo (also updated).
