✨ wizard: anomalist (first draft) #3363

Merged: lucasrodes merged 36 commits into master from wizard-anomalies on Oct 9, 2024

@lucasrodes (Member) commented Oct 3, 2024

First implementation of #3340:

  • Re-organisation of grapher db/s3 util functions.
  • Draft of anomalist page in Wizard.

@owidbot (Contributor) commented Oct 3, 2024

Quick links (staging server):

Site Admin Wizard

Login: ssh owid@staging-site-wizard-anomalies

chart-diff: ✅
  • 2/2 reviewed charts
    • Modified: 2/2
    • New: 0/0
data-diff: ❌ Found differences
= Dataset garden/artificial_intelligence/2024-02-05/chess
  = Table chess
    ~ Column elo_rating (changed data)
        ~ Changed values: 5 / 39 (12.82%)
                         entity  year  elo_rating -  elo_rating +
          Computer chess engine  1994          2321          2314
          Computer chess engine  2009          3237          3227
          Computer chess engine  2010          3237          3227
          Computer chess engine  2011          3237          3236
          Computer chess engine  2023          3591          3586
= Dataset garden/artificial_intelligence/2024-06-28/ai_bills
  = Table ai_bills
~ Dataset garden/war/2024-08-26/ucdp
-   -   This dataset provides information on armed conflicts, using data from the UCDP Georeferenced Event Dataset (version 24.1), the UCDP/PRIO Armed Conflict Dataset (version 24.1), and the UCDP Battle-Related Deaths Dataset (version 24.1).
    ?                                                                                                                        ^                                                    ^                                                          ^
+   +   This dataset provides information on armed conflicts, using data from the UCDP Georeferenced Event Dataset (version 23.1), the UCDP/PRIO Armed Conflict Dataset (version 23.1), and the UCDP Battle-Related Deaths Dataset (version 23.1).
    ?                                                                                                                        ^                                                    ^                                                          ^
  = Table ucdp_country
  = Table ucdp
  = Table ucdp_locations
~ Dataset garden/war/2024-08-26/ucdp_prio
-   -   This dataset provides information on armed conflicts, using data from the UCDP Georeferenced Event Dataset (version 24.1), the UCDP/PRIO Armed Conflict Dataset (version 24.1), and the UCDP Battle-Related Deaths Dataset (version 24.1).
    ?                                                                                                                        ^                                                    ^                                                          ^
+   +   This dataset provides information on armed conflicts, using data from the UCDP Georeferenced Event Dataset (version 23.1), the UCDP/PRIO Armed Conflict Dataset (version 23.1), and the UCDP Battle-Related Deaths Dataset (version 23.1).
    ?                                                                                                                        ^                                                    ^                                                          ^
  = Table ucdp_prio
= Dataset garden/who/2024-09-09/flu_test
  = Table flu_test
    ~ Dim country
+       + New values: 1 / 71581 (0.00%)
                date country
          2024-09-16  Mexico
-       - Removed values: 23 / 71581 (0.03%)
                date   country
          2024-09-23     India
          2024-09-23 Indonesia
          2024-09-23   Jamaica
          2024-09-16   Vietnam
          2024-09-23   Vietnam
    ~ Dim date
+       + New values: 1 / 71581 (0.00%)
          country       date
           Mexico 2024-09-16
-       - Removed values: 23 / 71581 (0.03%)
            country       date
              India 2024-09-23
          Indonesia 2024-09-23
            Jamaica 2024-09-23
            Vietnam 2024-09-16
            Vietnam 2024-09-23
    ~ Column denomcombined (new data, changed data)
+       + New values: 1 / 71581 (0.00%)
          country       date  denomcombined
           Mexico 2024-09-16            482
-       - Removed values: 23 / 71581 (0.03%)
            country       date  denomcombined
              India 2024-09-23            113
          Indonesia 2024-09-23             24
            Jamaica 2024-09-23              5
            Vietnam 2024-09-16             50
            Vietnam 2024-09-23             21
        ~ Changed values: 92 / 71581 (0.13%)
             country       date  denomcombined -  denomcombined +
           Argentina 2024-08-26              115              105
              Brazil 2024-05-20             7369             7326
              Brazil 2024-06-10             7006             6979
              Brazil 2024-07-22             6140             6130
          Costa Rica 2024-08-19              724              681
    ~ Column pcnt_poscombined (new data, changed data)
+       + New values: 1 / 71581 (0.00%)
          country       date  pcnt_poscombined
           Mexico 2024-09-16          5.394191
-       - Removed values: 23 / 71581 (0.03%)
            country       date  pcnt_poscombined
              India 2024-09-23         23.008850
          Indonesia 2024-09-23          8.333333
            Jamaica 2024-09-23         20.000000
            Vietnam 2024-09-16         32.000000
            Vietnam 2024-09-23         38.095238
        ~ Changed values: 104 / 71581 (0.15%)
            country       date  pcnt_poscombined -  pcnt_poscombined +
          Argentina 2024-06-10           24.615160           24.248644
             Brazil 2024-08-12            4.376248            4.378633
            Jamaica 2024-08-05           16.666666           16.326530
               Mali 2024-09-09           20.000000           18.367348
            Vietnam 2024-01-22           31.081081           32.432434


Legend: +New  ~Modified  -Removed  =Identical
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet
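Each summary line in the report is a plain ratio; for example, "Changed values: 5 / 39 (12.82%)" is 5 / 39 ≈ 0.1282. A minimal sketch of how such counts could be derived for one column, assuming each table is keyed by its index (e.g. (entity, year)) and using plain dicts rather than the real etl diff internals, which are not shown here. The helper names diff_column and summarize are hypothetical.

```python
from typing import Hashable, List, Mapping, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)  # row key, e.g. ("Computer chess engine", 1994)


def diff_column(
    old: Mapping[K, float], new: Mapping[K, float]
) -> Tuple[List[K], List[K], List[K]]:
    """Split the keys of one column into new, removed and changed values."""
    added = [k for k in new if k not in old]
    removed = [k for k in old if k not in new]
    # Exact comparison: no float tolerance and no NaN handling (see note below).
    changed = [k for k in new if k in old and new[k] != old[k]]
    return added, removed, changed


def summarize(label: str, count: int, total: int) -> str:
    """Format a count like the report, e.g. 'Changed values: 5 / 39 (12.82%)'."""
    return f"{label}: {count} / {total} ({count / total:.2%})"


if __name__ == "__main__":
    # Two rows copied from the chess table above.
    old = {("Computer chess engine", 1994): 2321, ("Computer chess engine", 2009): 3237}
    new = {("Computer chess engine", 1994): 2314, ("Computer chess engine", 2009): 3227}
    added, removed, changed = diff_column(old, new)
    print(len(added), len(removed), len(changed))  # 0 0 2
    print(summarize("Changed values", 5, 39))  # Changed values: 5 / 39 (12.82%)
```

The sketch compares values with exact inequality; comparing real float columns would likely need a tolerance, and NaN needs special care because NaN != NaN is always true. Which row count the tool uses as the denominator (old table, new table, or their union) is also an assumption here.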

Automatically updated datasets matching weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included

Edited: 2024-10-09 10:35:32 UTC
Execution time: 4.22 seconds
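The exclusion note joins dataset names with |, which reads like regular-expression alternation. A minimal sketch of such a filter, assuming the pattern is searched anywhere in a dataset's path with Python's re module; how owidbot actually applies it is not shown here, and EXCLUDED and is_excluded are hypothetical names.

```python
import re

# Pattern copied from the owidbot note; applying it with re.search is an assumption.
EXCLUDED = re.compile(
    r"weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile"
    r"|garden/ihme_gbd/2019/gbd_risk"
)


def is_excluded(dataset_path: str) -> bool:
    """Return True if any alternative of the pattern occurs in the dataset path."""
    return EXCLUDED.search(dataset_path) is not None


if __name__ == "__main__":
    print(is_excluded("garden/who/2024-09-09/flu_test"))  # False: it appears in the diff above
    print(is_excluded("garden/who/2024-09-09/flunet"))  # True: hypothetical path
```

Because the alternatives are unanchored substrings, flu_test matches neither fluid nor flunet, which is consistent with flu_test being reported above; the flip side is that an alternative like covid would also exclude any path that merely contains that word.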

@lucasrodes (Member, Author) commented Oct 3, 2024

Implemented the foundations of the app.

Next tasks:

  • create plot interface (entity → country)
  • add button to generate anomaly list, per indicator
    • can we have something to select the indicators we want to create lists for? Options: checkboxes per container (maybe too long), or a multiselect first plus individual buttons. The second option sounds better.
    • individual button actions should be kept in a fragment, to avoid rerunning the whole page
  • think about the plot for upgrade mode
  • export anomalies somewhere, maybe to indicator metadata?

@lucasrodes (Member, Author) commented Oct 4, 2024

Managed to use Grapher to chart indicators (users may want to see the time series when an anomaly is detected).

Currently working on streaming the output from OpenAI and rendering it as Streamlit objects.
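One way to render a streamed response incrementally is to parse it as it arrives. A minimal sketch, assuming (hypothetically) that the model is prompted to emit one JSON object per anomaly, one per line, and that the caller passes in the text fragments of the stream. iter_json_lines is a hypothetical helper, not part of the OpenAI SDK or of this PR.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_json_lines(fragments: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per complete line as text fragments arrive.

    Assumes (hypothetically) the model emits one JSON object per line.
    """
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        # Emit every complete line; keep the trailing partial line buffered.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.strip():
                yield json.loads(line)
    # Flush the last object if the stream did not end with a newline.
    if buffer.strip():
        yield json.loads(buffer)


if __name__ == "__main__":
    # Toy fragments split at arbitrary points, as a stream might deliver them.
    fragments = ['{"entity": "Fra', 'nce", "year": 2020}\n{"ent', 'ity": "Chile", "year": 2021}\n']
    for anomaly in iter_json_lines(fragments):
        print(anomaly["entity"], anomaly["year"])
```

With the OpenAI Python SDK (v1) and stream=True, each chunk's text fragment is in chunk.choices[0].delta.content, which can be None and would need to be skipped. Each parsed object could then be handed to a Streamlit element such as st.write as soon as its line is complete, instead of waiting for the whole response.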

Relevant links:

@@ -1251,6 +1277,24 @@ def override_yaml_path(self) -> Path:
        """Return path to indicator YAML file."""
        return self.step_path.with_suffix(".meta.override.yml")

    def get_data(self, session: Optional[Session] = None) -> pd.DataFrame:
Collaborator commented:

There's already a function, variable_data_df_from_s3, for reading data from S3; you can reuse it.
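The @@ -1251,6 +1277,24 @@ line on the snippet above is a unified-diff hunk header: the hunk covers 6 lines starting at line 1251 of the old file and 24 lines starting at line 1277 of the new file, so it is 18 lines longer after the change. The text after the second @@ is context (here, the enclosing function) added by the diff tool. A small parser for such headers, as an illustrative sketch; HUNK_RE, Hunk and parse_hunk_header are hypothetical names.

```python
import re
from typing import NamedTuple

# "@@ -old_start[,old_len] +new_start[,new_len] @@ optional context"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


class Hunk(NamedTuple):
    old_start: int
    old_len: int
    new_start: int
    new_len: int


def parse_hunk_header(line: str) -> Hunk:
    """Parse the line ranges of a unified-diff hunk header."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_len, new_start, new_len = match.groups()
    # An omitted length means the range is exactly one line long.
    return Hunk(int(old_start), int(old_len or 1), int(new_start), int(new_len or 1))


if __name__ == "__main__":
    hunk = parse_hunk_header("@@ -1251,6 +1277,24 @@ def override_yaml_path(self) -> Path:")
    print(hunk)  # Hunk(old_start=1251, old_len=6, new_start=1277, new_len=24)
    print(hunk.new_len - hunk.old_len)  # 18
```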

@lucasrodes changed the title from "✨ wizard: anomalies" to "✨ wizard: anomalist (first draft)" on Oct 9, 2024
@Marigold (Collaborator) left a comment

It's hard to review such a massive PR, but it looks good after skimming through it. Don't forget to squash it.

@lucasrodes marked this pull request as ready for review on October 9, 2024, 11:05
@lucasrodes merged commit d0afe39 into master on Oct 9, 2024
9 of 10 checks passed
@lucasrodes deleted the wizard-anomalies branch on October 9, 2024, 11:05
paarriagadap pushed a commit that referenced this pull request on Oct 9, 2024
* ✨ wizard: anomalies

* wip

* bump streamlit

* wip

* wip: chart

* wip

* todo

* plot indicator

* re-structure

* wip: loading indicators

* fix API grapher_chart

* deprecate chart_html

* chart_html -> grapher_chart

* clean

* ci/cd

* wip

* wip

* changed module name

* custom components module

* add methods to get uris

* new alias

* get dataset uris

* update import

* update gpt pricing

* update import

* wip

* provide entity-context for anomaly

* wip: anomalist v2

* wip

* wip

* lock

* ✨ anomalist: improve utils (#3385)

* wip

* db -> db_utils

* io -> db

* move things db_utils -> db

* db -> grapher_io

* db -> grapher_io, db_utils -> db

* docstring

* db_utils -> db

* wip

* remove indicator

* add overloads

* ci/cd

* wip

* cicd

* wip

* deprecation warnings

* missing import

* hide anomalist in wizard
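Several commits above (deprecate chart_html, chart_html -> grapher_chart, new alias, deprecation warnings) suggest renames that keep the old names around with a warning. A common Python pattern for this is an alias that forwards to the new function and emits a DeprecationWarning. A minimal sketch, assuming the old and new names take the same arguments; deprecated_alias is a hypothetical helper, and the grapher_chart below is a stand-in because the real signature is not shown in this PR.

```python
import functools
import warnings
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def deprecated_alias(new_func: F, old_name: str) -> F:
    """Wrap new_func so that calls through the old name emit a DeprecationWarning."""

    @functools.wraps(new_func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this wrapper
        )
        return new_func(*args, **kwargs)

    return wrapper  # type: ignore[return-value]


# Hypothetical stand-in: the real grapher_chart signature is not shown in this PR.
def grapher_chart(*args: Any, **kwargs: Any) -> str:
    return "<chart>"


chart_html = deprecated_alias(grapher_chart, "chart_html")

if __name__ == "__main__":
    warnings.simplefilter("always")  # make sure the warning is shown regardless of default filters
    print(chart_html())  # prints <chart> after warning that chart_html is deprecated
```

Keeping the alias lets existing call sites keep working while the warning points them to the new name; the alias can be deleted once nothing imports it.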
@lucasrodes mentioned this pull request on Oct 7, 2024
3 tasks