🎉 Create wizard page on data producer analytics #3711

Merged
merged 26 commits into master from wizard-on-provider-analytics
Dec 12, 2024

Conversation

Contributor

@pabloarosado pabloarosado commented Dec 10, 2024

Context

In the past, I have used this script from the analytics repo to share analytics with specific data providers. From now on, it could be part of the ETL wizard, where it could also be expanded.

This page is not only useful to share analytics with data producers, but also internally to see analytics at the data producer level.

To test it, first run make test to install pandas-gbq, and then execute etlwiz locally and go to the new "Producer analytics" page.

Main changes

  • Create a wizard page to visualize and easily export analytics for specific data providers.
  • Add an exclude_steps argument to exclude certain steps (e.g. auxiliary steps like population) from the DAG.
  • Install pandas-gbq and update the uv lock.
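To illustrate what excluding steps from the DAG could look like, here is a minimal sketch. It assumes the DAG is a dictionary mapping each step to the set of its direct dependencies; the function name remove_steps_from_dag and the step names are hypothetical and are not taken from this PR.

```python
from typing import Dict, Iterable, Optional, Set


def remove_steps_from_dag(
    dag: Dict[str, Set[str]],
    exclude_steps: Optional[Iterable[str]] = None,
) -> Dict[str, Set[str]]:
    """Return a copy of the DAG without the excluded steps.

    Hypothetical helper: excluded steps are dropped both as steps and as
    dependencies of the remaining steps.
    """
    excluded = set(exclude_steps or [])
    return {
        step: {dep for dep in deps if dep not in excluded}
        for step, deps in dag.items()
        if step not in excluded
    }


if __name__ == "__main__":
    # Illustrative step names only.
    example_dag = {
        "meadow/who/flu_test": {"snapshot/who/flu_test.csv"},
        "garden/who/flu_test": {"meadow/who/flu_test", "garden/demography/population"},
        "garden/demography/population": {"snapshot/un/population.csv"},
    }
    print(remove_steps_from_dag(example_dag, exclude_steps=["garden/demography/population"]))
```

Dropping an auxiliary step such as population this way keeps it from linking otherwise unrelated datasets to the same producer.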

Notes

  • This work is sort of in between the analytics and the ETL repos. I initially thought it would make more sense in the former, but given that it requires many tools from ETL, I decided to include it in the latter.
  • The new wizard page does not work on staging because pandas-gbq expects authentication via a browser. I suppose we would need to add the appropriate credentials in ~/.config/pandas_gbq/bigquery_credentials.dat, but I'm not sure what the best way is to achieve that, or if that's a good idea. What do you think, @Marigold ? Thanks.
  • Ideally, we should track producers at the indicator level, but I don't think we currently have the capabilities to do that (we could use attribution from the grapher variables table, but this is often empty, or is a concatenation of multiple producers). The current approach tracks producers at the dataset level, following the DAG.
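The dataset-level tracking mentioned in the last note can be pictured as a walk over the DAG: start from the snapshots that belong to a producer and collect every step downstream of them. The sketch below is only an illustration under that assumption; the function name datasets_by_producer, the DAG representation (step mapped to its direct dependencies), and the step names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def datasets_by_producer(
    dag: Dict[str, Set[str]],
    snapshot_producers: Dict[str, str],
) -> Dict[str, Set[str]]:
    """Map each producer to every step that depends, directly or indirectly,
    on one of its snapshots."""
    # Invert the DAG: dependency -> steps that use it directly.
    dependents = defaultdict(set)
    for step, deps in dag.items():
        for dep in deps:
            dependents[dep].add(step)

    result = defaultdict(set)
    for snapshot, producer in snapshot_producers.items():
        # Depth-first walk over everything downstream of this snapshot.
        stack = [snapshot]
        seen = set()
        while stack:
            current = stack.pop()
            for child in dependents.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        result[producer] |= seen
    return dict(result)


if __name__ == "__main__":
    # Illustrative step and producer names only.
    example_dag = {
        "meadow/who/flu_test": {"snapshot/who/flu_test.csv"},
        "garden/who/flu_test": {"meadow/who/flu_test", "garden/demography/population"},
        "garden/demography/population": {"snapshot/un/population.csv"},
    }
    producers = {
        "snapshot/who/flu_test.csv": "WHO",
        "snapshot/un/population.csv": "UN",
    }
    print(datasets_by_producer(example_dag, producers))
```

In this toy example the flu dataset is attributed to both producers, because it also depends on the population step. That is the kind of attribution that excluding auxiliary steps from the DAG is meant to avoid.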

Ideas

Just brainstorming where this could lead in the future (feel free to add other ideas):

  • Currently, to share analytics with a data provider, we need to open this wizard page and manually copy/paste the output into an email.
  • A better idea would be to export a more convenient format (a zip file with a few things, or an HTML file with the interactive chart?).
  • Better still would be a kind of newsletter, automatically generated with analytics, say, every month.
  • An even better idea (but it may be too much, and it's unclear how much value it would provide), would be to have a dedicated dashboard page for certain data providers, so they could access all analytics they needed on any given day.
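As a rough illustration of the zip export idea, the sketch below bundles a CSV of analytics rows and an HTML file into a single in-memory archive using only the standard library. The function name build_export_zip, the file names inside the archive, and the column names are assumptions made for this example.

```python
import csv
import io
import zipfile
from typing import Dict, List


def build_export_zip(rows: List[Dict[str, object]], chart_html: str) -> bytes:
    """Bundle analytics rows (as CSV) and a chart (as HTML) into zip bytes."""
    csv_buffer = io.StringIO()
    if rows:
        # Use the keys of the first row as the CSV header.
        writer = csv.DictWriter(csv_buffer, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("analytics.csv", csv_buffer.getvalue())
        archive.writestr("chart.html", chart_html)
    return zip_buffer.getvalue()


if __name__ == "__main__":
    # Made-up rows, purely to show the shape of the input.
    example_rows = [
        {"chart": "example-chart-a", "views": 10},
        {"chart": "example-chart-b", "views": 5},
    ]
    payload = build_export_zip(example_rows, "<html><body>chart goes here</body></html>")
    print(len(payload), "bytes")
```

The resulting bytes could then be offered through whatever download mechanism the wizard page provides, so that nothing needs to be copied by hand into an email.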

Contributor

owidbot commented Dec 10, 2024

Quick links (staging server):

Site Dev Site Preview Admin Wizard Docs

Login: ssh owid@staging-site-wizard-on-provider-analytics

chart-diff: ✅ No charts for review.
data-diff: ❌ Found differences
= Dataset garden/health/latest/global_health_mpox
  = Table global_health_mpox
= Dataset garden/wb/2024-12-03/poverty_projections
  = Table poverty_projections
    ~ Dim country
+       + New values: 11712 / 11712 (100.00%)
           year  povertyline                             scenario                       country
           1993         3.65 Current forecast + historical growth Europe and Central Asia (PIP)
           2036         2.15                            6% growth Europe and Central Asia (PIP)
           1990         6.85                           Historical      Sub-Saharan Africa (PIP)
           1995         2.15        2% growth + Gini reduction 1%                         World
           2011         6.85                            8% growth                         World
-       - Removed values: 13176 / 11712 (112.50%)
           year  povertyline                                         scenario                               country
           1997         6.85                             Historical estimates         Europe and Central Asia (PIP)
           2029         3.65                            2% growth projections Latin America and the Caribbean (PIP)
           2010         2.15 Current forecast + historical growth projections     Other high income countries (PIP)
           2010         3.65                             Historical estimates                      South Asia (PIP)
           2013         3.65                            8% growth projections                      South Asia (PIP)
    ~ Dim year
+       + New values: 11712 / 11712 (100.00%)
                                country  povertyline                             scenario  year
          Europe and Central Asia (PIP)         3.65 Current forecast + historical growth  1993
          Europe and Central Asia (PIP)         2.15                            6% growth  2036
               Sub-Saharan Africa (PIP)         6.85                           Historical  1990
                                  World         2.15        2% growth + Gini reduction 1%  1995
                                  World         6.85                            8% growth  2011
-       - Removed values: 13176 / 11712 (112.50%)
                                        country  povertyline                                         scenario  year
                  Europe and Central Asia (PIP)         6.85                             Historical estimates  1997
          Latin America and the Caribbean (PIP)         3.65                            2% growth projections  2029
              Other high income countries (PIP)         2.15 Current forecast + historical growth projections  2010
                               South Asia (PIP)         3.65                             Historical estimates  2010
                               South Asia (PIP)         3.65                            8% growth projections  2013
    ~ Dim povertyline
+       + New values: 11712 / 11712 (100.00%)
                                country  year                             scenario  povertyline
          Europe and Central Asia (PIP)  1993 Current forecast + historical growth         3.65
          Europe and Central Asia (PIP)  2036                            6% growth         2.15
               Sub-Saharan Africa (PIP)  1990                           Historical         6.85
                                  World  1995        2% growth + Gini reduction 1%         2.15
                                  World  2011                            8% growth         6.85
-       - Removed values: 13176 / 11712 (112.50%)
                                        country  year                                         scenario  povertyline
                  Europe and Central Asia (PIP)  1997                             Historical estimates         6.85
          Latin America and the Caribbean (PIP)  2029                            2% growth projections         3.65
              Other high income countries (PIP)  2010 Current forecast + historical growth projections         2.15
                               South Asia (PIP)  2010                             Historical estimates         3.65
                               South Asia (PIP)  2013                            8% growth projections         3.65
    ~ Dim scenario
+       + New values: 11712 / 11712 (100.00%)
                                country  year  povertyline                             scenario
          Europe and Central Asia (PIP)  1993         3.65 Current forecast + historical growth
          Europe and Central Asia (PIP)  2036         2.15                            6% growth
               Sub-Saharan Africa (PIP)  1990         6.85                           Historical
                                  World  1995         2.15        2% growth + Gini reduction 1%
                                  World  2011         6.85                            8% growth
-       - Removed values: 13176 / 11712 (112.50%)
                                        country  year  povertyline                                         scenario
                  Europe and Central Asia (PIP)  1997         6.85                             Historical estimates
          Latin America and the Caribbean (PIP)  2029         3.65                            2% growth projections
              Other high income countries (PIP)  2010         2.15 Current forecast + historical growth projections
                               South Asia (PIP)  2010         3.65                             Historical estimates
                               South Asia (PIP)  2013         3.65                            8% growth projections
    ~ Column fgt0 (changed metadata, new data, changed data)
-       -     <% if scenario == "Historical estimates" %>
        ?                                  ----------
+       +     <% if scenario == "Historical" %>
-       -     <% elif scenario == "Current forecast + historical growth projections" %>
        ?                                                              ------------
+       +     <% elif scenario == "Current forecast + historical growth" %>
-       -     <% elif scenario == "Historical estimates + projections" %>
-       -     This data combines data based on household surveys or extrapolated up until the year of the data release using GDP growth estimates and forecasts, with projections based on GDP growth projections from the World Bank's Global Economic Prospects and the the Macro Poverty Outlook, together with IMF's World Economic Outlook, in the period 2025-2029. For the period 2030-2050, the data is projected using the average annual historical GDP per capita growth over 2010-2019.
-       -     <% elif scenario == "2% growth projections" %>
        ?                                   ------------
+       +     <% elif scenario == "2% growth" %>
-       -     <% elif scenario == "2% growth + Gini reduction 1% projections" %>
        ?                                                       ------------
+       +     <% elif scenario == "2% growth + Gini reduction 1%" %>
-       -     <% elif scenario == "2% growth + Gini reduction 2% projections" %>
        ?                                                       ------------
+       +     <% elif scenario == "2% growth + Gini reduction 2%" %>
-       -     <% elif scenario == "4% growth projections" %>
        ?                                   ------------
+       +     <% elif scenario == "4% growth" %>
-       -     <% elif scenario == "6% growth projections" %>
        ?                                   ------------
+       +     <% elif scenario == "6% growth" %>
-       -     <% elif scenario == "8% growth projections" %>
        ?                                   ------------
+       +     <% elif scenario == "8% growth" %>
-       -     attribution: Lakner et al. (2024). Reproducibility package for Poverty, Prosperity and Planet Report 2024
-       -     <% if scenario == "Historical estimates" or scenario == "Historical estimates + projections" %>
+       +     <% if scenario == "Historical" %>

+       + New values: 11712 / 11712 (100.00%)
                                country  year  povertyline                             scenario       fgt0
          Europe and Central Asia (PIP)  1993         3.65 Current forecast + historical growth       <NA>
          Europe and Central Asia (PIP)  2036         2.15                            6% growth   0.105415
               Sub-Saharan Africa (PIP)  1990         6.85                           Historical  90.141022
                                  World  1995         2.15        2% growth + Gini reduction 1%       <NA>
                                  World  2011         6.85                            8% growth       <NA>
-       - Removed values: 13176 / 11712 (112.50%)
                                        country  year  povertyline                                         scenario       fgt0
                  Europe and Central Asia (PIP)  1997         6.85                             Historical estimates  44.163143
          Latin America and the Caribbean (PIP)  2029         3.65                            2% growth projections   7.258942
              Other high income countries (PIP)  2010         2.15 Current forecast + historical growth projections       <NA>
                               South Asia (PIP)  2010         3.65                             Historical estimates  65.804619
                               South Asia (PIP)  2013         3.65                            8% growth projections       <NA>
    ~ Column poorpop (changed metadata, new data, changed data)
-       -     <% if scenario == "Historical estimates" %>
        ?                                  ----------
+       +     <% if scenario == "Historical" %>
-       -     <% elif scenario == "Current forecast + historical growth projections" %>
        ?                                                              ------------
+       +     <% elif scenario == "Current forecast + historical growth" %>
-       -     <% elif scenario == "Historical estimates + projections" %>
-       -     This data combines data based on household surveys or extrapolated up until the year of the data release using GDP growth estimates and forecasts, with projections based on GDP growth projections from the World Bank's Global Economic Prospects and the the Macro Poverty Outlook, together with IMF's World Economic Outlook, in the period 2025-2029. For the period 2030-2050, the data is projected using the average annual historical GDP per capita growth over 2010-2019.
-       -     <% elif scenario == "2% growth projections" %>
        ?                                   ------------
+       +     <% elif scenario == "2% growth" %>
-       -     <% elif scenario == "2% growth + Gini reduction 1% projections" %>
        ?                                                       ------------
+       +     <% elif scenario == "2% growth + Gini reduction 1%" %>
-       -     <% elif scenario == "2% growth + Gini reduction 2% projections" %>
        ?                                                       ------------
+       +     <% elif scenario == "2% growth + Gini reduction 2%" %>
-       -     <% elif scenario == "4% growth projections" %>
        ?                                   ------------
+       +     <% elif scenario == "4% growth" %>
-       -     <% elif scenario == "6% growth projections" %>
        ?                                   ------------
+       +     <% elif scenario == "6% growth" %>
-       -     <% elif scenario == "8% growth projections" %>
        ?                                   ------------
+       +     <% elif scenario == "8% growth" %>
-       -     attribution: Lakner et al. (2024). Reproducibility package for Poverty, Prosperity and Planet Report 2024
-       -     <% if scenario == "Historical estimates" or scenario == "Historical estimates + projections" %>
+       +     <% if scenario == "Historical" %>

+       + New values: 11712 / 11712 (100.00%)
                                country  year  povertyline                             scenario       poorpop
          Europe and Central Asia (PIP)  1993         3.65 Current forecast + historical growth          <NA>
          Europe and Central Asia (PIP)  2036         2.15                            6% growth  523123.90625
               Sub-Saharan Africa (PIP)  1990         6.85                           Historical   465695296.0
                                  World  1995         2.15        2% growth + Gini reduction 1%          <NA>
                                  World  2011         6.85                            8% growth          <NA>
-       - Removed values: 13176 / 11712 (112.50%)
                                        country  year  povertyline                                         scenario       poorpop
                  Europe and Central Asia (PIP)  1997         6.85                             Historical estimates   208242400.0
          Latin America and the Caribbean (PIP)  2029         3.65                            2% growth projections    49932444.0
              Other high income countries (PIP)  2010         2.15 Current forecast + historical growth projections          <NA>
                               South Asia (PIP)  2010         3.65                             Historical estimates  1092715904.0
                               South Asia (PIP)  2013         3.65                            8% growth projections          <NA>
= Dataset garden/who/2024-09-09/flu_test
  = Table flu_test
    ~ Dim country
-       - Removed values: 9 / 72518 (0.01%)
                date     country
          2024-12-02      Brunei
          2024-12-02       China
          2024-12-02     Lebanon
          2024-11-25     Nigeria
          2024-09-30 North Korea
    ~ Dim date
-       - Removed values: 9 / 72518 (0.01%)
              country       date
               Brunei 2024-12-02
                China 2024-12-02
              Lebanon 2024-12-02
              Nigeria 2024-11-25
          North Korea 2024-09-30
    ~ Column denomcombined (changed data)
-       - Removed values: 9 / 72518 (0.01%)
              country       date  denomcombined
               Brunei 2024-12-02             29
                China 2024-12-02          24816
              Lebanon 2024-12-02             47
              Nigeria 2024-11-25              5
          North Korea 2024-09-30            137
        ~ Changed values: 4 / 72518 (0.01%)
           country       date  denomcombined -  denomcombined +
             China 2024-11-25            27077            23860
          Maldives 2024-11-25               69               59
           Nigeria 2024-10-07               49               45
           Nigeria 2024-10-14               39               34
    ~ Column pcnt_poscombined (changed data)
-       - Removed values: 9 / 72518 (0.01%)
              country       date  pcnt_poscombined
               Brunei 2024-12-02         13.793103
                China 2024-12-02         11.266925
              Lebanon 2024-12-02           2.12766
              Nigeria 2024-11-25              20.0
          North Korea 2024-09-30          1.459854
        ~ Changed values: 4 / 72518 (0.01%)
           country       date  pcnt_poscombined -  pcnt_poscombined +
             China 2024-11-25            7.840603             7.25482
          Maldives 2024-11-25           40.579712           42.372883
           Nigeria 2024-10-07            8.163265            8.888889
           Nigeria 2024-10-14            7.692307            8.823529
= Dataset garden/who/latest/monkeypox
  = Table monkeypox


Legend: +New  ~Modified  -Removed  =Identical  Details
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet

Automatically updated datasets matching weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included

Edited: 2024-12-11 09:53:10 UTC
Execution time: 18.83 seconds

@pabloarosado pabloarosado marked this pull request as ready for review December 11, 2024 15:15
@Marigold
Collaborator

Whoa, very nice! As for

The new wizard page does not work on staging because pandas-gbq expects authentication via a browser. I suppose we would need to add the appropriate credentials in ~/.config/pandas_gbq/bigquery_credentials.dat, but I'm not sure what the best way is to achieve that, or if that's a good idea. What do you think, @Marigold ? Thanks.

I can add a service account for GCP to all staging servers to make it work. Just let me know.
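For context, one way a service account could replace the browser-based flow is to pass explicit credentials to pandas-gbq. The sketch below is not the setup used on the staging servers; the environment variable name GBQ_SERVICE_ACCOUNT_FILE and both function names are assumptions for illustration. It relies on pandas_gbq.read_gbq accepting a credentials argument and on google.oauth2.service_account.Credentials.from_service_account_file.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical environment variable name; not taken from this PR.
CREDENTIALS_ENV_VAR = "GBQ_SERVICE_ACCOUNT_FILE"


def find_service_account_file(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the service account key path if configured and present, else None."""
    env = os.environ if env is None else env
    value = env.get(CREDENTIALS_ENV_VAR)
    if not value:
        return None
    path = Path(value).expanduser()
    return path if path.is_file() else None


def read_analytics(query: str, project_id: str, env: Optional[Mapping[str, str]] = None):
    """Run a BigQuery query, preferring service account credentials when available."""
    # Imported lazily so the helper above works without these packages installed.
    import pandas_gbq
    from google.oauth2 import service_account

    key_file = find_service_account_file(env)
    credentials = None
    if key_file is not None:
        credentials = service_account.Credentials.from_service_account_file(str(key_file))
    # With credentials=None, pandas-gbq uses its default credential discovery,
    # which can end in a browser-based user authentication flow.
    return pandas_gbq.read_gbq(query, project_id=project_id, credentials=credentials)
```

On a developer machine without the variable set, read_analytics would behave as before, while a server with a key file configured would not need a browser.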

@pabloarosado
Contributor Author

Thanks Mojmir, yes, that sounds good to me!

Member

@lucasrodes lucasrodes left a comment

This looks super nice, Pablo! Thanks for putting this out!

I've modified some bits here and there, just trying to make the code slightly more readable.

Feel free to merge whenever.

@pabloarosado
Contributor Author

Thanks a lot @lucasrodes for the improvements!

@pabloarosado pabloarosado merged commit a1e9815 into master Dec 12, 2024
6 of 8 checks passed
@pabloarosado pabloarosado deleted the wizard-on-provider-analytics branch December 12, 2024 15:59
@pabloarosado
Contributor Author

pabloarosado commented Dec 12, 2024

Hey @Marigold, the wizard page is failing in production because of the missing GBQ credentials. Is it trivial to add them? Please let me know if I can help, thanks a lot!

4 participants