🎉 App to find similar insights #3518
Quick links (staging server):
Login:
chart-diff: ✅ No charts for review.
data-diff: ❌ Found differences
= Dataset garden/un/2024-04-09/undp_hdr
= Table undp_hdr
~ Column abr (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column co2_prod (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column coef_ineq (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column diff_hdi_phdi (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column eys (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column eys_f (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column eys_m (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column gdi (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column gdi_group (changed metadata, changed data)
+ + description_processing: |-
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
~ Changed values: 11 / 7161 (0.15%)
country year gdi_group - gdi_group +
Europe 2022 <NA> 1.268419
High-income countries 2022 <NA> 1.392950
Lower-middle-income countries 2022 <NA> 4.389009
South America 2022 <NA> 1.150919
Upper-middle-income countries 2022 <NA> 2.057359
~ Column gii (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column gii_rank (changed metadata, changed data)
+ + description_processing: |-
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
~ Changed values: 9 / 7161 (0.13%)
country year gii_rank - gii_rank +
Asia 2022 <NA> 3579
Europe 2022 <NA> 1089
High-income countries 2022 <NA> 1832
South America 2022 <NA> 1092
Upper-middle-income countries 2022 <NA> 3799
~ Column gni_pc_f (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column gni_pc_m (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column gnipc (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column hdi (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column hdi_f (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column hdi_m (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column hdi_rank (changed metadata, changed data)
+ + description_processing: |-
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
~ Changed values: 11 / 7161 (0.15%)
country year hdi_rank - hdi_rank +
Europe 2022 <NA> 1537
High-income countries 2022 <NA> 2161
Lower-middle-income countries 2022 <NA> 7099
South America 2022 <NA> 1054
Upper-middle-income countries 2022 <NA> 4964
~ Column ihdi (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column ineq_edu (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column ineq_inc (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column ineq_le (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column le (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column le_f (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column le_m (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column lfpr_f (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column lfpr_m (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column loss (changed metadata, changed data)
+ + description_processing: |-
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
~ Changed values: 82 / 7161 (1.15%)
country year loss - loss +
Africa 2015 NaN 1714.844482
Africa 2021 NaN 1684.291626
Europe 2020 NaN 351.625641
High-income countries 2019 NaN 535.104187
Lower-middle-income countries 2018 NaN 1293.413940
~ Column mf (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column mmr (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column mys (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column mys_f (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column mys_m (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column phdi (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column pop_total (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column pr_f (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column pr_m (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column rankdiff_hdi_phdi (changed metadata, changed data)
+ + description_processing: |-
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
~ Changed values: 6 / 7161 (0.08%)
country year rankdiff_hdi_phdi - rankdiff_hdi_phdi +
Africa 2022 <NA> 98
Asia 2022 <NA> -340
Europe 2022 <NA> 100
European Union (27) 2022 <NA> 79
South America 2022 <NA> 130
~ Column se_f (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
~ Column se_m (changed metadata)
- - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
+ + - We calculated averages over continents and income groups by taking the population-weighted average of the countries in each group. If less than 80% of countries in an area report data for a given year, we do not calculate the average for that area.
? ++
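Every changed undp_hdr column above carries the same processing note: regional values are population-weighted averages of the member countries, and a region is skipped for a year when fewer than 80% of its countries report data. The rule for one region and one year can be sketched with the standard library as follows. This is an illustration, not the ETL's own aggregation code. It assumes the weights are the populations of the reporting countries only, and the region and numbers are hypothetical.

```python
from typing import Optional


def weighted_regional_average(
    values: dict[str, Optional[float]],
    population: dict[str, float],
    min_fraction: float = 0.8,
) -> Optional[float]:
    """Population-weighted mean of an indicator over the countries of one region.

    `values` maps every member country of the region to its value for one year
    (None when the country did not report data). Returns None when fewer than
    `min_fraction` of the member countries reported.
    """
    if not values:
        return None
    reporting = [country for country, value in values.items() if value is not None]
    # Coverage rule: skip the region if less than 80% of its countries report.
    if len(reporting) / len(values) < min_fraction:
        return None
    # Assumption: weights are the populations of the reporting countries only.
    total_population = sum(population[c] for c in reporting)
    if total_population == 0:
        return None
    weighted_sum = sum(values[c] * population[c] for c in reporting)
    return weighted_sum / total_population


# Hypothetical five-country region for a single year.
population = {"A": 1, "B": 3, "C": 5, "D": 2, "E": 4}
values = {"A": 1.0, "B": 3.0, "C": None, "D": 2.0, "E": 4.0}

# 4 of 5 countries report (exactly 80%), so the average is computed: 30 / 10 = 3.0
print(weighted_regional_average(values, population))
```

With only three of the five countries reporting (60%), the same function returns None, which matches the behaviour described in the note.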
= Dataset garden/who/2024-09-09/flu_test
= Table flu_test
~ Dim country
- - Removed values: 63 / 71983 (0.09%)
date country
2024-10-28 Malta
2024-10-28 Qatar
2024-10-28 Slovenia
2024-10-28 South Africa
2024-10-07 Zambia
~ Dim date
- - Removed values: 63 / 71983 (0.09%)
country date
Malta 2024-10-28
Qatar 2024-10-28
Slovenia 2024-10-28
South Africa 2024-10-28
Zambia 2024-10-07
~ Column denomcombined (changed data)
- - Removed values: 63 / 71983 (0.09%)
country date denomcombined
Malta 2024-10-28 301
Qatar 2024-10-28 765
Slovenia 2024-10-28 983
South Africa 2024-10-28 85
Zambia 2024-10-07 110
~ Changed values: 106 / 71983 (0.15%)
country date denomcombined - denomcombined +
Brazil 2024-10-21 5188 4218
Honduras 2024-10-07 70 68
Indonesia 2023-10-09 37 38
Slovenia 2024-10-21 1224 1183
Uganda 2024-09-23 58 51
~ Column pcnt_poscombined (changed data)
- - Removed values: 63 / 71983 (0.09%)
country date pcnt_poscombined
Malta 2024-10-28 2.325581
Qatar 2024-10-28 17.385620
Slovenia 2024-10-28 0.305188
South Africa 2024-10-28 5.882353
Zambia 2024-10-07 3.636364
~ Changed values: 114 / 71983 (0.16%)
country date pcnt_poscombined - pcnt_poscombined +
Costa Rica 2024-10-07 0.326442 0.326797
Denmark 2024-10-21 1.134791 1.140251
Indonesia 2023-08-28 43.478260 40.000000
Indonesia 2024-04-22 23.809525 24.390244
South Africa 2024-09-16 8.730159 8.800000
Legend: +New ~Modified -Removed =Identical
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet
Automatically updated datasets matching weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included.
Edited: 2024-11-11 09:57:39 UTC
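Under each changed undp_hdr column, the "? ++" line reads like the hint line produced by Python's difflib.ndiff. Assuming etl diff renders metadata changes this way (with its own -/+ markers in front of ndiff's), "? ++" means the only change is two inserted characters, "- ", at the start of the description_processing text, so the sentence became a list item. The strings below are a shortened, hypothetical stand-in for that field; the snippet simply reproduces the pattern with the standard library.

```python
import difflib

# Hypothetical before/after values of a description_processing field.
old = ["We calculated averages over continents and income groups."]
new = ["- We calculated averages over continents and income groups."]

# ndiff marks removed lines with "- ", added lines with "+ ", and adds a
# "? " hint line whose "+" characters point at the inserted characters.
lines = [line.rstrip("\n") for line in difflib.ndiff(old, new)]
for line in lines:
    print(line)
# - We calculated averages over continents and income groups.
# + - We calculated averages over continents and income groups.
# ? ++
```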
@lucasrodes could you review it, please? I can't install torch on my laptop due to this issue. It's probably solvable, but I've already spent an hour on it without making any progress.
Thanks Mojmir, and sorry about that issue; it sounds annoying! If you want, I can temporarily add this app to wizard so you can play with it (in any case, I'm also happy if Lucas wants to have a look, or both of you).
Hey @Marigold, I've moved it to wizard, so you can try it out. But of course, if this is going to break your ETL environment, we shouldn't push it. I find it very useful, and having that library in ETL could also let us experiment with other similar things, but we can also move it to its own repo if it's problematic (or discard it if others don't find it useful; it's just an experiment). Let me know what you think, thanks.
Looks useful! I made it work with torch<2.3.0 (and committed to your PR).
Create a script that launches a Streamlit app to do semantic search over data insights.
The script loads and parses data insights (DIs) from the database, creates embeddings for them (on my laptop this takes less than 10 seconds; ideally it should happen under the hood, with the embeddings stored in the database), and sorts the DIs by semantic similarity to a given input string. For now, this is an experiment. If we decide it's useful, we can integrate it into our wizard.
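The pipeline described above (embed every insight, embed the query, sort by similarity) can be sketched as follows. This is only an illustration: the PR computes embeddings with transformer models (transformers/pytorch), whereas here a toy bag-of-words vector (lowercased word counts) stands in for the embedding so the example runs with the standard library. Cosine similarity is assumed as the comparison measure, and the function names and sample insights are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a transformer embedding: lowercased word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_insights(query: str, insights: list[str]) -> list[tuple[float, str]]:
    """Return (score, insight) pairs, most similar to the query first."""
    # In the app, insight embeddings could be computed once and cached
    # (e.g. stored in the database); here they are recomputed on every call.
    insight_vectors = [embed(text) for text in insights]
    query_vector = embed(query)
    scored = [(cosine(query_vector, vec), text) for vec, text in zip(insight_vectors, insights)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical data insight titles.
insights = [
    "Child mortality has fallen in every world region",
    "Global CO2 emissions from fossil fuels",
    "Life expectancy has doubled since 1900",
]

for score, text in rank_insights("child mortality", insights):
    print(f"{score:.2f}  {text}")
```

A word-count vector only matches exact words, so "kids dying" would score zero against the first title; a transformer embedding is what lets the real app match by meaning rather than vocabulary, which is the reason for the heavier dependencies.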
I think it would be useful to have something like this in our wizard. Authors could use it to find what has already been written about a given topic, and for data peeps it could open the door to other kinds of analytics and experiments with our content.
The downside is that it requires installing some big libraries (transformers and PyTorch). The first time it runs, it needs to download some models (~100 MB). But maybe these libraries could also be useful for other similar applications.