Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(data-table): respect tolerance #2859

Merged
merged 4 commits into from
Oct 26, 2023
Merged

Conversation

sophiamersmann
Copy link
Member

@sophiamersmann sophiamersmann commented Oct 26, 2023

  • fixes Tables do not respect tolerance and show misleading data #2831

  • there are two code paths: "range" mode and "point" mode

    • in "range" mode, a start and end year are selected and we show four columns: Start year, end year, absolute change, relative change
    • in "point" mode, a single year is selected and we show a single column for that year
  • in "point" mode, all is good

    • tolerance is appropriately applied and no hundreds-years-away data points are used to fill an empty cell
  • the issue is with the "range" mode

    • here, we take the range of available data points within the current selection, and then apply min/max to get the values for the start and end point, i.e. if the user-selected time range is [1000, 2000] but we only have data for [1990, 2000], then we'll end up showing the 1990 value for the year 1000
    • the tolerance defined by the author is not taken into account, as far as I can see
  • to fix the issue, we simply grab the closest available data point within tolerance for both the start and the end year

I can see that the special code path for "range" mode was a deliberate choice, but I'm not sure why (to show the delta columns as often as possible?). What am I missing?

@sophiamersmann sophiamersmann marked this pull request as ready for review October 26, 2023 11:43
// sanity check: start time should always be <= end time
start.time !== undefined &&
end.time !== undefined &&
start.time <= end.time
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arguably, this could also be <

Copy link
Contributor

@danyx23 danyx23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the quick fix, @sophiamersmann! A few rambling comments to your question of why this behaviour was different:

  • Visually this used to be a lesser problem because in the past the table showed the year the data point was from. With the redesign we made this more subtle and only show the circled i indicator. It used to be that there were two lines in such cases, where the first one was in gray the year and the second was the value. This I think was acceptable but now it is not obvious enough anymore that the data is not from the year the column says above
  • The functionality of this was never great. If you sort by values in the first column in a long run time series like population for year -10000, you used to get the highest value for whatever the first datapoint was for any entity (go to https://ourworldindata.org/grapher/population?tab=table&country=CHN~IND~USA~BRA~NGA~IRN~ZAF and sort by first column - West Germany in 1800. Obviously 😵‍💫 )
  • I think the two different code paths where a performance tuning. Computing the relevant values on the first and last year only is obviously better for this case and the current solution might use substantially more RAM for long run time series - but I'd say lets wait and see if this causes any issues (or maybe add a comment that maybe this should be optimized again for the table case where we want to compute changes)

If you compare the 12 most important charts then I think the new behaviour is clearly better in all of them:

Current behaviour:
https://ourworldindata.org/12-key-metrics

Your fixes:
http://staging-site-fix-data-table-tolerance/12-key-metrics

Old grapher (pre redesign):
https://staging.owid.cloud/admin/posts/preview/24902

So all in all: :shipit:

@sophiamersmann
Copy link
Member Author

thank you for taking the time to write all of this down! shipping it 🚀

@sophiamersmann sophiamersmann merged commit b824180 into master Oct 26, 2023
16 checks passed
@sophiamersmann sophiamersmann deleted the fix-data-table-tolerance branch October 26, 2023 14:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tables do not respect tolerance and show misleading data
2 participants