Tracking issue: Data Pages grapher tasks #2668

Closed
4 of 31 tasks
Marigold opened this issue Sep 27, 2023 · 11 comments

Comments

@Marigold
Contributor

Marigold commented Sep 27, 2023

Tasks

Bugs to fix

Smaller rendering issues

  • Citation rendering should show the url of the data page and time of access should be the current timestamp
  • Following Ed's suggestion (it seems we already decided this at some point in the past), scatter plots should not show population in the footer of the chart, but they should show population in the sources tab (in "Data published by").
  • The heading of the text section "What you should know about this data" should change to something less strong when showing something other than the description_key field and be hidden if we have no text.
  • Bullet points from description_key are not picked up.
  • I get the text "How the producer of this data - undefined - describe this data?" (the producer name renders as "undefined").
  • Line breaks and bullet points are not respected in the content of the same section (Note: this will need a new markdown renderer)
  • Collapse snapshot origins into a single one when they refer to the same data product.
  • The text in the full citation is faded too early (see screenshot in comments below)
  • Instead of mentioning Twitter (in "How to cite this data"), we should say "social media".
  • We should not say "OWID" anywhere (as it happens in "How to cite this data").
  • It seems like the link we provide in the full citation is the download link. I suppose it should instead be the main url of the source.
  • Also, should we include URLs? (e.g. What happens if it's very long and messy? How to ensure links won't break?)
  • We should show the date of the next expected update. This should be based on dataset.update_period_days and the current date (if that date has already passed, increase it by a month).
  • Currently, the short and full citation seem to use the format "Producer 1; Producer 2" and "Producer, data product title; Producer 2, data product title 2". Maybe they should instead use the indicator's attribution_short and attribution.
  • The processing description is not rendered properly (e.g. bullet points are put together without line breaks).
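One possible reading of the "next expected update" rule above, as a minimal sketch. It assumes the date of the last update is known (the parameter `last_update` is hypothetical and not named in this issue) and that "increase it by a month" means pushing the date forward month by month until it is no longer in the past.

```python
import calendar
from datetime import date, timedelta


def add_one_month(d: date) -> date:
    """Return the same day in the following month, clamped to that month's last day."""
    year = d.year + (1 if d.month == 12 else 0)
    month = 1 if d.month == 12 else d.month + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def next_expected_update(last_update: date, update_period_days: int, today: date) -> date:
    """Estimate the next update date from the update period (assumed interpretation)."""
    candidate = last_update + timedelta(days=update_period_days)
    # If the computed date has already passed, push it forward one month at a time.
    while candidate < today:
        candidate = add_one_month(candidate)
    return candidate
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the rule easy to test.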

Bigger stretch goals

  • Switch to a new markdown parser/renderer for proper bullet point etc support

Quality of life features

  • Add a link to the data page to go to the metadata.yaml file on github and show the full data path including column shortname

Open questions

  • Should description_short be shown always somewhere?
  • If we have neither description_short nor description_key nor description_producer, should we just show an empty section under the chart or do something else with the design?
  • The text from description_processing is not being surfaced anywhere in the data pages. This text is crucial to signal that the data being displayed is not directly the source's data but has been processed by us. Update: the field description_processing does now appear in data pages. It did not before because of a bug that Mojmir has fixed.
  • Linking from topic tags to topic page URLs is not watertight at the moment. Decide whether this should be explicit (by adding a new column in the tags table or similar) or implicit (by making the slugify logic better). The initial report text was: "Topic tags redirect you to non-existing pages; for example, one redirects to http://staging-site-mojmir/co2-greenhouse-gas-emissions, which doesn't exist."
  • Currently, description_key must include the info in description_short. I think this shouldn't be the case. Some overlap should be possible, but description_short and description_key should be separate, and both are important fields. Both should be shown in the sources tab and data pages. I'd propose:
    (A) (Preferred) They should appear in separate places: Both in the data page and the sources tab, we first show the short description, followed by the bullet points of key info.
    (B) The description short is rendered as the first point of the key info.
  • Regarding processing levels, for me, "processed" and "adapted" are ambiguous, and in fact "adapted" clearly sounds "less processed" than "processed". I'd propose (for minor and major processing):
    (A) (Preferred) "With minor processing by Our World in Data" & "With major processing by Our World in Data".
    (B) "Imported by Our World in Data" & "Processed by Our World in Data".
  • What should happen when there are multiple producers?
    (A) (Preferred) "[Main source] and other sources - Processed by Our World in Data"
    (B) "Various sources - Processed by Our World in Data"?
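To make the two preferred (A) options above concrete, here is a small sketch of how an attribution line could be assembled. Combining the two proposals into one string, treating the first producer as the "main source", and the function name `attribution_line` are all assumptions for illustration, not decisions recorded in this issue.

```python
# Wording from proposal (A) for processing levels.
PROCESSING_LABELS = {
    "minor": "With minor processing by Our World in Data",
    "major": "With major processing by Our World in Data",
}


def attribution_line(producers: list[str], processing_level: str) -> str:
    """Build an attribution line; the first producer is assumed to be the main source."""
    if not producers:
        raise ValueError("at least one producer is required")
    if processing_level not in PROCESSING_LABELS:
        raise ValueError(f"unknown processing level: {processing_level!r}")
    source = producers[0]
    if len(producers) > 1:
        # Proposal (A) for multiple producers.
        source = f"{source} and other sources"
    return f"{source} - {PROCESSING_LABELS[processing_level]}"
```

For example, `attribution_line(["UN WPP", "HMD"], "major")` would give "UN WPP and other sources - With major processing by Our World in Data".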

Sources tab items

  • Origin.description doesn't render line breaks in Sources and processing tab
  • The combined "Data published by" from sources and origins might be non-unique. Duplicate of the report below: "Names are repeated in 'This data is based on the following sources', because I used two files from the same author. Should I set this differently?"
  • In the sources tab, the "link" field shows only links for the first origin, e.g. this chart (that shows only the link for population). Also, the "retrieved" field is also the one for population; in this case, there's an ambiguity, but maybe we should pick the latest date of all origins.
  • The text from description_key is not being surfaced in the sources tab.
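The suggestion above to show links for every origin and pick the latest "retrieved" date could look roughly like this. It is only a sketch: origins are represented as plain dictionaries, and the keys `url_main` and `date_accessed` are assumed names for the link and retrieval date.

```python
from datetime import date
from typing import Optional


def summarize_origins(origins: list[dict]) -> dict:
    """Collect unique links from all origins and the most recent retrieval date."""
    links: list[str] = []
    for origin in origins:
        url = origin.get("url_main")  # assumed field name
        if url and url not in links:
            links.append(url)
    accessed = [
        date.fromisoformat(origin["date_accessed"])  # assumed ISO "YYYY-MM-DD" strings
        for origin in origins
        if origin.get("date_accessed")
    ]
    latest: Optional[date] = max(accessed) if accessed else None
    return {"links": links, "retrieved": latest}
```

Keeping the links in first-seen order means the indicator's primary origin still appears first, while population or other auxiliary origins follow.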

Obsolete points

  • Currently, for charts using old indicators with sources, we include the dataset description at the bottom of the sources tab (e.g. this chart). For new charts, we don't show the dataset description anywhere. I think that, while we still don't have the new grapher, we should keep showing the dataset description at the bottom of the sources tab (otherwise we are missing a lot of relevant info that is shown nowhere).

I'll see what tasks I can do myself and where we need some help. cc @pabloarosado

@lucasrodes
Member

On "Origin.description doesn't render line breaks in Sources and processing tab":

I've recently realised that the line breaks are not rendered in the metadata preview in a notebook. Does this mean that the issue is coming from ETL instead? And more specifically, from how origin.description is stored internally?

image

@Marigold
Contributor Author

@lucasrodes nice catch, I fixed it in one of my PRs. It should render markdown as HTML and show in a notebook (it'll fix line breaks too). Grapher rendering still won't work though. It's an issue on the grapher side.

@pabloarosado
Contributor

pabloarosado commented Sep 28, 2023

In the sources tab, the "link" field shows only links for the first origin, e.g. this chart (that shows only the link for population). Also, the "retrieved" field is also the one for population; in this case, there's an ambiguity, but maybe we should pick the latest date of all origins.

@paarriagadap

paarriagadap commented Sep 28, 2023

Copying from here:
Some issues I've found with the metadata-based data pages:

  • Charts and data pages are picking the wrong (random?) year for the sources. 1970 in this case.
  • Bullet points from description_key are not picked up.
  • I get the text "How the producer of this data - undefined - describe this data?"
  • Line breaks and bullet points are not respected in the content of the same section
  • Names are repeated in "This data is based on the following sources", because I used two files from the same author. Should I set this differently?
  • Line breaks are not respected there either.
  • Citations have a similar problem because of the name repetition.

You can take a look here.

@lucasrodes
Member

lucasrodes commented Sep 28, 2023

  • The text from description_processing is not being surfaced anywhere in the data pages. This text is crucial to signal that the data being displayed is not directly the source's data but with our touch.
  • The text from description_key is not being surfaced in the sources tab.

@Marigold
Contributor Author

@danyx23
Contributor

danyx23 commented Oct 2, 2023

Hi all! I'll edit the main issue description at the top to include all the points that you all added as comments so that this is easier to scan and reply to inline

@lucasrodes
Member

lucasrodes commented Oct 5, 2023

Issue:

  • Collapse snapshot origins into a single one when they refer to the same data product.

Summary:

Sometimes, we rely on multiple snapshots of the same data product to build a dataset. Take this example: http://staging-site-lucas/admin/datapage-preview/818629. Here, we display life expectancy from two data products: UN WPP and HMD. However, we got the data from UN WPP from three different snapshots.

Therefore, in the "Sources and processing" section of the data page, we list four different entries:

image

Three of these are equivalent because they refer to the same data product. Why are there three? Because there is a snapshot for "Both Sexes", "Females" and "Males".

We should reduce these three to just one entry, maybe by checking that the origin.description field is equivalent, or the origin.title field, etc.
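A minimal sketch of the collapsing step described above. The `Origin` class here is a simplified, hypothetical stand-in rather than the real metadata class, and using the pair (producer, title) as the identity of a data product is an assumption; comparing `description` instead would work the same way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Origin:
    """Simplified stand-in for an origin; only the fields needed for this sketch."""
    producer: str
    title: str
    description: Optional[str] = None


def collapse_origins(origins: list[Origin]) -> list[Origin]:
    """Keep the first origin per data product, preserving the original order."""
    seen: set[tuple[str, str]] = set()
    unique: list[Origin] = []
    for origin in origins:
        # Assumed identity of a data product: normalized producer and title.
        key = (origin.producer.strip().lower(), origin.title.strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(origin)
    return unique
```

With three snapshots of the same UN WPP product ("Both Sexes", "Females", "Males") plus one HMD origin, this would leave two entries in "Sources and processing".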

@Marigold
Contributor Author

Marigold commented Oct 9, 2023

Order of FAQs gets lost on insert to MySQL. Table posts_gdocs_variables_faqs uses only gdocId, variableId and fragmentId columns, nothing about ordering. We should probably add a new column order to that table.

Example: http://staging-site-mojmir/admin/datapage-preview/419298#faqs
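To illustrate the idea of an explicit ordering column, here is a small self-contained sketch. It uses an in-memory SQLite database as a stand-in for MySQL, and the column name `displayOrder` is hypothetical (a bare `order` column would need quoting, since ORDER is a reserved word in SQL).

```python
import sqlite3


def create_table(conn: sqlite3.Connection) -> None:
    # Same columns as described above, plus a hypothetical ordering column.
    conn.execute(
        "CREATE TABLE posts_gdocs_variables_faqs ("
        "gdocId TEXT, variableId INTEGER, fragmentId TEXT, displayOrder INTEGER)"
    )


def save_faqs(conn: sqlite3.Connection, gdoc_id: str, variable_id: int, fragment_ids: list[str]) -> None:
    """Insert FAQ references, storing each one's position in the list."""
    conn.executemany(
        "INSERT INTO posts_gdocs_variables_faqs "
        "(gdocId, variableId, fragmentId, displayOrder) VALUES (?, ?, ?, ?)",
        [(gdoc_id, variable_id, fid, position) for position, fid in enumerate(fragment_ids)],
    )


def load_faqs(conn: sqlite3.Connection, variable_id: int) -> list[str]:
    """Read FAQ fragment ids back in their stored order."""
    rows = conn.execute(
        "SELECT fragmentId FROM posts_gdocs_variables_faqs "
        "WHERE variableId = ? ORDER BY displayOrder",
        (variable_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Without the `ORDER BY` on a stored position, the database is free to return rows in any order, which matches the behaviour reported here.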

@pabloarosado
Copy link
Contributor

pabloarosado commented Oct 11, 2023

  • Currently, description_key must include the info in description_short. I think this shouldn't be the case. Some overlap should be possible, but description_short and description_key should be separate, and both are important fields. Both should be shown in the sources tab and data pages. I'd propose:
    (A) (Preferred) They should appear in separate places: Both in the data page and the sources tab, we first show the short description, followed by the bullet points of key info.
    (B) The description short is rendered as the first point of the key info.
  • If there's no description short or key, the text "What you should know about this indicator" should not appear.
  • Regarding processing levels, for me, "processed" and "adapted" are ambiguous, and in fact "adapted" clearly sounds "less processed" than "processed". I'd propose (for minor and major processing):
    (A) (Preferred) "With minor processing by Our World in Data" & "With major processing by Our World in Data".
    (B) "Imported by Our World in Data" & "Processed by Our World in Data".
  • The text in the full citation is faded too early:
    Screen Shot 2023-10-11 at 10 12 42
  • Instead of mentioning Twitter (in "How to cite this data"), we should say "social media".
  • We should not say "OWID" anywhere (as it happens in "How to cite this data").
  • It seems like the link we provide in the full citation is the download link. I suppose it should instead be the main url of the source.
  • Also, should we include URLs? (e.g. What happens if it's very long and messy? How to ensure links won't break?)
  • What should happen when there are multiple producers?
    (A) (Preferred) "[Main source] and other sources - Processed by Our World in Data"
    (B) "Various sources - Processed by Our World in Data"?
  • We should show the date of the next expected update. This should be based on dataset.update_period_days and the current date (if that date has already passed, increase it by a month).
  • Currently, the short and full citation seem to use the format "Producer 1; Producer 2" and "Producer, data product title; Producer 2, data product title 2". Maybe they should instead use the indicator's attribution_short and attribution.

@danyx23
Contributor

danyx23 commented Oct 12, 2023

@danyx23 danyx23 closed this as completed Oct 12, 2023
5 participants