Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ metadata: HTML -> markdown #1422

Merged
merged 12 commits into from
Sep 20, 2023
Merged

✨ metadata: HTML -> markdown #1422

merged 12 commits into from
Sep 20, 2023

Conversation

lucasrodes
Copy link
Member

@lucasrodes lucasrodes commented Aug 3, 2023

For our new metadata structure, we will support our flavour of Markdown.

This PR removes all HTML tags and replaces these by their Markdown equivalent. Tags replaced include:

  • <a>[text](link)
  • <b>**text**
  • <i>*text*
  • <em>_text_
  • <strong>**text**
  • <li> → (depending on <ul>, <ol>)
  • <ul>-
  • <ol>1., 2., etc.
  • h1#, h2##, h3##, ...

I've left the tag <br>, which is used heavily in files (@paarriagadap, no action required):

  • /home/lucas/repos/etl/etl/steps/data/garden/eth/2023-03-15/ethnic_power_relations.meta.yml
  • /home/lucas/repos/etl/etl/steps/data/garden/lgbt_rights/2023-04-27/lgbti_policy_index.meta.yml

Should we also support <br> in our Markdown? @danyx23

For the future

  • <sup>

Review

I tried using some automation tools, but they were never 100% accurate (sometimes inserting unwanted characters). Therefore, some manual edit was required.

As many files are involved, I've tried to assign them to different people based on their ownership (I assigned those that correspond to me to Fiona & Veronika).

@pabloarosado

  • etl/steps/data/garden/agriculture (2 files)
  • etl/steps/data/garden/andrew/2019-12-03/co2_mitigation_curves.meta.yml
  • etl/steps/data/garden/bp (6 files)
  • etl/steps/data/garden/cait/2022-08-10/ghg_emissions_by_sector.meta.yml
  • etl/steps/data/garden/eia/2022-07-27/energy_consumption.meta.yml
  • etl/steps/data/garden/ember (4 files)
  • etl/steps/data/garden/emdat/2022-11-24/natural_disasters.meta.yml
  • etl/steps/data/garden/emissions (2 files)
  • etl/steps/data/garden/energy/(34 files)
  • etl/steps/data/garden/gcp (4 files)
  • etl/steps/data/garden/papers (2 files)
  • etl/steps/data/garden/rff (2 files)
  • etl/steps/data/garden/shift/2022-07-18/fossil_fuel_production.meta.yml
  • etl/steps/data/garden/usda_nass/2023-04-20/us_corn_yields.meta.yml

( I started changing | for >-, but stopped once we had the slack discussion)

@spoonerf

  • etl/steps/data/garden/who/2022-09-30/ghe.meta.yml
  • etl/steps/data/garden/owid/latest/key_indicators/key_indicators.meta.yml
  • lib/walden/ingests/papers/2022-11-01/zijdeman_et_al_2015/meta.yml

@veronikasamborska1994

  • etl/steps/data/garden/artificial_intelligence/2023-06-14/ai_deepfakes.meta.yml
  • etl/steps/data/garden/demography (4 files)

@Marigold

  • etl/steps/data/garden/covid (8 files)

@paarriagadap

  • etl/steps/data/garden/eth/2023-03-15/ethnic_power_relations.meta.yml
  • etl/steps/data/garden/growth/2022-12-19/gdp_historical.meta.yml
  • etl/steps/data/garden/lgbt_rights/2023-04-13/equaldex.meta.yml
  • etl/steps/data/garden/lis/2023-01-18/luxembourg_income_study.meta.yml
  • etl/steps/data/garden/oecd/2023-06-06/income_distribution_database.meta.yml
  • etl/steps/data/garden/ophi (2 files)
  • etl/steps/data/garden/wid/2023-01-27/world_inequality_database.meta.yml
  • etl/steps/data/meadow/ophi/2022-12-13/multidimensional_poverty_index.meta.yml

Copy link
Collaborator

@Marigold Marigold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COVID files look good!

Copy link
Contributor

@paarriagadap paarriagadap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @lucasrodes! This is much cleaner. I wonder what the alternative for line jumps would be if we are creating a description via code. I usually do this:

            description=new_line.join(
                [
                    x,
                    y,
                    z,
                ]
            )

With new_line = "<br><br>"

I will check if I included additional HTML code this way.

@danyx23
Copy link
Contributor

danyx23 commented Aug 8, 2023

@paarriagadap just a FYI that our markdown parser understands and keeps newlines. So if the string that ends up in the database contains newlines it will just work. Adding <br> tags is actually a bit of additional implementation work on the grapher side so if you don't need it for other reasons it might be easier to standardize to normal newlines.

@paarriagadap
Copy link
Contributor

@danyx23 What I mean with my example is that I use common bits of texts that are arranged in different manners depending on the indicator. That's why I save them in variables (x, y, z) and I join them with .join. If there's an alternative to it when writing Python code I will be happy to use it.

This is one example of what I'm talking about.

Copy link
Contributor

@spoonerf spoonerf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@veronikasamborska1994 veronikasamborska1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good !

Copy link
Contributor

@pabloarosado pabloarosado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have noticed a few issues. Let me know if you prefer me to fix them or if you prefer to do it yourself. Also, what about unordered lists? There are cases where we use "*", others "+". Should we changed them to "-"? Apart from these issues, looks good!


+ Affected: People requiring immediate assistance during a period of emergency, i.e. requiring basic survival needs such as food, water, shelter, sanitation and immediate medical assistance.
+ Affected: People requiring immediate assistance during a period of emergency, i.e. requiring basic survival needs such as food, water, shelter, sanitation and immediate medical assistance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the current situation regarding lists? According to this poll, it seems we should use - instead of * or +. Do you want me to change this file, or do you want to change all cases consistently?

@paarriagadap
Copy link
Contributor

paarriagadap commented Aug 8, 2023

I have replaced my yaml files with <br> tags with the cleaner placeholder implementation @Marigold introduced some weeks ago. All those <br> were the solution we had at the moment to use yaml anchors and have jump lines at the same time. They are not necessary anymore.

And I have also eliminated the <br> tags I included with code + the .join function. I found out I could use triple-quoted f-strings, so unless they are somewhere else we can get rid of all HTML tags and avoid the additional implementation @danyx23 described.

Copy link
Contributor

@pabloarosado pabloarosado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Lucas. As I mentioned in my previous review, the only thing left would be the lists that start with "*" or "+", and should instead start with "-". But if you prefer we can do that in a separate PR.

@paarriagadap
Copy link
Contributor

Hey! I was wondering how this change would work in explorers. If I add an explicit jump line it would mess with the structure of the column tables. What should it be the solution there? For now, I keep the <br> there. See this one, for example.

@lucasrodes
Copy link
Member Author

@paarriagadap I think <br> should be fine; should render correctly both in charts and eplorers.

@lucasrodes
Copy link
Member Author

lucasrodes commented Sep 4, 2023

This PR has not been merged because markdown support is not yet in Grapher. This is being tracked here: owid/owid-grapher#2520

@danyx23 danyx23 merged commit 71d4978 into master Sep 20, 2023
@danyx23 danyx23 deleted the refactor/html-tags-to-markdown branch September 20, 2023 17:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants