Tracking issue: scaling data pages #2379

larsyencken · 2023-06-28T13:33:11Z

JoeHasell · 2023-08-08T12:46:02Z

@larsyencken @danyx23 One thing that is missing (or not prominently featured) from this list is the work on 'tooling' needed to make it easier a) for data managers to fill out the metadata; b) for researchers and data managers to collaborate on the metadata.

Essentially, this was one of our four big bets for the year ('technical text admin'). It feels like our ambition there has implicitly been cut back considerably, without us fully reflecting on it.

Perhaps we could discuss this on Weds?

larsyencken · 2023-08-09T10:01:53Z

Hey Joe! I don't think an admin's off the table, but it's more that it might be in addition to, and after, many of the things above. In particular, the Related Research & Writing and Related Data blocks are still outstanding and need to be built. There's also the desire to see what pain points emerge from use, although if we waited for them then an admin would really be post-Porto.

Perhaps chat next week with Daniel?

…2739) This PR implements #2379. It adds the missing link in our DB from WordPress posts to the charts used in them. It then uses this new posts_links table together with the existing posts_gdocs_links table to find the related writing for a data page by going from indicator id -> charts using this indicator -> articles using this indicator.

The posts_links table was modelled on the posts_gdocs_links table, as I thought that uniformity is more important than the optimal layout here.

Extracting the links is done a bit crudely at the moment: it just runs regexes over the raw HTML instead of parsing the HTML and querying for a tags. The latter would give us the text content of the elements that establish the links, which would probably often be useful, but it would complicate and slow down the script. I'd like to hear your opinions on whether this should switch to proper parsing and fill richer information into the DB.

The thumbnail rendering is also a bit ad hoc. We have an Image component, but it is built for use in gdocs, and we need to show thumbnails for both WP posts and Gdocs articles.

To rank related research and writing we use the pageviews table. This is empty by default in dev environments, so this PR adds a make command to refresh pageviews (fetched from datasette-private).

- [ ] ❗ After merging this to production, run the db/syncPostsToGrapher.js script to fill the new relationship table!
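For readers who want a concrete picture of the lookup described above, here is a minimal in-memory sketch. It is not the code from the PR: the interfaces, the function names (`extractChartSlugs`, `buildPostLinks`, `relatedWriting`), the sample data, and the assumed chart URL shape (`https://ourworldindata.org/grapher/<slug>`) are all hypothetical, and the real tables are queried in the database rather than filtered as arrays. It only illustrates the three steps: regex-based link extraction from raw HTML, the indicator -> charts -> articles walk, and ranking by pageviews.

```typescript
// Hypothetical row shapes; the real posts_links schema may differ.
interface PostLink {
    postId: number
    chartSlug: string
}
interface ChartIndicator {
    chartSlug: string
    indicatorId: number
}
interface Post {
    id: number
    title: string
    html: string
    pageviews: number
}

// Crude extraction: a regex over the raw HTML, no DOM parsing.
// Assumes chart links look like https://ourworldindata.org/grapher/<slug>.
function extractChartSlugs(html: string): string[] {
    const pattern = /href="https:\/\/ourworldindata\.org\/grapher\/([a-z0-9-]+)/g
    const slugs: string[] = []
    let match: RegExpExecArray | null
    while ((match = pattern.exec(html)) !== null) {
        // Keep each slug once per post.
        if (slugs.indexOf(match[1]) === -1) slugs.push(match[1])
    }
    return slugs
}

// Build the post -> chart link rows (an in-memory stand-in for posts_links).
function buildPostLinks(posts: Post[]): PostLink[] {
    const links: PostLink[] = []
    for (const post of posts) {
        for (const chartSlug of extractChartSlugs(post.html)) {
            links.push({ postId: post.id, chartSlug })
        }
    }
    return links
}

// indicator id -> charts using it -> posts linking to those charts,
// ranked by pageviews (highest first).
function relatedWriting(
    indicatorId: number,
    chartIndicators: ChartIndicator[],
    links: PostLink[],
    posts: Post[]
): Post[] {
    const chartSlugs = chartIndicators
        .filter((c) => c.indicatorId === indicatorId)
        .map((c) => c.chartSlug)
    const postIds = links
        .filter((l) => chartSlugs.indexOf(l.chartSlug) !== -1)
        .map((l) => l.postId)
    return posts
        .filter((p) => postIds.indexOf(p.id) !== -1)
        .sort((a, b) => b.pageviews - a.pageviews)
}

// Made-up sample data, for illustration only.
const posts: Post[] = [
    { id: 1, title: "Post A", pageviews: 100,
      html: '<a href="https://ourworldindata.org/grapher/life-expectancy">chart</a>' },
    { id: 2, title: "Post B", pageviews: 500,
      html: '<a href="https://ourworldindata.org/grapher/life-expectancy?tab=map">map</a> ' +
            '<a href="https://ourworldindata.org/grapher/child-mortality">chart</a>' },
    { id: 3, title: "Post C", pageviews: 900, html: "<p>No charts here</p>" },
]
const chartIndicators: ChartIndicator[] = [
    { chartSlug: "life-expectancy", indicatorId: 42 },
    { chartSlug: "child-mortality", indicatorId: 7 },
]

const links = buildPostLinks(posts)
const related = relatedWriting(42, chartIndicators, links, posts)
console.log(related.map((p) => p.title))
```

The regex approach shows the trade-off raised in the PR description: it is short and fast, but it captures only the link target, not the anchor text that a proper HTML parse would make available.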

danyx23 · 2023-11-27T11:31:23Z

There is a follow-up tracking issue for work that we decided not to do as part of the Sept/Oct/Nov cycles: #2949

larsyencken assigned Marigold and danyx23 Jun 28, 2023

github-actions bot added the needs triage label Jun 28, 2023

larsyencken mentioned this issue Jun 28, 2023

Roadmap: data engineering 2023 owid/etl#1285

Closed

41 tasks

larsyencken removed the needs triage label Jun 28, 2023

danyx23 mentioned this issue Oct 11, 2023

🎉 Add related research and writing via content graph to data pages #2739

Merged

1 task

danyx23 mentioned this issue Nov 27, 2023

Tracking issue: data pages improvements #2949

Open

15 tasks

danyx23 closed this as completed Nov 27, 2023
