Tracking issue: scaling data pages #2379

Closed
19 of 20 tasks
Tracked by #1285
larsyencken opened this issue Jun 28, 2023 · 3 comments

@larsyencken
Contributor

larsyencken commented Jun 28, 2023

Things we need to do to sustainably author data pages as part of our work, continuing from #1946.

@JoeHasell

@larsyencken @danyx23 One thing that is missing (or not prominently featured) from this list is the work on 'tooling' needed to make it easier a) for data managers to fill out the metadata; b) for researchers and data managers to collaborate on the metadata.

Essentially, this was one of our four big bets for the year ('technical text admin'). It feels like our ambition there has implicitly been cut down quite considerably, without us fully reflecting on it.

Perhaps we could discuss this on Weds?

@larsyencken
Contributor Author

larsyencken commented Aug 9, 2023

Hey Joe! I don't think an admin's off the table, but it's more that it might be in addition to, and after, many of the things above. In particular, the Related Research & Writing and Related Data blocks are still outstanding and need to be built. There's also the desire to see what pain points emerge from use, although if we waited for them then an admin would really be post-Porto.

Perhaps chat next week with Daniel?

danyx23 added a commit that referenced this issue Nov 27, 2023
…2739)

This PR implements #2379. It adds the missing link in our DB from WordPress posts to the charts that are used there. It then uses this new posts_links table together with the existing posts_gdocs_links table to find the related writing for a data page by going from indicator id -> charts using this indicator -> articles using this indicator.
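
The traversal described above can be sketched in memory as follows. This is only an illustration: the row shapes (`ChartRow`, `LinkRow`), their field names, and the function `relatedArticles` are hypothetical and not taken from the actual schema or code, where this would more likely be done with SQL joins.

```typescript
// Hypothetical row shapes; the real tables likely have different columns.
interface ChartRow {
    slug: string
    indicatorIds: number[] // indicators plotted by this chart
}

interface LinkRow {
    sourceId: string // id of the article containing the link
    target: string // slug of the chart being linked to
}

interface RelatedArticle {
    source: "wordpress" | "gdocs"
    id: string
}

function relatedArticles(
    indicatorId: number,
    charts: ChartRow[],
    postsLinks: LinkRow[],
    postsGdocsLinks: LinkRow[]
): RelatedArticle[] {
    // Step 1: indicator id -> charts using this indicator
    const slugs = new Set<string>()
    charts.forEach((chart) => {
        if (chart.indicatorIds.some((id) => id === indicatorId)) {
            slugs.add(chart.slug)
        }
    })

    // Step 2: charts -> articles linking to them, from both link tables
    const seen = new Set<string>()
    const result: RelatedArticle[] = []
    const collect = (links: LinkRow[], source: "wordpress" | "gdocs"): void => {
        links.forEach((link) => {
            const key = source + ":" + link.sourceId
            if (slugs.has(link.target) && !seen.has(key)) {
                seen.add(key)
                result.push({ source, id: link.sourceId })
            }
        })
    }
    collect(postsLinks, "wordpress")
    collect(postsGdocsLinks, "gdocs")
    return result
}
```

Articles are de-duplicated per source, since one article may link to several charts that use the same indicator.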

The posts_links table was modelled on the posts_gdocs_links table, as I thought that uniformity is more important than the optimal layout here. Extracting the links is done a bit crudely at the moment: it just runs regexes over the raw HTML instead of parsing the HTML and querying for <a> tags. The latter would give us the text content of the element that establishes each link, which is probably often useful, but it would complicate and slow down the script. I'd like to hear your opinions on whether this should switch to proper parsing and store richer information in the DB.
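
For reference, a minimal sketch of what regex-based extraction over raw HTML can look like. The URL prefix (`https://example.org/grapher/`), the pattern, and the function name `extractChartSlugs` are assumptions made for illustration, not the code in this PR.

```typescript
// Assumed shape of a chart link; the real site URL and pattern may differ.
const CHART_HREF = /href="https:\/\/example\.org\/grapher\/([A-Za-z0-9_-]+)/g

// Returns the distinct chart slugs linked from a blob of raw HTML,
// in order of first appearance. The link text is not captured.
function extractChartSlugs(html: string): string[] {
    const slugs: string[] = []
    // Build a fresh RegExp per call so lastIndex state is never shared.
    const pattern = new RegExp(CHART_HREF.source, "g")
    let match: RegExpExecArray | null = pattern.exec(html)
    while (match !== null) {
        const slug = match[1]
        if (slug !== undefined && slugs.indexOf(slug) === -1) {
            slugs.push(slug)
        }
        match = pattern.exec(html)
    }
    return slugs
}
```

A parser-based version would instead walk the <a> elements, which is what would make the link text available, at the cost of the extra complexity and runtime mentioned above.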

The thumbnail rendering is also a bit ad-hoc. We have an Image component but that one is built for use in gdocs and we need to show thumbnails for both WP posts and Gdocs articles.

To rank related research and writing we use the pageviews table. This is empty by default in dev environments, so this PR adds a make command to refresh pageviews (fetched from datasette-private).
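
A rough sketch of the ranking idea, assuming pageviews can be looked up by article URL. The `ArticleRef` shape, the URL-keyed map, and the function `rankByPageviews` are hypothetical; the actual table columns and query may differ.

```typescript
// Hypothetical article shape, for illustration only.
interface ArticleRef {
    url: string
    title: string
}

// Sorts articles by descending pageviews without mutating the input.
// URLs missing from the map (for example when the pageviews table is
// empty in a dev environment) are treated as having zero views.
function rankByPageviews(
    articles: ArticleRef[],
    views: Map<string, number>
): ArticleRef[] {
    const viewsFor = (article: ArticleRef): number =>
        views.get(article.url) ?? 0
    return articles.slice().sort((a, b) => viewsFor(b) - viewsFor(a))
}
```

With an empty pageviews table every article scores zero, so the ranking carries no information, which is why refreshing pageviews matters in dev.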

- [ ] ❗ after merging this to production, run the db/syncPostsToGrapher.js script to fill the new relationship table!

danyx23 closed this as completed Nov 27, 2023
@danyx23
Contributor

danyx23 commented Nov 27, 2023

There is a follow-up tracking issue for work that we decided not to do as part of the Sept/Oct/Nov cycles: #2949
