
feat: grapher page related research from DB content graph #3176

Merged: 10 commits merged on Feb 19, 2024


@mlbrgl (Member) commented Feb 7, 2024

This PR replaces the fortunejs content graph with a DB-based alternative, using the links stored in the posts_links and posts_gdocs_links tables.

Screenshot 2024-02-07 at 10.54.47.png

This PR also continues the deprecation work started in #3166.

Inconsistency in how grapher chart links are stored in the DB: a handful of chart links are stored with the full https://ourworldindata.org/grapher prefix. These disappear when the containing articles are re-saved, which indicates that they were created by an older version of the codebase. I'm therefore opting not to support them in the content graph. The affected links can be listed with:

    SELECT pgl.*, pg.published FROM posts_gdocs_links pgl
    JOIN posts_gdocs pg ON pg.id = pgl.sourceId
    WHERE pgl.target LIKE 'https://ourworldindata.org/grapher%'
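
The following is a minimal TypeScript sketch of the resulting rule, for illustration only. It assumes that current rows store a bare chart slug in `target`. The row shape `GdocLinkRow` and the helper `supportedChartLinks` are hypothetical and are not part of the codebase. The sketch mirrors the `LIKE` prefix match above with `startsWith`. One difference: `startsWith` is case-sensitive, whereas `LIKE` under MySQL's usual collations is not.

```typescript
// Hypothetical shape of a posts_gdocs_links row (only the columns used here).
interface GdocLinkRow {
    sourceId: string
    target: string
    linkType: string
}

// Legacy prefix carried by a handful of links written by an older codebase.
const LEGACY_GRAPHER_PREFIX = "https://ourworldindata.org/grapher"

// Keep rows whose target does not carry the legacy absolute-URL prefix;
// those legacy rows are deliberately left out of the content graph.
function supportedChartLinks(rows: GdocLinkRow[]): GdocLinkRow[] {
    return rows.filter((row) => !row.target.startsWith(LEGACY_GRAPHER_PREFIX))
}

const rows: GdocLinkRow[] = [
    { sourceId: "doc-1", target: "life-expectancy", linkType: "grapher" },
    {
        sourceId: "doc-2",
        target: "https://ourworldindata.org/grapher/life-expectancy",
        linkType: "grapher",
    },
]

// Only the row from doc-1 survives the filter.
console.log(supportedChartLinks(rows).map((row) => row.sourceId))
```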

Testing links

Below are some testing links to grapher pages, backlinking to posts in different configurations.


coderabbitai bot commented Feb 7, 2024

Important

Auto Review Skipped

Auto reviews are disabled on base/target branches other than the default branch. Please add the base/target branch pattern to the list of additional branches to be reviewed in the settings.


@mlbrgl marked this pull request as ready for review February 7, 2024 10:14
@mlbrgl requested a review from danyx23 February 7, 2024 10:14
@mlbrgl (Member, Author) commented Feb 7, 2024

Back to the drawing board, evaluating whether getReferencesByChartId could be a good fit.

@mlbrgl marked this pull request as draft February 7, 2024 11:25
@mlbrgl force-pushed the bake-from-snapshot branch from cb1d456 to 8265f67 February 7, 2024 13:45
@mlbrgl marked this pull request as ready for review February 7, 2024 19:55
@mlbrgl (Member, Author) commented Feb 7, 2024

It turns out it is: #3179. Given the refactor in that follow-up PR, this one becomes more of a stepping stone than a final destination. Comments on the content graph backreferences query should be made in #3179 instead.

@mlbrgl changed the title feat: content graph from db feat: grapher page related research from DB content graph Feb 8, 2024
@mlbrgl force-pushed the bake-from-snapshot branch from 8265f67 to c2e9291 February 13, 2024 08:41
@danyx23 (Contributor) left a comment


Looks good! Two minor comments, but neither is very important.

    MAX(chart_tags.keyChartLevel) as keyChartLevel
    FROM charts
    INNER JOIN chart_tags ON charts.id=chart_tags.chartId
    WHERE JSON_CONTAINS(config->'$.dimensions', '{"variableId":${variableId}}')
Contributor

I think it's better to join on the chart_dimensions table rather than using JSON_CONTAINS here (plain table columns are much faster to filter than nested JSON in MySQL).

Member Author

Thanks, that makes sense. This code was only moved during a refactor, so I opened an issue for it: #3203.
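
The reviewer's suggestion could look roughly like the TypeScript sketch below, which only builds the SQL text. It assumes that `chart_dimensions` exposes `chartId` and `variableId` columns. The helper name, the selected columns, and the `GROUP BY` are illustrative and are not taken from the actual query. The sketch also passes `variableId` as a bound `?` parameter instead of interpolating it into the string, as the excerpt above does.

```typescript
interface ParameterizedQuery {
    sql: string
    params: number[]
}

// Hypothetical helper: select charts that use a given variable by joining
// chart_dimensions (assumed columns: chartId, variableId) instead of
// scanning charts.config with JSON_CONTAINS.
function chartsForVariableQuery(variableId: number): ParameterizedQuery {
    if (!Number.isInteger(variableId)) {
        throw new Error(`variableId must be an integer, got ${variableId}`)
    }
    const sql = `
        SELECT charts.id,
               MAX(chart_tags.keyChartLevel) AS keyChartLevel
        FROM charts
        INNER JOIN chart_tags ON charts.id = chart_tags.chartId
        INNER JOIN chart_dimensions ON charts.id = chart_dimensions.chartId
        WHERE chart_dimensions.variableId = ?
        GROUP BY charts.id`
    // The value travels separately and is bound by the database driver.
    return { sql, params: [variableId] }
}

const query = chartsForVariableQuery(42)
console.log(query.sql.trim(), query.params)
```

A chart can reference the same variable in several dimensions, so the join may yield several rows per chart. The `GROUP BY` with the `MAX` aggregate collapses them back to one row per chart.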

Comment on lines +325 to +329
    -- note: we are not filtering by linkType to cast a wider net: if a post links to an
    -- explorer having the same slug as the grapher chart, we want to surface it as
    -- a "Related research" as it is most likely relevant.
Contributor

I don't quite get this comment. Can you explain what you mean in different words?

Member Author

The idea was not to restrict linkType to "grapher", so that a post embedding an explorer with the same slug as the chart would also be returned (alongside posts linking to the chart itself). This was more of an edge-case optimization, which I ended up dropping when I merged references (in the grapher admin) with related posts (on the grapher page) in a58fea7.
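
The TypeScript sketch below contrasts the two behaviours discussed here. The narrower rule matches only `grapher` links. The wider net matches any link whose target equals the chart slug. The `ContentLink` shape, the `"explorer"` link type value, and the `postsLinkingToSlug` helper are assumptions made for illustration.

```typescript
// Hypothetical link shape; linkType values such as "grapher" and
// "explorer" are assumed for illustration.
interface ContentLink {
    sourceSlug: string
    target: string
    linkType: string
}

// Return the distinct, sorted slugs of posts linking to `slug`.
// With anyLinkType=false only "grapher" links count (the narrower rule);
// with anyLinkType=true an explorer link sharing the slug also matches.
function postsLinkingToSlug(
    links: ContentLink[],
    slug: string,
    anyLinkType: boolean
): string[] {
    const sources = links
        .filter((link) => link.target === slug)
        .filter((link) => anyLinkType || link.linkType === "grapher")
        .map((link) => link.sourceSlug)
    return Array.from(new Set(sources)).sort()
}

const links: ContentLink[] = [
    { sourceSlug: "co2-emissions", target: "co2", linkType: "grapher" },
    { sourceSlug: "energy-mix", target: "co2", linkType: "explorer" },
    { sourceSlug: "population", target: "population", linkType: "grapher" },
]

console.log(postsLinkingToSlug(links, "co2", false)) // grapher links only
console.log(postsLinkingToSlug(links, "co2", true)) // also the explorer link
```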

@mlbrgl (Member, Author) commented Feb 19, 2024

Merge activity

  • Feb 19, 12:02 PM EST: @mlbrgl started a stack merge that includes this pull request via Graphite.
  • Feb 19, 12:03 PM EST: Graphite rebased this pull request as part of a merge.
  • Feb 19, 12:04 PM EST: @mlbrgl merged this pull request with Graphite.

Base automatically changed from bake-from-snapshot to master February 19, 2024 17:03
@mlbrgl merged commit ee5bd52 into master Feb 19, 2024
14 of 16 checks passed
@mlbrgl deleted the db-content-graph branch February 19, 2024 17:04
mlbrgl added a commit that referenced this pull request Feb 19, 2024
This PR merges the content graph queries used in grapher pages (see #3176) and the references queries used in the chart admin to indicate where a chart is being used.

It fixes two issues on the tab page:
- Overridden WordPress posts are now ignored
- Duplicate posts are removed
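
The two fixes can be pictured as a post-processing step over the merged results. The TypeScript sketch below is hypothetical. It assumes a WordPress post counts as "overridden" when a Google Doc with the same slug is present in the results. That may not match how the codebase actually tracks overrides, and the real fix may well live in the SQL query itself.

```typescript
// Hypothetical reference shape returned by the merged queries.
interface PostReference {
    slug: string
    title: string
    source: "wordpress" | "gdocs"
}

// Drop WordPress posts overridden by a Google Doc with the same slug
// (assumed definition of "overridden"), then drop duplicates with the
// same source and slug, keeping the first occurrence.
function cleanReferences(refs: PostReference[]): PostReference[] {
    const gdocSlugs = new Set(
        refs.filter((ref) => ref.source === "gdocs").map((ref) => ref.slug)
    )
    const seen = new Set<string>()
    const result: PostReference[] = []
    for (const ref of refs) {
        if (ref.source === "wordpress" && gdocSlugs.has(ref.slug)) continue
        const key = `${ref.source}:${ref.slug}`
        if (seen.has(key)) continue
        seen.add(key)
        result.push(ref)
    }
    return result
}

const refs: PostReference[] = [
    { slug: "co2", title: "CO2 (old WordPress post)", source: "wordpress" },
    { slug: "co2", title: "CO2", source: "gdocs" },
    { slug: "energy", title: "Energy", source: "wordpress" },
    { slug: "energy", title: "Energy", source: "wordpress" },
]

console.log(cleanReferences(refs).map((ref) => `${ref.source}:${ref.slug}`))
```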

Reusable blocks are no longer surfaced. This is OK since reusable blocks are now dereferenced in the embedding article, which means a chart within a reusable block shows up under the parent article instead. One exception is explorer content blocks, which are still standalone. A quick manual check of all explorers reveals that only one explorer references a chart: https://ourworldindata.org/explorers/plastic-pollution. Given these blocks' scope and limited lifetime, this caveat is acceptable.

_A future PR should use these references during the check performed before deleting a chart. Currently, [only gdocs are being checked](https://github.com/owid/owid-grapher/blob/e5acb212754bbba7d5d563fcd856adbb32ab8a36/adminSiteServer/apiRouter.ts#L603-L611)._
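
The following hypothetical TypeScript sketch shows one possible shape for that future check. The `ChartReferences` type and the `assertChartDeletable` function are invented for illustration and do not describe the existing `apiRouter.ts` code. The check refuses deletion as long as any post (WordPress or Google Doc) or explorer still references the chart.

```typescript
// Hypothetical grouping of the references found for a chart.
interface ChartReferences {
    postsWordpress: string[]
    postsGdocs: string[]
    explorers: string[]
}

// Throw if anything still references the chart; otherwise return quietly.
function assertChartDeletable(chartId: number, refs: ChartReferences): void {
    const referencing = [
        ...refs.postsWordpress.map((slug) => `wordpress:${slug}`),
        ...refs.postsGdocs.map((slug) => `gdocs:${slug}`),
        ...refs.explorers.map((slug) => `explorer:${slug}`),
    ]
    if (referencing.length > 0) {
        throw new Error(
            `Cannot delete chart ${chartId}, it is referenced by: ${referencing.join(", ")}`
        )
    }
}

// Usage: an unreferenced chart passes, a referenced one is rejected.
assertChartDeletable(1, { postsWordpress: [], postsGdocs: [], explorers: [] })
try {
    assertChartDeletable(2, { postsWordpress: [], postsGdocs: ["co2"], explorers: [] })
} catch (error) {
    console.log((error as Error).message)
}
```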

### Before
<img width="546" alt="Screenshot 2024-02-07 at 17 07 50" src="https://github.com/owid/owid-grapher/assets/13406362/1a4c25b8-3398-44b7-b14c-10c752d442d3">

### After
<img width="530" alt="Screenshot 2024-02-07 at 20 38 49" src="https://github.com/owid/owid-grapher/assets/13406362/1fad824e-8613-4825-9342-6d18a089d900">

### Testing links

Below are some testing links to grapher pages, backlinking to posts in different configurations.

- gdoc: [https://ourworldindata.org/grapher/agricultural-export-subsidies](https://ourworldindata.org/grapher/agricultural-export-subsidies)
    - [x]  [http://localhost:3030/grapher/agricultural-export-subsidies](http://localhost:3030/grapher/agricultural-export-subsidies)
    - [x]  [http://staging-site-content-graph-references/grapher/agricultural-export-subsidies](http://staging-site-content-graph-references/grapher/agricultural-export-subsidies)
- gdocs: [https://ourworldindata.org/grapher/pollution-deaths-from-fossil-fuels](https://ourworldindata.org/grapher/pollution-deaths-from-fossil-fuels)
    - [x]  [http://localhost:3030/grapher/pollution-deaths-from-fossil-fuels](http://localhost:3030/grapher/pollution-deaths-from-fossil-fuels)
    - [x]  [http://staging-site-content-graph-references/grapher/pollution-deaths-from-fossil-fuels](http://staging-site-content-graph-references/grapher/pollution-deaths-from-fossil-fuels)
- wp: [https://ourworldindata.org/grapher/dalys-rate-from-all-causes](https://ourworldindata.org/grapher/dalys-rate-from-all-causes)
    - [x]  [http://localhost:3030/grapher/dalys-rate-from-all-causes](http://localhost:3030/grapher/dalys-rate-from-all-causes)
    - [x]  [http://staging-site-content-graph-references/grapher/dalys-rate-from-all-causes](http://staging-site-content-graph-references/grapher/dalys-rate-from-all-causes)
- wp (with chart redirect): [https://ourworldindata.org/grapher/age-standardized-death-rate-from-pm25-pollution-per-100000-vs-gdp-per-capita-int-](https://ourworldindata.org/grapher/age-standardized-death-rate-from-pm25-pollution-per-100000-vs-gdp-per-capita-int-)
    - [x]  [http://localhost:3030/grapher/age-standardized-death-rate-from-pm25-pollution-per-100000-vs-gdp-per-capita-int-](http://localhost:3030/grapher/age-standardized-death-rate-from-pm25-pollution-per-100000-vs-gdp-per-capita-int-)
    - [x]  [http://staging-site-content-graph-references/grapher/age-standardized-death-rate-from-pm25-pollution-per-100000-vs-gdp-per-capita-int-](http://staging-site-content-graph-references/grapher/age-standardized-death-rate-from-pm25-pollution-per-100000-vs-gdp-per-capita-int-)
- gdoc (with chart redirect): [https://ourworldindata.org/grapher/population-long-run-with-projections?time=earliest..2100&country=~OWID_WRL](https://ourworldindata.org/grapher/population-long-run-with-projections?time=earliest..2100&country=~OWID_WRL)
    - [x]  [http://localhost:3030/grapher/population-long-run-with-projections?time=earliest..2100&country=~OWID_WRL](http://localhost:3030/grapher/population-long-run-with-projections?time=earliest..2100&country=~OWID_WRL)
    - [x]  [http://staging-site-content-graph-references/grapher/population-long-run-with-projections?time=earliest..2100&country=~OWID_WRL](http://staging-site-content-graph-references/grapher/population-long-run-with-projections?time=earliest..2100&country=~OWID_WRL)
- none: [https://ourworldindata.org/grapher/death-rates-alcohol-drug-overdoses-by-age-who](https://ourworldindata.org/grapher/death-rates-alcohol-drug-overdoses-by-age-who)
    - [x]  [http://localhost:3030/grapher/death-rates-alcohol-drug-overdoses-by-age-who](http://localhost:3030/grapher/death-rates-alcohol-drug-overdoses-by-age-who)
    - [x]  [http://staging-site-content-graph-references/grapher/death-rates-alcohol-drug-overdoses-by-age-who](http://staging-site-content-graph-references/grapher/death-rates-alcohol-drug-overdoses-by-age-who)