Tracking issue: content graph #1678

Closed
1 of 3 tasks
larsyencken opened this issue Sep 23, 2022 · 5 comments

@larsyencken
Contributor

larsyencken commented Sep 23, 2022

Overview

We want to know exactly which parts of our content are being used where, so that we can automatically generate links between them, and so that we can update or delete content without fear of breaking anything.

Links we want to be aware of

Actions we want to prevent

  • Prevent archiving of a dataset used in an ETL step
  • Prevent deletion of a chart used in a core-econ article
  • Prevent deletion of a chart used in the SDG Tracker
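
A minimal sketch of how such a guard could work once the content graph exists. Everything here is hypothetical and for illustration only: the `Reference` record, the `kind:id` identifier strings, and the function names are assumptions, not part of the Grapher or ETL codebases. The idea is simply that every "X uses Y" edge is recorded, and a delete or archive action is refused while any edge still points at the target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """One edge in the content graph: `source` uses `target` (hypothetical model)."""
    source: str  # e.g. "sdg-tracker:goal-1" (made-up identifier format)
    target: str  # e.g. "chart:42" (made-up identifier format)


class ContentInUseError(Exception):
    """Raised when an entity that is still referenced would be removed."""


def referrers(target: str, references: list[Reference]) -> list[str]:
    """Return the sources that still point at `target`, sorted for stable output."""
    return sorted(r.source for r in references if r.target == target)


def assert_safe_to_remove(target: str, references: list[Reference]) -> None:
    """Refuse deletion or archiving while anything still references `target`."""
    used_by = referrers(target, references)
    if used_by:
        raise ContentInUseError(f"{target} is still used by: {', '.join(used_by)}")
```

In practice the edge list would presumably come from database tables like the ones in the diagram below rather than from an in-memory list; the check itself stays the same.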

ER Diagram

This ER diagram sketches the target state of the entities relevant to the content graph. Only the most relevant fields of each entity are included. The ETL-related parts are still missing (for example, we probably want to mirror all ETL steps into the Grapher DB as well, so that we can link e.g. Explorer CSV files to the ETL step that generated them). Note that the Mermaid diagram below is getting a bit unwieldy; it may be easier to copy the diagram code into Mermaid Live and zoom and pan there.

erDiagram
    ExternalInboundLink ||--o| ExternalInboundChartLink : "should resolve to chart"
    CompleteRedirects |o--|| ExternalInboundChartLink : "can be redirected"
    ExternalInboundChartLink }o--|| Charts : "links to chart"
    WordpressRedirects ||--o| CompleteRedirects : "merged into"
    ChartSlugRedirects ||--o| CompleteRedirects : "merged into"
    ManualLegacyRedirects ||--o| CompleteRedirects : "merged into"
    ChartSlugRedirects }o--|| Charts : "redirects to chart"
    Posts ||--o{ PostLinks : "links to"
    PostLinks ||--o| PostChartLinks : "links to chart"
    PostChartLinks ||--|| Charts : "links to chart"
    Explorer ||--o{ ExplorerGrapherView : "embeds chart"
    ExplorerGrapherView }o--|| Charts : "embeds chart"
    Explorer }o--o{ Dataset : "uses data from"
    Charts ||--o{ ChartDimensions : "uses variable"
    Charts ||--o{ ChartLinks : "links to"
    ChartDimensions }o--|| Variables : "uses variable"
    Variables }o--|| Dataset : "contained in"
    Variables }o--|| Source : "source text"
    Source }o--|| Dataset : "source text"
    Details }o--|| DetailCharts : "referenced in"
    DetailCharts ||--o{ Charts : "referenced in"
    Details }o--|| DetailLinks : "links to"
    DetailLinks ||--o{ DetailChartLinks : "links to chart"
    DetailChartLinks ||--o{ Charts : "links to chart"
    CompleteTerminalPaths ||--o| Posts : "resolves to Post"
    CompleteTerminalPaths ||--o| Charts : "resolves to Chart"
    CompleteTerminalPaths ||--o| Explorer : "resolves to Explorer"
    CompleteTerminalPaths ||--o| CompleteRedirects : "resolves to Redirect"

    ExternalInboundLink {
        int Id
        string SourceId
        string SourceUrl
        enum LinkKind
        string TargetPath
        string TargetQueryParams
    }
    ExternalInboundChartLink {
        int ExternalInboundLinkId FK
        int CompleteRedirectsId FK "Optional - only if redirected"
        int ChartId FK
    }
    WordpressRedirects {
        int Id
        string Slug
        string TargetPath
        string TargetQueryString
        int StatusCode
    }
    ChartSlugRedirects {
        int Id
        string Slug
        int ChartId
    }
    ManualLegacyRedirects {
        int Id
        string Slug
        string TargetDomain
        string TargetPath
        string TargetQueryString
    }
    CompleteRedirects {
        int Id
        enum SourceKind
        string From
        string TargetDomain
        string TargetPath
        string TargetQueryString
        int StatusCode
    }
    CompleteTerminalPaths {
        string Path
        int PostId FK
        int ChartId FK
        int ExplorerId FK
        int RedirectId FK
    }
    Charts {
        int Id
        json config
    }
    ChartDimensions {
        int Id
        int order
        enum property
        int chartId FK
        int variableId FK
    }
    Posts {
        int Id
        string content
    }
    PostLinks {
        int Id
        int PostId FK
        string SourceUrl
        enum LinkKind
        string TargetDomain
        string TargetPath
        string TargetQueryString
    }
    PostChartLinks {
        int PostLinkId FK
        int CompleteRedirectsId FK "Optional - only if redirected"
        int ChartId FK
    }
    Explorer {
        int Id
        string Slug
        json Config
    }
    ExplorerGrapherView {
        int ExplorerId
        int GrapherId FK "Optional - only if an existing Grapher is referenced"
    }
    Variables {
        int Id
        string name
        string unit
        json display
        int datasetId FK
        int sourceId FK
    }
    Source {
        int Id
        string name
        string description
        int datasetId FK
    }
    Dataset {
        int Id
        string name
    }
    Details {
        int Id
        string category
        string term
        string content
    }
    DetailCharts {
        int DetailId
        int ChartId
    }
    DetailLinks {
        int Id
        int DetailId
        string TargetDomain
        string TargetPath
        string TargetQueryString
    }
    DetailChartLinks {
        int DetailLinkId FK
        int CompleteRedirectsId FK "Optional - only if redirected"
        int ChartId FK
    }
    ChartLinks {
        int ChartId
        enum LinkKind
        string TargetDomain
        string TargetPath
        string TargetQueryString
    }
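
To illustrate how `CompleteRedirects` and `CompleteTerminalPaths` could work together, here is a rough sketch that follows redirects from an inbound path until it lands on a terminal path. It rests on assumptions not stated in the diagram: redirects are modelled as a plain `From -> TargetPath` mapping, terminal paths as a `Path -> (kind, id)` mapping, and redirects are allowed to chain (if `CompleteRedirects` is already flattened, at most one hop would be needed). The function name and the `max_hops` parameter are made up for this example.

```python
def resolve_path(path, redirects, terminal_paths, max_hops=10):
    """Follow redirects from `path` until a terminal path is reached.

    redirects:      dict of From -> TargetPath (stand-in for CompleteRedirects)
    terminal_paths: dict of Path -> (kind, id) (stand-in for CompleteTerminalPaths)

    Returns (kind, id, hops), or None if the path does not resolve,
    loops back on itself, or needs more than `max_hops` redirects.
    """
    seen = set()
    current = path
    for hops in range(max_hops + 1):
        if current in terminal_paths:
            kind, entity_id = terminal_paths[current]
            return kind, entity_id, hops
        # Stop on a redirect loop or on a path nobody knows about.
        if current in seen or current not in redirects:
            return None
        seen.add(current)
        current = redirects[current]
    return None
```

A link that resolves to `("chart", <id>, hops)` with `hops > 0` corresponds to a row in `ExternalInboundChartLink` or `PostChartLinks` with the optional `CompleteRedirectsId` set; `hops == 0` corresponds to a direct link.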
@larsyencken
Contributor Author

@danyx23 You won't tackle all these things next cycle, but it's good to have the laundry list somewhere

@danyx23
Contributor

danyx23 commented Sep 25, 2022

@larsyencken we have issue #1223 already that pulls a few threads together related to this - do you think we should have both issues in parallel? Should we make one about this iteration and the other about the longer term concept?

@larsyencken
Contributor Author

@danyx23 Ah, good find! I think we want one overall tracking issue, and one project for you this cycle that's more narrowed down. We didn't get a chance to discuss that before you went on break, but we can next week when you're back.

@ikesau
Member

ikesau commented Aug 6, 2024

I think we've addressed a lot of the issues listed in the OP, and it's time we go back through them to see what we're missing and whether we want to go ahead with further enhancements.

@ikesau
Member

ikesau commented Aug 13, 2024

@danyx23 and I went through each item in the OP and crossed out the issues that have already been done, or added more context to the few (low priority) issues that remain.

Overall, we think it's okay to close this ticket for now and can handle specific content graph requirements if they come up organically.


3 participants