Include inter-class and inter-collection relationship graphs in schema documentation #2198


eecavanna
Collaborator

In this branch, I updated two GitHub Actions workflows—one that builds and deploys the production documentation website and one that builds and deploys the preview documentation website—so that they use refgraph (part of refscan, a homegrown referential integrity checker tool) to generate a pair of diagrams and include them in the resulting schema documentation.

Specifically, the workflows run $ pipx run refgraph {options} to generate a graph (a.k.a. network diagram) in which the circles (nodes) represent Mongo collections and the arrows (edges) represent relationships between those collections. An arrow from circle A to circle B means that the schema allows a document in collection A to contain a reference to a document in collection B.

Similarly, the workflows then run $ pipx run refgraph {different options} to generate a graph where the circles (nodes) represent classes instead of collections.
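
The relationship both graphs encode can be sketched in a few lines of Python. This is not refgraph's implementation. It is a minimal illustration that assumes two hypothetical, hand-written mappings: which classes each class may reference (CLASS_REFERENCES) and which Mongo collection stores each class (COLLECTION_OF_CLASS, assuming exactly one collection per class). The class and collection names are illustrative and are not taken from nmdc-schema. Class-level edges are "lifted" to collection-level edges, so an edge A -> B means a document in collection A may contain a reference to a document in collection B.

```python
from collections import defaultdict

# Hypothetical example data (illustrative names, not the real nmdc-schema):
# the classes that each class may reference via its slots.
CLASS_REFERENCES = {
    "Biosample": {"Study"},
    "DataObject": set(),
    "Study": {"Study"},
    "WorkflowExecution": {"DataObject", "Biosample"},
}

# Hypothetical: the Mongo collection that stores documents of each class.
COLLECTION_OF_CLASS = {
    "Biosample": "biosample_set",
    "DataObject": "data_object_set",
    "Study": "study_set",
    "WorkflowExecution": "workflow_execution_set",
}


def class_graph(references):
    """Return the set of (source_class, target_class) edges."""
    return {(src, dst) for src, targets in references.items() for dst in targets}


def collection_graph(references, collection_of_class):
    """Lift class-level edges to collection-level edges.

    An edge (A, B) means a document in collection A may contain
    a reference to a document in collection B.
    """
    edges = defaultdict(set)
    for src, dst in class_graph(references):
        edges[collection_of_class[src]].add(collection_of_class[dst])
    return dict(edges)


if __name__ == "__main__":
    graph = collection_graph(CLASS_REFERENCES, COLLECTION_OF_CLASS)
    for source, targets in sorted(graph.items()):
        print(source, "->", sorted(targets))
```

A class that may reference other instances of itself (here, Study) shows up as a self-loop in both graphs.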

The graphs are web-based and are stored in files named:

  • collection-graph.html
  • class-graph.html

Both graphs (files) are injected into the schema documentation website's file tree at "build time" (i.e. when the GitHub Actions workflow is compiling the schema documentation website from source). They are not stored in the repository.
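
A minimal sketch of the "inject at build time" step, assuming the two HTML files have already been generated into one directory and the site tree has already been built into another. The function name and directory arguments are hypothetical; this is not the code the workflows actually run. It refuses to proceed if the site tree does not exist yet, which reflects the requirement that the file tree exist before the graphs are injected into it.

```python
import shutil
from pathlib import Path

# File names of the two graphs described above.
GRAPH_FILES = ("collection-graph.html", "class-graph.html")


def inject_graphs(generated_dir, site_dir, names=GRAPH_FILES):
    """Copy generated graph files into an already-built site tree.

    Returns the list of destination paths that were written.
    """
    site = Path(site_dir)
    if not site.is_dir():
        # The site tree must exist before anything can be injected into it.
        raise FileNotFoundError(f"site directory not found: {site}")
    written = []
    for name in names:
        src = Path(generated_dir) / name
        dst = site / name
        shutil.copy2(src, dst)  # copies contents and metadata; overwrites dst
        written.append(dst)
    return written
```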

Finally, I updated the MkDocs configuration file so that the website's left-hand sidebar contains hyperlinks to the two graphs.

Screenshots

Two new links in the sidebar:

[Screenshot: the two new sidebar links]

class-graph.html:

[Screenshot: class-graph.html]

collection-graph.html:

[Screenshot: collection-graph.html]

Note: Before berkeley-schema-fy24/main was merged into nmdc-schema/main (which took place on Monday, October 7, 2024), a similar PR (microbiomedata#265) was created in the berkeley-schema-fy24 repo. That PR can be closed once this PR gets merged into nmdc-schema/main, as getting these changes into nmdc-schema/main was my goal with both PRs.

@eecavanna eecavanna self-assigned this Oct 8, 2024
- name: Generate web-based documentation
  run: |
    mkdir -p docs
    touch docs/.nojekyll
    make gendoc
    poetry run mkdocs build -d site
Collaborator Author

I noticed that this command (poetry run mkdocs build -d site) is present in this test_pages_build.yaml file, but not in the deploy-docs.yaml file. I don't know why it's necessary in one situation but not the other. Maybe the author wanted the documentation website's file tree to be generated in a directory named site (instead of, or in addition to, the directory it gets generated in by default).

Comment on lines -38 to -40
mkdir -p docs
touch docs/.nojekyll
make gendoc
Collaborator Author

FYI: These 3 lines were moved to an earlier step (i.e. to lines 36-38) so that a documentation website file tree exists by the time we try to inject the graphs into it (i.e. on lines 50-51).

@eecavanna
Collaborator Author

The failure of the Preview documentation build / run (pull_request) check is due to #2199. It is not specific to this PR. It happens on every "cross-fork" PR.

Member

@turbomam turbomam left a comment

I ran $ make squeaky-clean all test testdoc, and the two new links appeared in the left-hand gutter, but they were 404 links. Is that expected? Are the new visualizations generated only by a GitHub Action, and not by a local build?

I would prefer more detailed (i.e. longer) names for the links. There are other diagrams of nmdc-schema classes. What are the distinguishing characteristics of this class diagram? What are its strengths and weaknesses? Could any of that be captured by adding another word or two to the links?

@eecavanna
Collaborator Author

Response to first paragraph (will respond to second paragraph via a separate comment):

Thanks for taking a look! Yes, the diagrams only get generated (and injected into the docs) by GitHub Actions. That way, refscan is not a dependency of an nmdc-schema local development environment. The downside is that the diagrams don't get included in local builds of the docs — resulting in those broken navigation links (that was a mistake on my part — adding those links locally without also adding the diagrams).

I'll update the PR so that the diagrams are included in docs that are built locally.

@eecavanna
Collaborator Author

eecavanna commented Oct 14, 2024

Currently, I can't add refscan as a Poetry dependency of nmdc-schema because refscan advertises its minimum Python version as Python 3.10, but nmdc-schema allows for Python versions older than that (i.e. Python 3.9). Here's the error I get when I run $ poetry add refscan.

root@79adbcebe9b3:/nmdc-schema# poetry add refscan
Using version ^0.1.20 for refscan

Updating dependencies
Resolving dependencies... (0.0s)

The current project's supported Python range (>=3.9,<4.0) is not compatible with some of the required packages Python requirement:
  - refscan requires Python <4.0,>=3.10, so it will not be satisfied for Python >=3.9,<3.10

Because no versions of refscan match >0.1.20,<0.2.0
 and refscan (0.1.20) requires Python <4.0,>=3.10, refscan is forbidden.
So, because nmdc-schema depends on refscan (^0.1.20), version solving failed.

I'll explore other options.
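
The conflict Poetry reports above comes down to an interval check: every Python version the project supports (>=3.9,<4.0) must also be allowed by the dependency (<4.0,>=3.10), and Python 3.9 is not. The sketch below reproduces that check in plain Python. It is a simplification that mirrors the condition stated in the error message, not Poetry's actual solver, and it only understands specifiers of the form >=X.Y and <X.Y. Versions are compared as integer tuples because comparing them as strings would wrongly rank "3.10" below "3.9".

```python
# Python versions are modeled as (major, minor) tuples; each range is a
# half-open interval [lower, upper), mirroring ">=X.Y,<X.Y" specifiers.


def parse_range(spec):
    """Parse a specifier such as '>=3.9,<4.0' into ((3, 9), (4, 0))."""
    lower = upper = None
    for part in spec.split(","):
        part = part.strip()
        if part.startswith(">="):
            lower = tuple(int(x) for x in part[2:].split("."))
        elif part.startswith("<"):
            upper = tuple(int(x) for x in part[1:].split("."))
        else:
            raise ValueError(f"unsupported specifier: {part!r}")
    if lower is None or upper is None:
        raise ValueError(f"need both a lower and an upper bound: {spec!r}")
    return lower, upper


def is_satisfiable_for_all(project_spec, dependency_spec):
    """True if every Python the project supports is also allowed by the dependency."""
    p_lo, p_hi = parse_range(project_spec)
    d_lo, d_hi = parse_range(dependency_spec)
    return d_lo <= p_lo and p_hi <= d_hi


if __name__ == "__main__":
    # refscan 0.1.20's range, as shown in the error output above.
    print(is_satisfiable_for_all(">=3.9,<4.0", "<4.0,>=3.10"))  # False
    # Hypothetical relaxed range for refscan, for illustration only.
    print(is_satisfiable_for_all(">=3.9,<4.0", ">=3.9,<4.0"))  # True
```

The second call uses an assumed range; the exact range that refscan declares after the update is not stated here.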

@eecavanna
Collaborator Author

I updated refscan to work with older Python versions (as old as Python 3.9, instead of Python 3.10). refscan can now be installed as a dependency of nmdc-schema.

@eecavanna
Collaborator Author

I updated this branch so that $ make gendoc is what generates the diagrams. Accordingly, I removed the relevant commands from the GitHub Actions workflow configuration files, since they would be redundant ($ make gendoc already runs in those workflows).

Here's a screenshot showing a locally-running docs site, which includes the diagrams.

[Screenshot: a locally running docs site that includes the diagrams]

@eecavanna
Collaborator Author

Like before, the failure of the Preview documentation build / run (pull_request) check is due to #2199. It is not specific to this PR. It happens on every "cross-fork" PR.

@eecavanna
Collaborator Author

eecavanna commented Oct 15, 2024

What are the distinguishing characteristics of this class diagram?

  • These depict relationships between all classes or between all collections
  • These are interactive (their layout can be edited in real time)

Note: I don't know which specific diagrams you are thinking of for comparison. When I wrote the above list, the diagrams I had in mind were the ones I'm used to seeing in the schema docs (example shown below).

[Screenshot: an example diagram from the schema docs]

What are its strengths and weaknesses?

Strengths:

  • See above
  • I think @aclum mentioned that publishing these would address some user feedback

Weaknesses:

  • Can bog down the web browser when there are a lot of elements in the graph
  • The HTML files are injected into the documentation website separately from the MkDocs build process, so if MkDocs happened to create something at the same path(s), it would be overwritten when the HTML files are injected
  • refscan (and refgraph, which is a part of it) is a new tool, which can be both a strength (e.g. nimbleness) and a weakness
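
The overwrite risk in the second weakness could be made visible by checking for collisions before injecting the HTML files. This is a hypothetical sketch (the function names are illustrative and are not part of refscan or MkDocs): it lists which of the files to be injected already exist in the built site tree, so a build step could fail loudly instead of silently replacing MkDocs output.

```python
from pathlib import Path


def find_collisions(site_dir, names):
    """Return the names that already exist (as files or directories) in site_dir."""
    site = Path(site_dir)
    return sorted(name for name in names if (site / name).exists())


def check_no_collisions(site_dir, names):
    """Raise if injecting `names` into `site_dir` would overwrite existing output."""
    collisions = find_collisions(site_dir, names)
    if collisions:
        raise FileExistsError(
            "refusing to overwrite MkDocs output: " + ", ".join(collisions)
        )
```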

Could any of that be captured by adding another word or two to the links?

Ideas (can swap "collection" for "class"):

  • Inter-class relationship diagram
  • Class relationship diagram
  • Class diagram (interactive)

P.S. I'll be unavailable on Tuesday due to the "Berkeley Schema Switch-over" + Release Day activities.

@eecavanna
Collaborator Author

I updated the sidebar link text to be, in my opinion, more descriptive.

[Screenshot: the updated sidebar link text]

@eecavanna
Collaborator Author

eecavanna commented Oct 15, 2024

I do like the idea of having refscan (specifically, the refgraph piece of it) available in an nmdc-schema development environment. It'll allow schema editors to visualize schema changes locally, after they make those changes locally.

@eecavanna eecavanna requested a review from turbomam October 15, 2024 01:35
@turbomam
Member

I can confirm that the refgraphs build locally now. Thanks, @eecavanna.

@eecavanna
Collaborator Author

Hi @turbomam, I am ready for this PR to undergo the old one-two merge-aroo. Is there anything you want me to change about it?

@eecavanna
Collaborator Author

Instead of adding two links to the sidebar that point directly to the diagrams, I have updated this PR to add one link to the sidebar, which points to a new page called "Visualizations." That page contains introductory text for each of the two diagrams, along with links to the diagrams.

Here's a screenshot of that page:

[Screenshot: the "Visualizations" page]

Hi @turbomam, what do you think of this latest version? Is there anything you want me to change about it?

@eecavanna
Collaborator Author

Like before, the failure of the Preview documentation build / run (pull_request) check is due to #2199. It is not specific to this PR. It happens on every "cross-fork" PR.

@eecavanna
Collaborator Author

Hi @turbomam, is there anything you want me to change about that "Visualizations" page I introduced?

# Compile static Markdown files, images, and JavaScript scripts into a documentation website.
#
# Then, use `refgraph` (part of `refscan`) to generate a pair of graphs (i.e. network diagrams),
# one that depicts inter-collection relationships and one that depicts inter-class relationships.
Contributor

My two cents: distributing the inter-class relationship diagram will be confusing and not very useful until we make the ranges as strict as the structured syntax patterns.

@eecavanna
Copy link
Collaborator Author

eecavanna commented Oct 31, 2024

I'll separate this out into two PRs:

  1. One introducing the plumbing and the inter-collection diagram
  2. One, based on that, introducing the inter-class diagram

I expect the first one to undergo review, and possibly also be merged, tomorrow.

I expect the second one to sit idle until after the schema changes in #2235 and #2238 have been applied and the accompanying migrator has been run.

@aclum
Contributor

aclum commented Oct 31, 2024

PR #2235 is unrelated. Releasing the inter-class diagram would depend on #2238.

@eecavanna
Collaborator Author

Oops! Correct — thanks! I think I had issue microbiomedata/refscan#26 (comment) on my mind when I wrote that.

@eecavanna
Collaborator Author

Closing now that these changes have effectively been reproduced in two separate, smaller PRs (as planned):

@eecavanna eecavanna closed this Nov 1, 2024
@eecavanna eecavanna deleted the 2188-berkeley-incorporate-refscan-graphs-into-schema-documentation branch November 11, 2024 19:18
Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues:
  • berkeley: Incorporate refscan graphs into schema documentation
3 participants