[Epic][AP] Sitemap for CMS and other VA.gov content #18488

Open
1 of 10 tasks
Tracked by #6577
timcosgrove opened this issue Jul 10, 2024 · 0 comments

Requirements & Goals

We want sitemaps that are well-constructed and complete, so that search engines can accurately index our content and therefore allow Veterans to use search to find the content they need.

Sitemaps should be well-organized such that they are human-readable, for awareness and troubleshooting purposes.

Sitemaps should be able to be broken up semantically into categories, to allow for better troubleshooting and also to allow prioritization of some sitemaps for more frequent or higher priority indexing.

How will we know we've achieved our goal?

Work

Issues or tasks to be turned into issues

  1. 1 of 8
    Accelerated Publishing Needs refining
    nfpappas-oddball

Supporting material

Sitemap proposal

Phase 1

Currently, the sitemap building process for both Content Build and Next Build happens near the end of each generation process. The sitemap generation functionality looks at what has actually been produced (the HTML files that are present in the output) and creates a sitemap based on that.
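To illustrate the "build the sitemap from what was actually produced" approach described above, here is a minimal sketch. It is not the Content Build or Next Build implementation; the function name `urls_from_output`, the base URL, and the rule that `index.html` maps to a trailing-slash URL are all assumptions made for the example.

```python
from pathlib import Path


def urls_from_output(output_dir: Path, base_url: str) -> list[str]:
    """Derive page URLs from the HTML files present in a build output directory.

    Hypothetical helper: the URL mapping rules here are assumptions.
    """
    base = base_url.rstrip("/")
    urls = []
    for html_file in output_dir.rglob("*.html"):
        rel = html_file.relative_to(output_dir)
        if rel.name == "index.html":
            # Assumption: some/page/index.html is served at /some/page/
            parent = rel.parent.as_posix()
            path = "" if parent == "." else parent + "/"
        else:
            path = rel.as_posix()
        urls.append(f"{base}/{path}")
    # Sort so the resulting sitemap is stable and easier to read and diff.
    return sorted(urls)
```

Because this only sees files on disk, it picks up every generated page regardless of which system knows about it, which is also why it stops working once pages are generated just in time.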

  • va.gov/sitemap.xml becomes a sitemap index pointing to two sitemaps:
    • sitemap-cb.xml, produced by Content Build
    • sitemap-nb.xml, produced by Next Build
  • Benefits:
    • Can be done quickly
    • Each system accounts for what it is building
    • Only produces one additional sitemap file to review for issues
    • Accounts for everything that is actually built, including URLs that CMS does not have knowledge of (registry.json apps, Status pages, vagov-content URLs, other URLs generated from scratch by Content Build)
  • Negatives:
    • There may be some duplication of URLs between the two sitemaps while individual template types are transitioning
      • It is understood that this is not ideal, due to adding to the high volume of items submitted to Search.gov
      • CMS Team is also motivated to remove duplicates from Content Build once templates are established in Next Build, because of the speed gains for Content Build. So, duplication should be temporary.
    • Does produce a second sitemap file to be reviewed for troubleshooting
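As a rough sketch of Phase 1, the snippet below builds a sitemap index that points at the two per-system sitemaps, and reports URLs that appear in both while templates are transitioning. The function names and the `https://www.va.gov/` host are assumptions for illustration; only the file names `sitemap-cb.xml` and `sitemap-nb.xml` come from the proposal.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls: list[str]) -> str:
    """Return a sitemap index document listing the given sitemap URLs."""
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    ET.indent(root)  # human-readable output (Python 3.9+)
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def sitemap_locs(xml_text: str) -> set[str]:
    """Collect every <loc> value from a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.iter(f"{{{SITEMAP_NS}}}loc")
        if loc.text
    }


def duplicated_urls(sitemap_a: str, sitemap_b: str) -> set[str]:
    """Return URLs listed in both sitemaps, for example during a transition."""
    return sitemap_locs(sitemap_a) & sitemap_locs(sitemap_b)


if __name__ == "__main__":
    # Assumed locations of the two Phase 1 sitemaps.
    print(build_sitemap_index([
        "https://www.va.gov/sitemap-cb.xml",
        "https://www.va.gov/sitemap-nb.xml",
    ]))
```

Running a check like `duplicated_urls` against the two generated files would give a quick measure of how much overlap is being submitted while a template exists in both systems.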

Phase 2

The process by which Next Build creates a sitemap currently requires that all files be available when the sitemap is being created. This will no longer be possible when we move to a persistent server and just-in-time generation of pages.

We will need a new process for generating sitemaps. Our proposal is that each 'system' that has knowledge of URLs creates its own sitemap, and that these sitemaps are unified by a sitemap index. In particular:

  • CMS would at a minimum have its own sitemap for its URLs
    • This sitemap can be constructed directly from CMS data, and thus will have access to lastmod information as well as any site structure or IA information that we might want to use to further divide the sitemap.
  • Registry.json would have its own sitemap for its URLs
  • Any 'generated' URLs would need to be added to a sitemap (health-care-facility/status, pagination pages, etc)
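The CMS-driven sitemap described in the list above could look roughly like the following sketch, which writes `lastmod` from CMS data and splits pages into one sitemap per section. The `CmsPage` record, its fields, and the `sitemap-<section>.xml` naming are hypothetical; the real CMS data model and any IA-based grouping would differ.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class CmsPage:
    """Hypothetical CMS record; field names are assumptions."""
    path: str          # e.g. "/health-care/"
    changed: datetime  # timezone-aware last-changed timestamp
    section: str       # e.g. "health-care"


def build_urlset(pages: list[CmsPage], base_url: str) -> str:
    """Return a sitemap document with <loc> and <lastmod> for each page."""
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in sorted(pages, key=lambda p: p.path):
        url = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = (
            base_url.rstrip("/") + page.path
        )
        # lastmod as a W3C date (YYYY-MM-DD), normalized to UTC.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = (
            page.changed.astimezone(timezone.utc).date().isoformat()
        )
    ET.indent(root)  # human-readable output (Python 3.9+)
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def sitemaps_by_section(pages: list[CmsPage], base_url: str) -> dict[str, str]:
    """Group pages by section and return {file name: sitemap XML}."""
    groups: dict[str, list[CmsPage]] = defaultdict(list)
    for page in pages:
        groups[page.section].append(page)
    return {
        f"sitemap-{section}.xml": build_urlset(group, base_url)
        for section, group in sorted(groups.items())
    }
```

Splitting by section is one possible way to get the semantic categories mentioned in the requirements; the resulting file names could then be listed in a single sitemap index.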

These sitemaps can all be orchestrated by Next Build. This approach can launch before just-in-time generation of pages, and would replace the sitemap index from Phase 1.
