Requirements & Goals
We want sitemaps that are well-constructed and complete, so that search engines can accurately index our content and therefore allow Veterans to use search to find the content they need.
Sitemaps should be well organized and human-readable, to support awareness and troubleshooting.
It should be possible to split sitemaps semantically into categories, both to make troubleshooting easier and to allow some sitemaps to be prioritized for more frequent or higher-priority indexing.
How will we know we've achieved our goal?
Work
Issues or tasks to be turned into issues
Supporting material
Sitemap proposal
Phase 1
Currently, both Content Build and Next Build generate their sitemaps near the end of their respective build processes. The sitemap generator inspects what was actually produced (the HTML files present in the build output) and creates a sitemap from that.
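As a rough illustration of this output-driven approach, the Python sketch below walks a build output directory, maps each HTML file to a URL, and renders a minimal sitemaps.org `<urlset>`. The base URL `https://www.va.gov`, the rule that `dir/index.html` is served at `/dir`, and all function names are assumptions for illustration; this is not the actual generator used by either build.

```python
from pathlib import Path, PurePosixPath
from typing import List, Optional
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def path_to_url(rel_path: str, base_url: str) -> Optional[str]:
    """Map an output file such as 'health-care/index.html' to a page URL.

    Assumption: 'dir/index.html' is served at '/dir' and 'page.html' at '/page'.
    Non-HTML files return None.
    """
    p = PurePosixPath(rel_path)
    if p.suffix != ".html":
        return None
    parts = p.parent.parts if p.name == "index.html" else p.with_suffix("").parts
    return base_url.rstrip("/") + "/" + "/".join(parts)


def collect_urls(output_dir: str, base_url: str) -> List[str]:
    """Return sorted, de-duplicated URLs for every HTML file actually present."""
    root = Path(output_dir)
    urls = set()
    for file in root.rglob("*.html"):
        if not file.is_file():
            continue
        url = path_to_url(file.relative_to(root).as_posix(), base_url)
        if url is not None:
            urls.add(url)
    return sorted(urls)


def render_urlset(urls: List[str]) -> str:
    """Render a minimal <urlset> (loc only; output files carry no lastmod info)."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{entries}</urlset>\n'
    )
```

For example, `render_urlset(collect_urls("build", "https://www.va.gov"))` would produce a sitemap for a hypothetical `build` directory. Because this approach only sees files on disk, it cannot run until every page has been written, which is the limitation Phase 2 addresses.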
va.gov/sitemap.xml becomes a sitemap index pointing to two sitemaps:
sitemap-cb.xml, produced by Content Build
sitemap-nb.xml, produced by Next Build
Benefits:
Can be done quickly
Each system accounts for what it is building
Only produces one additional sitemap file to review for issues
Accounts for everything that is actually built, including URLs that CMS does not have knowledge of (registry.json apps, Status pages, vagov-content URLs, other URLs generated from scratch by Content Build)
Negatives:
There may be some duplication of URLs between the two sitemaps while individual template types are transitioning from Content Build to Next Build
This is not ideal, because it adds to the already high volume of URLs submitted to Search.gov
The CMS Team is also motivated to remove duplicates from Content Build once templates are established in Next Build, because doing so speeds up Content Build. Duplication should therefore be temporary.
Produces a second sitemap file that must be reviewed when troubleshooting
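The Phase 1 index itself is small. A minimal sketch, assuming both child sitemaps are served from the site root next to `sitemap.xml` on `https://www.va.gov`, renders the index in the sitemaps.org format. A second helper (hypothetical, for troubleshooting) lists URLs that appear in both child sitemaps, which could help track the temporary duplication described above.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def render_sitemap_index(sitemap_urls: Iterable[str]) -> str:
    """Render a <sitemapindex> that points at each child sitemap."""
    entries = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}</sitemapindex>\n'
    )


def locs(urlset_xml: str) -> Set[str]:
    """Extract every namespaced <loc> value from a <urlset> document."""
    root = ET.fromstring(urlset_xml)
    return {(el.text or "").strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc")}


def duplicate_urls(cb_xml: str, nb_xml: str) -> List[str]:
    """URLs listed in both the Content Build and the Next Build sitemaps."""
    return sorted(locs(cb_xml) & locs(nb_xml))
```

Under these assumptions the Phase 1 index would be `render_sitemap_index(["https://www.va.gov/sitemap-cb.xml", "https://www.va.gov/sitemap-nb.xml"])`, and running `duplicate_urls` on each build gives a quick count of URLs that have not yet been removed from Content Build.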
Phase 2
The process by which Next Build creates a sitemap currently requires that all files be available when the sitemap is being created. This will no longer be possible when we move to a persistent server and just-in-time generation of pages.
We will need a new process for generating sitemaps. We propose that each 'system' with knowledge of URLs generate its own sitemap, and that these sitemaps be unified under a single sitemap index. In particular:
CMS would at a minimum have its own sitemap for its URLs
This sitemap can be constructed directly from CMS data, and thus will have access to lastmod information as well as any site structure or IA information that we might want to use to further divide the sitemap.
Registry.json would have its own sitemap for its URLs
Any 'generated' URLs would need to be added to a sitemap (health-care-facility/status, pagination pages, etc.)
These sitemaps can all be orchestrated by Next Build. This approach can launch before just-in-time page generation, and would replace the Phase 1 sitemap index.
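Each source can produce its entries independently, without any rendered pages. The sketch below assumes hypothetical input shapes: CMS nodes as dicts with a `path` and a Drupal-style `changed` Unix timestamp, registry.json as a list of apps that each expose a `rootUrl` field, and a made-up `/page-N` pattern for pagination URLs. It also groups entries by first path segment, as one possible way to use IA information to divide the CMS sitemap.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class SitemapEntry:
    loc: str
    lastmod: Optional[str] = None  # W3C date, e.g. "2024-01-15"


def cms_entries(nodes: Iterable[dict], base_url: str) -> List[SitemapEntry]:
    """Hypothetical node shape: {"path": "/health-care", "changed": <unix seconds>}."""
    out = []
    for node in nodes:
        changed = datetime.fromtimestamp(node["changed"], tz=timezone.utc)
        out.append(SitemapEntry(base_url + node["path"], changed.date().isoformat()))
    return out


def registry_entries(registry_json: str, base_url: str) -> List[SitemapEntry]:
    """Assumes each app has a "rootUrl"; apps without one are skipped."""
    apps = json.loads(registry_json)
    return [SitemapEntry(base_url + a["rootUrl"]) for a in apps if a.get("rootUrl")]


def pagination_entries(
    list_path: str, total_items: int, page_size: int, base_url: str
) -> List[SitemapEntry]:
    """Hypothetical pattern: page 1 at list_path, page N at list_path/page-N."""
    pages = max(1, -(-total_items // page_size))  # ceiling division
    return [SitemapEntry(base_url + list_path)] + [
        SitemapEntry(f"{base_url}{list_path}/page-{n}") for n in range(2, pages + 1)
    ]


def group_by_section(
    entries: Iterable[SitemapEntry], base_url: str
) -> Dict[str, List[SitemapEntry]]:
    """Split entries by first path segment so each section can get its own sitemap."""
    groups: Dict[str, List[SitemapEntry]] = {}
    for e in entries:
        path = e.loc[len(base_url):].strip("/")
        section = path.split("/", 1)[0] or "home"
        groups.setdefault(section, []).append(e)
    return groups
```

Only the CMS entries carry `lastmod` here, reflecting that the CMS is the source with reliable change dates; registry and generated URLs could be given dates later if a trustworthy source for them exists.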
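One way the orchestration step could look, as a sketch under assumed names: Next Build takes entries from each source, writes one or more child sitemaps per source, and writes a single index over all of them. The sitemaps.org protocol limits each sitemap file to 50,000 URLs (and 50 MB uncompressed), so large sources are split into numbered chunks. The `sitemap-<source>-<n>.xml` file naming and the `build_sitemaps` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit from the sitemaps.org protocol


@dataclass(frozen=True)
class SitemapEntry:
    loc: str
    lastmod: Optional[str] = None


def render_urlset(entries: List[SitemapEntry]) -> str:
    """Render a <urlset>, including <lastmod> only when the source supplied one."""
    rows = []
    for e in entries:
        lastmod = f"<lastmod>{escape(e.lastmod)}</lastmod>" if e.lastmod else ""
        rows.append(f"  <url><loc>{escape(e.loc)}</loc>{lastmod}</url>\n")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{"".join(rows)}</urlset>\n'
    )


def build_sitemaps(sources: Dict[str, List[SitemapEntry]], base_url: str) -> Dict[str, str]:
    """Return {filename: xml} for every child sitemap plus the top-level index."""
    files: Dict[str, str] = {}
    for name, entries in sorted(sources.items()):
        for i in range(0, len(entries), MAX_URLS_PER_SITEMAP):
            chunk = entries[i:i + MAX_URLS_PER_SITEMAP]
            files[f"sitemap-{name}-{i // MAX_URLS_PER_SITEMAP + 1}.xml"] = render_urlset(chunk)
    # The index lists only the child files generated above.
    index_rows = "".join(
        f"  <sitemap><loc>{escape(base_url)}/{fname}</loc></sitemap>\n" for fname in files
    )
    files["sitemap.xml"] = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{index_rows}</sitemapindex>\n'
    )
    return files
```

Keeping one file family per source preserves the human-readable, per-system split described in the goals: a problem in the registry sitemap, for example, can be inspected without scanning CMS URLs.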