From 38e63f72e0c60c323b5bec9152bf74fb32eceb79 Mon Sep 17 00:00:00 2001
From: Johannes Schindelin
Date: Mon, 3 Apr 2023 20:46:50 +0200
Subject: [PATCH] Adjust `ARCHITECTURE.md` to the Hugo reality

Originally, the idea was to use Jekyll instead, see
https://github.com/git/git-scm.com/issues/942. However, I aborted that
migration when it turned out that Jekyll required 20 minutes to process
the files while Hugo spent less than half a minute on them.

Signed-off-by: Johannes Schindelin
---
 ARCHITECTURE.md | 166 +++++++++++++++---------------------------------
 1 file changed, 51 insertions(+), 115 deletions(-)

diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index efc22cb3e2..884896b320 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -1,161 +1,97 @@
 # git-scm.com architecture

 This document describes the general setup and architecture that runs the
-git-scm.com site. The idea is to document all the moving parts that
-_aren't_ checked in to this repository. That may help new people joining
-the project to help out, as well provide some continuity in case the
-maintainer is hit by a bus.
+git-scm.com site.

 ## Content

-Though the site is a rails app, it can _mostly_ be thought of as serving
-static content. It's just that we suck in that static content and
-pre-process it using nightly scheduled jobs. We never write anything to
-the database on behalf of user requests.
+This site is served via GitHub Pages and is a [Hugo](https://gohugo.io/) site
+with the search implemented using [Pagefind](https://pagefind.app/).

 The content is a mix of:

-  - actual static content in this repository
+  - original content from this repository

   - community book content brought in from https://github.com/progit;
-    see the `lib/tasks/book2.rake` file.
+    see the `script/update-book2.rb` and `script/book.rb` files.

-  - manpages from releases of the git project, imported and formatted
-    via asciidoctor; see the `lib/tasks/index.rake` task.
+    The content is pre-rendered and tracked in the `external/book/` directory
+    tree.
+  - manual pages from releases of the git project, imported and formatted via
+    AsciiDoctor, and translated versions of the manual pages from
+    https://github.com/jnavila/git-manpages-l10n/ (which itself contains
+    pre-rendered pages from https://github.com/jnavila/git-manpages-l10n/); see
+    the `script/update-docs.rb` file.

-## Heroku
+    The pre-rendered pages are tracked in the `external/docs/` directory tree.

-The app itself is served by Heroku. The app name is `git-scm` (so you
-can visit it directly as https://git-scm.herokuapp.com). The site is
-owned by the git-scm.com team. If you want to be involved in managing
-uptime/deploys/etc, you'll need a Heroku account and request to be added
-to that team.
+To deploy to GitHub Pages, it is necessary to turn off the default setting to
+"publish from a branch" and instead change the setting to "publish with a
+custom GitHub Actions workflow":
+https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site#publishing-with-a-custom-github-actions-workflow
+With this change, the site can be tested in the fork by pushing to the
+`gh-pages` branch (which will trigger the `deploy.yml` workflow) and then
+navigating to https://git-scm..github.io/.

-We use a few Heroku add-ons:
+## Non-static parts

-  - Bonsai elasticsearch (see below)
+While the site consists mostly of static content, there are a couple of
+parts that are sort of dynamic.

-  - Heroku Postgres as the database
+The search is implemented client-side, via [Pagefind](https://pagefind.app/).
-  - Heroku Redis for rails caching
+A few scheduled GitHub workflows keep the content up to date:

-  - Heroku scheduler for cron jobs
+  - `update-git-version-and-manual-pages` and `update-download-data` (pick
+    up newly released git versions)

-The nightly scheduled jobs are:
+  - `update-translated-manual-pages` (fetch and format translated manual
+    pages from the jnavila/git-html-l10n repository)

-  - `rake downloads` (pick up newly released git versions)
-
-  - `rake preindex` (pull in and format manpages for released git
-    versions)
-
-  - `rake remote_genbook2` (pull in and format progit2 book content,
+  - `update-book` (fetch and format progit2 book content,
     including translations)

-It should be safe to run any of those jobs more frequently. E.g., if you
-know there's a new Git release out, then:
-
-    heroku run rake preindex
-    heroku run rake downloads
-
-will get it on the site without waiting for the nightly run.
-
-Merges to the `main` branch on GitHub auto-deploy to Heroku, so unless
-you're doing something tricky you generally shouldn't need to manually
-deploy.
-
-Note that some of the formatting of manpages and book content happens
-when they are imported by the rake tasks. So after fixing some
-formatting and deploying, the rake jobs may need to be re-run with a
-special flag to re-import (see the individual tasks for details).
-
-
-## Cloudflare
-
-We get enough requests that it's easy to overwhelm the single Heroku
-dyno. So we have Cloudflare sitting in front of it, aggressively caching
-everything. That also should make the site faster to serve to regions
-far away from Heroku's servers.
-
-The Cloudflare setup is mostly pretty simple:
+These workflows are also marked as `workflow_dispatch`, i.e. they can be run
+manually (e.g. to update the download links just after Git for Windows
+published a new release).

-  - they serve DNS for the whole domain (that's where they insert the CDN
-    magic)
-
-  - Cloudflare provides `https://` support to the user. Obviously the
-    site is totally open and doesn't have any sensitive data, so this is
-    really more about integrity. The certificate is generated by
-    Cloudflare (and requires SNI on the browser side).
-
-  - the Cloudflare connection to Heroku is passed over TLS; they provide an
-    "internal" certificate that we ask Heroku to use, so the connection
-    is secured between the two (again, mostly for integrity)
-
-  - the most exotic config is that we use "page rules" to mark the whole
-    site to be cached aggressively, regardless of any caching headers
-    sent from Heroku. This is a bit of a hack, but there's very little on
-    the site that can't be cached (which is perhaps a sign that the rails
-    setup needs to be tweaked to send more reasonable caching headers,
-    but this has been simple and effective so far).
-
-    There are a few special page rules to lift this caching for cases
-    where we do server-side logic (e.g.,
-    https://github.com/git/git-scm.com/issues/1129#issuecomment-363067019"),
-    but the long-term goal is to push that logic onto the client side as
-    much as possible.
-
-Both domains (c.f., the section on [DNS](#DNS) below) are owned by a
-Cloudflare "Team", and membership of that team is required to
-administrate the domains. Similar to the Heroku setup, you can ask to
-join this team if you wish to help out. The information about the team
-setup is in escrow with the Git PLC at Software Freedom Conservancy.
-Cloudflare provides the project with enough credits that it doesn't cost
-anything (though we're not using very many features, so it's possible
-that a free account would be sufficient, too).
-
-## Bonsai Elasticsearch
-
-The search functionality on the site is served by an elasticsearch
-cluster. The index can be populated by running `rake search_index`
-(manpages) and `rake search_index_book` (book) on Heroku (we only index
-the manpages and book). This perhaps should be run nightly, or at least
-after pulling in new content, but it currently isn't done automatically.
-
-The elasticsearch cluster is provided by Bonsai via their Heroku plugin.
-Our needs are larger than their free tier provides, but we receive
-credits from them that provide the service for free.
+Merges to the `gh-pages` branch on GitHub auto-deploy to GitHub Pages via the
+`deploy` GitHub workflow.

+Note that some of the formatting of manual pages and book content happens
+when they are imported by the GitHub workflows. Therefore, whenever there are
+changes to the scripts/workflows/automation that affect formatting, these
+workflows may need to be triggered using the force-rebuild flag to be toggled
+(see the individual workflows for details).

 ## DNS

-The actual DNS service is provided by Cloudflare (see above). The domain
-itself is registered with Gandi, and is owned by the project via
-Software Freedom Conservancy. Funds for the registration are provided
-from the Git project's Conservancy funds, and both the Git PLC and
-Conservancy have credentials to modify the setup.
+The actual DNS service is provided by Cloudflare. The domain itself is
+registered with Gandi, and is owned by the project via Software Freedom
+Conservancy. Funds for the registration are provided from the Git project's
+Conservancy funds, and both the Git PLC and Conservancy have credentials to
+modify the setup.

 Note that we own both git-scm.com and git-scm.org; the latter redirects
 to the former.
-

 ## Manual Intervention

 The site mostly just runs without intervention:

-  - code merged to `main` is auto-deployed
+  - code merged to `gh-pages` is auto-deployed

-  - new git versions are detected daily and manpages and download links
+  - new git versions are detected daily and manual pages and download links
     updated

   - book updates (including translations) are picked up daily

 There are a few tasks that still need to be handled by a human:

-  - new images added to the book have to be copied manually from
-    progit/progit2
-
   - new languages for book translations need to be added to
-    `lib/tasks/book2.rake`
+    `script/book.rb`

-  - forced re-imports of content (e.g., a formatting fix to imported
-    manpages) must be triggered manually
+  - forced re-imports of content (e.g., when fixing formatting in the
+    imported manual pages) must be triggered manually with `force-rebuild`
+    toggled