
DevelopmentPrinciples

Martin Pitt edited this page Jan 5, 2022 · 10 revisions

Development

  • keep master working on as many current OSes as possible; use run-time feature/API detection rather than hard-coded OS checks
  • every change comes with a test; tests may contain some OS-specific conditionals
  • code is easy (or at least possible, and documented) to run straight out of the git tree, without permanent system modifications
  • test VMs double as a development environment for trying out intrusive changes; iterate faster by copying files with scp instead of re-running image-prepare
  • tests are easy to run and debug locally
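To illustrate run-time feature/API detection, here is a minimal Python sketch that probes for the capability itself instead of checking the OS release. The helper names are hypothetical; `os.pidfd_open()` (Linux-only, Python 3.9+) merely stands in as an example of an API that only newer kernels support.

```python
import os
import shutil

# Hypothetical helpers: probe for the capability itself instead of
# comparing OS names or version numbers.

def have_pidfd() -> bool:
    """True if os.pidfd_open() exists *and* the running kernel accepts it."""
    try:
        fd = os.pidfd_open(os.getpid())
    except (AttributeError, OSError):
        # AttributeError: Python was built without it (e.g. non-Linux);
        # OSError: the kernel or a sandbox rejects the syscall.
        return False
    os.close(fd)
    return True

def have_command(name: str) -> bool:
    """True if an executable called `name` is found on $PATH."""
    return shutil.which(name) is not None

# Choose a code path based on what is actually available.
wait_strategy = "pidfd" if have_pidfd() else "polling"
print(wait_strategy)
```

The same code then runs unchanged on old and new releases; only the tests may need to know which path to expect on a given OS.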

Upstream CI

  • test on every supported OS
  • build and tests work offline
  • provide our own versions of third-party services: FreeIPA, Samba AD, candlepin, OpenShift, ovirt, selenium containers
  • provide tooling to build rpms, debs, and entire package repositories locally from scratch
  • refresh OS images separately from code changes
  • test robustness: tests touched by a PR must succeed 3 times in a row, untouched tests must succeed at least once in 3 attempts; keep a database of test flakes
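The retry rule for touched and untouched tests can be sketched as follows. This is a minimal illustration, not the logic of our actual test runner: `evaluate()` and the `run` callable are hypothetical, and a test that shows both outcomes is flagged as a flake candidate, which is the kind of data a flake database would collect.

```python
from typing import Callable

def evaluate(run: Callable[[], bool], touched: bool, attempts: int = 3) -> tuple[bool, bool]:
    """Apply the retry rule to one test; return (passed, flake_candidate).

    Hypothetical sketch: `run` executes the test once and returns True on success.
    """
    results: list[bool] = []
    for _ in range(attempts):
        ok = run()
        results.append(ok)
        if touched and not ok:
            break  # a touched test must pass every attempt: stop at the first failure
        if not touched and ok:
            break  # an untouched test needs one pass: stop at the first success
    if touched:
        passed = len(results) == attempts and all(results)
    else:
        passed = any(results)
    # Seeing both outcomes for the same code suggests flakiness.
    flake_candidate = True in results and False in results
    return passed, flake_candidate
```

For example, an untouched test that fails once and then passes is accepted but still reported as a flake candidate, so flakiness gets tracked instead of silently retried away.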

Fedora/RHEL

  • run upstream integration tests in downstream gating
  • this approach lets us upload the current master right up to the latest freeze

Releases

  • automate everything: GitHub, Fedora, Copr, PPA, Docker Hub, home page (docs)
  • in principle, the manual process is just: create a tag, write a blog post

Our tests/CI Error Budget

High-level goal: What keeps our velocity and motivation?

  • PRs get validated in a reasonable time (queue + test run time)
  • We don’t waste time interpreting unstable test results
  • We are not afraid of touching code
  • Test failures are relevant and meaningful, relieving us from having to decide “related or not?” every. single. time.

Service Level Objectives

When the following objectives are met, we operate normally and happily. Once they drop below the mark (“exceeding the error budget”), part of the team (agreed on in the daily standups) stops feature development and non-urgent changes and instead fixes our infrastructure and tests until we are back at the agreed service level.
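For an objective of the form “X% of events must be good”, the error budget over some window is the remaining (100 - X)% of events that may be bad. A minimal Python sketch of that arithmetic; the function name is hypothetical and not part of our tooling, and integer math avoids floating-point rounding right at the boundary.

```python
def error_budget(target_pct: int, total: int, bad: int) -> tuple[int, int]:
    """Return (allowed bad events, budget left) for a 'target_pct% good' objective.

    Hypothetical helper; a negative 'budget left' means the budget is exceeded.
    """
    allowed = total * (100 - target_pct) // 100
    return allowed, allowed - bad

# 95% of 200 PRs must be merged without failed tests: 10 may have failures.
allowed, left = error_budget(95, 200, 12)
print(allowed, left)  # 10 -2
```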

Objectives that support the high-level goal, in descending importance:

  1. A merged PR becomes fully green on the first attempt with 75% probability, and after at most one retry with 95% probability
  2. Every individual test succeeds at least 90% of the time
  3. 95% of all PRs are merged without failed tests
  4. 95% of test runs take no more than 1 hour to execute.
  5. 95% of test runs spend no more than 5 minutes in the queue before they get assigned to a runner.
  6. 95% of scheduled tests run through to completion (all tests ran and status got reported to PR)
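A minimal Python sketch of how the six objectives could be checked against collected CI data. The record types, their field names, and the nearest-rank percentile are assumptions for illustration, not our actual metrics pipeline; percentages are compared in integer arithmetic to avoid floating-point rounding at the boundaries.

```python
from dataclasses import dataclass

# Illustrative records only: field names are hypothetical and do not
# describe any real CI database schema.
@dataclass
class MergedPR:
    attempts_to_green: int      # 1 = fully green on the first attempt
    merged_with_failures: bool  # merged although some test had failed

@dataclass
class TestRun:
    queue_minutes: float
    run_minutes: float
    completed: bool  # all tests ran and the status got reported to the PR

def at_least(hits: int, total: int, pct: int) -> bool:
    """hits/total >= pct%, using integer arithmetic."""
    return total > 0 and 100 * hits >= pct * total

def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile; assumes at least one value."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]

def check_slos(prs: list[MergedPR],
               test_results: dict[str, tuple[int, int]],
               runs: list[TestRun]) -> dict[int, bool]:
    """Map objective number -> whether it is currently met.

    test_results maps a test name to (passed runs, total runs).
    """
    n = len(prs)
    return {
        1: at_least(sum(p.attempts_to_green == 1 for p in prs), n, 75)
           and at_least(sum(p.attempts_to_green <= 2 for p in prs), n, 95),
        2: all(at_least(passed, total, 90) for passed, total in test_results.values()),
        3: at_least(sum(not p.merged_with_failures for p in prs), n, 95),
        4: p95([r.run_minutes for r in runs]) <= 60,
        5: p95([r.queue_minutes for r in runs]) <= 5,
        6: at_least(sum(r.completed for r in runs), len(runs), 95),
    }
```

In this sketch, “exceeding the error budget” corresponds to `not all(check_slos(prs, test_results, runs).values())`.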