Commit 0c43714
test types, minor fixes
Signed-off-by: Galo Navarro <[email protected]>
srvaroa committed Feb 6, 2024
1 parent bc9c072 commit 0c43714
Showing 4 changed files with 266 additions and 14 deletions.
28 changes: 14 additions & 14 deletions _posts/2020-02-12-eureka-kubernetes.md
@@ -78,8 +78,8 @@ accounts following a fairly standard architecture. Here is a (very)
 simplified view of one account:
 
 <div class="image-box">
-<a href="{{site.baseurl}}/assets/eureka-to-k8s/eureka_architecture.png" target="_blank">
-<img src="{{site.baseurl}}/assets/eureka-to-k8s/eureka_architecture.png" width="100%" />
+<a href="{{site.baseUrl}}/assets/eureka-to-k8s/eureka_architecture.png" target="_blank">
+<img src="{{site.baseUrl}}/assets/eureka-to-k8s/eureka_architecture.png" width="100%" />
 </a>
 <span class="image_foot">Eureka architecture (<a href="https://github.com/Netflix/eureka/wiki/Eureka-at-a-glance">source</a>)</span>
 </div>
@@ -154,8 +154,8 @@ that defines how to reach those `Pod`s. This is how Kubernetes wires
 things up:
 
 <div class="image-box">
-<a href="{{site.baseurl}}/assets/eureka-to-k8s/services_in_k8s.png" target="_blank">
-<img src="{{site.baseurl}}/assets/eureka-to-k8s/services_in_k8s.png" width="80%" />
+<a href="{{site.baseUrl}}/assets/eureka-to-k8s/services_in_k8s.png" target="_blank">
+<img src="{{site.baseUrl}}/assets/eureka-to-k8s/services_in_k8s.png" width="80%" />
 </a>
 <span class="image_foot">How we deploy a microservice in Kubernetes</span>
 </div>
@@ -178,8 +178,8 @@ nodes. Not to any of the resources in the **overlay network** like
 Let's see the diagram updated with a VPC peering between both AWS accounts.
 
 <div class="image-box">
-<a href="{{site.baseurl}}/assets/eureka-to-k8s/legacy_and_new.png" target="_blank">
-<img src="{{site.baseurl}}/assets/eureka-to-k8s/legacy_and_new.png" width="100%" />
+<a href="{{site.baseUrl}}/assets/eureka-to-k8s/legacy_and_new.png" target="_blank">
+<img src="{{site.baseUrl}}/assets/eureka-to-k8s/legacy_and_new.png" width="100%" />
 </a>
 <span class="image_foot">Our starting point: legacy and new infrastructures</span>
 </div>
@@ -262,8 +262,8 @@ a `Service`. Here is the diagram updated after adding two `Ingress`
 objects for `Alice` and `Carol`.
 
 <div class="image-box">
-<a href="{{site.baseurl}}/assets/eureka-to-k8s/with_ingress.png" width="100%">
-<img src="{{site.baseurl}}/assets/eureka-to-k8s/with_ingress.png" width="100%" />
+<a href="{{site.baseUrl}}/assets/eureka-to-k8s/with_ingress.png" width="100%">
+<img src="{{site.baseUrl}}/assets/eureka-to-k8s/with_ingress.png" width="100%" />
 </a>
 <span class="image_foot">Adding the Ingress</span>
 </div>
@@ -330,8 +330,8 @@ each microservice to entries in Eureka. This gives us some benefits:
 seamlessly.
 
 <div class="image-box">
-<a href="{{site.baseurl}}/assets/eureka-to-k8s/full_picture.png" width="100%">
-<img src="{{site.baseurl}}/assets/eureka-to-k8s/full_picture.png" width="100%" />
+<a href="{{site.baseUrl}}/assets/eureka-to-k8s/full_picture.png" width="100%">
+<img src="{{site.baseUrl}}/assets/eureka-to-k8s/full_picture.png" width="100%" />
 </a>
 <span class="image_foot">The full picture</span>
 </div>
@@ -411,8 +411,8 @@ Eurek8s will continue to enable the full migration of all workloads. By
 the time we're done, both environments will look like this:
 
 <div class="image-box">
-<a href="{{site.baseurl}}/assets/eureka-to-k8s/all_migrated.png">
-<img src="{{site.baseurl}}/assets/eureka-to-k8s/all_migrated.png" width="100%" />
+<a href="{{site.baseUrl}}/assets/eureka-to-k8s/all_migrated.png">
+<img src="{{site.baseUrl}}/assets/eureka-to-k8s/all_migrated.png" width="100%" />
 </a>
 <span class="image_foot">All microservices are migrated to Kubernetes</span>
 </div>
@@ -463,8 +463,8 @@ Finally, I'm adding an overview of how Eurek8s is implemented. Here is
 a high level diagram:
 
 <div class="image-box">
-<a href="{{site.baseurl}}/assets/eureka-to-k8s/eurek8s.jpg">
-<img src="{{site.baseurl}}/assets/eureka-to-k8s/eurek8s.jpg" width="100%" />
+<a href="{{site.baseUrl}}/assets/eureka-to-k8s/eurek8s.jpg">
+<img src="{{site.baseUrl}}/assets/eureka-to-k8s/eurek8s.jpg" width="100%" />
 </a>
 <span class="image_foot">Eurek8s</span>
 </div>
236 changes: 236 additions & 0 deletions _posts/2024-02-10-how-about-we-forget-test-types.md
@@ -0,0 +1,236 @@
---
layout: post
title: "How about we forget test types?"
date: 2024-02-06 23:00:00 +0200
categories:
---

I have found that the concept of test types (unit, integration, and so
on) does more harm than good. People often give me odd looks when I say
this, so having the explanation at a URL will come in handy for future
reference. I will use it as an introduction to a series of practical
case studies on the topic of testing software.

<hr />

Test types shape the way engineers and even adjacent professionals
reason about testing software. This goes far beyond how an engineer may
decide to test a single change. It also influences how teams and entire
organisations analyse, design, implement, and evolve their testing
pipelines. Which is no joke, given that test writing, test
infrastructure, and test tooling account for around 30-40% of the cost
of maintaining software (ChatGPT or Google queries are consistent in
that ballpark, which roughly matches my experience). This cost is well
justified by the impact of testing on any company whose product depends
on building software.

Imagine you’re in one of those organisations. It might be new, and
growing. How do you structure the testing pipelines, the infrastructure,
the principles and “best practices” that govern work inside each
individual engineering team and across them? How do you find the right
balance between quality and velocity? Do you need a QA team? How will all
those elements behave when growing pains appear? Will they adapt and
keep up with the business, or will your engineering machinery grind to a
halt?

What happens if the organisation is already consolidated, and at that
breaking point? Suffering quality issues, slow development, gruelling
and unpredictable delivery cycles. How do you approach the task of
improving the quality of the overall product, and each of its
components? How do you identify the necessary changes in the existing
testing pipeline? How do you convince leadership to fund them? How do
you execute those changes without disrupting the business? When these
changes impact the way product teams develop, test, and distribute their
software, how do you exert effective influence on their (likely
reluctant) engineers, their tech leads, their engineering and product
managers? What type of shared testing infrastructure needs to be built
to support all those teams, and which should never be built, even if
teams ask for it? Was there a QA team already? Does its role change?
How different are the analysis and the solutions if the product is not
just a website with a bunch of server-side components, but also has a
mobile application, or native ones, or firmware? How do you know if
you’re making progress?

Having solid foundations to reason about your testing is essential to
answer any of those questions.

Test types consolidated their foundational role when Mike Cohn
introduced the Test Pyramid in “[Succeeding with
Agile](https://www.amazon.es/Succeeding-Agile-Software-Development-Using/dp/0321579364?&linkCode=ll1&tag=avr0b-21&linkId=76908430c11232694d8007c9d428887b&language=es_ES&ref_=as_li_ss_tl)”
(2010). The key concept it put in the collective mindset was that you
can classify tests into types, which are then laid out as layers. Bottom
to top these were “unit”, “service” (nowadays perhaps more commonly
known as “integration”), and “user interface”. You want to have more of
the lower ones, fewer of the upper ones.

Here is [Cohn
himself](https://www.mountaingoatsoftware.com/blog/the-forgotten-layer-of-the-test-automation-pyramid):

<blockquote>
"At the base of the test automation pyramid is unit testing. Unit
testing should be the foundation of a solid test automation strategy and
as such represents the largest part of the pyramid. Automated unit tests
are wonderful because [...]"
</blockquote>

This says something about their importance relative to other types but
nothing about how to distinguish them. I couldn’t find a clearer
definition in “Succeeding with Agile”. In my experience, when people
talk about unit tests they imply a focus on verifying a narrow surface
of a code base, although it’s unclear how narrow.
[Wikipedia](https://en.wikipedia.org/wiki/Unit_testing) says that a
“*Unit is the smallest component that can be isolated within the complex
structure of an app. It could be a function, a subroutine, a method or
property*”, but it comes along with a "`citation needed`", and if I take
that definition seriously, then Cohn’s later point that Service testing
“fill[s] the gap between unit and user interface testing” sounds like
defining water surfaces as “ponds” and “everything else”.

So let’s consult more consultants. Martin Fowler explains that when
writing unit tests at C3 they would “set up a test fixture that created
that object with all the necessary dependencies so it could execute its
methods.” Now we seem to be at the granularity of an object. But in the
same explanation he [provides a
quote](https://martinfowler.com/articles/2021-test-shapes.html) from
Kent Beck’s [Extreme
Programming](https://www.amazon.com/Extreme-Programming-Explained-Embrace-Change/dp/0321278658?crid=2Q9PJD594CD0M&keywords=extreme+programming+kent+beck&qid=1704621833&s=books&sprefix=extreme+programming+kent+be%2Cstripbooks-intl-ship%2C259&sr=1-1&linkCode=ll1&tag=avr0b-20&linkId=f6e4c6e224d48f82be27de28db318fd5&language=en_US&ref_=as_li_ss_tl),
which Fowler reads as meaning that “*’unit test’ means anything written
by the programmers as opposed to a separate testing team*”.

That scope seems panoramic compared with “*the smallest component that
can be isolated within [] an app*” that we read in Wikipedia and many
other sources. Besides, linking the type of test to who writes it is
also problematic: I know of many organisations where programmers write
most tests, even those that verify large areas of the system or the UI.
Does this mean that tests will be “unit” in one company but not in
another?

According to Martin Fowler, [there
was](https://martinfowler.com/articles/2021-test-shapes.html)
“considerable discussion” about Kent Beck’s formulation to the point
that “*one test expert vigorously lambasted Kent for his usage*”. They
asked the expert for his definition of unit testing and he replied that
“*in the first morning of my training course I cover 24 different
definitions of unit test.*”

<hr />

We can’t expect much of a conceptual framework based on test types when
the key terms, in Fowler’s own words, “*have always been rather murky,
even by the slippery standards of most software terminology*”. It
certainly explains why using them in any conversation about testing
among engineers works like the proverbial can of worms. Once you open
it, the slimy creatures cause trouble everywhere.

But it’s worse than that. If engineers can’t have a meaningful
conversation about testing, then communication with business
stakeholders about quality and software delivery is doomed to be nothing
but dysfunctional.

I have seen my share of the consequences. Organisations that take the
pyramid to heart and over-invest in the base, with a multitude of
hyper-granular unit tests for individual functions and methods. They run
into the law of diminishing returns. Issues proliferate at the seams
between pieces that were considered individually correct. Business
stakeholders are unimpressed by the high coverage number and would
rather see the product work after assembling the pieces. The “[two unit
tests, zero integration
tests](https://www.reddit.com/r/ProgrammerHumor/comments/dw8s1i/2_unit_tests_0_integration_tests/)”
memes circulate. Teams get burnt by those problems; some dismiss unit
tests as “encumbering and superficial” and conclude that [unit testing
is overrated](https://news.ycombinator.com/item?id=30942020). They
decide that they need less of that type, and more of the comprehensive
types that exercise complete use cases across wider surfaces of the
system. A good number of those teams later end up buried under the
weight of byzantine test infrastructures that silently grew slow and
unscalable, technically, organisationally, or both. In the meantime,
the business grows more and more frustrated with the slow pace of
delivery.

All the way through this mess, people try to figure out what went wrong.
They have opinionated debates around the company’s or the internet’s
water cooler that seldom reach a conclusion. Maybe the problem is in the
choice of shape? What if [the Pyramid has fallen out of
style](https://twitter.com/swyx/status/1261202288476971008)? Perhaps we
should get more creative. Let’s try the [Test
Trophy](https://thetestingarchitect.substack.com/p/test-pyramid-test-honeycomb-test).
[The Test
Honeycomb](https://engineering.atspotify.com/2018/01/testing-of-microservices/).
[The Test Diamond](https://web.dev/articles/ta-strategies). [The Test
Crab](https://web.dev/articles/ta-strategies). We were not able to
define one type properly, but why not add more? Component tests. API
tests. Fowler himself proposes [Subcutaneous
tests](https://martinfowler.com/bliki/SubcutaneousTest.html), [Broad
stack tests](https://martinfowler.com/bliki/BroadStackTest.html),
[Solitary and Sociable
tests](https://martinfowler.com/bliki/UnitTest.html). Sometimes the
distinctions depend on how much surface of the code is touched. Other
times, on how that code is structured, on what part of the stack the
code belongs to, on how the tests are written, or on who writes them.
Anything goes.

The whole thing reminds me of Borges’ [Celestial Emporium of Benevolent
Knowledge](https://en.wikipedia.org/wiki/Celestial_Emporium_of_Benevolent_Knowledge).
All are valid classifications, none authoritative, each useful only to
some. Which is fine for certain topics. But when it comes to testing
software, it seems that the importance of the subject would deserve, if
not demand, that software engineers have a solid vocabulary to hold a
rational conversation, among themselves and with stakeholders.

<div class="image-box">
<img
src="{{site.baseUrl}}/assets/test_types/wooden_shapes.png"
alt="A shape sorting toy made out of wood, with coloured pieces"/>
</div>

How about we stop being content with “slippery standards” in our
professional terminology and pay less attention to these murky terms?
What if, instead, we focus on what we want to achieve and frame the
problem as what it is: a system? After all, a test pipeline is just that:
a system whose inputs are code and related artefacts produced by
development teams, and whose function is to give as much confidence as
possible that they work as expected. We want this system to remain
efficient, performant, cost-effective, and scalable to keep up with the
needs of the business. The usual drill.

With this approach the problem stops being about shoehorning the
complexity of modern software into awkwardly shaped geometric ideals that
fit someone’s wooden box. Instead, we are designing a system with a
clear purpose. We are doing engineering.

What do we find if we look at a testing pipeline from a systems
perspective? Well, one can start with three basic, familiar properties
we always care about: latency, throughput, and error rate.

* Latency tells us how long it takes to verify a given change: to run
  one test, or the full test suite. It measures the length of our
  feedback loops.
* Throughput tells us how many of those verifications we can run per
  unit of time.
* Error rate tells us what percentage of test executions fail to do
  their job. Note that this is not the same as failed tests (a test
  that fails on a real defect is a successful execution!). Errors are
  false negatives (regressions that slip through our safety net) and
  false positives (often flakes); the sketch below makes the
  distinction concrete.
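
Here is a minimal sketch, in Python, of those three properties measured
over a batch of test executions. The names are invented for
illustration, and `defect_present` stands in for ground truth that in
practice you can only approximate (for example, from defects that later
escape to production):

```python
from dataclasses import dataclass

@dataclass
class TestRun:
    duration_s: float     # wall-clock time of one execution
    passed: bool          # the verdict the pipeline reported
    defect_present: bool  # ground truth (approximated in practice)

def latency(runs: list[TestRun]) -> float:
    """Average length of the feedback loop, in seconds."""
    return sum(r.duration_s for r in runs) / len(runs)

def throughput(runs: list[TestRun], window_s: float) -> float:
    """Verifications completed per second over a time window."""
    return len(runs) / window_s

def error_rate(runs: list[TestRun]) -> float:
    """Fraction of executions whose verdict was wrong.

    A failure on a real defect is a *successful* execution. The wrong
    verdicts are false negatives (passed despite a defect, i.e. a
    regression slipping through) and false positives (failed with no
    defect, typically a flake). Both are the cases where
    passed == defect_present.
    """
    return sum(1 for r in runs if r.passed == r.defect_present) / len(runs)

runs = [
    TestRun(2.5, passed=True,  defect_present=False),  # correct pass
    TestRun(3.1, passed=False, defect_present=True),   # caught a bug: success
    TestRun(2.8, passed=False, defect_present=False),  # flake: false positive
    TestRun(2.2, passed=True,  defect_present=True),   # escape: false negative
]
print(latency(runs), throughput(runs, window_s=60.0), error_rate(runs))
```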

These are not groundbreaking ideas! They permeate the literature about
software testing in one form or another (the case for preferring unit
tests boils down to trade-offs around latency, error rate, and
throughput). But for some reason types, categories, and shapes take the
spotlight and dominate the discussion. Bringing the focus back to the
domain of systems design, rather than abstract classification games,
helps us reason about problems around testing much more productively.

As I mentioned above, this is meant to be the introduction to a short
series of posts. In the next issues I will:

* Describe a testing pipeline involving multiple teams that suffers from
  many of the pathologies described above, based on real examples that I
  have found in companies I have worked with. I will model these
  pipelines as an actual system and show how this gives us a much better
  understanding of what we can do to improve the situation.
* Show concrete interventions that can be implemented, from the
  individual team level to the larger organisation, and use our model of
  the system to observe and measure their impact. Here I will refer to
  real work I have been involved with.

I'm curious about how people design their own testing pipelines, the
challenges they find, and the patterns they find useful, so feel free to
share them.
16 changes: 16 additions & 0 deletions _sass/_base.scss
@@ -13,6 +13,11 @@ body {
   padding: 20px;
 }
 
+::selection {
+  color: black;
+  background-color: yellow;
+}
+
 a {
   text-decoration: none;
   color: #4595DB;
@@ -90,6 +95,17 @@ a {
 
 }
 
+blockquote {
+  font-style: italic;
+}
+
+.image-box {
+  text-align: center;
+  img {
+    max-width: 80%;
+  }
+}
+
 .box {
   background-color: rgb(225, 230, 240);
   padding: 10px;
Binary file added assets/test_types/wooden_shapes.png
