L34 added New College
patricklam committed Sep 18, 2024
1 parent ef84b93 commit 95c901b
Showing 2 changed files with 30 additions and 21 deletions.
25 changes: 17 additions & 8 deletions lectures/L34-slides.tex
@@ -177,19 +177,19 @@
\end{center}

Its whole purpose is to manage your config as code in a
situation where you want to run your code using a cloud provider (e.g., AWS),
situation where you want to run your code using a cloud provider (e.g., AWS).

\end{frame}


\begin{frame}
\frametitle{Planning is Essential}

Terraform has a ``plan'' operation: can verify the change it's about to make.
Terraform has a \emph{plan} operation: can verify the change it's about to make.

Verify both that we aren't about to give all our money to Jeff Bezos and that a small change is actually small.

If you are happy with the change, apply it -- but things can change between plan and apply!
If you are happy with the change, \emph{apply} it---but things can change between plan and apply!

\end{frame}
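
\begin{frame}[fragile]
\frametitle{Plan, Then Apply: A Sketch}

A minimal sketch of the workflow (the plan-file name here is made up):

\begin{verbatim}
$ terraform plan -out=change.tfplan
  Plan: 1 to add, 0 to change, 0 to destroy.
$ terraform apply change.tfplan
\end{verbatim}

Saving the plan and applying that exact file narrows the window in which things can change between plan and apply.

\end{frame}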

@@ -229,9 +229,9 @@

Is this what we are best at?

Think extra carefully if you plan to do roll your own anything that is security or encryption related.
Think extra carefully if you plan to roll your own anything that is security- or encryption-related.

Remember that platforms like AWS are constantly launching new tools.
Also, remember that platforms like AWS are constantly launching new tools.

\end{frame}

@@ -274,7 +274,7 @@
\begin{frame}
\frametitle{Billing or Potato?}

Debates rage about names should be meaningful or fun.
Debates rage about whether names should be meaningful or fun.

If the service is called \texttt{billing}, that may help in determining what it does, more so than if it were called \texttt{potato}.

@@ -284,15 +284,24 @@

\end{frame}

\begin{frame}[fragile]
\frametitle{New College, founded 1379}

\begin{center}
\includegraphics[width=.8\textwidth]{images/New_College_garden_front_Oxford_England.jpg}\\
CC-BY-SA 2.0, SnapshotsofthePast.com\\
\tiny \url{https://commons.wikimedia.org/wiki/File:New_College_garden_front_Oxford_England.jpg}
\end{center}
\end{frame}

\begin{frame}
\frametitle{Descriptive Names aren't Magic}

I've seen examples where the teams are called (anonymized a bit) ``X infrastructure'' and ``X operations''.
I've seen examples where the teams are called (anonymized a bit) ``X~infrastructure'' and ``X operations''.

I'd estimate that 35\% of queries to each team result in a reply that says that the question should go to the other team.

It gets worse when a team is responsible for a common or shared component (e.g., library).
It gets worse when a team is responsible for a common or shared component (e.g., a library).

\end{frame}

26 changes: 13 additions & 13 deletions lectures/L34.tex
@@ -44,7 +44,7 @@ \section*{DevOps for P4P}
\paragraph{Continuous Integration.}
This is now a best practice -- each change (or related group of changes) is built and tested to evaluate it. Once upon a time, putting the code changes together didn't happen on every change, but nightly or otherwise. That was a weird time when builds were slow and expensive. I think we're past that now, especially given that we use version control, good tests, and scripted deployments.
It works like this:
\begin{itemize}
\begin{itemize}[noitemsep]
\item pull code from version control;
\item build;
\item run tests;
@@ -56,7 +56,7 @@ \section*{DevOps for P4P}

\subsection*{Configuration as Code}
Systems have long come with complicated configuration options.
Sendmail is particularly notorious, but apache and nginx aren't super
Sendmail is particularly notorious (though who runs their own mail server anymore?), but Apache and nginx aren't super
easy to configure either. But the environment that you're running your code in is also a kind of configuration. Furthermore, it's an excellent idea to have tools for configuration. It's not enough to just have a wiki page or GitHub document titled ``How to Install AwesomeApp'' (fill in the name of the program here). Complicated procedures invite mistakes and people will forget steps. Don't let them make mistakes: make it automatic. The first principle is to treat \emph{configuration as code}.
Therefore:
\begin{itemize}
@@ -70,7 +70,7 @@ \subsection*{Configuration as Code}
go down after a while, for instance.
\item aim for a suite of modular services that integrate together smoothly.
\item refactor configuration files (Puppet manifests, Chef recipes, etc.);
\item use continuous builds
\item use continuous builds.
\end{itemize}

One particular example of applying all those principles to infrastructure
@@ -82,26 +82,26 @@ \subsection*{Configuration as Code}
Even beyond that, you can ask Terraform to manage things like who has access
to your GitHub repositories and who is in what groups (e.g., reviewers).

Terraform does support a ``plan'' operation so it can tell you what it will
do, so you can verify that, before anything is actually changed. The plan can also tell you expected changes in terms of cost, which both helps verify that we aren't about to give all our money to Jeff Bezos but also that a small change is actually small. If you are happy with the change, apply it!
Terraform does support a \emph{plan} operation that tells you what it will
do, so you can verify the change before anything is actually changed. The plan can also tell you expected changes in terms of cost, which helps verify both that we aren't about to give all our money to Jeff Bezos and that a small change is actually small. If you are happy with the change, \emph{apply} it!
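
As a concrete sketch, configuration as code here means declaring resources in Terraform's configuration language; the resource and bucket names below are hypothetical:

\begin{verbatim}
resource "aws_s3_bucket" "logs" {
  bucket = "example-app-logs"
}
\end{verbatim}

Running \texttt{terraform plan} on this reports something like ``1 to add, 0 to change, 0 to destroy''; \texttt{terraform apply} then makes exactly that change.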

The plan operation isn't perfect, as things can change between the plan and apply steps, and some things, like unique identifiers, are really only known once they are created. Non-destructive changes are generally easy to deal with; just make another PR that corrects them. Destructive changes, however...

It's easy for very bad things to happen with Terraform as well: you could accidentally tell it you want to destroy all GitHub groups and it will gladly carry it out. This has the side effect of causing some people to message you on Slack in a panic, thinking that the removal of their GitHub access is actually a sign they are being fired. They were not. But I see why they were worried, honestly. Restoring some information in destructive changes might not be as easy as just reverting the change: if you told your tool to destroy a database, reverting the change will re-create the database, but not its contents. You took backups, right?
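
One mitigation: for resources where a destroy would be catastrophic, Terraform's \texttt{lifecycle} block can refuse destructive plans. A sketch, reusing the hypothetical bucket from above:

\begin{verbatim}
resource "aws_s3_bucket" "logs" {
  bucket = "example-app-logs"

  lifecycle {
    prevent_destroy = true  # plans that would destroy this fail
  }
}
\end{verbatim}

With this set, a plan that would destroy the bucket produces an error rather than being gladly carried out.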

\subsection*{Common Infrastructure}
Using tools to manage the infrastructure is a good start, but it also matters how services use it. You should view different parts of your infrastructure as having an interface and communication is done exclusively via the interface or API. This reduces the coupling between different components, and, as we've discussed, allows you to scale the parts that need scaling.
Using tools to manage the infrastructure is a good start, but it also matters how services use it. You should view different parts of your infrastructure as having an interface. Communication is done exclusively via the interface or API. This reduces the coupling between different components, and allows you to scale the parts that need scaling.

Try to avoid not-invented-here syndrome: it is usually better to use an existing tool -- whether open-source, commercial, or provided by your cloud platform -- than to roll your own. Some examples might be:
\begin{itemize}
Try to avoid not-invented-here syndrome: it is usually better to use an existing tool---whether open-source, commercial, or provided by your cloud platform---than to roll your own. Some examples might be:
\begin{itemize}[noitemsep]
\item Storage: some sort of access layer (e.g., MongoDB or S3);
\item Naming and discovery (e.g., Consul; see the sketch after this list);
\item Monitoring (e.g., Prometheus).
\end{itemize}
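
To make the naming-and-discovery item concrete: Consul exposes service lookups over a local HTTP API. A sketch, assuming an agent on its default port and a hypothetical service named \texttt{billing}:

\begin{verbatim}
$ curl http://localhost:8500/v1/catalog/service/billing
\end{verbatim}

This returns JSON listing the nodes and ports where \texttt{billing} instances are registered.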

However, be prepared to build your own tools if needed. Sometimes what you want, or need, doesn't exist (yet). Think carefully about whether the service you need is really part of your core competence and whether creating it adds sufficient value to the business. It's fun to make your own system and all, but are you doing what you're best at?

Think extra carefully if you plan to do roll your own anything that is security or encryption related. I'm just going to say that unless you have experts on staff who know the subject really well and you're willing to pay for external audits and the like, you're more likely to end up with a terrible security breach than a terrific secure system.
Think extra carefully if you plan to roll your own anything that is security- or encryption-related. I'm just going to say that unless you have experts on staff who know the subject really well and you're willing to pay for external audits and the like, you're more likely to end up with a terrible security breach than a terrific secure system.

As a second follow-up soapbox point to that: if what you are looking for doesn't exist, there might be a reason. Maybe the reason is that you are the first to think of it, but consider the possibility that it's not that good of an idea (either due to inefficiency or just not being great in principle).

@@ -115,9 +115,9 @@ \subsection*{Naming}

Allegedly descriptive names aren't always the easiest to figure out either. I've seen examples where the teams are called (anonymized a bit) ``X infrastructure'' and ``X operations'' and I'd estimate that 35\% of queries to each team result in a reply that says that the question should go to the other team. It gets worse when a team is responsible for a common or shared component (e.g., a library).

The \textit{real} solution to this kind of problem at least in my opinion, is similar to the idea of service discovery: we need a tool that provides directory information: if I want to know about \texttt{potato} I need to be able to look it up and have it send me to the right place. Tools for this, like OpsLevel, exist (even if they do much more than this). Such tools can also give some information about service maturity -- are you using deprecated things, do you have unpatched security vulnerabilities, is there enough test coverage...?
The \textit{real} solution to this kind of problem, at least in my opinion, is similar to the idea of service discovery: we need a tool that provides directory information. If I want to know about \texttt{potato}, I need to be able to look it up and have it send me to the right place. Tools for this, like OpsLevel, exist (even if they do much more than this). Such tools can also give some information about service maturity---are you using deprecated things, do you have unpatched security vulnerabilities, is there enough test coverage...?

There are potential morale implications for insisting on boring names for teams and services. A team that has named itself after some mythological name or fictional organization can have some feeling of identity in it -- Avengers, Assemble -- and that can be valuable.
There are potential morale implications for insisting on boring names for teams and services. A team that has named itself after some mythological name or fictional organization can have some feeling of identity in it---Avengers, Assemble---and that can be valuable.

\subsection*{Servers as cattle, not pets}
By servers, I mean servers, or virtual machines, or containers. It's much better to have a reproducible process for deploying a server than to do it manually every single time. The amount of manual intervention should be minimized and ideally zero. If this is done, you can save many hours of time, reduce errors, and allow for automatic scaling (starting and stopping servers depending on demand).
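
A minimal sketch of this in Terraform (the AMI id is a placeholder):

\begin{verbatim}
resource "aws_instance" "web" {
  count         = 4  # scale up or down by changing this number
  ami           = "ami-0123456789abcdef0"  # placeholder image id
  instance_type = "t3.micro"
}
\end{verbatim}

Scaling becomes editing a number and letting the tool reconcile, rather than logging into anything by hand.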
@@ -130,7 +130,7 @@ \subsection*{Servers as cattle, not pets}

This is also called ``test in prod''. Sometimes you just don't know how code is really going to work until you try it (after, of course, using your best efforts to make sure the code is good). But real life is rarely like the test system. I've seen many operations that work beautifully in the development environment where there are 100~000 records... and time out in production where there are 10~000~000. But for canarying deployments of the second kind, the basic steps are:
\begin{itemize}
\begin{itemize}[noitemsep]
\item stage for deployment;
\item remove canary servers from service;
\item upgrade canary servers;
@@ -150,7 +150,7 @@ \subsection*{Servers as cattle, not pets}
Containerization gives many of the advantages of this separation, but without nearly so much of the guest operating system overhead (both maintenance and runtime costs). Containers are run by a container engine, so there is some abstraction of the underlying hardware, and the container is assembled from a specification that says what libraries, tools, etc. are needed. Thus, when the container is built and deployed, it is sufficiently isolated but shares (in read-only mode) where it can. So a container is a very lightweight VM, in some sense. See this diagram from~\cite{netappcontainer}:

\begin{center}
\includegraphics[width=0.55\textwidth]{images/cvm.png}
\includegraphics[width=0.5\textwidth]{images/cvm.png}
\end{center}

So the goal of your build process should be to produce a container that is ready to be deployed and managed by the other parts of your infrastructure (e.g., Kubernetes).
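
A minimal sketch of such a specification, as a Dockerfile for a hypothetical Go service (image tags and paths are illustrative):

\begin{verbatim}
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN go build -o /app ./cmd/server

FROM gcr.io/distroless/base-debian12
COPY --from=build /app /app
CMD ["/app"]
\end{verbatim}

The build stage compiles the service; the final image contains only the binary on a minimal base, and the engine shares read-only image layers between containers that use the same base.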
