-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathgoogle.tex
69 lines (61 loc) · 3.05 KB
/
google.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
\subsection{Case Study on Two Production Server Applications}
Using basic blocks extracted from two production applications in Google---Spanner~\cite{spanner} and Dremel~\cite{dremel},
we additionally validated the performance models discussed in this paper
(except OSACA, due to licensing issues).
Spanner is a highly available, globally distributed database,
designed to scale up to millions of machines and trillions of database rows \cite{spanner}.
Dremel is a scalable,
interactive ad-hoc query system for analysis of read-only data,
capable of running aggregation queries over trillion-row tables in seconds \cite{dremel}.
We show the composition of basic blocks these two applications in Figure~\ref{fig:google-blocks}.
Spanner and Dremel spend almost half of the time executing load instructions---40\% and 50\%, respectively.
Compared to similar applications from our
data set,
they spend significantly more
time---12\% and 9\%, respectively---executing vectorized basic blocks.
The closest applications in our dataset are
SQLite and Redis, and in contrast, they spend 5.0\% and 0.2\%
of the time executing (partially) vectorized basic blocks.
% This is evidence of Google's effort to optimize
% these applications.
\begin{figure}[htbp!]
\includegraphics[width=0.95\columnwidth]{figures/google-blocks.pdf}
\caption{Basic Block composition of Spanner. Each basic block is weighted by its
execution frequency.}
\label{fig:google-blocks}
\end{figure}
On a Haswell machine, we use our profilers
to measure the throughput of the 100,000 most frequently executed basic blocks
from the two applications. Table~\ref{tab:google-numbers} shows the accuracy
of IACA, llvm-mca, and Ithemal predicting these basic blocks.
Given the dominant basic block categories of the
two applications,
their predictions are consistent with our previous analysis,
with one exception:
llvm-mca, which has an average prediction error of 25\% on Haswell, predicts noticeably better
when processing Spanner's and Dremel's basic blocks.
\begin{table}[h]
\begin{tabular}
{|p{0.15\columnwidth}|p{0.15\columnwidth}|p{0.15\columnwidth}|p{0.15\columnwidth}|p{0.15\columnwidth}|}
\hline
\textbf{Application} & \textbf{Model} &
\textbf{Average Error} & \textbf{Weighted Error} & \textbf{Kendall's Tau Coefficient} \\
\hline
Spanner & IACA & 0.1892 & 0.1659 & 0.7786\\
& llvm-mca & 0.1764 & 0.1519 & 0.7623\\
& Ithemal & \textbf{0.1629} & \textbf{0.1414} & \textbf{0.7799}\\
\hline
Dremel & IACA & 0.1883 & 0.1846 & 0.7835\\
& llvm-mca & 0.1777 & \textbf{0.1831} & 0.7685\\
& Ithemal & \textbf{0.1640} & 0.1871 & \textbf{0.7862}\\
\hline
\end{tabular}
\\
\caption{Accuracy of IACA, llvm-mca, and Ithemal predicting basic blocks from Spanner and Dremel.
Bolded entries are the best among the three evaluated models.
Kendall's tau coefficient~\cite{kendalltau}
measures the fraction of pairwise throughput ordering preserved by a given model;
so the higher the better.
The ``Weighted Error'' column shows prediction errors weighted by a basic block's runtime frequency.}
\label{tab:google-numbers}
\end{table}