Adding notes on bitslicing

david-oswald · Apr 27, 2023 · 385c228
1 parent eef4170
commit 385c228
Showing 4 changed files with 46 additions and 4 deletions.
diff --git a/chapters/symmetric_crypto.tex b/chapters/symmetric_crypto.tex
@@ -361,14 +361,14 @@ \section{Bitslicing}
 
 \paragraph{Key Addition Layer}
 Since both the round key and the state are in the ``expanded'' form, we can simply XOR them byte-wise; each XOR between two bytes then handles only one bit of the state. The \verb+C+ pseudocode would be:
+
 \begin{lstlisting}
 for(uint8_t i = 0; i < 64; i++)
 {
-   state[i] = state[i] ^ roundkey[i];
+   state[i] = state[i] ^ roundkey_exp[i];
 }
 \end{lstlisting}
 
-
 \paragraph{Permutation Layer}
 The bitwise permutation of PRESENT now becomes a byte-wise permutation (due to the wasteful way of storing the state bits). For example, for the first few bits of the PRESENT permutation, this would result in (assuming the permutation is not done in-place):
 \lstset{language=C}
@@ -414,15 +414,37 @@ \section{Bitslicing}
 		\label{fig:symmetric_crypto:bitslicing}
 \end{figure} 
 
-Note that storing each entry of $s'$ as one byte is a choice made here for illustrative purposes. Usually, one would select the register size of a processor (i.e., 16~bit for the MSP430, 64~bit for a modern \ac{CPU}) to be able to most efficiently make use of the processor's hardware. Of course, this implies that even more cipher instances are computed in parallel (16 for the example of the MSP430, 64 for the \ac{PC} \ac{CPU}).
+Note that storing each entry of $s'$ as one byte is a choice made here for illustrative purposes. Usually, one would select the register size of the processor (e.g., 16~bit for the MSP430, 32~bit for an embedded ARM core such as in the RP2040, 64~bit for a modern \ac{CPU}) to make the most efficient use of the processor's hardware. Of course, this implies that even more cipher instances are computed in parallel (16 for the example of the MSP430, 32 for the RP2040, 64 for a \ac{PC} \ac{CPU}).
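As a minimal sketch of what using the full register width could look like, the following C code packs 32 PRESENT blocks into 64 words of 32~bit each and unpacks them again. The names \verb+bitslice_pack+ and \verb+bitslice_unpack+ are hypothetical, and the bit ordering (bit \verb+i+ of block \verb+j+ is stored in bit \verb+j+ of \verb+sliced[i]+, with bit 0 being the least significant bit) is an assumption made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define STATE_BITS 64 /* PRESENT block size in bits */
#define N_BLOCKS   32 /* one block per bit of a 32-bit register */

/* Hypothetical helper: convert N_BLOCKS normal blocks to bitsliced form.
 * Assumed layout: bit i of blocks[j] becomes bit j of sliced[i]. */
void bitslice_pack(uint32_t sliced[STATE_BITS], const uint64_t blocks[N_BLOCKS])
{
    for (size_t i = 0; i < STATE_BITS; i++) {
        uint32_t w = 0;
        for (size_t j = 0; j < N_BLOCKS; j++) {
            /* extract bit i of block j and place it at position j */
            w |= (uint32_t)((blocks[j] >> i) & 1u) << j;
        }
        sliced[i] = w;
    }
}

/* Hypothetical helper: inverse of bitslice_pack. */
void bitslice_unpack(uint64_t blocks[N_BLOCKS], const uint32_t sliced[STATE_BITS])
{
    for (size_t j = 0; j < N_BLOCKS; j++) {
        uint64_t b = 0;
        for (size_t i = 0; i < STATE_BITS; i++) {
            /* extract bit j of slice i and place it at position i */
            b |= (uint64_t)((sliced[i] >> j) & 1u) << i;
        }
        blocks[j] = b;
    }
}
```

This straightforward double loop is meant to show the data layout, not to be fast; the conversion cost it incurs is part of the overhead discussed under time and memory complexity.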
+
+\emph{Note:} One does not actually have to transform the round keys into bitsliced form explicitly. If the same key (and hence the same round keys) is used for all bitsliced blocks, each element \verb+roundkey_exp[i]+ of the bitsliced round key is either all 0 (if bit \verb+i+ of the round key is 0) or all 1 (if bit \verb+i+ of the round key is 1). 
+%
+Hence, the key addition can be rewritten as (pseudocode):
+\begin{lstlisting}
+for(uint8_t i = 0; i < 64; i++)
+{
+   if(bit i of roundkey == 1) {
+      state[i] = state[i] ^ 0xFFFF...FF;
+   }
+}
+\end{lstlisting}
+
+However, the branch makes the runtime of the implementation key-dependent, which is generally undesirable. Using the fact that, in two's complement, -1 = \verb+0xFFFF...FF+ and -0 = \verb+0x0000...00+, we can rewrite this without a branch as:
+\begin{lstlisting}
+for(uint8_t i = 0; i < 64; i++)
+{
+   // Assume: b is either 1 or 0
+   b = get bit i of roundkey;
+   state[i] = state[i] ^ -b;
+}
+\end{lstlisting}
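The branch-free key addition can be sketched in compilable C as follows, here for 32 blocks stored in 32-bit words. The function name \verb+add_round_key_bitsliced+, the representation of the round key as a single \verb+uint64_t+, and the bit numbering (bit 0 is the least significant bit) are assumptions for illustration. The mask is computed as \verb+0u - b+ in unsigned arithmetic, which wraps modulo $2^{32}$ and so avoids relying on integer promotion of a negated small type.

```c
#include <stdint.h>

#define STATE_BITS 64 /* PRESENT block size in bits */

/* Hypothetical helper: XOR a 64-bit round key into a bitsliced state
 * (one uint32_t per state bit, i.e. 32 blocks in parallel) without
 * branching on the key bits. */
void add_round_key_bitsliced(uint32_t state[STATE_BITS], uint64_t roundkey)
{
    for (unsigned i = 0; i < STATE_BITS; i++) {
        /* b is either 0 or 1 (assumed numbering: bit 0 = LSB) */
        uint32_t b = (uint32_t)((roundkey >> i) & 1u);
        /* 0u - b is 0x00000000 for b = 0 and 0xFFFFFFFF for b = 1 */
        uint32_t mask = 0u - b;
        state[i] ^= mask;
    }
}
```

Every loop iteration performs the same operations regardless of the key bit, so the source code has no key-dependent control flow; whether the compiled code preserves this still has to be checked for the target compiler and platform.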
 
 \paragraph{Time and Memory Complexity}
 For bit-oriented ciphers like PRESENT or \ac{DES}, bitslicing usually gives a significant performance increase due to the number of operations saved for the permutations. The key addition layer is similar in performance for normal and bitsliced implementations, while the \ac{SBOX} layer is usually more costly when implemented in bitsliced form. The overhead of the \ac{SBOX} layer depends on the number of Boolean operations required to implement the \ac{SBOX}; hence, finding a representation with as few Boolean operations as possible is crucial for the best-performing bitsliced implementations.
 %
 Besides, the operations to bring the input states into bitsliced form add computational complexity. Hence, it is normally desirable to stay in bitsliced form---and not to ``switch'' between the forms, e.g., to implement an \ac{SBOX} as a \ac{LUT}, converting the input to normal representation and back to bitsliced form afterwards. 
 %
 In terms of memory, bitslicing generates an overhead since multiple cipher states are stored at the same time. Moreover, the computation of the \ac{SBOX} usually requires (some) temporary registers to buffer the input bits. The code size may increase as well, especially if the \ac{SBOX} requires many Boolean operations.
-Overall, bitslicing is highly useful in scenarios where a large amount of data has to be encrypted with a bit-oriented cipher. In contrast, for byte-oriented ciphers like the \ac{AES} (with larger \ac{SBOX}), a table-based implementation will likely give better performance than bitslicing.
+Overall, bitslicing is highly useful in scenarios where a large amount of data has to be encrypted with a bit-oriented cipher. In contrast, for byte-oriented ciphers like the \ac{AES} (with larger \ac{SBOX}), a table-based implementation will likely give better performance than bitslicing, though bitslicing can offer competitive performance and other advantages such as constant execution time (see \Cref{sec:impl_attacks:sca} for why this is often needed)~\cite{Schwabe17}.
 
 \subsection{The \acl{ANF}}
 \label{chap:symmetric_crypto:anf}

diff --git a/lecture_notes.pdf b/lecture_notes.pdf
diff --git a/lecture_notes.tex b/lecture_notes.tex
@@ -57,6 +57,10 @@
 %% H
 \usepackage{here}
 \usepackage[pdftitle={David Oswald -- Hardware and Embedded Systems Security},pdfsubject={Hardware and Embedded Systems Security}, pdfauthor={David Oswald}, pdfkeywords={Keywords}, pdfdisplaydoctitle=true, linktocpage=true, bookmarksnumbered=true, colorlinks=false, linkcolor=blue, citecolor=blue, anchorcolor=blue, pdfpagelayout=TwoPageRight, bookmarksopen=true, bookmarksopenlevel=0]{hyperref}
+
+%% Cleveref must be loaded after hyperref
+\usepackage{cleveref}
+
 %% I
 \usepackage[utf8]{inputenc}                 % allow direct input of umlaute etc.
 %% J

diff --git a/literature.bib b/literature.bib
@@ -584,6 +584,22 @@ @inproceedings{canright2008very
   organization={Springer}
 }
 
+@InProceedings{Schwabe17,
+author="Schwabe, Peter
+and Stoffelen, Ko",
+editor="Avanzi, Roberto
+and Heys, Howard",
+title={{All the AES You Need on Cortex-M3 and M4}},
+booktitle="Selected Areas in Cryptography -- SAC 2016",
+year="2017",
+publisher="Springer",
+address="Cham",
+pages="180--194",
+isbn="978-3-319-69453-5"
+}
+
+
+
 @article{montgomery1987speeding,
   title={{Speeding the Pollard and elliptic curve methods of factorization}},
   author={Montgomery, Peter L},