Add note about FP64

jzarnett · Feb 4, 2024 · d8592aa
1 parent 2e2e103
Showing 4 changed files with 59 additions and 1 deletion.
diff --git a/lectures/459.bib b/lectures/459.bib
@@ -1355,4 +1355,13 @@ @misc{parler
   year = {2021},
   url = {https://www.wired.com/story/parler-hack-data-public-posts-images-video/},
   note = {Online; accessed 2023-10-14}
-}
+}
+
+@misc{fp3264,
+  author = {{JeGX}},
+  title = {AMD Radeon and NVIDIA GeForce FP32/FP64 GFLOPS Table},
+  year = {2014},
+  url = {https://www.geeks3d.com/20140305/amd-radeon-and-nvidia-geforce-fp32-fp64-gflops-table-computing/},
+  note = {Online; accessed 2024-02-04}
+}   
+  
diff --git a/lectures/L22-slides.tex b/lectures/L22-slides.tex
@@ -291,5 +291,42 @@
 
 \end{frame}
 
+\begin{frame}{Trading Accuracy for Performance?}
+
+	One more item from previous ECE 459 student Tony Tascioglu.
+
+
+	A crowd favourite in ECE 459 is trading accuracy for performance.
+
+
+NVIDIA GeForce gaming GPUs don't natively support FP64 (double).
+		\begin{itemize}
+			\item Native FP64 typically requires \$\$\$ datacentre GPUs.
+			\item FP64 used to be locked in software, now missing in HW.
+			\item Emulated using FP32 on gaming and workstation cards.
+		\end{itemize}
+
+\end{frame}
+
+\begin{frame}{Trading Accuracy for Performance?}
+
+Using 32-bit floats rather than 64-bit doubles is typically a 16, 32 or even 64x speedup depending on the GPU!
+
+Even more: a 16-bit float instead of a 32-bit float is typically another 2x faster.
+
+For many applications, double precision isn't necessary!
+
+\end{frame}
+
+\begin{frame}{Trading Accuracy for Performance?}
+
+How dramatic is the difference?
+
+	\begin{center}
+		\includegraphics[width=\textwidth]{images/gpu-fp32-fp64-table.png}
+	\end{center}
+
+\end{frame}
+
 \end{document}
 
diff --git a/lectures/L22.tex b/lectures/L22.tex
@@ -272,6 +272,18 @@ \subsection*{N-Body Host Code}
 
 The full version of the improved code is in the course repository as \texttt{nbody-cuda-grid}. But what you want to know is, did these changes work? Yes! It sped up the calculation to about 1.65 seconds (still with 100~000 points, still on the same server). Now that's a lot better! We are finally putting the parallel compute power of the GPU to good use and it results in an excellent speedup.
 
+\paragraph{Trading Accuracy for Performance?}
+Thanks to previous ECE 459 student Tony Tascioglu, who contributed this section. We've covered on numerous occasions that trading accuracy for performance is often a worthwhile endeavour. You might even say it's a crowd favourite. It's an instructor favourite, at least.
+
+Most of the gaming-oriented NVIDIA GeForce GPUs don't natively support FP64 (double-precision floating point numbers). Native support typically requires expensive datacentre GPUs; FP64 used to be locked in software, and on more modern cards it is missing from the hardware. Instead of running in hardware, the 64-bit operations are emulated using FP32, and that is significantly slower. How much slower? Using 32-bit floats rather than 64-bit doubles is typically a 16, 32 or even 64x speedup depending on the GPU! We can push that even further, because using a 16-bit float instead of a 32-bit float is typically another 2x faster. For many applications (gaming?), double precision isn't necessary.
+
+How dramatic is the difference? See this table from~\cite{fp3264}; although it is dated 2014, it has clearly been updated since, because the GeForce RTX 3080 did not come out until September of 2020:
+
+	\begin{center}
+		\includegraphics[width=\textwidth]{images/gpu-fp32-fp64-table.png}
+	\end{center}
+
+
 \input{bibliography.tex}
 
 \end{document}
diff --git a/lectures/images/gpu-fp32-fp64-table.png b/lectures/images/gpu-fp32-fp64-table.png