Arrays are a special data structure in Go that allow us to allocate contiguous blocks of fixed size memory. Arrays have some special features in Go related to how they are declared and viewed as types.
- If you don't understand the data, you don't understand the problem.
- If you don't understand the cost of solving the problem, you can't reason about the problem.
- If you don't understand the hardware, you can't reason about the cost of solving the problem.
- Arrays are fixed length data structures that can't change.
- Arrays of different sizes are considered to be of different types.
- Memory is allocated as a contiguous block.
- Go gives you control over spacial locality.
- Learn about the design guidelines for data oriented design.
CPU Caches and Why You Care (18:50-20:30) - Scott Meyers
CPU Caches and Why You Care (44:36-45:40) - Scott Meyers
Performance Through Cache-Friendliness (4:25-5:48) - Damian Gryski
-
CPU caches works by caching main memory on cache lines.
-
Cache lines today are either 32 or 64 bytes wide depending on the hardware.
-
Cores do not access main memory directly. They tend to only have access their local caches.
-
Both data and instructions are stored in the caches.
-
Cache lines are shuffled down L1->L2->L3 as new cache lines need to be stored in the caches.
-
Hardware likes to traverse data and instructions linearly along cache lines.
-
Main memory is built on relatively fast cheap memory. Caches are built on very fast expensive memory.
-
Access to main memory is incredibly slow, we need the cache.
- Accessing one byte from main memory will cause an entire cache line to be read and cached.
- Writes to one byte in a cache line requires the entire cache line to be written.
-
Small = Fast
- Compact, well localized code that fits in cache is fastest.
- Compact data structures that fit in cache are fastest.
- Traversals touching only cached data is the fastest.
-
Predictable access patterns matter.
- Whenever it is practical, you want to employ a linear array traversal.
- Provide regular patterns of memory access.
- Hardware can make better predictions about required memory.
-
Cache misses can result in TLB cache misses as well.
- Cache of translations of a virtual address to a physical address.
- Waiting on the OS to tell us where the memory is.
This is a diagram showing the relationship of the cache hierarchy for the 4 Core i7-9xx processor. The caches in the diagram are not to scale. This processor has four cores and each core has two hardware threads. The hardware threads per core share the Level 1 caches. The cores have individual Level 1 and Level 2 caches. All cores for all the processor share the L3 cache.
This is subject to be different in different processors. For this content, the following is the multi-levels of cache associated with the Intel 4 Core i7-9xx processor:
3GHz(3 clock cycles/ns) * 4 instructions per cycle = 12 instructions per ns!
1 ns ............. 1 ns .............. 12 instructions (one)
1 µs .......... 1000 ns .......... 12,000 instructions (thousand)
1 ms ..... 1,000,000 ns ...... 12,000,000 instructions (million)
1 s .. 1,000,000,000 ns .. 12,000,000,000 instructions (billion)
L1 - 64KB Cache (Per Core)
4 cycles of latency at 1.3 ns
Stalls for 16 instructions
L2 - 256KB Cache (Per Core)
12 cycles of latency at 4 ns
Stalls for 48 instructions
L3 - 8MB Cache
40 cycles of latency at 13.3 ns
Stalls for 160 instructions
Main Memory
100 cycle of latency at 33.3 ns
Stalled for 400 instructions
L1 cache reference ......................... 0.5 ns ................... 6 ins
Branch mispredict ............................ 5 ns ................... 60 ins
L2 cache reference ........................... 7 ns ................... 84 ins
Mutex lock/unlock ........................... 25 ns .................. 300 ins
Main memory reference ...................... 100 ns ................. 1200 ins
Compress 1K bytes with Zippy ............. 3,000 ns (3 µs) ........... 36k ins
Send 2K bytes over 1 Gbps network ....... 20,000 ns (20 µs) ........ 240k ins
SSD random read ........................ 150,000 ns (150 µs) ........ 1.8M ins
Read 1 MB sequentially from memory ..... 250,000 ns (250 µs) .......... 3M ins
Round trip within same datacenter ...... 500,000 ns (0.5 ms) .......... 6M ins
Read 1 MB sequentially from SSD* ..... 1,000,000 ns (1 ms) ........... 12M ins
Disk seek ........................... 10,000,000 ns (10 ms) ......... 120M ins
Read 1 MB sequentially from disk .... 20,000,000 ns (20 ms) ......... 240M ins
Send packet CA->Netherlands->CA .... 150,000,000 ns (150 ms) ........ 1.8B ins
CPU Caches and Why You Care - Video - Scott Meyers
A Crash Course in Modern Hardware - Video - Cliff Click
NUMA Deep Dive Series - Frank Denneman
CPU Caches and Why You Care - Deck - Scott Meyers
Mythbusting Modern Hardware to Gain 'Mechanical Sympathy' - Martin Thompson
What Every Programmer Should Know About Memory - Ulrich Drepper
How CPU Caches Work and Why - Joel Hruska
Modern Microprocessors A 90 Minute Guide - Jason Robert Carey Patterson
Memory part 2: CPU caches - Ulrich Drepper
The Free Lunch Is Over - Herb Sutter
Data Center Computers: Modern Challenges in CPU Design - Dick Sites
Wirth's Law - Wikipedia
Eliminate False Sharing - Herb Sutter
The Myth Of Ram - Emil Ernerfeldt
Understanding Transaction Hardware Memory - Gil Gene
Performance Through Cache-Friendliness (4:25-5:48) - Damian Gryski
Going Nowhere Faster - Chandler Carruth
Data-Oriented Design and C++ - Mike Acton
Efficiency with Algorithms, Performance with Data Structures - Chandler Carruth
Taming the performance Beast - Klaus Iglberger
Pitfalls of OOP - Tony Albrecht
Why you should avoid Linked Lists - Bjarne Stroustrup
Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP) - Noel
Was object-oriented programming a failure? - Quora
Declare, initialize and iterate (Go Playground)
Different type arrays (Go Playground)
Contiguous memory allocations (Go Playground)
Range mechanics (Go Playground)
Declare an array of 5 strings with each element initialized to its zero value. Declare a second array of 5 strings and initialize this array with literal string values. Assign the second array to the first and display the results of the first array. Display the string value and address of each element.
Template (Go Playground) | Answer (Go Playground)
All material is licensed under the Apache License Version 2.0, January 2004.