Prepare L21 flipped note
h365chen committed Feb 26, 2024
1 parent 4798d87 commit 327b1f7
Showing 1 changed file with 15 additions and 6 deletions.
21 changes: 15 additions & 6 deletions lectures/flipped/L21.md
@@ -25,13 +25,19 @@ taking a taxi or rideshare to and from the airport.

## Data parallelism (10 minutes)

* The idea is that we evaluate a function, called a *kernel*, at each point in
a set of points. The set of points is called the *index space*, and each point
corresponds to a *work item*.

* The unit of execution in a GPU is called a *warp*, which executes SIMT (Single
Instruction Multiple Thread) instructions. There can be multiple units of
execution in a given GPU.

* Each data element is a work item. CUDA spawns a thread for each work item,
with a unique thread ID; threads are grouped into blocks (see [Grid of Thread
Blocks](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#thread-hierarchy-grid-of-thread-blocks)).
Threads within a block can share memory locally, but each block has to be able
to execute independently.

* Discuss the memory types (the [Memory
Hierarchy](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#memory-hierarchy)
section of the CUDA C++ Programming Guide may be useful)
@@ -42,17 +48,20 @@ taking a taxi or rideshare to and from the airport.
  * constant memory (read-only, optimized for different memory usages)

* Branches. In practice, the hardware will execute every branch path that any
thread in a warp takes (due to SIMT), masking off the threads that did not take
that path. [This Stack Overflow
discussion](https://stackoverflow.com/questions/17223640/is-branch-divergence-really-so-bad)
may help.

* Atomic functions. These are needed when, for example, two work items update
the same memory location.
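The note on branches can be made concrete with two small kernels. This is only an illustrative sketch: the names `divergent`, `warp_uniform`, and `run_branch_demo` are hypothetical, the block size of 256 is an assumption (chosen as a multiple of the warp size), and error checking is omitted. In the first kernel, neighbouring threads in the same warp take different paths; in the second, the condition is the same for every thread of a warp.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Even and odd threads sit side by side in every warp, so each warp may
// execute both paths, with the non-participating threads masked off.
__global__ void divergent(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0) {
        out[i] = 2 * i;
    } else {
        out[i] = i + 1;
    }
}

// The condition depends only on the warp number (assuming the block size is
// a multiple of warpSize), so each warp executes exactly one of the paths.
__global__ void warp_uniform(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if ((i / warpSize) % 2 == 0) {
        out[i] = 2 * i;
    } else {
        out[i] = i + 1;
    }
}

// Hypothetical host helper: runs one of the two kernels on n work items.
std::vector<int> run_branch_demo(int n, bool uniform) {
    if (n <= 0) return {};
    std::vector<int> out(n);
    int* d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));

    int threads = 256;                          // assumed block size
    int blocks = (n + threads - 1) / threads;   // round up to whole blocks
    if (uniform) {
        warp_uniform<<<blocks, threads>>>(d_out, n);
    } else {
        divergent<<<blocks, threads>>>(d_out, n);
    }

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return out;
}
```

For branches this short, the compiler may emit predicated instructions instead of a real jump, so the cost of divergence here is likely small; the effect matters more when each path does substantial work.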
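The points about block-local shared memory and atomic functions can be sketched together as a sum over an array. The names `block_sum`, `run_block_sum`, and `kThreads` are hypothetical, the block size of 256 is an assumption, and error checking is omitted. Each block first adds up its own elements in `__shared__` memory, then one thread per block adds that partial result to a single global total with `atomicAdd`.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 256;  // assumed block size; must be a power of two

__global__ void block_sum(const int* in, int n, int* total) {
    // Shared memory: visible only to the threads of this block.
    __shared__ int partial[kThreads];

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    partial[tid] = (i < n) ? in[i] : 0;
    __syncthreads();

    // Tree reduction within the block: halve the active range each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        __syncthreads();
    }

    // Blocks execute independently and in no fixed order, yet they all
    // update *total. atomicAdd makes each read-modify-write indivisible.
    if (tid == 0) {
        atomicAdd(total, partial[0]);
    }
}

// Hypothetical host helper: returns the sum of all elements of `in`.
int run_block_sum(const std::vector<int>& in) {
    int n = static_cast<int>(in.size());
    if (n == 0) return 0;

    int* d_in = nullptr;
    int* d_total = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_total, sizeof(int));
    cudaMemcpy(d_in, in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_total, 0, sizeof(int));

    int blocks = (n + kThreads - 1) / kThreads;
    block_sum<<<blocks, kThreads>>>(d_in, n, d_total);

    int total = 0;
    cudaMemcpy(&total, d_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_total);
    return total;
}
```

Replacing the `atomicAdd` with a plain `*total += partial[0]` would let two blocks read the same old value and lose one of the updates, which is the situation the bullet on atomic functions describes.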

## Live coding: kernels (10 minutes)

* Show the simple sum kernel and run it (see `live-coding/L21/`)
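For readers without the repository at hand, a simple sum kernel might look like the sketch below. This is not the file in `live-coding/L21/`: the names `vector_add` and `run_vector_add`, the block size of 256, and the use of the CUDA runtime API from C++ are all assumptions made for illustration, and error checking is omitted. Each thread handles one work item, and its position in the index space is computed from its block and thread IDs.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: one thread per work item, c[i] = a[i] + b[i].
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    // Position of this thread in the 1D index space.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // The grid is rounded up to whole blocks, so some threads have no item.
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host helper. Assumes a and b have the same length.
std::vector<float> run_vector_add(const std::vector<float>& a,
                                  const std::vector<float>& b) {
    int n = static_cast<int>(a.size());
    std::vector<float> c(n);
    if (n == 0) return c;

    size_t bytes = n * sizeof(float);
    float* d_a = nullptr;
    float* d_b = nullptr;
    float* d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;                          // assumed block size
    int blocks = (n + threads - 1) / threads;   // round up to whole blocks
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}
```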

## Activity

TODO: find a better example

Project the N-body problem code from L22. Each student is a work item. Assign a
grid number to each student on the 2D grid induced by the classroom layout. So,
