FoldsCUDA

FoldsCUDA.jl provides Transducers.jl-compatible fold (reduce) implemented using CUDA.jl. This brings the transducers and reducing function combinators implemented in Transducers.jl to GPU. Furthermore, using FLoops.jl, you can write parallel for loops that run on GPU.

API

FoldsCUDA exports CUDAEx, a parallel loop executor. It can be used with the parallel for loop created with FLoops.@floop, Base-like high-level parallel API in Folds.jl, and extensible transducers provided by Transducers.jl.

Examples

`findmax` using FLoops.jl

You can pass CUDA executor FoldsCUDA.CUDAEx() to @floop to run a parallel for loop on GPU:

julia> using FoldsCUDA, CUDA, FLoops

julia> using GPUArrays: @allowscalar

julia> xs = CUDA.rand(10^8);

julia> @allowscalar xs[100] = 2;

julia> @allowscalar xs[200] = 2;

julia> @floop CUDAEx() for (x, i) in zip(xs, eachindex(xs))
           @reduce() do (imax = -1; i), (xmax = -Inf32; x)
               if xmax < x
                   xmax = x
                   imax = i
               end
           end
       end

julia> xmax
2.0f0

julia> imax  # the *first* position for the largest value
100

`extrema` using `Transducers.TeeRF`

julia> using Transducers, Folds

julia> @allowscalar xs[300] = -0.5;

julia> Folds.reduce(TeeRF(min, max), xs, CUDAEx())
(-0.5f0, 2.0f0)

julia> Folds.reduce(TeeRF(min, max), (2x for x in xs), CUDAEx())  # iterator comprehension works
(-1.0f0, 4.0f0)

julia> Folds.reduce(TeeRF(min, max), Map(x -> 2x)(xs), CUDAEx())  # equivalent, using a transducer
(-1.0f0, 4.0f0)

More examples

For more examples, see the examples section in the documentation.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

FoldsCUDA

API

Examples

`findmax` using FLoops.jl

`extrema` using `Transducers.TeeRF`

More examples

Files

README.md

Latest commit

History

README.md

File metadata and controls

FoldsCUDA

API

Examples

findmax using FLoops.jl

extrema using Transducers.TeeRF

More examples

`findmax` using FLoops.jl

`extrema` using `Transducers.TeeRF`