Quick Tour
Accelerate is a multi-dimensional parallel array library for Haskell, with a number of distinct capabilities:
- Arrays are regular: dense, rectangular, and of the same element type
- Functions may be polymorphic in the dimensionality of the array
- Accelerate is a stratified language which separates array computations from scalar expressions at the type level
The Accelerate package is currently available from:
- Hackage: stable releases
- GitHub: main development branch; get it via the website or with
  git clone git://github.com/AccelerateHS/accelerate.git
Installing Accelerate on Mac OS X and various *nix-like platforms should be fairly painless. On the other hand, the installer is largely untested on Windows. However, there are no technical reasons why Accelerate will not work, so if you have a Windows machine (as I do not) and are willing to help, please contact me.
Installing accelerate-cuda on Windows (updated 4.7.2013): see How to install on Windows.
To install Accelerate with the CUDA backend, you will first need to install the CUDA driver and developer toolkit. Accelerate is currently being developed with version 4.0 of the toolkit, but other versions may work as well. Assuming the default installation prefix of /usr/local/cuda, after installing the CUDA SDK:
- make sure your PATH includes /usr/local/cuda/bin
- make sure your LD_LIBRARY_PATH includes:
  - /usr/local/cuda/lib for 32-bit Linux distributions and Mac OS X
  - /usr/local/cuda/lib64:/usr/local/cuda/lib for 64-bit Linux distributions

Linux users may alternatively add these library paths to /etc/ld.so.conf and run ldconfig as root.
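For example, on 64-bit Linux with the default installation prefix, the exports might look like the following sketch (adjust the library directory for 32-bit Linux or Mac OS X as noted above):

```shell
# Make the CUDA compiler (nvcc) visible on the PATH.
export PATH=/usr/local/cuda/bin:$PATH

# Make the CUDA runtime libraries visible to the dynamic linker
# (64-bit Linux shown; use /usr/local/cuda/lib on 32-bit Linux or Mac OS X).
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:$LD_LIBRARY_PATH
```

Adding these lines to your shell profile makes the settings persistent across sessions.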
Next, install and test the CUDA bindings:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Fri_May_13_01:54:22_PDT_2011
Cuda compilation tools, release 4.0, V0.2.1221
$ cabal install cuda
...
$ ghci
Prelude> :m +Foreign.CUDA
Prelude Foreign.CUDA> props 0
DeviceProperties {deviceName = "GeForce GT 120", computeCapability = 1.1, ... }
Once the bindings are in place, installing Accelerate itself should proceed without incident.
$ cabal install accelerate-cuda
If you are feeling adventurous, or would like to contribute, download and install the latest source release instead.
When using Accelerate, you need to import both the base library as well as a backend implementation. Writing functions in Accelerate corresponds to building expressions that reflect a computation that will yield a result once executed by a specific backend. That is, when you program using Accelerate you are writing a Haskell program that generates a CUDA program, which is compiled using NVIDIA's compiler, loaded onto the GPU, and executed. However, in many respects it looks just like a Haskell program.
import qualified Data.Array.Accelerate as Acc
import qualified Data.Array.Accelerate.CUDA as CUDA
The library needs to be imported qualified as it shares the same names as list operations in the Prelude. Note that operations that involve writing new index types for Accelerate arrays will require the TypeOperators language extension.
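As a sketch, a minimal module header enabling that extension might look like the following (the Matrix synonym is purely illustrative):

```haskell
{-# LANGUAGE TypeOperators #-}

import qualified Data.Array.Accelerate as Acc
import           Data.Array.Accelerate ( Z(..), (:.)(..), Array, DIM2 )

-- TypeOperators lets us write index types such as Z :. Int :. Int.
-- Matrix is a hypothetical synonym, not part of the library.
type Matrix e = Array DIM2 e
```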
For non-core functionality and example algorithms, a number of related packages are available.
Parallelism in Accelerate takes the form of collective array operations over types Array sh e, where sh is the shape and e is the element type of the array. Following the approach taken by the repa array library, shapes and indices of arrays are represented using an inductive notation of tuples as heterogeneous snoc lists, to enable rank-polymorphic array functions.
Shape types are built somewhat like lists. As shown in the following encoding, on both the type and value level we use the constructor Z
to represent a shape of rank zero, and the infix operator (:.)
to increase the rank by adding a new dimension to the right of the shape.
data Z = Z -- rank-0
data tail :. head = tail :. head -- increase rank by 1
Thus, a rank-3 index with components x, y and z is written as (Z :. x :. y :. z) and has type (Z :. Int :. Int :. Int). We also define type synonyms for common shape and array types:
type DIM0 = Z
type DIM1 = DIM0 :. Int
type DIM2 = DIM1 :. Int
type DIM3 = DIM2 :. Int
-- and so on...
type Scalar e = Array DIM0 e
type Vector e = Array DIM1 e
Thus Array DIM2 Double is the type of a two-dimensional array of double-precision floating-point values, while Array Z Float is a zero-dimensional object ---i.e. a point--- holding a single floating-point value.
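To make these types concrete, here is a small sketch of constructing such arrays on the host with fromList (the names mat and point are just for illustration):

```haskell
import Data.Array.Accelerate ( Z(..), (:.)(..), Array, DIM2, Scalar, fromList )

-- A 2x3 matrix of Doubles; elements are supplied in row-major order.
mat :: Array DIM2 Double
mat = fromList (Z :. 2 :. 3) [1..6]

-- A zero-dimensional array (a point) holding a single Float.
point :: Scalar Float
point = fromList Z [3.14]
```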
Many operations over arrays are polymorphic in the shape or dimension component, while others operate on the array shape or element indices, rather than the values themselves.
Accelerate is an embedded language that distinguishes between vanilla Haskell arrays and arrays within the embedded language, identified by the type constructor Acc, as well as the operations on each flavour of array. In the CUDA backend, this differentiates vanilla arrays, which reside in CPU host memory, from arrays in the embedded language, which are allocated in GPU device memory. Embedded array computations must explicitly lift vanilla arrays into the embedded language, which in this example also entails transferring the array data to GPU memory. The following function embeds arrays into the internal language, where the type classes Shape and Elt characterise the types that may be used as array indices and elements respectively.
use :: (Shape sh, Elt e) => Array sh e -> Acc (Array sh e)
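For example, lifting a host-side vector might look like this sketch (the names xs and xs' are illustrative):

```haskell
import qualified Data.Array.Accelerate as Acc
import           Data.Array.Accelerate ( Z(..), (:.)(..) )

-- A plain Haskell-side vector, resident in CPU host memory.
xs :: Acc.Vector Float
xs = Acc.fromList (Z :. 10) [0 .. 9]

-- The same array lifted into the embedded language; with the CUDA
-- backend this is where the data is transferred to GPU device memory.
xs' :: Acc.Acc (Acc.Vector Float)
xs' = Acc.use xs
```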
Accelerate distinguishes the types of collective and scalar computations ---Acc and Exp respectively--- to achieve a stratified language. Collective operations comprise many scalar computations that are executed in parallel, but scalar computations cannot contain collective operations. This separation statically excludes nested, irregular data-parallelism; instead, Accelerate is limited to flat data-parallelism involving only regular, multi-dimensional arrays.
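The split is visible in the type of map, whose scalar function argument works on Exp values while the operation as a whole lives in Acc. A minimal sketch (the name double is illustrative):

```haskell
import qualified Data.Array.Accelerate as Acc

-- A collective operation: applies the scalar function (of type
-- Exp Float -> Exp Float) to every element in parallel. The scalar
-- function itself cannot contain further collective operations.
double :: Acc.Acc (Acc.Vector Float) -> Acc.Acc (Acc.Vector Float)
double = Acc.map (\x -> x * 2)
```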
As a simple example consider computing the dot-product of two vectors, where the two input vectors are multiplied element-wise and the resulting products summed to yield a scalar result. Using Accelerate, we can implement this as follows:
dotp :: Vector Float -> Vector Float -> Acc (Scalar Float)
dotp xs ys = let xs' = use xs
                 ys' = use ys
             in
             fold (+) 0 (zipWith (*) xs' ys')
Here, fold and zipWith come from Data.Array.Accelerate rather than the standard Prelude. Note that the result is an Accelerate computation, indicated by the type Acc, meaning that the function may be online-compiled for performance; for example, using the CUDA backend it may be off-loaded to the GPU on the fly.
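To actually execute the computation, hand the resulting Acc value to a backend's run function. With the CUDA backend this might look like the following sketch, assuming the dotp definition above is in scope:

```haskell
import qualified Data.Array.Accelerate      as Acc
import qualified Data.Array.Accelerate.CUDA as CUDA
import           Data.Array.Accelerate      ( Z(..), (:.)(..) )

main :: IO ()
main = do
  let xs = Acc.fromList (Z :. 4) [1,2,3,4] :: Acc.Vector Float
      ys = Acc.fromList (Z :. 4) [5,6,7,8] :: Acc.Vector Float
  -- CUDA.run compiles the embedded program, executes it on the GPU,
  -- and copies the scalar result back to host memory.
  print (CUDA.run (dotp xs ys))
```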