SYCL port

cpviolator edited this page Aug 22, 2024 · 1 revision

Development is in the feature/sycl branch: https://github.com/lattice/quda/tree/feature/sycl

Changes from develop can be seen in the PR https://github.com/lattice/quda/pull/1168

Outstanding changes/issues

  • BlockReduce calls simplified to make it easier to implement in SYCL
  • reducer_t types added for reductions (reducer.h, transform_reduce.cuh)
  • multi-BLAS Args may not fit in max_kernel_arg_size (max_constant_size is used instead)
  • quda_target.h needs to be included from quda_internal.h
  • block size in the dslash_coarse kernel must evenly divide the number of threads
  • FAST_COMPILE_REDUCE version of block_orthogonalize.cu and restrictor.cu can't go larger than max_block_size
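
The divisibility requirement on the dslash_coarse block size can be illustrated with a small sketch. It assumes the constraint comes from SYCL's nd_range launch model, where the global size must be a multiple of the work-group size. The helper names below (largest_dividing_block, round_up_to_multiple) are hypothetical and are not part of QUDA. They show two common ways to satisfy such a constraint, not what the port actually does.

```cpp
#include <cstddef>

// Hypothetical helper (not a QUDA function): return the largest block size
// that is <= max_block and evenly divides the total thread count.
// Returns 0 if either argument is 0.
inline std::size_t largest_dividing_block(std::size_t threads, std::size_t max_block)
{
  if (threads == 0 || max_block == 0) return 0;
  std::size_t block = threads < max_block ? threads : max_block;
  // Walk downwards until the block size divides threads; always stops at 1.
  while (threads % block != 0) --block;
  return block;
}

// Hypothetical alternative: keep the block size fixed and pad the global
// size up to the next multiple of it. A kernel launched this way would need
// a bounds check so the padded work-items do nothing.
inline std::size_t round_up_to_multiple(std::size_t threads, std::size_t block)
{
  return ((threads + block - 1) / block) * block;
}
```

For example, with 96 threads and a maximum block size of 64, the first helper picks 48. With 100 threads and a fixed block size of 64, the second helper pads the launch to 128 work-items. Shrinking the block avoids idle work-items but can give small blocks when the thread count has few divisors. Padding keeps the block size but requires a guard in the kernel.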