TODO
This page contains a list of things that could be done to improve this project. The list is not exhaustive by any means; there are many other possible contributions.
If someone is interested in participating in one of these tasks, don't hesitate to contact me or to open a Pull Request.
Currently, ETL is mainly optimized for floating-point numbers, in both single and double precision. It is also somewhat optimized for complex numbers. However, it is not optimized for integers, which are not vectorized and cannot be handled by BLAS libraries so far.
Several things should be done to improve this state:
- Operations on integers should be vectorized
- Operations on integers should be able to be dispatched to BLAS libraries
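As an illustration of the first point, an integer kernel could process several lanes per iteration, the way the existing float kernels do with SSE/AVX. The sketch below is plain, portable C++: the fixed-width lane loop stands in for a single intrinsic such as SSE2's `_mm_add_epi32`, and all names are hypothetical rather than ETL's actual API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vectorized integer addition kernel. The inner lane loop
// mimics one SIMD register (4 x int32, as in SSE2's _mm_add_epi32);
// a real implementation would replace it with intrinsics.
void add_int32(const std::int32_t* a, const std::int32_t* b,
               std::int32_t* out, std::size_t n) {
    constexpr std::size_t lanes = 4; // one SSE2 register of int32
    std::size_t i = 0;
    for (; i + lanes <= n; i += lanes)
        for (std::size_t l = 0; l < lanes; ++l) // would be one intrinsic
            out[i + l] = a[i + l] + b[i + l];
    for (; i < n; ++i) // scalar remainder
        out[i] = a[i] + b[i];
}
```

The same lane structure applies to the other element-wise integer operations (subtraction, multiplication, comparison).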
The FFT implementation is fairly good, yet several things could be improved:
- A 6-point transform module could be added to improve performance further
- There is still room for improvement in some transform modules
- The inverse FFT implementation should not make multiple passes over the data; this means that the conjugation should be done directly inside the transform modules, based on a function parameter
- Add support for ifft_many (and 2d)
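The single-pass inverse point can be illustrated with a toy transform. Using the identity `ifft(x) = conj(fft(conj(x))) / N`, the two conjugation passes can be fused into the load and store steps of the transform itself, controlled by a function parameter. The O(n²) DFT below is only a sketch of that idea, not ETL's actual kernels.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Naive O(n^2) DFT standing in for a real transform module. When
// `conjugate` is set, the input is conjugated on load and the output is
// conjugated (and scaled) on store, so the inverse transform needs no
// separate passes over the data.
cvec dft(const cvec& in, bool conjugate = false) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    cvec out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc{0.0, 0.0};
        for (std::size_t j = 0; j < n; ++j) {
            auto x = conjugate ? std::conj(in[j]) : in[j]; // conjugate on load
            acc += x * std::polar(1.0, -2.0 * pi * double(k * j) / double(n));
        }
        out[k] = conjugate ? std::conj(acc) / double(n) : acc; // conjugate + scale on store
    }
    return out;
}

cvec ifft(const cvec& in) { return dft(in, /*conjugate=*/true); }
```

The same flag-based fusion would apply unchanged to a fast transform module, including batched (`ifft_many`) and 2D variants.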
The GPU support is quite limited for now: only GEMM, FFT, convolutions and transpositions are supported. Other routines could be made faster using the GPU. Moreover, supporting more operations on the GPU would help in having a full GPU mode with no transfers back to the CPU.
The optimization of GPU temporaries should also be pushed further.
A full GPU mode would be very interesting. At this point, the GPU result is always copied back to the CPU after the evaluation of an operation. It should only be copied back when an algorithm needs it on the CPU or for printing.
Operations that could be performed on GPU:
- Reductions (sum, max, min, ...)
- Scalar operations on matrices
- ...
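One way to sketch the "only copy back when necessary" behaviour is a pair of validity flags on each container. The class below simulates device memory with a second host vector and counts transfers so the laziness is observable; all names are hypothetical, and a real implementation would use CUDA device memory and kernels instead.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of lazy GPU -> CPU synchronization (all names hypothetical).
// Device memory is simulated by a second host vector so the idea can be
// shown without CUDA; gpu_scale stands in for a GPU kernel launch.
struct lazy_buffer {
    std::vector<float> cpu;
    std::vector<float> gpu;      // stand-in for device memory
    bool cpu_valid = true;
    bool gpu_valid = false;
    std::size_t copies_back = 0; // GPU -> CPU transfers performed

    explicit lazy_buffer(std::vector<float> data)
        : cpu(std::move(data)), gpu(cpu.size()) {}

    void gpu_scale(float f) {            // an operation evaluated on the GPU
        if (!gpu_valid) { gpu = cpu; gpu_valid = true; } // upload once
        for (auto& x : gpu) x *= f;      // would be a CUDA kernel
        cpu_valid = false;               // the CPU copy is now stale
    }

    float read(std::size_t i) {          // CPU access (printing, algorithm)
        if (!cpu_valid) {                // copy back only when needed
            cpu = gpu;
            cpu_valid = true;
            ++copies_back;
        }
        return cpu[i];
    }
};
```

With this scheme, a chain of GPU operations performs zero transfers; the single copy back happens at the first CPU access.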
When BLAS is not present, matrix multiplication is not efficient enough. It must be made much faster.
- An efficient implementation for floats and complex numbers would greatly benefit the library.
- An AVX-optimized matrix multiplication algorithm for double would be of great value
- Improving the standard multiplication algorithm for complex numbers would be a great improvement as well.
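As a starting point for the first two bullets, a cache-blocked kernel already improves substantially on the naive triple loop; ETL's real kernels would add AVX intrinsics on top. The block size below is an untuned assumption, and the function names are illustrative, not ETL's API.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Cache-blocked matrix multiplication sketch for square row-major
// matrices: C = A * B. Tiling the i/k/j loops keeps the working set of
// A, B and C in cache; a real kernel would also vectorize the j loop.
void gemm_blocked(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, std::size_t n, std::size_t block = 32) {
    std::fill(C.begin(), C.end(), 0.0f);
    for (std::size_t ii = 0; ii < n; ii += block)
        for (std::size_t kk = 0; kk < n; kk += block)
            for (std::size_t jj = 0; jj < n; jj += block)
                for (std::size_t i = ii; i < std::min(ii + block, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + block, n); ++k) {
                        const float a = A[i * n + k]; // reused across the j loop
                        for (std::size_t j = jj; j < std::min(jj + block, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

For complex numbers, the same tiling applies; the extra gain there usually comes from splitting the real and imaginary parts so the inner loop stays vectorizable.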
Currently, ETL only pads the complete size of a vector or matrix to a multiple of the vector size. If each dimension were padded as well, a lot of performance could be gained in algorithms, since everything would be aligned and many remainder loops could be removed.
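A sketch of the idea: if the inner dimension of a matrix is padded to a multiple of the vector size (with zero fill), every row starts aligned and the vectorized loop runs in whole vector steps with no scalar tail. The vector width and all names below are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t vec_size = 8; // e.g. 8 floats in one AVX register

// Round a dimension up to the next multiple of the vector size.
constexpr std::size_t pad(std::size_t n) {
    return (n + vec_size - 1) / vec_size * vec_size;
}

// Sum a rows x cols matrix stored with padded rows. The inner loop
// advances in full vector-sized steps only (no remainder loop), because
// the zero padding contributes nothing to the result.
float padded_sum(const std::vector<float>& data,
                 std::size_t rows, std::size_t cols) {
    const std::size_t stride = pad(cols);
    float total = 0.0f;
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < stride; j += vec_size)  // whole vectors only
            for (std::size_t l = 0; l < vec_size; ++l)      // would be one SIMD add
                total += data[i * stride + j + l];
    return total;
}
```

The removed remainder loops matter most for small inner dimensions, where the scalar tail can dominate the whole loop.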
- Reduce the compile-time overhead of the library expressions
- Fix the combinatorial template instantiation issue of the optimizer
- Implement some of the optimizer optimizations in the evaluator
- Improve performance of out of place matrix transposition
- Column-major optimizations
- Complete the Reference documentation with more information and make sure every supported operation is present.
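For the out-of-place transposition item above, a cache-blocked version is the usual first step: the naive loop's writes stride through memory, while tiling keeps both reads and writes cache-friendly. The block size below is an untuned assumption and the function name is illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Cache-blocked out-of-place transpose of a row-major rows x cols
// matrix into a cols x rows matrix: the matrix is walked in small
// tiles so both the input reads and the output writes stay local.
void transpose_blocked(const std::vector<float>& in, std::vector<float>& out,
                       std::size_t rows, std::size_t cols,
                       std::size_t block = 16) {
    for (std::size_t ii = 0; ii < rows; ii += block)
        for (std::size_t jj = 0; jj < cols; jj += block)
            for (std::size_t i = ii; i < std::min(ii + block, rows); ++i)
                for (std::size_t j = jj; j < std::min(jj + block, cols); ++j)
                    out[j * rows + i] = in[i * cols + j];
}
```

A further refinement would transpose each tile through SIMD register shuffles, but the tiling alone already removes most of the cache misses.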