TODO
This page contains a list of things that could be done to improve this project. The list is not exhaustive by any means; there are many other possible contributions.
If someone is interested in participating in one of these tasks, don't hesitate to contact me or to open a Pull Request.
Currently, ETL is mainly optimized for floating-point numbers, in both single and double precision. It is also somewhat optimized for complex numbers. However, it is not optimized for integers, which are not vectorized and cannot be handled by BLAS libraries so far.
Several things should be done to improve this state:
- Operations on integers should be vectorized
- Operations on integers should be able to be dispatched to BLAS libraries
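As an illustration of the first point, an integer kernel could process several lanes per iteration, the way the existing float kernels do with SSE/AVX. The sketch below is plain, portable C++: the fixed-width lane loop stands in for a single intrinsic such as SSE2's `_mm_add_epi32`, and all names are hypothetical rather than ETL's actual API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vectorized integer addition kernel. The inner lane loop
// mimics one SIMD register (4 x int32, as in SSE2's _mm_add_epi32);
// a real implementation would replace it with intrinsics.
void add_int32(const std::int32_t* a, const std::int32_t* b,
               std::int32_t* out, std::size_t n) {
    constexpr std::size_t lanes = 4; // one SSE2 register of int32
    std::size_t i = 0;
    for (; i + lanes <= n; i += lanes)
        for (std::size_t l = 0; l < lanes; ++l) // would be one intrinsic
            out[i + l] = a[i + l] + b[i + l];
    for (; i < n; ++i) // scalar remainder
        out[i] = a[i] + b[i];
}
```

The same lane structure applies to the other element-wise integer operations (subtraction, multiplication, comparison).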
The FFT implementation is fairly good, yet several things could be improved:
- A 6-point transform module could be added to improve performance further
- There is still room for improvement in some transform modules
- The inverse FFT implementation should not make multiple passes over the data; this means that the conjugation should be done directly inside the transform modules, based on a function parameter
- Add support for ifft_many (and 2d)
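The single-pass inverse point can be illustrated with a toy transform. Using the identity `ifft(x) = conj(fft(conj(x))) / N`, the two conjugation passes can be fused into the load and store steps of the transform itself, controlled by a function parameter. The O(n²) DFT below is only a sketch of that idea, not ETL's actual kernels.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Naive O(n^2) DFT standing in for a real transform module. When
// `conjugate` is set, the input is conjugated on load and the output is
// conjugated (and scaled) on store, so the inverse transform needs no
// separate passes over the data.
cvec dft(const cvec& in, bool conjugate = false) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    cvec out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc{0.0, 0.0};
        for (std::size_t j = 0; j < n; ++j) {
            auto x = conjugate ? std::conj(in[j]) : in[j]; // conjugate on load
            acc += x * std::polar(1.0, -2.0 * pi * double(k * j) / double(n));
        }
        out[k] = conjugate ? std::conj(acc) / double(n) : acc; // conjugate + scale on store
    }
    return out;
}

cvec ifft(const cvec& in) { return dft(in, /*conjugate=*/true); }
```

The same flag-based fusion would apply unchanged to a fast transform module, including batched (`ifft_many`) and 2D variants.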
The GPU support is quite limited for now: only GEMM, FFT, convolutions and transpositions are supported. Other routines could be made faster using the GPU. Moreover, supporting more operations on the GPU would help in having a full GPU mode with no transfers back to the CPU.
The optimization of GPU temporaries should also be pushed further.
A full GPU mode would be very interesting. At this point, the GPU result is always copied back to the CPU after the evaluation of an operation. It should only be copied back when an algorithm needs it on the CPU or for printing.
Operations that could be performed on GPU:
- Reductions (sum, max, min, ...)
- Scalar operations on matrices
- ...
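One way to sketch the "only copy back when necessary" behaviour is a pair of validity flags on each container. The class below simulates device memory with a second host vector and counts transfers so the laziness is observable; all names are hypothetical, and a real implementation would use CUDA device memory and kernels instead.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of lazy GPU -> CPU synchronization (all names hypothetical).
// Device memory is simulated by a second host vector so the idea can be
// shown without CUDA; gpu_scale stands in for a GPU kernel launch.
struct lazy_buffer {
    std::vector<float> cpu;
    std::vector<float> gpu;      // stand-in for device memory
    bool cpu_valid = true;
    bool gpu_valid = false;
    std::size_t copies_back = 0; // GPU -> CPU transfers performed

    explicit lazy_buffer(std::vector<float> data)
        : cpu(std::move(data)), gpu(cpu.size()) {}

    void gpu_scale(float f) {            // an operation evaluated on the GPU
        if (!gpu_valid) { gpu = cpu; gpu_valid = true; } // upload once
        for (auto& x : gpu) x *= f;      // would be a CUDA kernel
        cpu_valid = false;               // the CPU copy is now stale
    }

    float read(std::size_t i) {          // CPU access (printing, algorithm)
        if (!cpu_valid) {                // copy back only when needed
            cpu = gpu;
            cpu_valid = true;
            ++copies_back;
        }
        return cpu[i];
    }
};
```

With this scheme, a chain of GPU operations performs zero transfers; the single copy back happens at the first CPU access.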
When BLAS is not present, matrix multiplication is not efficient enough. It must be made much faster.
- An efficient implementation for floats and complex numbers would greatly benefit the library.
- An AVX-optimized matrix multiplication algorithm for double would be of great value
- Improving the standard multiplication algorithm for complex numbers would be a great improvement as well.
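As a starting point for the first two bullets, a cache-blocked kernel already improves substantially on the naive triple loop; ETL's real kernels would add AVX intrinsics on top. The block size below is an untuned assumption, and the function names are illustrative, not ETL's API.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Cache-blocked matrix multiplication sketch for square row-major
// matrices: C = A * B. Tiling the i/k/j loops keeps the working set of
// A, B and C in cache; a real kernel would also vectorize the j loop.
void gemm_blocked(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, std::size_t n, std::size_t block = 32) {
    std::fill(C.begin(), C.end(), 0.0f);
    for (std::size_t ii = 0; ii < n; ii += block)
        for (std::size_t kk = 0; kk < n; kk += block)
            for (std::size_t jj = 0; jj < n; jj += block)
                for (std::size_t i = ii; i < std::min(ii + block, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + block, n); ++k) {
                        const float a = A[i * n + k]; // reused across the j loop
                        for (std::size_t j = jj; j < std::min(jj + block, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

For complex numbers, the same tiling applies; the extra gain there usually comes from splitting the real and imaginary parts so the inner loop stays vectorizable.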
Currently, ETL only pads the complete size of a vector or matrix to a multiple of the vector size. If each dimension were padded as well, a lot of performance could be gained in algorithms, since everything would be aligned and many remainder loops could be removed.
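A sketch of the idea: if the inner dimension of a matrix is padded to a multiple of the vector size (with zero fill), every row starts aligned and the vectorized loop runs in whole vector steps with no scalar tail. The vector width and all names below are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t vec_size = 8; // e.g. 8 floats in one AVX register

// Round a dimension up to the next multiple of the vector size.
constexpr std::size_t pad(std::size_t n) {
    return (n + vec_size - 1) / vec_size * vec_size;
}

// Sum a rows x cols matrix stored with padded rows. The inner loop
// advances in full vector-sized steps only (no remainder loop), because
// the zero padding contributes nothing to the result.
float padded_sum(const std::vector<float>& data,
                 std::size_t rows, std::size_t cols) {
    const std::size_t stride = pad(cols);
    float total = 0.0f;
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < stride; j += vec_size)  // whole vectors only
            for (std::size_t l = 0; l < vec_size; ++l)      // would be one SIMD add
                total += data[i * stride + j + l];
    return total;
}
```

The removed remainder loops matter most for small inner dimensions, where the scalar tail can dominate the whole loop.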
- Reduce the compile-time overhead of the library expressions
- Fix the combinatorial template instantiation issue of the optimizer
- Implement some of the optimizer optimizations in the evaluator
- Improve performance of out of place matrix transposition
- Column-major optimizations
- Complete the Reference documentation with more information and make sure every supported operation is present.
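For the out-of-place transposition item above, a cache-blocked version is the usual first step: the naive loop's writes stride through memory, while tiling keeps both reads and writes cache-friendly. The block size below is an untuned assumption and the function name is illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Cache-blocked out-of-place transpose of a row-major rows x cols
// matrix into a cols x rows matrix: the matrix is walked in small
// tiles so both the input reads and the output writes stay local.
void transpose_blocked(const std::vector<float>& in, std::vector<float>& out,
                       std::size_t rows, std::size_t cols,
                       std::size_t block = 16) {
    for (std::size_t ii = 0; ii < rows; ii += block)
        for (std::size_t jj = 0; jj < cols; jj += block)
            for (std::size_t i = ii; i < std::min(ii + block, rows); ++i)
                for (std::size_t j = jj; j < std::min(jj + block, cols); ++j)
                    out[j * rows + i] = in[i * cols + j];
}
```

A further refinement would transpose each tile through SIMD register shuffles, but the tiling alone already removes most of the cache misses.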