
v9.1.0

@xiaoyeli xiaoyeli released this 11 Nov 01:53
ef8a7c1

v9.1.0 Release Note

This release includes the following updates:

  1. Improved batched interface for solving many independent systems at the same time.
    Internally, it uses C++ templates to support multiple data types, e.g., complex.
    Please cite this IJHPCA paper when you use the batched functions.

  2. "SolveOnly" interface: you can supply your own LU (or ILU) factors
    and still use our parallel, multi-GPU-capable sparse triangular solve routine.
    To enable it, set: options->SolveOnly = YES;
    The user still inputs the matrix A. Internally, we treat the lower triangle
    of A as the L factor and the upper triangle (including the diagonal) of A as the U factor.
    See the example program EXAMPLE/pddrive3d.c

  3. Python interface; currently only double precision is supported.
    See PYTHON/README

  4. Fixed memory leaks in the 3D multi-GPU routines in SRC/CplusplusFactor/
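
To illustrate the factor layout described in item 2, here is a minimal dense, serial sketch of a triangular solve on a single matrix that packs both factors. It is not the SuperLU_DIST routine and uses none of its API: the function name solve_packed_lu is hypothetical, the matrix is stored dense and row-major for simplicity, and L is assumed to have an implicit unit diagonal (the note assigns the stored diagonal to U, but does not state this explicitly).

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only (not part of SuperLU_DIST).
 * Solves L*U*x = b in place, where both factors are packed into one dense
 * row-major n-by-n array `a`:
 *   - entries below the diagonal are the L factor
 *     (ASSUMPTION: L has an implicit unit diagonal),
 *   - entries on and above the diagonal are the U factor.
 * On return, b holds the solution x. */
void solve_packed_lu(size_t n, const double *a, double *b)
{
    /* Forward substitution: solve L*y = b using the strictly lower triangle. */
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < i; ++j) {
            b[i] -= a[i * n + j] * b[j];
        }
    }

    /* Backward substitution: solve U*x = y using the upper triangle,
     * dividing by the stored diagonal entries. */
    for (size_t i = n; i-- > 0; ) {
        for (size_t j = i + 1; j < n; ++j) {
            b[i] -= a[i * n + j] * b[j];
        }
        b[i] /= a[i * n + i];
    }
}
```

For example, with the packed matrix {2,1,1, 2,3,2, 3,4,4} and right-hand side {7, 26, 81}, the sketch overwrites b with {1, 2, 3}. In SuperLU_DIST itself the matrix is sparse and distributed, and the solve is performed by the library once options->SolveOnly = YES is set; see EXAMPLE/pddrive3d.c for the actual calling sequence.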

What's Changed

  • Fix the sizeof and add casting to trf3d partition structs by @abagusetty in #162
  • Fix memory error when using parallel symbolic factorization (ParMETIS) by @sebastiangrimberg in #164
  • Avoid cuda device compiling step when linking against the library. by @eromero-vlc in #170

Full Changelog: v9.0.0...v9.1.0