First of all, yes, these are all based on an initial tridiagonalization (often quoted to be (4/3) n^3 flops). DSYEV is just an easier-to-use version of DSYEVX, so let's ignore it for now. The basic breakdown is as follows:
- DSYEVX: Tridiagonal implicitly shifted QR (bulge chasing)
- DSYEVD: Divide and Conquer (stitching by rank 1 modifications)
- DSYEVR: Use the Multiple Relatively Robust Representations (MRRR) algorithm (symmetric dqds)
The flop count of each of these algorithms is highly dependent on the eigenvalue distribution. For example, if your matrix has clusters of closely packed eigenvalues that are nearly degenerate, then these algorithms tend to have a reduced order of convergence. At best you have O(n^2) flops in the tridiagonal eigenproblem, and at worst O(n^3) if the sizes of your clusters grow as O(n). See this working note for a more detailed analysis.
Since you want eigenvectors, let's look at the flop count for their extraction. Implicitly shifted QR continually updates the orthogonal matrices used for reduction, so it always runs in O(n^3) when eigenvectors are desired (basically, each QR bulge sweep takes O(n^2), and you usually need O(n) sweeps).
For D&C and MRRR, the textbook descriptions say to compute eigenvalues first, then extract eigenvectors with inverse iteration. This requires O(n) work per eigenvector since usually 1-2 iterations are enough, so it would be O(n^2) work to get all the eigenvectors after the eigenvalues have been found. However, in practice, LAPACK does something more sophisticated.
I'm not familiar with D&C, but looking at the source, it seems to keep the eigenvectors updated as it goes, so its runtime is likely somewhere between O(n^2) and O(n^3). For MRRR, the failure of inverse iteration has been analyzed in this paper. In practice, the eigenvectors must be updated throughout the algorithm and reorthogonalized when clusters are detected. A few of the details of this process can be controlled in the interface to DSYEVR, and like D&C, the practical runtime is somewhere between quadratic and cubic.
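To make the "O(n) work per eigenvector" claim concrete, here is a minimal sketch of textbook inverse iteration on a symmetric tridiagonal matrix. It is not what LAPACK does internally. The model problem (the 1D Laplacian, whose smallest eigenvalue is known in closed form), the function names, and the choice of a shift slightly below that eigenvalue (so the shifted matrix stays positive definite and elimination without pivoting is safe) are all assumptions made for this illustration.

```python
import math

def solve_spd_tridiagonal(diag, off, rhs):
    """Solve T x = rhs in O(n) for a symmetric positive definite tridiagonal T."""
    n = len(diag)
    d = list(diag)  # working copy of the pivots
    x = list(rhs)
    # Forward elimination: one multiply-add per row.
    for i in range(1, n):
        m = off[i - 1] / d[i - 1]
        d[i] -= m * off[i - 1]
        x[i] -= m * x[i - 1]
    # Back substitution.
    x[-1] /= d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (x[i] - off[i] * x[i + 1]) / d[i]
    return x

def inverse_iteration(diag, off, shift, iterations=2):
    """Approximate the eigenvector whose eigenvalue is closest to `shift`."""
    shifted = [a - shift for a in diag]
    v = [1.0] * len(diag)
    for _ in range(iterations):
        v = solve_spd_tridiagonal(shifted, off, v)  # O(n) per iteration
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return v

def residual_norm(diag, off, lam, v):
    """Return ||T v - lam v||_2 for the tridiagonal T."""
    n = len(diag)
    total = 0.0
    for i in range(n):
        r = (diag[i] - lam) * v[i]
        if i > 0:
            r += off[i - 1] * v[i - 1]
        if i < n - 1:
            r += off[i] * v[i + 1]
        total += r * r
    return math.sqrt(total)

# Model problem (assumption): 1D Laplacian, diagonal 2, off-diagonal -1.
n = 50
diag = [2.0] * n
off = [-1.0] * (n - 1)
lam_min = 2.0 - 2.0 * math.cos(math.pi / (n + 1))  # known smallest eigenvalue
v = inverse_iteration(diag, off, shift=lam_min - 1e-8)
print("residual:", residual_norm(diag, off, lam_min, v))
```

Each iteration is a single tridiagonal solve, so two iterations per eigenvector give the O(n^2) total quoted above. The sketch does nothing about clustered eigenvalues, which is exactly where the plain method breaks down and extra orthogonalization work appears.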
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ time ./diag_magma_gpu 8000
nb = 100
info = 0
Time taken by dsyev for matrix size 8000 was 409.82 seconds
-51.6506244 -51.4058072 -51.3475218 -51.2443889 -51.1703626 -51.1030575 -51.0072332 -50.9789340 -50.9090230 -50.8573262
50.9106148 50.9503807 51.0075738 51.1221930 51.2002598 51.2702891 51.3218429 51.4007310 51.6350470 4000.6517809
real 0m9,112s
user 6m39,464s
sys 0m11,204s
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ time ./diag_mkl 8000
info = 0
Time taken by dsyev for matrix size 8000 was 176.24 seconds
-51.6506244 -51.4058072 -51.3475218 -51.2443889 -51.1703626 -51.1030575 -51.0072332 -50.9789340 -50.9090230 -50.8573262
50.9106148 50.9503807 51.0075738 51.1221930 51.2002598 51.2702891 51.3218429 51.4007310 51.6350470 4000.6517809
real 2m57,398s
user 2m56,736s
sys 0m0,208s
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ time ./diag_magma 8000
info = 0
Time taken by dsyev for matrix size 8000 was 189.90 seconds
-51.6506244 -51.4058072 -51.3475218 -51.2443889 -51.1703626 -51.1030575 -51.0072332 -50.9789340 -50.9090230 -50.8573262
50.9106148 50.9503807 51.0075738 51.1221930 51.2002598 51.2702891 51.3218429 51.4007310 51.6350470 4000.6517809
real 0m23,362s
user 3m9,148s
sys 0m1,496s
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ time ./diag_magma_gpu 16000
nb = 100
info = 0
Time taken by dsyev for matrix size 16000 was 1828.30 seconds
-72.8550325 -72.6979438 -72.6765172 -72.6194803 -72.5662783 -72.4850298 -72.4639951 -72.3427485 -72.3181432 -72.2927798
72.3422654 72.4097912 72.4202021 72.5078816 72.5726288 72.6068457 72.6619977 72.8739593 72.9735306 7999.9111633
real 0m35,040s
user 29m43,724s
sys 0m47,976s
>>> ( 4.0/3.0 * 16000**3 )/1.e9/(35.040)
155.8599695585997
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ time ./diag_magma_gpu 32000
nb = 100
info = 0
Time taken by dsyev for matrix size 32000 was 9012.72 seconds
-103.1265024 -103.0976923 -103.0580529 -102.9909914 -102.9341979 -102.8812811 -102.8219491 -102.7303616 -102.6743930 -102.6622158
102.6781966 102.6999579 102.7533367 102.8068101 102.8417293 102.9259807 102.9992714 103.0677603 103.2170313 15999.6382274
real 2m42,285s
user 146m26,744s
sys 4m6,176s
>>> ( 4.0/3.0 * 32000**3 )/1.e9/(162.285)
269.2218422322868
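The two one-line calculations above can be wrapped in a small helper. This is only a sketch: the function names are made up for illustration, and the 4/3 * n**3 flop count is the nominal tridiagonalization count used in these notes, not the true operation count of a full dsyev with eigenvectors. The parser accepts the comma decimal separator that `time` prints under this locale.

```python
import re

def parse_real_seconds(text):
    """Convert a bash `time` field such as '2m42,285s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)[.,](\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds, fraction = match.groups()
    return 60 * int(minutes) + float(f"{seconds}.{fraction}")

def nominal_gflops(n, wall_seconds):
    """Nominal rate, assuming 4/3 * n**3 flops for an n x n matrix."""
    return (4.0 / 3.0) * n**3 / 1e9 / wall_seconds

# Wall-clock ("real") times copied from the diag_magma_gpu runs above.
for n, real in [(16000, "0m35,040s"), (32000, "2m42,285s")]:
    seconds = parse_real_seconds(real)
    print(f"n={n}: {seconds:.3f} s, {nominal_gflops(n, seconds):.1f} GFLOPS")
```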
n = 8000 => nops = 4/3 * 8000**3 = 682666666666.6666
| method | gflops |
|---|---|
| mkl (1 core) | 3.8 |
| magma 8 cores | 29.2 |
| magma 1 p100 | 74.9 |
>>> 177.398/23.362
7.5934423422652175
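The ratio above is the speedup of the 8-core MAGMA run over the 1-core MKL run; dividing it by the number of cores gives a rough parallel efficiency. A minimal sketch follows (the function name is hypothetical, and since the two runs use different libraries the result is only indicative):

```python
def speedup_and_efficiency(serial_seconds, parallel_seconds, workers):
    """Return (speedup, efficiency), where efficiency = speedup / workers."""
    speedup = serial_seconds / parallel_seconds
    return speedup, speedup / workers

# Wall-clock times from the n = 8000 runs: diag_mkl (1 core) vs diag_magma (8 cores).
speedup, efficiency = speedup_and_efficiency(177.398, 23.362, 8)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```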
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ time ./diag_magma 16000
info = 0
Time taken by dsyev for matrix size 16000 was 2179.51 seconds
-72.8550325 -72.6979438 -72.6765172 -72.6194803 -72.5662783 -72.4850298 -72.4639951 -72.3427485 -72.3181432 -72.2927798
72.3422654 72.4097912 72.4202021 72.5078816 72.5726288 72.6068457 72.6619977 72.8739593 72.9735306 7999.9111633
real 4m40,192s
user 36m14,516s
sys 0m8,284s
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ time ./diag_magma_gpu 16000
nb = 100
info = 0
Time taken by dsyev for matrix size 16000 was 1837.43 seconds
-72.8550325 -72.6979438 -72.6765172 -72.6194803 -72.5662783 -72.4850298 -72.4639951 -72.3427485 -72.3181432 -72.2927798
72.3422654 72.4097912 72.4202021 72.5078816 72.5726288 72.6068457 72.6619977 72.8739593 72.9735306 7999.9111633
real 0m38,862s
user 29m53,972s
sys 0m46,992s
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 14071 C ./diag_magma_gpu 4301MiB |
+-----------------------------------------------------------------------------+
Mon Jun 28 11:00:55 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.152.00 Driver Version: 418.152.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 00000000:25:00.0 Off | 0 |
| N/A 36C P0 70W / 250W | 4311MiB / 16280MiB | 57% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... On | 00000000:5B:00.0 Off | 0 |
| N/A 29C P0 26W / 250W | 10MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P100-PCIE... On | 00000000:9B:00.0 Off | 0 |
| N/A 32C P0 26W / 250W | 10MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P100-PCIE... On | 00000000:C8:00.0 Off | 0 |
| N/A 31C P0 25W / 250W | 10MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ env | grep NUM_THR
MKL_NUM_THREADS=72
MAGMA_NUM_THREADS=72
OMP_NUM_THREADS=72
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ time ./diag_mkl_omp 16000
info = 0
Time taken by dsyev for matrix size 16000 was 2187.96 seconds
-72.8550325 -72.6979438 -72.6765172 -72.6194803 -72.5662783 -72.4850298 -72.4639951 -72.3427485 -72.3181432 -72.2927798
72.3422654 72.4097912 72.4202021 72.5078816 72.5726288 72.6068457 72.6619977 72.8739593 72.9735306 7999.9111633
real 4m37,511s
user 36m22,320s
sys 0m8,896s
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ export OMP_NUM_THREADS=1
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ export MAGMA_NUM_THREADS=1
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ export MKL_NUM_THREADS=1
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ time ./diag_magma_gpu 16000
nb = 100
info = 0
Time taken by dsyev for matrix size 16000 was 30.13 seconds
-72.8550325 -72.6979438 -72.6765172 -72.6194803 -72.5662783 -72.4850298 -72.4639951 -72.3427485 -72.3181432 -72.2927798
72.3422654 72.4097912 72.4202021 72.5078816 72.5726288 72.6068457 72.6619977 72.8739593 72.9735306 7999.9111633
real 0m33,676s
user 0m26,136s
sys 0m7,380s
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ time ./diag_magma_gpu 32000
nb = 100
info = 0
Time taken by dsyev for matrix size 32000 was 153.27 seconds
-103.1265024 -103.0976923 -103.0580529 -102.9909914 -102.9341979 -102.8812811 -102.8219491 -102.7303616 -102.6743930 -102.6622158
102.6781966 102.6999579 102.7533367 102.8068101 102.8417293 102.9259807 102.9992714 103.0677603 103.2170313 15999.6382274
real 2m54,601s
user 2m13,468s
sys 0m40,424s
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 30518 C ./diag_magma_gpu 16021MiB |
+-----------------------------------------------------------------------------+
Mon Jun 28 13:27:50 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.152.00 Driver Version: 418.152.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 00000000:25:00.0 Off | 0 |
| N/A 38C P0 134W / 250W | 16031MiB / 16280MiB | 73% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... On | 00000000:5B:00.0 Off | 0 |
| N/A 29C P0 26W / 250W | 10MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P100-PCIE... On | 00000000:9B:00.0 Off | 0 |
| N/A 32C P0 26W / 250W | 10MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P100-PCIE... On | 00000000:C8:00.0 Off | 0 |
| N/A 32C P0 25W / 250W | 10MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ time ./diag_magma_gpu 8000
nb = 100
info = 0
Time taken by dsyev for matrix size 8000 was 7.58 seconds
-51.6506244 -51.4058072 -51.3475218 -51.2443889 -51.1703626 -51.1030575 -51.0072332 -50.9789340 -50.9090230 -50.8573262
50.9106148 50.9503807 51.0075738 51.1221930 51.2002598 51.2702891 51.3218429 51.4007310 51.6350470 4000.6517809
real 0m8,701s
user 0m6,356s
sys 0m2,064s
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ time ./diag_magma_gpu 12000
nb = 100
info = 0
Time taken by dsyev for matrix size 12000 was 16.46 seconds
-63.0966139 -62.9934115 -62.9068927 -62.8081064 -62.7892967 -62.7294508 -62.6248293 -62.5964066 -62.5652698 -62.5251800
62.5216052 62.5726116 62.6076412 62.7159574 62.8125527 62.8662591 62.9496596 63.0661715 63.1963897 6000.2081462
real 0m18,357s
user 0m14,040s
sys 0m4,208s
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ export MAGMA_NUM_THREADS=8
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ time ./diag_mkl 32000
^Cforrtl: error (69): process interrupted (SIGINT)
Image PC Routine Line Source
diag_mkl 0000000000407B6B Unknown Unknown Unknown
libpthread-2.24.s 00007FCD9D69B0E0 Unknown Unknown Unknown
libmkl_avx512.so 00007FCBB113D58F mkl_blas_avx512_x Unknown Unknown
libmkl_core.so 00007FCD9FF6FD56 mkl_blas_xdsyr2 Unknown Unknown
libmkl_core.so 00007FCDA06F6132 mkl_lapack_dsytd2 Unknown Unknown
libmkl_core.so 00007FCDA0AD062E mkl_lapack_xdsytr Unknown Unknown
libmkl_core.so 00007FCDA06E8011 mkl_lapack_dsyev Unknown Unknown
libmkl_intel_lp64 00007FCD9F5BD7BC mkl_lapack__dsyev Unknown Unknown
diag_mkl 0000000000405497 Unknown Unknown Unknown
diag_mkl 00000000004049A2 Unknown Unknown Unknown
libc-2.24.so 00007FCD9D30B2E1 __libc_start_main Unknown Unknown
diag_mkl 000000000040486A Unknown Unknown Unknown
real 11m20,586s
user 11m15,500s
sys 0m3,396s
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ time ./diag_mkl_omp 32000
info = 0
Time taken by dsyev for matrix size 32000 was 19106.00 seconds
-103.1265024 -103.0976923 -103.0580529 -102.9909914 -102.9341979 -102.8812811 -102.8219491 -102.7303616 -102.6743930 -102.6622158
102.6781966 102.6999579 102.7533367 102.8068101 102.8417293 102.9259807 102.9992714 103.0677603 103.2170313 15999.6382274
real 40m25,284s
user 317m41,348s
sys 1m4,800s
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ for c in 1 2 3 4 5 6 7 8 10 12 16; do echo "num cores : $c";time OMP_NUM_THREADS=$c MKL_NUM_THREADS=$c ./diag_mkl_omp 6000; done
graffy@physix90:/opt/ipr/cluster/work.local/graffy/bug3177$ for c in 1 2 4 8 12 16 18 36 72; do echo "num cores : $c";time OMP_NUM_THREADS=$c MKL_NUM_THREADS=$c ./diag_mkl_omp 12000; done
graffy@graffy-ws2:~/work/tickets/bug3177$ rsync --rsh=ssh -va ./ [email protected]:/mnt/workl/graffy/bug3177/
graffy@physix12:/mnt/workl/graffy/bug3177$ module load compilers/ifort/latest
graffy@physix12:/mnt/workl/graffy/bug3177$ for c in 1 2 4 8 12 16 18 24 32; do echo "num cores : $c";time OMP_NUM_THREADS=$c MKL_NUM_THREADS=$c ./diag_mkl_omp 12000; done
date >> ./toto.log; for c in {1..32}; do echo "num cores : $c" >> ./toto.log ; OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 ./diag_mkl_omp 6000 >> ./toto.log & done; wait ; date >> ./toto.log
jeudi 19 août 2021, 20:06:56 (UTC+0200)
jeudi 19 août 2021, 20:13:39 (UTC+0200)
graffy@physix12:/mnt/workl/graffy/bug3177$ strings ./toto.log | grep taken
Time taken by dsyev for matrix size 6000 was 263.56 seconds
Time taken by dsyev for matrix size 6000 was 278.55 seconds
Time taken by dsyev for matrix size 6000 was 279.00 seconds
Time taken by dsyev for matrix size 6000 was 280.90 seconds
Time taken by dsyev for matrix size 6000 was 283.78 seconds
Time taken by dsyev for matrix size 6000 was 289.54 seconds
Time taken by dsyev for matrix size 6000 was 294.37 seconds
Time taken by dsyev for matrix size 6000 was 298.77 seconds
Time taken by dsyev for matrix size 6000 was 300.43 seconds
Time taken by dsyev for matrix size 6000 was 302.72 seconds
Time taken by dsyev for matrix size 6000 was 308.66 seconds
Time taken by dsyev for matrix size 6000 was 310.77 seconds
Time taken by dsyev for matrix size 6000 was 312.39 seconds
Time taken by dsyev for matrix size 6000 was 313.05 seconds
Time taken by dsyev for matrix size 6000 was 319.08 seconds
Time taken by dsyev for matrix size 6000 was 322.58 seconds
Time taken by dsyev for matrix size 6000 was 382.66 seconds
Time taken by dsyev for matrix size 6000 was 386.16 seconds
Time taken by dsyev for matrix size 6000 was 387.94 seconds
Time taken by dsyev for matrix size 6000 was 388.16 seconds
Time taken by dsyev for matrix size 6000 was 390.04 seconds
Time taken by dsyev for matrix size 6000 was 393.09 seconds
Time taken by dsyev for matrix size 6000 was 394.48 seconds
Time taken by dsyev for matrix size 6000 was 394.61 seconds
Time taken by dsyev for matrix size 6000 was 394.26 seconds
Time taken by dsyev for matrix size 6000 was 395.16 seconds
Time taken by dsyev for matrix size 6000 was 396.60 seconds
Time taken by dsyev for matrix size 6000 was 397.20 seconds
Time taken by dsyev for matrix size 6000 was 398.25 seconds
Time taken by dsyev for matrix size 6000 was 399.21 seconds
Time taken by dsyev for matrix size 6000 was 399.24 seconds
Time taken by dsyev for matrix size 6000 was 401.76 seconds
graffy@physix12:/mnt/workl/graffy/bug3177$ for c in 1 2 4 8 12 16 18 24 32; do echo "num cores : $c";time OMP_NUM_THREADS=$c MKL_NUM_THREADS=$c ./diag_mkl_omp 6000; done
Wow... diagonalising a 6000x6000 matrix takes only 122s on one core of physix12, but if you attempt to launch 32 of them simultaneously, they take between 263s and 401s! The only explanation I see is that they're fighting for cache. Let's try smaller matrices.
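To put numbers on this slowdown, here is a small sketch that compares the solo time with the fastest and slowest of the 32 concurrent runs. The function name and the "effective speedup" metric (time to run the jobs back to back on one core, divided by the time of the slowest concurrent job) are choices made for this illustration.

```python
def contention_summary(solo_seconds, fastest, slowest, jobs):
    """Summarise how much concurrent single-threaded runs slow each other down."""
    return {
        "slowdown_min": fastest / solo_seconds,
        "slowdown_max": slowest / solo_seconds,
        # Gain over running the same jobs one after another on a single core.
        "effective_speedup": jobs * solo_seconds / slowest,
    }

# 122 s solo; 263.56 s and 401.76 s are the extremes of the 32 concurrent runs.
summary = contention_summary(122.0, fastest=263.56, slowest=401.76, jobs=32)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

With these figures each process runs roughly 2.2 to 3.3 times slower than it does alone, so 32 cores deliver a bit under 10 times the throughput of one core rather than 32 times.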
graffy@physix12:/mnt/workl/graffy/bug3177$ date >> ./toto.log; for c in {1..16}; do echo "num cores : $c" >> ./toto.log ; OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 ./diag_mkl_omp 6000 >> ./toto.log & done; wait ; date >> ./toto.log
jeudi 19 août 2021, 20:26:12 (UTC+0200)
jeudi 19 août 2021, 20:30:32 (UTC+0200)
graffy@physix12:/mnt/workl/graffy/bug3177$ strings ./toto.log | grep taken
Time taken by dsyev for matrix size 6000 was 189.82 seconds
Time taken by dsyev for matrix size 6000 was 190.97 seconds
Time taken by dsyev for matrix size 6000 was 193.70 seconds
Time taken by dsyev for matrix size 6000 was 196.05 seconds
Time taken by dsyev for matrix size 6000 was 196.68 seconds
Time taken by dsyev for matrix size 6000 was 196.94 seconds
Time taken by dsyev for matrix size 6000 was 197.21 seconds
Time taken by dsyev for matrix size 6000 was 200.16 seconds
Time taken by dsyev for matrix size 6000 was 251.46 seconds
Time taken by dsyev for matrix size 6000 was 251.74 seconds
Time taken by dsyev for matrix size 6000 was 251.94 seconds
Time taken by dsyev for matrix size 6000 was 253.90 seconds
Time taken by dsyev for matrix size 6000 was 253.90 seconds
Time taken by dsyev for matrix size 6000 was 255.48 seconds
Time taken by dsyev for matrix size 6000 was 256.46 seconds
Time taken by dsyev for matrix size 6000 was 259.31 seconds
graffy@physix12:/mnt/workl/graffy/bug3177$ mv toto.log [email protected]
graffy@physix12:/mnt/workl/graffy/bug3177$ date > ./toto.log; for c in {1..32}; do echo "num cores : $c" >> ./toto.log ; OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 ./diag_mkl_omp 4000 >> ./toto.log & done; wait ; date >> ./toto.log
jeudi 19 août 2021, 20:34:47 (UTC+0200)
jeudi 19 août 2021, 20:37:17 (UTC+0200)
graffy@physix12:/mnt/workl/graffy/bug3177$ strings ./toto.log | grep taken
Time taken by dsyev for matrix size 4000 was 60.19 seconds
Time taken by dsyev for matrix size 4000 was 61.14 seconds
Time taken by dsyev for matrix size 4000 was 70.28 seconds
Time taken by dsyev for matrix size 4000 was 70.73 seconds
Time taken by dsyev for matrix size 4000 was 70.95 seconds
Time taken by dsyev for matrix size 4000 was 71.16 seconds
Time taken by dsyev for matrix size 4000 was 71.27 seconds
Time taken by dsyev for matrix size 4000 was 71.59 seconds
Time taken by dsyev for matrix size 4000 was 81.06 seconds
Time taken by dsyev for matrix size 4000 was 83.59 seconds
Time taken by dsyev for matrix size 4000 was 84.70 seconds
Time taken by dsyev for matrix size 4000 was 86.68 seconds
Time taken by dsyev for matrix size 4000 was 91.74 seconds
Time taken by dsyev for matrix size 4000 was 91.82 seconds
Time taken by dsyev for matrix size 4000 was 135.89 seconds
Time taken by dsyev for matrix size 4000 was 142.83 seconds
Time taken by dsyev for matrix size 4000 was 143.08 seconds
Time taken by dsyev for matrix size 4000 was 144.80 seconds
Time taken by dsyev for matrix size 4000 was 145.04 seconds
Time taken by dsyev for matrix size 4000 was 145.25 seconds
Time taken by dsyev for matrix size 4000 was 145.77 seconds
Time taken by dsyev for matrix size 4000 was 145.80 seconds
Time taken by dsyev for matrix size 4000 was 145.93 seconds
Time taken by dsyev for matrix size 4000 was 146.92 seconds
Time taken by dsyev for matrix size 4000 was 147.27 seconds
Time taken by dsyev for matrix size 4000 was 147.59 seconds
Time taken by dsyev for matrix size 4000 was 148.72 seconds
Time taken by dsyev for matrix size 4000 was 148.86 seconds
Time taken by dsyev for matrix size 4000 was 148.99 seconds
Time taken by dsyev for matrix size 4000 was 149.07 seconds
Time taken by dsyev for matrix size 4000 was 149.23 seconds
Time taken by dsyev for matrix size 4000 was 149.41 seconds
graffy@physix12:/mnt/workl/graffy/bug3177$ mv toto.log [email protected]
graffy@physix90:/mnt/workl/graffy/bug3177$ date >> ./toto.log; for c in {1..70}; do echo "num cores : $c" >> ./toto.log ; OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 ./diag_mkl_omp 8000 >> ./toto.log & done; wait ; date >> ./toto.log
jeudi 19 août 2021, 20:01:48 (UTC+0200)
jeudi 19 août 2021, 20:28:29 (UTC+0200)
graffy@physix90:/mnt/workl/graffy/bug3177$ strings size8000\@72cores.log | grep taken
Time taken by dsyev for matrix size 8000 was 1377.04 seconds
Time taken by dsyev for matrix size 8000 was 1385.54 seconds
Time taken by dsyev for matrix size 8000 was 1386.46 seconds
Time taken by dsyev for matrix size 8000 was 1386.28 seconds
Time taken by dsyev for matrix size 8000 was 1392.95 seconds
Time taken by dsyev for matrix size 8000 was 1393.57 seconds
Time taken by dsyev for matrix size 8000 was 1408.21 seconds
Time taken by dsyev for matrix size 8000 was 1413.81 seconds
Time taken by dsyev for matrix size 8000 was 1427.93 seconds
Time taken by dsyev for matrix size 8000 was 1429.53 seconds
Time taken by dsyev for matrix size 8000 was 1434.31 seconds
Time taken by dsyev for matrix size 8000 was 1434.73 seconds
Time taken by dsyev for matrix size 8000 was 1435.41 seconds
Time taken by dsyev for matrix size 8000 was 1434.86 seconds
Time taken by dsyev for matrix size 8000 was 1449.81 seconds
Time taken by dsyev for matrix size 8000 was 1449.94 seconds
Time taken by dsyev for matrix size 8000 was 1458.62 seconds
Time taken by dsyev for matrix size 8000 was 1464.88 seconds
Time taken by dsyev for matrix size 8000 was 1465.58 seconds
Time taken by dsyev for matrix size 8000 was 1474.32 seconds
Time taken by dsyev for matrix size 8000 was 1474.93 seconds
Time taken by dsyev for matrix size 8000 was 1475.24 seconds
Time taken by dsyev for matrix size 8000 was 1479.48 seconds
Time taken by dsyev for matrix size 8000 was 1480.93 seconds
Time taken by dsyev for matrix size 8000 was 1484.08 seconds
Time taken by dsyev for matrix size 8000 was 1486.02 seconds
Time taken by dsyev for matrix size 8000 was 1487.28 seconds
Time taken by dsyev for matrix size 8000 was 1484.03 seconds
Time taken by dsyev for matrix size 8000 was 1491.19 seconds
Time taken by dsyev for matrix size 8000 was 1490.94 seconds
Time taken by dsyev for matrix size 8000 was 1491.42 seconds
Time taken by dsyev for matrix size 8000 was 1493.18 seconds
Time taken by dsyev for matrix size 8000 was 1492.59 seconds
Time taken by dsyev for matrix size 8000 was 1494.33 seconds
Time taken by dsyev for matrix size 8000 was 1498.14 seconds
Time taken by dsyev for matrix size 8000 was 1499.87 seconds
Time taken by dsyev for matrix size 8000 was 1498.84 seconds
Time taken by dsyev for matrix size 8000 was 1503.38 seconds
Time taken by dsyev for matrix size 8000 was 1511.98 seconds
Time taken by dsyev for matrix size 8000 was 1511.79 seconds
Time taken by dsyev for matrix size 8000 was 1514.42 seconds
Time taken by dsyev for matrix size 8000 was 1515.51 seconds
Time taken by dsyev for matrix size 8000 was 1518.11 seconds
Time taken by dsyev for matrix size 8000 was 1524.22 seconds
Time taken by dsyev for matrix size 8000 was 1525.70 seconds
Time taken by dsyev for matrix size 8000 was 1528.15 seconds
Time taken by dsyev for matrix size 8000 was 1529.68 seconds
Time taken by dsyev for matrix size 8000 was 1529.47 seconds
Time taken by dsyev for matrix size 8000 was 1532.36 seconds
Time taken by dsyev for matrix size 8000 was 1532.66 seconds
Time taken by dsyev for matrix size 8000 was 1537.51 seconds
Time taken by dsyev for matrix size 8000 was 1540.12 seconds
Time taken by dsyev for matrix size 8000 was 1541.18 seconds
Time taken by dsyev for matrix size 8000 was 1543.94 seconds
Time taken by dsyev for matrix size 8000 was 1540.04 seconds
Time taken by dsyev for matrix size 8000 was 1543.16 seconds
Time taken by dsyev for matrix size 8000 was 1549.16 seconds
Time taken by dsyev for matrix size 8000 was 1547.96 seconds
Time taken by dsyev for matrix size 8000 was 1551.22 seconds
Time taken by dsyev for matrix size 8000 was 1548.58 seconds
Time taken by dsyev for matrix size 8000 was 1554.20 seconds
Time taken by dsyev for matrix size 8000 was 1555.50 seconds
Time taken by dsyev for matrix size 8000 was 1556.40 seconds
Time taken by dsyev for matrix size 8000 was 1555.19 seconds
Time taken by dsyev for matrix size 8000 was 1557.36 seconds
Time taken by dsyev for matrix size 8000 was 1557.69 seconds
Time taken by dsyev for matrix size 8000 was 1564.10 seconds
Time taken by dsyev for matrix size 8000 was 1564.82 seconds
Time taken by dsyev for matrix size 8000 was 1567.04 seconds
Time taken by dsyev for matrix size 8000 was 1576.38 seconds
- With aviel, realized that the matrices are not the same, which could make some diagonalizations faster than others. So I've added a fixed seed to ensure all computations are the same.
- To see the effect of the cache on performance, I'll do computations on smaller matrices that fit into cache memory (1 MB per core for xeon 6248r).
  - To do that, I've added a loop argument to diag.
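A quick check of which matrix sizes fit into that per-core cache, as a sketch: it counts only the n x n double-precision matrix itself (dsyev also needs workspace, so this is a lower bound on the working set) and takes the 1 MB figure quoted above as 1 MiB.

```python
import math

L2_BYTES = 1 * 1024 * 1024  # assumption: 1 MiB of cache per core, as noted above

def matrix_bytes(n, bytes_per_element=8):
    """Memory taken by a dense n x n matrix of doubles."""
    return n * n * bytes_per_element

def largest_fitting_size(cache_bytes, bytes_per_element=8):
    """Largest n such that an n x n matrix of doubles fits in cache_bytes."""
    return math.isqrt(cache_bytes // bytes_per_element)

for n in (128, 4000, 6000, 8000):
    size = matrix_bytes(n)
    verdict = "fits" if size <= L2_BYTES else "does not fit"
    print(f"n={n:5d}: {size / 2**20:9.2f} MiB, {verdict}")

print("largest n that fits:", largest_fitting_size(L2_BYTES))
```

A 128 x 128 matrix takes 128 KiB and fits comfortably, whereas the 4000 to 8000 sizes used earlier are hundreds of MiB each and always spill to main memory.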
graffy@physix90:/mnt/workl/graffy/bug3177$ export LOG_FILE=./run$(date).log; date > "$LOG_FILE"; for c in {1..2}; do echo "num cores : $c" >> "$LOG_FILE" ; OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 ./diag_mkl_omp 128 10000 >> "$LOG_FILE" & done; wait ; date >> "$LOG_FILE"
graffy@physix12:/mnt/workl/graffy/bug3177$ ./cohab_batch.bash 32 physix12 10
# machine_id matrix_size num_loops num_processes duration_min(s) duration_max(s)
physix12 128 10000 1 23.33 23.33
physix12 128 10000 2 23.08 23.08
graffy@physix90:/mnt/workl/graffy/bug3177$ ./cohab_batch.bash 72 physix90 10
# machine_id matrix_size num_loops num_processes duration_min(s) duration_max(s)
physix90 128 10000 1 12.38 12.38
physix90 128 10000 2 12.38 12.45
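The cohab_batch.bash script itself is not shown in these notes. As an illustration only, a min/max summary like the one above could be derived from the "Time taken" lines that diag writes to its log, along these lines (the function name and return layout are hypothetical; the sample lines are taken from the 16-process run logged earlier):

```python
import re

TIME_LINE = re.compile(
    r"Time taken by dsyev for matrix size (\d+) was ([\d.]+) seconds"
)

def summarise_log(log_text):
    """Return (matrix_size, num_processes, duration_min, duration_max)."""
    sizes, durations = set(), []
    for match in TIME_LINE.finditer(log_text):
        sizes.add(int(match.group(1)))
        durations.append(float(match.group(2)))
    if not durations or len(sizes) != 1:
        raise ValueError("expected timing lines for exactly one matrix size")
    return sizes.pop(), len(durations), min(durations), max(durations)

sample = """\
num cores : 1
Time taken by dsyev for matrix size 6000 was 189.82 seconds
num cores : 2
Time taken by dsyev for matrix size 6000 was 259.31 seconds
"""
print(summarise_log(sample))
```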
graffy@graffy-ws2:~/work/tickets/bug3177$ rsync --rsh=ssh -va [email protected]:/mnt/workl/graffy/bug3177/cohab* ./cohab/physix12/
graffy@graffy-ws2:~/work/tickets/bug3177$ rsync --rsh=ssh -va [email protected]:/mnt/workl/graffy/bug3177/physix12* ./cohab/physix12/
graffy@graffy-ws2:~/work/tickets/bug3177$ rsync --rsh=ssh -va [email protected]:/mnt/workl/graffy/bug3177/cohab* ./cohab/physix90/
graffy@graffy-ws2:~/work/tickets/bug3177$ rsync --rsh=ssh -va [email protected]:/mnt/workl/graffy/bug3177/physix90* ./cohab/physix90/
Now that some of the xeon 6248r are available, I can launch benchmarks on one of them (the 6248r is the CPU that we are planning to purchase if we choose Intel).
graffy@physix96:/mnt/workl/graffy/bug3177$ ./cohab_batch.bash 48 physix96 10