
Update dense implementation to "New NNUE architecture and net" #201

Draft: Sopel97 wants to merge 2 commits into master

Conversation

@Sopel97 Sopel97 commented Jul 12, 2021

official-stockfish/Stockfish@e8d64af

Overall I've had quite a bit of trouble with this, as there are many hidden assumptions in the code as well as magic constants sprinkled everywhere. But I've got the dense NNUE part working at least. The 16->32 layer is just interpreted and evaluated as 32->32, since it's small and doing so keeps the implementation simpler.
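To make that concrete, here is a minimal scalar sketch of the trick, not the actual cfish code; affine_propagate_32x32 and affine_propagate_16x32 are made-up names and the element types are only one plausible choice. The 16-wide input is zero-padded to 32 (with zero weights stored for the padded columns at load time), so the generic 32x32 routine gives the same result.

```c
/* Sketch only: a 16->32 affine layer evaluated through a 32->32 routine. */
#include <stdint.h>
#include <string.h>

typedef int8_t  clipped_t;
typedef int8_t  weight_t;
typedef int32_t out_t;

/* Generic dense 32-in / 32-out affine layer (scalar reference). */
static void affine_propagate_32x32(const clipped_t *in, out_t *out,
                                   const weight_t w[32][32], const out_t *bias)
{
  for (int i = 0; i < 32; i++) {
    out_t sum = bias[i];
    for (int j = 0; j < 32; j++)
      sum += w[i][j] * in[j];
    out[i] = sum;
  }
}

/* 16->32 layer: zero-pad the input to 32 entries (weights for columns
 * 16..31 are stored as zeros when the net is loaded) and reuse the
 * 32x32 path unchanged. */
static void affine_propagate_16x32(const clipped_t *in16, out_t *out,
                                   const weight_t w[32][32], const out_t *bias)
{
  clipped_t padded[32] = {0};
  memcpy(padded, in16, 16 * sizeof(clipped_t));
  affine_propagate_32x32(padded, out, w, bias);
}
```

The only cost is a handful of multiply-adds by zero, which is negligible for a layer this small.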

What still needs to be done

Tested archs (make build sparse=no ARCH=...); I can't really test the others:

  • x86-64-avx512-vnni
  • x86-64-avx512
  • x86-64-vnni
  • x86-64-bmi2
  • x86-64-avx2
  • x86-64-avx
  • x86-64-sse41-popcnt
  • x86-64-modern
  • x86-64-ssse3
  • x86-64-sse3-popcnt
  • x86-64
  • x86-32-avx512-vnni
  • x86-32
  • x86-32-sse3-popcnt
  • x86-32-sse2
  • x86-32-mmx-sse
  • x86-32-mmx
  • x86-32
  • ppc-64
  • ppc-32
  • armv8
  • armv7-neon
  • armv7
  • apple-silicon
  • e2k
  • general-64
  • general-32

@syzygy1 I'd love for you to look at this; maybe we can figure something out for the sparse case, as I really want cfish to get up to date. I tried removing the AVX2 and AVX512 paths for sparse, hardcoding the output count to 16 for it, and using the dense layers instead, but I ran into a lot of trouble: different types are used depending on the arch and on whether sparse is enabled, the weights go through different permutations that I don't understand, and so on. Right now nnue-sparse might not be that much of a speedup, but if everything goes well on my end we might end up with a similar codepath in Stockfish and 1024x2-64-64-1 nets or something like that, so I'd like this path to be preserved in some form.
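For readers unfamiliar with the sparse path, here is a minimal scalar sketch of the idea only, not cfish's actual kernels; every identifier below is made up and the 1024-entry index buffer is an assumption about the maximum layer size. The layer first collects the indices of the nonzero clipped inputs and then accumulates only those weight columns, which is also why the sparse weights want a per-input (column-major) layout rather than the dense layout, and that is one source of the type and permutation mismatches mentioned above.

```c
/* Sketch only: sparse affine propagation that skips zero inputs. */
#include <stddef.h>
#include <stdint.h>

typedef int8_t  clipped_t;
typedef int8_t  weight_t;
typedef int32_t out_t;

static void affine_propagate_sparse(const clipped_t *in, unsigned in_dims,
                                    out_t *out, unsigned out_dims,
                                    const weight_t *w,   /* laid out [in_dims][out_dims] */
                                    const out_t *bias)
{
  unsigned nnz_idx[1024];   /* assumes in_dims <= 1024 */
  unsigned nnz = 0;

  /* Pass 1: record which inputs are active (nonzero after clipping). */
  for (unsigned j = 0; j < in_dims; j++)
    if (in[j])
      nnz_idx[nnz++] = j;

  /* Pass 2: accumulate only the active weight columns. */
  for (unsigned i = 0; i < out_dims; i++)
    out[i] = bias[i];
  for (unsigned k = 0; k < nnz; k++) {
    unsigned j = nnz_idx[k];
    const weight_t *col = w + (size_t)j * out_dims;
    for (unsigned i = 0; i < out_dims; i++)
      out[i] += in[j] * col[i];
  }
}
```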

@JavaMast

@Sopel97
Wrong bench for AVX2 and AVX512 builds

[screenshot: bench output]

Sopel97 commented Jul 12, 2021

Probably wt_idx doesn't correctly handle the case when dims==16, but there's only one person on earth who can understand that function...
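For context, a hedged illustration of what this kind of function does (patterned on my recollection of the Stockfish-side weight scrambling; the real wt_idx may well differ): the SSSE3/AVX2 dense kernels consume inputs 4 bytes at a time, so the weights are stored reordered so that, for each group of 4 inputs, the 4 matching weights of every output sit contiguously. A mapping like this bakes the padded input dimension into the layout, which is exactly the sort of assumption that can break when dims==16.

```c
/* Illustration only -- NOT the real wt_idx. Maps a row-major weight index
 * i = row * padded_in_dims + col to a scrambled index so that, for each
 * group of 4 consecutive inputs, every output's 4 matching weights are
 * stored contiguously. padded_in_dims must be a multiple of 4 and must be
 * the *padded* dimension; mixing it up with the raw dims (e.g. 16) is the
 * kind of bug that silently produces a wrong bench. */
static unsigned scrambled_wt_idx(unsigned i, unsigned padded_in_dims,
                                 unsigned out_dims)
{
  return (i / 4) % (padded_in_dims / 4) * out_dims * 4
       + i / padded_in_dims * 4
       + i % 4;
}
```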

Sopel97 commented Jul 13, 2021

I managed to get the sparse implementation working. There is no dedicated path for AVX2 and AVX512 now, as there's no need (AVX2 could probably work like AVX512 did previously, but I'm not sure). That means the one big thing left is getting dense to work on >=AVX2.

@JavaMast

@Sopel97
Build error with sparse=no:

nnue.c: In function 'init_weights':
nnue.c:709:9: warning: implicit declaration of function 'read_output_weights'; did you mean 'read_output_weights_dense'? [-Wimplicit-function-declaration]
  709 |     d = read_output_weights(output_weights[k], d);
      |         ^~~~~~~~~~~~~~~~~~~
      |         read_output_weights_dense
nnue.c:709:7: warning: assignment to 'const char *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
  709 |     d = read_output_weights(output_weights[k], d);
      |       ^
nnue.c: In function 'verify_net':
nnue.c:727:53: warning: unused parameter 'size' [-Wunused-parameter]
  727 | static bool verify_net(const void *evalData, size_t size)
      |                                              ~~~~~~~^~~~
In file included from nnue.c:633:
At top level:
nnue-regular.c:591:20: warning: 'read_output_weights_dense' defined but not used [-Wunused-function]
  591 | static const char *read_output_weights_dense(weight_t *w, const char *d)
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~~

Warnings with sparse=yes:

nnue.c: In function 'verify_net':
nnue.c:727:53: warning: unused parameter 'size' [-Wunused-parameter]
  727 | static bool verify_net(const void *evalData, size_t size)
      |                                              ~~~~~~~^~~~
In file included from nnue.c:633:
At top level:
nnue-regular.c:591:20: warning: 'read_output_weights_dense' defined but not used [-Wunused-function]
  591 | static const char *read_output_weights_dense(weight_t *w, const char *d)
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~~
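The implicit-declaration warning suggests init_weights still calls read_output_weights while the sparse=no build only defines read_output_weights_dense (which also explains the int-to-pointer warning and the "defined but not used" one). A hypothetical sketch of one way to resolve the name where the two builds diverge (NNUE_SPARSE and read_output_weights_sparse are assumed names, not necessarily what cfish uses):

```c
/* Hypothetical sketch: pick the output-weight reader once per build
 * flavour so both sparse=yes and sparse=no resolve the same call site.
 * NNUE_SPARSE and read_output_weights_sparse() are assumed names. */
#ifdef NNUE_SPARSE
#define read_output_weights read_output_weights_sparse
#else
#define read_output_weights read_output_weights_dense
#endif
```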

JavaMast commented Jul 13, 2021

@Sopel97
also cannot build SSE2_sparse (SSSE3/SSE41/AVX builds work fine):

nnue.c:226:34: error: conflicting types for 'clipped_t'
  226 | typedef int16_t weight_t, out_t, clipped_t;
      |                                  ^~~~~~~~~
nnue.c:217:16: note: previous declaration of 'clipped_t' was here
  217 | typedef int8_t clipped_t;
      |                ^~~~~~~~~
In file included from nnue.c:634:
nnue-sparse.c: In function 'nnue_evaluate':
nnue-sparse.c:325:18: warning: passing argument 2 of 'transform' from incompatible pointer type [-Wincompatible-pointer-types]
  325 | #define B(x) (buf.x)
      |               ~~~~^~~
      |                   |
      |                   int8_t * {aka signed char *}
nnue-sparse.c:331:22: note: in expansion of macro 'B'
  331 |   if (transform(pos, B(input), hidden1_mask, bucket, &psqt_val))
      |                      ^
nnue.c:564:55: note: expected 'clipped_t *' {aka 'short int *'} but argument is of type 'int8_t *' {aka 'signed char *'}
  564 | INLINE bool transform(const Position *pos, clipped_t *output, mask_t outMask, int32_t psqtBucket, int32_t *psqt_val)
      |                                            ~~~~~~~~~~~^~~~~~
nnue.c: In function 'verify_net':
nnue.c:727:53: warning: unused parameter 'size' [-Wunused-parameter]
  727 | static bool verify_net(const void *evalData, size_t size)
      |                                              ~~~~~~~^~~~
In file included from nnue.c:633:
At top level:
nnue-regular.c:591:20: warning: 'read_output_weights_dense' defined but not used [-Wunused-function]
  591 | static const char *read_output_weights_dense(weight_t *w, const char *d)
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~~
make[2]: *** [nnue.o] Error 1
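The root error here is a duplicate typedef: the SSE2 sparse configuration apparently reaches both the int8_t and the int16_t definition of clipped_t, and the transform/B(input) pointer mismatch is a downstream symptom of the same thing. A hypothetical sketch of the single-definition guard that avoids it; USE_SSSE3 stands in for whatever feature macro the build actually keys on, and the exact type choices are assumptions:

```c
/* Hypothetical sketch -- not cfish's actual configuration logic. Define
 * the NNUE element types exactly once per build flavour so no translation
 * unit can see two conflicting typedefs of clipped_t. */
#include <stdint.h>

#if defined(USE_SSSE3)                        /* stand-in feature test */
typedef int8_t  clipped_t;                    /* cf. nnue.c:217 */
typedef int8_t  weight_t;
typedef int32_t out_t;
#else
typedef int16_t weight_t, out_t, clipped_t;   /* cf. nnue.c:226 */
#endif
```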

@Sopel97 Sopel97 force-pushed the update branch 2 times, most recently from 4c71e9d to 71e7d1f on July 19, 2021 at 12:07
@JavaMast

@Sopel97
still cannot build SSE2_sparse and MMX (both sparse and non-sparse)

Sopel97 commented Jul 21, 2021

Seems there are issues with >=AVX2 for both sparse=no and sparse=yes. However, sparse=yes doesn't use AVX2, so the error is not in the inference code; probably some ifdef I missed.

Sopel97 commented Jul 29, 2021

More patches are applied on top here: https://github.com/Sopel97/Cfish/tree/sf_2021-07-23, but the issues outlined in this PR still persist.

@magicianfromriga

@Sopel97 I tried compiling your latest code with both GCC and Clang, but the bench doesn't match each time. Also the engine has become really slow.
Arch=SSE41-POPCNT

JavaMast commented Jul 31, 2021

@Sopel97 I tried compiling your latest code with both GCC and Clang, but the bench doesn't match each time. Also the engine has become really slow.
Arch=SSE41-POPCNT

@MagicianofRiga
Works fine for me.
About 10% faster than Stockfish.
[screenshot: bench output]

Incorrect bench only for AVX2/BMI2 and AVX512 builds.
GCC 10.2 with MSYS 1

@magicianfromriga

Is this bench correct?

Total time (ms) : 5801
Nodes searched : 5505251
Nodes/second : 949017

Command - make build ARCH=x86-64-sse41-popcnt COMP=gcc sparse=yes

Sopel97 commented Aug 12, 2021

No, this patch mirrors official-stockfish/Stockfish@e8d64af.

@JavaMast

@MagicianofRiga
Yes, this is the correct bench for this branch: https://github.com/Sopel97/Cfish/tree/sf_2021-07-23

Don't save excluded move eval in TT
Bench: 5505251
