
Update dense implementation to "New NNUE architecture and net" #201

Draft: Sopel97 wants to merge 2 commits into master

Conversation

@Sopel97 Sopel97 commented Jul 12, 2021

official-stockfish/Stockfish@e8d64af

Overall I've had quite a bit of trouble with this, as there are many hidden assumptions in the code as well as magic constants sprinkled everywhere. But I've got the dense NNUE part working at least. The 16->32 layer is just interpreted and evaluated as 32->32, since it's small and doing so keeps the implementation simpler.
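To make that concrete, here is a minimal scalar sketch of the trick, not the actual cfish code; affine_propagate_32x32 and affine_propagate_16x32 are made-up names and the element types are only one plausible choice. The 16-wide input is zero-padded to 32 (with zero weights stored for the padded columns at load time), so the generic 32x32 routine gives the same result.

```c
/* Sketch only: a 16->32 affine layer evaluated through a 32->32 routine. */
#include <stdint.h>
#include <string.h>

typedef int8_t  clipped_t;
typedef int8_t  weight_t;
typedef int32_t out_t;

/* Generic dense 32-in / 32-out affine layer (scalar reference). */
static void affine_propagate_32x32(const clipped_t *in, out_t *out,
                                   const weight_t w[32][32], const out_t *bias)
{
  for (int i = 0; i < 32; i++) {
    out_t sum = bias[i];
    for (int j = 0; j < 32; j++)
      sum += w[i][j] * in[j];
    out[i] = sum;
  }
}

/* 16->32 layer: zero-pad the input to 32 entries (weights for columns
 * 16..31 are stored as zeros when the net is loaded) and reuse the
 * 32x32 path unchanged. */
static void affine_propagate_16x32(const clipped_t *in16, out_t *out,
                                   const weight_t w[32][32], const out_t *bias)
{
  clipped_t padded[32] = {0};
  memcpy(padded, in16, 16 * sizeof(clipped_t));
  affine_propagate_32x32(padded, out, w, bias);
}
```

The only cost is a handful of multiply-adds by zero, which is negligible for a layer this small.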

What still needs to be done

Tested archs (make build sparse=no ARCH=...); I can't really test the others:

  • x86-64-avx512-vnni
  • x86-64-avx512
  • x86-64-vnni
  • x86-64-bmi2
  • x86-64-avx2
  • x86-64-avx
  • x86-64-sse41-popcnt
  • x86-64-modern
  • x86-64-ssse3
  • x86-64-sse3-popcnt
  • x86-64
  • x86-32-avx512-vnni
  • x86-32
  • x86-32-sse3-popcnt
  • x86-32-sse2
  • x86-32-mmx-sse
  • x86-32-mmx
  • x86-32
  • ppc-64
  • ppc-32
  • armv8
  • armv7-neon
  • armv7
  • apple-silicon
  • e2k
  • general-64
  • general-32

@syzygy1 I'd love for you to look at this; maybe we can figure something out for the sparse case, as I really want cfish to get up to date. I tried removing the AVX2 and AVX512 paths for sparse, hardcoding the output count to 16 for it, and using the dense layers instead, but I ran into a lot of trouble: different types are used depending on the arch and on whether sparse is enabled, the weights go through different permutations that I don't understand, and so on. Right now nnue-sparse might not be that much of a speedup, but if everything goes well on my end we might end up with a similar codepath in Stockfish and 1024x2-64-64-1 nets or something like that, so I'd like this path to be preserved in some form.
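For readers unfamiliar with the sparse path, here is a minimal scalar sketch of the idea only, not cfish's actual kernels; every identifier below is made up and the 1024-entry index buffer is an assumption about the maximum layer size. The layer first collects the indices of the nonzero clipped inputs and then accumulates only those weight columns, which is also why the sparse weights want a per-input (column-major) layout rather than the dense layout, and that is one source of the type and permutation mismatches mentioned above.

```c
/* Sketch only: sparse affine propagation that skips zero inputs. */
#include <stddef.h>
#include <stdint.h>

typedef int8_t  clipped_t;
typedef int8_t  weight_t;
typedef int32_t out_t;

static void affine_propagate_sparse(const clipped_t *in, unsigned in_dims,
                                    out_t *out, unsigned out_dims,
                                    const weight_t *w,   /* laid out [in_dims][out_dims] */
                                    const out_t *bias)
{
  unsigned nnz_idx[1024];   /* assumes in_dims <= 1024 */
  unsigned nnz = 0;

  /* Pass 1: record which inputs are active (nonzero after clipping). */
  for (unsigned j = 0; j < in_dims; j++)
    if (in[j])
      nnz_idx[nnz++] = j;

  /* Pass 2: accumulate only the active weight columns. */
  for (unsigned i = 0; i < out_dims; i++)
    out[i] = bias[i];
  for (unsigned k = 0; k < nnz; k++) {
    unsigned j = nnz_idx[k];
    const weight_t *col = w + (size_t)j * out_dims;
    for (unsigned i = 0; i < out_dims; i++)
      out[i] += in[j] * col[i];
  }
}
```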

@JavaMast

@Sopel97
Wrong bench for AVX2 and AVX512 builds

[screenshot: bench output]

Sopel97 commented Jul 12, 2021

Probably wt_idx doesn't correctly handle the case when dims==16, but there's only one person on earth who can understand that function...
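For context, a hedged illustration of what this kind of function does (patterned on my recollection of the Stockfish-side weight scrambling; the real wt_idx may well differ): the SSSE3/AVX2 dense kernels consume inputs 4 bytes at a time, so the weights are stored reordered so that, for each group of 4 inputs, the 4 matching weights of every output sit contiguously. A mapping like this bakes the padded input dimension into the layout, which is exactly the sort of assumption that can break when dims==16.

```c
/* Illustration only -- NOT the real wt_idx. Maps a row-major weight index
 * i = row * padded_in_dims + col to a scrambled index so that, for each
 * group of 4 consecutive inputs, every output's 4 matching weights are
 * stored contiguously. padded_in_dims must be a multiple of 4 and must be
 * the *padded* dimension; mixing it up with the raw dims (e.g. 16) is the
 * kind of bug that silently produces a wrong bench. */
static unsigned scrambled_wt_idx(unsigned i, unsigned padded_in_dims,
                                 unsigned out_dims)
{
  return (i / 4) % (padded_in_dims / 4) * out_dims * 4
       + i / padded_in_dims * 4
       + i % 4;
}
```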

Sopel97 commented Jul 13, 2021

I managed to get the sparse implementation working. There is no dedicated path for AVX2 and AVX512 now, as there's no need (AVX2 could probably work like AVX512 did previously, but I'm not sure). That means the one big thing left is getting dense to work on >=AVX2.

@JavaMast

@Sopel97
Build error with sparse=no:

nnue.c: In function 'init_weights':
nnue.c:709:9: warning: implicit declaration of function 'read_output_weights'; did you mean 'read_output_weights_dense'? [-Wimplicit-function-declaration]
  709 |     d = read_output_weights(output_weights[k], d);
      |         ^~~~~~~~~~~~~~~~~~~
      |         read_output_weights_dense
nnue.c:709:7: warning: assignment to 'const char *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
  709 |     d = read_output_weights(output_weights[k], d);
      |       ^
nnue.c: In function 'verify_net':
nnue.c:727:53: warning: unused parameter 'size' [-Wunused-parameter]
  727 | static bool verify_net(const void *evalData, size_t size)
      |                                              ~~~~~~~^~~~
In file included from nnue.c:633:
At top level:
nnue-regular.c:591:20: warning: 'read_output_weights_dense' defined but not used [-Wunused-function]
  591 | static const char *read_output_weights_dense(weight_t *w, const char *d)
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~~

Warnings with sparse=yes:

nnue.c: In function 'verify_net':
nnue.c:727:53: warning: unused parameter 'size' [-Wunused-parameter]
  727 | static bool verify_net(const void *evalData, size_t size)
      |                                              ~~~~~~~^~~~
In file included from nnue.c:633:
At top level:
nnue-regular.c:591:20: warning: 'read_output_weights_dense' defined but not used [-Wunused-function]
  591 | static const char *read_output_weights_dense(weight_t *w, const char *d)
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~~
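The implicit-declaration warning suggests init_weights still calls read_output_weights while the sparse=no build only defines read_output_weights_dense (which also explains the int-to-pointer warning and the "defined but not used" one). A hypothetical sketch of one way to resolve the name where the two builds diverge (NNUE_SPARSE and read_output_weights_sparse are assumed names, not necessarily what cfish uses):

```c
/* Hypothetical sketch: pick the output-weight reader once per build
 * flavour so both sparse=yes and sparse=no resolve the same call site.
 * NNUE_SPARSE and read_output_weights_sparse() are assumed names. */
#ifdef NNUE_SPARSE
#define read_output_weights read_output_weights_sparse
#else
#define read_output_weights read_output_weights_dense
#endif
```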

JavaMast commented Jul 13, 2021

@Sopel97
also cannot build SSE2_sparse (SSSE3/SSE41/AVX builds work fine):

nnue.c:226:34: error: conflicting types for 'clipped_t'
  226 | typedef int16_t weight_t, out_t, clipped_t;
      |                                  ^~~~~~~~~
nnue.c:217:16: note: previous declaration of 'clipped_t' was here
  217 | typedef int8_t clipped_t;
      |                ^~~~~~~~~
In file included from nnue.c:634:
nnue-sparse.c: In function 'nnue_evaluate':
nnue-sparse.c:325:18: warning: passing argument 2 of 'transform' from incompatible pointer type [-Wincompatible-pointer-types]
  325 | #define B(x) (buf.x)
      |               ~~~~^~~
      |                   |
      |                   int8_t * {aka signed char *}
nnue-sparse.c:331:22: note: in expansion of macro 'B'
  331 |   if (transform(pos, B(input), hidden1_mask, bucket, &psqt_val))
      |                      ^
nnue.c:564:55: note: expected 'clipped_t *' {aka 'short int *'} but argument is of type 'int8_t *' {aka 'signed char *'}
  564 | INLINE bool transform(const Position *pos, clipped_t *output, mask_t outMask, int32_t psqtBucket, int32_t *psqt_val)
      |                                            ~~~~~~~~~~~^~~~~~
nnue.c: In function 'verify_net':
nnue.c:727:53: warning: unused parameter 'size' [-Wunused-parameter]
  727 | static bool verify_net(const void *evalData, size_t size)
      |                                              ~~~~~~~^~~~
In file included from nnue.c:633:
At top level:
nnue-regular.c:591:20: warning: 'read_output_weights_dense' defined but not used [-Wunused-function]
  591 | static const char *read_output_weights_dense(weight_t *w, const char *d)
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~~
make[2]: *** [nnue.o] Error 1
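The root error here is a duplicate typedef: the SSE2 sparse configuration apparently reaches both the int8_t and the int16_t definition of clipped_t, and the transform/B(input) pointer mismatch is a downstream symptom of the same thing. A hypothetical sketch of the single-definition guard that avoids it; USE_SSSE3 stands in for whatever feature macro the build actually keys on, and the exact type choices are assumptions:

```c
/* Hypothetical sketch -- not cfish's actual configuration logic. Define
 * the NNUE element types exactly once per build flavour so no translation
 * unit can see two conflicting typedefs of clipped_t. */
#include <stdint.h>

#if defined(USE_SSSE3)                        /* stand-in feature test */
typedef int8_t  clipped_t;                    /* cf. nnue.c:217 */
typedef int8_t  weight_t;
typedef int32_t out_t;
#else
typedef int16_t weight_t, out_t, clipped_t;   /* cf. nnue.c:226 */
#endif
```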

@Sopel97 Sopel97 force-pushed the update branch 2 times, most recently from 4c71e9d to 71e7d1f on July 19, 2021 at 12:07
@JavaMast

@Sopel97
still cannot build SSE2_sparse and MMX (both sparse and non-sparse)

Sopel97 commented Jul 21, 2021

Seems there are issues with >=AVX2 for both sparse=no and sparse=yes. However, sparse=yes doesn't use AVX2, so the error is not in the inference code; probably some ifdef I missed.

Sopel97 commented Jul 29, 2021

More patches are applied on top here: https://github.com/Sopel97/Cfish/tree/sf_2021-07-23, but the issues outlined in this PR still persist.

@magicianfromriga

@Sopel97 I tried compiling your latest code with both GCC and Clang, but the bench doesn't match each time. Also the engine has become really slow.
Arch=SSE41-POPCNT

JavaMast commented Jul 31, 2021

@Sopel97 I tried compiling your latest code with both GCC and Clang, but the bench doesn't match each time. Also the engine has become really slow.
Arch=SSE41-POPCNT

@MagicianofRiga
Works fine for me.
About 10% faster than Stockfish.
[screenshot: bench output]

Incorrect bench only for AVX2/BMI2 and AVX512 builds.
GCC 10.2 with MSYS 1

@magicianfromriga

Is this bench correct?

Total time (ms) : 5801
Nodes searched : 5505251
Nodes/second : 949017

Command - make build ARCH=x86-64-sse41-popcnt COMP=gcc sparse=yes

Sopel97 commented Aug 12, 2021

No, this patch mirrors official-stockfish/Stockfish@e8d64af.

@JavaMast

@MagicianofRiga
Yes, this is the correct bench for this branch: https://github.com/Sopel97/Cfish/tree/sf_2021-07-23

Don't save excluded move eval in TT
Bench: 5505251
