Mayhem NNUE - New NN engine

JohnWoe · Post by **JohnWoe** » Thu Nov 19, 2020 9:25 pm

mvanthoor wrote: ↑Thu Nov 19, 2020 11:16 am
JohnWoe wrote: ↑Fri Nov 13, 2020 12:18 am Nice to see others testing Mayhem too. I'm curious of the level. Since in my tests Mayhem performs better than Crafty 25.6 which is 2950 Elo.

With Sapeli I was struggling against 1900 Elo FairyMax. Now I'm in the big boys league. 60% score against 2953 Elo Crafty 25.6.
More strange is Mayhem is 1200 sloc. Sapeli is 1700 lines. I took 500 sloc out and gained 1000 Elo.

To be honest credits for the 1000 Elo boost go to the SF NNUE evaluation. Without it Mayhem is probably 1800 Elo.
The fact that NNUE gains you 1000 Elo just means that your search is quite good, and your evaluation was junk. If I have this correct, Stockfish 12 has NNUE built in. Between 11 and 12, SF gained somewhere along the lines of 30-35 Elo. It means that its handcrafted evaluation was about as perfect as humans can make/tune it.

In short: if you write a decent search, *EVERYBODY* is in the "big boys" league as soon as they implement SF's NNUE. Why? Because a basic engine with no features is somewhere around 1600 Elo. (Such as my current engine.) Engines with some features, such as VICE, BBC, and yes, Sapeli, are somewhere around 2000 Elo. And as soon as you stick a NNUE on top of them, they gain 800-1000 Elo (more Elo if the search is faster).

So it's no wonder that a small engine with a few features + NNUE is at least 2950 as soon as you stick a NNUE on top of it.

That is the entire reason why so many people have an aversion against using the NNUE tech from another engine. It's a cheapskate way to rocket yourself into the big boys league. You can basically do it by cloning any +/- 2000 Elo engine you want, renaming it, and attaching someone else's NNUE to it.

Have fun; I'd have rather tested my engine against Sapeli, and then have it creep up the rating ladder version by version, and actually have an engine I at least know what it's doing.

You are correct in all accounts. I have not much to add.

But Sapeli was ok in tactics. It scored 286/300 in WAC. It lacked good evaluation.
My goal wasn't to clone a little bit and leave all sub-2500 engines bite the dust ( Or maybe it was )
I wanted to get rid of the horrible HCE and replace it with something simple.

This is the only place I use NNUE library: https://github.com/SamuraiDangyo/mayhem ... m.hpp#L926
Mayhem is super modular. NNUE is just a library.

Madeleine Birchfield wrote: ↑Thu Nov 19, 2020 11:36 am ..
Agreed. If Sapeli were to go NNUE I would prefer it to be implemented from scratch, not copied directly from the Stockfish codebase, and trained on Sapeli evaluations. Do what the Komodo developers did.

Otherwise be honest and call the engine Mayhem Stockfish NNUE, like what the BBC author did. And TCEC would never accept such an engine in the first place if it ends up having a copy of Stockfish evaluation.

A few corrections. I took the code from Maksim, which is Daniel Shawul's NNUE library code (I believe). I'm not sure.
I use Stockfish's EvalFile.
I have done some trivial cleanups on the NNUE lib. Ok. I fixed one leak. So Mayhem 1.1 is leak proof.

Madeleine Birchfield · Thu Nov 19, 2020 10:58 pm

JohnWoe wrote: ↑Thu Nov 19, 2020 9:25 pm
mvanthoor wrote: ↑Thu Nov 19, 2020 11:16 am
JohnWoe wrote: ↑Fri Nov 13, 2020 12:18 am Nice to see others testing Mayhem too. I'm curious of the level. Since in my tests Mayhem performs better than Crafty 25.6 which is 2950 Elo.

With Sapeli I was struggling against 1900 Elo FairyMax. Now I'm in the big boys league. 60% score against 2953 Elo Crafty 25.6.
More strange is Mayhem is 1200 sloc. Sapeli is 1700 lines. I took 500 sloc out and gained 1000 Elo.

To be honest credits for the 1000 Elo boost go to the SF NNUE evaluation. Without it Mayhem is probably 1800 Elo.
The fact that NNUE gains you 1000 Elo just means that your search is quite good, and your evaluation was junk. If I have this correct, Stockfish 12 has NNUE built in. Between 11 and 12, SF gained somewhere along the lines of 30-35 Elo. It means that its handcrafted evaluation was about as perfect as humans can make/tune it.

In short: if you write a decent search, *EVERYBODY* is in the "big boys" league as soon as they implement SF's NNUE. Why? Because a basic engine with no features is somewhere around 1600 Elo. (Such as my current engine.) Engines with some features, such as VICE, BBC, and yes, Sapeli, are somewhere around 2000 Elo. And as soon as you stick a NNUE on top of them, they gain 800-1000 Elo (more Elo if the search is faster).

So it's no wonder that a small engine with a few features + NNUE is at least 2950 as soon as you stick a NNUE on top of it.

That is the entire reason why so many people have an aversion against using the NNUE tech from another engine. It's a cheapskate way to rocket yourself into the big boys league. You can basically do it by cloning any +/- 2000 Elo engine you want, renaming it, and attaching someone else's NNUE to it.

Have fun; I'd have rather tested my engine against Sapeli, and then have it creep up the rating ladder version by version, and actually have an engine I at least know what it's doing.
You are correct in all accounts. I have not much to add.

But Sapeli was ok in tactics. It scored 286/300 in WAC. It lacked good evaluation.
My goal wasn't to clone a little bit and leave all sub-2500 engines bite the dust ( Or maybe it was )
I wanted to get rid of the horrible HCE and replace it with something simple.

This is the only place I use NNUE library: https://github.com/SamuraiDangyo/mayhem ... m.hpp#L926
Mayhem is super modular. NNUE is just a library.

Madeleine Birchfield wrote: ↑Thu Nov 19, 2020 11:36 am ..
Agreed. If Sapeli were to go NNUE I would prefer it to be implemented from scratch, not copied directly from the Stockfish codebase, and trained on Sapeli evaluations. Do what the Komodo developers did.

Otherwise be honest and call the engine Mayhem Stockfish NNUE, like what the BBC author did. And TCEC would never accept such an engine in the first place if it ends up having a copy of Stockfish evaluation.
A few corrections. I took the code from Maksim, which is Daniel Shawul's NNUE library code (I believe). I'm not sure.
I use Stockfish's EvalFile.
I have done some trivial cleanups on the NNUE lib. Ok. I fixed one leak. So Mayhem 1.1 is leak proof.

I would suggest replacing the Stockfish net with one of Dietrich Kappe's nets like Night Nurse or Dark Horse. They may be weaker than the Stockfish net, but they are currently unused by other engines and most likely much stronger than your handcrafted evaluation function, and so you get the best of both worlds, an engine stronger than Sapeli and no complaints about stealing Stockfish's eval.

mvanthoor · Post by **mvanthoor** » Thu Nov 19, 2020 11:47 pm

JohnWoe wrote: ↑Thu Nov 19, 2020 9:25 pm You are correct in all accounts. I have not much to add.

But Sapeli was ok in tactics. It scored 286/300 in WAC. It lacked good evaluation.
My goal wasn't to clone a little bit and leave all sub-2500 engines bite the dust ( Or maybe it was )
I wanted to get rid of the horrible HCE and replace it with something simple.

But how can you now personalize your engine? Most stuff regarding chess engines has been researched for close to 70 years. The result is that the implementation of these things has been described and optimized to the point of becoming almost boilerplate code.

- Board representation: bit boards.
- Move generator: Magics or PEXT bitboards.
- Moving pieces: Make/Unmake
- Hash table: Zobrist keys

Why use anything else? The above two are the fastest at this point, and unlikely to become faster soon.
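
For readers who have not met Zobrist keys before, here is a minimal sketch of the idea, not code from Mayhem or Sapeli. The names `make_zobrist_table`, `full_hash` and `update_hash` are made up for this illustration, and the sketch deliberately ignores castling rights, en passant and promotion, which a real engine would also fold into the key.

```python
import random

PIECES = "PNBRQKpnbrqk"

def make_zobrist_table(seed=2020):
    """One random 64-bit key per (piece, square) pair, plus one key for the side to move."""
    rng = random.Random(seed)
    table = {(p, sq): rng.getrandbits(64) for p in PIECES for sq in range(64)}
    side_key = rng.getrandbits(64)
    return table, side_key

def full_hash(board, white_to_move, table, side_key):
    """Hash a position from scratch; board maps square index (0..63) to a piece letter."""
    h = 0
    for sq, piece in board.items():
        h ^= table[(piece, sq)]
    if not white_to_move:
        h ^= side_key
    return h

def update_hash(h, piece, from_sq, to_sq, table, side_key, captured=None):
    """Incremental update for a simple move (assumption: no castling, en passant or promotion)."""
    h ^= table[(piece, from_sq)]          # piece leaves its origin square
    if captured is not None:
        h ^= table[(captured, to_sq)]     # captured piece disappears
    h ^= table[(piece, to_sq)]            # piece lands on the target square
    h ^= side_key                         # side to move flips
    return h

table, side_key = make_zobrist_table()
before = {4: "K", 12: "P", 60: "k"}
after = {4: "K", 28: "P", 60: "k"}        # white pawn moved from square 12 to 28
h_before = full_hash(before, True, table, side_key)
h_after = update_hash(h_before, "P", 12, 28, table, side_key)
```

Because XOR is its own inverse, the incremental update touches only a handful of keys per move and still lands on the same value as hashing the new position from scratch.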

- Search function: Alpha/Beta with many sorting and beta-cutoff optimization. If you want to be a contrarian, you can use MCTS.

Both are well documented, especially Alpha/Beta with all the optimizations.
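
As a rough illustration of the beta cutoff mentioned above, here is a bare negamax Alpha/Beta on a toy tree. It is only a sketch: the nested-list tree and the `visited` argument are assumptions made for the example, and it has none of the move ordering or other optimizations a real engine adds.

```python
import math

def negamax(node, alpha=-math.inf, beta=math.inf, visited=None):
    """Leaves are numbers scored from the point of view of the side to move at that leaf;
    inner nodes are lists of child nodes. `visited` optionally records the leaves evaluated."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    best = -math.inf
    for child in node:
        # The child's score is from the opponent's point of view, hence the sign flips.
        score = -negamax(child, -beta, -alpha, visited)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff: the opponent already has a better option elsewhere
    return best

tree = [[3, 5], [2, 9]]
visited = []
result = negamax(tree, visited=visited)
```

On this tree the search returns 3 and never looks at the leaf 9: once the second branch is known to be worth at most 2, it cannot beat the 3 already found, so the rest of that branch is skipped.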

- SMP: Why use anything but Lazy SMP? Even the strongest engine in the world uses Lazy SMP.
- Endgame tablebase: Syzygy has basically replaced anything that came before.
- Communication to GUI: UCI or XBoard. Use the protocol you like best. Or both.

All of the above functions except UCI and XBoard are merely stuff to make the engine search faster or deeper. The ONLY function where WE as programmers still have influence on HOW our engine plays chess, is the evaluation function... and now we go and automate even that, by replacing it by a neural network that can be trained by a computer.

So if you want to do that, the way to do it is to write your own network and train it yourself, using the HCE of your own engine.

If you don't, and you use the SF NNUE, your engine basically becomes a weaker version of Stockfish; and the reason that it's weaker is because it's slower and/or the search is less advanced. If you then start improving your search, the engine will start to play more and more like the SF version the NNUE came from.

One of the reasons I'm writing my own chess engine is BECAUSE of the evaluation function; and even if I (at some point) get into Texel or CLOP-tuning the eval (or even replace it with my own NN-thing-stuff-whatever), I'll still keep the HCE evaluation for personality's sake.

mvanthoor · Post by **mvanthoor** » Thu Nov 19, 2020 11:49 pm

Madeleine Birchfield wrote: ↑Thu Nov 19, 2020 10:58 pm I would suggest replacing the Stockfish net with one of Dietrich Kappe's nets like Night Nurse or Dark Horse. They may be weaker than the Stockfish net, but they are currently unused by other engines and most likely much stronger than your handcrafted evaluation function, and so you get the best of both worlds, an engine stronger than Sapeli and no complaints about stealing Stockfish's eval.

An even better thing to do would be to train a network on the basis of the Sapeli evaluation and see if it will be stronger. (Although I don't know yet how to do such a thing because I haven't looked into it, and I won't, for a long time to come. I will only be looking into NN's when I'm confident I can create my own from scratch. Actually, I can.... but I have never done it for chess, so that's an entirely new thing for me.)

Madeleine Birchfield · Fri Nov 20, 2020 12:19 am

mvanthoor wrote: ↑Thu Nov 19, 2020 11:47 pm
JohnWoe wrote: ↑Thu Nov 19, 2020 9:25 pm You are correct in all accounts. I have not much to add.

But Sapeli was ok in tactics. It scored 286/300 in WAC. It lacked good evaluation.
My goal wasn't to clone a little bit and leave all sub-2500 engines bite the dust ( Or maybe it was )
I wanted to get rid of the horrible HCE and replace it with something simple.
But how can you now personalize your engine? Most stuff regarding chess engines has been researched for close to 70 years. The result is that the implementation of these things has been described and optimized to the point of becoming almost boilerplate code.

- Board representation: bit boards.
- Move generator: Magics or PEXT bitboards.
- Moving pieces: Make/Unmake
- Hash table: Zobrist keys

Why use anything else? The above two are the fastest at this point, and unlikely to become faster soon.

- Search function: Alpha/Beta with many sorting and beta-cutoff optimization. If you want to be a contrarian, you can use MCTS.

Both are well documented, especially Alpha/Beta with all the optimizations.

- SMP: Why use anything but Lazy SMP? Even the strongest engine in the world uses Lazy SMP.
- Endgame tablebase: Syzygy has basically replaced anything that came before.
- Communication to GUI: UCI or XBoard. Use the protocol you like best. Or both.

All of the above functions except UCI and XBoard are merely stuff to make the engine search faster or deeper. The ONLY function where WE as programmers still have influence on HOW our engine plays chess, is the evaluation function... and now we go and automate even that, by replacing it by a neural network that can be trained by a computer.

So if you want to do that, the way to do it is to write your own network and train it yourself, using the HCE of your own engine.

If you don't, and you use the SF NNUE, your engine basically becomes a weaker version of Stockfish; and the reason that it's weaker is because it's slower and/or the search is less advanced. If you then start improving your search, the engine will start to play more and more like the SF version the NNUE came from.

One of the reasons I'm writing my own chess engine is BECAUSE of the evaluation function; and even if I (at some point) get into Texel or CLOP-tuning the eval (or even replace it with my own NN-thing-stuff-whatever), I'll still keep the HCE evaluation for personality's sake.

You could have different neural network architectures and train the networks using different data and you would get different personalities as well and be better than handcrafted eval. But the important thing is that they be different, just copying and pasting Stockfish's net is not enough.

Daniel Anulliero · Post by **Daniel Anulliero** » Fri Nov 20, 2020 7:15 am

JohnWoe wrote: ↑Wed Nov 18, 2020 2:18 pm Actually MY engine Sapeli was around 2300 Elo: See https://lichess.org/@/SapeliEngine
Too much lag. I won't play over TCP/IP ever.
But Mayhem is Sapeli. And GPLv3 like Stockfish. So no problem with some copy paste.

We all know the Lichess ratings are not very accurate ...

https://www.computerchess.org.uk/ccrl/4 ... _92_64-bit

I doubt you have improved Sapeli 1.93 by 400 Elo...
So, pity, Mayhem will never be tested by the CCRL team, and it will never be allowed to play in Graham's fun tournaments; same for CEGT, TCEC, etc ...
But ok, Mayhem is yours ... but ...

tmokonen · Post by **tmokonen** » Sat Nov 21, 2020 6:53 am

Well, Dany, it's stupidly easy to add NNUE to an engine. I just did it with Tony's Chess in, what, half an hour of work, if that. The results, using the net file nn-c3ca321c51c9.nnue:

Finished game 198 (TonysTest vs TonysChess004r29): 1-0 {White mates}
Score of TonysChess004r29 vs TonysTest: 15 - 133 - 50 [0.202] 198
ELO difference: -238.64 +/- 47.55
SPRT: llr -2.95, lbound -2.94, ubound 2.94 - H0 was accepted
Finished match

Wow, it passes SPRT after 198 games, with an approximately 200 Elo improvement. This test was at a ridiculously fast time control of 1s + .01s, and not even using incremental NNUE probing, which would be more efficient. I had to compile to 64 bit, as 32 bit is about 1/10th the NPS.
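
To show why incremental probing would be more efficient, here is a toy sketch of the general idea: the first-layer accumulator is updated only for the features a move changes, instead of being rebuilt from every piece on the board. Everything here is an assumption for illustration (the sizes, the random weights, the `feature` indexing); it is not the layout of the Stockfish net or the API of the NNUE probing library.

```python
import random

NUM_FEATURES = 12 * 64   # toy feature set: (piece type, square); NOT the real net's input layout
HIDDEN = 8               # toy hidden-layer width

rng = random.Random(0)
WEIGHTS = [[rng.randint(-64, 64) for _ in range(HIDDEN)] for _ in range(NUM_FEATURES)]
BIAS = [rng.randint(-64, 64) for _ in range(HIDDEN)]

def feature(piece_index, square):
    """Hypothetical feature index for a piece type (0..11) on a square (0..63)."""
    return piece_index * 64 + square

def refresh(active_features):
    """Full recompute: bias plus the weight row of every active feature."""
    acc = list(BIAS)
    for f in active_features:
        for i in range(HIDDEN):
            acc[i] += WEIGHTS[f][i]
    return acc

def update(acc, removed, added):
    """Incremental update: only the features changed by the move are touched."""
    acc = list(acc)
    for f in removed:
        for i in range(HIDDEN):
            acc[i] -= WEIGHTS[f][i]
    for f in added:
        for i in range(HIDDEN):
            acc[i] += WEIGHTS[f][i]
    return acc

before = [feature(0, 12), feature(5, 4), feature(11, 60)]
after = [feature(0, 28), feature(5, 4), feature(11, 60)]   # the piece on square 12 moved to 28
acc_full = refresh(after)
acc_incremental = update(refresh(before), [feature(0, 12)], [feature(0, 28)])
```

Both paths give the same accumulator, but the incremental one adds and subtracts two weight rows where the full refresh sums one row per piece on the board.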

The strength difference between the last non-NNUE dev version TonysChess004r29 and Tony's Chess 0.03 is about the same 200 or so Elo, and that was hours and hours of programming and testing, over more than a year. Improvement via Stockfish's NNUE is easy and effective, and fun to play with, but it does kinda feel like cheating.
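
The numbers in the match output above can be checked by hand. This sketch uses the standard logistic Elo formula and the usual Wald SPRT bounds; the function names are made up for the example, and alpha = beta = 0.05 is an assumption that happens to reproduce the +/- 2.94 bounds in the log.

```python
import math

def elo_difference(wins, losses, draws):
    """Elo difference implied by a match score under the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400.0 * math.log10(1.0 / score - 1.0)

def sprt_bounds(alpha=0.05, beta=0.05):
    """Lower and upper log-likelihood-ratio bounds of the sequential test."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper

# TonysChess004r29 vs TonysTest: 15 wins, 133 losses, 50 draws
diff = elo_difference(15, 133, 50)
lower, upper = sprt_bounds()
```

A score of 40 points out of 198 games (about 0.202) works out to roughly -238.6 Elo from the old version's side, which matches the log. The LLR of -2.95 dropped just below the lower bound, which is why the test stopped after only 198 games.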

Madeleine Birchfield · Sat Nov 21, 2020 7:04 am

tmokonen wrote: ↑Sat Nov 21, 2020 6:53 am Well, Dany, it's stupidly easy to add NNUE to an engine. I just did it with Tony's Chess in, what, half an hour of work, if that. The results, using the net file nn-c3ca321c51c9.nnue:

Finished game 198 (TonysTest vs TonysChess004r29): 1-0 {White mates}
Score of TonysChess004r29 vs TonysTest: 15 - 133 - 50 [0.202] 198
ELO difference: -238.64 +/- 47.55
SPRT: llr -2.95, lbound -2.94, ubound 2.94 - H0 was accepted
Finished match

Wow, it passes SPRT after 198 games, with an approximately 200 Elo improvement. This test was at a ridiculously fast time control of 1s + .01s, and not even using incremental NNUE probing, which would be more efficient. I had to compile to 64 bit, as 32 bit is about 1/10th the NPS.

The strength difference between the last non-NNUE dev version TonysChess004r29 and Tony's Chess 0.03 is about the same 200 or so Elo, and that was hours and hours of programming and testing, over more than a year. Improvement via Stockfish's NNUE is easy and effective, and fun to play with, but it does kinda feel like cheating.

The harder part comes in training your own net for TonysTest instead of relying on Stockfish's network.

tmokonen · Post by **tmokonen** » Sat Nov 21, 2020 7:32 am

Madeleine Birchfield wrote: ↑Sat Nov 21, 2020 7:04 am The harder part comes in training your own net for TonysTest instead of relying on Stockfish's network.

Of course. This was just an experiment, and not something I will include in my final version. Plus, I am teasing Dany a little bit, as we were chatting online about NNUE earlier in the day.

Daniel Anulliero · Post by **Daniel Anulliero** » Sat Nov 21, 2020 10:26 am

tmokonen wrote: ↑Sat Nov 21, 2020 7:32 am
Madeleine Birchfield wrote: ↑Sat Nov 21, 2020 7:04 am The harder part comes in training your own net for TonysTest instead of relying on Stockfish's network.
Of course. This was just an experiment, and not something I will include in my final version. Plus, I am teasing Dany a little bit, as we were chatting online about NNUE earlier in the day.

Please do it with Isa now !
