SF-NNUE going forward...

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

towforce
Posts: 11568
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK

Re: SF-NNUE going forward...

Post by towforce »

Zenmastur wrote: Mon Jul 27, 2020 6:30 am Depth 50? I thought about higher depths, but 50 plies is a pipe dream at best. The depths used thus far do seem way too shallow, but because of the number of positions needed, I'm not sure how much deeper you can go and still produce sufficient quantities. IIRC SF at around the low teens seems to be a sweet spot for speed vs depth. I'd have to go back and look at my data, as I don't recall exactly why I drew this conclusion. It had something to do with mass analyzing and/or playing games to depth "x" for use in an opening book. I used a few tricks to speed things up a bit. But greater depth is definitely something to try. It will be interesting to see how much better a net gets just because the positions have been searched deeper. I'll be very surprised if there is a large improvement.

Regards,

Zenmastur

I know that layers of neurons in an NN are very different in many ways from the degrees of a polynomial, and it's naughty to compare the two, but given that I have a lot more expertise in polynomials than in NNs, and that I'm not a well behaved person, I'm going to do it anyway! :)

The normal way to fit polynomials is to minimise the sum of squared differences between the data points and the polynomial's value at those points. The higher the degree of the polynomial, the better the fit will be - but this comes at a price: as you add extra degrees to the polynomial, the resulting curve gets lumpy - and it's inevitable that some of the lumps are going to appear in places where you don't want them. The point is very well made in the polynomial fitting tool here. As you increase the degree of the polynomial on the tool (second control on the chart - a green right arrow), you can see the fit getting better and better!

However - when you get to degree 8 and a perfect fit, you can see that the polynomial that made the fit is lumpy and wavy. If your data is exactly right, then this could well be correct - but usually the data is not exactly right, and in the case of chess position evaluations, which should be one of { win, draw, lose }, it ABSOLUTELY IS NOT exactly right!

This is why everyone says that using the lowest degree polynomial possible is the way to go, accepting some differences between the data points and the polynomial.

If the "polynomial degree is like an NN layer" analogy has any truth at all, then too many layers would result in a lot of over-fitting, and an outcome curve that would be much too lumpy in places where you wouldn't want lumps.
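The overfitting effect is easy to demonstrate numerically. The sketch below is my own toy example (the sine signal, the noise level, and the degrees chosen are arbitrary stand-ins for noisy evaluation labels): it fits 9 noisy points with polynomials of increasing degree and compares the error on the data points with the error on a held-out grid.

```python
# Toy demonstration of polynomial overfitting: higher degree always fits
# the training points better, but not the underlying signal.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 9)                      # 9 observed points
y = np.sin(2 * x) + rng.normal(0, 0.1, x.size) # noisy labels (like WDL data)

x_test = np.linspace(-1, 1, 101)               # held-out grid
y_test = np.sin(2 * x_test)                    # the true signal, no noise

train_mse, test_mse = {}, {}
for degree in (2, 4, 8):
    coeffs = np.polyfit(x, y, degree)          # least-squares polynomial fit
    train_mse[degree] = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_mse[degree] = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse[degree]:.2e}, "
          f"test MSE {test_mse[degree]:.2e}")
```

With 9 points, the degree-8 fit interpolates the data exactly - training error collapses to numerical zero - yet the error against the true signal does not improve correspondingly: the "lumps" land between the data points.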
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
Your brain doesn't work the way you want, so train it!
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: SF-NNUE going forward...

Post by Zenmastur »

Ovyron wrote: Mon Jul 27, 2020 7:35 am
Zenmastur wrote: Mon Jul 27, 2020 6:30 am I have noticed a marked difference in the evaluation of endgames when using TBs. I'm not sure exactly what the cause of the difference is, but it seems I always get a much better score when they analyze for the same period of time. I realize there is a speed difference, but this doesn't seem to account for the rather large differences I've seen.
What I mean is a difference at the "game result" level. It doesn't matter if 3.00 is inaccurate and 1.00 is accurate and you spend time training the net so it shows 1.00, if the result of the game is the same.

What you need to show is a position where NNUE loses, or misses a win (or something like that) because of the eval difference, not a big eval difference that still produces the same game result.
I've run several matches at various time controls against various Sergio nets. The SF11 06-24 version seems to beat all of the ones I've tried when no book is supplied and very quick time controls are used. SF is only showing a 4% speed advantage in NPS over the nets I've tested, but it's crushing them in actual game scores, e.g.

Code: Select all

 1.  SF-NNUE-0332             	2388.0/4998	1171-1393-2434 
 2.  SF11-0624                  2846.5/4997	1717-1021-2259 
 3.  SF-NNUE-2141             	2351.0/4997	1146-1441-2410 
 4.  SF-NNUE-1134             	2408.5/4996	1218-1397-2381 
With book openings and fast time controls, SF seems a little behind the curve. At longer time controls (60”+0.6”) without books the gap closes, but SF still leads. This leads me to believe that the opening phase isn't well handled by the current nets. If they aren't supplied book openings, their play suffers quite a bit compared to SF.

I haven't as yet analyzed any of these games, but I'm expecting SF to be better out of the opening and better in the endgames. In any case, it's clear that without the aid of books these nets are getting crushed. We'll have to wait until I take a closer look to see why they are losing.

I'm also wondering about their endgame play. If all the positions used to train the net are searched to the same depth, this would seem to put them at a disadvantage, simply because in a normal game endgame positions get searched much deeper than positions in the opening.
Zenmastur wrote: Mon Jul 27, 2020 6:30 am So you think the net size is good enough as is?
Ovyron wrote: Mon Jul 27, 2020 7:35 am Currently there's 20MB nets and 30MB nets and that's a 50% increase in size for no improvement that I've seen.
hmmm...
Is someone producing 30MB nets with similar quality to Sergio's 20MB ones? I haven't seen any, but I haven't looked.
Zenmastur wrote: Mon Jul 27, 2020 6:30 am IIRC SF at around the low teens seems to be a sweet spot for speed vs depth.
Ovyron wrote: Mon Jul 27, 2020 7:35 am You want speed to analyze more positions. Double the speed, double the positions. The problem I'm seeing is that after some positions it plateaus and there's diminishing results so more positions don't help. You could as well analyze the same ones with more depth.
I did a test on my system to see how fast I can analyze to some depth. It looks like I can analyze a little over 20 positions a second to depth 14 per core used. This is about 1.75M per day per core. To generate 5 billion positions per week would require about 413 cores running 24/7, or about 13 x 32-core Threadrippers. This is clearly doable, since SF's testing network generally has over 1,000 cores running at any one time. You would only get one new net per week, but considering the rate at which the nets are being tested, this seems plenty fast enough if you get improved nets due to higher search depths.
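That arithmetic can be checked directly, taking the measured 20 positions/s/core at depth 14 and the 5-billion-per-week target as the stated assumptions:

```python
# Back-of-the-envelope check of the generation-rate figures above.
positions_per_sec_per_core = 20                          # measured at depth 14
per_day_per_core = positions_per_sec_per_core * 86_400   # 86,400 s in a day
per_week_per_core = per_day_per_core * 7

target_per_week = 5_000_000_000                          # 5 billion positions
cores_needed = target_per_week / per_week_per_core
threadrippers = cores_needed / 32                        # 32-core machines

print(f"{per_day_per_core:,} positions/day/core")        # 1,728,000 (~1.73M)
print(f"{cores_needed:.0f} cores, or about {threadrippers:.0f} 32-core machines")
```

This reproduces the figures in the post: roughly 1.73M positions/day/core, about 413 cores, about 13 machines.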

Ovyron wrote: Mon Jul 27, 2020 7:35 am We're currently in a crisis at which 60% performance against Stockfish 11 hasn't been achieved by any net, if more positions don't help and more depth doesn't help, and bigger net doesn't help, then, yeah, let's start trying other things like multiple nets for game stage.
A crisis???

A little melodramatic, don't you think?

I haven't tested all Sergio's nets yet. I plan on doing a round robin with most of the early nets to see where the progress seems to slow to a crawl. I'm assuming that each net builds on the last net by additional training. Assuming this is true, then it should become clear when the net starts to saturate.

Regards,

Zenmastur
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: SF-NNUE going forward...

Post by Ovyron »

Rowen wrote: Mon Jul 27, 2020 9:09 am Hi
Perhaps my presumptions are incorrect, but could specialised nets be created that train an engine to play like a human or humans with a particular strength, personality, or characteristic, or play like Tal, etc.?
Thanks
Sorry, that's not how it works. You can't make a net like a move; you can only show it the move at some arbitrary depth, and refute it.

So if you got all the positions that Tal played in his life and trained a net at big depth on them, instead of producing a net that plays like Tal, you'd end up with a net that'd destroy Tal. An Anti-Tal net.

It's the same with any other entity: whatever you train it against, it'll learn to defeat it. So we don't even know if Stockfish NNUE plays better chess than Stockfish, but it has learned to defeat Stockfish (they're mostly anti-Stockfish nets!). That's why people report 90 Elo advantages and such.

What happens is that Stockfish is so strong that being good at defeating it creates things good at defeating everything else. But I haven't seen a test showing that, say, Stockfish NNUE is better at beating Leela than regular Stockfish is; people are just abandoning Leela because Leela is worse than NNUE at beating Stockfish.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: SF-NNUE going forward...

Post by Ovyron »

Zenmastur wrote: Mon Jul 27, 2020 8:59 pm A little melodramatic, don't you think?

I haven't tested all Sergio's nets yet. I plan on doing a round robin with most of the early nets to see where the progress seems to slow to a crawl. I'm assuming that each net builds on the last net by additional training. Assuming this is true, then it should become clear when the net starts to saturate.
According to Rebel it happened at 2210:
Rebel wrote: Sat Jul 25, 2020 8:56 pm I am giving up for the moment testing Sergio nets, although (just) 2000 games is far from accurate the magic seems to have gone quite quickly after the first releases. I have tested 22 versions. The wait is for the SF team, they have more man and computer power. Can't wait....

Code: Select all

SF-NNUE (popcount) vs Stockfish 11, 2000 games, tc 40m/20s, input 8moves.pgn
henk-2706     54.7%   2020-07-19
sergio-1432   56.1%   2020-07-21 15:56
sergio-1907   58.9%   2020-07-21 19:14
sergio-2323   58.4%   2020-07-21 23:23
sergio-0359   58.8%   2020-07-22 03:59
sergio-0944   58.9%   2020-07-22 09:44
sergio-1153   58.3%   2020-07-22 11:53
sergio-1807   58.0%   2020-07-22 18:07
sergio-2210   59.0%   2020-07-22 22:10
sergio-0511   58.2%   2020-07-23 05:11
sergio-1134   56.5%   2020-07-23 11:34 with 4moves.pgn 58.7%
sergio-1844   57.8%   2020-07-23 18:44 
sergio-1843   58.7%   2020-07-23 18:43 
sergio-0123   57.2%   2020-07-24 01:23 
sergio-0640   56.7%   2020-07-24 06:40 
sergio-1240   57.6%   2020-07-24 12:40
sergio-1224   58.1%   2020-07-24 12:24
sergio-1732   58.7%   2020-07-24 17:32
sergio-2344   57.2%   2020-07-24 23:54
ribbit-0.1    58.7% = 1134 (!!)
sergio-1313   58.5%   2020-07-25 13:13
sergio-2242   57.5%   2020-07-25 22:42
Note that the experimental versions are labeled with the time of day on the webpage.
Later nets were good at beating each other, but not Stockfish 11. People upgrade because staying on an older net means someone with an updated net could beat you. But against non-nets there's a crisis.
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: SF-NNUE going forward...

Post by Zenmastur »

Ovyron wrote: Mon Jul 27, 2020 11:16 pm
Zenmastur wrote: Mon Jul 27, 2020 8:59 pm A little melodramatic, don't you think?

I haven't tested all Sergio's nets yet. I plan on doing a round robin with most of the early nets to see where the progress seems to slow to a crawl. I'm assuming that each net builds on the last net by additional training. Assuming this is true, then it should become clear when the net starts to saturate.
According to Rebel it happened at 2210:
Rebel wrote: Sat Jul 25, 2020 8:56 pm I am giving up for the moment testing Sergio nets, although (just) 2000 games is far from accurate the magic seems to have gone quite quickly after the first releases. I have tested 22 versions. The wait is for the SF team, they have more man and computer power. Can't wait....

Code: Select all

SF-NNUE (popcount) vs Stockfish 11, 2000 games, tc 40m/20s, input 8moves.pgn
henk-2706     54.7%   2020-07-19
sergio-1432   56.1%   2020-07-21 15:56
sergio-1907   58.9%   2020-07-21 19:14
sergio-2323   58.4%   2020-07-21 23:23
sergio-0359   58.8%   2020-07-22 03:59
sergio-0944   58.9%   2020-07-22 09:44
sergio-1153   58.3%   2020-07-22 11:53
sergio-1807   58.0%   2020-07-22 18:07
sergio-2210   59.0%   2020-07-22 22:10
sergio-0511   58.2%   2020-07-23 05:11
sergio-1134   56.5%   2020-07-23 11:34 with 4moves.pgn 58.7%
sergio-1844   57.8%   2020-07-23 18:44 
sergio-1843   58.7%   2020-07-23 18:43 
sergio-0123   57.2%   2020-07-24 01:23 
sergio-0640   56.7%   2020-07-24 06:40 
sergio-1240   57.6%   2020-07-24 12:40
sergio-1224   58.1%   2020-07-24 12:24
sergio-1732   58.7%   2020-07-24 17:32
sergio-2344   57.2%   2020-07-24 23:54
ribbit-0.1    58.7% = 1134 (!!)
sergio-1313   58.5%   2020-07-25 13:13
sergio-2242   57.5%   2020-07-25 22:42
Note that the experimental versions are labeled with the time of day on the webpage.
Later nets were good at beating each other, but not Stockfish 11. People upgrade because staying on an older net means someone with an updated net could beat you. But against non-nets there's a crisis.
Well, if it's clear to you that the nets aren't improving against a "standard" opponent, namely SF, with more and more training, then why do you think the current net size is adequate?

It's clear to me from playing games without a book that SF plays superior chess, comparatively speaking, in the opening, even though SF isn't known for its playing strength in the opening. This seems to indicate that either a dedicated net for the opening phase is justified, or a larger net needs to be used. Larger nets take more time to process, so splitting the nets to match the phase of the game keeps the nets relatively fast.

Regards,

Zenmastur
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: SF-NNUE going forward...

Post by carldaman »

Others are showing almost constant improvement vs SF-dev.

http://talkchess.com/forum3/viewtopic.p ... 8&start=20

:|
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: SF-NNUE going forward...

Post by Ovyron »

Zenmastur wrote: Tue Jul 28, 2020 4:12 am It's clear to me from playing games without a book that SF plays superior chess, comparatively speaking, in the opening, even though SF isn't known for its playing strength in the opening.
My theory is that it's happening because the people creating the nets are only training on positions after an opening has been played. I.e., they don't have data on the starting position, or on the position after 1.e4, or the position after 1.d4 Nf6, etc. That would cause the net to play weaker than SF without a book.

Anyway, you could train a net on opening positions exclusively, then use software like Eman Chimera to start the game using Stockfish NNUE with your opening net and switch to the regular net after the opening is over, to see how much better it performs.
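That switch-by-phase idea can be sketched in a few lines. Everything below is hypothetical: the phase thresholds, the stand-in "nets", and the `evaluate` helper; real NNUE evaluation happens inside the engine, and tools like Eman Chimera would do the switching at the engine level.

```python
# Hypothetical sketch of phase-based net selection. The "nets" here are
# stand-in scoring functions, not real NNUE networks.

def game_phase(move_number: int, piece_count: int) -> str:
    """Crude phase classifier; the thresholds are arbitrary illustrations."""
    if move_number <= 12:
        return "opening"
    if piece_count <= 10:
        return "endgame"
    return "middlegame"

# One specialist "net" per phase, each reduced to a dummy scorer.
NETS = {
    "opening":    lambda pos: 0.10,
    "middlegame": lambda pos: 0.25,
    "endgame":    lambda pos: 0.05,
}

def evaluate(pos, move_number: int, piece_count: int) -> float:
    """Route the position to the net matching its phase and return its eval."""
    return NETS[game_phase(move_number, piece_count)](pos)

print(evaluate("startpos", move_number=1, piece_count=32))  # opening net fires
```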
Nay Lin Tun
Posts: 708
Joined: Mon Jan 16, 2012 6:34 am

Re: SF-NNUE going forward...

Post by Nay Lin Tun »

How do we introduce the neural network concept to the old, traditional chess engine community?
towforce
Posts: 11568
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK

Re: SF-NNUE going forward...

Post by towforce »

Ovyron wrote: Tue Jul 28, 2020 5:32 am Anyway, you could train a net on opening positions exclusively, then use software like Eman Chimera to start the game using Stockfish NNUE with your opening net and switch to the regular net after the opening is over, to see how much better it performs.

How about: multiple nets for different types of position, and a master net to choose which net will do the eval.
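That resembles a mixture-of-experts arrangement: a small gating ("master") net turns position features into weights over the specialist nets. A toy numerical sketch follows; every weight and feature vector here is invented purely for illustration.

```python
# Toy mixture-of-experts sketch of "a master net chooses the eval net".
import numpy as np

def softmax(z):
    z = z - z.max()                  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Three specialist "nets", each reduced to a linear scorer over 2 features.
specialists = np.array([
    [0.5, 0.1],                      # net A
    [0.2, 0.4],                      # net B
    [0.1, 0.6],                      # net C
])

# Master (gating) net: maps features to one logit per specialist.
gate = np.array([[ 1.0, -1.0],
                 [ 0.0,  1.0],
                 [-1.0,  1.0]])

def evaluate(features):
    weights = softmax(gate @ features)   # how much to trust each specialist
    scores = specialists @ features      # each specialist's raw eval
    return float(weights @ scores)       # blended final eval

print(evaluate(np.array([1.0, 0.0])))
```

A hard gate (taking the argmax of the weights) reproduces the "choose one net" version; the soft blend is how such gates are usually trained, since it keeps everything differentiable.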
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
Your brain doesn't work the way you want, so train it!
Rowen
Posts: 74
Joined: Tue Nov 15, 2016 1:19 pm
Location: Cheshire, England

Re: SF-NNUE going forward...

Post by Rowen »

Ovyron wrote: Mon Jul 27, 2020 10:44 pm
Rowen wrote: Mon Jul 27, 2020 9:09 am Hi
Perhaps my presumptions are incorrect, but could specialised nets be created that train an engine to play like a human or humans with a particular strength, personality, or characteristic, or play like Tal, etc.?
Thanks
Sorry, that's not how it works. You can't make a net like a move; you can only show it the move at some arbitrary depth, and refute it.

So if you got all the positions that Tal played in his life and trained a net at big depth on them, instead of producing a net that plays like Tal, you'd end up with a net that'd destroy Tal. An Anti-Tal net.

It's the same with any other entity: whatever you train it against, it'll learn to defeat it. So we don't even know if Stockfish NNUE plays better chess than Stockfish, but it has learned to defeat Stockfish (they're mostly anti-Stockfish nets!). That's why people report 90 Elo advantages and such.

What happens is that Stockfish is so strong that being good at defeating it creates things good at defeating everything else. But I haven't seen a test showing that, say, Stockfish NNUE is better at beating Leela than regular Stockfish is; people are just abandoning Leela because Leela is worse than NNUE at beating Stockfish.
@Ovyron and @towforce: thanks for the interesting information. I thought I knew little about nets; it seems I know even less than that! OK, so from now on I will think of nets as data containing 'improvements', or simply good, or at least better, moves. Most of my 'research' is limited to articles I have seen on the ChessBase website concerning Leela, Fat Fritz, etc. I should probably do some reading around the subject.
Thanks