SF+NNUE reach the ceiling?

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: SF+NNUE reach the ceiling?

Post by mwyoung »

peter wrote: Thu Aug 27, 2020 1:36 am
mwyoung wrote: Thu Aug 27, 2020 1:28 am It is a book of elite GM games played to 6 moves. Standard stuff for testing. If the GMs played it, it is in the book.
There isn't any standard stuff for testing nowadays at all when it comes to openings.
If you give 6 moves of all the moves GMs played, with so small a number of games, the chance is big that some of the positions (probably repeated with alternate colours) are better for LC0, and before the ones better for SF come along, the match is already over. (Of course that could work to SF's advantage as well.)

Who tells you LC0 doesn't like GM moves more (or less) than SF NNUE does?
Why exactly 6 moves?
Why not 7, 8, 9, 10, 5, 4, 3, 2?
Bookless games would be even more interesting, to see how the engines really succeed with openings of their own, wouldn't they?
Who cares for doublets, if it's for counting the points only anyhow?
:)
Still just to give some more or less provocative thoughts about Elo in computerchess in modern times.
I don't know how to make this clear. I test nonstop and have many, many games. And they are posted here.

If you would like to test the engines to your liking, with no book, a gambit book, or a 30-move book, go for it! And post it here.

I test with 6 moves because it gives you many more unique positions than 1, 2, 3, 4 or 5, and still lets us test the engines in the opening. There is nothing wrong with testing the engines to 12 moves, other than that human theory is often bad, and more so the bigger the move count used.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: SF+NNUE reach the ceiling?

Post by corres »

peter wrote: Wed Aug 26, 2020 10:48 pm ...
If you are already convinced, these 4 weeks were all to be expected at the utmost, you'll probably make a self- fulfilling prophecy for your own use of NNUE, but as well as for LC0- like nets and engines, for PUCT and MCTS and A-B-search and all of these things, that have reached such a high elo-performance nowadays.
Let it be developed for game- playing, let the engines play and watch them draw and forget about the rest. As Chrilly Donninger used to say: "Like watching the washing machine doing the laundry".
I do not know how many years the Japanese developers worked on shogi, but I am sure it was much more than four weeks. Without them there would be no SF+NNUE.
I agree with you that, from the viewpoint of playing chess, driving only for higher and higher Elo is a rather dull thing, because it will only increase the number of draws. But chess is chess: it is a draw-type game. And there is a distant aim too:
solving chess, or at least reaching "perfect" play.
peter
Posts: 3533
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: SF+NNUE reach the ceiling?

Post by peter »

corres wrote: Thu Aug 27, 2020 8:25 am I agree with you that, from the viewpoint of playing chess, driving only for higher and higher Elo is a rather dull thing, because it will only increase the number of draws. But chess is chess: it is a draw-type game. And there is a distant aim too:
solving chess, or at least reaching "perfect" play.
Agreed too!
What I want to point out is that there isn't a single way of counting Elo points anymore in computer chess; if there ever was one, it has always been arbitrarily chosen which openings, hardware-TC and engine pools were "standard".
Nobody knows whether gaining more and more Elo points from smaller and smaller "standard" matches isn't simply ongoing Elo production, with less and less meaning for the human use of engines and for human chess itself.

Solving chess and reaching "perfect" play is quite a different task from developing engines for high eng-eng Elo only, and it will always be a human task to judge "solved" and "perfect". Engines can give outputs and match results; if man declares a single position, a game or the whole of chess to be solved, it's a subjective judgement only, as is the interpretation of any Elo result, as long as we don't have 32-men tbs.

I know many players (who have therefore stopped playing, e.g. in corr. chess) who have considered chess to be solved from their own pov for quite a long time anyhow, at least as far as computer chess is concerned.
Look at the engine output for the starting position; engines can easily give you an eval in WDL probabilities nowadays. "Normal gamers" wouldn't start to play with such a low winning probability at all, would they?
Yet a draw can be as thrilling as a win, even more so if it's a thrilling draw and a boring win. The problem with computer chess is this: if you can't judge one engine or version to be stronger or weaker than the other, because there are too many draws and the difference in performance is so small that it stays inside the statistical confidence interval of a given eng-eng match, there can still be much thrill in the single games, draws or wins, but it doesn't answer your question of which engine or version is "stronger" or "weaker".
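
The point about draws and confidence intervals can be made concrete with a small sketch. It is not from the thread: the function names, the normal approximation and the example W/D/L counts are all assumptions chosen for illustration, and real testing tools may compute their error bars differently.

```python
import math


def score_to_elo(score: float) -> float:
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) for a match result, using a normal approximation.

    z = 1.96 corresponds to a roughly 95% confidence interval.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)

    def clamp(s: float) -> float:
        # Keep the score strictly inside (0, 1) so the logarithm is defined.
        return min(max(s, 1e-9), 1.0 - 1e-9)

    return (score_to_elo(clamp(score)),
            score_to_elo(clamp(score - margin)),
            score_to_elo(clamp(score + margin)))


# Hypothetical 1000-game match with 90% draws (made-up numbers).
elo, low, high = elo_with_interval(wins=55, draws=900, losses=45)
print(f"Elo {elo:+.1f}, interval [{low:+.1f}, {high:+.1f}]")
```

With these made-up numbers the interval straddles zero, so the match on its own does not say which engine is stronger, however exciting the individual games were.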

:) regards
Peter.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: SF+NNUE reach the ceiling?

Post by corres »

peter wrote: Thu Aug 27, 2020 9:21 am ...
I know many players (who have therefore stopped playing, e.g. in corr. chess) who have considered chess to be solved from their own pov for quite a long time anyhow, at least as far as computer chess is concerned.
Look at the engine output for the starting position; engines can easily give you an eval in WDL probabilities nowadays. "Normal gamers" wouldn't start to play with such a low winning probability at all, would they?
Yet a draw can be as thrilling as a win, even more so if it's a thrilling draw and a boring win. The problem with computer chess is this: if you can't judge one engine or version to be stronger or weaker than the other, because there are too many draws and the difference in performance is so small that it stays inside the statistical confidence interval of a given eng-eng match, there can still be much thrill in the single games, draws or wins, but it doesn't answer your question of which engine or version is "stronger" or "weaker".
Since about the end of the 1900s it has been obvious that there is no sense in human GMs and computers playing in the same group, just as it is senseless to run a competition between a human and a car, or a horse and a train.
Nowadays only OTB chess games have real value (without cheating). Computers and chess engines are only auxiliary tools for preparing GM matches. For an average (or weaker) chess player, what remains is watching engine-engine games on a monitor.
I prefer playing a good video game, because it is a more active pastime than watching a monitor or a TV (gazing at a TV series).
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: SF+NNUE reach the ceiling?

Post by corres »

corres wrote: Wed Aug 26, 2020 8:25 pm As we could suppose, the possibilities for enhancing the Elo of SF+NNUE are limited, because the chess knowledge of every AB engine is also restricted, and reinforcement learning cannot give additional information to the NNUE net; it can only sharpen the "picture", which can add only a limited Elo enhancement.
It is a reminder...
Jhoravi
Posts: 291
Joined: Wed May 08, 2013 6:49 am

Re: SF+NNUE reach the ceiling?

Post by Jhoravi »

Isn't it that in the classical evaluation function, the positions evaluated at the leaf are just quiet positions because all captures and threats are already exhausted during quiescence search? With that in mind, I wonder if the position statistics stored in the NNUE Net are also just quiet positions instead of all positions.
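
A minimal sketch of quiescence search may help frame the question. It is a toy model, not Stockfish code: the `Node` class, the position names and the centipawn values are hypothetical, and only captures are searched. It records which positions get a static evaluation, and shows that in this common formulation the "stand pat" evaluation is also called on positions that still have captures available, not only on the final quiet ones. It says nothing about how NNUE training positions were actually selected.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy position: a name, a static eval and the positions reachable by captures."""
    name: str
    static_eval: int  # centipawns, from the side to move's point of view
    captures: List["Node"] = field(default_factory=list)


def quiescence(node: Node, alpha: float, beta: float, evaluated: List[str]) -> float:
    """Negamax quiescence search over capture moves only (fail-soft)."""
    evaluated.append(node.name)    # every visited node gets a static eval
    stand_pat = node.static_eval   # option of making no capture at all
    if stand_pat >= beta:
        return stand_pat
    best = stand_pat
    alpha = max(alpha, stand_pat)
    for child in node.captures:
        score = -quiescence(child, -beta, -alpha, evaluated)
        best = max(best, score)
        if score >= beta:
            break                  # beta cutoff
        alpha = max(alpha, score)
    return best


# Hypothetical capture sequence: a capture that loses material to a recapture.
quiet = Node("after recapture", -200)
noisy = Node("after capture", -300, [quiet])
root = Node("leaf of main search", 100, [noisy])

evaluated: List[str] = []
value = quiescence(root, float("-inf"), float("inf"), evaluated)
print(value, evaluated)
```

In this toy tree the search keeps the stand-pat value of the root because the capture loses material, yet all three positions were passed to the static evaluation along the way.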
pohl4711
Posts: 2907
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SF+NNUE reach the ceiling?

Post by pohl4711 »

peter wrote: Thu Aug 27, 2020 9:21 am
Yet a draw can be as thrilling as a win, even more so if it's a thrilling draw and a boring win. The problem with computer chess is this: if you can't judge one engine or version to be stronger or weaker than the other, because there are too many draws and the difference in performance is so small that it stays inside the statistical confidence interval of a given eng-eng match, there can still be much thrill in the single games, draws or wins, but it doesn't answer your question of which engine or version is "stronger" or "weaker".
That's exactly the point. Fortunately, Larry Kaufman and I have already solved this problem and eliminated draws in computer chess completely. And the statistics and Elo spreading are just mind-blowing!

https://www.sp-cc.de/nbsc-armageddon-openings.htm

When you look at the test results and the comparison to classical opening sets, it is just overwhelming. From my site:

As you can see, the NBSC-Armageddon results are "out-of-this-world": no draws, of course, and an Elo spreading that is around 2.25x wider than with "classical" openings. The average Elo spreading of Balsa, SuperGM, Chad 8ply, SF Framework openings, HERT 500 and J.Noomen openings is 142 in this test series. The average Elo spreading of the NBSC Armageddon openings (3-5 moves) is 318 (!!!)
To make clear what this really means: because the NBSC Advanced Armageddon openings spread the engine ratings so much, the error bar you can accept for a statistically valid ranking of engines (Engine A better than B, better than C) is around 2.25x bigger than the error bar you need when using classical openings.

And note that you need around 4x more (!) games in order to halve the size of the error bar. That means you have to play only around 20%-25% of the games for the same statistical quality of test results when using NBSC Advanced Armageddon openings, compared to classical opening sets!!! So, all test work can be done in 75%-80% less time. Or, when you play the same number of games, the statistical quality of the results is way better than ever before!
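
The arithmetic behind the 20%-25% figure can be sketched as follows. The function name is hypothetical, and the sketch rests on two assumptions: the error bar shrinks like 1/sqrt(games), and an error bar that is wider in proportion to the Elo spread gives an equally reliable ranking. It ignores that the per-game variance itself changes when there are no draws.

```python
def relative_games_needed(spread_ratio: float) -> float:
    """Fraction of games needed when ratings are spread `spread_ratio` times wider.

    Assumes error bar ~ 1/sqrt(games), so a tolerated error bar that is
    k times wider needs only 1/k**2 as many games.
    """
    if spread_ratio <= 0:
        raise ValueError("spread_ratio must be positive")
    return 1.0 / spread_ratio ** 2


# Average Elo spreads quoted in the post above.
classical_spread = 142
armageddon_spread = 318

ratio = armageddon_spread / classical_spread
fraction = relative_games_needed(ratio)
print(f"spread ratio {ratio:.2f} -> about {fraction:.0%} of the games")
```

The same function reproduces the "4x more games to halve the error bar" rule: `relative_games_needed(0.5)` returns 4.0.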

And my new Advanced Armageddon scoring gives each of the 3 possible game endings (white win, draw, black win) a different score, like classical chess scoring. That is a huge improvement over classical Armageddon scoring (especially for engine chess, because the engines do not know that they are playing Armageddon), where a draw and a win for black score the same (1 point for black).
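
To illustrate the two scoring schemes that are fully described here, the sketch below scores the same list of results per colour under classical chess scoring and under classical Armageddon scoring. The table names, the `score_games` helper and the result list are hypothetical. The point values of the Advanced Armageddon scoring are not given in this post, so they are deliberately left out; a third table with three distinct rows could be passed in the same way.

```python
# Points as (white, black) for each PGN-style result string.
CLASSICAL = {
    "1-0": (1.0, 0.0),
    "1/2-1/2": (0.5, 0.5),
    "0-1": (0.0, 1.0),
}

# Classical Armageddon: a draw counts like a black win (1 point for black).
CLASSICAL_ARMAGEDDON = {
    "1-0": (1.0, 0.0),
    "1/2-1/2": (0.0, 1.0),
    "0-1": (0.0, 1.0),
}


def score_games(results, table):
    """Sum the points for White and Black over a list of result strings."""
    white = sum(table[r][0] for r in results)
    black = sum(table[r][1] for r in results)
    return white, black


# Made-up results: one white win, two draws, one black win.
results = ["1-0", "1/2-1/2", "1/2-1/2", "0-1"]

classical_score = score_games(results, CLASSICAL)
armageddon_score = score_games(results, CLASSICAL_ARMAGEDDON)
print("classical: ", classical_score)
print("armageddon:", armageddon_score)
```

Because the scoring is applied only after the games are finished, the same game records can be rescored under either table without replaying anything.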
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: SF+NNUE reach the ceiling?

Post by mmt »

SF NNUE should help better evaluate your NBSC variant and come up with more openings because different networks will play slightly differently. I looked at the games produced and they are exciting. I think it's a very promising idea to replace classical chess. Especially now that chess is getting some traction among streamers, it would help to get new viewers and good players who are most affected by too many draws should have a reason to promote it. Some more evaluation is needed and then you should try to popularize it everywhere if you want to become known as the famous inventors of a new game :). Come up with a catchy name, ask sites to offer this variant, make patches for GUIs to support it, write a paper like the Alpha Zero team did for the variants they evaluated, ask famous players to comment on it, make websites, write articles, post exciting games, tell streamers, post on forums, etc.
pohl4711
Posts: 2907
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SF+NNUE reach the ceiling?

Post by pohl4711 »

mmt wrote: Thu Sep 10, 2020 5:14 pm
SF NNUE should help better evaluate your NBSC variant and come up with more openings because different networks will play slightly differently. I looked at the games produced and they are exciting. I think it's a very promising idea to replace classical chess. Especially now that chess is getting some traction among streamers, it would help to get new viewers and good players who are most affected by too many draws should have a reason to promote it. Some more evaluation is needed and then you should try to popularize it everywhere if you want to become known as the famous inventors of a new game :). Come up with a catchy name, ask sites to offer this variant, make patches for GUIs to support it, write a paper like the Alpha Zero team did for the variants they evaluated, ask famous players to comment on it, make websites, write articles, post exciting games, tell streamers, post on forums, etc.
There is no need for any "patches for GUIs to support it", because NBSC Armageddon openings are not a chess variant. And so, it will not "replace classical chess". You don't understand the concept. NBSC Armageddon is classical chess! Each opening line starts from the chess starting position, and then normal (legal) chess moves follow. The only difference from classical chess is the scoring system, which is applied only after all games are finished. So, all games are classical chess until they are finished. Nothing else!
Each ChessGUI and each chessengine on the planet can play the NBSC Armageddon openings, without any changes or patches.
Chess960 is a chess variant (for example). NBSC Armageddon openings are not!
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: SF+NNUE reach the ceiling?

Post by mmt »

pohl4711 wrote: Thu Sep 10, 2020 7:37 pm There is no need for any "patches for GUIs to support it", because NBSC Armageddon openings are not a chess variant. And so, it will not "replace classical chess". You don't understand the concept. NBSC Armageddon is classical chess! Each opening line starts from the chess starting position, and then normal (legal) chess moves follow. The only difference from classical chess is the scoring system, which is applied only after all games are finished. So, all games are classical chess until they are finished. Nothing else!
Each ChessGUI and each chessengine on the planet can play the NBSC Armageddon openings, without any changes or patches.
Chess960 is a chess variant (for example). NBSC Armageddon openings are not!
You're missing the point. Read "Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess". One of the variants among others evaluated was no-castling chess. And since there is a change to the scoring rules with no-black castling (or no black short-castling) also, it is even more certainly a different variant. It is absolutely not classical chess. It can however replace classical chess and a lot of knowledge accumulated by playing classical chess can be reused.

Here is Kramnik making a distinction between classical chess and no-castling chess for example: "...possibly opening a game with 1. f4, which is not a great idea in classical chess..." But I'm sure a world champion for 6 years doesn't know what "classical chess" means?

First, you need a patch for GUIs for a new scoring system. Second, you need a patch so that when you click "New Game," you get an option to choose this variant and you need a clear indicator about which type of game you're playing. The GUI should also support new opening books. So you're also wrong that you don't need any changes. The good part is that these changes are relatively minor giving this variant a higher chance of being implemented.

I suggest you leave promotion of this variant to Larry Kaufman or somebody else - your writing style with exclamation points everywhere and arguing with the person who likes it shows that it's not your thing at all.