Question about "CuteChess"

Rebel · Post by **Rebel** » Sat Oct 16, 2021 9:07 pm

For cute starters

set ENGINE-ONE-NAME=Orion_0.7_NNUE
set ENGINE-ONE-EXE=Orion_0.7_NNUE.exe
set ENGINE-ONE-PROTOCOL=uci

set ENGINE-TWO-NAME=Komodo_14
set ENGINE-TWO-EXE=Komodo_14.exe
set ENGINE-TWO-PROTOCOL=uci

set THREADS=20
set RESTART=on

set TIME-CONTROL=15+0.1
set TIME-MARGIN=1000

set DRAW-MOVE-NUMBER=160
set DRAW-MOVE-COUNT=3
set DRAW-SCORE=100

set RESIGN-MOVE-COUNT=5
set RESIGN-SCORE=999

set GAMES-TO-PLAY=5000
set PGN-INPUT=8-moves.pgn
set PGN-INPUT-PLIES=20
set PGN-OUTPUT=all.pgn

del %PGN-OUTPUT%

cutechess-cli -concurrency %THREADS% -engine name=%ENGINE-ONE-NAME% cmd=%ENGINE-ONE-EXE% proto=%ENGINE-ONE-PROTOCOL% -engine name=%ENGINE-TWO-NAME% cmd=%ENGINE-TWO-EXE% proto=%ENGINE-TWO-PROTOCOL% -each restart=%RESTART% tc=%TIME-CONTROL% timemargin=%TIME-MARGIN% -draw movenumber=%DRAW-MOVE-NUMBER% movecount=%DRAW-MOVE-COUNT% score=%DRAW-SCORE% -resign movecount=%RESIGN-MOVE-COUNT% score=%RESIGN-SCORE% -rounds %GAMES-TO-PLAY% -repeat -pgnout %PGN-OUTPUT% -openings file=%PGN-INPUT% plies=%PGN-INPUT-PLIES%

Save as batch file (*.bat)

Jouni · Post by **Jouni** » Sat Oct 16, 2021 10:08 pm

I prefer using GUI. 1) Everything can be done inside GUI 2) You see games and see if something is wrong like nps.

AdminX · Post by **AdminX** » Sat Oct 16, 2021 10:17 pm

Rebel wrote: ↑Sat Oct 16, 2021 9:07 pm For cute starters

set ENGINE-ONE-NAME=Orion_0.7_NNUE
set ENGINE-ONE-EXE=Orion_0.7_NNUE.exe
set ENGINE-ONE-PROTOCOL=uci

set ENGINE-TWO-NAME=Komodo_14
set ENGINE-TWO-EXE=Komodo_14.exe
set ENGINE-TWO-PROTOCOL=uci

set THREADS=20
set RESTART=on

set TIME-CONTROL=15+0.1
set TIME-MARGIN=1000

set DRAW-MOVE-NUMBER=160
set DRAW-MOVE-COUNT=3
set DRAW-SCORE=100

set RESIGN-MOVE-COUNT=5
set RESIGN-SCORE=999

set GAMES-TO-PLAY=5000
set PGN-INPUT=8-moves.pgn
set PGN-INPUT-PLIES=20
set PGN-OUTPUT=all.pgn

del %PGN-OUTPUT%

cutechess-cli -concurrency %THREADS% -engine name=%ENGINE-ONE-NAME% cmd=%ENGINE-ONE-EXE% proto=%ENGINE-ONE-PROTOCOL% -engine name=%ENGINE-TWO-NAME% cmd=%ENGINE-TWO-EXE% proto=%ENGINE-TWO-PROTOCOL% -each restart=%RESTART% tc=%TIME-CONTROL% timemargin=%TIME-MARGIN% -draw movenumber=%DRAW-MOVE-NUMBER% movecount=%DRAW-MOVE-COUNT% score=%DRAW-SCORE% -resign movecount=%RESIGN-MOVE-COUNT% score=%RESIGN-SCORE% -rounds %GAMES-TO-PLAY% -repeat -pgnout %PGN-OUTPUT% -openings file=%PGN-INPUT% plies=%PGN-INPUT-PLIES%

Save as batch file (*.bat)

Thanks Ed.

Modern Times · Post by **Modern Times** » Sat Oct 16, 2021 10:23 pm

Jouni wrote: ↑Sat Oct 16, 2021 10:08 pm I prefer using GUI. 1) Everything can be done inside GUI 2) You see games and see if something is wrong like nps.

Agreed. Personally I want to see the engines playing in the GUI for extra assurance that all is well. And simply watch them if I want to.

lkaufman · Post by **lkaufman** » Sat Oct 16, 2021 11:29 pm

Modern Times wrote: ↑Sat Oct 16, 2021 10:23 pm
Jouni wrote: ↑Sat Oct 16, 2021 10:08 pm I prefer using GUI. 1) Everything can be done inside GUI 2) You see games and see if something is wrong like nps.
Agreed. Personally I want to see the engines playing in the GUI for extra assurance that all is well. And simply watch them if I want to.

Yes indeed. I'll probably never use Little Blitzer again!

ChickenLogic · Post by **ChickenLogic** » Sun Oct 17, 2021 2:38 am

Jouni wrote: ↑Sat Oct 16, 2021 10:08 pm I prefer using GUI. 1) Everything can be done inside GUI 2) You see games and see if something is wrong like nps.

There is a severe difference between a couple of hundred games at long time controls while watching some of them and playing multiple thousands of games with 8 or more games going on at the same time. This is what 'testing' means to me. For this a script is simply better. A log file can be written too which can be viewed live as well. Thus checking for anything foul isn't exactly hard either.
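One way to keep an eye on a running script-driven tournament is to tally the results written so far. The sketch below is only an illustration: it assumes the PGN output file (here the `all.pgn` name from the batch file earlier in the thread) can be read while games are still being appended, and the helper names `tally_results` and `summarize` are made up for this example.

```python
import re
from collections import Counter

# Matches PGN result tags such as [Result "1-0"]; "*" marks an unfinished game.
RESULT_TAG = re.compile(r'^\[Result "(1-0|0-1|1/2-1/2|\*)"\]', re.MULTILINE)


def tally_results(pgn_text):
    """Count how often each result string appears in the PGN text."""
    return Counter(RESULT_TAG.findall(pgn_text))


def summarize(pgn_path="all.pgn"):
    """Print a one-line summary of the games written so far (path is an assumption)."""
    with open(pgn_path, encoding="utf-8", errors="replace") as handle:
        counts = tally_results(handle.read())
    games = sum(counts.values())
    print(f"{games} games: {counts['1-0']} white wins, "
          f"{counts['0-1']} black wins, {counts['1/2-1/2']} draws")
```

Calling `summarize()` from time to time gives a rough sanity check; a lopsided or frozen count is a hint to look at the games themselves.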

I had a script that asked for a path. It did check for binaries in that folder and used the names of the binaries to set up the tournament. It didn't take any more labor than putting the wanted engines in a folder and starting the script with a single command. I forgot to make a backup so naturally I lost it after getting rid of my old OS. Nowadays I mainly watch VLTC games and don't really do proper testing anymore so I didn't exactly have a reason to write it again.

That was before we had fancy .nnue and .bin files (and w/e else floats around) so all it took to get the names was a simple for loop. Now it may need some extra lines to ensure only the names of binaries are used.
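A rough Python sketch of such a folder-scanning script might look like the following. It is not the lost script, only an illustration of the idea: the suffix list, the function names, and the default values are assumptions, and the cutechess-cli options used are only the ones already shown in the batch file earlier in the thread.

```python
import os
from pathlib import Path

# Assumption: data files that sit next to engines and are not engines themselves.
NON_ENGINE_SUFFIXES = {".nnue", ".bin", ".txt", ".pgn", ".log", ".ini"}


def looks_like_engine(path):
    """Heuristic: a regular file that is an .exe or has the executable bit set."""
    suffix = path.suffix.lower()
    if not path.is_file() or suffix in NON_ENGINE_SUFFIXES:
        return False
    if suffix == ".exe":
        return True
    # On Windows every file reports as executable, so only trust this elsewhere.
    return os.name != "nt" and os.access(path, os.X_OK)


def find_engines(folder):
    """Return the engine binaries found in `folder`, sorted by file name."""
    return [p for p in sorted(Path(folder).iterdir()) if looks_like_engine(p)]


def build_command(folder, tc="15+0.1", concurrency=8, rounds=1000):
    """Assemble a cutechess-cli argument list with one -engine entry per binary."""
    cmd = ["cutechess-cli", "-concurrency", str(concurrency)]
    for path in find_engines(folder):
        # Use the file name without ".exe" as the engine name.
        name = path.stem if path.suffix.lower() == ".exe" else path.name
        cmd += ["-engine", f"name={name}", f"cmd={path}", "proto=uci"]
    cmd += ["-each", f"tc={tc}", "-rounds", str(rounds),
            "-repeat", "-pgnout", "all.pgn"]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd)`. Every engine is assumed to speak UCI here; a real script would need some way to mark engines that use a different protocol.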

I did assume that people working on Komodo would choose to save time for every test they queue over being able to watch the games live.

lkaufman · Post by **lkaufman** » Sun Oct 17, 2021 4:28 am

ChickenLogic wrote: ↑Sun Oct 17, 2021 2:38 am
Jouni wrote: ↑Sat Oct 16, 2021 10:08 pm I prefer using GUI. 1) Everything can be done inside GUI 2) You see games and see if something is wrong like nps .
There is a severe difference between a couple of hundred games at long time controls while watching some of them and playing multiple thousands of games with 8 or more games going on at the same time. This is what 'testing' means to me. For this a script is simply better. A log file can be written too which can be viewed live as well. Thus checking for anything foul isn't exactly hard either.

I had a script that asked for a path. It did check for binaries in that folder and used the names of the binaries to set up the tournament. It didn't take any more labor than putting the wanted engines in a folder and starting the script with a single command. I forgot to make a backup so naturally I lost it after getting rid of my old OS. Nowadays I mainly watch VLTC games and don't really do proper testing anymore so I didn't exactly have a reason to write it again.

That was before we had fancy .nnue and .bin files (and w/e else floats around) so all it took to get the names was a simple for loop. Now it may need some extra lines to ensure only the names of binaries are used.

I did assume that people working on Komodo would choose to save time for every test they queue over being able to watch the games live.

Most of our testing is done on our own Linux tester, which doesn't allow viewing of the games. But I do some testing on Windows myself on my own i9 laptop, and for those tests it is nice to see the games, even if they are at roughly one second a move or so, as I can quickly tell if I did something wrong or something is corrupted. This is usually for tests that we wouldn't typically do on our tester, such as tests against engines that don't even have Linux versions or with special conditions that aren't in our tester.

Stockfish apparently makes all decisions by rigorous rules and math, as far as I know, but we (Komodo) use much more human judgment, and in particular I am able to make some judgments as a GM, where it helps to actually watch games. We don't normally keep or even test ideas that seem wrong to me from a chess perspective, even if some test might show them to be worth an Elo point or two. Do any strong chess players make similar judgments for Stockfish? Sometimes we see things in Stockfish that just don't make sense to me; they must have tested well, but perhaps did so for some other reason than the actual merit of the idea itself. We would prefer not to promote illogical ideas that show a tiny Elo gain.

ChickenLogic · Post by **ChickenLogic** » Sun Oct 17, 2021 4:35 pm

For the watching games part: ok fair enough.

About the 'human judgement' part I'll have to 'side' with Stockfish's approach. Here's why:
Chess and chess engines are so complex that I doubt anybody has a complete enough understanding of what's happening to be able to judge what's going on.
Let's say we deem an idea to be violating our understanding of chess but test it anyway - what are the possible outcomes?

a) we're correct and the test result confirms our understanding to be correct (best case)

b) we're correct but the test doesn't agree (let's presume an STC test and an LTC test under fishtest conditions passed). What now? The test is precise enough for scientific discovery. However, under the assumption that our judgement is correct we can only conclude that there is another part in the code that is so flawed it enables another flawed idea. Alternatively both tests could've been a fluke but that's very unlikely.

c) we're incorrect in our understanding and the test confirms that our judgement is wrong. a) and c) may be indistinguishable for us since nobody likes to concede to be incorrect.

d) we're actually incorrect about an idea that we deem to be good and the test passed with statistical significance due to luck (worst case)

Especially after the introduction of NNUE I'd argue it's impossible to judge an idea. Search depends on evaluation and the NNUE evaluation (for better or for worse) contains millions and millions of parameters. And we do know that neural nets aren't flawless either so we've potentially got thousands of 'invalid ideas' in our evaluation to correct for in search.

With a purely mathematical approach we reduce it to:
a) idea works
b) idea doesn't work
with a very slight chance of being wrong.
Maybe we do accept flawed ideas. But is it necessarily bad? I don't think so. Stuff that doesn't work anymore constantly gets simplified away.
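To put a number on "a very slight chance of being wrong", here is a minimal sketch that turns a win/draw/loss count into an Elo estimate with an approximate 95% interval. It uses the standard logistic Elo formula and a normal approximation; it is not the sequential test fishtest actually runs, and the function names are made up for this example.

```python
import math


def elo_from_score(score):
    """Convert a score fraction to an Elo difference using the logistic model."""
    # Clamp so that a 0% or 100% score does not cause a math domain error.
    score = min(max(score, 1e-9), 1.0 - 1e-9)
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high): point estimate and an approximate 95% interval."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * score ** 2) / n
    margin = z * math.sqrt(variance / n)
    return (elo_from_score(score),
            elo_from_score(score - margin),
            elo_from_score(score + margin))
```

If the interval returned for a patch still includes zero, the result does not yet separate "idea works" from "idea doesn't work", which is why very large game counts are needed to resolve gains of an Elo point or two.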

lkaufman · Post by **lkaufman** » Sun Oct 17, 2021 6:03 pm

ChickenLogic wrote: ↑Sun Oct 17, 2021 4:35 pm For the watching games part: ok fair enough.

About the 'human judgement' part I'll have to 'side' with Stockfish's approach. Here's why:
Chess and chess engines are so complex that I doubt anybody has a complete enough understanding of what's happening to be able to judge what's going on.
Let's say we deem an idea to be violating our understanding of chess but test it anyway - what are the possible outcomes?

a) we're correct and the test result confirms our understanding to be correct (best case)

b) we're correct but the test doesn't agree (let's presume an STC test and an LTC test under fishtest conditions passed). What now? The test is precise enough for scientific discovery. However, under the assumption that our judgement is correct we can only conclude that there is another part in the code that is so flawed it enables another flawed idea. Alternatively both tests could've been a fluke but that's very unlikely.

c) we're incorrect in our understanding and the test confirms that our judgement is wrong. a) and c) may be indistinguishable for us since nobody likes to concede to be incorrect.

d) we're actually incorrect about an idea that we deem to be good and the test passed with statistical significance due to luck (worst case)

Especially after the introduction of NNUE I'd argue it's impossible to judge an idea. Search depends on evaluation and the NNUE evaluation (for better or for worse) contains millions and millions of parameters. And we do know that neural nets aren't flawless either so we've potentially got thousands of 'invalid ideas' in our evaluation to correct for in search.

With a purely mathematical approach we reduce it to:
a) idea works
b) idea doesn't work
with a very slight chance of being wrong.
Maybe we do accept flawed ideas. But is it necessarily bad? I don't think so. Stuff that doesn't work anymore constantly gets simplified away.

Stockfish probably has to operate this way: with so many people involved, you need rules. It seems to work fine for the stated goal of optimizing Elo. But it has bad side effects. Sometimes Stockfish can play quite poorly, usually in positions where it is unlikely to change the game result (but still possible). For example, Uri ran tests that showed that Stockfish actually sometimes drew or even lost games to much weaker engines despite receiving Queen odds (!!!). I can't imagine that this would ever happen to Komodo Dragon (short of an outright bug). Testing won't detect bad play in positions with a 99% win probability; that is where judgment is needed. We want Dragon to play well in any position that is not resignable; optimising Elo is not our sole goal. I guess that is the difference. NNUE does make it much harder to use judgment, but there are still places for it.
