top engines vs 2400 engine knight odds matches

lkaufman · Post by **lkaufman** » Mon Sep 16, 2019 4:39 am

I've been running a series of matches at the (human) World Rapid Championship time control of 15' + 10" between top engines (running on 7 threads on a 4.9 GHz 8-core i7, or, for Lc0, on an RTX 2080 with 2 CPU threads) and a single-thread engine rated 2415 on the CCRL 40/40 list (Pawny 0.2 64-bit), with the top engines giving knight odds (12 games without the b1 knight, 8 games without the g1 knight). I used a very short opening book I made to provide variety while keeping the evals for both sides pretty close to optimal. In each case I picked what I thought were the optimal settings for the strong engine for giving knight odds. My prior subjective opinion was that Lc0 using the old network 11248 (which beat GM Naroditsky rather badly in a blitz match at handicaps averaging around knight odds) would be very strong; that the current strong Lc0 network (I used the one Kai Laskos said was strongest in a recent post, J13b2.136) would be awful; that Stockfish (the very latest, Sept 15 2019) with maximum Contempt (100) would be pretty good; and that while both Komodo and Komodo MCTS with Contempt at 150 should be quite good, MCTS should be better. Here are the results:
Stockfish, Komodo (latest regular dev. version), and Lc0 11248 all defeated Pawny by the same margin, 13 to 7! Quite a coincidence.
Komodo MCTS (latest dev. version) won by a much wider margin, 16.5 to 3.5 (as reported elsewhere earlier). That is more than 150 Elo better than the other three engines, probably too wide a gap to be attributed to sampling error.
Lc0 J13b2.136 lost every game (well, 14 straight, at which point I stopped the test, as it was obvious that it had no chance). I expected this; networks trained after Resign was introduced into training simply can't play chess a piece down, they just want to lose and get it over with!
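To check the "more than 150 Elo" figure, a match score can be converted to an Elo difference with the usual logistic formula, D = -400 * log10(1/p - 1), where p is the fraction of points scored. A minimal Python sketch (the name `elo_from_score` is just illustrative); it ignores draws and sampling error, and a 0% or 100% score, like the 0/14 above, maps to an infinite difference:

```python
import math


def elo_from_score(points: float, games: int) -> float:
    """Elo difference implied by scoring `points` out of `games` (logistic model)."""
    p = points / games
    if p <= 0.0:
        return -math.inf  # a shutout has no finite Elo estimate
    if p >= 1.0:
        return math.inf
    return -400.0 * math.log10(1.0 / p - 1.0)


# Results reported above, from the odds-giving engine's point of view vs Pawny.
others = elo_from_score(13, 20)   # Stockfish, Komodo, Lc0 11248
kmcts = elo_from_score(16.5, 20)  # Komodo MCTS
print(f"13/20   -> {others:+.0f} Elo")
print(f"16.5/20 -> {kmcts:+.0f} Elo")
print(f"gap     -> {kmcts - others:.0f} Elo")
```

This prints roughly +108 and +269 Elo, a gap of about 162 Elo, consistent with the "more than 150" above. Both numbers are relative to Pawny under knight-odds conditions, not absolute ratings.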

It seems that I should have chosen an opponent rated around 2500 on the CCRL 40/40 list rather than around 2400. I did try one match with Gaviota 0.8 64-bit, which is probably a bit too strong (it beat Komodo MCTS 12 to 8 under the same conditions). If anyone wants to propose an alternative opponent closer to 2500 on that list, one that is free and easy to download and run in the Fritz GUI, please do so. It can be single-thread or four-thread, as long as the relevant rating on that list is near 2500.

Raphexon · Post by **Raphexon** » Mon Sep 16, 2019 7:17 pm

Can you tell me how KMCTS works at ultra long time control?
What happens when the MCTS hash is full?

lkaufman · Post by **lkaufman** » Mon Sep 16, 2019 9:15 pm

Raphexon wrote: ↑Mon Sep 16, 2019 7:17 pm Can you tell me how KMCTS works at ultra long time control?
What happens when the MCTS hash is full?


In timed play it should just go ahead and move when the MCTS hash is full. In infinite analysis mode it will simply stop changing anything and wait for the user to do something. The default size is enough for normal time limits on normal computers, but if you have very good hardware and/or are using time limits beyond standard tournament time limits, you should increase it. Since computers with many threads almost always also have a lot of memory, it should almost always be possible to set the MCTS hash large enough for your number of threads at anything up through overnight analysis, although I haven't actually confirmed this. I mean that I think the time needed to fill an MCTS hash using, say, half the memory on your machine would be more (probably much more) than 8 hours on any computer anyone would actually be likely to buy (maybe not some very old one with tiny RAM), but I'm not certain.
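The fill-time argument is essentially arithmetic: the hash holds about hash_size / bytes_per_node entries, the search adds new entries at some rate, and the fill time is their ratio. A rough Python sketch under loudly stated assumptions: `bytes_per_node` and `new_nodes_per_second` are hypothetical inputs (Komodo MCTS's actual node size and storage rate are not given in this thread), and the model assumes one new stored entry per node searched at a constant rate, with no reuse or replacement.

```python
def hours_to_fill(hash_bytes: int, bytes_per_node: int, new_nodes_per_second: float) -> float:
    """Rough hours until an MCTS hash of `hash_bytes` is full.

    Hypothetical model: every new node costs `bytes_per_node` bytes, and the
    search stores `new_nodes_per_second` new nodes per second, with no reuse.
    """
    capacity_nodes = hash_bytes / bytes_per_node
    seconds = capacity_nodes / new_nodes_per_second
    return seconds / 3600.0


def lasts_overnight(hash_bytes: int, bytes_per_node: int,
                    new_nodes_per_second: float, hours: float = 8.0) -> bool:
    """True if, under the same model, the hash would not fill within `hours`."""
    return hours_to_fill(hash_bytes, bytes_per_node, new_nodes_per_second) > hours
```

To use it, plug in the memory you would give the hash (for example, half your RAM) together with the node size and rate measured for your own engine and hardware; no real figures are assumed here.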

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Thu Sep 19, 2019 6:57 pm

You can try this with Stoofvlees as well (you should have a copy). I expect it to do even better than lc0.

lkaufman · Post by **lkaufman** » Fri Sep 20, 2019 3:39 am

Gian-Carlo Pascutto wrote: ↑Thu Sep 19, 2019 6:57 pm You can try this with Stoofvlees as well (you should have a copy). I expect it to do even better than lc0.

I've been too busy to look at it, but since you made this comment I'm motivated to check it out next week when I should have time. Is this the chess analog of the Leela Go situation, where the Zero versions that are strongest in normal Go are awful at high-handicap Go, while normal Leela is pretty good at it?

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Mon Sep 23, 2019 4:41 pm

lkaufman wrote: ↑Fri Sep 20, 2019 3:39 am
I've been too busy to check it out, but since you made this comment I'm motivated to check it out next week when I should have time. Is this the chess analog of the Leela Go situation where the Zero versions that are the strongest in normal Go are awful at high handicap GO, while normal Leela is pretty good at it?

Yes, it's a reasonable analogy.

lkaufman · Post by **lkaufman** » Mon Sep 23, 2019 6:32 pm

Gian-Carlo Pascutto wrote: ↑Mon Sep 23, 2019 4:41 pm
Yes, it's a reasonable analogy.

Thanks. Unfortunately it is just as hopeless at giving knight odds as the newer Lc0 networks are; they both just give away pieces almost as if they were playing giveaway chess. It's easy to see the reason: the eval for queen odds is less negative than the eval for pawn-and-two-moves odds (remove f7, play e4, White to move)!! Unless a program has some idea of the relative badness of being down different amounts of material, it cannot play sensibly when far behind. Let me know if you have a version that you think addresses this issue.
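This point can be turned into a quick sanity check: evaluate a few handicap starting positions and confirm that, from the point of view of the side giving odds, the eval gets worse as the handicap grows. A minimal Python sketch; the FENs are standard chess positions, but the evals must come from whatever engine you are testing (nothing here calls an engine), and the helper name `monotonicity_violations` is hypothetical.

```python
# Handicap starting positions, ordered from mildest to most severe for the giver.
HANDICAPS = [
    # Pawn and two moves: Black gives f7, White has already played e4 (White to move).
    ("pawn and two moves", "rnbqkbnr/ppppp1pp/8/8/4P3/8/PPPP1PPP/RNBQKBNR w KQkq - 0 1"),
    # Knight odds: White gives the b1 knight.
    ("knight odds", "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/R1BQKBNR w KQkq - 0 1"),
    # Queen odds: White gives the queen.
    ("queen odds", "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNB1KBNR w KQkq - 0 1"),
]


def monotonicity_violations(evals: list[float]) -> list[tuple[str, str]]:
    """Return (milder, harsher) pairs where the harsher handicap is NOT evaluated worse.

    evals[i] is the engine's eval (in pawns) for HANDICAPS[i], taken from the
    point of view of the side GIVING the odds. UCI engines report the score from
    the side to move's point of view, so for pawn and two moves (receiver to
    move) flip the sign before passing it in.
    """
    names = [name for name, _ in HANDICAPS]
    bad = []
    for i in range(len(evals)):
        for j in range(i + 1, len(evals)):
            if evals[j] >= evals[i]:  # a bigger handicap should look strictly worse
                bad.append((names[i], names[j]))
    return bad
```

An engine showing the behavior described above (queen odds judged less bad than pawn and two moves) would have at least the pair ("pawn and two moves", "queen odds") in its result.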

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Mon Sep 23, 2019 6:55 pm

lkaufman wrote: ↑Mon Sep 23, 2019 6:32 pm Thanks. Unfortunately it is just as hopeless at giving knight odds as the newer Lc0 networks are, they both just give away pieces almost as if they were playing giveaway chess. It's easy to see the reason: the eval for queen odds is less negative than the eval for pawn and two move odds (remove f7, play e4, WTM)!! Unless a program has some idea of the relative badness of being down different amounts of material, it cannot play sensibly when way behind. Let me know if you have a version that you think addresses this issue.

Thanks for testing.

I'm a bit surprised and need to think about this more.

lkaufman · Post by **lkaufman** » Mon Sep 23, 2019 7:29 pm

Gian-Carlo Pascutto wrote: ↑Mon Sep 23, 2019 6:55 pm
Thanks for testing.

I'm a bit surprised and need to think about this more.

Here is a simple test of whether an engine will be any good at giving handicaps or playing lost positions. Just compare the eval with the b1 knight off to the eval with both White knights off. The eval should be something like twice as bad when down two knights as when down one. The version of your engine I tested thinks it's actually better to be down two knights than one knight!!
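The two-knight test can be written down directly. A minimal Python sketch: the FENs are the standard starting position with White's b1 knight, and with both White knights, removed; `knight_odds_check` is a hypothetical helper that only compares numbers you obtain from the engine under test. White is to move in both positions, so a UCI score is already from the odds-giver's point of view.

```python
ONE_KNIGHT_OFF = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/R1BQKBNR w KQkq - 0 1"
TWO_KNIGHTS_OFF = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/R1BQKB1R w KQkq - 0 1"


def knight_odds_check(eval_one: float, eval_two: float) -> tuple[bool, float]:
    """Compare evals (pawns, White's view) for ONE_KNIGHT_OFF and TWO_KNIGHTS_OFF.

    Returns (passes, ratio): `passes` is True when two knights down is judged
    worse than one knight down, and `ratio` is eval_two / eval_one, which
    should be somewhere around 2 for an engine that handles handicaps sensibly.
    """
    if eval_one >= 0:
        raise ValueError("expected a negative eval when a knight down")
    return eval_two < eval_one, eval_two / eval_one


passes, ratio = knight_odds_check(-1.6, -2.4)  # raw evals quoted later in the thread
print(passes, round(ratio, 2))  # True 1.5
```

A version that prefers being two knights down, as described above, would return `passes == False`.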

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Mon Sep 23, 2019 9:11 pm

lkaufman wrote: ↑Mon Sep 23, 2019 7:29 pm The eval should be something like twice as bad when down two knights as when down one. The version I tested of your engine thinks it's actually better to be down two knights than one knight!!

The raw eval in the starting position is all right (-1.6 vs -2.4). I wonder if it gets confused with respect to development and castling rights in both positions.
