Devlog of Leorik

algerbrex · Post by **algerbrex** » Tue Jul 19, 2022 1:24 am

lithander wrote: ↑Mon Jul 18, 2022 11:29 pm I use the varied.bin that came with SCID and you can find it here: https://sourceforge.net/p/scid/code/ci/ ... ree/books/

I don't know much about opening books, but I was wondering if your book was maybe too small or otherwise not good at representing what the engines would face under tournament/testing conditions. In the extreme case no book is used at all and two equal-strength engines basically replay the same game over and over. If one engine happens to win this game from the starting position, then it would win all the time and it would look like this engine is much stronger. So I was just thinking maybe it's something to do with the openings...

I just ran a match at the same time controls you used: 5s + 500ms increment and Hash set to 50MB on an i7-9700K with 7 games in parallel, and got this result, which again seems more in line with expectations than your gauntlet results.
Code: Select all
Score of Leorik-2.2 vs blunder-8.0.0: 571 - 347 - 502  [0.579] 1420
...      Leorik-2.2 playing White: 313 - 153 - 244  [0.613] 710
...      Leorik-2.2 playing Black: 258 - 194 - 258  [0.545] 710
...      White vs Black: 507 - 411 - 502  [0.534] 1420
Elo difference: 55.3 +/- 14.6, LOS: 100.0 %, DrawRatio: 35.4 %

Interesting. I'll try using varied.bin to run some further tests, just in case it is the opening book.

Still not sure why my results are so skewed compared to yours, but I'll keep investigating, since it seems wrong to just chalk this up to fast time controls. It's hard for me to see how fast time controls alone could explain why two engines that seem to be 30 Elo apart for you are more than 100 Elo apart for me when all of our other conditions are the same. Especially since the gauntlet came out as expected: though Blunder 8.4.5 and 8.0.0 got crushed by most of the gauntlet, 8.4.5 gained 35 Elo, which is about what I'd expect for 8.4.5 being ~50 Elo stronger in self-play.
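For readers comparing the two sets of numbers, here is a minimal sketch of how an Elo difference and its error margin can be derived from a win/loss/draw record. It uses the standard logistic Elo formula and a normal approximation for the 95% interval; the function names are made up for illustration, and whether cutechess computes its margin exactly this way is an assumption, although the sketch lands close to the figures in the match output quoted above.

```python
import math

def elo_from_score(score):
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, losses, draws, z=1.959964):
    """Return (elo, margin) where margin is half the width of a ~95% interval.

    Assumption: each game is an independent sample worth 1, 0.5 or 0 points.
    """
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    # Per-game variance of the result around the mean score.
    var = (wins * (1.0 - mu) ** 2
           + losses * (0.0 - mu) ** 2
           + draws * (0.5 - mu) ** 2) / n
    sem = math.sqrt(var / n)  # standard error of the mean score
    low = elo_from_score(mu - z * sem)
    high = elo_from_score(mu + z * sem)
    return elo_from_score(mu), (high - low) / 2.0

# Leorik-2.2 vs blunder-8.0.0 from the quoted match: 571 - 347 - 502
elo, margin = elo_with_margin(571, 347, 502)
print(f"Elo difference: {elo:.1f} +/- {margin:.1f}")
```

With margins of this size, a 30 Elo versus 100 Elo discrepancy between two test setups is well outside what sampling noise alone would produce, which is why the test conditions themselves are worth checking.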

j.t. · Post by **j.t.** » Tue Jul 19, 2022 1:35 am

Mike Sherwin wrote: ↑Mon Jul 18, 2022 12:49 am I found myself in a game against a human player. Oh no, I didn't want to play a human yet! I get nervous when I play humans and my level of play drops precipitously.

On lichess you can disable the rating in the preferences, so that you won't be distracted by your own rating or the rating of your opponent. Or you can enable "Zen" mode, so that you only see the chess board when playing. Maybe that helps you when playing humans.

Mike Sherwin · Post by **Mike Sherwin** » Tue Jul 19, 2022 3:56 am

lithander wrote: ↑Mon Jul 18, 2022 11:44 pm
Mike Sherwin wrote: ↑Mon Jul 18, 2022 11:34 pm I posted the game already. I know that I did. But it is not showing so here it is again. But without comment because I lack the energy and will to do it all again.
Wow, you did it again! That's amazing. I have no idea how you know when to trade a rook for a bishop but apparently it was exactly what was needed in that situation. Pity you lost the extensive commentary... I would have loved to read that. Did you win first try?

Okay, so I got in a 20 minute nap. Got a bite to eat. And have my nightly cup of coffee cooling off. So now I will reveal my approach when I play Leorik.

1. d4 d5 nothing fancy just a solid game
2. c4 c6 but with some dynamics
3. Nc3 Nf6
4. Nf3 e6 semi-slav, very easy to play
5. e3 Nd7
6. Bd3 dc
7. Bc4 b5
8. Be2 a6 getting ready to push c5
9. e4 Bb4 i don't want b4, e5, PxN, PxN and open up the game for Leorik
10. e5 Nd5
11. Bd2 Bb7? but Bb7 was a bad move, should have played BxNc3 like I intended
12. 0-0? Bc3 Leorik did not find the best move
13. bc c5
14. a4 c4 not the best but I knew Leorik would not take advantage
15. Qb1 Bc6 all is well now
16. Bg5 Qb6
17. Qc2 h6
18. Bc1 0-0 had to castle before being denied
19. Ba3 Rfc8 the mighty a3 bishop has no helpers and no targets
20. Qd2 Nf8 making sure I have defenders
21. h4 Ng6
22. g3 Ne7 just searching for a plan
23. Bc5 Qb7 the mosquitoes are bad this time of year
24. Rfb1 Nb6 bite a horse, you know you want to
25. Nh2 Nd5 i'm coming for you mosquito
26. a5 Nd7 SF, a5 was a good move? Now I have no worries on the QS
27. Ba3 Qc7 mosquito flies away
28. Bd6 Qd8 damn mosquito is back and chases me into bedroom of naked woman
29. Re1 Nc7 not the best but I'm chasing that damn mosquito
30. Bh5 Ne8
31. Bb4 Nf8
32. Qc2 Rc7
33. Bg4 f5 not a good move f5 but I wanted to get something going on the KS
34. Bh5 g6 i offer that damn Mosquito a tasty bite that it could not resist
35. Bf8 Kf8
36. Bg6 Rg7
37. h5 Nc7 get the knight back in the game
38. Qc1 Qg5 my plan to eliminate the Skeleton King's archers
39. Qg5 hg
40. g4 Rg6 g4 justified my whole plan
41. hg f4 now only black has play
42. f3 Kg7 Darn I could have gone b4 immediately
43. Rdc1 Rc8
44. Rab8 Nd5 probably Leorik's losing move
45. Nf1 b4 b4 the winning move because it totally shuts down White's play
and I did not miss it the second time

I don't think I played as well in this game as I did in some of the other games. It is just that I am able to predict what Leorik will do to a large extent because I understand Leorik's programming. So I try to avoid positions it plays well. And I definitely avoid slugfests where the search sees what I would miss.

Leorik in my opinion played very human-like once again, with the exception that it traded its most valuable active pieces to win a small amount of material. A good human player would not do that. In the following position White has no attacks and no mobility on Black's side of the board while Black does. And those backward pawns and corralled knight are atrocious.

[fen]8/2r5/p3p1k1/Pb1nP1p1/1P1P1pP1/2p2P2/8/R1R1KN2 w - - 4 50 [/fen]

Mike Sherwin · Post by **Mike Sherwin** » Tue Jul 19, 2022 3:57 am

j.t. wrote: ↑Tue Jul 19, 2022 1:35 am
Mike Sherwin wrote: ↑Mon Jul 18, 2022 12:49 am I found myself in a game against a human player. Oh no, I didn't want to play a human yet! I get nervous when I play humans and my level of play drops precipitously.
On lichess you can disable the rating in the preferences, so that you won't be distracted by your own rating or the rating of your opponent. Or you can enable "Zen" mode, so that you only see the chess board when playing. Maybe that helps you when playing humans.

Thanks!

algerbrex · Post by **algerbrex** » Tue Jul 19, 2022 5:24 pm

I won't keep posting here to avoid cluttering up your dev log after this post, but after running a second test last night versus Leorik 2.2, with the command:

Code: Select all

cutechess-cli.exe ^
-pgnout leorik-gauntlet.pgn ^
-engine cmd=Leorik-2.2.exe ^
-engine cmd=blunder-7.6.0.exe ^
-engine cmd=blunder-8.0.0.exe ^
-engine cmd=blunder.exe ^
-each proto=uci option.Hash=16 tc=inf/5+0.5 book=varied.bin ^
-games 2 -rounds 400 -repeat 2 ^
-tournament gauntlet ^
-concurrency 8 ^
-ratinginterval 10 ^
-recover

The results seemed a bit more reasonable:

Code: Select all

Rank Name                          Elo     +/-   Games   Score    Draw
   0 Leorik 2.2                     81      12    2400   61.5%   30.7%
   1 Blunder 8.4.5                 -59      20     800   41.6%   31.9%
   2 Blunder 8.0.0                 -78      20     800   38.9%   29.9%
   3 Blunder 7.6.0                -107      21     800   35.1%   30.4%

Finished match

I'm still not sure what's going on. It's hard for me to buy Blunder 8.4.5 is 60 Elo weaker than Leorik 2.2 at 5s+0.5s, and 8.0.0 is nearly 80 Elo weaker still. I even briefly tried 30s+0.5s, and Leorik 2.2 still pulled ahead quite easily.

Going back and testing Leorik 2.1, all three versions of Blunder actually had a hard time pulling ahead of it, even though I think a test result you posted a while back showed 7.6.0 was still a bit stronger than Leorik 2.1, which means 8.0.0 and 8.4.5 should be clearly stronger.

It doesn't just seem to be Leorik, though. Testing Blunder versus Inanis 1.0.1, rated ~2690 on the CCRL Blitz list, or Dumb 1.9, rated 2700, or even Nebula 2.0, rated 2654, the latest dev version of Blunder is still getting smashed. That's even though 8.0.0 is rated 20 Elo points higher than Nebula 2.0 on the CCRL, and from the testing results, it beat Nebula 20-12 at 2min + 1s. If it was just Leorik that smashed Blunder, it'd be strange to me but I could explain it, because certain engines have always given Blunder trouble. For a while it was Rustic, and then Minimal Chess, and then Nalwald. But this seems to be a pattern with multiple, different engines.

All of this is odd to me, and while I'm going to keep doing some more investigating, I may just have to chalk it up to the fast time controls for now, as I don't see what else could be causing this huge gap between Blunder and other engines at these time controls. The only other ideas I have are to try a different match manager besides cutechess, try a different computer, or try some other engines rated around 2700 and see if they match up pretty evenly against Blunder.

Guenther · Post by **Guenther** » Tue Jul 19, 2022 7:20 pm

algerbrex wrote: ↑Tue Jul 19, 2022 5:24 pm I won't keep posting here to avoid cluttering up your dev log after this post, but after running a second test last night versus Leorik 2.2, with the command:
Code: Select all
cutechess-cli.exe ^
-pgnout leorik-gauntlet.pgn ^
-engine cmd=Leorik-2.2.exe ^
-engine cmd=blunder-7.6.0.exe ^
-engine cmd=blunder-8.0.0.exe ^
-engine cmd=blunder.exe ^
-each proto=uci option.Hash=16 tc=inf/5+0.5 book=varied.bin ^
-games 2 -rounds 400 -repeat 2 ^
-tournament gauntlet ^
-concurrency 8 ^
-ratinginterval 10 ^
-recover
The results seemed a bit more reasonable:
Code: Select all
Rank Name                          Elo     +/-   Games   Score    Draw
   0 Leorik 2.2                     81      12    2400   61.5%   30.7%
   1 Blunder 8.4.5                 -59      20     800   41.6%   31.9%
   2 Blunder 8.0.0                 -78      20     800   38.9%   29.9%
   3 Blunder 7.6.0                -107      21     800   35.1%   30.4%

Finished match
I'm still not sure what's going on. It's hard for me to buy Blunder 8.4.5 is 60 Elo weaker than Leorik 2.2 at 5s+0.5s, and 8.0.0 is nearly 80 Elo weaker still. I even briefly tried 30s+0.5s, and Leorik 2.2 still pulled ahead quite easily.

Going back and testing Leorik 2.1, all three versions of Blunder actually had a hard time pulling ahead of it, even though I think a test result you posted a while back showed 7.6.0 was still a bit stronger than Leorik 2.1, which means 8.0.0 and 8.4.5 should be clearly stronger.

It doesn't just seem to be Leorik, though. Testing Blunder versus Inanis 1.0.1, rated ~2690 on the CCRL Blitz list, or Dumb 1.9, rated 2700, or even Nebula 2.0, rated 2654, the latest dev version of Blunder is still getting smashed. That's even though 8.0.0 is rated 20 Elo points higher than Nebula 2.0 on the CCRL, and from the testing results, it beat Nebula 20-12 at 2min + 1s. If it was just Leorik that smashed Blunder, it'd be strange to me but I could explain it, because certain engines have always given Blunder trouble. For a while it was Rustic, and then Minimal Chess, and then Nalwald. But this seems to be a pattern with multiple, different engines.

All of this is odd to me, and while I'm going to keep doing some more investigating, I may just have to chalk it up to the fast time controls for now, as I don't see what else could be causing this huge gap between Blunder and other engines at these time controls. The only other ideas I have are to try a different match manager besides cutechess, try a different computer, or try some other engines rated around 2700 and see if they match up pretty evenly against Blunder.

Have you checked the games themselves? Are they available for download?
I see you use concurrency 8; what hardware is used for this, and is the computer doing some other tasks?
What depths are expected for Blunder at those tcs? Did you perhaps notice time-trouble blunders in those games?
Last but not least (this might also indicate a problem with concurrency): what happens if you change the order of the started programs and place Leorik last in the sequence!?

Well, I would like to check those games ;-)

algerbrex · Post by **algerbrex** » Tue Jul 19, 2022 7:37 pm

Guenther wrote: ↑Tue Jul 19, 2022 7:20 pm Have you checked the games themselves? Are they available for download?
I see you use concurrency 8; what hardware is used for this, and is the computer doing some other tasks?
What depths are expected for Blunder at those tcs? Did you perhaps notice time-trouble blunders in those games?
Last but not least (this might also indicate a problem with concurrency): what happens if you change the order of the started programs and place Leorik last in the sequence!?

Well, I would like to check those games

I admit I didn't check the games thoroughly, although I went through a couple of them and didn't see anything odd. Here are the games available for download if you'd like to look through them yourself!

The hardware I run all of my testing on is my laptop, which isn't the best, but it's worked so far. It runs on a Ryzen 7 4700U with 8 cores and 8GB of RAM. And whenever I run test games, I don't generally use my computer for other tasks; I usually just go about my day and only leave the terminal open to run the games. In fact, I've actually invested in a little desktop fan to make sure my computer isn't overheating while running games, which seems to have helped. I'll try running with a smaller concurrency number though, as it very well might be the problem and I'm missing something.

At roughly 0.5s per move, Blunder should be reaching about a depth of 10-12, which I don't think is unreasonable. I hadn't imagined time trouble might be an issue, but I'll check for that too in some of the games. And I can also try switching the order to see what happens, although as I said I'm not sure how much it might affect the outcome. Wouldn't hurt to check though
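One way to check the depth and time-trouble question without replaying games by hand is to scan the PGN for the per-move search info. The sketch below assumes the match manager writes a move comment shaped like `{+0.21/11 0.48s}` (score/depth, then seconds); that format, the helper names and the sample text are all assumptions for illustration, so the pattern may need adjusting for a real file.

```python
import re

# Assumed comment shape: {<score>/<depth> <seconds>s ...}; book moves like {book} are skipped.
MOVE_COMMENT = re.compile(r"\{[^{}/]*/(\d+)\s+(\d+(?:\.\d+)?)s")

def search_stats(pgn_text):
    """Return a list of (depth, seconds) pairs, one per annotated move."""
    return [(int(d), float(t)) for d, t in MOVE_COMMENT.findall(pgn_text)]

def shallow_moves(stats, min_depth):
    """Moves searched to less than min_depth, a rough hint of time trouble."""
    return [s for s in stats if s[0] < min_depth]

# Hypothetical snippet of annotated movetext, not taken from the actual games.
sample = "1. d4 {+0.21/11 0.48s} d5 {-0.15/10 0.52s} 2. c4 {+0.30/4 0.01s} c6 {book}"
stats = search_stats(sample)
average_depth = sum(d for d, _ in stats) / len(stats)
print(stats, round(average_depth, 2), shallow_moves(stats, 6))
```

Splitting the stats by engine (every other move, depending on colour) would show whether one side regularly falls well below the expected depth of 10-12.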

Guenther · Post by **Guenther** » Tue Jul 19, 2022 9:41 pm

algerbrex wrote: ↑Tue Jul 19, 2022 7:37 pm
Guenther wrote: ↑Tue Jul 19, 2022 7:20 pm Have you checked the games themselves? Are they available for download?
I see you use concurrency 8; what hardware is used for this, and is the computer doing some other tasks?
What depths are expected for Blunder at those tcs? Did you perhaps notice time-trouble blunders in those games?
Last but not least (this might also indicate a problem with concurrency): what happens if you change the order of the started programs and place Leorik last in the sequence!?

Well, I would like to check those games ;-)
I admit I didn't check the games thoroughly, although I went through a couple of them and didn't see anything odd. Here are the games available for download if you'd like to look through them yourself!

The hardware I run all of my testing on is my laptop, which isn't the best, but it's worked so far. It runs on a Ryzen 7 4700U with 8 cores and 8GB of RAM. And whenever I run test games, I don't generally use my computer for other tasks; I usually just go about my day and only leave the terminal open to run the games. In fact, I've actually invested in a little desktop fan to make sure my computer isn't overheating while running games, which seems to have helped. I'll try running with a smaller concurrency number though, as it very well might be the problem and I'm missing something.

At roughly 0.5s per move, Blunder should be reaching about a depth of 10-12, which I don't think is unreasonable. I hadn't imagined time trouble might be an issue, but I'll check for that too in some of the games. And I can also try switching the order to see what happens, although as I said I'm not sure how much it might affect the outcome. Wouldn't hurt to check though :)

I did some first checks and replied in your Blunder thread to unburden lithander's topic here.

lithander · Post by **lithander** » Wed Jul 20, 2022 1:26 am

Mike Sherwin wrote: ↑Tue Jul 19, 2022 3:56 am So I try to avoid positions it plays well. And definitely I avoid slugfest where the search sees what I would miss.

Leorik in my opinion played very human-like once again, with the exception that it traded its most valuable active pieces to win a small amount of material. A good human player would not do that. In the following position White has no attacks and no mobility on Black's side of the board while Black does. And those backward pawns and corralled knight are atrocious.

Inspired by your post I implemented an 'eval' command today so we can ask Leorik for his opinion of a position. In the position you posted Leorik is up 3 pawns in material, by my own naive counting, and still the position is losing for White. I wasn't too optimistic about Leorik's static eval getting that right:

[fen]8/2r5/p3p1k1/Pb1nP1p1/1P1P1pP1/2p2P2/8/R1R1KN2 w - - 4 50[/fen]

Code: Select all

             MG  +  EG
Material:   133      6 * 0,65
   Pawns:    44    -72 * 0,65
Mobility:   -58
--------+------------------------
   White:    75

Despite everything I've added recently to make the eval less materialistic, it's still +75 cp in favor of White. But even Stockfish seems to give a positive score for White (of course not when you let it search).

That you successfully played Leorik into a position he has a hard time grasping becomes even clearer in the position where he makes his losing move:

[fen]2r5/2n3k1/p1b1p1P1/Pp2P1p1/2pP1pP1/2P2P2/7N/R1R3K1 w - - 3 44[/fen]

Code: Select all

             MG  +  EG
Material:   208     37 * 0,65
   Pawns:    -8     84 * 0,65
Mobility:   -14
--------+------------------------
   White:   265

I planned to not work on the eval for a while and focus on the search instead, specifically I wanted to implement SEE and maybe try using it for move ordering. But this little case-study gives me new ideas for the eval, now! So thanks again for exposing Leorik's weaknesses!
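To make the 'eval' printouts above easier to read, here is a small sketch of how such a tapered score could be combined: each term has a midgame (MG) value and an endgame (EG) value, and the EG part is scaled by a phase weight. The function name is hypothetical, the printed `0,65` is read as the decimal 0.65, and Mobility is assumed to have no EG component because none is printed; how Leorik actually derives the weight is not shown in the thread.

```python
def tapered_score(terms, eg_weight):
    """Sum mg + eg * eg_weight over all (mg, eg) evaluation terms, in centipawns."""
    return sum(mg + eg * eg_weight for mg, eg in terms)

# Numbers from the second printout (the position before the losing move).
terms = [
    (208, 37),   # Material
    (-8, 84),    # Pawns
    (-14, 0),    # Mobility (assumed MG only)
]
total = tapered_score(terms, 0.65)
print(round(total))  # roughly the printed White: 265
```

Since the printed weight is itself rounded, totals recomputed this way can differ from the engine's output by a centipawn or so.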

lithander · Post by **lithander** » Sat Jul 23, 2022 1:59 pm

lithander wrote: ↑Wed Jul 20, 2022 1:26 am But this little case-study gives me new ideas for the eval, now!

So one of the ideas is that Mike mentioned backward pawns and I had to look up what that is. Wikipedia says: "In chess, a backward pawn is a pawn that is behind all pawns of the same color on the adjacent files and cannot be safely advanced."

My implementation of this with bitboards looks like this:

Code: Select all

        public static ulong GetBackwardWhitePawns(BoardState board)
        {
            ulong blackPawns = board.Black & board.Pawns;
            ulong whitePawns = board.White & board.Pawns;
            ulong blackAttacks = LeftDown(blackPawns) | RightDown(blackPawns);
            ulong whiteAttacks = LeftUp(whitePawns) | RightUp(whitePawns);
            //white pawns behind all friendly adjacent pawns whose up-square is attacked by black pawns are backward
            return Down(Up(whitePawns) & blackAttacks & ~FillUp(whiteAttacks));
        }

[fen]rn2k2r/1pp1nb2/3p1pqp/2b1p1p1/p1P1P1P1/P1NP2Q1/1P3PP1/R1B1KBNR w KQkq - 0 1[/fen]

Code: Select all

BackwardPawns:
- - - - - - - -
- - - - - - - -
- - - - - B - B
- - - - - - - -
- - - - - - - -
- - - W - - - -
- W - - - - - -
- - - - - - - -

The source where I found this example position identifies the same pawns as backward. I tested a few more positions from the internet and they all seemed correctly identified, so hopefully my implementation is correct. I assume for 'backwardness' it doesn't matter if the target square is actually empty or not?
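For anyone who wants to experiment with the idea outside the engine, below is a rough Python port of the function above. The shift helpers are not shown in the thread, so everything about them is an assumption: squares are numbered a1 = bit 0 up to h8 = bit 63, `fill_up` is taken to include the starting squares, and the `bb` and `squares` helpers exist only for this example.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
FILE_A = 0x0101010101010101
FILE_H = 0x8080808080808080

def up(b): return (b << 8) & MASK64
def down(b): return b >> 8
def left_up(b): return ((b & ~FILE_A) << 7) & MASK64
def right_up(b): return ((b & ~FILE_H) << 9) & MASK64
def left_down(b): return (b & ~FILE_A) >> 9
def right_down(b): return (b & ~FILE_H) >> 7

def fill_up(b):
    """Smear every set bit towards rank 8, keeping the original squares set."""
    b |= (b << 8) & MASK64
    b |= (b << 16) & MASK64
    b |= (b << 32) & MASK64
    return b

def backward_white_pawns(white_pawns, black_pawns):
    black_attacks = left_down(black_pawns) | right_down(black_pawns)
    white_attacks = left_up(white_pawns) | right_up(white_pawns)
    # Stop square is hit by a black pawn and no friendly neighbour stands
    # level with or behind the pawn to support the advance.
    return down(up(white_pawns) & black_attacks & ~fill_up(white_attacks))

def bb(*names):
    """Bitboard from square names such as 'd3' (helper for this example only)."""
    return sum(1 << ("abcdefgh".index(n[0]) + 8 * (int(n[1]) - 1)) for n in names)

def squares(b):
    """Square names of all set bits, in a1..h8 order."""
    return ["abcdefgh"[i % 8] + str(i // 8 + 1) for i in range(64) if (b >> i) & 1]

# Pawns from the example FEN above.
white = bb("a3", "b2", "c4", "d3", "e4", "f2", "g2", "g4")
black = bb("a4", "b7", "c7", "d6", "e5", "f6", "g5", "h6")
print(squares(backward_white_pawns(white, black)))
```

On the pawns from the example position this flags b2 and d3, the same two white pawns marked in the diagram. Note that, like the original, it never looks at occupancy, so a blocked stop square is treated the same as an empty one.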

Then I changed my pawn structure code to give a -5 cp malus to all backward pawns. After retuning the PSQTs this gave better MSE than -4 or -6 and so I stuck with it. I made a build and ran a selftest overnight:

Code: Select all

Score of Leorik 2.2.2 vs Leorik 2.2: 1689 - 1598 - 2416  [0.508] 5703
...      Leorik 2.2.2 playing White: 945 - 654 - 1253  [0.551] 2852
...      Leorik 2.2.2 playing Black: 744 - 944 - 1163  [0.465] 2851
...      White vs Black: 1889 - 1398 - 2416  [0.543] 5703
Elo difference: 5.5 +/- 6.8, LOS: 94.4 %, DrawRatio: 42.4 %

I used the GMIS tool I received from Günther to make sure that the thus extended pawn structure eval isn't significantly slower:

Code: Select all

 nr                           player    games  pts%      sumTime aveDep    aveTime
  1                     Leorik 2.2.2     5713  50.8  00:09:49:23  11.71  00:00:399
  2                       Leorik 2.2     5713  49.2  00:09:44:55  11.71  00:00:396

I usually don't get super excited about 5 Elo gains. Every change to the evaluation followed by a re-tune could give +5 Elo in self-play just because you're lucky or something. So for now I pushed the changes into a branch instead of right into master.
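The caution about small gains can be put in numbers with the LOS (likelihood of superiority) figure from the self-test output. A minimal sketch, using the commonly quoted normal-approximation formula that depends only on wins and losses; the function name is made up for this example, and that the match tool uses exactly this formula is an assumption.

```python
import math

def likelihood_of_superiority(wins, losses):
    """Approximate probability that the first engine is the stronger one; draws are ignored."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

# Leorik 2.2.2 vs Leorik 2.2 from the self-test above: 1689 wins, 1598 losses
print(f"LOS: {likelihood_of_superiority(1689, 1598) * 100:.1f} %")
```

A value around 94% still leaves a noticeable chance that the new version is not actually stronger, which fits the decision to park the change in a branch for now.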
