Progress on Blunder


Wolfgang
Posts: 989
Joined: Sat May 13, 2006 1:08 am

Re: Progress on Blunder

Post by Wolfgang »

https://cegt.forumieren.com/t1695-new-e ... t-two#3346

I'll add ~500-600 games for a more exact rating.

Please keep in mind that CEGT ratings are about 150 points lower than those of CCRL. ;)
Best
Wolfgang
CEGT-Team
www.cegt.net
www.cegt.forumieren.com
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Progress on Blunder

Post by algerbrex »

Wolfgang wrote: Sat Jul 16, 2022 3:24 pm https://cegt.forumieren.com/t1695-new-e ... t-two#3346

I'll add ~500-600 games for a more exact rating.

Please keep in mind that CEGT ratings are about 150 points lower than those of CCRL. ;)
Thanks, I appreciate the testing!
Guenther
Posts: 4718
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Blunder vs. Leorik anomalities

Post by Guenther »

This is a continuation of a discussion about test games vs. Leorik with 'somehow suspicious results',
moved out of Leorik's original topic so as not to clutter it with other issues.
forum3/viewtopic.php?f=7&t=79049&sid=fb ... 10#p930530

I downloaded the game file and had a first quick look, searching for irregular depths with assumed too high times.


First I noticed, when analysing with my slow 8.45 (V1) build made just for this test (not exactly the same as yours from the pgn, but it only has
that cosmetic node count change added), that it would never use more than 0.0x - 0.1x seconds for depth 8.
Still, I found a lot of depth 8 moves with higher times on your surely faster hardware with a faster compilation.
I guess they all came from Blunder versions (not thoroughly checked), and so far I have looked at a few which appear to be game changers.

To my surprise, despite replaying the game from the beginning of the pgn while analysing, I couldn't get the 22...h6 move here in any iteration,
and 8/0.52s is also totally out of place speedwise.
Regrettably I have no tool/script handy which adds conditions to a simple regex search for "/8 0.5*" or "/0.4*", so I don't know exactly how often this happened without changing the game result.
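
A search with conditions like the one described above could be sketched in a few lines of Python. This is only a rough illustration and not the tool used later in the thread. It assumes the move comments follow the `{score/depth time}` layout seen in the games below, and the function name and thresholds are made up for the example. Book moves and mate scores are simply skipped.

```python
import re

# Move comment as seen in the games below: {score/depth time}, e.g. {-0.30/8 0.52}
# Book moves ({book}) and mate scores ({+M9/14 0.33}) do not match and are skipped.
COMMENT = re.compile(r"\{([+-]?\d+\.\d+)/(\d+) (\d+(?:\.\d+)?)")

def slow_shallow_moves(pgn_text, max_depth=8, min_time=0.4, max_abs_score=2.0):
    """Yield (score, depth, seconds) for searches that were shallow yet slow.

    The thresholds are illustrative defaults, not values taken from any tool.
    """
    for m in COMMENT.finditer(pgn_text):
        score, depth, secs = float(m.group(1)), int(m.group(2)), float(m.group(3))
        if depth <= max_depth and secs >= min_time and abs(score) < max_abs_score:
            yield score, depth, secs

sample = "22. Qg4 {+1.09/11 0.56} h6 {-0.30/8 0.52} 23. c5 {+1.40/11 0.59}"
print(list(slow_shallow_moves(sample)))
```

The score condition is what a plain editor regex cannot express easily: it keeps only the positions that were still roughly balanced.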

[pgn][Event "?"]
[Site "?"]
[Date "2022.07.19"]
[Round "105"]
[White "Leorik 2.2"]
[Black "Blunder 8.4.5"]
[Result "1-0"]
[ECO "A43"]
[GameDuration "00:01:14"]
[GameEndTime "2022-07-19T01:54:18.885 Central Daylight Time"]
[GameStartTime "2022-07-19T01:53:04.527 Central Daylight Time"]
[Opening "Old Benoni defense"]
[PlyCount "139"]
[TimeControl "5+0.5"]

1. d4 {book} c5 {book}
2. dxc5 {book} e6 {book}
3. Nc3 {+0.27/12 0.35} Nf6 {book}
4. e4 {-0.06/12 0.55} Nc6 {-0.09/11 0.59}
5. Bd3 {-0.01/12 0.42} Bxc5 {+0.08/12 0.44}
6. Nf3 {+0.10/13 2.2} O-O {-0.01/12 0.44}
7. O-O {+0.17/12 0.58} Bb6 {-0.07/10 0.47}
8. e5 {+0.48/12 0.42} Ne8 {-0.33/10 0.48}
9. Bf4 {+0.73/11 0.75} f6 {-0.37/10 0.50}
10. Bg3 {+0.65/11 0.47} fxe5 {-0.11/11 0.55}
11. Nxe5 {+0.64/11 0.33} Nxe5 {-0.19/11 0.55}
12. Bxe5 {+0.72/13 0.53} Qg5 {-0.21/12 0.58}
13. Qe2 {+0.48/12 0.38} d6 {-0.21/11 0.65}
14. Bg3 {+0.68/11 0.57} a6 {-0.24/11 0.66}
15. Rad1 {+0.78/11 0.43} d5 {-0.20/10 0.72}
16. Na4 {+0.64/12 0.37} Ba7 {-0.29/13 0.74}
17. c4 {+0.30/12 0.58} Bd7 {-0.22/12 0.73}
18. Nc3 {+0.45/12 0.52} d4 {-0.33/11 0.69}
19. Ne4 {+0.74/11 0.47} Qe7 {-0.39/12 0.69}
20. Rfe1 {+0.79/11 0.66} Rc8 {-0.28/10 0.66}
21. Be5 {+1.09/11 0.35} Rd8 {-0.83/10 0.64}
22. Qg4 {+1.09/11 0.56} h6 {-0.30/8 0.52}
23. c5 {+1.40/11 0.59}*[/pgn]

Here, e.g., I never get 12. Qe2 with 7.60 when analysing again from the beginning. Also, depth 8 in 0.41s seems much too slow
(it is surrounded by some depth 9 moves).

[pgn][Event "?"]
[Site "?"]
[Date "2022.07.19"]
[Round "114"]
[White "Blunder 7.6.0"]
[Black "Leorik 2.2"]
[Result "0-1"]
[ECO "A00"]
[GameDuration "00:01:19"]
[GameEndTime "2022-07-19T02:02:51.662 Central Daylight Time"]
[GameStartTime "2022-07-19T02:01:32.454 Central Daylight Time"]
[Opening "Benko's Opening"]
[PlyCount "158"]
[TimeControl "5+0.5"]

1. g3 {book} Nf6 {book}
2. Bg2 {book} e5 {book}
3. c4 {book} Bc5 {book}
4. Nf3 {0.00/11 0.67} Nc6 {+0.08/12 0.33}
5. O-O {-0.01/12 0.39} d6 {book}
6. e3 {book} Bg4 {+0.13/13 0.57}
7. d3 {+0.09/11 0.52} Qd7 {+0.19/12 0.35}
8. Nc3 {+0.12/10 0.52} O-O {+0.08/12 0.41}
9. a3 {+0.16/9 0.39} a5 {+0.10/12 1.1}
10. Bd2 {+0.16/10 0.40} Ne7 {0.00/12 1.1}
11. Rc1 {+0.24/9 0.40} h6 {+0.07/10 0.36}
12. Qe2 {+0.18/8 0.41} a4 {+0.26/11 0.48}
13. Qd1 {+0.02/9 0.40} Rfe8 {+0.22/10 0.48}
14. e4 {0.00/9 0.40} Ng6 {+0.22/11 0.46}[/pgn]

'/8 0.^([4-6]^)' (regex syntax for old UltraEdit) finds 449 entries, and I believe all are for Blunder versions.
I even found some for depth 7 (and I am quite sure all are for Blunder versions again):

Code: Select all

Search for '/7 0.^([4-6]^)' in 'C:\Downloads\leorik-gauntlet.pgn' :
C:\Downloads\leorik-gauntlet.pgn/2460: 12. Ng5 {+0.41/7 0.56} Bd7 {0.00/13 0.59}
C:\Downloads\leorik-gauntlet.pgn/6379: 23. Rxe8+ {+0.91/12 0.45} Rxe8 {-0.93/7 0.61}

C:\Downloads\leorik-gauntlet.pgn/24090: 17. Rfe1 {+0.15/7 0.42} Ne5 {-0.06/12 2.0}

C:\Downloads\leorik-gauntlet.pgn/53963: 13. Kb1 {+0.53/11 0.47} g5 {0.00/7 0.41}
C:\Downloads\leorik-gauntlet.pgn/57291: 21. f4 {+0.42/10 0.41} g4 {-0.41/7 0.65}

C:\Downloads\leorik-gauntlet.pgn/61269: 17. Rb2 {+0.26/7 0.41} e6 {+0.12/11 0.99}
C:\Downloads\leorik-gauntlet.pgn/67370: 24. Qf3 {+1.77/12 1.8} Ra6 {-1.35/7 0.53}

C:\Downloads\leorik-gauntlet.pgn/68757: 20. h3 {+0.69/10 0.89} Qe3+ {-0.39/7 0.61}

C:\Downloads\leorik-gauntlet.pgn/147712: 28. Bc3 {+0.01/10 0.58} Bg7 {+0.25/7 0.43}

C:\Downloads\leorik-gauntlet.pgn/164989: 33. bxa3 {-1.09/7 0.44} Rab8 {+5.34/13 0.33}

C:\Downloads\leorik-gauntlet.pgn/171282: 29. Ng5 {+0.85/11 1.0} hxg4 {+0.43/7 0.53}
C:\Downloads\leorik-gauntlet.pgn/191054: 13. Rf3 {+0.57/12 1.8} Bg7 {-0.45/7 0.41}
C:\Downloads\leorik-gauntlet.pgn/198166: 53. Rgxg7+ {0.00/15 0.55} Nxg7 {+0.17/7 0.54}

Matches: '/7 0.^([4-6]^)' 36 entries. (removed 23 high score entries) => 13 left
I quickly checked one of them:
It seems the depth 7 move was not decisive but a later depth 8 move again seemed to spoil the game.

Code: Select all

C:\Downloads\leorik-gauntlet.pgn/53963: 13. Kb1 {+0.53/11 0.47} g5 {0.00/7 0.41}
[pgn]
[Event "?"]
[Site "?"]
[Date "2022.07.19"]
[Round "97"]
[White "Leorik 2.2"]
[Black "Blunder 7.6.0"]
[Result "1-0"]
[ECO "A04"]
[GameDuration "00:00:46"]
[GameEndTime "2022-07-19T01:44:49.284 Central Daylight Time"]
[GameStartTime "2022-07-19T01:44:02.636 Central Daylight Time"]
[Opening "Reti Opening"]
[PlyCount "93"]
[TimeControl "5+0.5"]

1. Nf3 {book} g6 {book}
2. c4 {book} Bg7 {book}
3. Nc3 {book} e5 {book}
4. d4 {book} exd4 {-0.27/12 0.40}
5. Bg5 {+0.39/11 0.35} f6 {-0.22/12 0.39}
6. Qxd4 {+0.48/13 0.40} Nc6 {-0.45/12 0.40}
7. Qe3+ {+0.25/14 0.51} Nge7 {-0.30/12 0.40}
8. Bh6 {+0.34/14 0.49} O-O {-0.36/13 0.41}
9. Bxg7 {+0.36/13 0.36} Kxg7 {-0.44/13 0.41}
10. O-O-O {+0.45/13 0.56} d6 {-0.44/11 0.40}
11. c5 {+0.46/12 0.65} Ne5 {-0.18/10 0.41}
12. Qf4 {+0.41/11 0.54} Be6 {-0.33/9 0.54}
13. Kb1 {+0.53/11 0.47} g5 {0.00/7 0.41}
14. Qb4 {+0.60/11 0.38} N7c6 {+0.29/9 0.54}
15. Qxb7 {+0.32/13 0.99} Rb8 {+0.19/12 0.53}
16. Qa6 {+0.40/13 0.51} Bc8 {+0.26/11 0.41}
17. Qa4 {+0.26/13 0.45} Rb4 {+0.56/12 0.42}
18. Qc2 {0.00/13 0.45} Nc4 {+0.44/12 0.42}
19. Ka1 {+0.12/12 1.0} Rxb2 {+0.28/12 0.54}
20. Qc1 {-0.08/12 3.1} g4 {+0.68/9 0.70}
21. Nd4 {+0.34/11 0.32} N6a5 {-0.22/10 0.80}
22. c6 {+0.74/10 0.33} Re8 {-0.28/8 0.75}
23. Nd5 {+1.42/11 0.41}*[/pgn]

And here is an example of a misevaluation (depth 8 played no role here as the game was already won for Leorik),
which I found interesting. Blunder 8.0.0 was just unlucky, because at depth 11 it would have avoided 24. Qxa7??
This is just added for completeness; I think the real issue might be the surprisingly low depths despite the long times used
(concurrency? heat?).

[pgn][Event "?"]
[Site "?"]
[Date "2022.07.19"]
[Round "106"]
[White "Blunder 8.0.0"]
[Black "Leorik 2.2"]
[Result "0-1"]
[ECO "A10"]
[GameDuration "00:00:31"]
[GameEndTime "2022-07-19T01:53:56.755 Central Daylight Time"]
[GameStartTime "2022-07-19T01:53:25.240 Central Daylight Time"]
[Opening "English Opening"]
[PlyCount "66"]
[TimeControl "5+0.5"]

1. c4 {book} g6 {book}
2. g3 {book} Bg7 {book}
3. Bg2 {book} e5 {book}
4. Nc3 {book} Ne7 {book}
5. Nf3 {book} Nbc6 {-0.18/13 0.49}
6. O-O {+0.21/11 0.44} O-O {-0.18/13 0.67}
7. e3 {+0.39/10 0.45} d5 {-0.01/13 0.51}
8. cxd5 {+0.05/13 0.48} Nxd5 {-0.06/13 0.35}
9. Nxd5 {+0.07/12 0.48} Qxd5 {+0.07/13 0.37}
10. d4 {+0.26/12 0.52} e4 {+0.01/12 0.49}
11. Ne1 {+0.18/12 0.53} Rd8 {+0.21/12 0.99}
12. Nd3 {-0.05/9 0.73} Nxd4 {+0.85/13 0.45}
13. exd4 {-0.79/14 0.76} Qxd4 {+0.77/13 0.53}
14. Bg5 {-0.59/14 0.62} exd3 {+0.66/12 0.38}
15. Bxd8 {-0.93/15 0.67} Bg4 {+0.95/13 0.34}
16. Bf3 {-0.54/16 0.74} Bxf3 {+0.73/14 0.38}
17. Qxf3 {-0.70/16 0.73} Rxd8 {+0.66/14 0.47}
18. Qxb7 {-0.81/15 0.68} Be5 {+0.63/13 0.48}
19. Rab1 {-0.75/14 0.68} d2 {+0.63/12 0.46}
20. b3 {-0.75/12 0.65} h5 {+0.59/12 0.43}
21. b4 {-0.80/12 0.63} h4 {+0.75/12 0.62}
22. Qf3 {-0.91/13 0.63} Qc4 {+1.01/11 0.38}
23. Qe3 {-0.92/13 0.61} Bf6 {+0.82/12 0.95}
24. Qxa7 {-0.95/10 0.60} h3 {+7.11/13 0.47}
25. Qb7 {-1.53/12 0.50} c6 {+11.23/15 0.55}
26. b5 {-0.67/8 0.44} Qe4 {+16.09/15 0.43}
27. Qxc6 {-12.91/14 0.43} Qxb1 {+24.31/16 1.0}
28. Qg2 {-17.08/15 0.47} Qf5 {+26.37/16 0.36}
29. a3 {-17.68/14 0.44} d1=Q {+M9/14 0.33}
30. b6 {-M8/11 0.45} Qdf3 {+M7/13 0.38}
31. Qh1 {-M6/12 0.61} Qxh1+ {+M5/14 0.41}
32. Kxh1 {-M4/12 0.60} Qf3+ {+M3/15 0.93}
33. Kg1 {-M2/13 0.60} Qg2# {+M1/15 0.62}
{Black mates}
0-1[/pgn]
https://rwbc-chess.de

[Trolls n'existent pas...]
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Blunder vs. Leorik anomalities

Post by algerbrex »

Guenther wrote: Tue Jul 19, 2022 9:40 pm ...
Thanks for the investigative work. This is all helpful. It's definitely odd. Blunder should easily be getting deeper than depth 7-8 if it searched that position for 400-500ms. And Leorik is already a good bit faster than Blunder, so I would agree that all those 7-8/0.4-0.5 searches are likely all versions of Blunder.

My environment very well could be an issue as you suggested, but your findings also raise suspicion about my time allocation code and transposition table usage. I'm going to double-check my code there to see if I find anything odd, or anything that might be causing Blunder to get such low depths with such high search times.
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Progress on Blunder

Post by algerbrex »

Since I'll be going out of town in a bit, and starting to get ready for college again, I'm not sure how much time I'll have to work on Blunder in the coming weeks, so I've gone ahead and released Blunder 8.5.5: https://github.com/algerbrex/blunder/re ... tag/v8.5.5

Should be ~40-50 Elo stronger than 8.0.0 in self-play.
Guenther
Posts: 4718
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Blunder vs. Leorik anomalities

Post by Guenther »

algerbrex wrote: Tue Jul 19, 2022 10:19 pm
Guenther wrote: Tue Jul 19, 2022 9:40 pm ...
Thanks for the investigative work. This is all helpful. It's definitely odd. Blunder should easily be getting deeper than depth 7-8 if it searched that position for 400-500ms. And Leorik is already a good bit faster than Blunder, so I would agree that all those 7-8/0.4-0.5 searches are likely all versions of Blunder.

My environment very well could be an issue as you suggested, but your findings also raise suspicion about my time allocation code and transposition table usage. I'm going to double-check my code there to see if I find anything odd, or anything that might be causing Blunder to get such low depths with such high search times.
With the help of another tool (GMIS by Ferdinand) I can now give precise numbers per player, even with an eval condition,
for this regex

Code: Select all

^([0-1]^).??^/^([7-8]^) 0.^([4-8]^)
Literally, this means all depth 7-8 moves at 0.4x - 0.8x seconds, while still being inside -1.99/+1.99 evals.
(At higher evals the influence of such events on the final results would be negligible.)

Code: Select all

Leorik  2.2    2
Blunder 7.60 241
Blunder 8.00  33
Blunder 8.45  98
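
For anyone without GMIS at hand, a per-player tally of this kind could be approximated as in the sketch below. It is not how GMIS works internally. It assumes every move in the PGN carries exactly one `{...}` comment (as in the games posted above), so that comments alternate White, Black, White, and so on. The function name is made up for the example.

```python
import re
from collections import Counter

HEADER = re.compile(r'\[(White|Black) "([^"]*)"\]')
COMMENT = re.compile(r"\{([^}]*)\}")
EVAL = re.compile(r"([+-]?\d+\.\d+)/(\d+) (\d+(?:\.\d+)?)")

def count_by_player(pgn_text):
    """Count depth 7-8 searches that took 0.4x-0.8x s at evals inside +/-1.99."""
    counts = Counter()
    # Split the file into games at each [Event ...] tag.
    for game in re.split(r"(?=\[Event )", pgn_text):
        names = dict(HEADER.findall(game))
        if len(names) < 2:
            continue
        # Assumption: one comment per move, so even indexes belong to White.
        for ply, text in enumerate(COMMENT.findall(game)):
            m = EVAL.match(text)
            if not m:
                continue  # {book}, mate scores, result comments
            score, depth, secs = float(m[1]), int(m[2]), float(m[3])
            if 7 <= depth <= 8 and 0.4 <= secs < 0.9 and abs(score) < 2.0:
                counts[names["White" if ply % 2 == 0 else "Black"]] += 1
    return counts

sample = '''[Event "?"]
[White "Leorik 2.2"]
[Black "Blunder 8.4.5"]

21. Be5 {+1.09/11 0.35} Rd8 {-0.83/10 0.64}
22. Qg4 {+1.09/11 0.56} h6 {-0.30/8 0.52}
'''
print(count_by_player(sample))
```

If a PGN contains moves without comments, the alternation assumption breaks and a real PGN parser would be the safer choice.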
And here are the average depths and times used for all players (inside -10/+10 evals, which avoids extremely high depths and very low times
when nearing mates).

Code: Select all

A. Score range with scoring Probability:

file         : leorik-gauntlet.pgn
move window  : [1, 1000]
score window : [-10.0, +10.0]
notes        : 1. games and pts% are not affected by move and score windows.
               2. Table is sorted by scoring probability in descending order

 nr                           player    games pts(%) scoProb
  1                       Leorik 2.2     2400   61.5    0.57
  2                    Blunder 8.4.5      800   41.6    0.45
  3                    Blunder 8.0.0      800   38.9    0.42
  4                    Blunder 7.6.0      800   35.1    0.40


B. Time and Depth average:

file         : leorik-gauntlet.pgn
move window  : [1, 1000]
score window : [-10.0, +10.0]
notes        : 1. games and pts% are not affected by move and score windows.
               2. Table is sorted by aveDep in descending order.
               3. aveTime is the average time/move in m:s:ms
               4. sumTime is in d:h:m:s

 nr                           player    games  pts%      sumTime aveDep    aveTime
  1                       Leorik 2.2     2400  61.5  00:22:04:48  15.95  00:00:533
  2                    Blunder 8.0.0      800  38.9  00:07:40:51  15.75  00:00:539
  3                    Blunder 7.6.0      800  35.1  00:06:41:34  14.68  00:00:488
  4                    Blunder 8.4.5      800  41.6  00:07:35:48  14.49  00:00:541
One can see that 8.00 and 8.45 already use time much better at this tc.
I think, though, that Leorik is more flexible in its time management: it sometimes uses up to 3 times the increment here
and generally seems to play a tad slower for the first 30-40 moves. The average depth in these stats doesn't show this, though,
but I could also create them for different move windows.
Surprisingly, though, the average depth of 8.45 is below that of 7.60, and quite a lot below that of 8.00.
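
To make the move window and score window idea concrete, here is a small sketch of how such averages could be computed. It assumes the per-move records have already been extracted from the PGN as `(player, move_no, score, depth, seconds)` tuples. That record layout and the function name are assumptions for illustration, not GMIS output.

```python
from collections import defaultdict

def averages(records, move_window=(1, 1000), score_window=(-10.0, 10.0)):
    """Return {player: (average depth, average seconds)} inside both windows.

    records: iterable of (player, move_no, score, depth, seconds) tuples
    (a hypothetical layout chosen for this sketch).
    """
    sums = defaultdict(lambda: [0, 0.0, 0])  # depth sum, time sum, move count
    for player, move_no, score, depth, secs in records:
        if not move_window[0] <= move_no <= move_window[1]:
            continue
        if not score_window[0] <= score <= score_window[1]:
            continue
        entry = sums[player]
        entry[0] += depth
        entry[1] += secs
        entry[2] += 1
    return {p: (d / n, t / n) for p, (d, t, n) in sums.items()}

# Moves 24 and 26 of the Blunder 8.0.0 game posted earlier in the thread.
records = [
    ("Blunder 8.0.0", 24, -0.95, 10, 0.60),
    ("Leorik 2.2", 24, 7.11, 13, 0.47),
    ("Blunder 8.0.0", 26, -0.67, 8, 0.44),
    ("Leorik 2.2", 26, 16.09, 15, 0.43),  # outside the score window
]
print(averages(records))
print(averages(records, move_window=(1, 24)))
```

Narrowing `move_window` to something like `(1, 40)` would be one way to check the impression that one engine plays slower in the first 30-40 moves.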

Still, I believe the main problem was the low depths at too high times, which practically never affected Leorik.
I would be curious to see this repeated at concurrency 6 or 7.

Last but not least, good luck with the coming college season :)
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Blunder vs. Leorik anomalities

Post by algerbrex »

Guenther wrote: Wed Jul 20, 2022 4:20 pm And here are the average depths and times used for all players (inside -10/+10 evals, which avoids extremely high depths and very low times
when nearing mates).


Interesting, not what I expected. I could imagine 8.45 reaching an average depth that's a bit lower than 7.60 and 8.00, though, since I added singular extensions. It's likely searching some branches much deeper than others, which was a net positive in testing but does mean its average depth will go down.

But indeed, for whatever reason, it seems Leorik's time management is allowing it to search deeper and play more accurately, while Blunder is somehow only reaching depth 7/8 in almost half a second.

What puzzles me is how whatever is going on is affecting all three versions of Blunder, but not Leorik. The fact that 7.6.0 is also affected seems to rule out this being something recent that I introduced into the code base, unless it's been a persistent bug since 7.6.0. So I'm leaning towards it being an environmental issue, but what issue would affect only Blunder so badly and not Leorik? It could be that the time management of Leorik and of other engines like Dumb 1.9, Nebula 2.0, Inanis 1.01, and others just scales much better at these ultra-bullet time controls, while they're equal to Blunder at longer time controls. But this still wouldn't quite explain why Blunder is reaching such low depths given such a long amount of time to search.

Either way I have some more investigating to do, including playing with the concurrency numbers :)

And thanks! I have a little less than a month before I go back, so I'll try to enjoy the rest of my summer, including working on Blunder a little more :)
lithander
Posts: 918
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Blunder vs. Leorik anomalities

Post by lithander »

Guenther wrote: Wed Jul 20, 2022 4:20 pm With the help of another tool (GMIS by Ferdinand) I can give now precise numbers even with an eval conditional per player
for this regex

Code: Select all

^([0-1]^).??^/^([7-8]^) 0.^([4-8]^)
Literally, this means all depth 7-8 moves at 0.4x - 0.8x seconds, while still being inside -1.99/+1.99 evals.
(At higher evals the influence of such events on the final results would be negligible.)

Code: Select all

Leorik  2.2    2
Blunder 7.60 241
Blunder 8.00  33
Blunder 8.45  98
I never thought about doing more with the PGNs than checking for time forfeits and using them to calculate Elo. Thanks for the inspiration, Günther. I'll have to give GMIS a try! (even though I hate regular expressions)

And what you found really sounds like a serious problem that could explain why Blunder is underperforming. If the average depth is something around 14, then "getting stuck" at depth 7 or 8 will cause a weak move that could decide the entire game.

But I'm not sure if the average depth is really meaningful if you compare different engines or even different versions of the same engine. You first need to come up with a hypothesis based on the implementation details.
When I think I have improved the speed (nps) of my engine but everything else should stay the same, then the average depth should go up slightly. But when I improve the eval by adding more terms the engine takes more time to compute the evaluation and a small reduction in depth would not indicate a regression in strength. And when you change the parameters that control the search (how you extend/reduce/prune) a change in depth would be expected but it could mean anything.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Blunder vs. Leorik anomalities

Post by algerbrex »

Guenther wrote: Wed Jul 20, 2022 4:20 pm Still I believe the main problem was the low depths at too high times which practically never affected Leorik.
I would be curious to see this repeated at concurrency 6 or 7.
Unfortunately for Blunder, even playing with different concurrency levels doesn't seem to help. Even trying a concurrency level of 1 still shows Leorik pulling ahead quite easily in any match, so I'm assuming the issue is with Blunder.

I don't have a good hypothesis on what the issue might be right now, but whatever is going on doesn't affect Blunder at longer time controls like 2'+1'' (or at least the effect is negligible), but does at hyper-bullet time controls. And the issue doesn't seem to affect many other engines, even ones that are weaker than Blunder at the 2'+1'' time control.

The only explanations that come to mind right now are that Blunder's transposition table has a bug, which is causing it to reach much lower depths than it should on occasion, or that some part of the search is causing an explosion that lets Blunder reach only half the depth it should in certain positions. I suppose the time manager could also be causing an issue, accidentally cutting the search off too early, but I doubt it. And of course, taking positions where Blunder searched to uncharacteristically low depths and feeding them back into it doesn't help, as it easily reaches depth 10+. I'll also try priming the transposition table by going through all the positions leading up to them, but I'm not sure that'll help either.
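
As a rough sketch of what priming the transposition table could look like from the outside, the snippet below builds the list of UCI commands that replay a game up to the problem position, searching only where the engine itself was to move. It is not Blunder's test harness. The function name is invented, the moves must already be in UCI notation, and the caller is assumed to wait for `bestmove` after each `go` before sending the next command. Book moves are searched too, which differs from the real game.

```python
def priming_commands(moves, engine_is_white, movetime_ms=500):
    """Build UCI commands that replay a game so the engine's hash table fills up.

    moves: the game's moves in UCI notation, e.g. ["d2d4", "c7c5", "d4c5"].
    Only positions where the engine is to move are searched. No "ucinewgame"
    is sent between searches, so the table is not cleared on the way.
    """
    cmds = ["ucinewgame"]
    for ply in range(len(moves) + 1):
        white_to_move = ply % 2 == 0
        if white_to_move != engine_is_white:
            continue
        position = "position startpos"
        if ply:
            position += " moves " + " ".join(moves[:ply])
        cmds += [position, f"go movetime {movetime_ms}"]
    return cmds

# 1. d4 c5 2. dxc5 with the engine playing Black
for cmd in priming_commands(["d2d4", "c7c5", "d4c5"], engine_is_white=False):
    print(cmd)
```

The last `go` in the list is then the search of the problem position itself, with whatever the earlier searches left in the table.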

I initially thought aspiration windows might be the issue, with the re-searches causing search instabilities and making Blunder get stuck at low depths. But removing them didn't seem to help: going back through the PGN file, there were still positions where Leorik was searching depth 12-13 minimum, and Blunder was struggling to reach depth 8-9.
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Blunder vs. Leorik anomalities

Post by algerbrex »

So, I seem to have made a bit of progress towards solving this issue.

I decided to add a small bit of logging code to Blunder, which will detect a situation where a low depth is reached (<= 8) while a relatively high amount of time was spent searching (>= 400ms). When this occurred I had Blunder log some statistics about the search, such as how much time was spent in the main search function, quiescence search, move sorting function, etc., as well as the number of beta-cutoffs and tt-cutoffs. I then had the engine purposely crash, so I would know what position caused the anomalously low depth search.

After a couple of hundred games I was able to get a crash, and Blunder logged the statistics. Nothing looked off in terms of how the time was distributed during the search, but the number of beta-cutoffs and tt-cutoffs was very off. During the anomalous search Blunder reached a depth of 8 and recorded ~5K tt-cutoffs and ~25K beta-cutoffs overall, in ~500ms. Taking the same position and letting Blunder search it for 500ms, after priming the transposition table by having Blunder search positions leading up to the problematic one, Blunder reached depth 11, 3 plies deeper, and recorded ~15K tt-cutoffs and ~66K beta-cutoffs, much more than what was logged.
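
The kind of check described above can be illustrated with a short sketch. This is not Blunder's actual code, and the class, field, and function names are invented for the example. Only the thresholds (depth <= 8, time >= 400ms) and the rounded cutoff counts come from the post.

```python
from dataclasses import dataclass

@dataclass
class SearchStats:
    """Counters collected during one search (hypothetical layout)."""
    depth: int = 0
    elapsed_ms: int = 0
    beta_cutoffs: int = 0
    tt_cutoffs: int = 0

    def is_anomalous(self, max_depth=8, min_ms=400):
        # Shallow result despite a long think: the case worth logging.
        return self.depth <= max_depth and self.elapsed_ms >= min_ms

def check_search(stats, fen, log=print):
    """Log the position and counters, then stop the engine on purpose."""
    if stats.is_anomalous():
        log(f"anomalous search: fen={fen} {stats}")
        raise RuntimeError("shallow search despite long think time")

# Rounded figures from the post: the logged search vs. the replayed one.
logged = SearchStats(depth=8, elapsed_ms=500, beta_cutoffs=25_000, tt_cutoffs=5_000)
replayed = SearchStats(depth=11, elapsed_ms=500, beta_cutoffs=66_000, tt_cutoffs=15_000)
print(logged.is_anomalous(), replayed.is_anomalous())
```

Raising right after logging mirrors the deliberate crash: the game stops at the exact position, so the log and the PGN can be matched up afterwards.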

So although I haven't pinpointed the issue yet, it seems to be a problem with the transposition table and/or move ordering/scoring, as both seem to be severely underperforming in certain cases. Of course now the million dollar question is why. Time to do some poking around.