Progress on Blunder

emadsen · Post by **emadsen** » Fri Jul 01, 2022 1:39 am

algerbrex wrote: ↑Thu Jun 30, 2022 4:01 am Hi everyone,

I've decided to go ahead and release Blunder 8.0.0, despite some reservations (although, truthfully, I have reservations with every release). The release notes can be seen here: https://github.com/algerbrex/blunder/re ... tag/v8.0.0.

I expect Blunder 8.0.0 to be roughly 50 Elo stronger at bullet time controls.

Nice. Congrats! Blunder 8.0.0 and my beta version of MadChess should be close in strength.

algerbrex · Post by **algerbrex** » Fri Jul 01, 2022 1:57 am

emadsen wrote: ↑Fri Jul 01, 2022 1:39 am Nice. Congrats! Blunder 8.0.0 and my beta version of MadChess should be close in strength.

Thanks Erik! MadChess 3.0 has continued to be a very resilient sparring partner for Blunder. Glad to hear you've started to find more time to work on MadChess again; I'd like to see it break 2700.

Do you ever have any plans on adding any sort of neural network into the codebase?

Wolfgang · Post by **Wolfgang** » Fri Jul 01, 2022 3:44 am

CEGT 40/4: https://cegt.forumieren.com/t1695-new-e ... t-two#3346

algerbrex · Post by **algerbrex** » Fri Jul 01, 2022 3:49 am

Wolfgang wrote: ↑Fri Jul 01, 2022 3:44 am CEGT 40/4: https://cegt.forumieren.com/t1695-new-e ... t-two#3346

Thanks for the heads-up, Wolfgang.

lithander · Post by **lithander** » Fri Jul 01, 2022 10:57 am

I ran a quick test (5s + 500ms increment) against Leorik 2.1.9 on two cores during work:

Code:

[tc=5+0.5 Hash=50]
Score of Leorik-2.1.9 vs blunder-8.0.0: 197 - 132 - 160  [0.566] 489
...      Leorik-2.1.9 playing White: 107 - 57 - 81  [0.602] 245
...      Leorik-2.1.9 playing Black: 90 - 75 - 79  [0.531] 244
...      White vs Black: 182 - 147 - 160  [0.536] 489
Elo difference: 46.5 +/- 25.4, LOS: 100.0 %, DrawRatio: 32.7 %

...and a test on longer time controls (10s + 2s increment) overnight:

Code:

[tc=10+2 Hash=50]
Score of Leorik-2.1.9 vs blunder-8.0.0: 442 - 372 - 544  [0.526] 1358
...      Leorik-2.1.9 playing White: 251 - 167 - 261  [0.562] 679
...      Leorik-2.1.9 playing Black: 191 - 205 - 283  [0.490] 679
...      White vs Black: 456 - 358 - 544  [0.536] 1358
Elo difference: 17.9 +/- 14.3, LOS: 99.3 %, DrawRatio: 40.1 %

No crashes or time-forfeits!
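
For anyone wondering where the "Elo difference" and "+/-" figures in summaries like the ones above come from, here is a minimal Python sketch. It assumes the standard logistic Elo model and a normal approximation for the 95% error margin. The function names are made up for illustration, and the tool that printed the results above may compute its numbers differently, although this sketch comes out close to the quoted 46.5 +/- 25.4 for the first test.

```python
import math


def elo_from_score(score: float) -> float:
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, losses: int, draws: int, z: float = 1.959964):
    """Return (elo, margin) for a W-L-D record.

    The margin is half the width of the Elo interval obtained by moving the
    mean score by z standard errors in each direction (z ~ 1.96 for 95%).
    """
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + d / 2.0  # mean score per game

    # Per-game variance of the score around its mean (win=1, draw=0.5, loss=0).
    var = w * (1.0 - mu) ** 2 + l * (0.0 - mu) ** 2 + d * (0.5 - mu) ** 2
    stderr = math.sqrt(var / n)

    lo, hi = mu - z * stderr, mu + z * stderr
    margin = (elo_from_score(hi) - elo_from_score(lo)) / 2.0
    return elo_from_score(mu), margin


# First test above: Leorik-2.1.9 vs blunder-8.0.0, 197 - 132 - 160
elo, margin = elo_with_margin(197, 132, 160)
print(f"Elo difference: {elo:.1f} +/- {margin:.1f}")
```

Note that the margin shrinks roughly with the square root of the number of games, which is why the overnight run with 1358 games has a much tighter interval than the 489-game run.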

algerbrex · Post by **algerbrex** » Fri Jul 01, 2022 11:02 am

lithander wrote: ↑Fri Jul 01, 2022 10:57 am I ran a quick test (5s + 500ms increment) against Leorik 2.1.9 on two cores during work:

Code:

[tc=5+0.5 Hash=50]
Score of Leorik-2.1.9 vs blunder-8.0.0: 197 - 132 - 160  [0.566] 489
...      Leorik-2.1.9 playing White: 107 - 57 - 81  [0.602] 245
...      Leorik-2.1.9 playing Black: 90 - 75 - 79  [0.531] 244
...      White vs Black: 182 - 147 - 160  [0.536] 489
Elo difference: 46.5 +/- 25.4, LOS: 100.0 %, DrawRatio: 32.7 %

...and a test on longer time controls (10s + 2s increment) overnight:

Code:

[tc=10+2 Hash=50]
Score of Leorik-2.1.9 vs blunder-8.0.0: 442 - 372 - 544  [0.526] 1358
...      Leorik-2.1.9 playing White: 251 - 167 - 261  [0.562] 679
...      Leorik-2.1.9 playing Black: 191 - 205 - 283  [0.490] 679
...      White vs Black: 456 - 358 - 544  [0.536] 1358
Elo difference: 17.9 +/- 14.3, LOS: 99.3 %, DrawRatio: 40.1 %

No crashes or time-forfeits!

Nice! Thanks for testing! Glad to see there were no crashes.

And glad to see the strength difference isn't as exaggerated at longer time controls, as I believe Blunder's strength is shifting more towards longer time controls, despite its roots of being stronger at bullet.

Regardless, Leorik seems like quite a formidable bullet opponent!

lithander · Post by **lithander** » Fri Jul 01, 2022 11:24 am

I was surprised by the rather large difference between the two time controls as well. But it's something I experience with Leorik against a lot of engines (e.g. MadChess especially). Leorik can beat them on fast time controls consistently but in the CCRL they are still ahead.

And I realize now that the data I quoted doesn't allow you to draw any meaningful conclusions about how much stronger Blunder 8.0 got than its predecessor, because Leorik 2.1.9 is of undefined strength. So let me include this old result, too:

Code:

[tc=5+0.5 book=varied.bin]
Score of Leorik-2.1.8 vs blunder-7.6.0: 1407 - 707 - 1050  [0.611] 3164
...      Leorik-2.1.8 playing White: 780 - 300 - 503  [0.652] 1583
...      Leorik-2.1.8 playing Black: 627 - 407 - 547  [0.570] 1581
...      White vs Black: 1187 - 927 - 1050  [0.541] 3164
Elo difference: 78.2 +/- 10.0, LOS: 100.0 %, DrawRatio: 33.2 %

The difference between 2.1.9 and 2.1.8 is small (~10 Elo?) but would go on top.

algerbrex · Post by **algerbrex** » Fri Jul 01, 2022 12:21 pm

lithander wrote: ↑Fri Jul 01, 2022 11:24 am I was surprised by the rather large difference between the two time controls as well. But it's something I experience with Leorik against a lot of engines (e.g. MadChess especially). Leorik can beat them on fast time controls consistently but in the CCRL they are still ahead.

And I realize now that the data I quoted doesn't allow you to draw any meaningful conclusions about how much stronger Blunder 8.0 got than its predecessor, because Leorik 2.1.9 is of undefined strength. So let me include this old result, too:
Code:
[tc=5+0.5 book=varied.bin]
Score of Leorik-2.1.8 vs blunder-7.6.0: 1407 - 707 - 1050  [0.611] 3164
...      Leorik-2.1.8 playing White: 780 - 300 - 503  [0.652] 1583
...      Leorik-2.1.8 playing Black: 627 - 407 - 547  [0.570] 1581
...      White vs Black: 1187 - 927 - 1050  [0.541] 3164
Elo difference: 78.2 +/- 10.0, LOS: 100.0 %, DrawRatio: 33.2 %
The difference between 2.1.9 and 2.1.8 is small (~10 Elo?) but would go on top.

I noticed the same with MadChess in the early development days of Blunder. My theory is that evaluation and/or search parameters are often tuned to work best at particular time controls. You and I both use hyper-bullet time controls for testing, which means that over time the small tweaks and changes we make may show strong gains at short time controls but be less effective at longer ones. Conversely, as far as I'm aware, Erik has used blitz (40/2 min) time control testing for MadChess. Of course there are counterexamples to this; as I said, it's just a toy theory I have.

But from this other test you gave, it looks like Blunder may have gained, more realistically, around 30-40 Elo at very short time controls, but hopefully more at longer ones.
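
The 30-40 Elo range can be reproduced with some back-of-the-envelope arithmetic on the two 5s+0.5s results quoted in this thread. The sketch below is only a rough illustration: it assumes Elo differences simply add up across engine versions, it ignores that the two tests used different settings (one lists an opening book, the other a hash size), and the combined margin assumes the two error margins are independent. The variable and function names are made up for this example.

```python
import math

# Elo differences from Leorik's point of view, as quoted in the thread.
LEORIK_218_VS_BLUNDER_760 = 78.2  # +/- 10.0
LEORIK_219_VS_BLUNDER_800 = 46.5  # +/- 25.4


def blunder_gain(leorik_improvement: float) -> float:
    """Estimated gain of Blunder 8.0.0 over 7.6.0.

    leorik_improvement is the assumed Elo gain of Leorik 2.1.9 over 2.1.8;
    lithander's guess in the thread is "~10 Elo?".
    """
    leorik_219_vs_blunder_760 = LEORIK_218_VS_BLUNDER_760 + leorik_improvement
    return leorik_219_vs_blunder_760 - LEORIK_219_VS_BLUNDER_800


low = blunder_gain(0.0)    # if Leorik 2.1.9 were no stronger than 2.1.8
high = blunder_gain(10.0)  # using the ~10 Elo guess

# Rough combined error margin, assuming the two margins are independent.
combined_margin = math.hypot(10.0, 25.4)

print(f"Estimated gain: {low:.1f} to {high:.1f} Elo (+/- ~{combined_margin:.0f})")
```

The combined margin is nearly as large as the estimate itself, so this range is a hint rather than a measurement.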

I suppose I'll have to wait until CCRL testing for Blunder begins, and the CEGT testing is finished.

Wolfgang · Post by **Wolfgang** » Fri Jul 01, 2022 12:59 pm

lithander wrote: ↑Fri Jul 01, 2022 11:24 am I was surprised by the rather large difference between the two time controls as well. But it's something I experience with Leorik against a lot of engines (e.g. MadChess especially). Leorik can beat them on fast time controls consistently but in the CCRL they are still ahead.

And I realize now that the data I quoted doesn't allow you to draw any meaningful conclusions about how much stronger Blunder 8.0 got than its predecessor, because Leorik 2.1.9 is of undefined strength. So let me include this old result, too:
Code:
[tc=5+0.5 book=varied.bin]
Score of Leorik-2.1.8 vs blunder-7.6.0: 1407 - 707 - 1050  [0.611] 3164
...      Leorik-2.1.8 playing White: 780 - 300 - 503  [0.652] 1583
...      Leorik-2.1.8 playing Black: 627 - 407 - 547  [0.570] 1581
...      White vs Black: 1187 - 927 - 1050  [0.541] 3164
Elo difference: 78.2 +/- 10.0, LOS: 100.0 %, DrawRatio: 33.2 %
The difference between 2.1.9 and 2.1.8 is small (~10 Elo?) but would go on top.

I'll start a test with Leorik 2.1 soon.
https://cegt.forumieren.com/t1695-new-e ... t-two#3347

If you want a newer version to be tested, feel free to send it to me by mail. I'll send you a PM.

emadsen · Post by **emadsen** » Fri Jul 01, 2022 7:01 pm

algerbrex wrote: ↑Fri Jul 01, 2022 1:57 am Do you ever have any plans on adding any sort of neural network into the codebase?

At the moment, no. The simplicity of a hand-crafted evaluation appeals to me. MadChess supports a "staticscore" UCI command (an extension, not an official UCI command) that breaks down the evaluation by parameter (material, piece location, mobility, king safety, etc.), white, black, MG, EG, and EG scaling. That appeals to me personally, as a chess patzer, more than the black-box magic of NNUE.

But who knows, I could change my mind later.
