Progress on Blunder

Guenther
Posts: 4718
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Progress on Blunder

Post by Guenther »

algerbrex wrote: Sun Jul 24, 2022 6:11 pm
Guenther wrote: Sun Jul 24, 2022 1:54 pm Very early it settles on 7. Nc3 and the expected answer 7...Be7, neglecting the natural 7...d4.
After making the move 7. Nc3 on the board it gets incredibly stuck in analysis with 7...Be7, and after 50 ms
there is zero output anymore for 260 seconds! I first thought it had crashed.
WB was begging for output all along (sending '.') and finally, after over 61M nodes, it printed another PV
with 7...Bf5, but d4 (Leorik's move) was never considered.
Over the next two plies similar things happened (not that extreme, though, but again with moves from hash
and replies that were never considered), and the game was lost very early, even though Leorik could have played
d3! one move earlier.
I am not completely sure whether the tactical horizon, with the possible in-between Nc7+, was too high,
or what other programs of similar strength think here, but it seems that even a few plies later
it should have realized it, though I don't know why it didn't.
Thanks. That's definitely an issue, as d4 should be well within Blunder's horizon. I imagine what's happening is that for whatever reason d4 is getting pruned or reduced away, and I'll need to investigate why.

Probably the most glaring weakness Blunder still has is its poorly tuned margins and reductions. The majority of the values in the engine come either from gut feeling or from very quick testing to get a rough value, so I need to spend some time working on tuning them better anyway.

...
Just a correction of a typo:
It wasn't 260s but 160s when the search seemed stuck.
https://rwbc-chess.de

[Trolls n'existent pas...]
User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Progress on Blunder & A New Engine

Post by algerbrex »

While I keep brainstorming some solutions to Blunder's transposition table problems, I've decided to go ahead and start working on a new chess engine, something I've given some thought to for a while.

I'm not abandoning Blunder. As a matter of fact, I'm actually going to be doing quite the opposite, and it's one of the reasons I decided to start working on the new engine now. I enjoy chess programming, but it can become draining to work on a stubborn bug for many hours at a time, so it'd be nice to have another engine to take a break with and work on whenever I encounter difficulties with one, and vice versa.

But I've also thought about making a new engine for a while.

When I first got into chess programming, the first version of Blunder I ever wrote, version 0.0.1 I suppose you could call it, was written in Python. I thoroughly enjoy programming and using Python, but I quickly learned it just wasn't up to the task speedwise. So I eventually settled on Go, since it had been on my radar before and the simple, straightforward syntax and garbage collection were appealing, especially with getting something up and running quickly. And Go has worked very well speedwise.

But I never gave much serious deliberation to what language I wanted to use, and with this new engine, I'd like to do that. I'd like to use a language that's relatively unique with regard to chess programming, so I'm not writing yet another C/C++ chess engine. And I'd also like to use one that's a bit more low-level and doesn't have automatic memory management. I'm a little curious to see how much, if at all, I can improve speedwise on the move generators I've written (not that you can't write a very fast move generator in a garbage-collected language. This really just says more about me and the quality of my programming :lol: ), but mostly this is to teach me how to better use a language with manual memory management, since I think it's smart to get some experience with that now, while I'm still in college.

So far languages I'm considering are ones like Zig, Odin, Nim, and other languages in that niche of being "modern" C-like languages. I find them interesting and think it would be fun to attempt to create a mature piece of software in them.

And maybe Rust, although I doubt I'll end up going with it right now - besides, it seems it's no longer that unique a language when it comes to chess programming :wink: Most of the newer chess engines I encounter seem to be written in Rust.

I'll start slowly by trying to write a move generator from scratch and go from there, so I have no clue when the first version will be ready. Could be a couple of weeks, or a month or two.
User avatar
j.t.
Posts: 268
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: Progress on Blunder & A New Engine

Post by j.t. »

algerbrex wrote: Sun Jul 24, 2022 11:44 pm So far languages I'm considering are ones like Zig, Odin, Nim, and other languages in that niche of being "modern" C-like languages. I find them interesting and think it would be fun to attempt to create a mature piece of software in them.
I can say that programming in Nim feels really nice. It is not really a C-like language, though; it's more like a mix of C++, Python, and Pascal. But it can be quite performant for sure.
User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Progress on Blunder & A New Engine

Post by algerbrex »

j.t. wrote: Mon Jul 25, 2022 9:03 pm
algerbrex wrote: Sun Jul 24, 2022 11:44 pm So far languages I'm considering are ones like Zig, Odin, Nim, and other languages in that niche of being "modern" C-like languages. I find them interesting and think it would be fun to attempt to create a mature piece of software in them.
I can say that programming in Nim feels really nice. It is not really a C-like language, though; it's more like a mix of C++, Python, and Pascal. But it can be quite performant for sure.
Ah ok, I see. I haven't done much research either way yet, although I noticed Nim was always thrown around with the other languages I mentioned.

It being Python-esque is attractive to me, as I have the most experience programming in Python, so it seems quite natural, along with it being competitive performance-wise, since (afaik) it compiles to C or C++. I suppose I'll invest a little time in each one and see which I like best.

Right now, I've started using Zig, and while I find the syntax a little clunky, the usage of allocators to manage memory seems fairly straightforward, and it has some nice syntactic sugar, although the fact that it's meant for lower-level programming can be felt. I'm curious to see what sort of performance I'll get once I wrap up the move generator and make/unmake framework.
User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Progress on Blunder

Post by algerbrex »

So I've spent a bit of time playing around with the transposition table, seeing what could be improved, and I decided to try rounding the TT size up to a power of two, so I could use a bitwise AND to calculate an index into the table instead of a modulo, which should save some time (a sketch of the indexing idea follows the results below). I also kept in the previous bug fix for the global age not being updated. During testing against 8.5.5 no major regression was observed:

Code: Select all

Score of Blunder 8.6.5 vs Blunder 8.5.5: 1063 - 1041 - 1896  [0.503] 4000
...      Blunder 8.6.5 playing White: 581 - 468 - 951  [0.528] 2000
...      Blunder 8.6.5 playing Black: 482 - 573 - 945  [0.477] 2000
...      White vs Black: 1154 - 950 - 1896  [0.525] 4000
Elo difference: 1.9 +/- 7.8, LOS: 68.4 %, DrawRatio: 47.4 %
SPRT: llr -0.274 (-9.3%), lbound -2.94, ubound 2.94
Finished match
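For illustration, here is a minimal Go sketch of the indexing change described above (the helper and names are mine, not Blunder's actual code): once the table size is rounded up to a power of two, a bitwise AND with size-1 selects the same slot as the modulo while avoiding an integer division.

Code: Select all

package main

import "fmt"

// nextPowerOfTwo rounds n up to the nearest power of two.
func nextPowerOfTwo(n uint64) uint64 {
    size := uint64(1)
    for size < n {
        size <<= 1
    }
    return size
}

func main() {
    requested := uint64(3_000_000)     // requested number of TT entries (illustrative)
    size := nextPowerOfTwo(requested)  // actual table size used: 4_194_304

    hash := uint64(0x9d39247e33776d41) // an example 64-bit Zobrist key

    // Because size is a power of two, masking with size-1 yields exactly
    // the same index as the modulo, without the integer division.
    fmt.Println(hash%size == hash&(size-1)) // prints: true
}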
But what I cared more about was calculating the number of "low-depth searches" (score < 1 cp, depth <= 8, and move time >= 400 ms). I wrote a very quick and dirty Python script to calculate the total number of low-depth searches, as I've defined them, for any group of engines in a PGN file (a rough sketch of the rule follows the counts below). And using this script I confirmed that the above changes, while not eliminating my TT issues, did reduce the number of low-depth searches by a decent bit:

Code: Select all

Blunder 8.5.5: 262
Blunder 8.6.5: 217
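For reference, the per-move rule itself is simple to apply. Here is a rough Go sketch of the classification (my actual script is in Python and works over whole PGN files; the cutechess-style move comment format "+0.25/12 0.50s" assumed here is just an illustration):

Code: Select all

package main

import (
    "fmt"
    "regexp"
    "strconv"
)

// Matches cutechess-cli style move comments such as "+0.25/12 0.50s"
// (score in pawns / search depth, then the move time in seconds).
// The exact comment format is an assumption; adjust it to whatever the
// GUI actually writes into the PGN.
var commentRe = regexp.MustCompile(`([+-]?\d+\.\d+)/(\d+) (\d+\.\d+)s`)

// isLowDepthSearch applies the rule as defined above:
// score < 1 cp, depth <= 8, and move time >= 400 ms.
func isLowDepthSearch(scoreCp float64, depth int, moveTimeSec float64) bool {
    return scoreCp < 1 && depth <= 8 && moveTimeSec >= 0.4
}

func main() {
    comment := "+0.00/6 0.52s" // example annotation of a single move
    m := commentRe.FindStringSubmatch(comment)
    if m == nil {
        return
    }
    scorePawns, _ := strconv.ParseFloat(m[1], 64)
    depth, _ := strconv.Atoi(m[2])
    moveTime, _ := strconv.ParseFloat(m[3], 64)
    fmt.Println(isLowDepthSearch(scorePawns*100, depth, moveTime)) // prints: true
}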
So I've kept them for now and merged them into the development branch. When I get back from the gym and running some errands today, I'm going to do a little more investigation to see specifically whether the number of low-depth searches caused by probing/storing in the TT was reduced, since, given what Guenther has shown, I think Blunder in general has a problem with search explosions, which I'll need to dedicate some time to.

Out of curiosity, here are some statistics for low-depth searches from the last couple of tests I ran, a gauntlet for 8.0.0, a gauntlet for 8.4.5, and the Leorik gauntlet:

Code: Select all

Blunder 8.0.0 Gauntlet
----------------------------
Nebula 2.0: 351
Blunder 8.0.0: 501
Rodin v8.0.0: 30
Admete 1.5.0: 148
Zahak 5.0: 7
Cheese 1.8 64 Bits: 1
Leorik 2.2: 1
Velvet v1.1.0: 2

Blunder 8.4.5 Gauntlet
----------------------------
Nebula 2.0: 328
Blunder 8.4.5: 354
Rodin v8.0.0: 23
Admete 1.5.0: 127
Zahak 5.0: 1
Velvet v1.1.0: 2

Leorik Gauntlet
-------------------
Blunder 7.6.0: 240
Blunder 8.0.0: 37
Blunder 8.4.5: 143
Leorik 2.2: 1
Although it looks a bit odd on the face of it, it's not super surprising to me that v8.4.5 would have more low-depth searches in the Leorik gauntlet, as I did add singular extensions in that version, so it's to be expected that some searches would take longer. Out of curiosity, I'll probably re-run the Leorik gauntlet with v8.6.5 to see how it fares. I'll report back with the results.
Guenther
Posts: 4718
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Progress on Blunder

Post by Guenther »

algerbrex wrote: Wed Jul 27, 2022 1:59 pm
But what I cared more about was calculating the number of "low-depth searches" (score < 1cp, depth <= 8, and move time >= 400ms). I wrote a very quick and dirty Python script to calculate the total number of low-depth searches, as I've defined them, for any group of engines in a PGN file. And using this script I confirmed the above changes, while not eliminating my TT issues, did reduce the number of low-depth searches by a decent bit:

Code: Select all

Blunder 8.5.5: 262
Blunder 8.6.5: 217
It could be helpful if you added another condition to your script to get even more meaningful (precise) output.
E.g. check only the lost games for each player, or, even better, add a condition which can detect whether the moves flagged by the low-depth rule
also correlate with an eval loss of a certain value within the next 2 or 3 moves, and thus probably also impacted the final game result.
https://rwbc-chess.de

[Trolls n'existent pas...]
User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Progress on Blunder

Post by algerbrex »

Guenther wrote: Wed Jul 27, 2022 2:53 pm It could be helpful if you added another condition to your script to get even more meaningful (precise) output.
E.g. check only the lost games for each player, or, even better, add a condition which can detect whether the moves flagged by the low-depth rule
also correlate with an eval loss of a certain value within the next 2 or 3 moves, and thus probably also impacted the final game result.
Good point, that's a metric I didn't think to measure. I'll look into that.

In a perfect world, regardless of whether a low-depth search was decisive for the final game result, I'd still like to eliminate them altogether. But I may just have to settle for the next best thing: making sure the low-depth searches are negligible, by seeing how often they correlate with some of the metrics you mentioned, like a loss.
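As a very rough sketch of the kind of correlation check you're suggesting (Go, purely illustrative; extracting each engine's per-move evals from the PGN comments is assumed to happen elsewhere, and the window and drop threshold are placeholders I'd still have to pick):

Code: Select all

package main

import "fmt"

// evalDropsAfter reports whether the engine's eval falls by at least dropCp
// centipawns within its next `window` moves after move index i. evals holds
// the engine's reported score (in centipawns, from its own point of view)
// for each of its moves in one game.
func evalDropsAfter(evals []int, i, window, dropCp int) bool {
    for j := i + 1; j <= i+window && j < len(evals); j++ {
        if evals[i]-evals[j] >= dropCp {
            return true
        }
    }
    return false
}

func main() {
    // Suppose the move at index 1 was flagged as a low-depth search.
    evals := []int{10, 0, -5, -120, -300}
    // Did the eval drop by at least 100 cp within the next 3 of this engine's moves?
    fmt.Println(evalDropsAfter(evals, 1, 3, 100)) // prints: true (0 -> -120)
}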
User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Progress on Blunder

Post by algerbrex »

So, I ran another gauntlet with Leorik out of curiosity, and I'm starting to become convinced concurrency and my laptop's peculiarities are playing a role in some of what I'm observing.

So I had started running a gauntlet with a concurrency of 8, and it was about 2/3 of the way through, with 8.6.5 behind Leorik by roughly 40 Elo points, before I misremembered that I had used a concurrency of 4 for the original gauntlet. So I restarted the gauntlet with a concurrency of 4, and all three versions of Blunder seemed to perform at a much higher level, even though that gauntlet didn't finish and the only thing that had changed was the concurrency setting. And the number of low-depth searches after several hundred games was tiny.

But after I realized I was wrong and the original gauntlet had used a concurrency of 8, I stopped that run and restarted with a concurrency of 8, and ended up with results where all three versions performed much worse against Leorik, even though in the original gauntlet they had all performed quite well, given how they had done before:

Code: Select all

Rank Name                          Elo     +/-   Games   Score    Draw
   0 Leorik 2.2                     97      12    2400   63.5%   31.2%
   1 Blunder 8.6.5                 -64      20     800   40.9%   30.9%
   2 Blunder 8.0.0                 -91      20     800   37.3%   32.8%
   3 Blunder 7.6.0                -137      21     800   31.2%   29.9%

Finished match
So I suspect that concurrency is definitely playing a role here, and I'm not too worried about the results. What I was more curious to see was the low-depth searches. With Blunder 8.4.5 they were pretty high. Some of the tweaks I made to the TT load/store functions do seem to have helped, as the total number of low-depth searches for the dev version of Blunder dropped from 143 down to 59:

Code: Select all

1st Leorik Gauntlet
------------------------
Blunder 7.6.0: 240
Blunder 8.0.0: 37
Blunder 8.4.5: 143
Leorik 2.2: 1

2nd Leorik Gauntlet
------------------------
Blunder 7.6.0: 319
Blunder 8.0.0: 47
Blunder 8.6.5: 59
Leorik 2.2: 1
Now, I imagine many of the remaining low-depth searches seen are due to search explosions rather than performance spikes, so I'm going to dedicate some time to looking into Blunder's search constants, while also trying to settle on a decent concurrency number.
User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Progress on Blunder

Post by algerbrex »

On a different note, I've also been working on what is basically a port of my current move generator to Zig, with a few tweaks here and there, and it's been an interesting but mostly fun experience, although Zig definitely has a learning curve, at least for someone not coming from a systems programming background.

I've shelved that project for now, as it's something I'll probably work on incrementally whenever I get bored with Blunder, and I needed to take a break from learning Zig.

What I've decided to pick up in its place for now is creating a checkers engine in Rust. I've wanted to get more familiar with Rust for a while, and the whole concept of memory safety at compile time, with concepts like ownership, seems interesting to learn. And a checkers engine seems like something that's doable enough that I won't burn out, but challenging enough that it should keep me entertained.

Anyway, I digress, since this is a chess engine forum.
Modern Times
Posts: 3780
Joined: Thu Jun 07, 2012 11:02 pm

Re: Progress on Blunder

Post by Modern Times »

algerbrex wrote: Thu Jul 28, 2022 2:45 am So, I ran another gauntlet with Leorik out of curiosity, and I'm starting to become convinced concurrency and my laptop's peculiarities are playing a role in some of what I'm observing.

So I had started running a gauntlet with a concurrency of 8, and it was 2/3 of the way through, with 8.6.5 behind Leorik by about ~40 Elo points, before I misremembered that I had used a concurrency of 4 for the original gauntlet.
I don't know what CPU you have, but say it is 4 cores, 8 threads. In my experience, if you use 8 threads then each instance will have lower nps and lower search depth. I've run tests here with SF to demonstrate that on Ryzen and Intel. That extra thread capability is not a free lunch; the threads are competing for the same resources a lot of the time. And I've always wondered about the possibility of inconsistencies. On top of that, the CPU clock speed itself will likely drop, which of course slows the engines down too. For my own engine testing I do not go above the number of actual cores, although I know some do, e.g. Stefan Pohl, Carlos, etc.