Progress on Blunder

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Progress on Blunder

Post by algerbrex »

Chessqueen wrote: Fri Mar 03, 2023 1:07 pm ...
I appreciate the interest in Blunder Jorge, but I would prefer if you made a separate thread for your game testing. Just makes things easier for everyone :) Thanks.
Chessqueen
Posts: 5685
Joined: Wed Sep 05, 2018 2:16 am
Location: Moving
Full name: Jorge Picado

Re: Progress on Blunder

Post by Chessqueen »

algerbrex wrote: Mon Mar 06, 2023 5:09 am
Chessqueen wrote: Fri Mar 03, 2023 1:07 pm ...
I appreciate the interest in Blunder Jorge, but I would prefer if you made a separate thread for your game testing. Just makes things easier for everyone :) Thanks.
I will. I am pretty sure your newer NN version, when you release it, will be at least 20 to 30 Elo higher :roll:
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Progress on Blunder

Post by algerbrex »

Progress on updating Blunder for the 9.0.0 version release has been coming along fairly smoothly.

For this refactoring, I dumped the old evaluation values from previous versions, which were tuned using the popular Zurichess dataset, and instead opted to use lichess games from https://database.lichess.org, which is a really nice resource for getting tons of raw data.

I used the games from March 2013 to generate ~1M positions to run through my tuner. The initial values were pretty sloppy, but of course were far, far better than just the material values that I started with. In between re-adding and re-testing search features, I've done about 2-3 re-tunings, and so far all of them have netted 50-60 Elo. Of course, the exact value is over-inflated due to self-play, but it's a good indication that the evaluation terms are getting better. But they're far from being optimal and are still a good bit weaker than the original values. My plan is to keep re-adding search features for a while longer while recording the self-play games, and to use those self-play games to keep re-tuning the evaluation. I technically could just generate 100K games all in one go, but I'm a bit too impatient for that :)
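For anyone following along, the objective this kind of tuner minimizes is usually the Texel-style mean squared error between the evaluation (squashed through a sigmoid) and the game result. Here's a minimal Go sketch of that objective; the type and constant names are made up for illustration, and this isn't Blunder's actual tuner code:

```go
package main

import (
	"fmt"
	"math"
)

// A labeled training position: the engine's static evaluation (in
// centipawns, from White's point of view) and the game result
// (1.0 = White win, 0.5 = draw, 0.0 = White loss).
type Position struct {
	Eval   float64
	Result float64
}

// sigmoid maps a centipawn score to an expected game result in [0, 1].
// K is a scaling constant fitted once before tuning begins.
func sigmoid(score, K float64) float64 {
	return 1.0 / (1.0 + math.Pow(10, -K*score/400))
}

// meanSquaredError is the quantity a Texel-style tuner minimizes by
// nudging evaluation parameters: the gap between predicted and actual
// game results, averaged over the whole dataset.
func meanSquaredError(positions []Position, K float64) float64 {
	var sum float64
	for _, p := range positions {
		diff := p.Result - sigmoid(p.Eval, K)
		sum += diff * diff
	}
	return sum / float64(len(positions))
}

func main() {
	data := []Position{
		{Eval: 120, Result: 1.0},
		{Eval: -300, Result: 0.0},
		{Eval: 10, Result: 0.5},
	}
	fmt.Printf("error: %.4f\n", meanSquaredError(data, 1.0))
}
```

The tuner then adjusts each evaluation term in the direction that lowers this error, whether by local search or gradient descent.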

The other biggest change so far has been getting a good replacement scheme working for Blunder. Blunder 8.5.5 has a replacement scheme, but the code that implemented it wasn't very clean in my opinion, and the aging mechanism left a lot to be desired. In the most recent dev version I re-wrote the transposition table code. The transposition table is now a linear array of bucket structs, with each bucket holding 4 entry slots. I might experiment with more or fewer, as I made sure not to hardcode the number of entries per bucket, but 4 proved to work quite well.
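The layout described above looks roughly like this in Go. This is a simplified sketch, not Blunder's actual code (the real entry packing is in the linked `transposition.go`); field names and sizes here are illustrative:

```go
package main

import "fmt"

const EntriesPerBucket = 4 // kept configurable rather than hardcoded

// Entry is one TT slot: a 64-bit verification hash plus the packed
// search data (best move, score, depth, bound type, and age).
type Entry struct {
	Hash  uint64
	Move  uint16
	Score int16
	Depth uint8
	Flag  uint8
	Age   uint8
}

// Bucket groups several entries that share one table index, so a probe
// can check all of them before anything has to be evicted.
type Bucket struct {
	Entries [EntriesPerBucket]Entry
}

// TransTable is a flat slice of buckets; a position's hash selects one.
type TransTable struct {
	Buckets []Bucket
}

func NewTransTable(numBuckets int) *TransTable {
	return &TransTable{Buckets: make([]Bucket, numBuckets)}
}

// index maps a full 64-bit Zobrist hash to a bucket index.
func (tt *TransTable) index(hash uint64) uint64 {
	return hash % uint64(len(tt.Buckets))
}

func main() {
	tt := NewTransTable(5000) // 5K buckets = 20K entries at 4 per bucket
	fmt.Println("buckets:", len(tt.Buckets),
		"entries:", len(tt.Buckets)*EntriesPerBucket)
}
```

A nice side effect of 4 entries per bucket is that a tightly packed bucket can fit in a single cache line, so probing all four slots costs little more than probing one.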

As for the aging scheme, I'm now doing something more traditional: I have a 16-bit counter that I take modulo 16 and store every time I write to an entry, and as part of the replacement scheme I compare ages and replace any entry in a bucket that doesn't have the current age.
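One plausible way to wire that age check into the bucket's store logic is below. To be clear, this is my own sketch of the idea, not Blunder's exact replacement priorities: prefer the slot already holding this position, then any stale-aged slot, and only then fall back to evicting the shallowest entry:

```go
package main

import "fmt"

const EntriesPerBucket = 4

type Entry struct {
	Hash  uint64
	Depth uint8
	Age   uint8 // search counter modulo 16, stored on every write
}

type Bucket struct{ Entries [EntriesPerBucket]Entry }

// Store picks a victim slot inside the bucket. Priority order:
//   1. a slot already holding this position (always update),
//   2. any slot whose age differs from the current search's age,
//   3. otherwise the shallowest-depth (least valuable) slot.
func (b *Bucket) Store(hash uint64, depth, currentAge uint8) {
	victim := 0
	for i := range b.Entries {
		e := &b.Entries[i]
		if e.Hash == hash { // same position: always overwrite
			victim = i
			break
		}
		if e.Age != currentAge { // stale entry from an older search
			victim = i
			break
		}
		if e.Depth < b.Entries[victim].Depth {
			victim = i // fall back to the shallowest entry
		}
	}
	b.Entries[victim] = Entry{Hash: hash, Depth: depth, Age: currentAge}
}

func main() {
	var b Bucket
	// Fill the bucket during "search 3", then store during "search 4":
	for i := 0; i < EntriesPerBucket; i++ {
		b.Store(uint64(i+1), uint8(10+i), 3)
	}
	b.Store(99, 1, 4) // evicts a stale age-3 entry despite its lower depth
	fmt.Println(b.Entries)
}
```

The age comparison is what lets deep-but-ancient entries from previous searches get flushed out instead of hogging slots forever.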

To actually be able to test these changes in a reasonable amount of time, I made the transposition table pretty tiny, with just 20K entries. At a time control of 20+0.2s, and using numbers I've seen floating around here, that should put a good amount of pressure on the table and show the contrast between a decent replacement scheme and basic always-replace.

The idea seemed to work, as the version with the new replacement scheme tested ~50 Elo better in self-play. Of course, again, an over-inflated value, but still a good indication that the new replacement scheme is much better than the plain always-replace that current dev had. And in testing this scheme versus what Blunder currently had, results looked promising too, but I need to do some proper testing there as well.
KhepriChess
Posts: 93
Joined: Sun Aug 08, 2021 9:14 pm
Full name: Kurt Peters

Re: Progress on Blunder

Post by KhepriChess »

algerbrex wrote: Wed Mar 15, 2023 5:45 am Progress on updating Blunder for the 9.0.0 version release has been coming along fairly smoothly.

For this refactoring, I dumped the old evaluation values from previous versions, which were tuned using the popular Zurichess dataset, and instead opted to use lichess games from https://database.lichess.org, which is a really nice resource for getting tons of raw data.
Any particular reason for opting for a lichess dataset over Zurichess?
algerbrex wrote: Wed Mar 15, 2023 5:45 am My plan is to keep re-adding search features for a while longer while recording the self-play games
Okay, I've been wanting to ask since I'm in the same boat, but how do you choose what search features to add at what point (e.g. futility then razor or razor then futility) and how far do you go/how long do you spend in tweaking various parameters on those features? I basically drove myself crazy my last release tweaking the search.
algerbrex wrote: Wed Mar 15, 2023 5:45 am The other biggest change so far has been getting a good replacement scheme working for Blunder. Blunder 8.5.5 has a replacement scheme, but the code that implemented it wasn't very clean in my opinion, and the aging mechanism left a lot to be desired. In the most recent dev version I re-wrote the transposition table code. The transposition table is now a linear array of bucket structs, with each bucket holding 4 entry lots. I might experiment with more or less, as I made sure I didn't hardcode in the number of buckets, but 4 proved to work quite well.
In earlier versions of Blunder, did you ever use the TT without a replacement scheme (or rather, just with an "always replace" scheme)? Once you switched to buckets and aging, what kind of improvement did you see?
Puffin: Github
KhepriChess: Github
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Progress on Blunder

Post by algerbrex »

KhepriChess wrote: Wed Mar 15, 2023 7:59 pm Any particular reason for opting for a lichess dataset over Zurichess?
Well, part of my goal for this refactoring is to make my engine more original, and that includes not relying on other engines' data for tuning my evaluation. So Zurichess, while an extremely high quality and useful dataset, wouldn't work for my current purposes. Of course, I don't believe there's anything unethical about using the dataset; I used it myself for tuning every previous version of Blunder. But I got inspired by seeing what Thomas (Leorik's author) was capable of doing by starting from scratch and wanted to try it myself.

But I don't mind using lichess data, or other high quality human chess game datasets like MillionBase 3.45 (https://rebel13.nl/download/data.html).

It was also an option to be ultra-pure and not use any outside data whatsoever: just material values, some randomness (so the engine doesn't simply shuffle its pieces back and forth), and an opening book to create an initial set of evaluation values. I might experiment with this route in the future.
KhepriChess wrote: Wed Mar 15, 2023 7:59 pm Okay, I've been wanting to ask since I'm in the same boat, but how do you choose what search features to add at what point (e.g. futility then razor or razor then futility) and how far do you go/how long do you spend in tweaking various parameters on those features? I basically drove myself crazy my last release tweaking the search.
When I first started writing the engine, I knew about certain search features that were considered standard and fairly easy to implement, so I opted to implement those first: things like killer moves, principal variation search, check extensions, etc. After that, it was really a matter of browsing the chess programming wiki and the threads on here to find interesting ideas to try. I would also sometimes browse through the READMEs of other engines to see if they had any features that might be interesting to implement myself and play with.

As far as time spent, my general approach has been to get a feature to work, move on for a bit, and then come back later to tweak it. For example, with null move pruning, I started with a fixed reduction, and then after adding a few more features, I revisited it and tweaked it to be a dynamic reduction depending on the depth, and then a few more features later I came back and tweaked it some more.
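To make the fixed-versus-dynamic null move reduction concrete, here's a tiny illustration. The numbers are purely illustrative, not Blunder's tuned values; the depth/6 shape is just one common way engines scale the reduction:

```go
package main

import "fmt"

// fixedReduction is the simple first version: always reduce the null
// move search by the same amount, regardless of depth.
func fixedReduction(depth int) int {
	return 2
}

// dynamicReduction is the later refinement: reduce more aggressively
// at higher depths, where there is more search to save.
func dynamicReduction(depth int) int {
	return 2 + depth/6
}

func main() {
	for _, d := range []int{4, 8, 12, 18} {
		fmt.Printf("depth %2d: fixed R=%d, dynamic R=%d\n",
			d, fixedReduction(d), dynamicReduction(d))
	}
}
```

The appeal of this iterative workflow is that each revisit is a small, independently testable change: swap the constant for a formula, run SPRT, move on.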

Now, there was no reason I couldn't have just done all those tweaks at once, but personally I find it hard to work on one feature for days or weeks at a time, and having a variety of ideas to try helps keep me engaged and my thinking fresh. From what it sounds like, you might prefer this approach too, because it can definitely be frustrating to bang your head against the wall trying to squeeze Elo out of one particular feature for days and days.
KhepriChess wrote: Wed Mar 15, 2023 7:59 pm In earlier version of Blunder, did you ever use TT without a replacement scheme (or rather, just with an "always replace" scheme)? Once you switched to buckets and aging, what king of an improvement did you see?
Yup, I definitely did. Up until version 8.0.0, Blunder used an always-replace scheme, which worked perfectly well and is what I always recommend starting with. In version 8.0.0 I implemented a bit of a hacky bucket system, which I'm honestly not even sure contributed much Elo, if any, due to bugs! In this most recent development version, I took my time to make the code much cleaner and to make sure the aging and replacement scheme was working as it should. The dev version uses buckets of 4 entries specifically. Here's the code if you're interested: https://github.com/algerbrex/blunder/bl ... osition.go

The difficulty in testing replacement schemes, I've learned, is that you need to ensure a good amount of pressure is being put on the TT, meaning the table can't be so big that no real replacement ever needs to happen. And this can be especially difficult to do when testing with short time controls.

So what I did, using some estimates by Harm Geert Muller, was determine that if I used a time control of 20+0.2s and set my table size to contain just 20K 16-byte entries (or 5K 64-byte buckets), each move there would be roughly 200K unique positions, or 10x as many positions to store as there were TT slots, so Blunder would be forced to decide which entries to replace. The commit I made relating to it went over more of the math in detail if you're interested: https://github.com/algerbrex/blunder/co ... 542b602829
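The sizing arithmetic behind those numbers can be checked in a few lines. The nodes-per-second figure here is my own rough assumption to make the example concrete, not a measured Blunder speed:

```go
package main

import "fmt"

// tablePressure returns how many unique positions compete for each TT
// slot during one move: searched nodes divided by total entries.
func tablePressure(nodesPerMove, entries int) int {
	return nodesPerMove / entries
}

func main() {
	const (
		entries          = 20_000 // deliberately tiny table for the test
		entriesPerBucket = 4
		nodesPerSec      = 1_000_000 // assumed rough engine speed
	)
	// At 20+0.2s the increment dominates, so each move gets roughly
	// 0.2s of search: 0.2 * 1M nps = 200K positions per move.
	nodesPerMove := nodesPerSec / 5
	fmt.Println("buckets:", entries/entriesPerBucket)              // 5000
	fmt.Println("pressure:", tablePressure(nodesPerMove, entries)) // 10
}
```

A 10x oversubscription per move guarantees plenty of evictions, which is exactly what you want when the thing under test is the eviction policy itself.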

Using this approach seemed to work pretty well, as SPRT testing with cutechess showed the version using the smarter replacement scheme was ~50 Elo stronger in self-play than plain always-replace (see the aforementioned linked commit). Of course, this exact figure is inflated by the self-play aspect, but nevertheless, it seems to me to be a pretty good indication that the new replacement scheme is an improvement overall. The smarter replacement scheme also sped up PERFT calculations quite considerably.