likeawizard wrote: ↑Tue Nov 22, 2022 3:45 pm
So I am happy to release v1.0.0 and have decided on a name: Tofiks.
Congrats on the release! I'm looking forward to seeing it in tournaments and on the CCRL!
likeawizard wrote: ↑Tue Nov 22, 2022 3:45 pm
I am not quite happy with how the hash table works. When increasing the table size I would expect to see better performance, but it seems to have very little effect, if any.
At the fast time controls we usually use for self-play Elo testing it's hard to fill up the hash table if you use normal hash sizes of several hundred MB. Maybe try setting an unusually small size to see how it behaves when hash collisions actually happen. Also, I think tracking some statistics on overwrites and fill ratio can help you develop a good intuition of what's going on, along the lines of the sketch below.
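To make those counters concrete, here is a minimal Go sketch of an instrumented transposition table. Entry, Store, New and the zero-key-means-empty convention are all illustrative assumptions, not a reference to any engine's actual code:

```go
package tt

// Entry is a minimal transposition-table slot; a real entry would also
// store score, bound type and best move.
type Entry struct {
	Key   uint64
	Depth int8
}

// TT wraps the table with two counters that make its behavior visible.
type TT struct {
	entries    []Entry
	used       uint64 // slots written at least once (for the fill ratio)
	overwrites uint64 // stores that evicted a different position
}

func New(numEntries int) *TT {
	return &TT{entries: make([]Entry, numEntries)}
}

func (t *TT) Store(key uint64, depth int8) {
	slot := &t.entries[key%uint64(len(t.entries))]
	switch {
	case slot.Key == 0: // treating a zero key as "empty" for simplicity
		t.used++
	case slot.Key != key: // index collision: another position gets evicted
		t.overwrites++
	}
	slot.Key, slot.Depth = key, depth
}

// FillRatio in permille, like the "hashfull" value UCI engines report.
func (t *TT) FillRatio() uint64 {
	return t.used * 1000 / uint64(len(t.entries))
}
```

With counters like these you can print the fill ratio and overwrite count after each search and watch how they react to different hash sizes.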
likeawizard wrote: ↑Tue Nov 22, 2022 3:45 pm
There was a recent thread here - A mate in 11 for amateurs. The numbers presented there were humiliating, both in raw performance and in the number of nodes it took other engines to crack it. So I need plenty of extensions and reductions to ensure I can find better moves with fewer nodes.
My engine also did very poorly on that problem. But I'm not too worried about it, because the position is very unusual. I don't have the kind of check extensions that would be needed to do well here. It should be an interesting exercise to optimize the engine for this particular problem, but I'm not sure it would actually help the general case.
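For what it's worth, the basic check extension is a one-line tweak to the search: when the side to move is in check, extend the remaining depth by one ply. A minimal Go sketch, assuming a hypothetical Position interface rather than any real engine's API:

```go
package search

// Move and Position are hypothetical stand-ins for a real engine's types.
type Move int

type Position interface {
	InCheck() bool
	LegalMoves() []Move
	Make(Move) Position
	Eval() int // static evaluation from the side to move's point of view
}

// negamax with a check extension: when the side to move is in check the
// remaining depth grows by one ply, so forcing sequences (like long
// mating attacks) get searched to their conclusion instead of being cut
// off at the horizon.
func negamax(pos Position, depth, alpha, beta int) int {
	if pos.InCheck() {
		depth++ // the check extension itself
	}
	if depth <= 0 {
		return pos.Eval() // a real engine would run quiescence search here
	}
	for _, move := range pos.LegalMoves() {
		score := -negamax(pos.Make(move), depth-1, -beta, -alpha)
		if score >= beta {
			return score // fail-high cutoff
		}
		if score > alpha {
			alpha = score
		}
	}
	return alpha // mate/stalemate detection omitted for brevity
}
```

Unguarded, this can blow up the search in perpetual-check lines, so engines typically cap the total number of extensions along a single path.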
likeawizard wrote: ↑Tue Nov 22, 2022 3:45 pm
PST optimizations - well, for lack of a better word, I just pulled them out of my ass. I guess the technical term would be 'empirical', but certainly some more data-driven approach would benefit them.
I am also slowly learning NN principles and might later see if adding some NNUE evaluation could help.
I'm absolutely sure that NNUE can help your engine! It seems to be a bit of a magic bullet. But there's a large space between handwritten PSTs (maybe together with some HCE terms) on the one hand and NNUE on the other. Personally, I wouldn't want to miss exploring that space.
My journey started with a PST-only evaluation where I borrowed the values from PeSTO. Then I tried to write my own tuner to generate values as good as PeSTO's from scratch. But the "from scratch" claim can be contested, because I took the data for my tuner (thousands of annotated FENs) from Zurichess, and in the process of creating that data not only Zurichess but also Stockfish was used. Nonetheless, it was an important step.
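For readers who haven't seen it: a PST-only evaluation in the spirit of PeSTO keeps two tables per piece (midgame and endgame) and blends them by game phase. A rough Go sketch; the table contents, piece ordering and phase weights are illustrative assumptions:

```go
package eval

import "math/bits"

// mgTable/egTable hold midgame and endgame values per piece and square,
// filled with tuned numbers elsewhere. Black pieces' entries are assumed
// to already carry the negative sign and mirrored squares.
var mgTable, egTable [12][64]int

// phaseWeight per piece type (pawns 0, minors 1, rooks 2, queens 4);
// the starting position sums to 24.
var phaseWeight = [12]int{0, 0, 1, 1, 1, 1, 2, 2, 4, 4, 0, 0}

// evaluate blends midgame and endgame scores by remaining game phase.
func evaluate(pieces [12]uint64) int {
	mg, eg, phase := 0, 0, 0
	for piece, bb := range pieces {
		for bb != 0 {
			sq := bits.TrailingZeros64(bb)
			bb &= bb - 1 // clear the lowest set bit
			mg += mgTable[piece][sq]
			eg += egTable[piece][sq]
			phase += phaseWeight[piece]
		}
	}
	if phase > 24 {
		phase = 24 // guard against positions with extra promoted pieces
	}
	return (mg*phase + eg*(24-phase)) / 24
}
```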
Over time I added further evaluation terms and retuned the tables for them. For Leorik I rewrote the tuner from scratch, changing it from Texel tuning to gradient descent and focusing on performance. Tuning tables got orders of magnitude faster.
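The core of such a gradient-descent tuner fits in a few lines. Here is a sketch of one epoch for a linear evaluation, where the loss is the squared error between the game result (0, 0.5 or 1) and a sigmoid of the evaluation, as in Texel tuning. The constant K and the dense feature vectors are simplifying assumptions; a real tuner would exploit the sparsity of the features:

```go
package tuner

import "math"

// K squashes centipawn evaluations onto a winning-probability scale;
// the exact value is normally fitted to the dataset.
const K = 1.0 / 400.0

func sigmoid(eval float64) float64 {
	return 1.0 / (1.0 + math.Exp(-K*eval))
}

// epoch performs one step of plain gradient descent over the dataset.
// features[i] is the feature vector of position i, results[i] its
// game outcome, lr the learning rate.
func epoch(weights []float64, features [][]float64, results []float64, lr float64) {
	grad := make([]float64, len(weights))
	for i, f := range features {
		eval := 0.0
		for j, w := range weights {
			eval += w * f[j] // linear model: eval = weights . features
		}
		p := sigmoid(eval)
		// d/dw_j of (R - p)^2 = -2*(R - p) * p*(1-p) * K * f_j
		c := -2 * (results[i] - p) * p * (1 - p) * K
		for j := range grad {
			grad[j] += c * f[j]
		}
	}
	n := float64(len(features))
	for j := range weights {
		weights[j] -= lr * grad[j] / n
	}
}
```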
When I started adding a handcrafted evaluation (HCE) for real in Leorik, I used the tuner not only to adjust the PSTs but to guide my implementation of the HCE. It was a bit messy, but basically I would add new features (e.g. instead of "pawn on e6" I'd have separate features for different types of pawns: passed, isolated and so on. I also came up with features to represent mobility.) and then I'd look at what table weights the tuner produced and approximate those with formulas in my HCE.
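To make the "separate features per pawn type" idea concrete, a hypothetical sketch (Board, Pawn and the helper methods are stand-ins, not Leorik's actual code):

```go
package features

// Pawn and Board are hypothetical stand-ins; IsPassed/IsIsolated would
// be real pawn-structure tests in an actual engine.
type Pawn struct {
	Square int     // 0..63
	Sign   float64 // +1 for white, -1 for black
}

type Board struct{ /* bitboards etc. */ }

func (b *Board) Pawns() []Pawn          { return nil }   // omitted
func (b *Board) IsPassed(p Pawn) bool   { return false } // omitted
func (b *Board) IsIsolated(p Pawn) bool { return false } // omitted

// One 64-square feature block per pawn type, so a passed pawn on e6 is
// a different feature than a normal pawn on e6.
const (
	normalPawnBase   = 0
	passedPawnBase   = 64
	isolatedPawnBase = 128
)

// pawnFeatures bumps the feature slot matching each pawn's type and
// square; the tuner then learns one weight per slot.
func pawnFeatures(b *Board, features []float64) {
	for _, p := range b.Pawns() {
		base := normalPawnBase
		switch {
		case b.IsPassed(p):
			base = passedPawnBase
		case b.IsIsolated(p):
			base = isolatedPawnBase
		}
		features[base+p.Square] += p.Sign
	}
}
```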
Then I got a bit short-sighted due to the tournaments Leorik was starting to play in, especially the Amateur Series run by Graham. I wanted to have a new, stronger version whenever a new season started. But improving the HCE beyond the most basic point was really hard for me. A lot of attempts, some of them promising at first, yielded only marginal improvements after rigorous testing. I didn't get King Safety to work at all. I got frustrated with fiddling with the details of my engine and wasting time and energy on so many pointless test runs. The urge to take a look at other engines' source code (which I don't want to do) got harder to resist, and I took a long break.
But recently I've found my motivation again: the gap between what I have and NNUE engines is still vast and full of interesting problems. What I admire about AlphaZero is that the AI learned to play chess through self-play. My next goal is a version of Leorik that has no handcrafted evaluation formulas anymore. Everything should be based on the same principle: features of the position and tables with weights that determine how each feature affects the evaluation. These weights are derived automatically by the tuner, by minimizing the evaluation error on a dataset of annotated positions. And this dataset needs to be created with self-play matches of Leorik (no other engines should be involved); the very first version of the engine, generating the first batch of data, will be one with all weights set to zero except for the basic material values (Queen being 9 pts, Rook 5 pts, etc.), as in the sketch below.
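A sketch of that "generation zero" setup: the whole evaluation is a dot product of features and weights, and the initial weight vector is zero everywhere except the material counts. All indices and sizes here are illustrative assumptions:

```go
package purer

// Illustrative feature indices: the first few features count material
// differences (white minus black); everything else starts at weight zero.
const (
	pawnCountIdx = iota
	knightCountIdx
	bishopCountIdx
	rookCountIdx
	queenCountIdx
)

const featureCount = 1024 // placeholder for the real feature vector size

// initialWeights is the "generation zero" evaluation: all zeros except
// the classic material values, in pawn units.
func initialWeights() []float64 {
	w := make([]float64, featureCount)
	w[pawnCountIdx] = 1
	w[knightCountIdx] = 3
	w[bishopCountIdx] = 3
	w[rookCountIdx] = 5
	w[queenCountIdx] = 9
	return w
}

// evaluate is the single evaluation principle: features times weights.
func evaluate(features, weights []float64) float64 {
	score := 0.0
	for i, f := range features {
		score += f * weights[i]
	}
	return score
}
```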
Basically I supply all the code and algorithms but none of the chess logic beyond the game's basic rules. Developing this framework has been great fun again! The trajectory will hopefully lead to NNUE eventually, but the next goal is just to reach the strength Leorik already has now by different, "purer" means!
