likeawizard wrote: ↑Tue Nov 22, 2022 3:45 pm
So I am happy to release v1.0.0 and have decided on a name: Tofiks.
Congrats on the release! I'm looking forward to seeing it in tournaments and on the CCRL!
likeawizard wrote: ↑Tue Nov 22, 2022 3:45 pm
I am not quite happy with how the hash table works. When increasing the table size I would expect to see better performance, but it seems to have very little effect, if any.
At the fast time controls we usually use for self-play Elo testing it's hard to fill up the hash table if you use normal hash sizes of several hundred MB. Maybe try setting an unusually small size to see how it behaves when hash collisions actually happen. Also, I think tracking some statistics on overwrites and fill ratio can help you develop a good intuition of what's going on, along the lines of the sketch below.
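To make those counters concrete, here is a minimal Go sketch of an instrumented transposition table. Entry, Store, New and the zero-key-means-empty convention are all illustrative assumptions, not a reference to any engine's actual code:

```go
package tt

// Entry is a minimal transposition-table slot; a real entry would also
// store score, bound type and best move.
type Entry struct {
	Key   uint64
	Depth int8
}

// TT wraps the table with two counters that make its behavior visible.
type TT struct {
	entries    []Entry
	used       uint64 // slots written at least once (for the fill ratio)
	overwrites uint64 // stores that evicted a different position
}

func New(numEntries int) *TT {
	return &TT{entries: make([]Entry, numEntries)}
}

func (t *TT) Store(key uint64, depth int8) {
	slot := &t.entries[key%uint64(len(t.entries))]
	switch {
	case slot.Key == 0: // treating a zero key as "empty" for simplicity
		t.used++
	case slot.Key != key: // index collision: another position gets evicted
		t.overwrites++
	}
	slot.Key, slot.Depth = key, depth
}

// FillRatio in permille, like the "hashfull" value UCI engines report.
func (t *TT) FillRatio() uint64 {
	return t.used * 1000 / uint64(len(t.entries))
}
```

With counters like these you can print the fill ratio and overwrite count after each search and watch how they react to different hash sizes.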
likeawizard wrote: ↑Tue Nov 22, 2022 3:45 pm
There was a recent thread here - A mate in 11 for amateurs. The numbers presented there were humiliating, both in raw performance and in the number of nodes it took other engines to crack it. So I need plenty of extensions and reductions to ensure I can find better moves with fewer nodes.
My engine also did very poorly on that problem. But I'm not too worried about it, because the position is very unusual. I don't have the kind of check extensions that would be needed to do well here. It should be an interesting exercise to optimize the engine for this particular problem, but I'm not sure it would actually help the general case.
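For what it's worth, the basic check extension is a one-line tweak to the search: when the side to move is in check, extend the remaining depth by one ply. A minimal Go sketch, assuming a hypothetical Position interface rather than any real engine's API:

```go
package search

// Move and Position are hypothetical stand-ins for a real engine's types.
type Move int

type Position interface {
	InCheck() bool
	LegalMoves() []Move
	Make(Move) Position
	Eval() int // static evaluation from the side to move's point of view
}

// negamax with a check extension: when the side to move is in check the
// remaining depth grows by one ply, so forcing sequences (like long
// mating attacks) get searched to their conclusion instead of being cut
// off at the horizon.
func negamax(pos Position, depth, alpha, beta int) int {
	if pos.InCheck() {
		depth++ // the check extension itself
	}
	if depth <= 0 {
		return pos.Eval() // a real engine would run quiescence search here
	}
	for _, move := range pos.LegalMoves() {
		score := -negamax(pos.Make(move), depth-1, -beta, -alpha)
		if score >= beta {
			return score // fail-high cutoff
		}
		if score > alpha {
			alpha = score
		}
	}
	return alpha // mate/stalemate detection omitted for brevity
}
```

Unguarded, this can blow up the search in perpetual-check lines, so engines typically cap the total number of extensions along a single path.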
likeawizard wrote: ↑Tue Nov 22, 2022 3:45 pm
PST optimizations - well, for lack of a better word, I just pulled them out of my ass. I guess the technical term would be 'empirical', but certainly some more data-driven approach would benefit them.
I am also slowly learning NN principles and might later see if adding some NNUE evaluation could help.
I'm absolutely sure that NNUE can help your engine! It seems to be a bit of a magic bullet. But there's a large space between handwritten PSTs (maybe together with some HCE terms) on the one hand and NNUE on the other. Personally, I wouldn't want to miss exploring that space.
My journey started with a PST-only evaluation where I borrowed the values from PeSTO. Then I tried to write my own tuner to generate values as good as PeSTO's from scratch. But the "from scratch" claim can be contested, because I took the data for my tuner (thousands of annotated FENs) from Zurichess, and in the process of creating that data not only Zurichess but also Stockfish was used. Nonetheless, it was an important step.
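For readers who haven't seen it: a PST-only evaluation in the spirit of PeSTO keeps two tables per piece (midgame and endgame) and blends them by game phase. A rough Go sketch; the table contents, piece ordering and phase weights are illustrative assumptions:

```go
package eval

import "math/bits"

// mgTable/egTable hold midgame and endgame values per piece and square,
// filled with tuned numbers elsewhere. Black pieces' entries are assumed
// to already carry the negative sign and mirrored squares.
var mgTable, egTable [12][64]int

// phaseWeight per piece type (pawns 0, minors 1, rooks 2, queens 4);
// the starting position sums to 24.
var phaseWeight = [12]int{0, 0, 1, 1, 1, 1, 2, 2, 4, 4, 0, 0}

// evaluate blends midgame and endgame scores by remaining game phase.
func evaluate(pieces [12]uint64) int {
	mg, eg, phase := 0, 0, 0
	for piece, bb := range pieces {
		for bb != 0 {
			sq := bits.TrailingZeros64(bb)
			bb &= bb - 1 // clear the lowest set bit
			mg += mgTable[piece][sq]
			eg += egTable[piece][sq]
			phase += phaseWeight[piece]
		}
	}
	if phase > 24 {
		phase = 24 // guard against positions with extra promoted pieces
	}
	return (mg*phase + eg*(24-phase)) / 24
}
```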
Over time I added further evaluation terms and retuned the tables for them. For Leorik I rewrote the tuner from scratch, changing it from Texel tuning to gradient descent and focusing on performance. Tuning tables got orders of magnitude faster.
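The core of such a gradient-descent tuner fits in a few lines. Here is a sketch of one epoch for a linear evaluation, where the loss is the squared error between the game result (0, 0.5 or 1) and a sigmoid of the evaluation, as in Texel tuning. The constant K and the dense feature vectors are simplifying assumptions; a real tuner would exploit the sparsity of the features:

```go
package tuner

import "math"

// K squashes centipawn evaluations onto a winning-probability scale;
// the exact value is normally fitted to the dataset.
const K = 1.0 / 400.0

func sigmoid(eval float64) float64 {
	return 1.0 / (1.0 + math.Exp(-K*eval))
}

// epoch performs one step of plain gradient descent over the dataset.
// features[i] is the feature vector of position i, results[i] its
// game outcome, lr the learning rate.
func epoch(weights []float64, features [][]float64, results []float64, lr float64) {
	grad := make([]float64, len(weights))
	for i, f := range features {
		eval := 0.0
		for j, w := range weights {
			eval += w * f[j] // linear model: eval = weights . features
		}
		p := sigmoid(eval)
		// d/dw_j of (R - p)^2 = -2*(R - p) * p*(1-p) * K * f_j
		c := -2 * (results[i] - p) * p * (1 - p) * K
		for j := range grad {
			grad[j] += c * f[j]
		}
	}
	n := float64(len(features))
	for j := range weights {
		weights[j] -= lr * grad[j] / n
	}
}
```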
When I started adding a handcrafted evaluation (HCE) for real in Leorik, I used the tuner not only to adjust the PSTs but to guide my implementation of the HCE. It was a bit messy, but basically I would add new features (e.g. instead of "pawn on e6" I'd have separate features for different types of pawns: passed, isolated and so on. I also came up with features to represent mobility.) and then I'd look at what table weights the tuner produced and approximate those with formulas in my HCE.
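To make the "separate features per pawn type" idea concrete, a hypothetical sketch (Board, Pawn and the helper methods are stand-ins, not Leorik's actual code):

```go
package features

// Pawn and Board are hypothetical stand-ins; IsPassed/IsIsolated would
// be real pawn-structure tests in an actual engine.
type Pawn struct {
	Square int     // 0..63
	Sign   float64 // +1 for white, -1 for black
}

type Board struct{ /* bitboards etc. */ }

func (b *Board) Pawns() []Pawn          { return nil }   // omitted
func (b *Board) IsPassed(p Pawn) bool   { return false } // omitted
func (b *Board) IsIsolated(p Pawn) bool { return false } // omitted

// One 64-square feature block per pawn type, so a passed pawn on e6 is
// a different feature than a normal pawn on e6.
const (
	normalPawnBase   = 0
	passedPawnBase   = 64
	isolatedPawnBase = 128
)

// pawnFeatures bumps the feature slot matching each pawn's type and
// square; the tuner then learns one weight per slot.
func pawnFeatures(b *Board, features []float64) {
	for _, p := range b.Pawns() {
		base := normalPawnBase
		switch {
		case b.IsPassed(p):
			base = passedPawnBase
		case b.IsIsolated(p):
			base = isolatedPawnBase
		}
		features[base+p.Square] += p.Sign
	}
}
```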
Then I got a bit short-sighted due to the tournaments Leorik was starting to play in, especially the Amateur Series run by Graham. I wanted to have a new, stronger version whenever a new season started. But improving the HCE beyond the most basic point was really hard for me. A lot of attempts, some of them promising at first, yielded only marginal improvements after rigorous testing. I didn't get King Safety to work at all. I got frustrated with fiddling with the details of my engine and wasting time and energy on so many pointless test runs. The urge to take a look at other engines' source code (which I don't want to do) got harder to resist, and I took a long break.
But recently I've found my motivation again: the gap between what I have and NNUE engines is still vast and full of interesting problems. What I admire about AlphaZero is that the AI learned to play chess through self-play. My next goal is a version of Leorik that has no handcrafted evaluation formulas anymore. Everything should be based on the same principle: features of the position and tables with weights that determine how each feature affects the evaluation. These weights are derived automatically by the tuner, by minimizing the evaluation error on a dataset of annotated positions. And this dataset needs to be created with self-play matches of Leorik (no other engines should be involved); the very first version of the engine, generating the first batch of data, will be one with all weights set to zero except for the basic material values (Queen being 9 pts, Rook 5 pts, etc.), as in the sketch below.
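A sketch of that "generation zero" setup: the whole evaluation is a dot product of features and weights, and the initial weight vector is zero everywhere except the material counts. All indices and sizes here are illustrative assumptions:

```go
package purer

// Illustrative feature indices: the first few features count material
// differences (white minus black); everything else starts at weight zero.
const (
	pawnCountIdx = iota
	knightCountIdx
	bishopCountIdx
	rookCountIdx
	queenCountIdx
)

const featureCount = 1024 // placeholder for the real feature vector size

// initialWeights is the "generation zero" evaluation: all zeros except
// the classic material values, in pawn units.
func initialWeights() []float64 {
	w := make([]float64, featureCount)
	w[pawnCountIdx] = 1
	w[knightCountIdx] = 3
	w[bishopCountIdx] = 3
	w[rookCountIdx] = 5
	w[queenCountIdx] = 9
	return w
}

// evaluate is the single evaluation principle: features times weights.
func evaluate(features, weights []float64) float64 {
	score := 0.0
	for i, f := range features {
		score += f * weights[i]
	}
	return score
}
```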
Basically I supply all the code and algorithms but none of the chess logic beyond the game's basic rules. Developing this framework has been great fun again! The trajectory will hopefully lead to NNUE eventually, but the next goal is just to reach the strength Leorik already has now by different, "purer" means!
