TT: Overhead of incrementally updating Zobrist key

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
hgm
Posts: 28426
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: TT: Overhead of incrementally updating Zobrist key

Post by hgm »

Knowing that branches can slow down the CPU a lot I usually treat empty squares as just another piece type, of zero value and with zero key, so that every move can be treated as a capture. Due to the predominance of QS nodes most moves are true captures anyway.

BTW, you can use a trick here to eliminate the need for separate application of a side-to-move key: use the key you intended for that as the Zobrist key for the empty square (independent of location). Strictly speaking, to get exactly the same result, you should then also XOR all the Zobrist keys of the other pieces with this same key, but since they are all randomly chosen anyway that would not resort in any effect. But if you imagine that every key (including that for the empty square, which would normally be 0) is XOR'ed with a constant C, then the incremental update would do (C^fromKey) ^(C^toKey) ^ (C^victimKey) = C^(fromKey^toKey^victimKey), so that you get the XOR with C for free.
lojic
Posts: 71
Joined: Thu Jan 28, 2021 9:50 pm
Full name: Brian Adkins

Re: TT: Overhead of incrementally updating Zobrist key

Post by lojic »

Joost Buijs wrote: Mon Feb 15, 2021 2:13 pm Nightmare runs perft(5) for this fen in 2 seconds (without bulk-counting).

Timings according to Intel VTune are:

generate_captures: 93 ms.
generate_quiet_moves: 200 ms.
legality_check: 200 ms.
do_move: 578 ms.
undo_move: 722 ms.
perft: 200 ms.

Most of the time is spend in do_move and undo_move.

I use a partial copy/make. Things like the hashkeys are copied, other things like the board array are updated/downdated.
Thanks for those timings - I need to work on my legality check!
lojic
Posts: 71
Joined: Thu Jan 28, 2021 9:50 pm
Full name: Brian Adkins

Re: TT: Overhead of incrementally updating Zobrist key

Post by lojic »

mvanthoor wrote: Mon Feb 15, 2021 12:45 pm
lojic wrote: Sun Feb 14, 2021 10:58 pm so roughly 27% of the time is make/unmake move. I was testing that FEN w/ a depth of 4 to allow more measurements, but the overhead seems irrespective of depth. Computing perft for that FEN w/ a depth of 5 takes 40.5 seconds for 146,963,716 total nodes for 3.6M/sec.
What language is your engine written in; something like Python? Does it use copy/make instead of make/unmake? (Or both... an interpreted language, and copy/make with lots of board copying?)
...
I'm using Racket, which is a Scheme dialect. It's compiled, and it's one of the faster Scheme dialects. Significantly faster than interpreted langs, such as Ruby/Python, but also significantly slower than C/C++ for code, such as a chess engine, that's in the sweet spot for C/C++.

I do make/unmake for the board, but I use a stack for the following state variables which are all packed into a single integer:

Code: Select all

(fixnum-fields state ([ ep-idx          7        ]
                      [ black-king-idx  7        ]
                      [ white-king-idx  7        ]
                      [ whites-move     1 #:flag ]
                      [ w-queenside-ok  1 #:flag ]
                      [ w-kingside-ok   1 #:flag ]
                      [ b-queenside-ok  1 #:flag ]
                      [ b-kingside-ok   1 #:flag ]))
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: TT: Overhead of incrementally updating Zobrist key

Post by mvanthoor »

Oh yes, the engine in Racket. Sorry, I forgot. Then the other one bey user eigolf must be the Python one.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
lojic
Posts: 71
Joined: Thu Jan 28, 2021 9:50 pm
Full name: Brian Adkins

Re: TT: Overhead of incrementally updating Zobrist key

Post by lojic »

lojic wrote: Sun Feb 14, 2021 10:58 pm I make the last ply so I can check the legality of the move. I use perft both for testing and for timing the generate/make/unmake parts of the engine.

I just ran a profile, and the perft spends most of the time as follows:
is-legal? 51.4%
unmake-move 13.8%
make-move 13.5%
generate-moves 10.9%
other 10.4%

so roughly 27% of the time is make/unmake move. I was testing that FEN w/ a depth of 4 to allow more measurements, but the overhead seems irrespective of depth. Computing perft for that FEN w/ a depth of 5 takes 40.5 seconds for 146,963,716 total nodes for 3.6M/sec.

Code: Select all

cpu time: 39725 real time: 40529 gc time: 3
perft(5) results:
  Totals:     146963716
  Captures:   28490460
  EP Cap:     58509
  Castles:    0
  Promotions: 24660
I made a ton of performance enhancements today, mostly using Racket's "unsafe" operations which avoid some runtime checking overhead. This dropped the time from 40.5 seconds down to 26.7 seconds, so that's nice. I haven't even begun to address the Zobrist key update issues that several of you have provided tips for. I think I may be able to get it down to ~ 20 seconds with those changes. Being 10x slower than a tuned C implementation wouldn't be a huge surprise in the context of chess engine programming.

I have no idea what the engine's current rating against other engines is, but playing many 10+5 and 15+10 games against humans puts it close to 1,900 (with me micro-managing the time a lot). While I do have a TT in place now, it's not using the best move from the TT yet, so that should help a *lot* I think when I add that.

Profiling after the performance tweaks gives:

Code: Select all

is-legal?      = 15,268 ms
unmake         =  3,472 ms
generate-moves =  3,128 ms
make-move      =  3,032 ms
perft itself   =  2,448 ms
Total          = 27,348 ms
So actually, given that profile and the tips you all have provided, is-legal? should probably be cut in half, at least, so sub-20 seconds seems very possible now w/o even being very clever.
User avatar
hgm
Posts: 28426
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: TT: Overhead of incrementally updating Zobrist key

Post by hgm »

The best speedup of a code section is usually achieved by not exectuting it at all.
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: TT: Overhead of incrementally updating Zobrist key

Post by mvanthoor »

lojic wrote: Tue Feb 16, 2021 3:53 am ...
I'd be happy to run a quick test for you with the engine. I managed to compile it, but It doesn't behave like a UCI (or XBoard) engine as it tries to play in the console.

There's no further information "chess.exe --help" does nothing, and I'm not able to quickly find out if "raco exe ..." actually creates a release executable. The docs say that the program is compiled into byte code and the packaged within the racket executable; it does create a 30 MB exe. I have that now... so how can I run perft, or a tournament?
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
lojic
Posts: 71
Joined: Thu Jan 28, 2021 9:50 pm
Full name: Brian Adkins

Re: TT: Overhead of incrementally updating Zobrist key

Post by lojic »

mvanthoor wrote: Tue Feb 16, 2021 10:32 am
lojic wrote: Tue Feb 16, 2021 3:53 am ...
I'd be happy to run a quick test for you with the engine. I managed to compile it, but It doesn't behave like a UCI (or XBoard) engine as it tries to play in the console.

There's no further information "chess.exe --help" does nothing, and I'm not able to quickly find out if "raco exe ..." actually creates a release executable. The docs say that the program is compiled into byte code and the packaged within the racket executable; it does create a 30 MB exe. I have that now... so how can I run perft, or a tournament?
Yes, I decided to add the transposition table feature before the UCI feature. So it currently only plays in the console. When you start it, enter w or b to indicate which color the engine should play. When you enter a move (PGN format e.g. Qd4 Nge7 O-O) you can optionally enter a time in minutes for it to think. It starts with 5 seconds, but if you enter: e4 30 it will think for 30 seconds, and think time will stay at 30 until changed. No need to make an executable, you can just run "racket chess.rkt". Also, I use "rlwrap" on my system to wrap the program w/ readline support so you can backspace, etc. e.g. rlwrap racket chess.rkt

I realize I'm probably the only one motivated to go through that at this point, so I'll let you know when UCI support is in :)

I'm suspecting that the ELO rankings for engines playing each other is much lower than human ELO rankings because my engine is consistently beating folks around 1,900, and got a draw with a 2,000+, but it got crushed easily by wukong even when I gave it 40 seconds to think and wukong was moving much faster.
lojic
Posts: 71
Joined: Thu Jan 28, 2021 9:50 pm
Full name: Brian Adkins

Re: TT: Overhead of incrementally updating Zobrist key

Post by lojic »

mvanthoor wrote: Tue Feb 16, 2021 10:32 am ... so how can I run perft, or a tournament?
I missed the perft question. To run perft: racket perft.rkt

The depth and start are currently hardcoded here - you can change max-level to another value, and it currently has that FEN starting position from earlier in this thread. You can replace that with another FEN (it does require the half-move and full-move at the end, you can just use 0 1)
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: TT: Overhead of incrementally updating Zobrist key

Post by mvanthoor »

lojic wrote: Tue Feb 16, 2021 4:49 pm I realize I'm probably the only one motivated to go through that at this point, so I'll let you know when UCI support is in :)
OK; at that point I can run the engine through my own gauntlet, which I use for testing Rustic.

My advice would be to build the UCI (or XBoard, or both) function as soon as possible, because then you will be able to determine the strength of each improvement you make. This is why I released my engine as Rustic Alpha 1, with no features apart from MVV-LVA move sorting. (This is very easy to implement, and not adding it just makes the engine do massive amounts of useless work, and thus it will search less deep and be weaker.)
I'm suspecting that the ELO rankings for engines playing each other is much lower than human ELO rankings because my engine is consistently beating folks around 1,900, and got a draw with a 2,000+, but it got crushed easily by wukong even when I gave it 40 seconds to think and wukong was moving much faster.
I've just started a tournament against WukongJS 1.5. I see the engine already has a hash table, much better sorting, and a more extensive evaluation than Rustic. Even in the middle game, WukonJS outcalculates Rustic by 2-3 ply (this increases up to 5-6 ply in the endgame) even though Rustic is literally almost 8-10x as fast.

I assume that the hash table, better move sorting and better evaluation gives WukongJS too much of a head-start right now; both against Rustic, and your engine. (Is your engine called "RacketChess", "Racket", or something else, by the way?)

If Wukong, as a Javascript engine, can reach depth 13-14 without breaking a sweat, I'm curious to see what Rustic can do once I finally get the time to figure out how to implement a hash table in Rust in such a way that I can re-use it for Perft, playing, and pawn hashes without having to implement it three times.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL