Hi,
The move generator of stockfish is ~14x faster than the one of embla (compared via perft 6).
So I'm thinking there's some room for improvement in Embla.
Currently Embla implements an 8x8 array of pointers. Those pointers point to the 2 x 16 piece-objects. Those objects calculate the moves for their object-type and state. This is then stored in an array.
There are several alternative implementations that I know of. Bitboards, mailbox, 0x88, etc.
Question now is: which is faster?
Strictly for (pseudo legal) move generation. I would like to keep search & eval as they are (for now).
fast(er) movegen
Moderators: hgm, Rebel, chrisw
-
- Posts: 481
- Joined: Tue Jul 03, 2018 10:19 am
- Full name: Folkert van Heusden
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
Re: fast(er) movegen
I recently copied the hyberbola quintessence bitboard approach (https://www.chessprogramming.org/Hyperbola_Quintessence) of amoeba (https://github.com/abulmo/amoeba).
The code inside minic, including the move generator, is simple to use
https://github.com/tryingsomestuff/Mini ... r/minic.cc
Does perft 6 on start position in aournd 10 sec
Thus 10Mnps for perft
The code inside minic, including the move generator, is simple to use
https://github.com/tryingsomestuff/Mini ... r/minic.cc
Does perft 6 on start position in aournd 10 sec
Code: Select all
# +-+-+-+-+-+-+-+-+
# |r|n|b|q|k|b|n|r|
# +-+-+-+-+-+-+-+-+
# |p|p|p|p|p|p|p|p|
# +-+-+-+-+-+-+-+-+
# | | | | | | | | |
# +-+-+-+-+-+-+-+-+
# | | | | | | | | |
# +-+-+-+-+-+-+-+-+
# | | | | | | | | |
# +-+-+-+-+-+-+-+-+
# | | | | | | | | |
# +-+-+-+-+-+-+-+-+
# |P|P|P|P|P|P|P|P|
# +-+-+-+-+-+-+-+-+
# |R|N|B|Q|K|B|N|R|
# +-+-+-+-+-+-+-+-+
# wk e1
# bk e8
# Turn white
# Phase 1# Static score 0
# Hash 4715363712485770064
# FEN rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 48 49
#
# Info 2018-12-10 15:13:30-300: pseudoNodes 119961738
# Info 2018-12-10 15:13:30-300: validNodes 119060324
# Info 2018-12-10 15:13:30-300: Waiting for workers to join...
# Info 2018-12-10 15:13:30-300: ... ok threadPool deleted
real 0m11,211s
-
- Posts: 2554
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: fast(er) movegen
This is most likely because they do bulk counting at leaf nodes and you don't, not that it would mean much.
Movegen is just a small fraction of total time anyway.
Martin Sedlak
-
- Posts: 77
- Joined: Mon Apr 16, 2018 6:56 pm
Re: fast(er) movegen
Hi Flok,
I have the same feeling that my engine has a slow move generation. QPerft tool by HGM has a really fast move generation (IDK if it is the fastest), his tool is at least 17x faster than my engine, but it is only move generation.
His tool depth 6:
My engine depth 6:
I have the same feeling that my engine has a slow move generation. QPerft tool by HGM has a really fast move generation (IDK if it is the fastest), his tool is at least 17x faster than my engine, but it is only move generation.
His tool depth 6:
Code: Select all
./perft 6
...
Quick Perft by H.G. Muller
Perft mode: No hashing, bulk counting in horizon nodes
perft( 1)= 20 ( 0.001 sec)
perft( 2)= 400 ( 0.000 sec)
perft( 3)= 8902 ( 0.000 sec)
perft( 4)= 197281 ( 0.002 sec)
perft( 5)= 4865609 ( 0.044 sec)
perft( 6)= 119060324 ( 0.848 sec)
Code: Select all
perft 6
119060324 nodes in 15089ms (7890537 nps)
-
- Posts: 35
- Joined: Thu Oct 19, 2017 4:59 pm
- Location: Germany, Berlin
- Full name: Jost Triller
Re: fast(er) movegen
Have you done some profiling like
with compiler option -fno-omit-frame-pointer?
My perft function is also not really fast. However, it seems like the move generator takes only 7% of the total search time, while the evaluation takes about 30%.
Code: Select all
perf record -g ./bin/binary
perf report -g "graph,0.5,caller"
My perft function is also not really fast. However, it seems like the move generator takes only 7% of the total search time, while the evaluation takes about 30%.
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
Re: fast(er) movegen
qperft is optimized a lot, this is not a move generator to use in engine I think.
Indeed move gen using bitboard won't cost more than 10% of cputime during a real case search, but it is fun to optimize a little anyway.
Indeed move gen using bitboard won't cost more than 10% of cputime during a real case search, but it is fun to optimize a little anyway.
-
- Posts: 49
- Joined: Tue Aug 12, 2014 11:21 am
- Location: Lund
- Full name: Alessandro Iavicoli
Re: fast(er) movegen
AdaChess uses a 10x12 array for the board design plus and a pieces-list to quickly find where the pieces are located.
The most expensive task while generating moves is the test for validity: not leaving the king in check. Therefore, engines tends to either don't test it (and leave the search to skip those nodes) or perform the legality check only when strictly required. AdaChess uses this second approach, by considering absolute pinned pieces movement, king escapes and so on.
Those are the best number on my Surface Pro after some tries. No hash used, and 1 core.
AdaChess, however, performs a search to detect the kind of check (if a move checks the opponent king), that makes the generation a bit more heavy than necessary.
In the italian forum, by talking about perft with other italian programmers, we agreed that once the move generator is optimized to reduce the legality test on the minimum, then there's no reason to spend more time to optimize it further (due to the limited amount of time that the engine spend in generating moves).
I believe that the optimization of the move generator became significant only once the engine reaches 3000+ Elo or when every small improvements matters.
The most expensive task while generating moves is the test for validity: not leaving the king in check. Therefore, engines tends to either don't test it (and leave the search to skip those nodes) or perform the legality check only when strictly required. AdaChess uses this second approach, by considering absolute pinned pieces movement, king escapes and so on.
Those are the best number on my Surface Pro after some tries. No hash used, and 1 core.
Code: Select all
Depth Nodes Captures En-Passant Castles Promotions Checks Checkmates Time
1 20 0 0 0 0 0 0 0.00
2 400 0 0 0 0 0 0 0.00
3 8902 34 0 0 0 12 0 0.00
4 197281 1576 0 0 0 469 8 0.02
5 4865609 82719 258 0 0 27351 347 0.47
6 119060324 2812008 5248 0 0 809099 10828 11.50
In the italian forum, by talking about perft with other italian programmers, we agreed that once the move generator is optimized to reduce the legality test on the minimum, then there's no reason to spend more time to optimize it further (due to the limited amount of time that the engine spend in generating moves).
I believe that the optimization of the move generator became significant only once the engine reaches 3000+ Elo or when every small improvements matters.
-
- Posts: 27790
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
Re: fast(er) movegen
sorry, didn't know.
-
- Posts: 77
- Joined: Mon Apr 16, 2018 6:56 pm
Re: fast(er) movegen
HGM's move generation on qperft is a good reference for speed. If you want to change your board representation there is no doubt that bitboard representation is the fastest.