Perft search speed bottleneck

Discussion of chess software programming and technical issues.

Moderator: Ras

JarvisXerxes
Posts: 20
Joined: Thu May 02, 2013 5:52 pm
Location: United States

Re: Perft search speed bottleneck

Post by JarvisXerxes »

syzygy wrote:
syzygy wrote:I've now experimented a bit with efficiently checking legality before making the move, but I am still losing nps compared to simply playing/testing/undoing. My perft speed has gone through the roof, but that does not buy me anything useful.
With some more effort I finally got the "checking legality before making the move"-scheme slightly faster than my old play/test/undo scheme. I suspect though that some of the optimisations I found can also be applied to my old implementation.

I then went on to implement a full legal move generator. The pinned pieces part can be done quite efficiently, but when implementing the generation of legal king moves I became convinced that this scheme would be of no help for alpha/beta search.

To my surprise, the legal move generator does increase nps during alpha/beta...

I don't think going from pseudo-legal to legal should be high on anybody's list, but well, it is an interesting exercise.

My perft now beats hgm's qperft on my machine (12.31s vs 15.19s for perft 7 and 336.25s vs 410.49s for perft 8). Before switching to full legal move generation, my perft was still a bit slower.

My checks generator still only generates pseudo-legal moves, so those still have to be checked in my select_move().

It might be that I could get a further speed up by generating pseudo-legal king moves.
May I ask what's your relevant hardware info? Also, are you doing the perft from the initial position?
On my machine, hgm's qperft (compiled by gcc) spends 52.3s on perft 7. Hardware problem, it seems.
Life is a statistical distribution.
syzygy
Posts: 5889
Joined: Tue Feb 28, 2012 11:56 pm

Re: Perft search speed bottleneck

Post by syzygy »

JarvisXerxes wrote:May I ask what's your relevant hardware info? Also, are you doing the perft from the initial position?
From the initial position on an i7-3930K at 4.2Ghz (running Linux).
On my machine, hgm's qperft (compiled by gcc) spends 52.3s on perft 7. Hardware problem, it seems.
I compiled qperft as a 64-bit executable using gcc-4.8.1 -O3. It crashes when using hasing (it seems to assume that pointers are 32 bits wide), but works fine without hashing.

It turns out that qperft gains a lot from profile-guided optimisation (first -fprofile-generate, run, then -fprofile-use), whereas my engine hardly benefits from it. Compiled with these options the gap is much smaller (qperft 7 in 13.29s). (I could easily speed it up though by generate captures and non-captures together, but that would hurt alpha/beta. I don't know what qperft does here.)

I just recompiled perft again but using gcc-4.7.2 (which I still use for my engine) and get 13.16s for perft 7. Unfortunately gcc-4.8.x seems to be a regression in terms of speed of the generate executable.
syzygy
Posts: 5889
Joined: Tue Feb 28, 2012 11:56 pm

Re: Perft search speed bottleneck

Post by syzygy »

syzygy wrote:To my surprise, the legal move generator does increase nps during alpha/beta...
I haven't verified this yet, but it might be that legal move generation is mainly beneficial for captures (in the qsearch). Legal captures by non-kings are cheap to generate, and captures by kings are relatively rare.
JarvisXerxes
Posts: 20
Joined: Thu May 02, 2013 5:52 pm
Location: United States

Re: Perft search speed bottleneck

Post by JarvisXerxes »

syzygy wrote:
JarvisXerxes wrote:May I ask what's your relevant hardware info? Also, are you doing the perft from the initial position?
From the initial position on an i7-3930K at 4.2Ghz (running Linux).
On my machine, hgm's qperft (compiled by gcc) spends 52.3s on perft 7. Hardware problem, it seems.
I compiled qperft as a 64-bit executable using gcc-4.8.1 -O3. It crashes when using hasing (it seems to assume that pointers are 32 bits wide), but works fine without hashing.
My CPU clock speed's half yours. That explains.
Also, I compiled my perft (64 bit) using VC++ 11. Does it do good optimizations? I don't feel like compiling with cygwin unless necessary, because I'd have to include cygwin.dll every time I distribute the executable.
Life is a statistical distribution.
syzygy
Posts: 5889
Joined: Tue Feb 28, 2012 11:56 pm

Re: Perft search speed bottleneck

Post by syzygy »

JarvisXerxes wrote:Also, I compiled my perft (64 bit) using VC++ 11. Does it do good optimizations?
I have no experience with VC++, but from what I understand gcc nowadays beats it.
I don't feel like compiling with cygwin unless necessary, because I'd have to include cygwin.dll every time I distribute the executable.
If you use gcc included in mingw-64, you get native windows executables that do not depend on cygwin.dll.
JarvisXerxes
Posts: 20
Joined: Thu May 02, 2013 5:52 pm
Location: United States

Re: Perft search speed bottleneck

Post by JarvisXerxes »

syzygy wrote: If you use gcc included in mingw-64, you get native windows executables that do not depend on cygwin.dll.
Gee I'm not aware of this. Thanks Ronald.
Life is a statistical distribution.
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: Perft search speed bottleneck

Post by Chan Rasjid »

Hello Sven,
Sven Schüle wrote:
Chan Rasjid wrote:I compare the ratio of skip invalid moves over the number of valid moves searched, but only for non-qs nodes where there was a full generation/evasion.

It is 34.5% in the position above for a search to depth 5.
I get 3.1% with my current engine. I also have kind of a "toy engine" that has no dedicated evasion generator. There I get about 55% illegal moves for the same position in full-width search nodes to depth 5.

I calculate "illegal moves / (legal moves + illegal moves)". If you want "illegal moves / legal moves" then my numbers would be about 3.2% and 122%.

You did not reply to Ronald's remark yet, so may I ask again: does your engine use an evasion generator for all positions where the moving side is in check? From your numbers it seems it doesn't.

Sven
I think you are correct. No matter how I try playing with various opponents including TSCP, I could not get back the figure of 30/40 %.

I have a dedicated check evasion generator that does only king escape, capture of attacker or blocking sliding attacks. For double checks, only the king moves. Most invalid moves are eliminated except for some difficult ones that may still be left to verification.

I have simplified things and now just measure the % of pseudo-legal moves generated that are invalid. What I get is just 1-3% in early stages and increases a little when the pieces drop off. It still is only about 5% and rarely come to 10% for searching one game move.

I don't really know how I got the earlier figure of 30/40%. I made major changes to my root move ordering and found it was wrong. I had to redo everything. When my program can again run, I just check on this % of invalid moves.

I also just implemented scan_delete_invalid() when is_move_invalid() (called before make()) fails; the pin-ray would then be available and I just scan through the remaining moves to delete any with the same from-sq that moves away from the pin-ray. I check with perft for scan_delete on/off and found perft was ok. But there is something not right. When I do analysis search for some test positions, the node counts differ for scan_delete on/off. I cannot yet figure out why it is so. May be some bugs introduced. I'll have to check further.

Best Regards,
Rasjid.