Minic raw speed

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
User avatar
xr_a_y
Posts: 777
Joined: Sat Nov 25, 2017 1:28 pm
Location: France

Minic raw speed

Post by xr_a_y » Sat Nov 09, 2019 10:25 am

Minic is twice as slow as others 2800-3000 elo engines.
Minic is copy-make (not make unmake) and use HQBB not magic.
Moreover, Minic is sorting all moves as soon as TT move is not a beta cut-off, and this is quite slow due to detection of bad capture (SEE).

When profiling it, eval is the first consuming spot, with 15%, then is piece movement generation (for move generation and attack) with 12% and then comes move sorting with 10%. Then are things like use of TT, pvs and qsearch, ...

I tried profiling with linux perf, gprof, with visual studio, but it feels like results are not totally "real" to me.

Being twice as fast at this level is around what ? 45elo ? (this is what I get with 2 threads).
Do you think I shall go for make-unmake and magic BB or try to improve move ordering and pruning with stuff like counter move history?

Ras
Posts: 1161
Joined: Tue Aug 30, 2016 6:19 pm
Contact:

Re: Minic raw speed

Post by Ras » Sat Nov 09, 2019 11:38 am

xr_a_y wrote:
Sat Nov 09, 2019 10:25 am
Minic is sorting all moves as soon as TT move is not a beta cut-off
You could benchmark how often your first move cuts off if it isn't a TT hit, this will probably be something like 50-70%. Means, in all those cases, sorting the full move list is unnecessary because you're only going to use the topmost move as per your pre-sorting scheme (e.g. MVV/LVA, SEE or whatever).

I'm going through the move list, swapping the "best" move to the top, and try that. Only if that doesn't give a cutoff, I sort the rest of the list to avoid possible O(n^2) degradation. Note also that shellsort can be faster than quicksort for short lists like the move list of a chess engine.

That could about halve the time spent in sorting.
Rasmus Althoff
https://www.ct800.net

User avatar
xr_a_y
Posts: 777
Joined: Sat Nov 25, 2017 1:28 pm
Location: France

Re: Minic raw speed

Post by xr_a_y » Sat Nov 09, 2019 3:21 pm

Ras wrote:
Sat Nov 09, 2019 11:38 am
xr_a_y wrote:
Sat Nov 09, 2019 10:25 am
Minic is sorting all moves as soon as TT move is not a beta cut-off
You could benchmark how often your first move cuts off if it isn't a TT hit, this will probably be something like 50-70%. Means, in all those cases, sorting the full move list is unnecessary because you're only going to use the topmost move as per your pre-sorting scheme (e.g. MVV/LVA, SEE or whatever).

I'm going through the move list, swapping the "best" move to the top, and try that. Only if that doesn't give a cutoff, I sort the rest of the list to avoid possible O(n^2) degradation. Note also that shellsort can be faster than quicksort for short lists like the move list of a chess engine.

That could about halve the time spent in sorting.
ok but what is expensive is not the sort algorithm itself, it is more the computation of the move score. So if I need to now which one is best (after the TT one), I need to score them all (at least captures ...)

Sven
Posts: 3829
Joined: Thu May 15, 2008 7:57 pm
Location: Berlin, Germany
Full name: Sven Schüle
Contact:

Re: Minic raw speed

Post by Sven » Sat Nov 09, 2019 6:14 pm

xr_a_y wrote:
Sat Nov 09, 2019 3:21 pm
Ras wrote:
Sat Nov 09, 2019 11:38 am
xr_a_y wrote:
Sat Nov 09, 2019 10:25 am
Minic is sorting all moves as soon as TT move is not a beta cut-off
You could benchmark how often your first move cuts off if it isn't a TT hit, this will probably be something like 50-70%. Means, in all those cases, sorting the full move list is unnecessary because you're only going to use the topmost move as per your pre-sorting scheme (e.g. MVV/LVA, SEE or whatever).

I'm going through the move list, swapping the "best" move to the top, and try that. Only if that doesn't give a cutoff, I sort the rest of the list to avoid possible O(n^2) degradation. Note also that shellsort can be faster than quicksort for short lists like the move list of a chess engine.

That could about halve the time spent in sorting.
ok but what is expensive is not the sort algorithm itself, it is more the computation of the move score. So if I need to now which one is best (after the TT one), I need to score them all (at least captures ...)
The only scoring effort that can have performance impact is most probably SEE. Do you apply SEE to all captures, or only to "losing" captures?
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)

User avatar
xr_a_y
Posts: 777
Joined: Sat Nov 25, 2017 1:28 pm
Location: France

Re: Minic raw speed

Post by xr_a_y » Sat Nov 09, 2019 6:16 pm

Sven wrote:
Sat Nov 09, 2019 6:14 pm
The only scoring effort that can have performance impact is most probably SEE. Do you apply SEE to all captures, or only to "losing" captures?
That sounds like a weird question to me :oops:
I define which capture is good or bad using SEE, how can I know a capture is good before SEE ?.

Sven
Posts: 3829
Joined: Thu May 15, 2008 7:57 pm
Location: Berlin, Germany
Full name: Sven Schüle
Contact:

Re: Minic raw speed

Post by Sven » Sat Nov 09, 2019 6:21 pm

xr_a_y wrote:
Sat Nov 09, 2019 6:16 pm
Sven wrote:
Sat Nov 09, 2019 6:14 pm
The only scoring effort that can have performance impact is most probably SEE. Do you apply SEE to all captures, or only to "losing" captures?
That sounds like a weird question to me :oops:
I define which capture is good or bad using SEE, how can I know a capture is good before SEE ?.
If the moving piece is of equal or lower value compared to the victim (e.g. QxQ, BxR) then you can save SEE since the SEE score would almost never be <0. So "losing capture" in this case is a cheap pre-classification of captures, not a result returned from SEE.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)

User avatar
xr_a_y
Posts: 777
Joined: Sat Nov 25, 2017 1:28 pm
Location: France

Re: Minic raw speed

Post by xr_a_y » Sat Nov 09, 2019 6:25 pm

Let's try

Ras
Posts: 1161
Joined: Tue Aug 30, 2016 6:19 pm
Contact:

Re: Minic raw speed

Post by Ras » Sat Nov 09, 2019 9:16 pm

xr_a_y wrote:
Sat Nov 09, 2019 3:21 pm
So if I need to now which one is best (after the TT one), I need to score them all (at least captures ...)
You could use MVV/LVA at this stage because that's cheap, pick the highest one of these and see whether you get a cut-off. If not, you can score and sort the remaining ones via SEE.
Rasmus Althoff
https://www.ct800.net

User avatar
lucasart
Posts: 3041
Joined: Mon May 31, 2010 11:29 am
Full name: lucasart
Contact:

Re: Minic raw speed

Post by lucasart » Sun Nov 10, 2019 12:37 am

Sven wrote:
Sat Nov 09, 2019 6:14 pm
xr_a_y wrote:
Sat Nov 09, 2019 3:21 pm
Ras wrote:
Sat Nov 09, 2019 11:38 am
xr_a_y wrote:
Sat Nov 09, 2019 10:25 am
Minic is sorting all moves as soon as TT move is not a beta cut-off
You could benchmark how often your first move cuts off if it isn't a TT hit, this will probably be something like 50-70%. Means, in all those cases, sorting the full move list is unnecessary because you're only going to use the topmost move as per your pre-sorting scheme (e.g. MVV/LVA, SEE or whatever).

I'm going through the move list, swapping the "best" move to the top, and try that. Only if that doesn't give a cutoff, I sort the rest of the list to avoid possible O(n^2) degradation. Note also that shellsort can be faster than quicksort for short lists like the move list of a chess engine.

That could about halve the time spent in sorting.
ok but what is expensive is not the sort algorithm itself, it is more the computation of the move score. So if I need to now which one is best (after the TT one), I need to score them all (at least captures ...)
The only scoring effort that can have performance impact is most probably SEE. Do you apply SEE to all captures, or only to "losing" captures?
How would know that a capture is "losing" without calculating its SEE ?

Edit: Just saw your answer on the other post. Yes, of course, I do that, it's a basic SEE optimisation.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

Joost Buijs
Posts: 987
Joined: Thu Jul 16, 2009 8:47 am
Location: Almere, The Netherlands

Re: Minic raw speed

Post by Joost Buijs » Sun Nov 10, 2019 6:54 am

xr_a_y wrote:
Sat Nov 09, 2019 10:25 am
Minic is twice as slow as others 2800-3000 elo engines.
Minic is copy-make (not make unmake) and use HQBB not magic.
Moreover, Minic is sorting all moves as soon as TT move is not a beta cut-off, and this is quite slow due to detection of bad capture (SEE).

When profiling it, eval is the first consuming spot, with 15%, then is piece movement generation (for move generation and attack) with 12% and then comes move sorting with 10%. Then are things like use of TT, pvs and qsearch, ...

I tried profiling with linux perf, gprof, with visual studio, but it feels like results are not totally "real" to me.

Being twice as fast at this level is around what ? 45elo ? (this is what I get with 2 threads).
Do you think I shall go for make-unmake and magic BB or try to improve move ordering and pruning with stuff like counter move history?
Nightmare uses partial copy-make and make-unmake, for things that are small in size and time consuming to compute (like the hashes etc.) I use copy-make for other things (like updating the piece array) I use make-unmake.

I sort captures in exactly in the same way as you, if the TT-move doesn't produce a beta cutoff I score all captures with SEE and sort them in 2 stages, first I sort all winning and equal captures, the losing captures are sorted at the end (if all the other moves didn't produce a beta cutoff). In quiescence I sort all captures at once (winning and losing).

Using MVV/LVA for the first capture after the TT-move helps a tiny bit, but this doesn't make a world of a difference, in fact I don't use it at all at the moment. Nightmare on a single core does between 2.5 and 4 mnps (depending upon the stage of the game), I don't know how fast others are but this doesn't seem slow to me.

I measured with __rdtsc() how many clock cycles SEE takes on average, in my case this is ~38 cycles (10 nsec on my computer), if I add the MVV/LVA trick this drops to an average of ~36 cycles, this is hardly noticeable in practice.

I don't know what HQBB is, I use pext() (or magics when I have to run on an AMD processor).

Usually I see a larger improvement with speed doubling, something like 60 Elo, it can depend upon the engine though, I have the impression that the more I prune the less influence speed has.

If your engine is twice as slow as others you should work on that, improving the branching factor will help of course, but if you can manage to double the speed it will give you another 60 Elo.

Post Reply