One of the most disturbing things for me is why SmarThink is so slow. I did a lot of things to deal with it with no luck. I'm simply still not understood — where are major slowdown lives.
So, here are some of theories that was examined:
1. SmarThink NPS is low due to different node counting. Looks that I'm counting nodes at the same way as other engines. I have nodes++ at the beginning of the Quiescence function, at the beginning of the main Search() function and one more at the root for each depth++ in iterative deepening. So looks I'm doing it in a right way. OK, some engines including pruned nodes, but looks that not much of them.
2. Slow eval. Of course, SmarThink's eval looks to be a little bit slower than in Fruit or Crafty, but not so much. Possible problems here are:
2.1. Using PopCnt() to evaluate mobility (SmarThink exludes some "bad" squares from mobility). I've experimented with replacing this code to precalculated mobility tables with no luck some years ago. Is it worth a new try?..
2.2. No lazy eval until most recent versions. Well, there are several approaches to lazy eval — "delta" approach — when you're trying to measure how move can potentially change previous eval an then comparing eval bounds with alpha/beta window. I'm failed with it. Probably in my case it works bad with eval cache. So may be I did something wrong? When trying to make forward pruning I'm usually call Evaluate() after prune move candidate, to measure position "optimism" delta. So that's why Evaluate() can be called twice. But eval cache helps in this case. So probably it worth to try to handle this special case — storing eval w/o cache? Or may be it's time to remove eval in forward pruning at all?..
Also, my eval cache is able to handle side change (in most cases; I have separated "assymmetry score" to do it) — it helps a lot in a null-move framework.
Another lazy approach is to skip full eval if material + pawns is < alpha – LAZY_BOUND or > beta + LAZY_BOUND. So it also fails for me due to worse caching.
The trird way is to use some window near root score/material and if material is outside these bounds, calculated at the root, — just return material eval. I have a limited success with this approach — it works, but speed is still not good(
3. TT probes/stores in quiescence. I've tried to turn it off with completely no luck. Of course it increases NPS a bit, but still not the the level of best engines.
4. Repetition() code. Well, I have a cycle here. Someone uses something like hash table. It really helps?..
5. Move generator. I have my own bitboards + bitlines. It doesn't seem to be slower than usual rotated bitboards according to perft() info.
6. Move sorting. Well, simple selection sort here with aquiring next best move at each step. Probably it's not bad?..
7. Cache issues. Well, I spent a lot of time with profiler, removing some uneccessary arrays and so on. It's really worth a try to make a deeper analysis here?..
Do you have any suggestions for me?)
Slowpoke
Moderator: Ras
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Slowpoke
Profile your code to see where most of the time is spent. That's the bottleneck that needs to be worked on...Sergei S. Markoff wrote:One of the most disturbing things for me is why SmarThink is so slow. I did a lot of things to deal with it with no luck. I'm simply still not understood — where are major slowdown lives.
So, here are some of theories that was examined:
1. SmarThink NPS is low due to different node counting. Looks that I'm counting nodes at the same way as other engines. I have nodes++ at the beginning of the Quiescence function, at the beginning of the main Search() function and one more at the root for each depth++ in iterative deepening. So looks I'm doing it in a right way. OK, some engines including pruned nodes, but looks that not much of them.
2. Slow eval. Of course, SmarThink's eval looks to be a little bit slower than in Fruit or Crafty, but not so much. Possible problems here are:
2.1. Using PopCnt() to evaluate mobility (SmarThink exludes some "bad" squares from mobility). I've experimented with replacing this code to precalculated mobility tables with no luck some years ago. Is it worth a new try?..
2.2. No lazy eval until most recent versions. Well, there are several approaches to lazy eval — "delta" approach — when you're trying to measure how move can potentially change previous eval an then comparing eval bounds with alpha/beta window. I'm failed with it. Probably in my case it works bad with eval cache. So may be I did something wrong? When trying to make forward pruning I'm usually call Evaluate() after prune move candidate, to measure position "optimism" delta. So that's why Evaluate() can be called twice. But eval cache helps in this case. So probably it worth to try to handle this special case — storing eval w/o cache? Or may be it's time to remove eval in forward pruning at all?..
Also, my eval cache is able to handle side change (in most cases; I have separated "assymmetry score" to do it) — it helps a lot in a null-move framework.
Another lazy approach is to skip full eval if material + pawns is < alpha – LAZY_BOUND or > beta + LAZY_BOUND. So it also fails for me due to worse caching.
The trird way is to use some window near root score/material and if material is outside these bounds, calculated at the root, — just return material eval. I have a limited success with this approach — it works, but speed is still not good(
3. TT probes/stores in quiescence. I've tried to turn it off with completely no luck. Of course it increases NPS a bit, but still not the the level of best engines.
4. Repetition() code. Well, I have a cycle here. Someone uses something like hash table. It really helps?..
5. Move generator. I have my own bitboards + bitlines. It doesn't seem to be slower than usual rotated bitboards according to perft() info.
6. Move sorting. Well, simple selection sort here with aquiring next best move at each step. Probably it's not bad?..
7. Cache issues. Well, I spent a lot of time with profiler, removing some uneccessary arrays and so on. It's really worth a try to make a deeper analysis here?..
Do you have any suggestions for me?)
-
Sergei S. Markoff
- Posts: 227
- Joined: Mon Sep 12, 2011 11:27 pm
- Location: Moscow, Russia
Re: Slowpoke
Thanks, Bob, I know about profilerbob wrote: Profile your code to see where most of the time is spent. That's the bottleneck that needs to be worked on...
But I'm really don't see any serious hotpoints here. Do you have some stats of how much time is spent by your engine in most notable parts of the code?
I can easily compare it with my own numbers, I did it some years before and found no anomalies
The Force Be With You!
-
Steve Maughan
- Posts: 1318
- Joined: Wed Mar 08, 2006 8:28 pm
- Location: Florida, USA
Re: Slowpoke
Hi Sergei,
Can I ask:
- How does SmarThink do *perft* (legal move generation, pseudo legal, use hash or don't use hash)
- What is your speed at perft positions?
Maybe take these as examples:
http://chessprogramming.wikispaces.com/Perft+Results
Steve
Can I ask:
- How does SmarThink do *perft* (legal move generation, pseudo legal, use hash or don't use hash)
- What is your speed at perft positions?
Maybe take these as examples:
http://chessprogramming.wikispaces.com/Perft+Results
Steve
http://www.chessprogramming.net - Juggernaut & Maverick Chess Engine
-
Henk
- Posts: 7251
- Joined: Mon May 27, 2013 10:31 am
Re: Slowpoke
1) Everything computed per move in the leaves or near the leaves of the tree is slow.
1a) Everything computed per ChessPiece in the leaves or near the leaves of the tree is slow.
[ trivial: Traversing redundant tree branches is slow.]
1a) Everything computed per ChessPiece in the leaves or near the leaves of the tree is slow.
[ trivial: Traversing redundant tree branches is slow.]
-
Sergei S. Markoff
- Posts: 227
- Joined: Mon Sep 12, 2011 11:27 pm
- Location: Moscow, Russia
Re: Slowpoke
Well, perft code is one of the oldest pieces of code. How it looks in 1.40:Steve Maughan wrote:Hi Sergei,
Can I ask:
- How does SmarThink do *perft* (legal move generation, pseudo legal, use hash or don't use hash)
- What is your speed at perft positions?
Maybe take these as examples:
http://chessprogramming.wikispaces.com/Perft+Results
Steve
Code: Select all
void OptionPerft(int depth)
{
int i, os;
os = sp;
GenerateCaptures();
GenerateAllPromotions();
for(i = os; i < sp; i++) if (M_CPT(stack[i].parts.move) == KING)
{
sp = os;
return;
}
GenerateNonCaptures(TRUE);
for (i = os; i < sp; i++)
{
MakeMove(stack[i].parts.move, TRUE);
if (depth - 1) OptionPerft(depth - 1);
else if (!Checked(ChangeSide(board->SideToMove))) total_moves++;
UnMakeMove();
}
sp = os;
}
Code: Select all
void OptionPerft(int depth)
{
int i, os, moves_done = 0;
os = sp;
if (tree[gl].in_check)
{
ProduceCheckEvasions();
}
else
{
GenerateCaptures();
GenerateAllPromotions();
GenerateNonCaptures(TRUE);
}
for (i = os; i < sp; i++)
{
if (!tree[gl].in_check && !EN_PASSANT(stack[i].parts.move) && IsSelfCheckingMove(stack[i].parts.move)) continue;
tree[gl + 1].in_check = IsCheckingMove(stack[i].parts.move);
MakeMove(stack[i].parts.move, 0 == moves_done++);
if (!tree[gl].in_check && EN_PASSANT(stack[i].parts.move))
{
if (Checked(ChangeSide(board->SideToMove)))
{
UnMakeMove();
continue;
}
}
if (depth - 1)
{
gl++;
OptionPerft(depth - 1);
gl--;
}
else total_moves++;
UnMakeMove();
}
sp = os;
}
Old code:
perft 6 — 9.48 sec at my I5-2520 2.50Ghz
perft 7 — 253.23
New code:
perft 6 — 13.33 sec
perft 7 — 350.40
Note: second argument of MakeMove is flag that indicated that MakeMove() should store undo information.
The Force Be With You!
-
Sergei S. Markoff
- Posts: 227
- Joined: Mon Sep 12, 2011 11:27 pm
- Location: Moscow, Russia
Re: Slowpoke
Well, I've removed this old trick. Now it's 13.06 and 342.43 respectively.Sergei S. Markoff wrote:Note: second argument of MakeMove is flag that indicated that MakeMove() should store undo information.
The Force Be With You!
-
Sergei S. Markoff
- Posts: 227
- Joined: Mon Sep 12, 2011 11:27 pm
- Location: Moscow, Russia
Re: Slowpoke
Crafty 23.8 perft: 8.62, 206.73.Sergei S. Markoff wrote:Well, I've removed this old trick. Now it's 13.06 and 342.43 respectively.Sergei S. Markoff wrote:Note: second argument of MakeMove is flag that indicated that MakeMove() should store undo information.
Crafty's perft code looks to be almost similar to mine old versions — obviously years ago I've checked Crafty's source to find proper perft implementation.
The Force Be With You!
-
cetormenter
- Posts: 170
- Joined: Sun Oct 28, 2012 9:46 pm
Re: Slowpoke
On your computer it takes Crafty approximately 63% longer to complete its perft in both positions. Using this information and then using perft on Nirvana, it would complete perft 6 in approximately 9.5 seconds and perft 7 in 248.5 seconds on your computer. This is about on par with your "faster" perft code. However this still seems to be slow to me as Nirvana does not use any bitboards at all and instead uses a 10x12 board similar to that in found in Fruit. Using bitboards you should be able to get a nice speed up over this method.Sergei S. Markoff wrote:Crafty 23.8 perft: 8.62, 206.73.Sergei S. Markoff wrote:Well, I've removed this old trick. Now it's 13.06 and 342.43 respectively.Sergei S. Markoff wrote:Note: second argument of MakeMove is flag that indicated that MakeMove() should store undo information.
Crafty's perft code looks to be almost similar to mine old versions — obviously years ago I've checked Crafty's source to find proper perft implementation.
-
Sergei S. Markoff
- Posts: 227
- Joined: Mon Sep 12, 2011 11:27 pm
- Location: Moscow, Russia
Re: Slowpoke
The main problem of perft is that different engines handles different things at MakeMove(). I have material index update/calculation here, several additional bitboards like bishops&queens, rooks&queens, raw material, psq scores. Also I'm doing move ordering score calculation during move generation.cetormenter wrote:On your computer it takes Crafty approximately 63% longer to complete its perft in both positions. Using this information and then using perft on Nirvana, it would complete perft 6 in approximately 9.5 seconds and perft 7 in 248.5 seconds on your computer. This is about on par with your "faster" perft code. However this still seems to be slow to me as Nirvana does not use any bitboards at all and instead uses a 10x12 board similar to that in found in Fruit. Using bitboards you should be able to get a nice speed up over this method.
Anyway, timings of my engine in perft looks to be comparable to others. Ok, let's suppose it's 10–15% slower than Crafty in move generation/make/unmake. Based on % of time spent in these parts of code, this slowdown can only cause few percents overall slowdown of the engine — so obviuosly it's not a main problem.
The Force Be With You!