Slowpoke

Discussion of chess software programming and technical issues.

Moderator: Ras

Sergei S. Markoff
Posts: 227
Joined: Mon Sep 12, 2011 11:27 pm
Location: Moscow, Russia

Slowpoke

Post by Sergei S. Markoff »

One of the most disturbing things for me is why SmarThink is so slow. I did a lot of things to deal with it with no luck. I'm simply still not understood — where are major slowdown lives.
So, here are some of theories that was examined:

1. SmarThink NPS is low due to different node counting. Looks that I'm counting nodes at the same way as other engines. I have nodes++ at the beginning of the Quiescence function, at the beginning of the main Search() function and one more at the root for each depth++ in iterative deepening. So looks I'm doing it in a right way. OK, some engines including pruned nodes, but looks that not much of them.
2. Slow eval. Of course, SmarThink's eval looks to be a little bit slower than in Fruit or Crafty, but not so much. Possible problems here are:
2.1. Using PopCnt() to evaluate mobility (SmarThink exludes some "bad" squares from mobility). I've experimented with replacing this code to precalculated mobility tables with no luck some years ago. Is it worth a new try?..
2.2. No lazy eval until most recent versions. Well, there are several approaches to lazy eval — "delta" approach — when you're trying to measure how move can potentially change previous eval an then comparing eval bounds with alpha/beta window. I'm failed with it. Probably in my case it works bad with eval cache. So may be I did something wrong? When trying to make forward pruning I'm usually call Evaluate() after prune move candidate, to measure position "optimism" delta. So that's why Evaluate() can be called twice. But eval cache helps in this case. So probably it worth to try to handle this special case — storing eval w/o cache? Or may be it's time to remove eval in forward pruning at all?..
Also, my eval cache is able to handle side change (in most cases; I have separated "assymmetry score" to do it) — it helps a lot in a null-move framework.
Another lazy approach is to skip full eval if material + pawns is < alpha – LAZY_BOUND or > beta + LAZY_BOUND. So it also fails for me due to worse caching.
The trird way is to use some window near root score/material and if material is outside these bounds, calculated at the root, — just return material eval. I have a limited success with this approach — it works, but speed is still not good(
3. TT probes/stores in quiescence. I've tried to turn it off with completely no luck. Of course it increases NPS a bit, but still not the the level of best engines.
4. Repetition() code. Well, I have a cycle here. Someone uses something like hash table. It really helps?..
5. Move generator. I have my own bitboards + bitlines. It doesn't seem to be slower than usual rotated bitboards according to perft() info.
6. Move sorting. Well, simple selection sort here with aquiring next best move at each step. Probably it's not bad?..
7. Cache issues. Well, I spent a lot of time with profiler, removing some uneccessary arrays and so on. It's really worth a try to make a deeper analysis here?..

Do you have any suggestions for me?)
The Force Be With You!
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Slowpoke

Post by bob »

Sergei S. Markoff wrote:One of the most disturbing things for me is why SmarThink is so slow. I did a lot of things to deal with it with no luck. I'm simply still not understood — where are major slowdown lives.
So, here are some of theories that was examined:

1. SmarThink NPS is low due to different node counting. Looks that I'm counting nodes at the same way as other engines. I have nodes++ at the beginning of the Quiescence function, at the beginning of the main Search() function and one more at the root for each depth++ in iterative deepening. So looks I'm doing it in a right way. OK, some engines including pruned nodes, but looks that not much of them.
2. Slow eval. Of course, SmarThink's eval looks to be a little bit slower than in Fruit or Crafty, but not so much. Possible problems here are:
2.1. Using PopCnt() to evaluate mobility (SmarThink exludes some "bad" squares from mobility). I've experimented with replacing this code to precalculated mobility tables with no luck some years ago. Is it worth a new try?..
2.2. No lazy eval until most recent versions. Well, there are several approaches to lazy eval — "delta" approach — when you're trying to measure how move can potentially change previous eval an then comparing eval bounds with alpha/beta window. I'm failed with it. Probably in my case it works bad with eval cache. So may be I did something wrong? When trying to make forward pruning I'm usually call Evaluate() after prune move candidate, to measure position "optimism" delta. So that's why Evaluate() can be called twice. But eval cache helps in this case. So probably it worth to try to handle this special case — storing eval w/o cache? Or may be it's time to remove eval in forward pruning at all?..
Also, my eval cache is able to handle side change (in most cases; I have separated "assymmetry score" to do it) — it helps a lot in a null-move framework.
Another lazy approach is to skip full eval if material + pawns is < alpha – LAZY_BOUND or > beta + LAZY_BOUND. So it also fails for me due to worse caching.
The trird way is to use some window near root score/material and if material is outside these bounds, calculated at the root, — just return material eval. I have a limited success with this approach — it works, but speed is still not good(
3. TT probes/stores in quiescence. I've tried to turn it off with completely no luck. Of course it increases NPS a bit, but still not the the level of best engines.
4. Repetition() code. Well, I have a cycle here. Someone uses something like hash table. It really helps?..
5. Move generator. I have my own bitboards + bitlines. It doesn't seem to be slower than usual rotated bitboards according to perft() info.
6. Move sorting. Well, simple selection sort here with aquiring next best move at each step. Probably it's not bad?..
7. Cache issues. Well, I spent a lot of time with profiler, removing some uneccessary arrays and so on. It's really worth a try to make a deeper analysis here?..

Do you have any suggestions for me?)
Profile your code to see where most of the time is spent. That's the bottleneck that needs to be worked on...
Sergei S. Markoff
Posts: 227
Joined: Mon Sep 12, 2011 11:27 pm
Location: Moscow, Russia

Re: Slowpoke

Post by Sergei S. Markoff »

bob wrote: Profile your code to see where most of the time is spent. That's the bottleneck that needs to be worked on...
Thanks, Bob, I know about profiler :)
But I'm really don't see any serious hotpoints here. Do you have some stats of how much time is spent by your engine in most notable parts of the code?
I can easily compare it with my own numbers, I did it some years before and found no anomalies :(
The Force Be With You!
User avatar
Steve Maughan
Posts: 1318
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: Slowpoke

Post by Steve Maughan »

Hi Sergei,

Can I ask:

- How does SmarThink do *perft* (legal move generation, pseudo legal, use hash or don't use hash)
- What is your speed at perft positions?

Maybe take these as examples:

http://chessprogramming.wikispaces.com/Perft+Results

Steve
http://www.chessprogramming.net - Juggernaut & Maverick Chess Engine
Henk
Posts: 7251
Joined: Mon May 27, 2013 10:31 am

Re: Slowpoke

Post by Henk »

1) Everything computed per move in the leaves or near the leaves of the tree is slow.

1a) Everything computed per ChessPiece in the leaves or near the leaves of the tree is slow.

[ trivial: Traversing redundant tree branches is slow.]
Sergei S. Markoff
Posts: 227
Joined: Mon Sep 12, 2011 11:27 pm
Location: Moscow, Russia

Re: Slowpoke

Post by Sergei S. Markoff »

Steve Maughan wrote:Hi Sergei,

Can I ask:

- How does SmarThink do *perft* (legal move generation, pseudo legal, use hash or don't use hash)
- What is your speed at perft positions?

Maybe take these as examples:

http://chessprogramming.wikispaces.com/Perft+Results

Steve
Well, perft code is one of the oldest pieces of code. How it looks in 1.40:

Code: Select all

void OptionPerft(int depth)
{
	int i, os;

	os = sp;

	GenerateCaptures();
	GenerateAllPromotions();

	for(i = os; i < sp; i++) if (M_CPT(stack[i].parts.move) == KING)
	{
		sp = os;
		return;
	}

	GenerateNonCaptures(TRUE);

	for (i = os; i < sp; i++)
	{
		MakeMove(stack[i].parts.move, TRUE);
		if (depth - 1) OptionPerft(depth - 1);
		else if (!Checked(ChangeSide(board->SideToMove))) total_moves++;
		UnMakeMove();
	}

	sp = os;
}
Just now I've redone it to be more like it really works in engine:

Code: Select all

void OptionPerft(int depth)
{
	int i, os, moves_done = 0;

	os = sp;

	if (tree[gl].in_check)
	{
		ProduceCheckEvasions();
	}
	else
	{
		GenerateCaptures();
		GenerateAllPromotions();
		GenerateNonCaptures(TRUE);
	}

	for (i = os; i < sp; i++)
	{
		if (!tree[gl].in_check && !EN_PASSANT(stack[i].parts.move) && IsSelfCheckingMove(stack[i].parts.move)) continue;
		tree[gl + 1].in_check = IsCheckingMove(stack[i].parts.move);
		MakeMove(stack[i].parts.move, 0 == moves_done++);

		if (!tree[gl].in_check && EN_PASSANT(stack[i].parts.move))
		{
			if (Checked(ChangeSide(board->SideToMove)))
			{
				UnMakeMove();
				continue;
			}
		}

		if (depth - 1)
		{
			gl++;
			OptionPerft(depth - 1);
			gl--;
		}
		else total_moves++;
		UnMakeMove();
	}

	sp = os;
}
Surprisingly newer code looks to me much slower at least for start position.
Old code:
perft 6 — 9.48 sec at my I5-2520 2.50Ghz
perft 7 — 253.23
New code:
perft 6 — 13.33 sec
perft 7 — 350.40

Note: second argument of MakeMove is flag that indicated that MakeMove() should store undo information.
The Force Be With You!
Sergei S. Markoff
Posts: 227
Joined: Mon Sep 12, 2011 11:27 pm
Location: Moscow, Russia

Re: Slowpoke

Post by Sergei S. Markoff »

Sergei S. Markoff wrote:Note: second argument of MakeMove is flag that indicated that MakeMove() should store undo information.
Well, I've removed this old trick. Now it's 13.06 and 342.43 respectively.
The Force Be With You!
Sergei S. Markoff
Posts: 227
Joined: Mon Sep 12, 2011 11:27 pm
Location: Moscow, Russia

Re: Slowpoke

Post by Sergei S. Markoff »

Sergei S. Markoff wrote:
Sergei S. Markoff wrote:Note: second argument of MakeMove is flag that indicated that MakeMove() should store undo information.
Well, I've removed this old trick. Now it's 13.06 and 342.43 respectively.
Crafty 23.8 perft: 8.62, 206.73.
Crafty's perft code looks to be almost similar to mine old versions — obviously years ago I've checked Crafty's source to find proper perft implementation.
The Force Be With You!
cetormenter
Posts: 170
Joined: Sun Oct 28, 2012 9:46 pm

Re: Slowpoke

Post by cetormenter »

Sergei S. Markoff wrote:
Sergei S. Markoff wrote:
Sergei S. Markoff wrote:Note: second argument of MakeMove is flag that indicated that MakeMove() should store undo information.
Well, I've removed this old trick. Now it's 13.06 and 342.43 respectively.
Crafty 23.8 perft: 8.62, 206.73.
Crafty's perft code looks to be almost similar to mine old versions — obviously years ago I've checked Crafty's source to find proper perft implementation.
On your computer it takes Crafty approximately 63% longer to complete its perft in both positions. Using this information and then using perft on Nirvana, it would complete perft 6 in approximately 9.5 seconds and perft 7 in 248.5 seconds on your computer. This is about on par with your "faster" perft code. However this still seems to be slow to me as Nirvana does not use any bitboards at all and instead uses a 10x12 board similar to that in found in Fruit. Using bitboards you should be able to get a nice speed up over this method.
Sergei S. Markoff
Posts: 227
Joined: Mon Sep 12, 2011 11:27 pm
Location: Moscow, Russia

Re: Slowpoke

Post by Sergei S. Markoff »

cetormenter wrote:On your computer it takes Crafty approximately 63% longer to complete its perft in both positions. Using this information and then using perft on Nirvana, it would complete perft 6 in approximately 9.5 seconds and perft 7 in 248.5 seconds on your computer. This is about on par with your "faster" perft code. However this still seems to be slow to me as Nirvana does not use any bitboards at all and instead uses a 10x12 board similar to that in found in Fruit. Using bitboards you should be able to get a nice speed up over this method.
The main problem of perft is that different engines handles different things at MakeMove(). I have material index update/calculation here, several additional bitboards like bishops&queens, rooks&queens, raw material, psq scores. Also I'm doing move ordering score calculation during move generation.

Anyway, timings of my engine in perft looks to be comparable to others. Ok, let's suppose it's 10–15% slower than Crafty in move generation/make/unmake. Based on % of time spent in these parts of code, this slowdown can only cause few percents overall slowdown of the engine — so obviuosly it's not a main problem.
The Force Be With You!