Slowpoke

Sergei S. Markoff · Post by **Sergei S. Markoff** » Tue Dec 17, 2013 9:41 pm

One of the most disturbing things for me is why SmarThink is so slow. I did a lot of things to deal with it with no luck. I'm simply still not understood — where are major slowdown lives.
So, here are some of theories that was examined:

1. SmarThink NPS is low due to different node counting. Looks that I'm counting nodes at the same way as other engines. I have nodes++ at the beginning of the Quiescence function, at the beginning of the main Search() function and one more at the root for each depth++ in iterative deepening. So looks I'm doing it in a right way. OK, some engines including pruned nodes, but looks that not much of them.
2. Slow eval. Of course, SmarThink's eval looks to be a little bit slower than in Fruit or Crafty, but not so much. Possible problems here are:
2.1. Using PopCnt() to evaluate mobility (SmarThink exludes some "bad" squares from mobility). I've experimented with replacing this code to precalculated mobility tables with no luck some years ago. Is it worth a new try?..
2.2. No lazy eval until most recent versions. Well, there are several approaches to lazy eval — "delta" approach — when you're trying to measure how move can potentially change previous eval an then comparing eval bounds with alpha/beta window. I'm failed with it. Probably in my case it works bad with eval cache. So may be I did something wrong? When trying to make forward pruning I'm usually call Evaluate() after prune move candidate, to measure position "optimism" delta. So that's why Evaluate() can be called twice. But eval cache helps in this case. So probably it worth to try to handle this special case — storing eval w/o cache? Or may be it's time to remove eval in forward pruning at all?..
Also, my eval cache is able to handle side change (in most cases; I have separated "assymmetry score" to do it) — it helps a lot in a null-move framework.
Another lazy approach is to skip full eval if material + pawns is < alpha – LAZY_BOUND or > beta + LAZY_BOUND. So it also fails for me due to worse caching.
The trird way is to use some window near root score/material and if material is outside these bounds, calculated at the root, — just return material eval. I have a limited success with this approach — it works, but speed is still not good(
3. TT probes/stores in quiescence. I've tried to turn it off with completely no luck. Of course it increases NPS a bit, but still not the the level of best engines.
4. Repetition() code. Well, I have a cycle here. Someone uses something like hash table. It really helps?..
5. Move generator. I have my own bitboards + bitlines. It doesn't seem to be slower than usual rotated bitboards according to perft() info.
6. Move sorting. Well, simple selection sort here with aquiring next best move at each step. Probably it's not bad?..
7. Cache issues. Well, I spent a lot of time with profiler, removing some uneccessary arrays and so on. It's really worth a try to make a deeper analysis here?..

Do you have any suggestions for me?)

bob · Post by **bob** » Tue Dec 17, 2013 9:48 pm

Sergei S. Markoff wrote:One of the most disturbing things for me is why SmarThink is so slow. I did a lot of things to deal with it with no luck. I'm simply still not understood — where are major slowdown lives.
So, here are some of theories that was examined:

1. SmarThink NPS is low due to different node counting. Looks that I'm counting nodes at the same way as other engines. I have nodes++ at the beginning of the Quiescence function, at the beginning of the main Search() function and one more at the root for each depth++ in iterative deepening. So looks I'm doing it in a right way. OK, some engines including pruned nodes, but looks that not much of them.
2. Slow eval. Of course, SmarThink's eval looks to be a little bit slower than in Fruit or Crafty, but not so much. Possible problems here are:
2.1. Using PopCnt() to evaluate mobility (SmarThink exludes some "bad" squares from mobility). I've experimented with replacing this code to precalculated mobility tables with no luck some years ago. Is it worth a new try?..
2.2. No lazy eval until most recent versions. Well, there are several approaches to lazy eval — "delta" approach — when you're trying to measure how move can potentially change previous eval an then comparing eval bounds with alpha/beta window. I'm failed with it. Probably in my case it works bad with eval cache. So may be I did something wrong? When trying to make forward pruning I'm usually call Evaluate() after prune move candidate, to measure position "optimism" delta. So that's why Evaluate() can be called twice. But eval cache helps in this case. So probably it worth to try to handle this special case — storing eval w/o cache? Or may be it's time to remove eval in forward pruning at all?..
Also, my eval cache is able to handle side change (in most cases; I have separated "assymmetry score" to do it) — it helps a lot in a null-move framework.
Another lazy approach is to skip full eval if material + pawns is < alpha – LAZY_BOUND or > beta + LAZY_BOUND. So it also fails for me due to worse caching.
The trird way is to use some window near root score/material and if material is outside these bounds, calculated at the root, — just return material eval. I have a limited success with this approach — it works, but speed is still not good(
3. TT probes/stores in quiescence. I've tried to turn it off with completely no luck. Of course it increases NPS a bit, but still not the the level of best engines.
4. Repetition() code. Well, I have a cycle here. Someone uses something like hash table. It really helps?..
5. Move generator. I have my own bitboards + bitlines. It doesn't seem to be slower than usual rotated bitboards according to perft() info.
6. Move sorting. Well, simple selection sort here with aquiring next best move at each step. Probably it's not bad?..
7. Cache issues. Well, I spent a lot of time with profiler, removing some uneccessary arrays and so on. It's really worth a try to make a deeper analysis here?..

Do you have any suggestions for me?)

Profile your code to see where most of the time is spent. That's the bottleneck that needs to be worked on...

Sergei S. Markoff · Post by **Sergei S. Markoff** » Tue Dec 17, 2013 9:59 pm

bob wrote: Profile your code to see where most of the time is spent. That's the bottleneck that needs to be worked on...

Thanks, Bob, I know about profiler

But I'm really don't see any serious hotpoints here. Do you have some stats of how much time is spent by your engine in most notable parts of the code?
I can easily compare it with my own numbers, I did it some years before and found no anomalies

Steve Maughan · Post by **Steve Maughan** » Tue Dec 17, 2013 10:34 pm

Hi Sergei,

Can I ask:

- How does SmarThink do *perft* (legal move generation, pseudo legal, use hash or don't use hash)
- What is your speed at perft positions?

Maybe take these as examples:

http://chessprogramming.wikispaces.com/Perft+Results

Steve

Henk · Post by **Henk** » Tue Dec 17, 2013 10:42 pm

1) Everything computed per move in the leaves or near the leaves of the tree is slow.

1a) Everything computed per ChessPiece in the leaves or near the leaves of the tree is slow.

[ trivial: Traversing redundant tree branches is slow.]

Sergei S. Markoff · Post by **Sergei S. Markoff** » Tue Dec 17, 2013 11:37 pm

Steve Maughan wrote:Hi Sergei,

Can I ask:

- How does SmarThink do *perft* (legal move generation, pseudo legal, use hash or don't use hash)
- What is your speed at perft positions?

Maybe take these as examples:

http://chessprogramming.wikispaces.com/Perft+Results

Steve

Well, perft code is one of the oldest pieces of code. How it looks in 1.40:

Code: Select all

void OptionPerft(int depth)
{
	int i, os;

	os = sp;

	GenerateCaptures();
	GenerateAllPromotions();

	for(i = os; i < sp; i++) if (M_CPT(stack[i].parts.move) == KING)
	{
		sp = os;
		return;
	}

	GenerateNonCaptures(TRUE);

	for (i = os; i < sp; i++)
	{
		MakeMove(stack[i].parts.move, TRUE);
		if (depth - 1) OptionPerft(depth - 1);
		else if (!Checked(ChangeSide(board->SideToMove))) total_moves++;
		UnMakeMove();
	}

	sp = os;
}

Just now I've redone it to be more like it really works in engine:

Code: Select all

void OptionPerft(int depth)
{
	int i, os, moves_done = 0;

	os = sp;

	if (tree[gl].in_check)
	{
		ProduceCheckEvasions();
	}
	else
	{
		GenerateCaptures();
		GenerateAllPromotions();
		GenerateNonCaptures(TRUE);
	}

	for (i = os; i < sp; i++)
	{
		if (!tree[gl].in_check && !EN_PASSANT(stack[i].parts.move) && IsSelfCheckingMove(stack[i].parts.move)) continue;
		tree[gl + 1].in_check = IsCheckingMove(stack[i].parts.move);
		MakeMove(stack[i].parts.move, 0 == moves_done++);

		if (!tree[gl].in_check && EN_PASSANT(stack[i].parts.move))
		{
			if (Checked(ChangeSide(board->SideToMove)))
			{
				UnMakeMove();
				continue;
			}
		}

		if (depth - 1)
		{
			gl++;
			OptionPerft(depth - 1);
			gl--;
		}
		else total_moves++;
		UnMakeMove();
	}

	sp = os;
}

Surprisingly newer code looks to me much slower at least for start position.
Old code:
perft 6 — 9.48 sec at my I5-2520 2.50Ghz
perft 7 — 253.23
New code:
perft 6 — 13.33 sec
perft 7 — 350.40

Note: second argument of MakeMove is flag that indicated that MakeMove() should store undo information.

Sergei S. Markoff · Post by **Sergei S. Markoff** » Tue Dec 17, 2013 11:59 pm

Sergei S. Markoff wrote:Note: second argument of MakeMove is flag that indicated that MakeMove() should store undo information.

Well, I've removed this old trick. Now it's 13.06 and 342.43 respectively.

Sergei S. Markoff · Post by **Sergei S. Markoff** » Wed Dec 18, 2013 12:08 am

Sergei S. Markoff wrote:
Sergei S. Markoff wrote:Note: second argument of MakeMove is flag that indicated that MakeMove() should store undo information.
Well, I've removed this old trick. Now it's 13.06 and 342.43 respectively.

Crafty 23.8 perft: 8.62, 206.73.
Crafty's perft code looks to be almost similar to mine old versions — obviously years ago I've checked Crafty's source to find proper perft implementation.

cetormenter · Post by **cetormenter** » Wed Dec 18, 2013 12:34 am

Sergei S. Markoff wrote:
Sergei S. Markoff wrote:
Sergei S. Markoff wrote:Note: second argument of MakeMove is flag that indicated that MakeMove() should store undo information.
Well, I've removed this old trick. Now it's 13.06 and 342.43 respectively.
Crafty 23.8 perft: 8.62, 206.73.
Crafty's perft code looks to be almost similar to mine old versions — obviously years ago I've checked Crafty's source to find proper perft implementation.

On your computer it takes Crafty approximately 63% longer to complete its perft in both positions. Using this information and then using perft on Nirvana, it would complete perft 6 in approximately 9.5 seconds and perft 7 in 248.5 seconds on your computer. This is about on par with your "faster" perft code. However this still seems to be slow to me as Nirvana does not use any bitboards at all and instead uses a 10x12 board similar to that in found in Fruit. Using bitboards you should be able to get a nice speed up over this method.

Sergei S. Markoff · Post by **Sergei S. Markoff** » Wed Dec 18, 2013 1:00 am

cetormenter wrote:On your computer it takes Crafty approximately 63% longer to complete its perft in both positions. Using this information and then using perft on Nirvana, it would complete perft 6 in approximately 9.5 seconds and perft 7 in 248.5 seconds on your computer. This is about on par with your "faster" perft code. However this still seems to be slow to me as Nirvana does not use any bitboards at all and instead uses a 10x12 board similar to that in found in Fruit. Using bitboards you should be able to get a nice speed up over this method.

The main problem of perft is that different engines handles different things at MakeMove(). I have material index update/calculation here, several additional bitboards like bishops&queens, rooks&queens, raw material, psq scores. Also I'm doing move ordering score calculation during move generation.

Anyway, timings of my engine in perft looks to be comparable to others. Ok, let's suppose it's 10–15% slower than Crafty in move generation/make/unmake. Based on % of time spent in these parts of code, this slowdown can only cause few percents overall slowdown of the engine — so obviuosly it's not a main problem.

Slowpoke

Slowpoke

Re: Slowpoke

Re: Slowpoke

Re: Slowpoke

Re: Slowpoke

Re: Slowpoke

Re: Slowpoke

Re: Slowpoke

Re: Slowpoke

Re: Slowpoke