Minic raw speed

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Minic raw speed

Post by xr_a_y »

I put some timers inside Minic and I am totally amazed by your 36 cycles SEE !

Here some figures

Code: Select all

            cycle   % of total   calls  cycle/call
See      3738155258  16.3821%   4307603  867
Apply    3676719096  16.1129%  17811023  206
Eval     7667154230  33.6006%   3041458 2520
Attack   6080342832  26.6465% 297610485   20
Generate 1852695910  8.11926%   2598478  712
So that my SEE seems to be more around 800 cycles per call.

What are you profiling with only 36 cycles ? only a few specific lines of SEE ?

Maybe something is that I accumulate the timer and count one call for SEE only if it is a "long" one, one that needs to go through the attacker list.
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Minic raw speed

Post by lucasart »

xr_a_y wrote: Mon Nov 11, 2019 8:31 am I put some timers inside Minic and I am totally amazed by your 36 cycles SEE !

Here some figures

Code: Select all

            cycle   % of total   calls  cycle/call
See      3738155258  16.3821%   4307603  867
Apply    3676719096  16.1129%  17811023  206
Eval     7667154230  33.6006%   3041458 2520
Attack   6080342832  26.6465% 297610485   20
Generate 1852695910  8.11926%   2598478  712
So that my SEE seems to be more around 800 cycles per call.

What are you profiling with only 36 cycles ? only a few specific lines of SEE ?

Maybe something is that I accumulate the timer and count one call for SEE only if it is a "long" one, one that needs to go through the attacker list.
You compute attackers from scratch at every iteration. This is slowing you down:

Code: Select all

bool threatsFound = getAttackers(p2, to, stmAttackers, ~p2.c);
You should only compute it once, and merely refresh for slider attacks along the segment through which a piece was moved (*). Or, even better in my experience, not even bother with see through attacks, and just compute once the attackers, and consume that bitboard in the loop.

Also, you should have some precomputed bitboard of squares attacked by the enemy after playing moves. This really saves time, for example, you can avoid generating king walking into check moves (less to filter afterwards), and in SEE you can know immediately whether or not you need to run the getAttackers() calculation and enter the loop.

But before optimising SEE(), make sure you've really thought things through with the search and move sorting, such that you do not compute too many SEE's (ie. think "do less", before "do faster").

(*) you can even skip that if the moved piece is a knight (another micro-optimisation)

PS: Your "Apply" is very slow. Perhaps you should start with that instead. Again, it could be that the apply is slow, or that you just do too many applies that could be avoided.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Minic raw speed

Post by Joost Buijs »

xr_a_y wrote: Mon Nov 11, 2019 8:31 am I put some timers inside Minic and I am totally amazed by your 36 cycles SEE !

Here some figures

Code: Select all

            cycle   % of total   calls  cycle/call
See      3738155258  16.3821%   4307603  867
Apply    3676719096  16.1129%  17811023  206
Eval     7667154230  33.6006%   3041458 2520
Attack   6080342832  26.6465% 297610485   20
Generate 1852695910  8.11926%   2598478  712
So that my SEE seems to be more around 800 cycles per call.

What are you profiling with only 36 cycles ? only a few specific lines of SEE ?

Maybe something is that I accumulate the timer and count one call for SEE only if it is a "long" one, one that needs to go through the attacker list.
No, I timed the whole SEE on a large number of positions. It is possible that my figures are too optimistic because I used Perft() to time it, so it might run largely from the cache. It could be slower when I time it in the full search.

The source code is rather simple:

Code: Select all

int position_t::see(move_t* move)
{
	auto piece_value = [](int piece) { return material::piece_value[0][piece]; };
	bb_t occ = occupied();
	int side = state.side_to_move, exchange[32];

	int at_stake = piece_value(Type(pieces[move->from]));
	int captured = piece_value(Type(pieces[move->to]));
	occ ^= Bit(move->from); // remove agressor

	if (move->type & mtEnpc)
	{
		int csq = move->to ^ 8;
		captured = piece_value(Pawn);
		occ ^= Bit(csq); // remove victim
	}

	if (move->type & mtProm)
	{
		captured += piece_value(move->info) - 2 * piece_value(Pawn);
		at_stake = piece_value(move->info);
	}

	if (!(state.attacked & Bit(move->to))) return captured; // early exit

	int n = 0;

	exchange[n++] = captured;

	while (true)
	{
		// swap sides
		side = Opp(side);

		// get the least valued attacker
		bb_t att = bb_pawns[side] & occ & masks::pawn_attacks[Opp(side)][move->to];

		if (att)
		{
			exchange[n] = at_stake - exchange[n - 1]; n++;
			at_stake = piece_value(Pawn);
			occ ^= _blsi(att);
			continue;
		}

		att = bb_knights[side] & occ & masks::knight_attacks[move->to];
		if (att)
		{
			exchange[n] = at_stake - exchange[n - 1]; n++;
			at_stake = piece_value(Knight);
			occ ^= _blsi(att);
			continue;
		}

		att = bb_bishops[side] & occ & magic::bishop_attacks(move->to, occ);
		if (att)
		{
			exchange[n] = at_stake - exchange[n - 1]; n++;
			at_stake = piece_value(Bishop);
			occ ^= _blsi(att);
			continue;
		}

		att = bb_rooks[side] & occ & magic::rook_attacks(move->to, occ);
		if (att)
		{
			exchange[n] = at_stake - exchange[n - 1]; n++;
			at_stake = piece_value(Rook);
			occ ^= _blsi(att);
			continue;
		}

		att = bb_queens[side] & occ & magic::queen_attacks(move->to, occ);
		if (att)
		{
			exchange[n] = at_stake - exchange[n - 1]; n++;
			at_stake = piece_value(Queen);
			occ ^= _blsi(att);
			continue;
		}

		att = bb_king[side] & occ & masks::king_attacks[move->to];
		if (att)
		{
			exchange[n] = at_stake - exchange[n - 1]; n++;
			at_stake = piece_value(King);
			occ ^= _blsi(att);
			continue;
		}

		break;
	}

	// negamax
	while (--n) if (exchange[n] > -exchange[n - 1]) exchange[n - 1] = -exchange[n];

	return exchange[0];
}
Now I look at it, it is somehow not the last version, I call bishop_attacks and rook_attacks twice (again in queen_attacks), I can also store the attack vectors for rook and bishop in two temporary 64 bit variables. I believe it didn't matter much. Anyway this is the version that I timed.
User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Minic raw speed

Post by xr_a_y »

Joost Buijs wrote: Mon Nov 11, 2019 4:52 pm No, I timed the whole SEE on a large number of positions. It is possible that my figures are too optimistic because I used Perft() to time it, so it might run largely from the cache. It could be slower when I time it in the full search.

The source code is rather simple:

Code: Select all

int position_t::see(move_t* move)
{
	auto piece_value = [](int piece) { return material::piece_value[0][piece]; };
	bb_t occ = occupied();
	int side = state.side_to_move, exchange[32];

	int at_stake = piece_value(Type(pieces[move->from]));
	int captured = piece_value(Type(pieces[move->to]));
	occ ^= Bit(move->from); // remove agressor

	if (move->type & mtEnpc)
	{
		int csq = move->to ^ 8;
		captured = piece_value(Pawn);
		occ ^= Bit(csq); // remove victim
	}

	if (move->type & mtProm)
	{
		captured += piece_value(move->info) - 2 * piece_value(Pawn);
		at_stake = piece_value(move->info);
	}

	if (!(state.attacked & Bit(move->to))) return captured; // early exit

	int n = 0;

	exchange[n++] = captured;

	while (true)
	{
		// swap sides
		side = Opp(side);

		// get the least valued attacker
		bb_t att = bb_pawns[side] & occ & masks::pawn_attacks[Opp(side)][move->to];

		if (att)
		{
			exchange[n] = at_stake - exchange[n - 1]; n++;
			at_stake = piece_value(Pawn);
			occ ^= _blsi(att);
			continue;
		}

		att = bb_knights[side] & occ & masks::knight_attacks[move->to];
		if (att)
		{
			exchange[n] = at_stake - exchange[n - 1]; n++;
			at_stake = piece_value(Knight);
			occ ^= _blsi(att);
			continue;
		}

		att = bb_bishops[side] & occ & magic::bishop_attacks(move->to, occ);
		if (att)
		{
			exchange[n] = at_stake - exchange[n - 1]; n++;
			at_stake = piece_value(Bishop);
			occ ^= _blsi(att);
			continue;
		}

		att = bb_rooks[side] & occ & magic::rook_attacks(move->to, occ);
		if (att)
		{
			exchange[n] = at_stake - exchange[n - 1]; n++;
			at_stake = piece_value(Rook);
			occ ^= _blsi(att);
			continue;
		}

		att = bb_queens[side] & occ & magic::queen_attacks(move->to, occ);
		if (att)
		{
			exchange[n] = at_stake - exchange[n - 1]; n++;
			at_stake = piece_value(Queen);
			occ ^= _blsi(att);
			continue;
		}

		att = bb_king[side] & occ & masks::king_attacks[move->to];
		if (att)
		{
			exchange[n] = at_stake - exchange[n - 1]; n++;
			at_stake = piece_value(King);
			occ ^= _blsi(att);
			continue;
		}

		break;
	}

	// negamax
	while (--n) if (exchange[n] > -exchange[n - 1]) exchange[n - 1] = -exchange[n];

	return exchange[0];
}
Now I look at it, it is somehow not the last version, I call bishop_attacks and rook_attacks twice (again in queen_attacks), I can also store the attack vectors for rook and bishop in two temporary 64 bit variables. I believe it didn't matter much. Anyway this is the version that I timed.
So you update attack in each loop as the while loop as me, except to are doing it step by step (piece type by piece type, which is of course way better than what I do ...)
User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Minic raw speed

Post by xr_a_y »

So I speed up SEE a little but not much elo gain.
Then I studied apply, that is slow mainly because move is validated (not putting own king is check) without knowing pinned pieces before.
So I decided to switch to Magic BB do see if this will be faster ... not that much elo gain again
:cry: :cry: :cry:
User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Minic raw speed

Post by xr_a_y »

This gives this

Code: Select all

   1 minic_dev                      23      10    1700   53.3%   60.6%
   2 minic_1.10                     -4      10    1702   49.4%   60.8%
   3 minic_1.09                     -8      10    1701   48.8%   62.6%
   4 minic_1.00                    -11      11    1701   48.4%   57.5%
1.10 being better SEE and dev being magic BB.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Minic raw speed

Post by Joost Buijs »

xr_a_y wrote: Sun Nov 17, 2019 9:20 pm So I speed up SEE a little but not much elo gain.
Then I studied apply, that is slow mainly because move is validated (not putting own king is check) without knowing pinned pieces before.
So I decided to switch to Magic BB do see if this will be faster ... not that much elo gain again
:cry: :cry: :cry:
Even when you double the speed of SEE() it won't have a big influence on the speed overall since it takes just a small fraction of the total time.

I calculate checkers and pinned pieces for the opponent (in one procedure) once after each move, having information about pinned pieces makes move validation quick and straightforward.

Code: Select all

void position_t::get_checkers_and_pinned(bb_t& checkers, bb_t& pinned)
{
	int ksq = king_square(own()), opp = this->opp();
	bb_t bq_att = magic::bishop_attacks(ksq, occupied());
	bb_t rq_att = magic::rook_attacks(ksq, occupied());

	// checkers
	checkers  = masks::pawn_attacks[own()][ksq] & pawns(opp);
	checkers |= masks::knight_attacks[ksq] & knights(opp);
	checkers |= bq_att & bishops_queens(opp);
	checkers |= rq_att & rooks_queens(opp);

	// pinned
	pinned = 0;

	// pinned diagonal
	bb_t pp = bq_att & occupied(own());
	bb_t pinners = magic::bishop_attacks(ksq, occupied() & ~pp) & bishops_queens(opp);
	while (pinners) pinned |= pp & masks::between[ksq][_lsbr(pinners)];

	// pinned lateral
	pp = rq_att & occupied(own());
	pinners = magic::rook_attacks(ksq, occupied() & ~pp) & rooks_queens(opp);
	while (pinners) pinned |= pp & masks::between[ksq][_lsbr(pinners)];
}

Code: Select all

bool position_t::legal_move(move_t* move)
{
	if (move->type & mtEnpc)
	{
		const bb_t blockers = ~Bit(move->from) & (Bit(move->to) | occupied()) & ~Bit(move->to ^ 8);

		return !((magic::bishop_attacks(king_square(own()), blockers) & bishops_queens(opp())) |
			(magic::rook_attacks(king_square(own()), blockers) & rooks_queens(opp())));
	}

	if (Bit(move->from) & pinned())
		return (Bit(move->to) & masks::beyond[king_square(own())][move->from]);

	return true;
}
I don't know of my code is optimal, maybe there are smarter ways to do this, I already use this for 18 years or so and I never felt the need to change this.
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Minic raw speed

Post by jdart »

Implementing killers and counter moves can also delay the point where you need to sort.

--Jon
User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Minic raw speed

Post by xr_a_y »

jdart wrote: Tue Nov 26, 2019 3:16 pm Implementing killers and counter moves can also delay the point where you need to sort.

--Jon
I use Killer and Counter indeed, but only to sort already generated moves. That's a good point.
User avatar
Deberger
Posts: 91
Joined: Sat Nov 02, 2019 6:42 pm
Full name: ɹǝƃɹǝqǝᗡ ǝɔnɹꓭ

Re: Minic raw speed

Post by Deberger »

To delay sorting is good, but you can also delay generating the move list.

That is, validate one move without generating all the moves.

This approach will naturally have diminishing returns, but the TT move and Killer moves are worth testing.

Also, if you validate the TT move, you can avoid some hash collisions.

Also, if you separate the TT into White-to-move and Black-to-move you can mitigate the effects of hash collisions.