Page 2 of 3
Re: Minic raw speed
Posted: Mon Nov 11, 2019 8:31 am
by xr_a_y
I put some timers inside Minic and I am totally amazed by your 36 cycles SEE !
Here some figures
Code: Select all
cycle % of total calls cycle/call
See 3738155258 16.3821% 4307603 867
Apply 3676719096 16.1129% 17811023 206
Eval 7667154230 33.6006% 3041458 2520
Attack 6080342832 26.6465% 297610485 20
Generate 1852695910 8.11926% 2598478 712
So that my SEE seems to be more around 800 cycles per call.
What are you profiling with only 36 cycles ? only a few specific lines of SEE ?
Maybe something is that I accumulate the timer and count one call for SEE only if it is a "long" one, one that needs to go through the attacker list.
Re: Minic raw speed
Posted: Mon Nov 11, 2019 1:30 pm
by lucasart
xr_a_y wrote: ↑Mon Nov 11, 2019 8:31 am
I put some timers inside Minic and I am totally amazed by your 36 cycles SEE !
Here some figures
Code: Select all
cycle % of total calls cycle/call
See 3738155258 16.3821% 4307603 867
Apply 3676719096 16.1129% 17811023 206
Eval 7667154230 33.6006% 3041458 2520
Attack 6080342832 26.6465% 297610485 20
Generate 1852695910 8.11926% 2598478 712
So that my SEE seems to be more around 800 cycles per call.
What are you profiling with only 36 cycles ? only a few specific lines of SEE ?
Maybe something is that I accumulate the timer and count one call for SEE only if it is a "long" one, one that needs to go through the attacker list.
You compute attackers from scratch at every iteration. This is slowing you down:
Code: Select all
bool threatsFound = getAttackers(p2, to, stmAttackers, ~p2.c);
You should only compute it once, and merely refresh for slider attacks along the segment through which a piece was moved (*). Or, even better in my experience, not even bother with see through attacks, and just compute once the attackers, and consume that bitboard in the loop.
Also, you should have some precomputed bitboard of squares attacked by the enemy after playing moves. This really saves time, for example, you can avoid generating king walking into check moves (less to filter afterwards), and in SEE you can know immediately whether or not you need to run the getAttackers() calculation and enter the loop.
But before optimising SEE(), make sure you've really thought things through with the search and move sorting, such that you do not compute too many SEE's (ie. think "do less", before "do faster").
(*) you can even skip that if the moved piece is a knight (another micro-optimisation)
PS: Your "Apply" is very slow. Perhaps you should start with that instead. Again, it could be that the apply is slow, or that you just do too many applies that could be avoided.
Re: Minic raw speed
Posted: Mon Nov 11, 2019 4:52 pm
by Joost Buijs
xr_a_y wrote: ↑Mon Nov 11, 2019 8:31 am
I put some timers inside Minic and I am totally amazed by your 36 cycles SEE !
Here some figures
Code: Select all
cycle % of total calls cycle/call
See 3738155258 16.3821% 4307603 867
Apply 3676719096 16.1129% 17811023 206
Eval 7667154230 33.6006% 3041458 2520
Attack 6080342832 26.6465% 297610485 20
Generate 1852695910 8.11926% 2598478 712
So that my SEE seems to be more around 800 cycles per call.
What are you profiling with only 36 cycles ? only a few specific lines of SEE ?
Maybe something is that I accumulate the timer and count one call for SEE only if it is a "long" one, one that needs to go through the attacker list.
No, I timed the whole SEE on a large number of positions. It is possible that my figures are too optimistic because I used Perft() to time it, so it might run largely from the cache. It could be slower when I time it in the full search.
The source code is rather simple:
Code: Select all
int position_t::see(move_t* move)
{
auto piece_value = [](int piece) { return material::piece_value[0][piece]; };
bb_t occ = occupied();
int side = state.side_to_move, exchange[32];
int at_stake = piece_value(Type(pieces[move->from]));
int captured = piece_value(Type(pieces[move->to]));
occ ^= Bit(move->from); // remove agressor
if (move->type & mtEnpc)
{
int csq = move->to ^ 8;
captured = piece_value(Pawn);
occ ^= Bit(csq); // remove victim
}
if (move->type & mtProm)
{
captured += piece_value(move->info) - 2 * piece_value(Pawn);
at_stake = piece_value(move->info);
}
if (!(state.attacked & Bit(move->to))) return captured; // early exit
int n = 0;
exchange[n++] = captured;
while (true)
{
// swap sides
side = Opp(side);
// get the least valued attacker
bb_t att = bb_pawns[side] & occ & masks::pawn_attacks[Opp(side)][move->to];
if (att)
{
exchange[n] = at_stake - exchange[n - 1]; n++;
at_stake = piece_value(Pawn);
occ ^= _blsi(att);
continue;
}
att = bb_knights[side] & occ & masks::knight_attacks[move->to];
if (att)
{
exchange[n] = at_stake - exchange[n - 1]; n++;
at_stake = piece_value(Knight);
occ ^= _blsi(att);
continue;
}
att = bb_bishops[side] & occ & magic::bishop_attacks(move->to, occ);
if (att)
{
exchange[n] = at_stake - exchange[n - 1]; n++;
at_stake = piece_value(Bishop);
occ ^= _blsi(att);
continue;
}
att = bb_rooks[side] & occ & magic::rook_attacks(move->to, occ);
if (att)
{
exchange[n] = at_stake - exchange[n - 1]; n++;
at_stake = piece_value(Rook);
occ ^= _blsi(att);
continue;
}
att = bb_queens[side] & occ & magic::queen_attacks(move->to, occ);
if (att)
{
exchange[n] = at_stake - exchange[n - 1]; n++;
at_stake = piece_value(Queen);
occ ^= _blsi(att);
continue;
}
att = bb_king[side] & occ & masks::king_attacks[move->to];
if (att)
{
exchange[n] = at_stake - exchange[n - 1]; n++;
at_stake = piece_value(King);
occ ^= _blsi(att);
continue;
}
break;
}
// negamax
while (--n) if (exchange[n] > -exchange[n - 1]) exchange[n - 1] = -exchange[n];
return exchange[0];
}
Now I look at it, it is somehow not the last version, I call bishop_attacks and rook_attacks twice (again in queen_attacks), I can also store the attack vectors for rook and bishop in two temporary 64 bit variables. I believe it didn't matter much. Anyway this is the version that I timed.
Re: Minic raw speed
Posted: Thu Nov 14, 2019 7:39 am
by xr_a_y
Joost Buijs wrote: ↑Mon Nov 11, 2019 4:52 pm
No, I timed the whole SEE on a large number of positions. It is possible that my figures are too optimistic because I used Perft() to time it, so it might run largely from the cache. It could be slower when I time it in the full search.
The source code is rather simple:
Code: Select all
int position_t::see(move_t* move)
{
auto piece_value = [](int piece) { return material::piece_value[0][piece]; };
bb_t occ = occupied();
int side = state.side_to_move, exchange[32];
int at_stake = piece_value(Type(pieces[move->from]));
int captured = piece_value(Type(pieces[move->to]));
occ ^= Bit(move->from); // remove agressor
if (move->type & mtEnpc)
{
int csq = move->to ^ 8;
captured = piece_value(Pawn);
occ ^= Bit(csq); // remove victim
}
if (move->type & mtProm)
{
captured += piece_value(move->info) - 2 * piece_value(Pawn);
at_stake = piece_value(move->info);
}
if (!(state.attacked & Bit(move->to))) return captured; // early exit
int n = 0;
exchange[n++] = captured;
while (true)
{
// swap sides
side = Opp(side);
// get the least valued attacker
bb_t att = bb_pawns[side] & occ & masks::pawn_attacks[Opp(side)][move->to];
if (att)
{
exchange[n] = at_stake - exchange[n - 1]; n++;
at_stake = piece_value(Pawn);
occ ^= _blsi(att);
continue;
}
att = bb_knights[side] & occ & masks::knight_attacks[move->to];
if (att)
{
exchange[n] = at_stake - exchange[n - 1]; n++;
at_stake = piece_value(Knight);
occ ^= _blsi(att);
continue;
}
att = bb_bishops[side] & occ & magic::bishop_attacks(move->to, occ);
if (att)
{
exchange[n] = at_stake - exchange[n - 1]; n++;
at_stake = piece_value(Bishop);
occ ^= _blsi(att);
continue;
}
att = bb_rooks[side] & occ & magic::rook_attacks(move->to, occ);
if (att)
{
exchange[n] = at_stake - exchange[n - 1]; n++;
at_stake = piece_value(Rook);
occ ^= _blsi(att);
continue;
}
att = bb_queens[side] & occ & magic::queen_attacks(move->to, occ);
if (att)
{
exchange[n] = at_stake - exchange[n - 1]; n++;
at_stake = piece_value(Queen);
occ ^= _blsi(att);
continue;
}
att = bb_king[side] & occ & masks::king_attacks[move->to];
if (att)
{
exchange[n] = at_stake - exchange[n - 1]; n++;
at_stake = piece_value(King);
occ ^= _blsi(att);
continue;
}
break;
}
// negamax
while (--n) if (exchange[n] > -exchange[n - 1]) exchange[n - 1] = -exchange[n];
return exchange[0];
}
Now I look at it, it is somehow not the last version, I call bishop_attacks and rook_attacks twice (again in queen_attacks), I can also store the attack vectors for rook and bishop in two temporary 64 bit variables. I believe it didn't matter much. Anyway this is the version that I timed.
So you update attack in each loop as the while loop as me, except to are doing it step by step (piece type by piece type, which is of course way better than what I do ...)
Re: Minic raw speed
Posted: Sun Nov 17, 2019 9:20 pm
by xr_a_y
So I speed up SEE a little but not much elo gain.
Then I studied apply, that is slow mainly because move is validated (not putting own king is check) without knowing pinned pieces before.
So I decided to switch to Magic BB do see if this will be faster ... not that much elo gain again
Re: Minic raw speed
Posted: Mon Nov 18, 2019 7:17 am
by xr_a_y
This gives this
Code: Select all
1 minic_dev 23 10 1700 53.3% 60.6%
2 minic_1.10 -4 10 1702 49.4% 60.8%
3 minic_1.09 -8 10 1701 48.8% 62.6%
4 minic_1.00 -11 11 1701 48.4% 57.5%
1.10 being better SEE and dev being magic BB.
Re: Minic raw speed
Posted: Mon Nov 18, 2019 1:08 pm
by Joost Buijs
xr_a_y wrote: ↑Sun Nov 17, 2019 9:20 pm
So I speed up SEE a little but not much elo gain.
Then I studied apply, that is slow mainly because move is validated (not putting own king is check) without knowing pinned pieces before.
So I decided to switch to Magic BB do see if this will be faster ... not that much elo gain again
Even when you double the speed of SEE() it won't have a big influence on the speed overall since it takes just a small fraction of the total time.
I calculate checkers and pinned pieces for the opponent (in one procedure) once after each move, having information about pinned pieces makes move validation quick and straightforward.
Code: Select all
void position_t::get_checkers_and_pinned(bb_t& checkers, bb_t& pinned)
{
int ksq = king_square(own()), opp = this->opp();
bb_t bq_att = magic::bishop_attacks(ksq, occupied());
bb_t rq_att = magic::rook_attacks(ksq, occupied());
// checkers
checkers = masks::pawn_attacks[own()][ksq] & pawns(opp);
checkers |= masks::knight_attacks[ksq] & knights(opp);
checkers |= bq_att & bishops_queens(opp);
checkers |= rq_att & rooks_queens(opp);
// pinned
pinned = 0;
// pinned diagonal
bb_t pp = bq_att & occupied(own());
bb_t pinners = magic::bishop_attacks(ksq, occupied() & ~pp) & bishops_queens(opp);
while (pinners) pinned |= pp & masks::between[ksq][_lsbr(pinners)];
// pinned lateral
pp = rq_att & occupied(own());
pinners = magic::rook_attacks(ksq, occupied() & ~pp) & rooks_queens(opp);
while (pinners) pinned |= pp & masks::between[ksq][_lsbr(pinners)];
}
Code: Select all
bool position_t::legal_move(move_t* move)
{
if (move->type & mtEnpc)
{
const bb_t blockers = ~Bit(move->from) & (Bit(move->to) | occupied()) & ~Bit(move->to ^ 8);
return !((magic::bishop_attacks(king_square(own()), blockers) & bishops_queens(opp())) |
(magic::rook_attacks(king_square(own()), blockers) & rooks_queens(opp())));
}
if (Bit(move->from) & pinned())
return (Bit(move->to) & masks::beyond[king_square(own())][move->from]);
return true;
}
I don't know of my code is optimal, maybe there are smarter ways to do this, I already use this for 18 years or so and I never felt the need to change this.
Re: Minic raw speed
Posted: Tue Nov 26, 2019 3:16 pm
by jdart
Implementing killers and counter moves can also delay the point where you need to sort.
--Jon
Re: Minic raw speed
Posted: Tue Nov 26, 2019 4:59 pm
by xr_a_y
jdart wrote: ↑Tue Nov 26, 2019 3:16 pm
Implementing killers and counter moves can also delay the point where you need to sort.
--Jon
I use Killer and Counter indeed, but only to sort already generated moves. That's a good point.
Re: Minic raw speed
Posted: Wed Nov 27, 2019 7:09 pm
by Deberger
To delay sorting is good, but you can also delay generating the move list.
That is, validate one move without generating all the moves.
This approach will naturally have diminishing returns, but the TT move and Killer moves are worth testing.
Also, if you validate the TT move, you can avoid some hash collisions.
Also, if you separate the TT into White-to-move and Black-to-move you can mitigate the effects of hash collisions.