emadsen wrote: ↑Thu Jul 01, 2021 1:27 am
pedrojdm2021 wrote: ↑Wed Jun 30, 2021 6:49 am
For example in
Kiwipete position: (r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -) it does a search of depth 9 with this results:
Nodes searched: 1286811
Search time: 12596ms ( ~13 seconds ) in a AMD FX 8350 @4ghz CPU
so it means that my engine AI it's searching ~100k positions per second, i think it is pretty bad speed
so, anyone have any ideas to improve my nps? or another pruning technique that i should apply?
It's difficult to say without seeing more of your code. My engine is developed in C# using bitboards. It searches the above position to depth 9 in 573ms, but it reduces and prunes heavily.
Code: Select all
go depth 9
info depth 1 seldepth 7 time 41 nodes 329 score cp 274 nps 8014 pv e2a6
info hashfull 0 currmove e5c6 currmovenumber 48
info depth 2 seldepth 10 time 45 nodes 2701 score cp 179 nps 60071 pv e2a6 b4c3
info hashfull 0 currmove e5c6 currmovenumber 48
info depth 3 seldepth 15 time 50 nodes 7532 score cp 191 nps 149916 pv e2a6 b4c3 d2c3
info hashfull 0 currmove e5c6 currmovenumber 48
info depth 4 seldepth 15 time 59 nodes 16756 score cp 179 nps 283216 pv e2a6 b4c3 d2c3 e6d5
info hashfull 0 currmove e5c6 currmovenumber 48
info depth 5 seldepth 16 time 97 nodes 90036 score cp 42 nps 929553 pv c3b5 f6d5 d2g5 e7g5 f3f7
info hashfull 0 currmove e5c6 currmovenumber 48
info depth 6 seldepth 19 time 156 nodes 197933 score cp -52 nps 1268081 pv e2a6 e6d5 c3d5 f6d5 e5f7 e7f7
info hashfull 0 currmove e5c6 currmovenumber 48
info depth 7 seldepth 18 time 253 nodes 379474 score cp -8 nps 1498143 pv e2a6 e6d5 c3b5 e7e5 d2f4 e5b2 b5c7
info hashfull 0 currmove f3e3 currmovenumber 48
info depth 8 seldepth 21 time 328 nodes 508946 score cp -8 nps 1551662 pv e2a6 e6d5 c3b5 e7e5 d2f4 e5b2 b5c7 e8f8
info hashfull 0 currmove e1c1 currmovenumber 48
info depth 9 seldepth 21 time 573 nodes 964299 score cp 4 nps 1681850 pv d5e6 e7e6 e2a6 h3g2 f3g2 e6e5 f2f4 e5d6 c3b5
info hashfull 1 currmove e1c1 currmovenumber 48
bestmove d5e6
The search speed ramps up to about 3 million NPS after 1.5 secs.
I found no improvement in complementing bitboards with a mailbox array for a GetPiece(int Square) method. If statements that mask piece bitboards with Board.SquareMasks[Square] until it finds one > 0 performs fine. Lack of a mailbox array for determining the piece on a given square definitely is not the root cause of your engine's slow search.
Most likely the root problem is a) memory allocations your engine does in search but not in perft or b) your search algorithm. Don't allocate any memory on the heap during search. Pre-allocate all data structures at engine startup. Could be bad move ordering. How does your engine's search metrics compare to MadChess' metrics? I suggest you measure these metrics. Enable them when the "debug on" UCI command is sent.
Code: Select all
info depth 9 seldepth 21 time 576 nodes 964299 score cp 4 nps 1672678 pv d5e6 e7e6 e2a6 h3g2 f3g2 e6e5 f2f4 e5d6 c3b5
info hashfull 1 currmove e1c1 currmovenumber 48
info string Cache Hit = 20.71% Score Cutoff = 21.16% Best Move Hit = 26.32% Invalid Best Moves = 0
info string Null Move Cutoffs = 51.40% Beta Cutoff Move Number = 1.26 Beta Cutoff First Move = 90.68%
info string Evals = 171,825
info string Stopping search at 577 milliseconds.
bestmove d5e6
I just implemented that mailbox for getting the piece in square, and i gained about ~100ms in performance in the same perft as in the post, at least it's something
The thing is that my engine is not a uci engine, it's meant to play inside the unity application that i'm developing. so there is 0 UCI engine features, at least with communitcation with the console, but i do have a console application project for testing and debugging the engine itself, so i have a little command list to do search and so on..
a) memory allocations your engine does in search
i don't know if i am doing that.... but visual studio in the profiler it points to I/O operations to be the cause of the issue...
for example my negamax function looks like this (is quite long, but it's because of it's documentation and features added into it)
Code: Select all
// Fail hard alpha-beta negamax implementation
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private int Negamax(int _alpha, int _beta, int _depth, bool _makeNull = true)
{
// check if the search is cancelled every 4096 nodes
if ((searched_nodes & 4096) == 0 && AbortSearch)
{
return 0;
}
// -------- [ Transposition table lookup ] ----------------------------
TTEntry entry = TTable.Lookup(board.hash);
if (entry.hash == board.hash && entry.depth >= _depth)
{
if (entry.eval < -MATE_SCORE) entry.eval += ply;
if (entry.eval > MATE_SCORE) entry.eval -= ply;
if (entry.flag == TTFlag.EXACT)
{
return entry.eval;
}
else if (entry.flag == TTFlag.ALPHA && entry.eval <= _alpha)
return _alpha;
else if (entry.flag == TTFlag.BETA && entry.eval >= _beta)
return _beta;
}
TTFlag flag = TTFlag.ALPHA;
// ----------------------------------------------------------------------
// Quiescense search
if (_depth <= 0)
{
return Quiescense(_alpha,_beta);
}
// increment node searched count in the current search (useful for debugging)
searched_nodes++;
// if we have found a repeated position during the current search path, avoid it
if (ply > 0 && _makeNull && IsRepetition())
{
return 0;
}
// is the player to move in check?
bool in_check = board.InCheck(board.sideToMove);
// -------------- [ Null move pruning ] ------------------------------
if (_makeNull && !in_check && _depth >= 3 && ply > 0)
{
MakeNullMove();
int val = -Negamax(-_beta, -_beta + 1, _depth -3, !_makeNull);
UnMakeNullMove();
if (val >= _beta)
return _beta;
}
// -------------- [ Futility Pruning Initialization ] ------------------
// Futility Pruning flag
bool f_prune = false;
// Main condition to consider Futility Pruning
if (!in_check && _makeNull && Math.Abs(_alpha) < MATE_SCORE && _depth == 2)
{
f_prune = (Evaluate() + AISettings.mg_material_score[ChessPiece.Rook]) < _alpha;
}
// -----------------------------------------------------------------------
// have we found our principal variation? (best move / path)
bool found_pv = false;
// cache score
int score = 0;
// how many legal moves we have so far
byte legal_moves = 0;
// how many moves have we searched so far
byte moves_searched = 0;
// movements & move index in the current node
List<int> moves = board.GenerateMoves();
byte moveIndex = 0;
// -------------- [ Search iteration ] ------------------------------------
for(moveIndex = 0; moveIndex < moves.Count; moveIndex++)
{
OrderMoves(moveIndex,ref moves);
// -------------- [ Futility Pruning ] --------------------------------
if (f_prune && moves_searched > 2 && MoveUtility.Get_flag_capture(moves[moveIndex]) == 0
&& MoveUtility.Get_flag_castling(moves[moveIndex]) == 0 && MoveUtility.Get_promotion(moves[moveIndex]) == 0)
{
if(board.MakeMove(moves[moveIndex]))
{
if (!board.InCheck(board.sideToMove))
{
board.UnMakeMove(moves[moveIndex]);
return Evaluate();
}
board.UnMakeMove(moves[moveIndex]);
}
}
if (board.MakeMove(moves[moveIndex]))
{
ply++;
repetition_index++;
repetitions_table[repetition_index] = board.hash;
// -------------- [ Principal Variation Search ] ------------------
if(found_pv)
{
// try searching with a fixed window
score = -Negamax(-_alpha -1, -_alpha, _depth -1);
// if move fails, do a normal search
if (score > _alpha && score < _beta)
score = -Negamax(-_beta, -_alpha, _depth -1);
}else
{
// -------------- [ Late Move reductions (LMR) ] ------------------
if (moves_searched >= FULL_DEPTH_MOVES && _depth >= REDUCTION_LIMIT &&
// LMR validation
(MoveUtility.Get_flag_capture(moves[moveIndex]) == 0 && MoveUtility.Get_promotion(moves[moveIndex]) == 0
&& !in_check && _depth >= REDUCTION_LIMIT))
// fixed window search with a reduced depth
score = -Negamax(-_alpha -1, -_alpha, _depth -2);
else
// Hack to ensure that full-depth search is done.
score = _alpha + 1;
if (score > _alpha)
{
// try searching with a fixed window
score = -Negamax(-_alpha -1, -_alpha, _depth -1);
// our fails outside the window, do a normal search
if (score > _alpha && score < _beta)
score = -Negamax(-_beta, -_alpha, _depth -1);
}
}
// ----------------------------------------------------------------------
board.UnMakeMove(moves[moveIndex]);
// decrement ply & add legal moves
ply--;
repetition_index--;
legal_moves++;
moves_searched++;
// fail-hard beta cut-off
if (score >= _beta)
{
killer_moves[1][ply] = killer_moves[0][ply];
killer_moves[0][ply] = moves[moveIndex];
TTable.Store(new TTEntry(TTFlag.BETA, board.hash, _depth , _beta), ply);
return _beta;
}
// found a better move
if (score > _alpha)
{
history_moves[board.sideToMove][MoveUtility.Get_square_from(moves[moveIndex])]
[MoveUtility.Get_square_to(moves[moveIndex])] += _depth;
_alpha = score;
pv[ply] = moves[moveIndex];
flag = TTFlag.EXACT;
found_pv = true;
}
// -------------- [ Mate Distance Pruning ] ------------------
// we already found a checkmate, no need to search more
if(_alpha >= MATE_SCORE)
{
return _alpha;
}
// we can not be checkmated yet, so its's safe to prune here
if (_beta <= (-MATE_VALUE + 2) && !in_check)
{
return _beta;
}
}
}
// -------------- [ Finish search ] -------------------------------
// Legality check
if (legal_moves == 0)
{
if (in_check)
{
// retun mate score (closest distance to mate)
return -MATE_VALUE + ply;
}else
{
// return stalemate (draw) score
return 0;
}
}
TTable.Store(new TTEntry(flag, board.hash, _depth, _alpha), ply);
// node fails low
return _alpha;
}
}
In there you can see that i am using some local variables for some stuff... like the TTEntry (struct) for transposition tables, booleans for check detection, and so on... each node i get a new list<int> instance from the moveGenerator, things like that, i don't know if that is what you mean with "allocate memory on the heap"
i have something similar for QS and i do use local variables for move ordeing too, i tried to avoid that by moving some of the variables to global variables inside the class for example some variables used in the evaluation (like some dummy bitboard to loop over the pieces), but it broke the search and the engine started giving another moves in some depth levels, so actually i don't know clearly how to avoid declaration of that stuff inside the methods that they are in use
and also i have tried to "reduce" the bit-count from int to encode the moves to ushort to encoding the moves and adding an enum to flag features, so instead of encoding the move in 19 bits, only encoding it in 16 bits, but it started generating issues with the search, and instead of gaining performance in the perft, it did the opposite
so i reverted back these changes and for now i am again with int dataType for movements
....maybe the issue is the move generator?
for generating moves i have something like this:
Code: Select all
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public List<int> GenerateMoves()
{
moves = new List<int>(64);
ulong own_pieces = occupancies[sideToMove];
// Generate king moves
Append_moves_king(own_pieces);
// we have double check here. so we can return early
if (BitUtility.CountBits(GetKingAttackers(sideToMove)) >= 2) return moves;
// Generate pawn moves
Append_moves_pawn(own_pieces);
// Generate knight moves
Append_moves_kinght(own_pieces);
// Generate bishop moves
Append_moves_sliding(own_pieces, ChessPiece.Bishop);
// Generate rook moves
Append_moves_sliding(own_pieces, ChessPiece.Rook);
// Generate queen moves
Append_moves_sliding(own_pieces, ChessPiece.Queen);
return moves;
}
and for each piece is something like this, with variations based on the piece type (this one is for kinghts):
Code: Select all
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private void Append_moves_kinght(ulong _own_pieces)
{
ulong knight_moves = 0ul;
ulong pieces = bitboards[sideToMove][ChessPiece.Knight];
byte square_from;
byte square_to;
byte capture;
while (pieces > 0)
{
square_from = BitUtility.GetLS1BIndex(pieces);
knight_moves = ChessData.GetKnightMoves(square_from) & ~ _own_pieces;
while (knight_moves > 0)
{
square_to = BitUtility.GetLS1BIndex(knight_moves);
capture = (byte)(BitUtility.ContainsBit(occupancies[sideToMove ^ 1], square_to) ? 1 : 0);
moves.Add(MoveUtility.Encode_move(square_from, square_to , 0, capture,0,0));
BitUtility.RemoveLS1B(ref knight_moves);
}
BitUtility.RemoveLS1B(ref pieces);
}
}