mvanthoor wrote: ↑Sat Jan 29, 2022 8:49 pm
It runs at about the same speed on your CPU, but it is much newer. Are you using the BMI2-version on a Zen2 CPU perhaps? If you have a Zen2, you'd better run the popcnt version.
I tried old, generic, bmi2 and popcnt version of rustic alpha 1 (had them all on my harddisk, still) and bmi2 and popcnt were a lot faster than old and generic but popcnt seems even faster than BMI2 on my Ryzen 3600 for some reason. Don't ask me why but I made sure I used the fastest of the available versions.
mvanthoor wrote: ↑Sat Jan 29, 2022 8:49 pm
I cannot explain why your engine runs at 14M nodes per second. That is ridiculously fast. I also find that to be strange because, according to dangi's test with regard to bitboard implementations, Magic Bitboards is faster than the implementation you have used.
My (staged) move generator runs at 60M nps in perft and when computing the evaluation incrementally it's still 40M nps. So 14M nodes during search didn't seem so fast to me because the search isn't doing much else. Just generating moves, playing them and looking at the eval.
mvanthoor wrote: ↑Sat Jan 29, 2022 8:49 pm
But, do you still have staged move generation in the 0.1 version of your engine? Rustic Alpha 1 doesn't have that.
I do generate captures and quiet moves separately! It's hard to "disable" it. Otherwise I wouldn't know in what range to apply the move sorting. It's really written with that kind of 2 phase generation from the get go because I think it makes just a lot of sense to not waste time on generating quiet moves when most of the time you don't need them. I'm willing to accept that as the root cause for the speed difference, though.
mvanthoor wrote: ↑Sat Jan 29, 2022 8:49 pm
Rustic 1 counts nodes when it is actually going to do work: so it counts nodes after generating moves. If it cuts nodes off, a node is not counted. (This is true for both a/b and qsearch.) Rustic 3 counts nodes as they are visited: it counts immediately after entering the a/b function and qsearch function. To make sure it does not count a node at depth 0 a second time, it compensates for this by substracting a node just before entering qsearch. Rustic 3 seems to be faster than 1 because it counts visited nodes instead of nodes generating moves, but in practice they're the same speed.
I count visited nodes, too. Sometimes the visit is a short one: just looking whether side to move is in check (not cheap, actually) and then the standpat score of the position might already beat beta so no moves are generated. In all other cases I will also generate moves for each node that I count. So some degree of effort is spend for each "visited" node. Here's the source...
Code: Select all
private int Evaluate(int depth, int remaining, int alpha, int beta, MoveGen moveGen)
{
if (remaining == 0)
return EvaluateQuiet(depth, alpha, beta, moveGen);
NodesVisited++;
if (Aborted)
return 0;
BoardState current = Positions[depth];
BoardState next = Positions[depth + 1];
int score;
bool movesPlayed = false;
for (int i = moveGen.CollectCaptures(current); i < moveGen.Next; i++)
{
PickBestMove(i, moveGen.Next);
if (next.PlayAndUpdate(current, ref Moves[i]))
{
movesPlayed = true;
score = -Evaluate(depth + 1, remaining - 1, -beta, -alpha, moveGen);
if (score >= beta)
return beta;
if (score > alpha)
alpha = score;
}
}
for (int i = moveGen.CollectQuiets(current); i < moveGen.Next; i++)
{
if (next.PlayAndUpdate(current, ref Moves[i]))
{
movesPlayed = true;
score = -Evaluate(depth + 1, remaining - 1, -beta, -alpha, moveGen);
if (score >= beta)
return beta;
if (score > alpha)
alpha = score;
}
}
//checkmate or draw?
if (!movesPlayed)
return current.IsChecked(current.SideToMove) ? Evaluation.Checkmate(depth) : 0;
return alpha;
}
private int EvaluateQuiet(int depth, int alpha, int beta, MoveGen moveGen)
{
NodesVisited++;
if (Aborted)
return 0;
BoardState current = Positions[depth];
BoardState next = Positions[depth + 1];
bool inCheck = current.IsChecked(current.SideToMove);
//if inCheck we can't use standPat, need to escape check!
if (!inCheck)
{
int standPatScore = (int)current.SideToMove * current.Eval.Score;
if (standPatScore >= beta)
return beta;
if (standPatScore > alpha)
alpha = standPatScore;
}
bool movesPlayed = false;
for (int i = moveGen.CollectCaptures(current); i < moveGen.Next; i++)
{
PickBestMove(i, moveGen.Next);
if (next.PlayAndUpdate(current, ref Moves[i]))
{
movesPlayed = true;
int score = -EvaluateQuiet(depth + 1, -beta, -alpha, moveGen);
if (score >= beta)
return beta;
if (score > alpha)
alpha = score;
}
}
if (inCheck)
{
for (int i = moveGen.CollectQuiets(current); i < moveGen.Next; i++)
{
if (next.PlayAndUpdate(current, ref Moves[i]))
{
movesPlayed = true;
int score = -EvaluateQuiet(depth + 1, -beta, -alpha, moveGen);
if (score >= beta)
return beta;
if (score > alpha)
alpha = score;
}
}
if (!movesPlayed)
return Evaluation.Checkmate(depth);
}
//stalemate?
//if (expandedNodes == 0 && !LegalMoves.HasMoves(position))
// return 0;
return alpha;
}
Please correct me if this is not the standard way of counting nodes!
The development version of Rustic (and basically every version since 3) reaches depth 10 in 9.4 seconds IF I give it a 1MB TT, so it uses hash table / PV move sorting.
If Leorik has exact feature parity with Rustic 1 (which does NOT have PV-move ordering or a hash move ordering; or a TT for that matter), then I can't explain why it is so much faster than Rustic. If you have an explanation, I'd be glad to hear it. Rustic consistently needs less features for the same strength than other engines I test it against, but that doesn't mean there is not a bottleneck somewhere that is holding it back.
I definitely don't use PV-move ordering or a TT. Actually there's no benefit at all from having an incremental search at the moment except that when I abort due to running out of time I can return the result of the previous search.
I'm conflicted about sharing my code at this point. I have no problem with disclosing the source code or executables for reference purposes to answer questions like "is it really that fast? and why?" but I'm not quite ready to declare everything "open source" in the sense that everyone can just copy it and do their own derivative work. I might decide to open source everything later (e.g. under the MIT license like MinimalChess or GPL or whatever...) but for now, if I make the repo public I'd like to clearly communicate "look, but don't touch!" to the viewer.
Anybody knows a license that does just that?