I am currently testing a new version of Leptir (Leptir N1, which searches faster than Leptir Analyzer). In my new test, this engine finds 103 out of 110 solutions. For a quick check, I tested the engine against Stockfish dev at bullet.
Each engine runs at 7500 kN/s with Ponder ON (this is absolutely important to me because Ponder ON is what is played on the 3 well-known servers: PlayChess, InfinityChess, Lichess). The time control is 60s + 0.1s.
After 270 games the status is 50%-50%.
My second engine, GoldenEye (this engine can learn, but learning is switched off for the test), is only at -1 against Stockfish (Noomen 3move book).
Sometimes I pause the test when I'm doing other things. Currently it looks like this (second run, games 59 - 500, Powerfritz 18 GUI):
I expected a bigger advantage for Stockfish dev. Yes, I know this is too few games (the Noomen 3move book contains only 250 variations), but all the most important openings of human theory are played - that's enough for me.
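To put a rough number on "too few games", here is a minimal Python sketch that turns a match result into an Elo difference with an approximate 95% error margin. The function names (elo_from_score, elo_margin) are made up for illustration, and the win/draw/loss split in the usage lines is purely hypothetical, since only the 50%-50% score after 270 games is known here, not the actual split. It uses the standard logistic Elo formula and a normal approximation.

```python
import math

def elo_from_score(score):
    """Convert a score fraction (0 < score < 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin(wins, draws, losses, z=1.96):
    """Return (elo_diff, elo_low, elo_high) for a match result.

    z=1.96 gives an approximate 95% interval (normal approximation).
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    # Clamp so the logarithm stays defined for very lopsided results.
    low = max(score - z * se, 1e-9)
    high = min(score + z * se, 1.0 - 1e-9)
    return elo_from_score(score), elo_from_score(low), elo_from_score(high)

# HYPOTHETICAL split of 270 games at 50% (the real W/D/L counts are not given).
diff, low, high = elo_margin(wins=45, draws=180, losses=45)
print(f"Elo diff {diff:+.1f}, 95% interval [{low:+.1f}, {high:+.1f}]")
```

With that assumed split the interval is roughly ±24 Elo around zero. With no draws at all (135 wins, 135 losses) it widens to roughly ±42 Elo. So a 50%-50% score after 270 games is still compatible with a noticeable real difference in either direction.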
With 101 and 103 solutions in position tests, both engines do better than Stockfish dev (only 91 solutions). It's inexplicable to me. Stockfish dev is not better in games and is clearly weaker in position tests. So what are the merits of Stockfish dev compared to my clones?
Download the Noomen 3move book (PGN):
https://pixeldrain.com/u/2r3iGiP8