Fluctuations in Stockfish evaluation

syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: Fluctuations in Stockfish evaluation

Post by syzygy »

Javier Ros wrote: Tue Jun 15, 2021 8:42 am Of course I have used 1 thread to avoid non-determinism due to multiple threads.
If the single-threaded AVX2 and BMI2 versions indeed produce different results (with exactly the same settings, starting from an empty hash table, etc.), then there might be a bug. (Unless the discrepancy is known and accepted.)
Javier Ros wrote: It is true that this is not a traditional mathematical method looking for an exact solution, but it is desirable that it should be similar.
That is a subjective feeling which the SF developers cannot care about.
In every new search iteration, new things about the position can be discovered, so fluctuations are inherent.
Moreover, SF's search algorithm is extremely far removed from a well-defined mathematical method. Lots of search decisions are made on the basis of information that is available by chance for some positions but not for others. The search is a completely inconsistent mess, but it works very well. Try to "fix" it and you lose 100s of Elo. If consistency is what you want, then you don't want Stockfish.
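One rough way to test the "same settings, same results" condition is to compare the node counts that two single-threaded runs report at each completed depth. The sketch below is only an illustration: `first_divergence` is a hypothetical helper, and the numbers are the node counts at completed depths 36 to 38 from the two logs quoted later in this thread. A mismatch does not prove a bug, since hash size or other GUI settings may have differed between the two machines.

```python
def first_divergence(run_a, run_b):
    """Return the first shared depth at which two runs report different node counts.

    Each run maps a completed depth to the node count printed at that depth.
    Returns None when every shared depth agrees.
    """
    for depth in sorted(set(run_a) & set(run_b)):
        if run_a[depth] != run_b[depth]:
            return depth
    return None


# Node counts at completed depths, copied from the logs quoted later in the thread.
bmi2_run = {36: 370_216_738, 37: 436_138_449, 38: 521_726_724}
avx2_run = {36: 170_761_372, 37: 224_491_256, 38: 261_164_439}

print(first_divergence(bmi2_run, avx2_run))
```

If two builds really search the same tree, this should return None for every depth they share.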
ernst
Posts: 352
Joined: Thu Mar 09, 2006 6:00 pm

Re: Fluctuations in Stockfish evaluation

Post by ernst »

syzygy wrote: Tue Jun 15, 2021 1:04 am
Javier Ros wrote: Mon Jun 14, 2021 6:45 pm I wonder what is the origin of these unpleasant fluctuations and if there is a solution.
What would be the point of searching longer/deeper if fluctuations did not exist? If you don't want fluctuations, search to depth 1.
This one made me chuckle. Shock and Awe must be your middle name. :lol:
This post may either be cause or result of misunderstandings.
Javier Ros
Posts: 200
Joined: Fri Oct 12, 2012 12:48 pm
Location: Seville (SPAIN)
Full name: Javier Ros

Re: Fluctuations in Stockfish evaluation

Post by Javier Ros »

syzygy wrote: Tue Jun 15, 2021 3:49 pm
Moreover, SF's search algorithm is extremely far removed from a well-defined mathematical method. Lots of search decisions are made on the basis of information that is available by chance for some positions but not for others. The search is a completely inconsistent mess, but it works very well. Try to "fix" it and you lose 100s of Elo. If consistency is what you want, then you don't want Stockfish.
Of course I like Stockfish, it is one of the most important achievements in computer chess.

Perhaps these fluctuations are due to the complexity of chess and not to the alpha-beta algorithm or Stockfish programming.

I was wondering if anyone knew anything else about this.
chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

Re: Fluctuations in Stockfish evaluation

Post by chrisw »

Javier Ros wrote: Tue Jun 15, 2021 6:09 pm
Of course I like Stockfish, it is one of the most important achievements in computer chess.

Perhaps these fluctuations are due to the complexity of chess and not to the alpha-beta algorithm or Stockfish programming.

I was wondering if anyone knew anything else about this.
Fluctuations. Not because of any property of chess. Not because of the alpha-beta algorithm. And we can assume there is nothing wrong or unstable with “Stockfish programming”, whatever that means.

With no pruning and no extensions there would be full repeatability, no “fluctuations”. They’re a property of the extensions and the pruning, in particular that these are statistical, based on what was recently happening in the search tree. Small differences create more differences in which parts of the tree (subtrees) get searched. Which parts of the tree get searched affects the returned evaluations a little and changes the statistical history on which future extend/prune decisions get made. A butterfly flapped its wings in Outer Mongolia and so on. It’s a miracle it all hangs together, but it does. Or, as Dawkins might say: it works, bitches.
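To make the mechanism concrete, here is a deliberately tiny sketch. It is not Stockfish's code: the tree, the names `STATIC`, `CHILDREN` and `search`, and the rule "a move with negative history is searched one ply shallower" are all invented for illustration. It only shows that the root score of a fixed-depth search can change when the history table decides which subtree is searched to full depth.

```python
# Static evaluations (White's point of view) for a hand-made toy tree.
STATIC = {"a": 0.30, "b": 0.20, "aa": 0.10, "ab": 0.40, "ba": 0.25, "bb": 0.30}
CHILDREN = {"": ["a", "b"], "a": ["aa", "ab"], "b": ["ba", "bb"]}


def search(node, depth, white_to_move, history):
    """Plain minimax, except that moves with negative history lose one ply of depth."""
    if depth <= 0 or node not in CHILDREN:
        return STATIC[node]
    scores = []
    for child in CHILDREN[node]:
        reduction = 1 if history.get(child, 0) < 0 else 0
        scores.append(search(child, depth - 1 - reduction, not white_to_move, history))
    return max(scores) if white_to_move else min(scores)


full = search("", 2, True, {})                  # both moves searched to depth 2
a_reduced = search("", 2, True, {"a": -1})      # move "a" only gets its static eval
b_reduced = search("", 2, True, {"b": -1})      # move "b" only gets its static eval
print(full, a_reduced, b_reduced)
```

With an empty history the search sees the refutation of "a" and returns the value of "b". Once "a" is reduced, its optimistic static evaluation wins instead, so both the score and the best move change although the position is the same.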
Javier Ros
Posts: 200
Joined: Fri Oct 12, 2012 12:48 pm
Location: Seville (SPAIN)
Full name: Javier Ros

Re: Fluctuations in Stockfish evaluation

Post by Javier Ros »

chrisw wrote: Tue Jun 15, 2021 8:14 pm Fluctuations. Not because of any property of chess. Not because of the alpha-beta algorithm. And we can assume there is nothing wrong or unstable with “Stockfish programming”, whatever that means.

With no pruning and no extensions there would be full repeatability, no “fluctuations”. They’re a property of the extensions and the pruning, in particular that these are statistical, based on what was recently happening in the search tree. Small differences create more differences in which parts of the tree (subtrees) get searched. Which parts of the tree get searched affects the returned evaluations a little and changes the statistical history on which future extend/prune decisions get made. A butterfly flapped its wings in Outer Mongolia and so on. It’s a miracle it all hangs together, but it does. Or, as Dawkins might say: it works, bitches.
Thanks for the explanation.
KLc
Posts: 140
Joined: Wed Jun 03, 2020 6:46 am
Full name: Kurt Lanc

Re: Fluctuations in Stockfish evaluation

Post by KLc »

chrisw wrote: Tue Jun 15, 2021 8:14 pm Fluctuations. Not because of any property of chess. Not because of the alpha-beta algorithm. And we can assume there is nothing wrong or unstable with “Stockfish programming”, whatever that means.

With no pruning and no extensions there would be full repeatability, no “fluctuations”. They’re a property of the extensions and the pruning, in particular that these are statistical, based on what was recently happening in the search tree. Small differences create more differences in which parts of the tree (subtrees) get searched. Which parts of the tree get searched affects the returned evaluations a little and changes the statistical history on which future extend/prune decisions get made. A butterfly flapped its wings in Outer Mongolia and so on. It’s a miracle it all hangs together, but it does. Or, as Dawkins might say: it works, bitches.
I think this makes sense because other engines do not show such massive fluctuations; probably because there's less pruning.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Fluctuations in Stockfish evaluation

Post by Ferdy »

Javier Ros wrote: Mon Jun 14, 2021 6:45 pm Stockfish's progress is impressive, especially after NNUE.

However, the fluctuations in the evaluation are still there.
For example, from the initial position with one thread, and discarding the first two minutes, Stockfish_13_win_x64_bmi2 with 1 GB RAM on an i7 7700HQ shows, after 30 minutes, variations from +0.14 to +0.42 and alternates its best move between d4, c4 and e4; final best move e4 +0.21, ply 44/54-.

Stockfish_13_win_x64_avx2 with 1 GB RAM on a Ryzen 7 2700x shows, after 30 minutes, variations from +0.14 to +0.34 and alternates its best move between e4, d4 and c4; final best move c4 +0.26, ply 43/59.

I wonder what is the origin of these unpleasant fluctuations and if there is a solution.
On the other hand, I don't understand the discrepancy between the results of the bmi2 and avx2 versions.

i7 7700HQ
FEN: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

Stockfish_13_win_x64_bmi2:
NNUE evaluation using nn-62ef826d1a6d.nnue enabled

33/42 02:01 112.723.623 927.033 +0,23 1.d2-d4 Ng8-f6 2.c2-c4 e7-e6 3.Ng1-f3 d7-d5 4.Nb1-c3 Bf8-e7 5.g2-g3 O-O 6.Bf1-g2 d5xc4 7.Qd1-a4 c7-c6 8.Qa4xc4 b7-b5 9.Qc4-b3 a7-a5 10.O-O Bc8-b7 11.Bc1-g5 Nb8-d7 12.Nf3-e5 a5-a4 13.Qb3-c2 Nd7xe5 14.d4xe5 Nf6-d7 15.Bg5xe7 Qd8xe7 16.f2-f4 Qe7-c5+ 17.Rf1-f2 b5-b4
34/46+ 02:40 148.577.172 925.662 +0,31 1.c2-c4
34/46- 02:58 164.495.839 921.122 +0,14 1.c2-c4 e7-e5
34/46+ 04:09 228.962.917 919.356 +0,26 1.e2-e4
34/46 04:30 248.303.538 919.019 +0,25 1.e2-e4 e7-e6 2.d2-d4 d7-d5 3.Nb1-c3 Ng8-f6 4.e4-e5 Nf6-d7 5.Nc3-e2 c7-c5 6.c2-c3 Qd8-b6 7.Ng1-f3 Nb8-c6 8.g2-g3 c5xd4 9.c3xd4 Bf8-b4+ 10.Bc1-d2 a7-a5 11.Bf1-h3 Bb4xd2+ 12.Qd1xd2 Qb6-b4 13.Qd2xb4 a5xb4 14.Ke1-d2 Nd7-b6 15.b2-b3 Ke8-e7 16.Rh1-c1 Bc8-d7 17.Ne2-f4
35/43+ 04:48 264.801.996 918.080 +0,34 1.e2-e4
35/43+ 04:52 268.629.346 918.193 +0,42 1.e2-e4
35/43- 05:03 278.267.739 916.632 +0,17 1.e2-e4 e7-e6
35/46+ 05:36 305.839.963 909.550 +0,36 1.e2-e4
35/46 05:40 309.541.736 909.820 +0,37 1.e2-e4 e7-e6 2.d2-d4 d7-d5 3.Nb1-c3 Ng8-f6 4.Bc1-g5 Bf8-e7 5.e4-e5 Nf6-d7 6.Bg5xe7 Qd8xe7 7.Qd1-d2 O-O 8.f2-f4 c7-c5 9.Ng1-f3 Nb8-c6 10.O-O-O a7-a6 11.g2-g4 b7-b5 12.h2-h4 f7-f6 13.e5xf6 Nd7xf6 14.d4xc5 Qe7xc5 15.Bf1-d3 Nf6xg4 16.Rh1-g1 Ng4-f2 17.Bd3xh7+ Kg8xh7 18.Rd1-f1
36/44- 06:02 329.768.348 910.212 +0,29 1.e2-e4 e7-e6
36/44- 06:30 355.266.460 910.750 +0,21 1.e2-e4 e7-e5
36/44+ 06:41 365.553.131 910.908 +0,29 1.e2-e4
36/44 06:46 370.216.738 910.798 +0,26 1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 a7-a6 4.Bb5-a4 Ng8-f6 5.O-O Nf6xe4 6.d2-d4 b7-b5 7.Ba4-b3 d7-d5 8.d4xe5 Bc8-e6 9.c2-c3 Bf8-c5 10.Qd1-d3 O-O 11.Bc1-e3 Bc5xe3 12.Qd3xe3 Nc6-e7 13.Bb3-c2 Be6-f5 14.Nb1-d2 Ne4xd2 15.Qe3xd2 Bf5xc2 16.Qd2xc2 h7-h6 17.a2-a4 c7-c5 18.a4xb5 a6xb5 19.Ra1xa8
37/48+ 07:07 389.903.701 911.222 +0,34 1.e2-e4
37/48- 07:28 408.460.257 911.711 +0,18 1.e2-e4 e7-e5
37/48+ 07:48 427.277.778 911.950 +0,30 1.e2-e4
37/48 07:58 436.138.449 912.173 +0,32 1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 Ng8-f6 4.O-O Nf6xe4 5.Rf1-e1 Ne4-d6 6.Nf3xe5 Bf8-e7 7.Bb5-f1 Nc6xe5 8.Re1xe5 O-O 9.d2-d4 Be7-f6 10.Re5-e1 Rf8-e8 11.Bc1-f4 Re8xe1 12.Qd1xe1 Nd6-e8 13.c2-c3 a7-a5 14.Nb1-d2 d7-d5 15.Bf1-d3 a5-a4 16.Nd2-f3 a4-a3 17.b2-b4 Ne8-d6 18.Qe1-e2 Bc8-f5 19.Bd3xf5 Nd6xf5 20.Qe2-b5 b7-b6
38/44- 09:17 507.342.185 910.604 +0,24 1.e2-e4 e7-e5
38/45+ 09:30 519.454.524 910.142 +0,32 1.e2-e4
38/45 09:33 521.726.724 910.236 +0,26 1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 Ng8-f6 4.O-O Nf6xe4 5.Rf1-e1 Ne4-d6 6.Nf3xe5 Bf8-e7 7.Bb5-f1 Nc6xe5 8.Re1xe5 O-O 9.d2-d4 Be7-f6 10.Re5-e1 Rf8-e8 11.Bc1-f4 Re8xe1 12.Qd1xe1 Nd6-e8 13.Bf1-d3 Bf6xd4 14.Bd3xh7+ Kg8xh7 15.Qe1-e4+ Kh7-g8 16.Qe4xd4 d7-d5 17.Nb1-d2 c7-c6 18.c2-c4 Ne8-f6 19.h2-h3 Bc8-f5 20.g2-g4
39/48- 10:57 597.679.603 908.516 +0,18 1.e2-e4 c7-c5
39/48+ 11:17 614.270.423 907.292 +0,26 1.e2-e4
39/48 11:39 634.154.420 906.100 +0,31 1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 Ng8-f6 4.O-O Nf6xe4 5.Rf1-e1 Ne4-d6 6.Nf3xe5 Bf8-e7 7.Bb5-f1 Nc6xe5 8.Re1xe5 O-O 9.d2-d4 Be7-f6 10.Re5-e1 Rf8-e8 11.Bc1-f4 Re8xe1 12.Qd1xe1 Nd6-e8 13.c2-c3 a7-a5 14.Nb1-d2 d7-d5 15.Bf1-d3 a5-a4 16.Qe1-e3 Qd8-e7 17.Qe3-g3 Ne8-d6 18.a2-a3 Bc8-d7 19.Bf4xd6 Qe7xd6 20.Qg3xd6 c7xd6
40/47 13:38 739.243.466 903.525 +0,25 1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 Ng8-f6 4.O-O Nf6xe4 5.Rf1-e1 Ne4-d6 6.Nf3xe5 Bf8-e7 7.Bb5-f1 Nc6xe5 8.Re1xe5 O-O 9.d2-d4 Be7-f6 10.Re5-e1 Rf8-e8 11.Bc1-f4 Re8xe1 12.Qd1xe1 Nd6-e8 13.c2-c3 a7-a5 14.a2-a4 d7-d6 15.Nb1-d2 Bc8-e6 16.h2-h3 g7-g6 17.Qe1-e4 c7-c6 18.Qe4-e3 Bf6-e7 19.c3-c4 Ne8-g7 20.Bf1-d3 Ng7-f5 21.Bd3xf5 Be6xf5 22.d4-d5
41/53- 15:28 838.052.070 902.356 +0,17 1.e2-e4 e7-e5
41/53+ 15:58 864.887.871 902.063 +0,25 1.e2-e4
41/53 16:05 870.885.354 902.053 +0,25 1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 Ng8-f6 4.O-O Nf6xe4 5.Rf1-e1 Ne4-d6 6.Nf3xe5 Bf8-e7 7.Bb5-f1 Nc6xe5 8.Re1xe5 O-O 9.d2-d4 Be7-f6 10.Re5-e1 Rf8-e8 11.Bc1-f4 Re8xe1 12.Qd1xe1 Nd6-e8 13.c2-c3 d7-d5 14.a2-a4 Ne8-d6 15.Nb1-d2 a7-a5 16.Bf1-d3 Bc8-f5 17.Qe1-e3 Bf5xd3 18.Qe3xd3 Qd8-d7 19.g2-g3 Ra8-e8 20.Bf4xd6 Qd7xd6 21.Kg1-g2 Qd6-d7 22.Nd2-f3 c7-c6 23.h2-h3
42/51- 18:39 1.009.321.730 901.979 +0,17 1.e2-e4 e7-e5
42/51+ 19:19 1.045.305.608 901.815 +0,25 1.e2-e4
42/51 19:32 1.057.554.324 901.762 +0,21 1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 Ng8-f6 4.O-O Nf6xe4 5.Rf1-e1 Ne4-d6 6.Nf3xe5 Bf8-e7 7.Bb5-f1 Nc6xe5 8.Re1xe5 O-O 9.d2-d4 Be7-f6 10.Re5-e1 Rf8-e8 11.Bc1-f4 Re8xe1 12.Qd1xe1 Nd6-e8 13.c2-c3 d7-d5 14.a2-a4 a7-a5 15.Bf1-d3 g7-g6 16.Nb1-d2 Ne8-g7 17.Nd2-f3 Bc8-f5 18.Bd3xf5 Ng7xf5 19.g2-g3 Qd8-d7 20.Qe1-e2 c7-c6 21.Qe2-d2 Ra8-e8 22.h2-h4 Nf5-g7
43/46- 22:50 1.234.101.899 900.226 +0,13 1.e2-e4 e7-e5
43/55+ 24:21 1.314.457.234 899.644 +0,21 1.e2-e4
43/55 24:56 1.346.285.181 899.530 +0,21 1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 Ng8-f6 4.O-O Nf6xe4 5.Rf1-e1 Ne4-d6 6.Nf3xe5 Bf8-e7 7.Bb5-f1 Nc6xe5 8.Re1xe5 O-O 9.d2-d4 Be7-f6 10.Re5-e1 Rf8-e8 11.Bc1-f4 Re8xe1 12.Qd1xe1 Nd6-e8 13.c2-c3 d7-d5 14.Bf1-d3 g7-g6 15.Nb1-d2 Ne8-g7 16.Nd2-f3 c7-c6 17.Bf4-h6 a7-a5 18.Qe1-e3 Bc8-g4 19.Nf3-e5 Ng7-f5 20.Bd3xf5 Bg4xf5 21.Ra1-e1 a5-a4 22.h2-h3
44/54- 31:57 1.722.138.757 897.971 +0,12 1.e2-e4 e7-e5

Ryzen 7 2700x
FEN: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

Stockfish_13_win_x64_avx2:
NNUE evaluation using nn-62ef826d1a6d.nnue enabled

36/50- 02:22 145.261.236 1.017.891 +0,28 1.e2-e4 e7-e5
36/50 02:47 170.761.372 1.017.605 +0,25 1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 Ng8-f6 4.O-O Nf6xe4 5.Rf1-e1 Ne4-d6 6.Nf3xe5 Bf8-e7 7.Bb5-f1 Nc6xe5 8.Re1xe5 O-O 9.d2-d4 Be7-f6 10.Re5-e1 Rf8-e8 11.c2-c3 Re8xe1 12.Qd1xe1 Nd6-e8 13.Bc1-f4 d7-d5 14.Bf1-d3 g7-g6 15.Nb1-d2 Ne8-g7 16.Nd2-f3 Bc8-f5 17.Bd3xf5 Ng7xf5 18.h2-h3 c7-c6 19.Qe1-d2 Nf5-d6 20.Ra1-e1 Nd6-e4 21.Qd2-e3 Qd8-d7 22.Nf3-e5
37/47- 03:17 200.367.178 1.015.823 +0,17 1.e2-e4 e7-e5
37/49 03:41 224.491.256 1.015.420 +0,17 1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 Ng8-f6 4.O-O Nf6xe4 5.Rf1-e1 Ne4-d6 6.Nf3xe5 Bf8-e7 7.Bb5-f1 Nc6xe5 8.Re1xe5 O-O 9.d2-d4 Be7-f6 10.Re5-e1 Rf8-e8 11.c2-c3 Re8xe1 12.Qd1xe1 Nd6-e8 13.a2-a4 a7-a5 14.Bc1-f4 d7-d5 15.Qe1-e3 Bc8-f5 16.Nb1-d2 h7-h6 17.Nd2-f3 c7-c6 18.Ra1-e1 Ne8-c7 19.Bf4-e5 Nc7-e6 20.Be5xf6 Qd8xf6
38/51 04:17 261.164.439 1.014.439 +0,17 1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 Ng8-f6 4.O-O Nf6xe4 5.Rf1-e1 Ne4-d6 6.Nf3xe5 Bf8-e7 7.Bb5-f1 Nc6xe5 8.Re1xe5 O-O 9.d2-d4 Be7-f6 10.Re5-e1 Rf8-e8 11.c2-c3 Re8xe1 12.Qd1xe1 Nd6-e8 13.a2-a4 a7-a5 14.Bc1-f4 d7-d5 15.Bf1-d3 g7-g6 16.Qe1-e2 Ne8-d6 17.Nb1-d2 c7-c6 18.Ra1-e1 Bc8-f5 19.g2-g3 Bf5xd3 20.Qe2xd3 Bf6-g5 21.Bf4xd6 Bg5xd2 22.Qd3xd2 Qd8xd6
39/49+ 06:18 382.983.236 1.011.764 +0,25 1.d2-d4
39/50+ 06:24 388.872.084 1.012.155 +0,34 1.d2-d4
39/51 07:13 438.409.469 1.011.329 +0,15 1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 Ng8-f6 4.O-O Nf6xe4 5.Rf1-e1 Ne4-d6 6.Nf3xe5 Bf8-e7 7.Bb5-f1 Nc6xe5 8.Re1xe5 O-O 9.d2-d4 Be7-f6 10.Re5-e1 Rf8-e8 11.c2-c3 Re8xe1 12.Qd1xe1 Nd6-e8 13.Bc1-f4 d7-d5 14.Bf1-d3 g7-g6 15.h2-h3 Ne8-g7 16.Nb1-d2 Bc8-f5 17.Bd3xf5 Ng7xf5 18.Nd2-f3 c7-c6 19.g2-g3 Qd8-d7 20.Kg1-g2 Ra8-e8 21.Qe1-d2 Nf5-d6 22.Ra1-e1 Re8xe1 23.Qd2xe1 Bf6-g7
40/54+ 07:37 463.026.188 1.011.042 +0,23 1.e2-e4
40/54 08:49 535.027.657 1.009.886 +0,21 1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 Ng8-f6 4.O-O Nf6xe4 5.Rf1-e1 Ne4-d6 6.Nf3xe5 Bf8-e7 7.Bb5-f1 Nc6xe5 8.Re1xe5 O-O 9.d2-d4 Be7-f6 10.Re5-e1 Rf8-e8 11.c2-c3 Re8xe1 12.Qd1xe1 Nd6-e8 13.Bc1-f4 d7-d5 14.Nb1-d2 Bc8-f5 15.Qe1-e3 c7-c6 16.Ra1-e1 h7-h6 17.Qe3-g3 Bf6-g5 18.Bf4xg5 h6xg5 19.Bf1-e2 Bf5-g6 20.f2-f4 g5xf4 21.Qg3xf4 f7-f6
41/54+ 17:48 1.078.531.366 1.009.298 +0,29 1.c2-c4
41/54 19:09 1.160.542.452 1.009.212 +0,14 1.c2-c4 Ng8-f6 2.Ng1-f3 e7-e6 3.g2-g3 d7-d5 4.Bf1-g2 d5xc4 5.Qd1-a4+ Bc8-d7 6.Qa4xc4 c7-c5 7.d2-d4 Nb8-c6 8.O-O c5xd4 9.Nf3xd4 Ra8-c8 10.Nb1-c3 Nc6xd4 11.Qc4xd4 Bf8-c5 12.Qd4-f4 Bd7-c6 13.Bg2xc6+ Rc8xc6 14.Rf1-d1 Qd8-c8 15.b2-b4 Bc5-b6 16.Nc3-a4 Bb6-c7 17.Qf4-e3 O-O 18.Bc1-b2 Nf6-g4 19.Qe3-b3 Bc7-e5 20.Ra1-c1 b7-b6 21.Rc1xc6 Qc8xc6 22.Bb2xe5 Ng4xe5 23.b4-b5
42/51+ 20:30 1.242.678.825 1.009.545 +0,23 1.c2-c4
42/51+ 20:47 1.259.308.192 1.009.454 +0,31 1.c2-c4
42/51 21:46 1.318.280.276 1.009.331 +0,13 1.c2-c4 Ng8-f6 2.Ng1-f3 e7-e6 3.d2-d4 d7-d5 4.Nb1-c3 d5xc4 5.e2-e4 Bf8-b4 6.Bc1-g5 c7-c5 7.Bf1xc4 c5xd4 8.Nf3xd4 Bb4xc3+ 9.b2xc3 Qd8-a5 10.Nd4-b5 Nf6xe4 11.Qd1-d4 O-O 12.Qd4xe4 a7-a6 13.Bc4-d3 f7-f5 14.Qe4-b4 a6xb5 15.Qb4xa5 Ra8xa5 16.Bg5-e7 Rf8-e8 17.Be7-b4 Ra5-a8 18.Bd3xb5 Bc8-d7 19.Bb5-c4 Bd7-c6 20.O-O Kg8-f7 21.Bc4-b3 Ra8-a6 22.Rf1-d1 Nb8-d7 23.a2-a4
43/59+ 24:31 1.486.668.887 1.010.557 +0,22 1.c2-c4
43/59+ 28:17 1.715.530.258 1.010.397 +0,30 1.c2-c4
43/59 29:13 1.771.801.468 1.010.411 +0,26 1.c2-c4 Ng8-f6 2.Ng1-f3 e7-e6 3.g2-g3 d7-d5 4.Bf1-g2 d5xc4 5.Qd1-a4+ c7-c6 6.Qa4xc4 b7-b5 7.Qc4-b3 Bc8-b7 8.O-O Nb8-d7 9.d2-d4 a7-a6 10.Nf3-e5 Nd7xe5 11.d4xe5 Nf6-d5 12.a2-a4 Bf8-e7 13.Nb1-c3 O-O 14.Nc3xd5 c6xd5 15.Bc1-e3 b5-b4 16.a4-a5 Bb7-c6 17.Be3-b6 Qd8-e8 18.Rf1-d1 Ra8-c8 19.Ra1-c1 Bc6-a4 20.Rc1xc8 Ba4xb3 21.Rc8xe8
One way of measuring by how many pawn units an engine is lying is to subtract the root score from the PV leaf score.
Example from SF13.

 43/55	24:56	1.346.285.181	899.530	+0,21	1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 Ng8-f6 4.O-O Nf6xe4 5.Rf1-e1 Ne4-d6 6.Nf3xe5 Bf8-e7 7.Bb5-f1 Nc6xe5 8.Re1xe5 O-O 9.d2-d4 Be7-f6 10.Re5-e1 Rf8-e8 11.Bc1-f4 Re8xe1 12.Qd1xe1 Nd6-e8 13.c2-c3 d7-d5 14.Bf1-d3 g7-g6 15.Nb1-d2 Ne8-g7 16.Nd2-f3 c7-c6 17.Bf4-h6 a7-a5 18.Qe1-e3 Bc8-g4 19.Nf3-e5 Ng7-f5 20.Bd3xf5 Bg4xf5 21.Ra1-e1 a5-a4 22.h2-h3
root_score = 0.21
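Reading the root score and the PV out of such a line can be automated. The sketch below assumes the column layout seen in the logs above (depth/seldepth, time, nodes, nps, score, PV), assumes a trailing "+" or "-" on the depth marks a fail-high or fail-low, and assumes the PV starts with White to move. `parse_info_line` and `pv_to_uci` are hypothetical helper names, and promotions are not handled.

```python
import re


def parse_info_line(line):
    """Split one analysis line into its fields (column layout assumed from the logs)."""
    fields = line.split()
    depth_part = fields[0].rstrip("+-")          # "44/54-" -> "44/54"
    depth, seldepth = (int(x) for x in depth_part.split("/"))
    return {
        "depth": depth,
        "seldepth": seldepth,
        "bound": fields[0][len(depth_part):],    # "", "+" or "-"
        "time": fields[1],
        "nodes": int(fields[2].replace(".", "")),
        "nps": int(fields[3].replace(".", "")),
        "score": float(fields[4].replace(",", ".")),
        "pv": fields[5:],
    }


def pv_to_uci(pv_tokens):
    """Convert long-algebraic tokens such as '2.Ng1-f3' to UCI moves such as 'g1f3'."""
    moves = []
    for ply, token in enumerate(pv_tokens):
        move = re.sub(r"^\d+\.", "", token).rstrip("+#")
        white = ply % 2 == 0                     # assumes White makes the first PV move
        if move == "O-O":
            moves.append("e1g1" if white else "e8g8")
        elif move == "O-O-O":
            moves.append("e1c1" if white else "e8c8")
        else:
            origin, target = re.findall(r"[a-h][1-8]", move)[:2]
            moves.append(origin + target)
    return moves


line = "43/55\t24:56\t1.346.285.181\t899.530\t+0,21\t1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 Ng8-f6 4.O-O Nf6xe4"
info = parse_info_line(line)
print(info["score"], "position startpos moves " + " ".join(pv_to_uci(info["pv"])))
```

The printed string is a UCI `position` command that sets up the position after the listed moves, so the same engine can then be asked to analyze the leaf.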

Take the pv.

1.e2-e4 e7-e5 2.Ng1-f3 Nb8-c6 3.Bf1-b5 Ng8-f6 4.O-O Nf6xe4 5.Rf1-e1 Ne4-d6 6.Nf3xe5 Bf8-e7 7.Bb5-f1 Nc6xe5 8.Re1xe5 O-O 9.d2-d4 Be7-f6 10.Re5-e1 Rf8-e8 11.Bc1-f4 Re8xe1 12.Qd1xe1 Nd6-e8 13.c2-c3 d7-d5 14.Bf1-d3 g7-g6 15.Nb1-d2 Ne8-g7 16.Nd2-f3 c7-c6 17.Bf4-h6 a7-a5 18.Qe1-e3 Bc8-g4 19.Nf3-e5 Ng7-f5 20.Bd3xf5 Bg4xf5 21.Ra1-e1 a5-a4 22.h2-h3
Go to the leaf node of that PV, analyze that position, and take the root score of this new search as the leaf score (leaf_score).

FEN: r2q2k1/1p3p1p/2p2bpB/3pNb2/p2P4/2P1Q2P/PP3PP1/4R1K1 b - - 0 22

score_lie = leaf_score - root_score

abs_score_lie = abs(score_lie)
You can add more positions and get the average, min, max, stdev and others.
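A minimal sketch of that bookkeeping follows. One assumption needs care: UCI engines report scores from the side to move, and in the leaf position above Black is to move, so the leaf score may need its sign flipped before it is compared with a root score given from White's side (skip the flip if your GUI already shows scores from White's point of view). The helper names and the sample numbers are hypothetical, not measured results.

```python
import statistics


def to_white_pov(score, white_to_move):
    """Convert a side-to-move score into a score from White's point of view."""
    return score if white_to_move else -score


def score_lie(root_score, leaf_score):
    """Both scores must use the same point of view (here: White's)."""
    return leaf_score - root_score


def summarize(lies):
    """Aggregate the absolute lies over several positions."""
    abs_lies = [abs(x) for x in lies]
    return {
        "mean": statistics.mean(abs_lies),
        "min": min(abs_lies),
        "max": max(abs_lies),
        "stdev": statistics.stdev(abs_lies) if len(abs_lies) > 1 else 0.0,
    }


# Hypothetical (root score, leaf score from side to move, White to move at leaf?) samples.
samples = [(0.21, -0.15, False), (0.30, 0.05, True), (-0.10, 0.40, False)]
lies = [score_lie(root, to_white_pov(leaf, wtm)) for root, leaf, wtm in samples]
print(summarize(lies))
```

Running the same positions through a second engine and comparing the two summaries gives the engine-to-engine comparison.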

You can compare with other engines. Perhaps SF's average score_lie is smaller. This will also check whether the engine search producing the PV is normal. For engines of equal strength, I would prefer the engine with the lower lies :)