Modern engines with a small number of nodes fail the Turing test

Uri Blass · Post by **Uri Blass** » Wed May 01, 2024 9:13 pm

Stockfish with a small number of nodes blundered here with 18.Nxb5.
It seems Stockfish pruned the reply 18...Qxb5 because it considered it a bad capture (a mistake that no strong human player would make; unfortunately many strong engines copied the same idea, whereas old engines usually reject Nxb5 at depth 1 or 2).

[d]2r2rk1/pq3ppp/3bpn2/1p6/2PNP1PP/P3BP2/1PQ5/R3K2R w KQ - 0 18
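To make the "bad capture" idea concrete, here is a minimal sketch of a static exchange evaluation (SEE) on a single square. It is not Stockfish's code: the function name, the list-based interface and the piece values (100/300/900) are assumptions chosen for illustration. It shows why 18...Qxb5 looks terrible if only the exchange on b5 is counted, although the line Nxb5 Qxb5 cxb5 Rxc2 opens the c-file and wins the white queen.

```python
# Conventional centipawn values (an assumption, not taken from any engine).
PAWN, KNIGHT, QUEEN = 100, 300, 900

def static_exchange_eval(victim, attackers, defenders):
    """Material balance of a capture sequence on one square.

    victim:    value of the piece standing on the target square
    attackers: values of the capturing side's pieces, in capture order
    defenders: values of the other side's pieces, in recapture order
    The first capture is forced; after that each side may stop capturing.
    """
    gains = []                       # what each successive capture wins
    queues = [list(attackers), list(defenders)]
    on_square = victim
    side = 0
    while queues[side]:
        gains.append(on_square)
        on_square = queues[side].pop(0)
        side ^= 1
    if not gains:
        return 0
    best_reply = 0                   # best result for the side that recaptures
    for gain in reversed(gains[1:]):
        best_reply = max(0, gain - best_reply)
    return gains[0] - best_reply

# 18...Qxb5: the queen takes the knight, and the c4 pawn can recapture.
see_qxb5 = static_exchange_eval(KNIGHT, [QUEEN], [PAWN])

# What happens on the board after cxb5 Rxc2: Black gets the queen back.
real_balance = KNIGHT - QUEEN + QUEEN

print(see_qxb5, real_balance)
```

The exchange on b5 alone scores the capture at minus six pawns, so a search that discards losing captures early never sees that the recapture costs White the queen on c2.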

Stockfish_24042819_x64_avx2:
NNUE evaluation using nn-ae6a388e4a1a.nnue (132MiB, (22528, 3072, 15, 32, 1))
NNUE evaluation using nn-baff1ede1f90.nnue (6MiB, (22528, 128, 15, 32, 1))
1/4 00:00 53 27k +0.87 Nd4xb5
2/2 00:00 102 51k +0.87 Nd4xb5
3/2 00:00 151 76k +0.87 Nd4xb5
4/2 00:00 200 100k +0.87 Nd4xb5
5/4 00:00 345 173k +1.14 Nd4xb5 Bd6-g3+ Ke1-e2
6/4 00:00 403 202k +1.14 Nd4xb5 Bd6-g3+ Ke1-e2
7/5 00:00 522 261k +1.31 Nd4xb5 Bd6-g3+ Ke1-e2
8/7 00:00 1k 345k +1.30 Nd4xb5 Bd6-g3+ Ke1-e2 a7-a6 Nb5-d4
9/13 00:00 5k 472k +0.54 Nd4xb5 Bd6-g3+ Ke1-d2 a7-a6 Nb5-d4 h7-h5
10/13 00:00 8k 423k +0.47 Nd4xb5 Bd6-g3+ Ke1-d2 h7-h5 Ra1-g1 a7-a6 Nb5-d4 Bg3-e5 g4-g5 Nf6-d7 Rg1-d1
11/17 00:00 19k 494k +0.27 Nd4xb5 Bd6-g3+ Ke1-d1 h7-h5 g4-g5
12/12 00:00 36k 526k -0.64 O-O Rc8xc4 Qc2-e2 h7-h5 Ra1-c1 h5xg4
13/17 00:00 61k 545k -0.74 Ra1-d1 b5xc4 h4-h5 h7-h6 g4-g5 h6xg5 Be3xg5 Rf8-d8
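The point at which an engine gives up Nxb5 can be read off such a log mechanically. The sketch below assumes the GUI line layout seen in this thread (depth or depth/seldepth, time, nodes, nps, score, then the PV); the helper names are hypothetical, and node counts shown with a "k" suffix are already rounded by the GUI.

```python
def parse_count(token):
    """Turn '345', '5k' or '1,260k' into an integer."""
    token = token.replace(",", "")
    if token.endswith("k"):
        return int(token[:-1]) * 1000
    return int(token)

def parse_line(line):
    """Split one analysis line into its fields."""
    fields = line.split()
    # The depth may look like '9/13', '2' or '2+'.
    depth = int(fields[0].split("/")[0].rstrip("+"))
    pv = fields[5:]
    return {
        "depth": depth,
        "nodes": parse_count(fields[2]),
        "score": float(fields[4]),
        "best": pv[0],
        "pv": pv,
    }

def first_switch(lines, move):
    """Return the first parsed line whose best move is not `move`."""
    for line in lines:
        info = parse_line(line)
        if info["best"] != move:
            return info
    return None

# Three lines copied from the Stockfish output above.
log = [
    "10/13 00:00 8k 423k +0.47 Nd4xb5 Bd6-g3+ Ke1-d2 h7-h5",
    "11/17 00:00 19k 494k +0.27 Nd4xb5 Bd6-g3+ Ke1-d1 h7-h5 g4-g5",
    "12/12 00:00 36k 526k -0.64 O-O Rc8xc4 Qc2-e2 h7-h5 Ra1-c1 h5xg4",
]
switch = first_switch(log, "Nd4xb5")
print(switch["depth"], switch["nodes"], switch["best"])
```

On these lines the switch happens at depth 12, after roughly 36k nodes.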

I find that old engines are closer to passing the Turing test.

Here, for example, is Fruit 2.1:

FEN: 2r2rk1/pq3ppp/3bpn2/1p6/2PNP1PP/P3BP2/1PQ5/R3K2R w KQ - 0 18

Fruit_21:
1/8 00:00 15 64 -5.15 c4xb5 Rc8xc2 Nd4xc2 Qb7xb5
1/8 00:00 30 128 +1.22 Nd4xb5 Bd6-g3+ Ke1-e2
2/8 00:00 953 4k -2.01 Nd4xb5 Qb7xb5 c4xb5 Rc8xc2 Be3xa7 Rc2xb2
2/8 00:00 1k 5k -0.01 Qc2-b3 Rc8xc4 Qb3xb5

Here is AnMon:

FEN: 2r2rk1/pq3ppp/3bpn2/1p6/2PNP1PP/P3BP2/1PQ5/R3K2R w KQ - 0 18

AnMon 5.75:
1+ 00:00 7 32 +1.39 Nd4xb5
1 00:00 13 59 +1.56 Nd4xb5
2 00:00 200 917 -1.85 Nd4xb5 Qb7xb5
2+ 00:00 292 1k -0.76 h4-h5 Nf6xe4
2 00:00 404 2k -0.57 h4-h5 b5xc4
2+ 00:00 493 2k -0.47 c4-c5 Bd6xc5
2 00:00 495 2k -0.47 c4-c5 Bd6xc5
2+ 00:00 724 3k -0.40 b2-b3 Nf6xg4
2 00:00 823 4k +0.19 b2-b3 b5xc4 b3xc4

Here is Berserk 13, a new engine that needs more nodes:

FEN: 2r2rk1/pq3ppp/3bpn2/1p6/2PNP1PP/P3BP2/1PQ5/R3K2R w KQ - 0 18

Berserk13:
time -1 start 99982703 alloc 0 max 2147483647 depth 200 timeset 0 searchmoves 0
1/1 00:00 62 62k +0.12 Nd4xb5
2/2 00:00 107 107k +0.12 Nd4xb5 Bd6-g3+
3/3 00:00 157 157k +0.12 Nd4xb5 Bd6-g3+ Be3-f2
4/4 00:00 303 303k -0.84 O-O Rc8xc4
5/5 00:00 466 466k +1.03 Nd4xb5
6/6 00:00 598 598k +1.24 Nd4xb5
7/8 00:00 1k 1,260k +1.10 Nd4xb5 Bd6-g3+ Ke1-e2 a7-a6 Nb5-d4 Nf6-d7 b2-b3
8/10 00:00 3k 3,154k +0.98 Nd4xb5 Bd6-g3+ Ke1-f1 a7-a6 Nb5-d4 h7-h5 g4-g5 Nf6-d7 b2-b3
9/12 00:00 12k 801k -0.64 O-O Rc8xc4 Qc2-e2 h7-h5 g4-g5 Nf6-d7 Ra1-d1

Here is Stockfish 10, which also prunes Qxb5:

FEN: 2r2rk1/pq3ppp/3bpn2/1p6/2PNP1PP/P3BP2/1PQ5/R3K2R w KQ - 0 18

Stockfish_10_x64_popcnt:
1/1 00:00 71 36k +2.39 Nd4xb5 Bd6-g3+ Ke1-e2
2/3 00:00 132 66k +2.39 Nd4xb5 Bd6-g3+ Ke1-e2
3/4 00:00 207 104k +2.08 Nd4xb5 Bd6-g3+ Ke1-e2 a7-a6
4/6 00:00 330 165k +2.15 Nd4xb5 Bd6-g3+ Be3-f2 Bg3xf2+ Ke1xf2 a7-a6
5/7 00:00 1k 637k +2.51 Nd4xb5 Bd6-g3+ Ke1-e2 a7-a6
6/7 00:00 2k 536k +2.58 Nd4xb5 Bd6-c5 Be3xc5 Rc8xc5 Ke1-e2 a7-a6
7/7 00:00 2k 637k +2.79 Nd4xb5 Bd6-c5 Be3xc5 Rc8xc5 Ke1-e2 Rf8-c8 b2-b3
8/8 00:00 3k 842k +3.46 Nd4xb5 Bd6-g3+ Ke1-e2 Bg3-b8 Be3-d4
9/11 00:00 8k 1,213k +3.37 Nd4xb5 Bd6-g3+ Ke1-f1 a7-a5 Ra1-d1 Rf8-d8 b2-b3 Bg3-e5 Rd1xd8+ Rc8xd8
10/14 00:00 24k 1,494k +1.04 b2-b3 b5xc4 b3xc4 Qb7-c7 Nd4-b5 Qc7xc4 Qc2xc4 Bd6-g3+ Ke1-e2 Rc8xc4 Nb5xa7 Rc4-c2+ Ke2-d3
11/16 00:00 127k 1,602k +0.23 Ke1-e2 Rc8xc4 Qc2-d3 Nf6-d7 Ra1-c1 Nd7-e5 Qd3-b3 Rf8-b8 Rc1xc4 Ne5xc4 Qb3-c3 Qb7-c7

Here is old Yace

FEN: 2r2rk1/pq3ppp/3bpn2/1p6/2PNP1PP/P3BP2/1PQ5/R3K2R w KQ - 0 18

Yace:
1 00:00 25 324 -4.27 c4xb5 Rc8xc2 Nd4xc2 Qb7xb5
1 00:00 55 714 -1.77 Nd4xb5 Qb7xb5 c4xb5 Rc8xc2 Be3xa7 Rc2xb2
1 00:00 81 1k -0.72 O-O Rc8xc4
1 00:00 138 138k -0.42 b2-b4 Rc8xc4
1 00:00 160 160k -0.38 a3-a4 Rc8xc4
1 00:00 415 415k +0.39 Qc2-d2 Rc8xc4 Nd4xe6
1 00:00 476 476k +0.42 Qc2-b3 Rc8xc4 Qb3xb5
1 00:00 662 662k +0.63 b2-b3 b5xc4 b3xc4
1 00:00 662 662k +0.63 b2-b3 b5xc4 b3xc4

Here is old SOS:

FEN: 2r2rk1/pq3ppp/3bpn2/1p6/2PNP1PP/P3BP2/1PQ5/R3K2R w KQ - 0 18

SOS 5.1 for Arena:
1/14 00:00 3k 42k +0.10 Qc2-d1 b5xc4 Nd4xe6 f7xe6 Qd1xd6 Qb7xb2 Qd6xe6+
2/14 00:00 4k 35k +0.10 Qc2-b3 Rc8xc4 Qb3xb5
3/14 00:00 4k 38k +0.10 Qc2-b3 Rc8xc4 Qb3xb5

abulmo2 · Post by **abulmo2** » Thu May 02, 2024 12:01 am

My simple engine Dumb 2.1, which is in about the same league as Fruit 2.1 by Elo, also needs several depth iterations (7) to find that Nxb5 is a wrong move. The culprit is Late Move Pruning. Old engines were more conservative and did not so aggressively prune moves that are (wrongly) assumed to be of poor quality.
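For readers who have not met the technique, a rough sketch of Late Move Pruning follows. It is not the code of Dumb or of any other engine: the limit formula, the function names and the `is_prunable` predicate are assumptions, and which move classes are exempt from pruning differs from engine to engine.

```python
def lmp_limit(depth):
    """Hypothetical move-count limit: how many prunable moves are searched."""
    return 3 + depth * depth

def late_move_prune(ordered_moves, depth, is_prunable):
    """Return the moves that would still be searched at this depth.

    ordered_moves: moves sorted by the engine's move ordering, best first
    is_prunable:   predicate telling whether a move may be skipped at all
    """
    kept = []
    prunable_seen = 0
    for move in ordered_moves:
        if is_prunable(move):
            prunable_seen += 1
            if prunable_seen > lmp_limit(depth):
                continue          # a "late" move: skipped without any search
        kept.append(move)
    return kept

# Made-up move labels: ten ordinary moves followed by one exempt move.
moves = ["m%d" % i for i in range(10)] + ["exempt"]
searched = late_move_prune(moves, depth=1, is_prunable=lambda m: m != "exempt")
print(searched)
```

At depth 1 only the first four ordinary moves survive here, so a refutation that the move ordering ranks low is not searched until a deeper iteration raises the limit.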

Antihelion · Post by **Antihelion** » Thu May 02, 2024 12:41 am

Modern engines are not designed for a small number of nodes. Also, what's this "Turing test" you are referring to?

Uri Blass · Post by **Uri Blass** » Thu May 02, 2024 1:08 am

Antihelion wrote: ↑Thu May 02, 2024 12:41 am Modern engines are not designed for a small number of nodes. Also, what's this "Turing test" you are referring to?

A machine passes the Turing test in some task if you cannot detect that it is a machine and not a human based on the way it does the task.
For a chess engine, passing the Turing test means that an observer cannot detect that it is an engine based on its choice of moves (of course at a high level it plays too strongly to be a human, but in theory an engine can pass the Turing test at a lower level).

I can detect that somebody with a high rating who makes this type of stupid tactical mistake, Nxb5, is an engine, because strong humans easily see that it is a bad move.

RubiChess · Post by **RubiChess** » Thu May 02, 2024 7:47 am

Uri Blass wrote: ↑Thu May 02, 2024 1:08 am
Antihelion wrote: ↑Thu May 02, 2024 12:41 am Modern engines are not designed for a small number of nodes. Also, what's this "Turing test" you are referring to?
A machine passes the Turing test in some task if you cannot detect that it is a machine and not a human based on the way it does the task.

Passing the Turing test by finding a chess move that > 50% of humans will fail to find (not knowing how many humans even know the rules of chess). Well, strange definition.

Uri Blass · Post by **Uri Blass** » Thu May 02, 2024 8:33 am

RubiChess wrote: ↑Thu May 02, 2024 7:47 am
Uri Blass wrote: ↑Thu May 02, 2024 1:08 am
Antihelion wrote: ↑Thu May 02, 2024 12:41 am Modern engines are not designed for a small number of nodes. Also, what's this "Turing test" you are referring to?
A machine passes the Turing test in some task if you cannot detect that it is a machine and not a human based on the way it does the task.
Passing the Turing test by finding a chess move that > 50% of humans will fail to find (not knowing how many humans even know the rules of chess). Well, strange definition.

From my point of view, an engine passes the Turing test in a specific position (with a small fixed number of nodes that makes it play at the level of a strong human player) if you cannot tell from the move it prefers that it is an engine and not some strong human player (that is, it does not suggest a move that it is illogical to expect from strong humans).

Of course I know that engines fail the Turing test in some fortress positions, because with a small number of nodes they blunder with gxh2:

[d]8/4k3/8/8/1p1p1p1p/pPpPpPpP/P1P1P1PR/3KR3 b - - 0 1

My point is that I think modern engines fail the Turing test more often in positions from practical games, because they miss simple tactics that humans would not typically miss.

Antihelion · Post by **Antihelion** » Fri May 03, 2024 7:59 am

This is not a test. You are manipulating the conditions, the baseline, and the test cases so that whatever the engine suggests to you, it always fails your supposed "test". This has no practical or theoretical value, and all you are doing is to reinforce your incorrect, outdated, and ignorant assumptions on how chess engines actually work.

Uri Blass · Post by **Uri Blass** » Fri May 03, 2024 8:17 am

Antihelion wrote: ↑Fri May 03, 2024 7:59 am This is not a test. You are manipulating the conditions, the baseline, and the test cases so that whatever the engine suggests to you, it always fails your supposed "test". This has no practical or theoretical value, and all you are doing is to reinforce your incorrect, outdated, and ignorant assumptions on how chess engines actually work.

I do not see why an engine always fails my supposed test.
An engine at a specific number of nodes, a specific playing strength, and a specific position passes the test if it suggests a move that it is logical to expect humans of the relevant strength to play.
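Stated as code, the criterion might look like the sketch below. The function names are hypothetical, the set of moves a strong human would not play is supplied by the person judging (here only Nxb5, as claimed in this thread), and the moves are the choices each engine shows in the logs above at roughly 1k nodes or fewer.

```python
def passes_test(engine_move, implausible_moves):
    """The engine passes in this position if its move is one a strong
    human could plausibly play, i.e. it is not in the implausible set."""
    return engine_move not in implausible_moves

def failing_engines(choice_by_engine, implausible_moves):
    """Names of the engines whose chosen move gives them away."""
    return sorted(
        name
        for name, move in choice_by_engine.items()
        if not passes_test(move, implausible_moves)
    )

# Judge's assumption: a strong human would not play 18.Nxb5.
IMPLAUSIBLE = {"Nd4xb5"}

# Best move at about 1k nodes or fewer, read from the logs in this thread.
choices = {
    "Stockfish dev": "Nd4xb5",
    "Stockfish 10": "Nd4xb5",
    "Berserk 13": "Nd4xb5",
    "Fruit 2.1": "Qc2-b3",
    "AnMon 5.75": "b2-b3",
    "Yace": "b2-b3",
}
print(failing_engines(choices, IMPLAUSIBLE))
```

The result depends entirely on the judge's list of implausible moves, which is the part of the test that cannot be automated.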
