Stockfish Natural TB loses heavily to Stockfish master

Laskos
Posts: 10806
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos » Sun Sep 03, 2017 8:24 am

Both builds are from the 2nd of September (Natural TB vs master). In this thread
http://talkchess.com/forum/viewtopic.php?t=65065
I showed that building a 3-mover EPD opening book from this unbalanced, borderline, tricky endgame position

[D]1nn1k1n1/4p3/8/8/8/PPPPPPPP/8/4K3 w - - 0 1

gives an endgame test suite that is very sensitive to TBs. I used 6-men Syzygy from SSD. On this suite, SF with Syzygy scored +76 ELO points against SF with NO TB. I decided to test SF Syzygy master vs SF Natural TB because Marco Costalba claimed a whopping 15 ELO point gain with NTB from regular openings (2moves_v1.epd), which is quite hard to believe: the total contribution of TBs from regular openings is about that magnitude.
The tricky endgame openings are here (1000+ of them):
http://s000.tinyupload.com/?file_id=009 ... 2378736001

At a 15''+0.15'' time control, the result in favor of master over Natural TB was so skewed that I interrupted the match as no contest:

Code: Select all

Games Completed = 400 of 1000 (Avg game length = 38.117 sec)
Settings = RR/32MB/15000ms+150ms/M 1000000cp for 1000 moves, D 200000 moves/EPD:C:\LittleBlitzer\Disbalance2.epd(1079)
Time = 1935 sec elapsed, 2903 sec remaining
 1.  Stockfish 020917 64 BMI2       224.0/400	93-45-262  	(L: m=45 t=0 i=0 a=0)	(D: r=101 i=102 f=52 s=7 a=0)	(tpm=341.2 d=22.36 nps=1678295)
 2.  Stockfish 020917 64 BMI2 NTB	176.0/400	45-93-262  	(L: m=92 t=1 i=0 a=0)	(D: r=101 i=102 f=52 s=7 a=0)	(tpm=373.1 d=28.03 nps=1717616)
LOS=99.998% for master, ELO difference in favor of master of 42 ELO points.

I used LittleBlitzer to get the relevant stats. Next I will test SF Natural vs SF NO TB, to see whether Natural gives any benefit at all at this time control.
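For readers who want to check figures like the LOS and ELO difference quoted above, here is a minimal Python sketch. It assumes the usual logistic ELO model and the common normal approximation for LOS, in which draws drop out. The thread does not say which tool produced the quoted figures, so the function names `elo_diff` and `los` are illustrative only.

```python
import math

def elo_diff(wins, losses, draws):
    """Elo difference implied by a match score under the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))

def los(wins, losses):
    """Likelihood of superiority, normal approximation (draws carry no information)."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

# Master's W-L-D against Natural TB after 400 games, taken from the table above.
print(round(elo_diff(93, 45, 262)))   # -> 42
print(f"LOS = {100 * los(93, 45):.3f}%")
```

Under these assumptions the 93-45-262 result reproduces the 42 ELO figure, and the LOS comes out very close to the quoted value.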


Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos » Sun Sep 03, 2017 10:28 am

Laskos wrote:
I used LittleBlitzer to have relevant stats. Next I will test SF Natural vs SF NO TB, to see if Natural gives any benefit at all at this time control.
The result of SF Natural TB versus SF NO TB is not that conclusive, even after 1000 games from this suite:

Code: Select all

Games Completed = 1000 of 1000 (Avg game length = 45.856 sec)
Settings = RR/32MB/15000ms+150ms/M 1000000cp for 1000 moves, D 200000 moves/EPD:C:\LittleBlitzer\Disbalance2.epd(1079)
Time = 5851 sec elapsed, 0 sec remaining
 1.  Stockfish 020917 64 BMI2    	487.5/1000	118-143-739  	(L: m=142 t=1 i=0 a=0)	(D: r=223 i=139 f=359 s=18 a=0)	(tpm=330.9 d=24.98 nps=2111334)
 2.  Stockfish 020917 64 BMI2 NTB	512.5/1000	143-118-739  	(L: m=118 t=0 i=0 a=0)	(D: r=223 i=139 f=359 s=18 a=0)	(tpm=334.1 d=31.99 nps=1900478)
LOS=93.9% for SF Natural TB, ELO difference +9 ELO points. More games are needed here, but on this tricky endgame suite, which depends heavily on DTZ and the 50-move rule, Natural TB comes closer to NO TB than to master Syzygy. Did Marco rip off DTZ from the probing? The results with Natural TB are reminiscent of those WDL EGBBs.


Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos » Sun Sep 03, 2017 12:06 pm

My main concern is that I may have used a broken build of NTB2 (NTB2 is the version I tested). Could someone check that this compile of NTB2 (2nd of September) is a correct one?
https://mega.nz/#!iMhRDLSL!Usl2xEoQxas_ ... mddf9ywWN0

To repeat, I use 6-men Syzygy from SSD.

syzygy
Posts: 4803
Joined: Tue Feb 28, 2012 10:56 pm

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by syzygy » Sun Sep 03, 2017 12:48 pm

Laskos wrote:Did Marco rip off DTZ from the probing?
Yes, he did. And he will probably commit it.

So serious users will have to look for alternatives.

In his words "The reason why DTZ are ripped out it is because they proved to be totally useless in real games". Anything else is an illusion (his words).


Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos » Sun Sep 03, 2017 1:19 pm

syzygy wrote:
Laskos wrote:Did Marco rip off DTZ from the probing?
Yes, he did. And he will probably commit it.

So serious users will have to look for alternatives.

In his words "The reason why DTZ are ripped out it is because they proved to be totally useless in real games". Anything else is an illusion (his words).
??
Seriously, he said that?
He never saw WDL bases, and all the algorithmic attempts to make them work? The behavior of this SF NTB2 is very similar to that of the old WDL EGBBs, which improve strength just a little bit even in very sensitive endgames.

I continued SF NTB2 versus SF NO TB to 2000 games from this very sensitive suite, and finally got a decisive result:

+300 -222 =1478 for SF NTB2 against SF NO TB
+14 ELO points
LOS=99.97%

Compare that to +76 ELO of master Syzygy on this suite.
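
To put an error bar on the +300 -222 =1478 result, here is a rough Python sketch of a 95% interval. It uses a plain trinomial (win/draw/loss) variance and treats every game as independent. This is a simpler assumption than the pentanomial margins used elsewhere in this thread, so take it as an approximation only; the helper names are hypothetical.

```python
import math

def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_interval(wins, losses, draws, z=1.96):
    """Return (low, mid, high) Elo using a trinomial per-game variance."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    half = z * math.sqrt(var / n)  # half-width of the interval on the score
    return (elo_from_score(score - half),
            elo_from_score(score),
            elo_from_score(score + half))

low, mid, high = elo_interval(300, 222, 1478)
print(f"{mid:+.1f} Elo, 95% interval [{low:+.1f}, {high:+.1f}]")
```

With these assumptions the whole interval stays above zero, which agrees with calling the 2000-game result decisive, while its upper end remains far below +76.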

The funniest thing is that these Natural TB2s miss the win at the root. From my Hard 6-men wins suite, in 100 games:

Code: Select all

Games Completed = 100 of 100 (Avg game length = 25.580 sec)
Settings = RR/32MB/15000ms+150ms/M 1000000cp for 1000 moves, D 200000 moves/EPD:C:\LittleBlitzer\Hard6men.epd(359)
Time = 358 sec elapsed, 0 sec remaining
 1.  Stockfish 020917 64 BMI2 Syzygy Master 	69.0/100	50-12-38  	(L: m=12 t=0 i=0 a=0)	(D: r=14 i=4 f=18 s=2 a=0)	(tpm=228.2 d=19.44 nps=1068208)
 2.  Stockfish 020917 64 BMI2 NTB2             31.0/100	12-50-38  	(L: m=50 t=0 i=0 a=0)	(D: r=14 i=4 f=18 s=2 a=0)	(tpm=426.9 d=30.33 nps=1690359)
With 6-men Syzygy from SSD enabled, SF NTB2 missed 38 out of 50 hard 6-men wins at the root. Master Syzygy never misses a TB win at the root.


Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos » Sun Sep 03, 2017 6:16 pm

I also tested the Easy_Mate2 patch against master, with no significant difference:

Code: Select all

Games Completed = 1000 of 1000 (Avg game length = 35.544 sec)
Settings = RR/32MB/15000ms+150ms/M 1000000cp for 1000 moves, D 200000 moves/EPD:C:\LittleBlitzer\Disbalance2.epd(1079)
Time = 5161 sec elapsed, 0 sec remaining
 1.  Stockfish 030917 64 BMI2 Master    	506.5/1000	236-223-541  	(L: m=223 t=0 i=0 a=0)	(D: r=202 i=190 f=140 s=9 a=0)	(tpm=350.4 d=23.26 nps=1933287)
 2.  Stockfish 030917 64 BMI2 Easy_Mate2	493.5/1000	223-236-541  	(L: m=236 t=0 i=0 a=0)	(D: r=202 i=190 f=140 s=9 a=0)	(tpm=351.7 d=22.99 nps=1942667)
-4 +/- 8 ELO points (pentanomial error margins). If a difference isn't detected in 1000 games on this very sensitive suite, it won't be detected from 2moves_v1.epd even in 20,000 games. But I might give it another run of 1000 games from this suite.

Uri Blass
Posts: 8795
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Uri Blass » Sun Sep 03, 2017 6:21 pm

syzygy wrote:
Laskos wrote:Did Marco rip off DTZ from the probing?
Yes, he did. And he will probably commit it.

So serious users will have to look for alternatives.

In his words "The reason why DTZ are ripped out it is because they proved to be totally useless in real games". Anything else is an illusion (his words).
I wonder whether "real games" means games where the result is adjudicated based on evaluation, so that the program never needs to prove the win in a tablebase position.


Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos » Sun Sep 03, 2017 8:04 pm

Laskos wrote:I also tested Easy_Mate2 patch against master, no significant difference:

-4 +/- 8 ELO points (pentanomial error margins). If difference isn't detected in 1000 games on this very sensitive suite, it won't be detected from 2moves_v1.epd even in 20,000 games. But I might give it another run of 1000 games from this suite.
The second run of Easy_Mate2 versus master (6-men Syzygy from SSD, no adjudications) went as follows:

Code: Select all

Games Completed = 1000 of 1000 (Avg game length = 34.844 sec)
Settings = RR/32MB/15000ms+150ms/M 1000000cp for 1000 moves, D 200000 moves/EPD:C:\LittleBlitzer\Disbalance2.epd(1079)
Time = 5061 sec elapsed, 0 sec remaining
 1.  Stockfish 030917 64 BMI2 Master     	509.0/1000	242-224-534  	(L: m=224 t=0 i=0 a=0)	(D: r=199 i=203 f=124 s=8 a=0)	(tpm=353.4 d=23.63 nps=1965888)
 2.  Stockfish 030917 64 BMI2 Easy_Mate2 	491.0/1000	224-242-534  	(L: m=242 t=0 i=0 a=0)	(D: r=199 i=203 f=124 s=8 a=0)	(tpm=356.9 d=23.42 nps=1986997)
2 runs combined: -5 +/- 6 ELO points (pentanomial error margins), still within error margins. As the ELO sensitivity of 2moves_v1.epd to Syzygy is maybe 5 times weaker than that of the endgame suite I used, any difference seen from 2moves_v1.epd would be in the range of 1 ELO point.
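
The arithmetic behind the combined figure and the "range of 1 ELO point" estimate can be sketched in Python as follows. The factor of 5 is the rough guess stated above, not a measured value, and the logistic ELO formula is an assumption about how the figures were derived.

```python
import math

def elo_diff(wins, losses, draws):
    """Elo difference implied by a match score under the logistic model."""
    score = (wins + 0.5 * draws) / (wins + losses + draws)
    return 400.0 * math.log10(score / (1.0 - score))

# Easy_Mate2's W-L-D in the two 1000-game runs, taken from the tables above.
runs = [(223, 236, 541), (224, 242, 534)]
wins, losses, draws = (sum(column) for column in zip(*runs))
suite_elo = elo_diff(wins, losses, draws)

# Assumption: 2moves_v1.epd is about 5 times less sensitive than this suite.
SENSITIVITY_RATIO = 5.0

print(f"endgame suite: {suite_elo:+.1f} Elo over {wins + losses + draws} games")
print(f"scaled to 2moves_v1.epd: {suite_elo / SENSITIVITY_RATIO:+.1f} Elo")
```

Under these assumptions the scaled value lands at roughly one ELO point, in line with the estimate in the post.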


Re: Stockfish Natural TB loses heavily to Stockfish master

Post by syzygy » Sun Sep 03, 2017 11:04 pm

Laskos wrote:2 runs combined: -5 +/- 6 ELO points (pentanomial error margins), still within error margins.
Thanks.

I suspect that the error margin applies here. At least there is no difference in the ability to convert TB wins found into mate.

Ferdy
Posts: 4295
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Ferdy » Mon Sep 04, 2017 3:13 am

Laskos wrote:
With 6-men Syzygy from SSD enabled, SF NTB2 missed 38 out of 50 hard 6-men wins at the root. Master Syzygy never misses a TB win at the root.
How about running the 100-game match again with an increased TC (say 120000ms + 150ms)? The goal is to see whether Natural TB can improve when given more time.
I don't have the 6-men Syzygy bases, so I cannot test this myself at the moment.
