Stockfish Natural TB loses heavily to Stockfish master

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos »

mcostalba wrote:Kay, do you want to get some fun? Get ready to have fun, I will post shortly the version with included no-silly sacrifices.
You wrote in your pull request:

Currently this version plays in a "natural" way during the game and in a perfect way when root position is in TB and DTZ tables are available.

I have tested on more than 2000 endgame 5-men positions and it works for me.
You also posted the PGN with games from mixed 5-men and 6-men openings, played with 5-men TBs, got a lucky run from the 6-men positions, and claimed that master fails on 5-men root positions with 5-men TBs. In fact, the only "forbidden" failures I saw in that PGN were those of "Natural" on 5-men positions. I posted one (from your PGN) in the previous post.

Now, I don't understand what "no-silly sacrifices" means here. Do you refer solely to the non-Elo-losing artifacts of DTZ probing, or to the current Elo-losing state of "Natural" (failures from root positions in the TB)? I take "silliness" to mean weird moves, usually hilarious sacrifices.

=============================================

I tried this morning to quantify "naturalness" in terms of "silliness" as seen by a strong human player. I took Fruit 2.1, an average engine that is GM level but, at the 0.25s/move per position I used, behaves roughly like a pretty strong human. Over 1000 games at 0.25s/move, this "naturalness", quantified as "silliness", came out as follows:
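A minimal sketch of how such a count could be tallied, assuming each played move has already been scored by the reference engine as an evaluation drop in centipawns. The cut-off values and the function name are hypothetical; the post does not state the exact bins used.

```python
from typing import Dict, Iterable

# Hypothetical cut-offs in centipawns; the post does not give the exact bins.
SILLY_CP = 250          # roughly a minor piece or more given up
UTTERLY_SILLY_CP = 800  # roughly a queen given up


def count_silly_moves(eval_drops_cp: Iterable[int]) -> Dict[str, int]:
    """Count moves that the reference engine sees as piece sacrifices.

    'silly' covers the whole N,B,R,Q range, so it includes the
    'utterly_silly' (queen-sized) drops as a subset.
    """
    counts = {"moves": 0, "silly": 0, "utterly_silly": 0}
    for drop in eval_drops_cp:
        counts["moves"] += 1
        if drop >= SILLY_CP:
            counts["silly"] += 1
        if drop >= UTTERLY_SILLY_CP:
            counts["utterly_silly"] += 1
    return counts


# Made-up drops for illustration only, not data from the runs in this post.
example = count_silly_moves([10, 320, 40, 910, 505, 0])
print(example)
```

Counting queen-sized drops inside the broader "silly" total mirrors how the two figures are reported per engine below.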

Master:
1000 games
Silly moves in the range of N,B,R,Q sacrifices according to Fruit:
953

Utterly silly moves (Queen sacrifices) according to Fruit:
149

The histogram of the Fruit consideration of Master moves is the following:

[Image: histogram of Fruit's evaluation of Master moves]

Early_Mate:
1000 games
Silly moves in the range of N,B,R,Q sacrifices according to Fruit:
883

Utterly silly moves (Queen sacrifices) according to Fruit:
100

The histogram of the Fruit consideration of Early_Mate moves is the following:

[Image: histogram of Fruit's evaluation of Early_Mate moves]

TB-less:
1000 games
Silly moves in the range of N,B,R,Q sacrifices according to Fruit:
658

Utterly silly moves (Queen sacrifices) according to Fruit:
22

The histogram of the Fruit consideration of TB-less moves is the following:

[Image: histogram of Fruit's evaluation of TB-less moves]

============================

So early_mate does improve the "naturalness" as I constructed it, but there is plenty of room for improvement. Why don't you take what Ronald has solidly built and develop it further? You have probably seen by now that what Ronald achieved in Master is not trivial at all, and it's useless to perfect what already works perfectly.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by syzygy »

Toadofsky wrote:
syzygy wrote:
Laskos wrote:Did Marco rip off DTZ from the probing?
Yes, he did. And he will probably commit it.

So serious users will have to look for alternatives.

In his words "The reason why DTZ are ripped out it is because they proved to be totally useless in real games". Anything else is an illusion (his words).
While I do appreciate Marco's publishing his attempts so others can provide feedback, I don't understand his vague subjective criticism of your attempts. I could understand a claim that costs/benefits of DTZ in "real" games are challenging to test... but "Natural TB" seems equally difficult to test.

I suppose the good news is that (assuming an Elo loss on some or all machines) Git makes it possible for developers to revert or cherry-pick commits.
About DTZ, there is this recurring spin (just to be clear: not from you) that removing DTZ support from SF would save users the "trouble" of having to download those files. In reality, the use of DTZ tables has been entirely optional from the start. (Except for a long period after Marco's C++ rewrite where a bug crept in. It seems nobody (vocally) complained during that time, which suggests that only few users have WDL but not DTZ.)

Elo-wise their contribution will be minor; from any other point of view there is value in SF being able to reliably convert positions that it has announced as being won. But someone who wants to save the storage space or bandwidth can simply leave out the DTZ files and accept that some winning positions won't be converted.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos »

Interesting results with Texel:

Texel + Syzygy + Gaviota:
1000 games
Silly moves in the range of N,B,R,Q sacrifices according to Fruit:
821

Utterly silly moves (Queen sacrifices) according to Fruit:
99



Texel TB-less
1000 games
Silly moves in the range of N,B,R,Q sacrifices according to Fruit:
513

Utterly silly moves (Queen sacrifices) according to Fruit:
11



And they have similar lengths of wins (the shortest among the different approaches). So the short path to a win can be achieved both ways: by playing smoothly, or by making silly (maybe spectacular?) sacrifices. Texel + TBs is more mate-finding oriented in positions close to the root, and many of the shorter mates probably involve "silly moves".
IQ
Posts: 162
Joined: Thu Dec 17, 2009 10:46 am

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by IQ »

kranium wrote:
Well, many people 'tinker' with things as a way to learn how they work.
Take it apart, put it back together, break it, fix it, etc.

I don't see the harm of him tinkering with the TB code...his intentions are good, he'll figure it out.
I guess I don't understand why that's skin off your back.

Norm
Well, the problem here is that writing the tablebase generator and the probing code is no trivial task: there are a lot of symmetries, edge cases, awkward chess rules (en passant, castling rights, the 50-move rule, etc.) and memory constraints to take into account. Quite frankly, I consider that a major achievement by Ronald, which I rank higher in difficulty than writing yet another chess engine, if only because there are far fewer examples and much less prior work out there to look at.

Many of the design choices, code subtleties and the reasoning behind them are what's called "sticky information": they are not easily conveyed by just looking at the code or documentation; they mainly reside in Ronald's head. To put it differently, without input from the original author it is much, much more time-consuming to grasp all the finer points of the implementation. That's the learning curve Marco is on. That would be no problem at all if Marco didn't refuse to respect the original voice a little more, which incidentally might speed up his learning.

Short version: If you would perceive somebody to rape your brainchild and disrespect the fruit of your intellectual labour, how would you react? I think Ronald is to be applauded for his calmness and for his now fatalistic approach. Kudos to him! I still have hope Marco will see the light and who knows what a little more collaboration might yield....
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos »

syzygy wrote:
Elo-wise their contribution will be minor; from any other point of view there is value in SF being able to reliably convert positions that it has announced as being won. But someone who wants to save the storage space or bandwidth can simply leave out the DTZ files and accept that some winning positions won't be converted.
I am a bit surprised to hear such statements from you. You could as well say that TBs are generally useless, since Elo-wise their contribution is minor.

First, if Fishtest people are so incompetent as to run only "real games" from suites like 2moves_v1 in order to compare TB implementations, or general TB Elo gain, then they had better abandon all TBs for good, with "its ugly code nobody aside you understands".

Then there is an issue regarding perfect play with TBs at the root. It's again hard to see in what they call "real games" the Elo difference.

I consider this attitude towards the strongest ELO-wise engine a bit creepy.

This morning I compared WDL6 only against WDL6+DTZ6 (all on SSD) on easy 6-men wins of the kind that usually occur in games. WDL6 missed 64/500 of them, which is highly disturbing to me. I often toy with endgames close, or not that close, to the TBs, and a heuristic engine with imperfect, WDL-only TB play would ruin all my experiments.

Then, from a sensitive 14-men endgame suite, I got the following conclusive results in a gauntlet:

Rank Name                          ELO     +/-   Games   Score   Draws
   1 SF_WDL6_DTZ6                   36      11    2000     55%     50%
   2 SF_WDL6                       -13      16    1000     48%     47%
   3 SF_No_TB                      -58      15    1000     42%     53%
Finished match
Huge ELO differences, but these are probably not "real games"; "real games" are those of Fishtest.
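For readers who want to relate the Score column to the ELO column, here is a small sketch using the standard logistic Elo formula. Which tool produced the table is not stated, and the percentages shown are rounded, so the output only lands within a few Elo of the printed values.

```python
import math


def elo_from_score(score: float) -> float:
    """Convert a score fraction (0 < score < 1) into a logistic Elo difference."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Rounded score percentages copied from the gauntlet table above.
for name, score in [("SF_WDL6_DTZ6", 0.55), ("SF_WDL6", 0.48), ("SF_No_TB", 0.42)]:
    print(f"{name}: {elo_from_score(score):+.1f}")
```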
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by syzygy »

Laskos wrote:
syzygy wrote:Elo-wise their contribution will be minor; from any other point of view there is value in SF being able to reliably convert positions that it has announced as being won. But someone who wants to save the storage space or bandwidth can simply leave out the DTZ files and accept that some winning positions won't be converted.
I am a bit surprised to hear from you such statements. You could as well say that TBs generally are useless, Elo-wise their contribution is minor.
But I did not equate minor Elo-wise with useless! I fully agree that it would be very unsatisfactory for an engine to announce a TB win found in the search and then fail to convert it.

My point was this: any argument that removing DTZ support has an advantage for users ("they no longer need to download those files") is misinformed and shows a lack of knowledge about current SF. It would not make anything possible that is not possible already.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Michel »

The SF people define elo as "elo measured from the starting position" (or something close to it). This metric is somewhat arbitrary but of course one cannot optimize for more than one metric.

Some changes do not have enough impact to be measurable by this metric (they will be "elo-neutral"), but they will still be relevant to the situations where they do apply (like some specialized endgame knowledge or DTZ tables).

The impact of such changes can still be measured objectively. It just requires an adapted set of starting positions. It is similar to using a microscope to enlarge things that are too small to be seen with the naked eye.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by mcostalba »

Ok, here is my next attempt:

https://github.com/mcostalba/Stockfish/archive/NTB3.zip

I have extensively tested on 5-men and it works for me, it keeps the draws, converts the wins and is supposed to avoid silly sacrifices (although on 5-men testing this is limited).
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos »

mcostalba wrote:Ok, here is my next attempt:

https://github.com/mcostalba/Stockfish/archive/NTB3.zip

I have extensively tested on 5-men and it works for me, it keeps the draws, converts the wins and is supposed to avoid silly sacrifices (although on 5-men testing this is limited).
Thanks. For now I am having trouble compiling it correctly. Can someone drop a link to a Windows x64 executable here (I have a Haswell)? I will check NTB3 on 5-men and 6-men positions, then some of its properties, such as those analysed in this thread.

By the way, recently, on my very tough 6-men suite of Wins, I got poor results with Stockfish TB-less. It won only 13/100 of them, while Komodo won 25/100, a significant difference. Even Shredder was better.

Games Completed = 2600 of 2600 (Avg game length = 21.522 sec)
Settings = Gauntlet/32MB/5000ms+150ms/M 600000cp for 1000 moves, D 120000 moves/EPD:C:\LittleBlitzer\Wins_6men_E308.epd(308)
Time = 8161 sec elapsed, 0 sec remaining
 1.  Stockfish 050917 64 BMI2 Master Syzygy-6	1886.0/2600	1309-137-1154  	(L: m=137 t=0 i=0 a=0)	(D: r=350 i=169 f=614 s=21 a=0)	(tpm=186.0 d=22.75 nps=1739073)
 2.  Stockfish 050917 64 BMI2 	55.0/200	13-103-84  	(L: m=103 t=0 i=0 a=0)	(D: r=16 i=6 f=60 s=2 a=0)	(tpm=217.0 d=25.62 nps=2254423)
 3.  Komodo 11.2.2 64-bit     	62.5/200	25-100-75  	(L: m=100 t=0 i=0 a=0)	(D: r=32 i=7 f=34 s=2 a=0)	(tpm=207.1 d=20.12 nps=1985559)
 4.  Houdini 5.01 Pro x64-popc	57.0/200	14-100-86  	(L: m=100 t=0 i=0 a=0)	(D: r=34 i=13 f=36 s=3 a=0)	(tpm=216.5 d=17.72 nps=2489475)
 5.  Deep Shredder 13 x64     	58.0/200	16-100-84  	(L: m=100 t=0 i=0 a=0)	(D: r=30 i=10 f=42 s=2 a=0)	(tpm=192.1 d=22.38 nps=2792983)
 6.  Andscacs 0.92            	56.0/200	12-100-88  	(L: m=100 t=0 i=0 a=0)	(D: r=17 i=1 f=70 s=0 a=0)	(tpm=190.9 d=0.00 nps=1999985)
 7.  Fire 5 x64               	54.5/200	11-102-87  	(L: m=102 t=0 i=0 a=0)	(D: r=40 i=16 f=27 s=4 a=0)	(tpm=204.7 d=17.49 nps=2335855)
 8.  Booot 6.2_x64            	53.0/200	7-101-92  	(L: m=101 t=0 i=0 a=0)	(D: r=49 i=10 f=33 s=0 a=0)	(tpm=211.0 d=21.34 nps=971014)
 9.  Gull 3 x64               	53.0/200	6-100-94  	(L: m=100 t=0 i=0 a=0)	(D: r=22 i=23 f=49 s=0 a=0)	(tpm=223.2 d=15.17 nps=2502083)
10.  Fritz 15                 	56.0/200	12-100-88  	(L: m=100 t=0 i=0 a=0)	(D: r=12 i=24 f=52 s=0 a=0)	(tpm=226.2 d=19.29 nps=1143153)
11.  Fizbo 1.9                	50.0/200	0-100-100  	(L: m=100 t=0 i=0 a=0)	(D: r=36 i=28 f=36 s=0 a=0)	(tpm=223.9 d=16.72 nps=1991796)
12.  Hannibal 1.7 x64         	50.5/200	4-103-93  	(L: m=103 t=0 i=0 a=0)	(D: r=15 i=17 f=60 s=1 a=0)	(tpm=200.4 d=21.93 nps=2171275)
13.  Texel 1.07a27            	53.5/200	7-100-93  	(L: m=100 t=0 i=0 a=0)	(D: r=28 i=10 f=53 s=2 a=0)	(tpm=218.9 d=13.26 nps=2071925)
14.  Hakkapeliitta TCEC       	55.0/200	10-100-90  	(L: m=100 t=0 i=0 a=0)	(D: r=19 i=4 f=62 s=5 a=0)	(tpm=227.5 d=24.12 nps=3496192)
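As a quick sanity check on reading the table, the points column can be recomputed from the W-L-D triplet (a win counts 1, a draw 0.5). The helper name below is only illustrative; the triplets and totals are copied from three rows of the output above.

```python
def points_from_wld(wld: str) -> float:
    """Turn a 'wins-losses-draws' triplet into game points."""
    wins, losses, draws = (int(part) for part in wld.split("-"))
    return wins + 0.5 * draws


# (W-L-D, points as printed) for three rows of the table above.
rows = {
    "Stockfish Master Syzygy-6": ("1309-137-1154", 1886.0),
    "Stockfish (TB-less)": ("13-103-84", 55.0),
    "Komodo 11.2.2": ("25-100-75", 62.5),
}

for name, (wld, printed) in rows.items():
    print(f"{name}: computed {points_from_wld(wld)}, printed {printed}")
```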
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos »

mcostalba wrote:Ok, here is my next attempt:

https://github.com/mcostalba/Stockfish/archive/NTB3.zip

I have extensively tested on 5-men and it works for me, it keeps the draws, converts the wins and is supposed to avoid silly sacrifices (although on 5-men testing this is limited).
Ipman sent me a good and reliable compile. On hard 6-men Wins with 6-men TBs, NTB3 plays perfectly:

Score of SF_Master vs SF_NTB3: 308 - 308 - 0 [0.500] 616
ELO difference: 0.00 +/- 27.45
Finished match
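The Elo difference and its error margin can be approximated from the win/loss/draw counts alone. The sketch below uses a common normal-approximation recipe (95% interval on the score, mapped through the logistic Elo curve); it is an assumption that the match tool does something similar, and for 308 - 308 - 0 it gives a margin of roughly 27.5, close to but not exactly the printed 27.45.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, losses: int, draws: int, z: float = 1.959964):
    """Return (elo_difference, error_margin) from the first player's view.

    Assumes the interval score +/- z*sd stays inside (0, 1), which holds
    for reasonably balanced matches such as the one above.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    sd = math.sqrt(variance / games)
    low, high = score - z * sd, score + z * sd
    margin = (elo_from_score(high) - elo_from_score(low)) / 2.0
    return elo_from_score(score), margin


diff, margin = elo_with_margin(308, 308, 0)
print(f"ELO difference: {diff:.2f} +/- {margin:.2f}")
```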

With draws, it plays perfectly too. Good start!

Later I will test the length of the path to win compared to early_mate2 (the best TB implementation for SF in this respect as of now), the number of mates resolved, and the "silliness" of the play as subjectively quantified by me.