I think an individual comparison is misleading. You compare two sets of games with just 220 games each. The possible error is huge, and even an identical run might differ by that much.
Overall SF5 with 4pc SYZ was better by 0.89%, but even that is within the error bar.
Anyhow, I will play the missing games vs. DF14 and will check whether I replace standard SF5 with this one ...
Bye
Ingo
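To put a number on that error bar, here is a quick sketch (Python; the ~60% draw rate is my assumption, as is the usual win=1 / draw=0.5 / loss=0 scoring):

    import math

    n = 220            # games per set, as in the test above
    draw_rate = 0.60   # assumed draw rate; more draws = less per-game variance

    # Per-game variance of the score, under the null hypothesis that both
    # versions score 50% and the non-draws split evenly:
    var = (1 - draw_rate) * 0.25
    se_one_set = math.sqrt(var / n)       # standard error of one 220-game score
    se_diff = math.sqrt(2) * se_one_set   # SE of the difference of two such sets

    print(f"1 sigma on the score difference: {100 * se_diff:.2f}%")  # ~3.0%

    # And the 0.89% edge itself, mapped to Elo by the logistic formula:
    elo = 400 * math.log10(0.5089 / (1 - 0.5089))
    print(f"0.89% edge ~ {elo:.1f} Elo")  # ~6.2 Elo

So one sigma on the difference is roughly three times the observed 0.89% gap, which is why even an identical run could easily show a swing of that size.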
Thanks for the test, Ingo, much appreciated.
Also, all of Houdini's performances over the years have been within error bars...
One thing you should do explicitly on your home page is write in very big letters that SF is the new number one, instead of "Houdini is leading by 5 Elo, but it is very close."
You did a great job; the one thing you need now is a great heading.
IWB wrote: Overall SF5 with 4pc SYZ was better by 0.89%, but even that is within the error bar.
...
Bye
Ingo
Here's the main question: is the 0.89% difference due to the settings or due to statistical error? For me, it's due 100% to statistical error. The other mean is the 4 pcs EGTB add not even a fraction of a rating point to SF ...
Just curious, what do you mean by your last sentence? "The other mean is the ..."
IWB wrote: Overall SF5 with 4pc SYZ was better by 0.89%, but even that is within the error bar.
...
Bye
Ingo
Here's the main question: is the 0.89% difference due to the settings or due to statistical error? For me, it's due 100% to statistical error. The other mean is the 4 pcs EGTB add not even a fraction of a rating point to SF ...
Just curious, what do you mean by your last sentence? "The other mean is the ..."
"The other meaning of the last sentence" = If rating difference is explained by the statistical error, the 4 pcs EGTB doesn't gives an improvement in strength.
IWB wrote: Overall SF5 with 4pc SYZ was better by 0.89%, but even that is within the error bar.
...
Bye
Ingo
Here's the main question: is the 0.89% difference due to the settings or due to statistical error? For me, it's due 100% to statistical error. The other mean is the 4 pcs EGTB add not even a fraction of a rating point to SF ...
I do not understand. How do you know?
People talk only about the advantage of playing the endgame perfectly, but that is not the only advantage; there are at least 2 different advantages from tablebases (a probe sketch follows the two points).
1) Knowing which positions not to enter.
Stockfish's static evaluation is wrong for some tablebase positions, and Stockfish may fail by going into them, when the problem is not playing the position perfectly but avoiding it in the first place.
Stockfish without tablebases may enter the following position as White, only to discover too late that it is a draw:
[d]k7/2Q5/8/8/8/8/7K/6r1 b - - 5 1
The fact that it can see at depths above 20 that it is a draw will not help if the remaining search depth at this position is only 10 plies.
2) Playing faster.
Stockfish can save time by not searching below some tablebase positions, so the advantage may simply be that it searches one ply deeper in the relevant lines that do not lead to 4-piece tablebase positions.
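To make point 1 concrete, a probe sketch with python-chess (the ./syzygy directory is a placeholder; it assumes the 4-piece WDL files are downloaded there):

    import chess
    import chess.syzygy

    # The KQ vs KR position above: Black to move, given as a tablebase draw.
    board = chess.Board("k7/2Q5/8/8/8/8/7K/6r1 b - - 5 1")

    # probe_wdl() reports the result from the side to move's point of view:
    # 2 = win, 0 = draw, -2 = loss (+/-1 are cursed/blessed 50-move results).
    with chess.syzygy.open_tablebase("./syzygy") as tb:
        wdl = tb.probe_wdl(board)
        print("draw" if wdl == 0 else ("win" if wdl > 0 else "loss"))

An engine that probes before entering such positions gets the exact result immediately, which covers both points: it avoids the bad lines (1) and never has to search below the probed node (2).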
1) Statistically insignificant: a couple of thousand positions out of millions (worth less than 0.1 Elo point).
2) It's way faster to get an eval inside the engine than to access a file.
Ingo: how many games ended in a 7 pcs EG? In a 6 pcs EG? In a 5 pcs EG? In a 4 pcs EG?
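Those counts are easy to pull from the PGN; a sketch with python-chess (games.pgn stands in for Ingo's actual file):

    import chess.pgn
    from collections import Counter

    counts = Counter()
    with open("games.pgn") as f:
        while True:
            game = chess.pgn.read_game(f)
            if game is None:
                break
            final = game.end().board()           # position after the last move
            counts[len(final.piece_map())] += 1  # men on the board, kings included

    for men in (7, 6, 5, 4):
        print(f"games whose final position has {men} men: {counts.get(men, 0)}")

Note this buckets games by their final position only; a game adjudicated earlier, in, say, an 8-man position, never shows up in these counts.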
Vinvin wrote: 2) It's way faster to get an eval inside the engine than to access a file.
All 4-man probes are obviously from RAM; it's about 1.25 MB of data.
It can very easily be faster to probe a 4-piece position than to search the corresponding 4-piece subtree to some depth. I don't think the depth needs to be large before it pays off.
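A rough way to see it: count the nodes of the full-width subtree a no-tablebase engine would otherwise search from Uri's position above (a sketch with python-chess; real engines prune heavily, so treat it as a loose upper bound):

    import chess

    def count_nodes(board, depth):
        # Full-width node count: visit every legal move, no pruning.
        if depth == 0 or board.is_game_over():
            return 1
        nodes = 1
        for move in board.legal_moves:
            board.push(move)
            nodes += count_nodes(board, depth - 1)
            board.pop()
        return nodes

    board = chess.Board("k7/2Q5/8/8/8/8/7K/6r1 b - - 5 1")
    print(count_nodes(board, 3))  # already thousands of nodes, ~20x more per ply

Even if pruning cuts that by a couple of orders of magnitude, a probe that touches a few cache lines of 1.25 MB sitting in RAM wins by a wide margin.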
michiguel wrote: I do not see any compelling reason why a given game should weigh more than others.
Miguel
A win against a strong opponent is surely worth more than a win against a weaker opponent?
All rating systems give that outcome.
Why are you saying that? EloSTAT - yes, because it's plain stupid. BayesElo, treating 1 draw = 1 win + 1 loss, does not always give the same outcome for the same number of points in an RR. I concocted another file with an equal number of white-black games; now Ordo gives identical ratings even with the -W switch for white advantage:
ordo -p order.pgn -o results.txt -s1000 -W
So, with the same number of points in an RR, BayesElo gives different ratings. And this is due to the different draw model: either P(D) ~ P(W)*P(L) or P(D)^2 ~ P(W)*P(L).
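Spelled out (my reconstruction, assuming a drawelo-style logistic model behind the first proportionality and a Davidson-style model behind the second): with $f(x)=\frac{1}{1+10^{-x/400}}$, $a=10^{\Delta/400}$ for rating difference $\Delta$, and $t=10^{d/400}$ for draw parameter $d$,

$$P(W)=f(\Delta-d)=\frac{a}{a+t},\qquad P(L)=f(-\Delta-d)=\frac{1}{1+at},$$
$$P(D)=1-P(W)-P(L)=\frac{a\,(t^{2}-1)}{(a+t)(1+at)}=(t^{2}-1)\,P(W)\,P(L),$$

so that model gives $P(D)\propto P(W)P(L)$ exactly, while a Davidson-style model instead sets $P(D)\propto\sqrt{P(W)P(L)}$, i.e. $P(D)^{2}\propto P(W)P(L)$.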
The question is: Should a win against a strong opponent and a loss to a weak opponent be treated differently than a loss to a strong opponent and a win against a weak opponent?
michiguel wrote: I do not see any compelling reason why a given game should weigh more than others.
Miguel
A win against a strong opponent is surely worth more than a win against a weaker opponent?
All rating systems give that outcome.
Why are you saying that? EloSTAT - yes, because it's plain stupid. BayesElo, treating 1 draw = 1 win + 1 loss, does not always give the same outcome for the same number of points in an RR. I concocted another file with an equal number of white-black games; now Ordo gives identical ratings even with the -W switch for white advantage:
ordo -p order.pgn -o results.txt -s1000 -W
So, with the same number of points in an RR, BayesElo gives different ratings. And this is due to the different draw model: either P(D) ~ P(W)*P(L) or P(D)^2 ~ P(W)*P(L).
The question is: Should a win against a strong opponent and a loss to a weak opponent be treated differently than a loss to a strong opponent and a win against a weak opponent?
You didn't read my statement carefully. I said that all rating systems have this property: "A win against a strong opponent is surely worth more than a win against a weaker opponent." That is clearly true; one win against 2800 will always help more than one win against 2700 in any system. I fully agree with you that with BayesElo a higher score against the same opponents does not guarantee a higher rating. That's one reason I favor Ordo.
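For the first point, the incremental-Elo arithmetic makes it obvious (a sketch; maximum-likelihood tools like Ordo don't use a K-factor, but the direction of the effect is the same):

    def expected(r, r_opp):
        # Standard logistic Elo expectation for r against r_opp.
        return 1.0 / (1.0 + 10 ** ((r_opp - r) / 400.0))

    K = 10  # illustrative K-factor
    for r_opp in (2800, 2700):
        gain = K * (1.0 - expected(2700, r_opp))
        print(f"2700 beating {r_opp}: +{gain:.1f} Elo")
    # 2700 beating 2800: +6.4 ; 2700 beating 2700: +5.0

The gain per win grows with the opponent's rating, so the stronger scalp always helps more.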