I think an individual comparison is misleading. You compare two sets of games with just 220 games each. The possible error is huge, and even an identical run might differ by that much.
Overall SF5 with 4pc SYZ was better by 0.89%, but even that is within the error bar.
Anyhow, I will play the missing games vs. DF14 and will check whether I replace standard SF5 with this one ...
Bye
Ingo
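To put a number on that error bar, here is a quick sketch (Python; the ~60% draw rate is my assumption, as is the usual win=1 / draw=0.5 / loss=0 scoring):

    import math

    n = 220            # games per set, as in the test above
    draw_rate = 0.60   # assumed draw rate; more draws = less per-game variance

    # Per-game variance of the score, under the null hypothesis that both
    # versions score 50% and the non-draws split evenly:
    var = (1 - draw_rate) * 0.25
    se_one_set = math.sqrt(var / n)       # standard error of one 220-game score
    se_diff = math.sqrt(2) * se_one_set   # SE of the difference of two such sets

    print(f"1 sigma on the score difference: {100 * se_diff:.2f}%")  # ~3.0%

    # And the 0.89% edge itself, mapped to Elo by the logistic formula:
    elo = 400 * math.log10(0.5089 / (1 - 0.5089))
    print(f"0.89% edge ~ {elo:.1f} Elo")  # ~6.2 Elo

So one sigma on the difference is roughly three times the observed 0.89% gap, which is why even an identical run could easily show a swing of that size.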
Thanks for the test, Ingo, much appreciated.
Also, all of Houdini's performances over the years have been within error bars...
One thing you should do explicitly on your home page is write in very big letters that SF is the new number one, instead of "Houdini is leading by 5 Elo, but it is very close."
You did a great job; the one thing you need now is a great heading.
IWB wrote: Overall SF5 with 4pc SYZ was better by 0.89%, but even that is within the error bar.
...
Bye
Ingo
Here's the main question: is the 0.89% difference due to the settings or due to statistical error? For me, it's due 100% to statistical error. The other mean is the 4 pcs EGTB add not even a fraction of a rating point to SF ...
Just curious, what do you mean by your last sentence? "The other mean is the ..."
IWB wrote: Overall SF5 with 4pc SYZ was better by 0.89%, but even that is within the error bar.
...
Bye
Ingo
Here's the main question: is the 0.89% difference due to the settings or due to statistical error? For me, it's due 100% to statistical error. The other mean is the 4 pcs EGTB add not even a fraction of a rating point to SF ...
Just curious, what do you mean by your last sentence? "The other mean is the ..."
"The other meaning of the last sentence" = If rating difference is explained by the statistical error, the 4 pcs EGTB doesn't gives an improvement in strength.
IWB wrote: Overall SF5 with 4pc SYZ was better by 0.89%, but even that is within the error bar.
...
Bye
Ingo
Here's the main question: is the 0.89% difference due to the settings or due to statistical error? For me, it's due 100% to statistical error. The other mean is the 4 pcs EGTB add not even a fraction of a rating point to SF ...
I do not understand. How do you know?
People talk only about the advantage of playing the endgame perfectly, but that is not the only advantage; there are at least 2 different advantages from tablebases (a probe sketch follows the two points).
1) Knowing which positions not to enter.
Stockfish's static evaluation is wrong for some tablebase positions, and Stockfish may fail by going into them, when the problem is not playing the position perfectly but avoiding it in the first place.
Stockfish without tablebases may enter the following position as White, only to discover too late that it is a draw:
[d]k7/2Q5/8/8/8/8/7K/6r1 b - - 5 1
The fact that it can see at depths above 20 that it is a draw will not help if the remaining search depth at this position is only 10 plies.
2) Playing faster.
Stockfish can save time by not searching below some tablebase positions, so the advantage may simply be that it searches one ply deeper in the relevant lines that do not lead to 4-piece tablebase positions.
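To make point 1 concrete, a probe sketch with python-chess (the ./syzygy directory is a placeholder; it assumes the 4-piece WDL files are downloaded there):

    import chess
    import chess.syzygy

    # The KQ vs KR position above: Black to move, given as a tablebase draw.
    board = chess.Board("k7/2Q5/8/8/8/8/7K/6r1 b - - 5 1")

    # probe_wdl() reports the result from the side to move's point of view:
    # 2 = win, 0 = draw, -2 = loss (+/-1 are cursed/blessed 50-move results).
    with chess.syzygy.open_tablebase("./syzygy") as tb:
        wdl = tb.probe_wdl(board)
        print("draw" if wdl == 0 else ("win" if wdl > 0 else "loss"))

An engine that probes before entering such positions gets the exact result immediately, which covers both points: it avoids the bad lines (1) and never has to search below the probed node (2).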
1) Statistically insignificant: a couple of thousand positions out of millions (worth less than 0.1 Elo point).
2) It's way faster to get an eval inside the engine than to access a file.
Ingo: how many games ended in a 7 pcs EG? In a 6 pcs EG? In a 5 pcs EG? In a 4 pcs EG?
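Those counts are easy to pull from the PGN; a sketch with python-chess (games.pgn stands in for Ingo's actual file):

    import chess.pgn
    from collections import Counter

    counts = Counter()
    with open("games.pgn") as f:
        while True:
            game = chess.pgn.read_game(f)
            if game is None:
                break
            final = game.end().board()           # position after the last move
            counts[len(final.piece_map())] += 1  # men on the board, kings included

    for men in (7, 6, 5, 4):
        print(f"games whose final position has {men} men: {counts.get(men, 0)}")

Note this buckets games by their final position only; a game adjudicated earlier, in, say, an 8-man position, never shows up in these counts.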
Vinvin wrote: 2) It's way faster to get an eval inside the engine than to access a file.
All 4-man probes are obviously from RAM; it's about 1.25 MB of data.
It can very easily be faster to probe a 4-piece position than to search the corresponding 4-piece subtree to some depth. I don't think the depth needs to be large before it pays off.
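A rough way to see it: count the nodes of the full-width subtree a no-tablebase engine would otherwise search from Uri's position above (a sketch with python-chess; real engines prune heavily, so treat it as a loose upper bound):

    import chess

    def count_nodes(board, depth):
        # Full-width node count: visit every legal move, no pruning.
        if depth == 0 or board.is_game_over():
            return 1
        nodes = 1
        for move in board.legal_moves:
            board.push(move)
            nodes += count_nodes(board, depth - 1)
            board.pop()
        return nodes

    board = chess.Board("k7/2Q5/8/8/8/8/7K/6r1 b - - 5 1")
    print(count_nodes(board, 3))  # already thousands of nodes, ~20x more per ply

Even if pruning cuts that by a couple of orders of magnitude, a probe that touches a few cache lines of 1.25 MB sitting in RAM wins by a wide margin.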
michiguel wrote: I do not see any compelling reason why a given game should weigh more than others.
Miguel
A win against a strong opponent is surely worth more than a win against a weaker opponent?
All rating systems give that outcome.
Why are you saying that? EloSTAT - yes, because it's plain stupid. BayesElo, treating 1 draw = 1 win + 1 loss, does not always give the same outcome for the same number of points in an RR. I concocted another file with an equal number of white-black games; now Ordo gives identical ratings even with the -W switch for white advantage:
ordo -p order.pgn -o results.txt -s1000 -W
So, with the same number of points in an RR, BayesElo gives different ratings. And this is due to the different draw model: either P(D) ~ P(W)*P(L) or P(D)^2 ~ P(W)*P(L).
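Spelled out (my reconstruction, assuming a drawelo-style logistic model behind the first proportionality and a Davidson-style model behind the second): with $f(x)=\frac{1}{1+10^{-x/400}}$, $a=10^{\Delta/400}$ for rating difference $\Delta$, and $t=10^{d/400}$ for draw parameter $d$,

$$P(W)=f(\Delta-d)=\frac{a}{a+t},\qquad P(L)=f(-\Delta-d)=\frac{1}{1+at},$$
$$P(D)=1-P(W)-P(L)=\frac{a\,(t^{2}-1)}{(a+t)(1+at)}=(t^{2}-1)\,P(W)\,P(L),$$

so that model gives $P(D)\propto P(W)P(L)$ exactly, while a Davidson-style model instead sets $P(D)\propto\sqrt{P(W)P(L)}$, i.e. $P(D)^{2}\propto P(W)P(L)$.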
The question is: Should a win against a strong opponent and a loss to a weak opponent be treated differently than a loss to a strong opponent and a win against a weak opponent?
michiguel wrote: I do not see any compelling reason why a given game should weigh more than others.
Miguel
A win against a strong opponent is surely worth more than a win against a weaker opponent?
All rating systems give that outcome.
Why are you saying that? EloSTAT - yes, because it's plain stupid. BayesElo, treating 1 draw = 1 win + 1 loss, does not always give the same outcome for the same number of points in an RR. I concocted another file with an equal number of white-black games; now Ordo gives identical ratings even with the -W switch for white advantage:
ordo -p order.pgn -o results.txt -s1000 -W
So, with the same number of points in an RR, BayesElo gives different ratings. And this is due to the different draw model: either P(D) ~ P(W)*P(L) or P(D)^2 ~ P(W)*P(L).
The question is: Should a win against a strong opponent and a loss to a weak opponent be treated differently than a loss to a strong opponent and a win against a weak opponent?
You didn't read my statement carefully. I said that all rating systems have this property: "A win against a strong opponent is surely worth more than a win against a weaker opponent." That is clearly true; one win against 2800 will always help more than one win against 2700 in any system. I fully agree with you that with BayesElo a higher score against the same opponents does not guarantee a higher rating. That's one reason I favor Ordo.
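For the first point, the incremental-Elo arithmetic makes it obvious (a sketch; maximum-likelihood tools like Ordo don't use a K-factor, but the direction of the effect is the same):

    def expected(r, r_opp):
        # Standard logistic Elo expectation for r against r_opp.
        return 1.0 / (1.0 + 10 ** ((r_opp - r) / 400.0))

    K = 10  # illustrative K-factor
    for r_opp in (2800, 2700):
        gain = K * (1.0 - expected(2700, r_opp))
        print(f"2700 beating {r_opp}: +{gain:.1f} Elo")
    # 2700 beating 2800: +6.4 ; 2700 beating 2700: +5.0

The gain per win grows with the opponent's rating, so the stronger scalp always helps more.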