Vinvin wrote:
I think 4pc Syzygy will change nothing in SF's strength. So for me, it's a second test with SF at the exact same level, which is interesting too.
BTW, it would be interesting to find one (yes, only one!) game where the 4pc Syzygy bases change the final outcome.
I am not sure either, and I doubt there will be a significant gain with 5pc as well.
Again, I have no doubt that for serious analysis of games and endgame positions the use of tablebases is VERY useful.
Time will tell. Right now, with more than 50% of the games played, it looks like a small single-digit Elo gain (3-5).
I think it is hard to know whether tablebases changed the result of a specific game, because tablebases can help indirectly.
It is possible that tablebases let Stockfish reach the same depth 1% faster thanks to pruning, so it played slightly faster, and the extra time on the clock helped it reach greater depth and play a better move later in the game.
You will find it very hard to prove by analysis that Stockfish would have got a worse result without tablebases, because Stockfish is not deterministic: if you repeat the same game twice, you may get different results even with exactly the same program.
IWB wrote:
..., as I am curious I will run the S5 match again with 4pc Syzygy bases....
Test is running. The original setup had this:
...
So 74.58% have to be beaten!
Bye
Ingo
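For scale, the 74.58% baseline can be turned into an Elo difference with the standard logistic Elo formula. The sketch below is only an illustration: the function names are made up for this example, and it assumes the usual 400-point logistic scale rather than whatever model a particular rating tool applies.

```python
import math

def elo_diff_from_score(score):
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

def score_from_elo_diff(diff):
    """Expected score fraction for a given Elo advantage, logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

# Baseline score quoted in the thread.
baseline = elo_diff_from_score(0.7458)
# Score needed if the tablebases were worth 5 Elo (assumed figure for illustration).
target = score_from_elo_diff(baseline + 5.0)

print(f"baseline advantage: {baseline:.1f} Elo")
print(f"score needed for +5 Elo: {100.0 * target:.2f}%")
```

Under these assumptions the baseline corresponds to an advantage of roughly 187 Elo, and a gain of a few Elo moves the required score by only about half a percentage point, which is why a very large number of games is needed to see it.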
My point is not about tablebases in general, it's about 4-piece tablebases!
Which 4-piece ending is misevaluated by Stockfish 5?
michiguel wrote:
There is only one correct answer, and that is that SF5 should be #1 (by a very small margin, though). Why? This is a round robin, so everybody played everybody else under the same conditions, and the program that scores more points overall should be #1. This is one of the cases in which there is no doubt about the relative order. As a reference, in the output of Ordo you can see the actual points (the others give %). Whatever program you use, the relative order should follow the number of points. Basically, SF won this gigantic RR tournament, and should be #1.
Miguel, I am a bit tired and can't reason clearly. Can you prove that in the case:
Direct matches in RR
A>B
B>C
C>A
and total points in RR are A>B>C, then Elo ratings are always also A>B>C?
It can be demonstrated as long as certain assumptions are respected, like two draws equal one win + loss. But, I am not sure I can do it elegantly or understandably by quickly typing here.
For instance, let's try a reductio ad absurdum. Let's assume that the Elo order is EloB > EloA > EloC. In that case, A faced a stronger schedule than B: both faced C, but the head-to-head match was tougher for A (because EloB > EloA). Consequently, A faced a tougher schedule and still got more points, which means it should have the higher Elo. But that contradicts the initial assumption EloB > EloA > EloC, disproving it. If you repeat this analysis for the other orderings, you will see that the only consistent scenario is EloA > EloB > EloC.
Miguel
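The argument can be checked numerically on a small example. The sketch below is not Ordo's implementation: it is a plain logistic fit in which every game has equal weight and each player's rating is adjusted until its expected points match its actual points. The function names and the sample results (a cyclic A>B, B>C, C>A with totals A>B>C) are invented for illustration.

```python
def expected_score(ra, rb):
    """Expected score of a player rated ra against one rated rb (logistic Elo)."""
    return 1.0 / (1.0 + 10.0 ** ((rb - ra) / 400.0))

def fit_ratings(results, iterations=5000, step=200.0):
    """Fit ratings so that expected points equal actual points for every player.

    results maps (p, q) -> (points scored by p, points scored by q).
    Returns (ratings centred on 0, total points per player).
    """
    players = sorted({name for pair in results for name in pair})
    rating = {p: 0.0 for p in players}
    points = {p: 0.0 for p in players}
    games = {p: 0.0 for p in players}
    for (p, q), (sp, sq) in results.items():
        points[p] += sp
        points[q] += sq
        games[p] += sp + sq
        games[q] += sp + sq

    for _ in range(iterations):
        expected = {p: 0.0 for p in players}
        for (p, q), (sp, sq) in results.items():
            n = sp + sq
            e = expected_score(rating[p], rating[q])
            expected[p] += n * e
            expected[q] += n * (1.0 - e)
        # Move each rating toward the value where expected == actual points.
        for p in players:
            rating[p] += step * (points[p] - expected[p]) / games[p]
        # Only rating differences matter, so keep the average at zero.
        mean = sum(rating.values()) / len(players)
        for p in players:
            rating[p] -= mean
    return rating, points

# Hypothetical cyclic round robin: A beats B, B beats C, C beats A.
results = {("A", "B"): (9, 1), ("B", "C"): (8, 2), ("C", "A"): (6, 4)}
ratings, points = fit_ratings(results)
for name in sorted(ratings, key=ratings.get, reverse=True):
    print(name, points[name], round(ratings[name], 1))
```

In this example the totals are A 13, B 9, C 8, and the fitted ratings come out in the same order even though C won its direct match against A. That matches the reductio above, as long as all players have the same schedule and colours are ignored.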
That still seems to assume transitivity in matches (which the transitivity of the Elo scale assumes). Maybe I will try to disprove your proof with 4 entities, when I get time to build a suitable pgn file for Ordo (I recommend Ordo to every tester).
We see an inversion between 2nd and 3rd places as far as the number of points goes. My guess is that it may have to do with the -W switch I used for white advantage.
Right, that is because the numbers of whites and blacks are disproportionate. 2 played 10 games with white and only 5 with black. The opposite is true of 3. If whites/blacks are ignored (no switch) or set to value 0 (same thing) with -w0
But if the numbers of whites and blacks are not the same, the premise that the schedules have been the same is broken, so I expect this type of behavior. You can manually set the value of white, and we can see that this order is altered when we go from -w35 to -w36
I tried another thing. Let's say the RR consists of a very large number of engines, and I introduce a new one. So, engines 1, 3, 4 are anchors, and engine 2 is the newcomer. I again get a reversal:
ordo -p order.pgn -o ratings.txt -w0 -s1000 -m anchors.csv
That was just for fun, but it troubles me a bit that with -w0, engines with perfectly equal total scores in the RR get exactly the same Elo. Isn't that a bit odd? Doesn't it matter where you get the points from?
Of course, the schedules are now not the same (EDIT: implicitly, one played an infinite number of games to "fix" the rating).
but it troubles me a bit that with -w0, engines with perfectly equal total scores in the RR get exactly the same Elo. Isn't that a bit odd? Doesn't it matter where you get the points from?
No, it is not odd. It is all a matter of premises. If your premise is that every game weighs the same, then it is the expected behavior. If you believe that certain games should weigh more than others, then you can get different outcomes based on that. Ordo weighs all games equally, so with exactly the same schedule, more points will always mean a higher rating. I do not see any compelling reason why a given game should weigh more than others.
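The "where do the points come from" question can be illustrated with a toy fit. The sketch below is not Ordo's code: it assumes a plain logistic model with equal weight per game, no white advantage, and a full round robin with the same number of games per pairing. The function name and both result sets are invented for the example.

```python
def fit_equal_weight(results, iterations=5000, step=200.0):
    """Logistic rating fit with every game weighted equally (illustrative only).

    results maps (p, q) -> (points scored by p, points scored by q).
    """
    players = sorted({name for pair in results for name in pair})
    rating = dict.fromkeys(players, 0.0)
    points = dict.fromkeys(players, 0.0)
    games = dict.fromkeys(players, 0.0)
    for (p, q), (sp, sq) in results.items():
        points[p] += sp
        points[q] += sq
        games[p] += sp + sq
        games[q] += sp + sq
    for _ in range(iterations):
        expected = dict.fromkeys(players, 0.0)
        for (p, q), (sp, sq) in results.items():
            e = 1.0 / (1.0 + 10.0 ** ((rating[q] - rating[p]) / 400.0))
            expected[p] += (sp + sq) * e
            expected[q] += (sp + sq) * (1.0 - e)
        # The update sees only each player's total points, not who they came from.
        for p in players:
            rating[p] += step * (points[p] - expected[p]) / games[p]
        mean = sum(rating.values()) / len(players)
        for p in players:
            rating[p] -= mean
    return rating

# Two hypothetical round robins with the same totals (A 12, B 9, C 9)
# but different head-to-head results.
rr_x = {("A", "B"): (6, 4), ("A", "C"): (6, 4), ("B", "C"): (5, 5)}
rr_y = {("A", "B"): (7, 3), ("A", "C"): (5, 5), ("B", "C"): (6, 4)}

fit_x = fit_equal_weight(rr_x)
fit_y = fit_equal_weight(rr_y)
for name in ("A", "B", "C"):
    print(name, round(fit_x[name], 2), round(fit_y[name], 2))
```

Under these assumptions both tournaments give the same ratings, and B and C are rated equally in each, because the fit only ever compares a player's total points with its expected total. A scheme that weights some games more than others would break this property.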
michiguel wrote: I do not see any compelling reason why a given game should weight more than others.
Miguel
A win against a strong opponent is surely worth more than a win against a weaker opponent?
All rating systems give that outcome. The question is: Should a win against a strong opponent and a loss to a weak opponent be treated differently than a loss to a strong opponent and a win against a weak opponent?
The flip side of that is a loss against a weaker opponent should count more than a loss against a strong opponent.
Let's say in one RR player A was 9-1 against player B and 5-5 against player C, and player B was 3-7 against player C. Now, from a second RR we find that A was 8-2 against B and 6-4 against C, while B was 3-7 against C. Should the Elo estimate for A from the first RR be higher or lower than the estimate from the second RR? IMO, the estimates should be the same.