H4 or S5 !?

IWB · Post by **IWB** » Wed Jun 04, 2014 12:48 pm

Vinvin wrote: I think 4pc Syzygy will change nothing in SF's strength. So for me, it's a second test with SF at exactly the same level, which is interesting too.

BTW, it would be interesting to find one (yes, only one!) game where the 4pc Syzygy bases change the final outcome.

I am not sure either, but I doubt there will be a significant gain with 5pc as well.

Again, I have no doubt that for serious analysis of games/endgame positions the use of tablebases is VERY useful.

Time will tell. Right now, with 50%+ of the test done, it looks like a small single-digit Elo gain (3-5).

Bye
Ingo

Uri Blass · Post by **Uri Blass** » Wed Jun 04, 2014 1:48 pm

Vinvin wrote:

IWB wrote:

IWB wrote: ..., as I am curious I will run the S5 match again with 4pc Syzygy bases....

Test is running. The original setup had this:

Code:

     Stockfish 5              3080.0 (2297.0 : 783.0)
                              220.0 (127.5 :  92.5) Houdini 4           3111
                              220.0 (121.5 :  98.5) Komodo 7a           3088
                              220.0 (134.0 :  86.0) Gull 3              3057
                              220.0 (150.5 :  69.5) Critter 1.4a        2980
                              220.0 (149.0 :  71.0) Equinox 2.02        2975
                              220.0 (159.5 :  60.5) Deep Rybka 4.1      2959
                              220.0 (176.0 :  44.0) Deep Fritz 14       2894
                              220.0 (170.5 :  49.5) Chiron 2            2889
                              220.0 (181.0 :  39.0) Protector 1.6.0     2870
                              220.0 (168.0 :  52.0) Hannibal 1.4b       2870
                              220.0 (183.0 :  37.0) Naum 4.2            2838
                              220.0 (187.0 :  33.0) Texel 1.04          2838
                              220.0 (187.0 :  33.0) Senpai 1.0          2838
                              220.0 (188.0 :  32.0) HIARCS 14 WCSC 32b  2812
                              220.0 (190.5 :  29.5) Jonny 6.00          2798

So 74.58% have to be beaten!

Bye
Ingo
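For reference, the 74.58% target follows from the header line of the table (2297.0 points out of 3080.0 games). Below is a minimal Python sketch of that arithmetic, together with the Elo difference such a score would imply under the usual logistic Elo model. The function names are illustrative only and do not come from any tool mentioned in this thread.

```python
import math


def score_fraction(points: float, games: float) -> float:
    """Share of the available points that was actually scored."""
    return points / games


def elo_diff_from_score(fraction: float) -> float:
    """Elo difference implied by a score fraction (logistic model, 400 scale).

    Only defined for 0 < fraction < 1.
    """
    return -400.0 * math.log10(1.0 / fraction - 1.0)


# Figures taken from the header line of the original-setup table.
target = score_fraction(2297.0, 3080.0)
print(f"{target:.2%}")                     # 74.58%
print(round(elo_diff_from_score(target)))  # 187
```

The second number is only the difference against a single hypothetical opponent that would yield the same score; it is not a rating computed over the actual field.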

I think 4pc Syzygy will change nothing in SF's strength. So for me, it's a second test with SF at exactly the same level, which is interesting too.

BTW, it would be interesting to find one (yes, only one!) game where the 4pc Syzygy bases change the final outcome.

I think that it is hard to know if tablebases changed the result in a specific game because tablebases can help indirectly.

It is possible that tablebases let Stockfish reach the same depth 1% faster thanks to pruning, so it played slightly faster, and the extra time on the clock then helped it reach a greater depth and play a better move later in the game.

You will find it very hard to prove by analysis that Stockfish would have gotten a worse result without tablebases, because Stockfish is not deterministic: if you repeat the same game twice you may get different results even with exactly the same program.

Vinvin · Post by **Vinvin** » Wed Jun 04, 2014 2:08 pm

Uri Blass wrote:
Vinvin wrote:
IWB wrote:
IWB wrote: ..., as I am curious I will run the S5 match again with 4pc Syzygy bases....
Test is running. The original setup had this:
...
So 74.58% have to be beaten!

Bye
Ingo
I think 4pc Syzygy will change nothing in SF's strength. So for me, it's a second test with SF at exactly the same level, which is interesting too.

BTW, it would be interesting to find one (yes, only one!) game where the 4pc Syzygy bases change the final outcome.
I think that it is hard to know if tablebases changed the result in a specific game because tablebases can help indirectly.

It is possible that tablebases let Stockfish reach the same depth 1% faster thanks to pruning, so it played slightly faster, and the extra time on the clock then helped it reach a greater depth and play a better move later in the game.

You will find it very hard to prove by analysis that Stockfish would have gotten a worse result without tablebases, because Stockfish is not deterministic: if you repeat the same game twice you may get different results even with exactly the same program.

My point is not about tablebases in general, it's about 4 pcs tablebases !
Which 4 pcs ending is misevaluated by Stockfish 5 ??

michiguel · Post by **michiguel** » Wed Jun 04, 2014 2:57 pm

Laskos wrote:
Laskos wrote:
michiguel wrote:
Laskos wrote:
michiguel wrote:
There is only one correct answer, and that is that SF5 should be #1 (by a very tiny margin, though). Why? This is a round robin, so everybody played each other under the same conditions, so the program that scores the most points overall should be #1. This is one of the cases in which there is no doubt about the relative order. As a reference, in the output of Ordo you can see the actual points (the others give %). Whatever program you use, the relative order should follow the number of points. Basically, SF won this gigantic RR tournament, and should be #1.

   # PLAYER        : RATING    POINTS  PLAYED    (%)
   1 Stockfish 5   : 3115.1    2473.0    3300   74.9%
   2 Houdini 4     : 3111.0    2458.5    3300   74.5%

Miguel
Miguel, I am a bit tired and can't reason clearly. Can you prove that in the case:
Direct matches in RR
A>B
B>C
C>A

and total points in RR are A>B>C, then Elo ratings are always also A>B>C?
It can be demonstrated as long as certain assumptions are respected, like two draws equal one win + loss. But, I am not sure I can do it elegantly or understandably by quickly typing here.

For instance, let's try a reductio ad absurdum. Let's assume that the Elo order is EloB > EloA > EloC. In that case, A faced a stronger schedule than B: both faced C, but the head-to-head match was tougher for A (because EloB > EloA). Consequently, A faced a tougher schedule and still got more points, which means it should have a higher Elo. But that contradicts the initial assumption EloB > EloA > EloC, disproving it. If you keep doing this analysis for the other orderings, you will see that the only consistent scenario is EloA > EloB > EloC.

Miguel
Still seems to assume transitivity in matches (implied by the transitivity of the Elo scale). Maybe I will try to disprove your proof with 4 entities, when I get time to build a suitable PGN file for Ordo (I recommend Ordo to every tester).
With a concocted PGN file using Ordo, I got
Code:
   # PLAYER  RATING  ERROR   POINTS  PLAYED    (%)
   1 4    : 2361.7  100.0      8.0      15   53.3%
   2 3    : 2314.6   99.5      7.5      15   50.0%
   3 2    : 2303.0  102.2      8.0      15   53.3%
   4 1    : 2220.7   96.7      6.5      15   43.3%
We see an inversion between 2nd and 3rd place as far as the number of points goes. My guess is that it has to do with the -W switch I used for white advantage.

Right, that is because the numbers of games with White and Black are unbalanced. Player 2 played 10 games with White and only 5 with Black; for player 3 it is the opposite. If White/Black is ignored (no switch) or the white advantage is set to 0 with -w0 (same thing)

ordo -p order.pgn -w0

you get

Code:

   # PLAYER    : RATING    POINTS  PLAYED    (%)
   1 2    : 2317.6       8.0      15   53.3%
   2 4    : 2317.6       8.0      15   53.3%
   3 3    : 2300.0       7.5      15   50.0%
   4 1    : 2264.7       6.5      15   43.3%

But if the numbers of whites and blacks are not the same, the premise that the schedules were identical is broken, so I expect this type of behavior. You can set the white advantage manually, and the order changes when going from -w35 to -w36.

Miguel
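Miguel's point about unequal colors can be sketched numerically. The snippet below assumes that the white advantage is modeled as an additive Elo bonus for the player with White, and it reads the value 35 from the -w35 switch mentioned above as Elo points; both are assumptions for illustration and this is not Ordo's code. The helper names and the 2300 ratings are hypothetical.

```python
def expected_score(r_player: float, r_opp: float,
                   white_adv: float, player_is_white: bool) -> float:
    """Logistic expected score with an additive Elo bonus for White (assumed model)."""
    bonus = white_adv if player_is_white else -white_adv
    return 1.0 / (1.0 + 10.0 ** (-(r_player - r_opp + bonus) / 400.0))


def expected_points(whites: int, blacks: int, white_adv: float,
                    r_player: float = 2300.0, r_opp: float = 2300.0) -> float:
    """Expected total for a player with the given number of White and Black games."""
    return (whites * expected_score(r_player, r_opp, white_adv, True)
            + blacks * expected_score(r_player, r_opp, white_adv, False))


# Two equally rated players against equally rated opposition:
# one has 10 Whites and 5 Blacks, the other 5 Whites and 10 Blacks.
for white_adv in (0.0, 35.0):
    mostly_white = expected_points(10, 5, white_adv)
    mostly_black = expected_points(5, 10, white_adv)
    print(white_adv, round(mostly_white, 3), round(mostly_black, 3))
```

With a white advantage of 0 both players are expected to score the same. With a positive white advantage the player who had more Whites is expected to score more, so if the two actually score the same number of points, the one with more Blacks must be rated higher. That is the kind of inversion seen in the concocted example.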

Laskos · Post by **Laskos** » Wed Jun 04, 2014 3:44 pm

michiguel wrote:
Right, that is because the numbers of games with White and Black are unbalanced. Player 2 played 10 games with White and only 5 with Black; for player 3 it is the opposite. If White/Black is ignored (no switch) or the white advantage is set to 0 with -w0 (same thing)

ordo -p order.pgn -w0

you get
Code:
   # PLAYER    : RATING    POINTS  PLAYED    (%)
   1 2    : 2317.6       8.0      15   53.3%
   2 4    : 2317.6       8.0      15   53.3%
   3 3    : 2300.0       7.5      15   50.0%
   4 1    : 2264.7       6.5      15   43.3%
But if the numbers of whites and blacks are not the same, the premise that the schedules were identical is broken, so I expect this type of behavior. You can set the white advantage manually, and the order changes when going from -w35 to -w36.

Miguel

I tried another thing. Let's say the RR consists of very many engines, and I introduce a new one. So engines 1, 3 and 4 are anchors, and engine 2 is the newcomer. Again I get a reversal:

ordo -p order.pgn -o ratings.txt -w0 -s1000 -m anchors.csv

Code:

   # PLAYER  RATING  ERROR   POINTS  PLAYED    (%)
   1 4    : 2600.0   ----      8.0      15   53.3%
   2 3    : 2500.0   ----      7.5      15   50.0%
   3 2    : 2440.9  174.8      8.0      15   53.3%
   4 1    : 2000.0   62.4      6.5      15   43.3%

That was just for fun, but it troubles me a bit that with -w0, engines with exactly equal total scores in the RR get exactly the same Elo. Isn't that a bit odd? It doesn't matter where you get the points from?

Modern Times · Post by **Modern Times** » Wed Jun 04, 2014 5:28 pm

Laskos wrote: It doesn't matter where you get the points from?

It certainly should matter.

michiguel · Post by **michiguel** » Wed Jun 04, 2014 5:50 pm

Laskos wrote:
michiguel wrote:
Right, that is because the numbers of games with White and Black are unbalanced. Player 2 played 10 games with White and only 5 with Black; for player 3 it is the opposite. If White/Black is ignored (no switch) or the white advantage is set to 0 with -w0 (same thing)

ordo -p order.pgn -w0

you get
Code:
   # PLAYER    : RATING    POINTS  PLAYED    (%)
   1 2    : 2317.6       8.0      15   53.3%
   2 4    : 2317.6       8.0      15   53.3%
   3 3    : 2300.0       7.5      15   50.0%
   4 1    : 2264.7       6.5      15   43.3%
But if the numbers of whites and blacks are not the same, the premise that the schedules were identical is broken, so I expect this type of behavior. You can set the white advantage manually, and the order changes when going from -w35 to -w36.

Miguel
I tried another thing. Let's say the RR consists of very many engines, and I introduce a new one. So engines 1, 3 and 4 are anchors, and engine 2 is the newcomer. Again I get a reversal:

ordo -p order.pgn -o ratings.txt -w0 -s1000 -m anchors.csv
Code:
   # PLAYER  RATING  ERROR   POINTS  PLAYED    (%)
   1 4    : 2600.0   ----      8.0      15   53.3%
   2 3    : 2500.0   ----      7.5      15   50.0%
   3 2    : 2440.9  174.8      8.0      15   53.3%
   4 1    : 2000.0   62.4      6.5      15   43.3% 
That was just for fun,

Of course, the schedules are now not the same (EDIT: implicitly, one played an infinite number of games to "fix" the rating).

but it troubles me a bit that with -w0, engines with exactly equal total scores in the RR get exactly the same Elo. Isn't that a bit odd? It doesn't matter where you get the points from?

No, it is not odd. It is all a matter of premises. If your premise is that every game weighs the same, then this is expected behavior. If you believe that certain games should weigh more than others, then you can get different outcomes based on that. Ordo weighs all games equally, so with exactly the same schedule, more points will always mean a higher rating. I do not see any compelling reason why a given game should weigh more than others.

Miguel
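The claim that equal weighting plus an identical schedule makes ratings follow points can be checked with a small fit. The sketch below is not Ordo's algorithm: it is a plain coordinate-wise Newton iteration on the logistic Elo model that adjusts each rating until the expected total matches the actual points. It assumes five games per pairing (consistent with 15 games per player among four players, but not stated in the thread), no white advantage, and a pool average of 2300; the function names are hypothetical.

```python
import math


def expected(r_a: float, r_b: float) -> float:
    """Logistic expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def fit_round_robin(points, games_per_pair, average=2300.0, sweeps=500):
    """Find ratings whose expected totals equal the actual point totals.

    Assumes a full round robin with the same number of games per pairing
    and every total strictly between 0 and the number of games played.
    """
    n = len(points)
    ratings = [average] * n
    scale = math.log(10.0) / 400.0
    for _ in range(sweeps):
        for i in range(n):
            exps = [expected(ratings[i], ratings[j]) for j in range(n) if j != i]
            predicted = games_per_pair * sum(exps)
            # Slope of the predicted total with respect to this player's rating.
            slope = games_per_pair * scale * sum(e * (1.0 - e) for e in exps)
            ratings[i] += (points[i] - predicted) / slope
        # Ratings are only defined up to a constant, so pin the pool average.
        shift = average - sum(ratings) / n
        ratings = [r + shift for r in ratings]
    return ratings


# Point totals of players 1..4 from the concocted example.
totals = [6.5, 8.0, 7.5, 8.0]
for player, rating in enumerate(fit_round_robin(totals, games_per_pair=5), start=1):
    print(player, round(rating, 1))
```

Under these assumptions the fitted values land within about an Elo point of the -w0 table quoted above, with players 2 and 4 tied. Only the point totals enter the fit, which is exactly the equal-weight premise being discussed.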

Modern Times · Post by **Modern Times** » Wed Jun 04, 2014 5:59 pm

michiguel wrote: I do not see any compelling reason why a given game should weigh more than others.

Miguel

A win against a strong opponent is surely worth more than a win against a weaker opponent ?

lkaufman · Post by **lkaufman** » Wed Jun 04, 2014 6:10 pm

Modern Times wrote:
michiguel wrote: I do not see any compelling reason why a given game should weigh more than others.

Miguel
A win against a strong opponent is surely worth more than a win against a weaker opponent ?

All rating systems give that outcome. The question is: Should a win against a strong opponent and a loss to a weak opponent be treated differently than a loss to a strong opponent and a win against a weak opponent?
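Larry's question can be made concrete with a performance-rating calculation. The sketch below assumes the logistic Elo model with every game weighted equally and finds, by bisection, the rating at which the expected total equals the actual total. The opponent ratings 2600 and 2200 are hypothetical values chosen for illustration.

```python
def expected(r_a: float, r_b: float) -> float:
    """Logistic expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def performance_rating(opponents, total_points, lo=0.0, hi=4000.0):
    """Rating at which the expected total equals total_points (bisection).

    Only meaningful when 0 < total_points < len(opponents).
    """
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if sum(expected(mid, opp) for opp in opponents) < total_points:
            lo = mid  # rating too low to explain the score
        else:
            hi = mid
    return (lo + hi) / 2.0


opponents = [2600.0, 2200.0]  # hypothetical strong and weak opponent

# Case 1: win against the strong opponent, loss against the weak one.
# Case 2: loss against the strong opponent, win against the weak one.
# Both cases total 1.0 point from the same two opponents.
case_1 = performance_rating(opponents, 1.0 + 0.0)
case_2 = performance_rating(opponents, 0.0 + 1.0)
print(round(case_1, 1), round(case_2, 1))  # 2400.0 2400.0
```

Because only the total against a given set of opponents enters the calculation, both cases produce the same estimate. This shows what the equal-weight premise implies; it does not settle whether that premise is the right one.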

Adam Hair · Post by **Adam Hair** » Wed Jun 04, 2014 6:37 pm

Modern Times wrote:
michiguel wrote: I do not see any compelling reason why a given game should weigh more than others.

Miguel
A win against a strong opponent is surely worth more than a win against a weaker opponent ?

The flip side of that is a loss against a weaker opponent should count more than a loss against a strong opponent.

Let's say in one RR player A scored 9-1 against player B and 5-5 against player C, and player B scored 3-7 against player C. Now, in a second RR, A scored 8-2 against B and 6-4 against C, while B again scored 3-7 against C. Should the Elo estimate for A from the first RR be higher or lower than the estimate from the second RR? IMO, the estimates should be the same.

*I see that Larry made a similar point
