What is New in Toga II 1.3x4?
-
- Posts: 18748
- Joined: Thu Mar 09, 2006 6:40 pm
- Location: US of Europe, germany
- Full name: Thorsten Czub
Re: Some more data to the fire :-)
when the blueberry setting stabilized as weaker than the default, i stopped testing.
Re: Some more data to the fire :-)
mclane wrote: "when the results showed that the gap between the 2 programs was too big, i threw out the redundant older versions to get more data from the more important ones. the behaviour in those tournaments is so sure that you can reproduce it at any time."
It's not much stronger Thorsten, it was a mistake to discard it.
Re: Some more data to the fire :-)
mclane wrote: "when the blueberry setting stabilized as weaker than the default, i stopped testing."
How many games did you test this setting with?
-
- Posts: 4565
- Joined: Sun Mar 12, 2006 2:40 am
- Full name:
Re: Some more data to the fire :-)
mclane wrote: "when the results showed that the gap between the 2 programs was too big, i threw out the redundant older versions to get more data from the more important ones. the behaviour in those tournaments is so sure that you can reproduce it at any time."
Okay Thorsten, I take it you refer mainly to tournament c here?
But in tournament c, does that mean that Toga 1.2.1a played no more than 20 games, scoring 12½/20, with Toga 1.3x4 scoring 15/20? http://f50.parsimony.net/forum200336/messages/8014.htm
I do think that was a bit early to conclude that Toga 1.2.1a's result in that tournament was redundant... And if you can really reproduce this exactly at any time, then I just don't understand the statistics of Elo ratings anymore at all. I'm not saying it's not an accurate result, but there is still supposed to be a lot of statistical noise in that, and the result could easily have been the other way round, even if Toga 1.3x4 really is stronger.
I still don't understand the result of Tournament k3...
Regards, Eelco
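To put a number on the "statistical noise" point, here is a minimal Python sketch using only the two scores quoted above (12½/20 and 15/20). It assumes the usual logistic Elo formula and treats the two results as independent samples, which is a simplification because both engines played in the same field. The function names are illustrative, not taken from any rating tool.

```python
import math

def elo_from_score(p: float) -> float:
    """Elo difference implied by a score fraction p (0 < p < 1), logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def score_std_error(p: float, games: int) -> float:
    """Upper bound on the standard error of a score fraction.

    A single game scores 0, 0.5 or 1, so its variance is at most p * (1 - p);
    draws only make the true variance smaller.
    """
    return math.sqrt(p * (1.0 - p) / games)

# Figures quoted in the post above.
games = 20
p_old = 12.5 / games   # Toga 1.2.1a
p_new = 15.0 / games   # Toga 1.3x4

perf_old = elo_from_score(p_old)
perf_new = elo_from_score(p_new)

# Assumption: the two scores are independent, so their variances add.
se_diff = math.sqrt(score_std_error(p_old, games) ** 2
                    + score_std_error(p_new, games) ** 2)
z = (p_new - p_old) / se_diff

print(f"performance vs. field: {perf_old:.0f} and {perf_new:.0f} Elo")
print(f"score difference in standard errors: {z:.2f}")
```

Under this conservative bound the difference is less than one standard error, far from the 1.96 usually asked for at 95% confidence. Many draws would shrink the error somewhat, so treat this as a rough lower bound on significance, not an exact test.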
Re: Some more data to the fire :-)
mclane wrote: "when the results showed that the gap between the 2 programs was too big, i threw out the redundant older versions to get more data from the more important ones. the behaviour in those tournaments is so sure that you can reproduce it at any time."
Eelco de Groot wrote: "Okay Thorsten, I take it you refer mainly to tournament c here? But in tournament c, does that mean that Toga 1.2.1a played no more than 20 games, scoring 12½/20, with Toga 1.3x4 scoring 15/20? http://f50.parsimony.net/forum200336/messages/8014.htm I do think that was a bit early to conclude that Toga 1.2.1a's result in that tournament was redundant... And if you can really reproduce this exactly at any time, then I just don't understand the statistics of Elo ratings anymore at all. I'm not saying it's not an accurate result, but there is still supposed to be a lot of statistical noise in that, and the result could easily have been the other way round, even if Toga 1.3x4 really is stronger. I still don't understand the result of Tournament k3... Regards, Eelco"
Well said, Eelco!
Best,
Terry
-
- Posts: 18748
- Joined: Thu Mar 09, 2006 6:40 pm
- Location: US of Europe, germany
- Full name: Thorsten Czub
Re: Some more data to the fire :-)
no it was and is not early. it can be reproduced at any time.
the faster the machine, or the more computation time, the bigger the gap.
togaII 1.2.1a is overtaken by togaII 1.3x4
e.g. here
Engine Points RyToHiPxFrLoSpPyGlthCoToNoPr S-B
01: Rybka v2.3.lk.x64 8,5/9 · 1 1 1 1 = 1 1 1 1 30,25
02: TogaII1.3x4 [default] 7,5/9 · = = 1 1 1 1 1 = 1 26,00
03: Hiarcs11.1UCI 7,5/9 · = 0 1 1 1 1 1 1 1 19,00
04: program x 6,0/9 0 = = · = = 1 1 1 1 20,00
05: Fruit-061115a 6,0/9 0 = = · 1 = 1 = 1 1 18,75
06: LoopMP 12.32 5,0/9 0 0 = · = = 1 1 = 1 13,50
07: Spike1.2 4,5/8 0 0 1 · = = = 1 1 16,25
08: program y 4,5/9 = 0 0 0 = = · 1 1 1 15,50
09: Glaurung121-EM64T 3,0/8 0 0 0 = = = · = 1 9,25
10: theBaron1.8.1Uci 3,0/9 0 0 0 0 = 0 · 1 = 1 6,00
11: Colossus2006f 2,0/9 0 0 = 0 0 0 0 · = 1 4,75
12: TogaII1.2.1a 1,5/8 0 0 0 = 0 0 = = · 5,00
13: Now0704 1,0/8 0 = 0 0 0 0 = · 0 5,25
14: ProDeo1.2 1,0/9 0 0 0 0 0 0 0 0 1 · 1,00
61 of 182 games played
Engine Points HiRyToGlLoChNaToRuNoBaChCoPrZaNo S-B
01: Hiarcs11.1UCI 13,5/15 · = 1 1 1 = 1 = 1 1 1 1 1 1 1 1 92,00
02: Rybka v2.3.lk.w32 11,0/15 = · 0 1 = 0 0 1 1 1 1 1 1 1 1 1 69,00
03: TogaII1.3x4 [default] 10,5/15 0 1 · 1 0 1 1 = 0 1 = 1 = 1 1 1 68,50
04: Glaurung1.2.1-32bit 10,0/15 0 0 0 · = = 1 1 1 1 = 1 = 1 1 1 60,25
05: LoopMP 12.32 9,5/15 0 = 1 = · = = = = = 1 = 1 1 = 1 61,50
06: program x 9,5/15 = 1 0 = = · 0 0 0 1 1 1 1 1 1 1 58,00
07: Naum 2.0 8,5/15 0 1 0 0 = 1 · 1 = = 1 0 1 0 1 1 54,75
08: TogaII 1.2.1a 8,5/15 = 0 = 0 = 1 0 · 1 = = = = 1 1 1 53,75
09: Ruffian2.1.0 8,5/15 0 0 1 0 = 1 = 0 · = = 1 = 1 1 1 51,00
10: Now0704 6,5/15 0 0 0 0 = 0 = = = · 1 1 0 1 1 = 36,75
11: Baron1.8.1.Uci 6,0/15 0 0 = = 0 0 0 = = 0 · 1 1 0 1 1 32,25
12: program y 6,0/15 0 0 0 0 = 0 1 = 0 0 0 · 1 1 1 1 29,50
13: Colossus2006f 4,5/15 0 0 = = 0 0 0 = = 1 0 0 · = 0 1 28,00
14: ProDeo 1.2 4,5/15 0 0 0 0 0 0 1 0 0 0 1 0 = · 1 1 19,75
15: Zarkov 4.5t 2,5/15 0 0 0 0 = 0 0 0 0 0 0 0 1 0 · 1 9,75
16: Now0604 0,5/15 0 0 0 0 0 0 0 0 0 = 0 0 0 0 0 · 3,25
120 of 240 games played
-
- Posts: 18748
- Joined: Thu Mar 09, 2006 6:40 pm
- Location: US of Europe, germany
- Full name: Thorsten Czub
Re: Some more data to the fire :-)
"And if you can really exactly reproduce this any time, then I just don't understand the statistics of Eloratings anymore at all."
?
what is the problem ?
program A is stronger than B.
you make a tournament. and A gets more points than B.
you play another tournament.
A gets more points than B.
you play another tournament.
A gets more points than B.
what is the thing you don't understand ?
In ALL my tournaments toga1.3 gets more points than 1.2.1a.
NO exception. of course the gap between the results is different. depending on the rounds, the time control, the opponents and the hardware.
but the difference between the programs is big enough that in all tournaments the better program is able to get a higher result.
that's what my results show.
if i would have had ONE exception, i would not be so sure.
if i would have had 2 exceptions, i would be not sure.
but there was none so far.
since both programs use performance.bin as book, it also cannot be a matter of the book.
please - which thing is the thing you don't understand ?
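The "A finishes ahead of B every time" argument can be read as a simple sign test. The sketch below assumes independent tournaments and no ties, and the tournament counts are hypothetical, since the thread does not say how many tournaments were played. It shows how likely an unbroken streak would be if the two versions were actually equal in strength.

```python
def prob_all_wins_by_chance(tournaments: int) -> float:
    """Chance that one of two equally strong engines finishes ahead every time.

    Assumes each tournament is an independent fair coin flip (no ties).
    """
    return 0.5 ** tournaments

# Hypothetical tournament counts, for illustration only.
for k in (2, 3, 5, 8):
    print(f"{k} tournaments: {prob_all_wins_by_chance(k):.3f}")
```

A streak of five or more would be hard to explain by luck alone, but the test says nothing about how large the gap is. The independence assumption also matters: the 61-game and 76-game crosstables in this thread appear to be snapshots of the same 182-game event, so they would count as one tournament, not two.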
-
- Posts: 18748
- Joined: Thu Mar 09, 2006 6:40 pm
- Location: US of Europe, germany
- Full name: Thorsten Czub
Re: Some more data to the fire :-)
i have some more games for tournament:
togaII1.3x4 in the leading group of programs with 9 from 11.
togaII1.2.1a 3 from 11.
for me the important question is not why this is a pattern/behaviour i get in all of my tournaments on all machines, for me the major question is why the elo-list guys cannot reproduce this.
Engine Points RyToHiPxFrSpLoPyGlTothCoNoPr S-B
01: Rybka v2.3.lk.x64 10,0/11 · = 1 1 1 1 = 1 1 1 1 1 45,25
02: TogaII1.3x4 [default] 9,0/11 · = = 1 1 1 1 1 1 = = 1 39,00
03: Hiarcs11.1UCI 9,0/11 = · = 1 0 1 1 1 1 1 1 1 36,00
04: Program x 8,0/11 0 = = · = 1 = 1 1 1 1 1 34,00
05: Fruit-061115a 6,5/11 0 = 0 = · = 1 = 1 = 1 1 25,75
06: Spike1.2 6,5/11 0 0 1 0 · = = 1 = 1 1 1 23,00
07: LoopMP 12.32 6,5/11 0 0 = = · = = = 1 1 1 1 21,25
08: Program y 5,5/10 = 0 0 0 = = · 1 1 1 1 24,50
09: Glaurung121-EM64T 4,5/11 0 0 0 = = = 0 · = 1 = 1 15,75
10: TogaII1.2.1a 3,0/11 0 0 0 0 0 = 0 = · = = 1 9,25
11: theBaron1.8.1Uci 3,0/11 0 0 0 0 = 0 0 0 = · 1 1 8,25
12: Colossus2006f 2,5/11 0 = 0 0 = 0 0 0 = 0 · 1 10,25
13: Now0704 1,0/10 0 = 0 0 0 0 0 = 0 · 0 6,75
14: ProDeo1.2 1,0/11 0 0 0 0 0 0 0 0 0 0 1 · 1,00
76 of 182 games played
-
- Posts: 41423
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Re: Some more data to the fire :-)
mclane wrote: "for me the important question is not why this is a pattern/behaviour i get in all of my tournaments on all machines, for me the major question is why the elo-list guys cannot reproduce this."
Are you sure that you have Toga II 1.2.1a set up correctly Thorsten?
Both CCRL and CEGT have played thousands of games with Toga II 1.2.1a at a range of time controls and your results seem to be out of character.
Of course, could just be bad luck.
Regards, Graham.
-
- Posts: 18748
- Joined: Thu Mar 09, 2006 6:40 pm
- Location: US of Europe, germany
- Full name: Thorsten Czub
Re: Some more data to the fire :-)
of course. you forget that this result can be reproduced on several of my machines.
different hardware. different arena versions. different time controls. IMO the result of toga1.2.1a is not out of order, the program plays as it has always played. over time it was overtaken by other programs. togaII1.3x4 is astonishing since it is fighting almost in the top group. with hiarcs and rybka in the group of 1-3 ranked programs.
i did all these tests 1 month or so ago. later i tested blueberry and found it weaker than default.
i stopped testing 1.2.1a.
IMO it does not make much sense to test when a later version and stronger program can replace it.
only because some people asked i did some more tournaments with both versions participating. with the "usual" result.
IMO the gap between the 2 toga versions is more than 10 or 20 elo.
while toga1.2.1a is more in the region of hiarcs x50 hyp.
toga1.3x4 is more in the region of hiarcs11.1.
this means at least 40 ELO points (that togaII1.3x4 is stronger than toga1.2.1a.)
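For scale, a 40 Elo gap can be turned into an expected score and a rough number of direct games needed to see it. This is a minimal sketch assuming the standard logistic Elo curve and the worst-case per-game spread of 0.5 (no draws). The function names and the 1.96 threshold are illustrative choices, not anything used by the rating lists.

```python
import math

def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side for a given Elo gap (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff: float, z: float = 1.96) -> int:
    """Rough head-to-head game count before the edge exceeds z standard errors.

    Uses the worst-case per-game standard deviation of 0.5 (no draws),
    so real matches with many draws would need somewhat fewer games.
    """
    edge = expected_score(elo_diff) - 0.5
    return math.ceil((z * 0.5 / edge) ** 2)

for gap in (10, 20, 40):
    print(f"{gap} Elo: expected score {expected_score(gap):.3f}, "
          f"about {games_needed(gap)} games")
```

A 40 Elo edge corresponds to scoring roughly 56% head to head, which under these assumptions takes a few hundred games to separate from an even match. Gaps of 10 or 20 Elo take far more.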
Last edited by mclane on Thu Apr 12, 2007 12:01 am, edited 3 times in total.