Re: For CCRL Tester/s- One More.....

Posted: Mon Mar 25, 2013 6:07 am
by geots
Would you happen to know how CEGT would handle the same scenario as above?



And thanks again in advance-

george

Re: Toga II 3.0 released

Posted: Mon Mar 25, 2013 6:24 am
by Eelco de Groot
Eelco de Groot wrote: I see that Teemu Pudas (Vempele) writes in the same thread that he adapted the code, which would imply it is already in one of the CMLX versions. I did not remember that... He probably placed the code better; my version was placed somewhere among the 'win' recognizers/situations early in eval.cpp, but I don't know how Teemu did it.
It did not arrive later that day but one year later, and there it was: Toga CMLX 1.4.5e. Teemu's Toga CMLX 1.4.5e version is hopefully still downloadable from this 2009 thread on the Rybka forum.

Eelco

Re: Toga II 3.0 released

Posted: Mon Mar 25, 2013 6:51 am
by tjfroh
Toga II 3.0 is going head to head with Stockfish 2.3. on all my dual-processor machines. Each has two wins and one draw, for a total of five games. A very small sample size, but if 1,000,000 players get the same results then that is statistically significant.

I am impressed.


TJF

PS I haven't installed Houdini 3 yet. I am waiting for my eight-core machine with 32 GB of RAM to arrive.

Re: Toga II 3.0 released

Posted: Mon Mar 25, 2013 7:23 am
by IWB
geots wrote:Ok- my mistake. I guess Ingo is running Z Mexico 64bit.
Yes, I do run the 64-bit version. Anyone who bothers to read my conditions will see that very clearly. Unfortunately you are not the only one who doesn't care and just looks at the rank, the rating and, if in a good mood, the time control. That the other conditions are at least as important seems to be ignored by many ... at least according to the emails I get from my web site ... :-(
geots wrote: ... And don't listen to anyone who says 200 games mean nothing and you need over 1000. Stats are what they are, but if under 1000 meant nothing, you could throw out about 50% or more of CCRLs ratings.
Interesting thought :-)! But not what I meant. I meant 150(or 200) games against ONE opponent. That IS irrelevant!
This is an excerpt from the latest Toga run:

Toga II 3.0 32b - Zappa Mexico II (2703)		63.0	-	87.0		42.00%		Perf=2647
Toga II 3.0 32b - Chiron 1.5 (2845)		57.5	-	92.5		38.33%		Perf=2763
That is a difference of 116 Elo* - which can't be explained by statistics! It is simply the difference that arises because one opponent's playing style suits the engine better than the other's. So:
1. You can't draw conclusions from 150 games against ONE opponent
2. You only get a valid average result if you run against a larger number of opponents (which makes lists that use the same opponent many times and just a few others doubtful)

Of course, if you are just interested in how the engine performs against one particular engine, you might consider the result interesting ...

Bye
Ingo

*some engines even have a bigger performance gap than 116 Elo between opponents
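
The Perf column in the excerpt above can be approximated with the standard logistic Elo formula. The sketch below is an assumption about how such figures are derived, not a description of the IPON tooling, and `performance_rating` is a hypothetical helper name. Under that assumption it reproduces both Perf values, and the gap between them, to within about one Elo point.

```python
import math

def performance_rating(points: float, games: int, opponent_rating: float) -> float:
    """Performance rating under the logistic Elo model (assumed, not IPON's own code)."""
    score = points / games
    if not 0.0 < score < 1.0:
        # The logistic formula is undefined for 0% and 100% scores.
        raise ValueError("score must be strictly between 0 and 1")
    return opponent_rating + 400.0 * math.log10(score / (1.0 - score))

# Match results taken from the excerpt above.
vs_zappa = performance_rating(63.0, 150, 2703)
vs_chiron = performance_rating(57.5, 150, 2845)
print(round(vs_zappa), round(vs_chiron), round(vs_chiron - vs_zappa))
```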

Re: Toga II 3.0 released

Posted: Mon Mar 25, 2013 7:40 am
by jd1
Ok thanks Eelco!

I'll have a look when I start to work on Toga again.

Jerry

Re: Toga II 3.0 released

Posted: Mon Mar 25, 2013 7:47 am
by geots
You have no clue what I looked at and care about or don't care about. FYI, I saw 32b, but I mistakenly put it with the engine on the right. Have you ever considered that in your info you have "32b" highlighted in red? A lot of people might very well think all they have to look for is the red, and when they don't see it they might assume it is 64. I would highlight it in both places, or in neither, if it were me. But it isn't me, so whatever.


So if I can't draw conclusions from 150 games against ONE opponent, then I don't suppose I can draw conclusions from 50 games against ONE opponent. Bullshit. I have a 50-game match where a lower-tier Ivanhoe version beat Fritz 13 (wins and losses only) 29-2. I have two more 50-game matches where Strelka 5.6 beat the strongest Ivanhoe in existence at the moment, 32-4 and 31-6. If I were you, I would add a caveat to your statement. The results are true. If I sounded a bit sarcastic, that is probably true as well, as I'm not at the moment in the mood for testing lessons. But no harm, no foul.


Best,

Please IGNORE Both "Help" Threads- Sorry!

Posted: Mon Mar 25, 2013 8:43 am
by geots
I needed to go ahead and run the matches- I will live with 40/3.


Best,

Re: Toga II 3.0 released

Posted: Mon Mar 25, 2013 12:28 pm
by Uri Blass
I think that it is possible to draw conclusions from 150 games against a single opponent; the question is which conclusions.

A program may perform better against one opponent because of playing style, but there is a limit to that effect.

If I see, for example, a result like 120-30, I can be practically sure that the winner is better.
If I see, for example, a result like 90-60, I cannot be practically sure that the winner is better, but I can be practically sure that the winner is not more than 100 Elo worse than the loser.

These conclusions are based on common sense and previous experience, and I was relatively careful here.

I believe that there is not a single practical case where A scores even 70% against B in 150 games while A scores worse than B against C.

Note that C should have a rating similar to the average of A and B (not more than 100 Elo difference).

If you think that I am wrong, then I would like to see a single case, including the names of the programs A, B and C, so that everybody can reproduce the results (with possible small differences).

Note that I do not claim that it is impossible to build programs A, B and C for which it happens, only that it practically does not happen.
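
A rough way to put numbers on the 120-30 and 90-60 examples above is a normal-approximation interval on the match score, converted to Elo. This is only a sketch: it assumes independent games and the logistic Elo model, it treats every game as a win or a loss (which overstates the variance when there are draws), and the helper names are made up for illustration. It covers sampling error only and says nothing about the playing-style effect under discussion.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))

def score_interval(points: float, games: int, z: float = 1.96):
    """Approximate interval for the true score fraction.

    Uses p * (1 - p) as the per-game variance, which is an upper bound:
    draws can only make the real variance smaller.
    """
    p = points / games
    se = math.sqrt(p * (1.0 - p) / games)
    return p - z * se, p + z * se

for points in (120, 90):
    low, high = score_interval(points, 150)
    print(points, round(elo_diff(low)), round(elo_diff(high)))
```

Under these assumptions the whole 120-30 interval lies far above zero, while the lower end of the 90-60 interval is only about 15 Elo above zero.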

Re: Toga II 3.0 released

Posted: Mon Mar 25, 2013 12:50 pm
by geots
You are NOT wrong, my friend. I could not have said it any better!



Best,

george

Re: Toga II 3.0 released

Posted: Mon Mar 25, 2013 4:28 pm
by IWB
I have no idea why people are nitpicking so much. Your example of 70% is about a 150 Elo difference (or 105 to 45 in a 150-game match), while in the given case we have a 50 Elo difference.
So yes, there are cases where you can draw conclusions:
1. Where you know the result in advance and
2. Where the differences are huge.

The Toga example shows quite nicely that a displayed 50 Elo difference is in reality 0, and that goes in both directions. Where Toga underperformed by roughly 70 Elo against Zappa, it overperformed by ~40 Elo vs Chiron. If you guys are happy with a 150 Elo (70%) accuracy, then fine, you can draw conclusions; for me this is not good enough.

Best
Ingo

EDIT: All the data of the IPON is online; you can search for any best/worst performance there, and I am sure there are examples with more than 116 Elo ... (Toga will be updated in a few hours)
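
The conversion between Elo difference and match score used in this post can be checked with the logistic expectation formula. A minimal sketch, assuming the standard logistic Elo model; `expected_score` is an illustrative name and not part of any rating tool mentioned here:

```python
def expected_score(elo_diff: float) -> float:
    """Expected score fraction for a player rated elo_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# Elo gap -> score percentage and points out of a 150-game match.
for diff in (50, 150):
    s = expected_score(diff)
    print(diff, round(100 * s, 1), round(150 * s, 1))
```

With this model 150 Elo corresponds to roughly 70% (about 105 of 150 points) and 50 Elo to roughly 57%.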