CEGT - rating lists March 11th 2012

ThatsIt
Posts: 991
Joined: Thu Mar 09, 2006 2:11 pm

Re: CEGT - rating lists March 11th 2012

Post by ThatsIt »

IWB wrote:
[...snip...]
ThatsIt wrote:
IWB wrote: Are you using books or starting positions for that?
Both.
THAT is the biggest disadvantage, as a test is not repeatable then! Engine A vs B is different than Engine A vs C! Whether the RATING is different (especially with enough games) is a completely different question.
Hi Ingo!

Even if you always use one and the same testset you
will never get the same result(s).
Do the following:
play a match Engine A vs Engine B with your testset.
After that reboot the machine and do exactly the same again.
I predict:
a.) you will not get the same result (+- 5%)
b.) you will not get the same games (probably more than 10%-15% different games ?)
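
For a rough sense of how much a match score can move between two identical runs from sampling noise alone, here is a minimal Python sketch. It assumes independent games and a hypothetical 30% win / 40% draw / 30% loss split; the function name and these rates are illustrative and not taken from any CEGT data.

```python
import math

def score_standard_error(n_games, win_rate, draw_rate):
    """Standard error of the mean score per game (win=1, draw=0.5, loss=0).

    Assumes every game is an independent draw from the same
    win/draw/loss distribution, which is a simplification.
    """
    loss_rate = 1.0 - win_rate - draw_rate
    mean = win_rate + 0.5 * draw_rate
    # Variance of a single game's score around the mean score.
    variance = (win_rate * (1.0 - mean) ** 2
                + draw_rate * (0.5 - mean) ** 2
                + loss_rate * (0.0 - mean) ** 2)
    return math.sqrt(variance / n_games)

# Hypothetical rates, chosen only for illustration.
for n in (100, 150, 2000):
    se = score_standard_error(n, win_rate=0.3, draw_rate=0.4)
    # 1.96 standard errors is the usual ~95% margin for a normal approximation.
    print(f"{n} games: about +/- {100 * 1.96 * se:.1f}% at ~95% confidence")
```

Under these assumptions the margin shrinks with the square root of the number of games, so quadrupling the number of games roughly halves it.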

Best wishes,
G.S.
ThatsIt
Posts: 991
Joined: Thu Mar 09, 2006 2:11 pm

Re: CEGT - rating lists March 11th 2012

Post by ThatsIt »

geots wrote: I have been patiently waiting for your updated blitz list. Thanks, Gerhard.
george
You are welcome, George.
btw.:
the blitz-list is made by Wolfgang, Werner and me.

Best wishes,
G.S.
ernest
Posts: 2041
Joined: Wed Mar 08, 2006 8:30 pm

Re: CEGT - rating lists March 11th 2012

Post by ernest »

ThatsIt wrote: you will not get the same games (probably more than 10%-15% different games ?)
Hi Gerhard,

What do you mean by same games?
Obviously at some point (15th move?, 20th move?...) the move will be different, even with 1 thread, mainly because the TC cannot work twice exactly the same (small timing differences)...
ThatsIt
Posts: 991
Joined: Thu Mar 09, 2006 2:11 pm

Re: CEGT - rating lists March 11th 2012

Post by ThatsIt »

ernest wrote:
ThatsIt wrote: you will not get the same games (probably more than 10%-15% different games ?)
What do you mean by same games?
Obviously at some point (15th move?, 20th move?...) the move will be different, even with 1 thread, mainly because the TC cannot work twice exactly the same (small timing differences)...
That's the point, Ernest.
And sometimes (4-5% maybe?) the result will be different.

Keep in mind, Ingo wrote:
"THAT is the biggest disadvantage, as a test is not repeatable then!
Engine A vs B is different than Engine A vs C! Whether the RATING is
different (especially with enough games) is a completely different question."

Best wishes,
G.S.
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: CEGT - rating lists March 11th 2012

Post by IWB »

ThatsIt wrote:
ernest wrote:
ThatsIt wrote: you will not get the same games (probably more than 10%-15% different games ?)
What do you mean by same games?
Obviously at some point (15th move?, 20th move?...) the move will be different, even with 1 thread, mainly because the TC cannot work twice exactly the same (small timing differences)...
That's the point, Ernest.
And sometimes (4-5% maybe?) the result will be different.

Keep in mind, Ingo wrote:
"THAT is the biggest disadvantage, as a test is not repeatable then!
Engine A vs B is different than Engine A vs C! Whether the RATING is
different (especially with enough games) is a completely different question."
The main difference is that in one case the engines "decide" to play something different, in the other case YOU do that. Even if the result might be similar, it is a conceptual mistake, as YOU should interfere with the result as little as possible!

Bye
Ingo
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: CEGT - rating lists March 11th 2012

Post by IWB »

ThatsIt wrote:
Even if you always use one and the same testset you
will never get the same result(s).
Not the same, but when playing many games, nearly the same (with my 2000+ games, certainly much less than 5%).
ThatsIt wrote: Do the following:
play a match Engine A vs Engine B with your testset.
After that reboot the machine and do exactly the same again.
I predict:
a.) you will not get the same result (+- 5%)
Hmm, it was around 5% with 100 games; my guess (not checked) is that I am below 5% with 150 now.
ThatsIt wrote: b.) you will not get the same games (probably more than 10%-15% different games ?)
Actually I guess that I will get MUCH higher rates of different games!

But that is not the point. If you want structured testing, you have to have conditions which are repeatable (as far as possible; we all have to compromise sometimes, but openings are not necessary and do not belong to that compromise). If the engines decide to do something different, fine. If you "make" it different, that is a wrong concept of testing!

Bye
Ingo
Norm Pollock
Posts: 1056
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Re: CEGT - rating lists March 11th 2012

Post by Norm Pollock »

Hi Werner,

I have trouble uncompressing cegttotal.zip for 40/20 for both 3/11 and 3/18 downloads.

The 7-Zip file manager says "unsupported compression method for cegttotal.pgn".

-Norm
ThatsIt
Posts: 991
Joined: Thu Mar 09, 2006 2:11 pm

Re: CEGT - rating lists March 11th 2012

Post by ThatsIt »

IWB wrote: [...snip...]
But that is not the point. If you want structured testing, you have to have conditions which are repeatable (as far as possible; we all have to compromise sometimes, but openings are not necessary and do not belong to that compromise). If the engines decide to do something different, fine. If you "make" it different, that is a wrong concept of testing!
I do not agree, but that's unimportant. We're talking about +/- 5 points!

Best wishes,
G.S.
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: CEGT - rating lists March 11th 2012

Post by IWB »

ThatsIt wrote:
IWB wrote: [...snip...]
But that is not the point. If you want structured testing, you have to have conditions which are repeatable (as far as possible; we all have to compromise sometimes, but openings are not necessary and do not belong to that compromise). If the engines decide to do something different, fine. If you "make" it different, that is a wrong concept of testing!
I do not agree, but that's unimportant. We're talking about +/- 5 points!

Best wishes,
G.S.
There is nothing to "agree or not" with, unless you deny the basic principles of scientific work! :-)

However, you are right with the result of +/- 5 Elo.
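
To relate a rating gap such as 5 Elo to a score percentage, the standard logistic Elo expectation formula can be used. The following is only a small Python sketch; the function names are made up for illustration, and it ignores how any particular rating list actually computes its ratings.

```python
import math

def score_from_elo_diff(diff):
    """Expected score (0..1) for a player rated `diff` Elo above the opponent,
    using the standard logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def elo_diff_from_score(score):
    """Inverse of the formula above; `score` must be strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# What a 5 Elo gap means as an expected score percentage.
print(f"5 Elo -> {100 * score_from_elo_diff(5):.2f}% expected score")
# What a 55% match score corresponds to in Elo.
print(f"55% score -> {elo_diff_from_score(0.55):.1f} Elo")
```

Under this formula a 5 Elo gap corresponds to an expected score of roughly 50.7%, which is well inside the run-to-run variation of a match of a few hundred games.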

Bye
Ingo