CCRL update (11th July 2008)

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Graham Banks
Posts: 45320
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: CCRL update (11th July 2008)

Post by Graham Banks »

Norm Pollock wrote:
Hi Graham,

I noticed that there are 6 "Beta" versions of Toga in the 40/40 lists. I did not see any "Beta" versions for other engines. Why only Toga?

And why is the "Beta" testing included in the engine ratings? Why doesn't CCRL wait until the final released version of each Toga version is out before doing an engine rating test?

-Norm
When we first started, we had a policy to exclude the testing of betas and private engines.
However, we relaxed that rule once we got our lists well established.
Shaun has always been a beta tester for Toga and that largely explains the proliferation of Toga testing in particular.

Regards, Graham.
gbanksnz at gmail.com
Norm Pollock
Posts: 1079
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Re: CCRL update (11th July 2008)

Post by Norm Pollock »

Graham Banks wrote:
When we first started, we had a policy to exclude the testing of betas and private engines.
However, we relaxed that rule once we got our lists well established.
Shaun has always been a beta tester for Toga and that largely explains the proliferation of Toga testing in particular.

Regards, Graham.
Hi Graham,

From my POV, you were right when you started, especially with regards to Beta testing. I think that Beta testing mixed with engine rating testing makes a strange broth.

-Norm
Graham Banks
Posts: 45320
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: CCRL update (11th July 2008)

Post by Graham Banks »

Norm Pollock wrote:
Hi Graham,

From my POV, you were right when you started, especially with regards to Beta testing. I think that Beta testing mixed with engine rating testing makes a strange broth.

-Norm
Hi Norm,

this has been discussed before and there are differing points of view, as there are with regards to private engines also.

The other thing we discussed when we started was whether or not to adopt a scientific approach to our testing, that is each engine playing a set number of games against all other engines in the list.
However, it was decided that freedom of testing was a big part of the fun aspect of testing - being able to run tournaments, etc.

The whole concept of CCRL was to gather together a group of engine testers to pool their results, thereby establishing a more meaningful rating list than could be produced as individuals.

Cheers, Graham.
Shaun
Posts: 323
Joined: Wed Mar 08, 2006 9:55 pm
Location: Brighton - UK

Re: CCRL update (11th July 2008)

Post by Shaun »

Graham Banks wrote:
IWB wrote:
IWB wrote:
Hello Graham,

as you know, I am interested in Shredder; because of that I had a closer look at your statistics
http://www.computerchess.org.uk/ccrl/40 ... 4-bit_1CPU

Do I interpret that list correctly? You include a couple of opponents against which Shredder played just ONE single game? If so, I think that is "strange" at the very least, regardless of whether it was a win or a loss.

On the other hand, the opponent with the most games is Naum (3.1 and 2.2: 112 games out of 368, roughly 1/3 of all games). If Shredder underperforms against this large share of its opponents, the rating is correct, but too low compared to ALL opponents. (I am not blaming Naum for being better than Shredder in a direct comparison!)

This criticism is not because it is Shredder; I just happened to check there. It is most likely valid for some other engines too, in the positive or the negative direction. I strongly think that for a useful rating list the number of opponents should be as high as possible AND the number of games against these opponents should be equal.

Nevertheless, I have great respect for the huge amount of work you as a team are doing! Thanks for that.

Bye
Ingo
... and of course thanks for playing more games with Shredder! I see that you as a team made the effort in the last week and I appreciate that!

One more question: as I see that the rating of Shredder dropped during the latest testing, is there a way to see against which opponents these recent games were played?

Thx again,
Ingo
Hi Ingo,

I'll get Shaun to answer your questions as he is the one carrying out further Shredder 11 64-bit 1CPU testing.

Regards, Graham.
Hi Ingo,

I think it is fair to say that the single CPU list needs some work...

There has been a lot of focus on the quad testing, and if you look at the engines there, things are much better.

I will be running fewer quad games over the coming weeks, with a focus on the single CPU list to help sort this out.

However, as always, it is difficult to juggle priorities as there are so many engines that deserve testing.

Shaun

P.S. Thanks for bringing this to our attention as the 1CPU rating list is critical too.
Shaun
Posts: 323
Joined: Wed Mar 08, 2006 9:55 pm
Location: Brighton - UK

Re: CCRL update (11th July 2008)

Post by Shaun »

Graham Banks wrote:
ernest wrote:Graham, concerning Naum, what is Naum 3.1 64-bit PTnormal? :o
Thanks
Hi Ernest,

I'll get Shaun to explain. :P

Regards, Graham.
Hi Ernest,

this is one of the settings in Naum 3.1 that was changed from its default:

Pruning Type, set to 'normal' (the default is 'smart').

Shaun
Shaun
Posts: 323
Joined: Wed Mar 08, 2006 9:55 pm
Location: Brighton - UK

Re: CCRL update (11th July 2008)

Post by Shaun »

Norm Pollock wrote:
Hi Graham,

From my POV, you were right when you started, especially with regards to Beta testing. I think that Beta testing mixed with engine rating testing makes a strange broth.

-Norm
Hi Norm,

a couple of quick points: you can exclude betas and private versions by looking at the pure rating lists.

Whether to test/include betas or not is a difficult subject:

some authors release many versions and none of them are declared betas; other authors seem reluctant to remove the beta status.

With the 40/40 list we tend not to jump on the latest beta but wait for the blitz results first; hence you will find many more versions and settings tested in the blitz list.

However, again, the pure list removes any possible distortion caused by too many engines from the same family being present.

And although Toga is the most obvious because "beta" is in the name, it is not the only beta tested in the 40/40 list: for instance, Glaurung 2 epsilon/5 32-bit was a beta, as is Pro Deo 1.6b, and there are bound to be more.

all the best

Shaun
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: CCRL update (11th July 2008)

Post by IWB »

Shaun wrote:

Hi Ingo,

I think it is fair to say that the single CPU list needs some work...

There has been a lot of focus on the quad testing, and if you look at the engines there, things are much better.

I will be running fewer quad games over the coming weeks, with a focus on the single CPU list to help sort this out.

However, as always, it is difficult to juggle priorities as there are so many engines that deserve testing.

Shaun

P.S. Thanks for bringing this to our attention as the 1CPU rating list is critical too.

Thank you Shaun,

I like the single-CPU list as it best shows the improvements in an engine's search and eval - that's why my attention is there.

Yes, it may need some work, but at least if you balance one engine it automatically balances another one as well - and it is faster than the Quad list :-)

Thanks for all your work again
Ingo