FRC - Hiarcs 11.2 completed (24 ELO weaker than 11.1)

Spock

Post by Spock »

Yes, unfortunately, in common with many other rating lists, including our own 40/4 standard chess list, Hiarcs 11.2 has turned in a performance here that is worse than 11.1 and also worse than 11.

So Hiarcs 11.1 retains its spot at the top of the pure list and best versions list.

A definite pattern can be seen with 11.2: it underperformed against the stronger engines whilst over-performing against weaker opponents.

Most unfortunate, and the first time in the history of this list that a new engine version has performed worse than its predecessor. I do not, however, expect a repeat of this from Hiarcs, and I'm sure that the Hiarcs team will be back fighting with Hiarcs 12 :)

The main list is here
http://www.computerchess.org.uk/ccrl/404FRC/index.html

But you'll need to look at the "Complete List" to find Hiarcs 11.2

Scores against common opponents can be seen here

.
pichy
Posts: 2564
Joined: Thu Mar 09, 2006 3:04 am

Re: FRC - Hiarcs 11.2 completed (24 ELO weaker than 11.1)

Post by pichy »

What would you estimate the newer Rybka's rating will be in FRC? Close to 2985 :?:
Spock

Re: FRC - Hiarcs 11.2 completed (24 ELO weaker than 11.1)

Post by Spock »

pichy wrote:
What would you estimate the newer Rybka's rating will be in FRC? Close to 2985 :?:
Not sure. The 32-bit version may struggle to get to that level, but the 64-bit version will certainly be at least that.
Eelco de Groot
Posts: 4669
Joined: Sun Mar 12, 2006 2:40 am
Full name:   Eelco de Groot

Re: FRC - Hiarcs 11.2 completed (24 ELO weaker than 11.1)

Post by Eelco de Groot »

Thanks for testing Hiarcs 11.2, Ray! Even if the result is a bit unexpected, it is important to know how strong the new version is.

I think that Mark should go back to trying to improve Hiarcs in Normal Playing Style, so that we would just have to switch on Hypermodern to get some extra Elo points. That seemed to work better :)

By the way Ray, I think it may also be a good thing if, in your FRC testing, all engines play each other even when the Elo ratings show a big gap. If the Elo rating system works as it should, also for computer chess, it should of course not matter at all how large the difference between opponents is. And if it does make a difference to the final results, it is possible that playing all against all, or something approaching this, is still the best way. Although the picture comparing versions that are close together in the list may suffer a little, the integrity of the whole list is probably better, especially with the large number of games that you are playing to reduce randomness.

I think it is possible Harm Geert Muller might say something similar, but at the moment I believe he is busier discussing multithreading issues with Robert Hyatt :)

Apropos, maybe you or Jorge know a bit more about this: I could not find anything about whether the Chess960 version of Rybka will be available separately or will just be an option in Rybka 3.0. Maybe I missed it, if Vasik Rajlich already said something about this? I have not yet bought Rybka 2.3, but 3.0 is still such a long time to wait for :?

Thanks for doing your tests Ray!

Fischer Random Regards,
Eelco
Spock

Re: FRC - Hiarcs 11.2 completed (24 ELO weaker than 11.1)

Post by Spock »

Thanks for your interest in the FRC list :)

I currently play pairs where the Elo gap is <= 300. It is probably something I can't win on: if I play pairs with a bigger Elo gap, then some people will criticise that, and in fact have criticised CCRL for it in the past. You, however, would like to see them played... Is it really meaningful, for example, for Hiarcs to play Ayito, a 500+ difference? I'm certainly up for the discussion, and for playing the games if on balance that is what the audience for this list would prefer. The FRC list is a bit unique and doesn't necessarily have to follow the same rules as our standard chess list.
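
For a rough feel of what a 300 or 500 point gap means, here is a minimal sketch using the standard logistic Elo expected-score formula. The function name `expected_score` is made up for illustration, and the sketch says nothing about draw rates or how any particular rating list computes its numbers.

```python
def expected_score(elo_gap: float) -> float:
    """Expected score (0..1) of the stronger side under the logistic Elo model.

    elo_gap is the rating of the stronger engine minus that of the weaker one.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))


# Illustrative gaps only: the 300 cut-off and the 500+ example from the post.
for gap in (100, 300, 500):
    print(f"gap {gap:>3}: expected score {expected_score(gap):.1%}")
```

Under this model the stronger side is expected to score roughly 85% at a 300 point gap and roughly 95% at a 500 point gap, so the wider pairings are lopsided but not a foregone conclusion.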

The next development on the list is to include 64-bit engines. Currently it is 32-bit only, simply because a 32-bit machine was all I had spare when I started this list. Now all my machines are 64-bit, so I want to "upgrade" it. So I'll soon be playing Naum x64 and Glaurung x64.

Rybka 2.3.2 960 is a private engine. Rybka 3.0 will be the first public version to support FRC.

I have Rybka 2.3.2 960, and it is testing now. The 32-bit results will hopefully be on the list within the next 24 hrs or so. The 64-bit version will then follow, together with 64-bit Naum and Glaurung as above.
Uri Blass
Posts: 10890
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: FRC - Hiarcs 11.2 completed (24 ELO weaker than 11.1)

Post by Uri Blass »

I think that no result is meaningless.
You are free to play matches when the difference is high (it may be interesting to see what the minimal rating difference is at which we get a 100-0 result), and you are also free to play matches at a longer time control, so we can see whether we get a different ranking at different time controls.

It may be possible that Deep Sjeng is better than Glaurung 1.2.1 at 40/40.

Rank  Name                  Rating    +     −    Score   AvOp    Draws   Games
7     Glaurung 1.2.1        2767     +14   −14   48.0%   +9.9    21.6%   2000
      (LOS 98.6%)
8     Deep Sjeng 2.5 1CPU   2745     +14   −14   47.0%   +15.6   22.0%   1800


Note that in normal chess, Deep Sjeng seems to have gained 70 Elo in the transition from 40/4 to 40/40, whereas Glaurung gained only 5 Elo.

It may also be possible that Naum has a better place at 40/40, because this engine seems to perform better at longer time controls, based on CEGT and CCRL.

Uri
Andrew
Posts: 231
Joined: Thu Mar 09, 2006 12:51 am
Location: Australia

Re: FRC - Hiarcs 11.2 completed (24 ELO weaker than 11.1)

Post by Andrew »

But isn't it true that 11.2 was just meant to fix up a few problems, and wasn't meant to be an improvement on 11.1?

Also, the errors on your list for the two versions are ±16 and ±20. If these are 1 standard deviation, then the observed difference has no statistical significance. "Most unfortunate" isn't a fair appraisal.

Andrew
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: FRC - Hiarcs 11.2 completed (24 ELO weaker than 11.1)

Post by Dirt »

Andrew wrote: But isn't it true that 11.2 was just meant to fix up a few problems, and wasn't meant to be an improvement on 11.1?

Also, the errors on your list for the two versions are ±16 and ±20. If these are 1 standard deviation, then the observed difference has no statistical significance. "Most unfortunate" isn't a fair appraisal.

Andrew
The standard in chess ratings seems to be two standard deviations (or 95% confidence). There would still be a fair chance of the difference being statistical error, but 11.2 is rated lower in some standard chess rating lists. A lower rating in FRC too isn't surprising.
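
To put rough numbers on the two readings of the error bars, here is a minimal sketch, not the method any rating list actually uses. It assumes the two errors are independent and normally distributed (in a real list they share opponents, so this is only an approximation), and the helper names `z_score` and `one_sided_p` are made up for illustration. The inputs are the 24 Elo gap and the 16 and 20 point errors quoted in this thread.

```python
import math


def z_score(diff: float, err_a: float, err_b: float, sds_per_err: float) -> float:
    """z-score of a rating difference, treating the two errors as independent.

    sds_per_err is how many standard deviations each quoted error bar spans:
    1.0 if the bars are one SD, 2.0 if they are roughly 95% intervals.
    """
    sd_a = err_a / sds_per_err
    sd_b = err_b / sds_per_err
    combined_sd = math.hypot(sd_a, sd_b)  # sqrt(sd_a**2 + sd_b**2)
    return diff / combined_sd


def one_sided_p(z: float) -> float:
    """Chance of a gap at least this large if the true strengths were equal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for label, k in (("error bars = 1 SD", 1.0), ("error bars = 2 SD", 2.0)):
    z = z_score(24.0, 16.0, 20.0, k)
    print(f"{label}: z = {z:.2f}, one-sided p = {one_sided_p(z):.3f}")
```

Under these assumptions the gap is a little under one combined standard deviation if the bars are 1 SD, and a little under two if they are 2 SD, which matches both comments: not significant on the first reading, and borderline on the second.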
Uri Blass
Posts: 10890
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: FRC - Hiarcs 11.2 completed (24 ELO weaker than 11.1)

Post by Uri Blass »

Dirt wrote:
Andrew wrote: But isn't it true that 11.2 was just meant to fix up a few problems, and wasn't meant to be an improvement on 11.1?

Also, the errors on your list for the two versions are ±16 and ±20. If these are 1 standard deviation, then the observed difference has no statistical significance. "Most unfortunate" isn't a fair appraisal.

Andrew
The standard in chess ratings seems to be two standard deviations (or 95% confidence). There would still be a fair chance of the difference being statistical error, but 11.2 is rated lower in some standard chess rating lists. A lower rating in FRC too isn't surprising.
A lower rating in FRC is certainly surprising because 11.2 was clearly supposed to be an improvement:

http://64.68.157.89/forum/viewtopic.php ... 24&t=15384

Harvey Williamson


No, it is not the version that played in last weekend's event.

It is an improved version of 11.1 with a few enhancements.
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: FRC - Hiarcs 11.2 completed (24 ELO weaker than 11.1)

Post by Dirt »

Uri Blass wrote:
Dirt wrote:
Andrew wrote: But isn't it true that 11.2 was just meant to fix up a few problems, and wasn't meant to be an improvement on 11.1?

Also, the errors on your list for the two versions are ±16 and ±20. If these are 1 standard deviation, then the observed difference has no statistical significance. "Most unfortunate" isn't a fair appraisal.

Andrew
The standard in chess ratings seems to be two standard deviations (or 95% confidence). There would still be a fair chance of the difference being statistical error, but 11.2 is rated lower in some standard chess rating lists. A lower rating in FRC too isn't surprising.
A lower rating in FRC is certainly surprising because 11.2 was clearly supposed to be an improvement:

http://64.68.157.89/forum/viewtopic.php ... 24&t=15384

Harvey Williamson


No, it is not the version that played in last weekend's event.

It is an improved version of 11.1 with a few enhancements.
I think that was before the testing by the major testing groups. Since it is now seen to be performing worse in standard chess, why do you think it is surprising it is also performing worse in FRC?