Komodo 5 release now available!


geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: Komodo 5 release now available!

Post by geots »

lkaufman wrote: Some comments:
1. Komodo should outsearch Houdini but not Stockfish, because Komodo is in between them (assuming Houdini is the same as Ivanhoe in this matter) in how much we reduce search depth in general.
2. Using a score of one pawn as a threshold seems far too high. In some programs such a score means over a 90% chance to win.
3. Using any score to cancel games is a potential source of bias, because it depends on which program reports the score. A given score has different meanings in different programs: Stockfish's scores run nearly double Houdini's. As in most things, Komodo is in the middle.
4. It is far better to test with "test suites" or opening books designed for such tests rather than generic ones. Then you needn't worry about the score out of book; that has already been done for you.
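Point 3 can be made concrete with a small sketch. It assumes, purely for illustration, that each engine's score maps to an expected result through a logistic curve with its own scale; the constants below are invented, chosen only so that Stockfish's scale is about double Houdini's with Komodo in between, as described above. Under that assumption the same +1.00 means a different expected result for each engine, so a raw-score cutoff treats them differently.

```python
import math

# Hypothetical centipawn scales, NOT measured values. Only the ratios follow
# the remark above: Stockfish's scores run about double Houdini's, Komodo between.
SCALE_CP = {"Houdini": 50.0, "Komodo": 75.0, "Stockfish": 100.0}


def expected_result(engine: str, score_cp: float) -> float:
    """Map an engine's reported score (centipawns) to an expected result in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-score_cp / SCALE_CP[engine]))


def discard_by_raw_score(score_cp: float, threshold_cp: float = 100.0) -> bool:
    """Raw-score rule: drop the game if the out-of-book score exceeds the threshold."""
    return abs(score_cp) > threshold_cp


def discard_by_expected_result(engine: str, score_cp: float, cutoff: float = 0.75) -> bool:
    """Engine-aware rule: drop the game only if the expected result is lopsided."""
    e = expected_result(engine, score_cp)
    return max(e, 1.0 - e) > cutoff


for engine in SCALE_CP:
    e = expected_result(engine, 100.0)
    print(f"{engine:9s} +1.00 -> expected result {e:.2f}, "
          f"discard (engine-aware): {discard_by_expected_result(engine, 100.0)}")
```

The raw-score rule gives every engine's +1.00 the same verdict; the engine-aware rule does not, which is exactly the bias described in point 3.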


Yes, I am sure you are correct about the test suites, but life is too short to spend much of it doing something you dislike, and I dislike test suites. I am afraid the engines will have to suffer through a generic book with me.

I will go along with a lot, but worrying about +1.00 meaning two different things to two different engines is not something I am going to get tangled up in or let affect the way I test. To me that is overkill. Actually, with a 12-move opening limit you could award an engine a win every time it comes out of the opening showing +1.00, and I doubt it would change the Elo difference by even a full point in the end.
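George's estimate is easy to check with a little arithmetic. The sketch below assumes the usual logistic Elo formula and a hypothetical 1000-game match at an even score; it counts how many draws would have to be rescored as wins (adjudicated at +1.00, say) before the Elo difference moves by a full point. It says nothing about how often +1.00 actually occurs after a 12-move book, which the thread does not report.

```python
import math


def elo_diff(points: float, games: int) -> float:
    """Elo difference implied by a match score (logistic model, draws = 0.5)."""
    s = points / games
    return 400.0 * math.log10(s / (1.0 - s))


def adjudication_shift(points: float, games: int, draws_to_wins: int = 0,
                       losses_to_wins: int = 0) -> float:
    """Change in Elo difference if some draws/losses were scored as wins instead."""
    new_points = points + 0.5 * draws_to_wins + 1.0 * losses_to_wins
    return elo_diff(new_points, games) - elo_diff(points, games)


# Hypothetical 1000-game match at exactly 50%.
needed = next(k for k in range(1, 1000)
              if adjudication_shift(500, 1000, draws_to_wins=k) >= 1.0)
print(f"draws that must become wins to move the Elo diff by 1 point: {needed}")
```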

But there is another issue that is aggravating me more than anything else right now. My machine is fast enough that a 40/40 time control benchmarks to 40/21 on it, so my system is not the problem. I am fast getting to the point where I don't know how much longer I can post results here. I click "submit" and sometimes have to sit watching the little circle in the top left spin counterclockwise for about two minutes before it decides to change course and head the other way. This forum needs work. Mostly you suffer with that in the other section, where results are posted. It started not long ago and keeps getting worse. Maybe someone is late paying the bill.


Best,

george
Uri Blass
Posts: 10409
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Komodo 5 release now available!

Post by Uri Blass »

I think it is better simply not to allow start positions whose evaluation by some accepted strong program is more than +1 in the first place.

What surprises me is that we have almost no games between 1-CPU and 4-CPU engines.

This leads me to the conclusion that if I decide to become a tester, I am going to test only 1-CPU engines against 4-CPU engines. If we have two groups of engines with almost no games between them, the rating list may be distorted, and the rating difference between the 4-CPU and 1-CPU groups may be wrong.

Note that there are games of Komodo 4 against 4-CPU engines, but I do not see games of Houdini 1.5a 64-bit against 4-CPU engines. For some reason people test Houdini 32-bit only against 32-bit programs, which may distort its rating because it does not get stronger opponents like Stockfish 2.2.2 64-bit 4-CPU.
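Uri's worry is about how well the two hardware groups are connected by games. One way to look at it, sketched below with invented engine names and game counts, is to treat the rating list as a graph: engines are nodes, and an edge means they played each other. A few cross-group games are enough to make the graph connected, so the sketch also counts how many games actually cross between groups, since that number is what anchors the offset between them.

```python
from collections import defaultdict


def components(games):
    """Connected components of the 'played each other' graph.

    games: iterable of (engine_a, engine_b, n_games) tuples.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, n in games:
        root_a, root_b = find(a), find(b)
        if n > 0 and root_a != root_b:
            parent[root_a] = root_b  # union the two components
    groups = defaultdict(set)
    for engine in list(parent):
        groups[find(engine)].add(engine)
    return sorted(groups.values(), key=len, reverse=True)


def cross_group_games(games, group_of):
    """Total games played between engines in different hardware groups."""
    return sum(n for a, b, n in games if group_of[a] != group_of[b])


# Hypothetical data: two well-tested groups joined by a single small match.
GROUP = {"A": "1cpu", "B": "1cpu", "C": "4cpu", "D": "4cpu"}
GAMES = [("A", "B", 400), ("C", "D", 400), ("B", "C", 4)]

print(len(components(GAMES)), "component(s),",
      cross_group_games(GAMES, GROUP), "cross-group games")
```

The graph is technically connected here, but only 4 of 804 games link the groups, so the offset between them rests on very little data.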
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Komodo 5 release now available!

Post by Don »

Uri Blass wrote: I think it is better simply not to allow start positions whose evaluation by some accepted strong program is more than +1 in the first place.
The problem with that is bias: you should not select openings based on the opinion of one program.

However, it might make sense to vote among a number of programs (that are not derived from each other) for such a purpose.
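A vote of the kind Don describes could look like the sketch below. It assumes a small table of evaluations (in pawns, from White's side) of each candidate start position by several engines; the position names, engines and numbers are all made up. A position is kept unless the median evaluation is beyond ±1.00, so one engine's outlying opinion cannot decide it on its own.

```python
from statistics import median

# Hypothetical evaluations in pawns (White's point of view); not real data.
EVALS = {
    "opening_1": {"engine_a": 0.30, "engine_b": 0.45, "engine_c": 0.20},
    "opening_2": {"engine_a": 1.40, "engine_b": 0.60, "engine_c": 1.10},
    "opening_3": {"engine_a": -0.10, "engine_b": 1.20, "engine_c": 0.05},
}


def accept(evals: dict, limit: float = 1.0) -> bool:
    """Keep a position unless the median evaluation across engines exceeds the limit."""
    return abs(median(evals.values())) <= limit


# Vote among all engines.
kept = [name for name, ev in EVALS.items() if accept(ev)]
# Single-engine rule for comparison: trust engine_b alone.
single = [name for name, ev in EVALS.items() if abs(ev["engine_b"]) <= 1.0]

print("kept by vote:       ", kept)
print("kept by engine_b only:", single)
```

With these invented numbers the two rules disagree on two of the three positions, which is the single-program bias Don is pointing at.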

What surprises me is that we have almost no games between 1-CPU and 4-CPU engines.

This leads me to the conclusion that if I decide to become a tester, I am going to test only 1-CPU engines against 4-CPU engines. If we have two groups of engines with almost no games between them, the rating list may be distorted, and the rating difference between the 4-CPU and 1-CPU groups may be wrong.

Note that there are games of Komodo 4 against 4-CPU engines, but I do not see games of Houdini 1.5a 64-bit against 4-CPU engines. For some reason people test Houdini 32-bit only against 32-bit programs, which may distort its rating because it does not get stronger opponents like Stockfish 2.2.2 64-bit 4-CPU.
I dislike testing between opponents of widely disparate strength. It's a big waste of time testing against a program that is, for example, 500 Elo weaker; the time would be better spent playing more games between opponents that are closer together.
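Don's efficiency argument can be quantified with a simple model. The sketch below assumes the logistic Elo curve and, as a simplification, ignores draws; under that model the Fisher information one game carries about the rating difference is (ln 10 / 400)^2 * p * (1 - p), where p is the expected score. Comparing that with an even game gives a rough exchange rate between mismatched and close games.

```python
import math


def expected_score(diff_elo: float) -> float:
    """Logistic expected score of the stronger side for a given Elo gap."""
    return 1.0 / (1.0 + 10.0 ** (-diff_elo / 400.0))


def info_per_game(diff_elo: float) -> float:
    """Fisher information about the Elo gap from one game (win/loss-only model)."""
    p = expected_score(diff_elo)
    k = math.log(10.0) / 400.0  # slope factor of the logistic Elo curve
    return k * k * p * (1.0 - p)


for gap in (0, 100, 200, 500):
    rel = info_per_game(gap) / info_per_game(0)
    print(f"gap {gap:3d} Elo: p={expected_score(gap):.3f}, "
          f"each game worth {rel:.2f} of an even game")
```

By this rough measure a game against an opponent 500 Elo weaker is worth about a fifth of an even game, while a 100 Elo gap costs less than 10%.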
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Uri Blass
Posts: 10409
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Komodo 5 release now available!

Post by Uri Blass »

Don wrote:
I dislike testing between opponents of widely disparate strength. It's a big waste of time testing against a program that is, for example, 500 Elo weaker; the time would be better spent playing more games between opponents that are closer together.
I understand, but the difference between Houdini 1.5 32-bit and Stockfish 2.2.2 64-bit 4-CPU is clearly less than 500 Elo, and even less than 100 Elo.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Komodo 5 release now available!

Post by Don »

Uri Blass wrote:
I understand, but the difference between Houdini 1.5 32-bit and Stockfish 2.2.2 64-bit 4-CPU is clearly less than 500 Elo, and even less than 100 Elo.
You are citing an example where that is not the case, so I don't understand your point. Of course there are examples where it's not.

I'm talking about rating lists in general, playing every program against every program. If we are trying to rate Carlsen, should he have to play the same number of games against you and me as he would against the top players?

Don
Dan Honeycutt
Posts: 5258
Joined: Mon Feb 27, 2006 4:31 pm
Location: Atlanta, Georgia

Re: Komodo 5 release now available!

Post by Dan Honeycutt »

geots wrote: But there is another issue that is aggravating me more than anything else right now. My machine is fast enough that a 40/40 time control benchmarks to 40/21 on it, so my system is not the problem. I am fast getting to the point where I don't know how much longer I can post results here. I click "submit" and sometimes have to sit watching the little circle in the top left spin counterclockwise for about two minutes before it decides to change course and head the other way. This forum needs work. Mostly you suffer with that in the other section, where results are posted. It started not long ago and keeps getting worse. Maybe someone is late paying the bill.


Hi George,

I doubt the problem is your machine; more likely it is your internet connection or service provider. You may want to post something in Help and Suggestions, where Sam is likely to see it. Maybe he can offer some suggestions.

Best
Dan H.
Uri Blass
Posts: 10409
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Komodo 5 release now available!

Post by Uri Blass »

Don wrote:
You are citing an example where that is not the case, so I don't understand your point. Of course there are examples where it's not.

I'm talking about rating lists in general, playing every program against every program. If we are trying to rate Carlsen, should he have to play the same number of games against you and me as he would against the top players?

Don
I did not suggest playing games between players with a 500 Elo difference, so I do not understand your point, or why you mentioned that you dislike testing between opponents of widely disparate strength.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Komodo 5 release now available!

Post by Don »

Uri Blass wrote: I did not suggest playing games between players with a 500 Elo difference, so I do not understand your point, or why you mentioned that you dislike testing between opponents of widely disparate strength.
You did suggest that: not the 500 Elo value, but the concept that you should play far up or down in strength to get accurate ratings. You said that if you were a tester you would play 1-core programs against 4-core programs to get ratings. It would be silly to do that just to get variety; Critter 1 core, Critter 2 core, Critter 3 core and Critter 4 core are not variety. So you must believe it's good to have programs play much weaker or stronger programs. If you didn't mean that, what did you mean?
lkaufman
Posts: 5966
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Komodo 5 release now available!

Post by lkaufman »

You and Don are talking about different things. You favor more pairings between dissimilar engines (32-bit vs 64-bit, 1 core vs 4 cores), while Don favors close matches. You can do both by pairing strong 32-bit or 1-core engines against weaker 64-bit or 4-core engines. Then you will both be happy!
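Larry's suggestion amounts to a simple pairing rule: only pair engines across groups, and only when their ratings are close. A minimal sketch, assuming hypothetical engines, groups and ratings (none of these numbers come from a real list) and an arbitrary 100 Elo limit:

```python
from itertools import product

# Hypothetical engines: (name, hardware group, rating). Invented for illustration.
ENGINES = [
    ("strong_1core", "1core", 3150),
    ("mid_1core", "1core", 3000),
    ("weak_1core", "1core", 2850),
    ("strong_4core", "4core", 3300),
    ("mid_4core", "4core", 3120),
    ("weak_4core", "4core", 2980),
]


def cross_group_pairs(engines, max_gap=100):
    """Pair engines from different groups only when their ratings are close."""
    by_group = {}
    for name, group, rating in engines:
        by_group.setdefault(group, []).append((name, rating))
    groups = sorted(by_group)
    pairs = []
    for i, g1 in enumerate(groups):
        for g2 in groups[i + 1:]:
            for (a, ra), (b, rb) in product(by_group[g1], by_group[g2]):
                if abs(ra - rb) <= max_gap:
                    pairs.append((a, b, abs(ra - rb)))
    return pairs


for a, b, gap in cross_group_pairs(ENGINES):
    print(f"{a} vs {b} (gap {gap} Elo)")
```

With these invented ratings the strong 1-core engine meets the mid 4-core one and the mid 1-core engine meets the weak 4-core one: dissimilar hardware, but close matches.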
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Komodo 5 release now available!

Post by Houdini »

rvida wrote: Why not implement Chess960 support then? It would surely help to prove or disprove your hypothesis. Btw, looking at the CCRL 40/4 FRC list, I might start spreading a hypothesis too :)... Also note the 100 Elo gap between #2 and #3 (and between #4 and #5). It would be nice if more strong engines supported FRC.

Code:

CCRL 40/4 FRC Rating List - All engines, best versions only

Rank           Engine         ELO    +    -   Score  AvOp  Games
 1 Critter 1.6 64-bit         3289  +22  -22  76.7% -212.8   900
 2 Houdini 2.0 64-bit         3280  +18  -18  69.4% -156.8  1200
 3 Stockfish 2.2.2 64-bit     3182  +17  -17  60.2%  -81.7  1300
 4 Rybka 4 64-bit             3170  +14  -14  61.4%  -87.0  1800
 5 Naum 4.2 64-bit            3029  +12  -11  48.8%   +6.8  3100
 6 Shredder 12                3020  +12  -12  45.3%  +32.3  2900
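The gaps rvida mentions can be read straight off the table. A trivial sketch using the ratings exactly as listed:

```python
# Ratings as listed in the CCRL 40/4 FRC table above, in rank order.
RATINGS = [
    ("Critter 1.6 64-bit", 3289),
    ("Houdini 2.0 64-bit", 3280),
    ("Stockfish 2.2.2 64-bit", 3182),
    ("Rybka 4 64-bit", 3170),
    ("Naum 4.2 64-bit", 3029),
    ("Shredder 12", 3020),
]


def adjacent_gaps(ratings):
    """Elo gap between each engine and the one ranked directly below it."""
    return [(hi[0], lo[0], hi[1] - lo[1]) for hi, lo in zip(ratings, ratings[1:])]


for upper, lower, gap in adjacent_gaps(RATINGS):
    print(f"{upper} -> {lower}: {gap} Elo")
```

The #2 to #3 gap is 98 points and the #4 to #5 gap is 141.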
A late reaction, but I've just run a test match showing that Critter 1.6a does indeed appear to be slightly stronger than Houdini 2.0 in FRC, playing without an opening book from the 960 initial positions, each played twice with colors reversed.
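For anyone wanting to reproduce this kind of match, the schedule is simple: every Chess960 start position is played twice, once with each engine as White. The sketch below enumerates the 960 positions by brute force (bishops on opposite-colored squares, king between the rooks; Black mirrors White) instead of using the standard numbering scheme, and the engine names are only labels. It is not a description of the tooling Robert used.

```python
from itertools import permutations


def chess960_back_ranks():
    """All 960 Chess960 back-rank setups for White, as strings over files a..h.

    Rules: bishops on opposite-colored squares, king somewhere between the rooks.
    Black's back rank is the mirror image, so it is not listed separately.
    """
    setups = set()
    for perm in set(permutations("RNBQKBNR")):
        bishops = [i for i, p in enumerate(perm) if p == "B"]
        rooks = [i for i, p in enumerate(perm) if p == "R"]
        king = perm.index("K")
        opposite_colors = (bishops[0] + bishops[1]) % 2 == 1
        if opposite_colors and rooks[0] < king < rooks[1]:
            setups.add("".join(perm))
    return sorted(setups)


def paired_schedule(engine_a, engine_b):
    """Each start position twice: once with engine_a as White, once reversed."""
    games = []
    for setup in chess960_back_ranks():
        games.append((setup, engine_a, engine_b))  # (position, White, Black)
        games.append((setup, engine_b, engine_a))
    return games


schedule = paired_schedule("Critter 1.6a", "Houdini 2.0")
print(len(chess960_back_ranks()), "positions,", len(schedule), "games")
```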
After 1920 games at 2'+2", single thread, the match result was 1010-910 for Critter (41% draws), a score of 52.6% or a performance of +18 Elo +/- 9 Elo. Congrats, Richard!
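The headline numbers follow from the logistic Elo formula. A quick check, assuming that model (the +/- 9 margin needs the win/draw/loss split and is not reproduced here):

```python
import math


def performance_elo(points: float, games: int) -> float:
    """Elo difference implied by a score fraction (logistic model, draws = 0.5)."""
    score = points / games
    return 400.0 * math.log10(score / (1.0 - score))


points, games = 1010, 1920
print(f"score {points / games:.1%}, Elo {performance_elo(points, games):+.1f}")
```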

I'm now running a similar match against a pre-beta Houdini 3 DEV; the results are quite different ;).

Robert