Stockfish 4 running for the IPON

IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Stockfish 4 running for the IPON

Post by IWB »

lkaufman wrote:... I think the explanation is "contempt"...
I fully agree here. Stockfish is so good that it would score better against a larger field of opponents with a contempt ...
On the other hand, that is something which backfires as soon as real contenders are on the board. Rybka is an example of that. Since there are now some engines on the same level, its rating declines much more than would be necessary, and we get this "The emperor is naked" effect :-)

Anyhow, I test the default settings, as 99% of all users will run the engine that way.

Bye
Ingo

PS: Maybe K-CCT might do better against Houdini without contempt ;-)
Uri Blass
Posts: 11150
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish 4 running for the IPON

Post by Uri Blass »

Note that contempt is only one explanation for doing better against weak opponents.

I did not watch the games, so I do not know, but I can think of an alternative explanation: maybe Stockfish is relatively worse at converting advantages into wins, regardless of contempt. I have seen cases where Stockfish evaluated a drawn endgame as winning.

For example, take the following position and let Stockfish search it.

Stockfish is happy to get it against weak opponents instead of winning against them.

[d]7k/8/8/8/8/7P/6K1/7B w - - 0 1
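
The position above is the classic "wrong bishop" pattern: a rook pawn whose promotion square is of the opposite color to the bishop, with the defending king already in the corner. A minimal Python sketch of a check for that material pattern is shown here. The function names are hypothetical and this is not code from Stockfish or any other engine; it only recognizes the case where the black king already sits on the promotion square.

```python
def find_pieces(fen):
    """Map each piece letter in a FEN board field to its (file, rank) squares (0-based)."""
    board = fen.split()[0]
    pieces = {}
    for row, rank_str in enumerate(board.split("/")):
        rank_idx = 7 - row          # FEN lists rank 8 first
        file_idx = 0
        for ch in rank_str:
            if ch.isdigit():
                file_idx += int(ch)  # run of empty squares
            else:
                pieces.setdefault(ch, []).append((file_idx, rank_idx))
                file_idx += 1
    return pieces


def square_is_light(file_idx, rank_idx):
    """a1 (0, 0) is dark, and colors alternate from there."""
    return (file_idx + rank_idx) % 2 == 1


def is_wrong_bishop_rook_pawn(fen):
    """True for K+B+rook pawn (White) vs lone king on the promotion corner,
    when the bishop does not control the promotion square."""
    pieces = find_pieces(fen)
    # Exactly one white king, bishop, pawn and one black king, nothing else.
    if sorted(pieces) != ["B", "K", "P", "k"]:
        return False
    if any(len(squares) != 1 for squares in pieces.values()):
        return False
    pawn_file, _ = pieces["P"][0]
    if pawn_file not in (0, 7):      # must be an a- or h-pawn
        return False
    promotion_square = (pawn_file, 7)
    bishop_light = square_is_light(*pieces["B"][0])
    promo_light = square_is_light(*promotion_square)
    king_in_corner = pieces["k"][0] == promotion_square
    return king_in_corner and bishop_light != promo_light


print(is_wrong_bishop_rook_pawn("7k/8/8/8/8/7P/6K1/7B w - - 0 1"))
```

Here the bishop on h1 stands on a light square while h8 is dark, so the check reports the drawn pattern. A real engine would need a broader rule (for example, a king that can still reach the corner), which this sketch deliberately leaves out.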
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Stockfish 4 running for the IPON

Post by IWB »

Uri Blass wrote:Note that contempt is only one explanation for doing better against weak opponents.

I did not watch the games, so I do not know, but I can think of an alternative explanation: maybe Stockfish is relatively worse at converting advantages into wins, regardless of contempt. I have seen cases where Stockfish evaluated a drawn endgame as winning.

For example, take the following position and let Stockfish search it.

Stockfish is happy to get it against weak opponents instead of winning against them.

[d]7k/8/8/8/8/7P/6K1/7B w - - 0 1
Good example! I saw quite a few of exactly this endgame!

Bye
Ingo
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Stockfish 4 running for the IPON

Post by IWB »

The Stockfish 4 run is finished and online: http://www.inwoba.de

Very remarkable!

Bye
Ingo
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Stockfish 4 running for the IPON

Post by mcostalba »

lkaufman wrote:It seems at first that we have a bit of a mystery, in that Stockfish 4 beats Houdini 3 in your test but does worse against almost every other opponent.
Please let me help you to re-read the data. This is the table of the score improvements from SF 3 to SF 4, as downloaded from Ingo's site.

4 Stockfish 4               : 3016  3000 (+1660,=1113,-227), 73.9 %
8 Stockfish 3               : 2977  3450 (+1568,=1490,-392), 67.0 %

Houdini 3 STD                 : 37.3 -> 52.0
Komodo CCT                    : 41.3 -> 48.3 
Critter 1.4a                  : 51.7 -> 54.3
Deep Rybka 4.1                : 55.0 -> 59.3
Gull 2.1                      : 59.0 -> 61.0
Chiron 1.5                    : 68.0 -> 74.0
Protector 1.5.0               : 71.7 -> 77.7 
Naum 4.2                      : 71.3 -> 76.7
Hannibal 1.3                  : 72.7 -> 74.3 
Deep Fritz 13 32b             : 74.7 -> 77.3
HIARCS 14 WCSC 32b            : 71.3 -> 75.7
Deep Shredder 12              : 70.0 -> 77.3
Deep Sjeng c't 2010 32b       : 75.7 -> 81.7
Spike 1.4 32b                 : 82.0 -> 83.3
spark-1.0                     : 81.7 -> 80.3  *
Deep Junior 13.3              : 80.7 -> 82.0
Booot 5.2.0                   : 78.7 -> 80.3
Quazar 0.4                    : 84.7 -> 88.7
Toga II 3.0 32b               : 82.7 -> 87.7
Zappa Mexico II               : 81.0 -> 85.7
 
So SF has done much better against H3 and very well against Komodo, but has also substantially improved against all the opponents (with the exception of spark). It is also interesting to note that the improvement is more or less equally distributed across the whole range, no matter whether the opponent is strong or weak. For instance, we have also improved a lot against Toga, Deep Sjeng and Shredder.

Ingo, thanks a lot for running your tournament in a timely and efficient way as usual, and sorry if the binary issue has caused you some trouble. I will try to manage it differently in the future.
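
As a rough cross-check of the percentages in the table, here is a minimal Python sketch that recomputes a score from a win/draw/loss record and converts a score into an Elo difference. It assumes the standard logistic Elo formula; the rating tool behind the IPON list may use a different model, so the Elo values are only indicative. The function names are made up for illustration.

```python
import math


def score_fraction(wins, draws, losses):
    """Score as a fraction of games played, counting a draw as half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_diff(score):
    """Elo difference implied by an expected score under the logistic model.
    Only defined for 0 < score < 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# Overall records from the two rating-list lines above.
sf4 = score_fraction(1660, 1113, 227)   # about 73.9 %
sf3 = score_fraction(1568, 1490, 392)   # about 67.0 %
print(f"SF4 {sf4:.1%}, SF3 {sf3:.1%}")

# Head-to-head change against Houdini 3 STD: 37.3 % -> 52.0 %.
print(f"{elo_diff(0.373):+.0f} -> {elo_diff(0.520):+.0f} Elo vs Houdini 3")
```

A score below 50 % maps to a negative Elo difference and a score above 50 % to a positive one, so the Houdini 3 line crosses from a deficit to a small plus under this model.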
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Stockfish 4 running for the IPON

Post by IWB »

Hello Marco,

I would not put too much emphasis on the individual results. In the end, those are just 150 games ...

Thanks for considering my proposal regarding the compiles.

Bye
Ingo
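
To put "just 150 games" into numbers, here is a minimal Python sketch of the statistical uncertainty of a single pairing. It assumes independent games and uses the binomial formula, which is an upper bound on the standard error because draws reduce the per-game variance. The function name is hypothetical.

```python
import math


def score_std_error_upper(score, games):
    """Upper bound on the standard error of a match score.
    Treats every game as win or loss; draws can only make the true value smaller."""
    return math.sqrt(score * (1.0 - score) / games)


# Example: the 52.0 % result against Houdini 3 over 150 games.
se = score_std_error_upper(0.52, 150)
print(f"standard error <= {se:.3f}")  # roughly 4 percentage points
```

With a standard error of up to about four percentage points, a two-sigma band around a 150-game result can be as wide as eight points on either side, which is why the individual per-opponent differences should be read with care.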
Vinvin
Posts: 5312
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Stockfish 4 running for the IPON

Post by Vinvin »

mcostalba wrote:
lkaufman wrote:It seems at first that we have a bit of a mystery, in that Stockfish 4 beats Houdini 3 in your test but does worse against almost every other opponent.
Please let me help you to re-read the data. This is the table of the score improvements from SF 3 to SF 4, as downloaded from Ingo's site.
...
Marco, I think you missed the point. I think Larry pointed out what I saw too (but one word was missing from the sentence to make it clear):
"It seems at first that we have a bit of a mystery, in that Stockfish 4 beats Houdini 3 in your test but does worse than Houdini 3 against almost every other opponent."

In other words: Stockfish 4 beat _ALL_ the opponents (including Houdini 3), but it is not at the top of the ranking list.

Congratulations on this exceptional release!!!
Vincent
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Stockfish 4 running for the IPON

Post by lkaufman »

mcostalba wrote:
lkaufman wrote:It seems at first that we have a bit of a mystery, in that Stockfish 4 beats Houdini 3 in your test but does worse against almost every other opponent.
Please let me help you to re-read the data. This is the table of the score improvements from SF 3 to SF 4, as downloaded from Ingo's site.

4 Stockfish 4               : 3016  3000 (+1660,=1113,-227), 73.9 %
8 Stockfish 3               : 2977  3450 (+1568,=1490,-392), 67.0 %

Houdini 3 STD                 : 37.3 -> 52.0
Komodo CCT                    : 41.3 -> 48.3 
Critter 1.4a                  : 51.7 -> 54.3
Deep Rybka 4.1                : 55.0 -> 59.3
Gull 2.1                      : 59.0 -> 61.0
Chiron 1.5                    : 68.0 -> 74.0
Protector 1.5.0               : 71.7 -> 77.7 
Naum 4.2                      : 71.3 -> 76.7
Hannibal 1.3                  : 72.7 -> 74.3 
Deep Fritz 13 32b             : 74.7 -> 77.3
HIARCS 14 WCSC 32b            : 71.3 -> 75.7
Deep Shredder 12              : 70.0 -> 77.3
Deep Sjeng c't 2010 32b       : 75.7 -> 81.7
Spike 1.4 32b                 : 82.0 -> 83.3
spark-1.0                     : 81.7 -> 80.3  *
Deep Junior 13.3              : 80.7 -> 82.0
Booot 5.2.0                   : 78.7 -> 80.3
Quazar 0.4                    : 84.7 -> 88.7
Toga II 3.0 32b               : 82.7 -> 87.7
Zappa Mexico II               : 81.0 -> 85.7
 
So SF has done much better against H3 and very well against Komodo, but has also substantially improved against all the opponents (with the exception of spark). It is also interesting to note that the improvement is more or less equally distributed across the whole range, no matter whether the opponent is strong or weak. For instance, we have also improved a lot against Toga, Deep Sjeng and Shredder.

Ingo, thanks a lot for running your tournament in a timely and efficient way as usual, and sorry if the binary issue has caused you some trouble. I will try to manage it differently in the future.
I was comparing SF4 with Houdini 3, not with SF3. Sorry if I didn't make that clear. No one is disputing that SF4 is significantly stronger than SF3 regardless of opponent or time control. We now have a three-way race instead of a two-way one. Congratulations!
My own testing agrees extremely well with Ingo's, in that I show that although Komodo 5.1 still has a lead over SF4 in direct play, it has shrunk from a large lead to a quite small one. Obviously we need to improve Komodo or SF will soon pass us.
One cosmetic issue: why don't you use a larger divisor when converting your internal scores to centipawns for the user? In general, you score a pawn up position as way more than a pawn plus, closer to two pawns. Multiplying your current outputs by about 0.6 or so is necessary to get the standard meaning of a plus 100 score.
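
The rescaling suggested here amounts to one multiplication on the displayed score. A minimal Python sketch, assuming the factor of about 0.6 mentioned above (a rough estimate, not a measured constant) and a hypothetical function name that does not come from Stockfish:

```python
def rescale_centipawns(raw_cp, factor=0.6):
    """Scale an engine's displayed score so that +100 is closer to 'one pawn up'.
    The default factor is the rough 0.6 estimate from the post, not a tuned value."""
    return round(raw_cp * factor)


# A position shown as +170 would be reported as about +102 after rescaling.
print(rescale_centipawns(170))
```

Only the number shown to the user changes; the internal evaluation and the engine's move choice are unaffected, which is why this is a cosmetic issue.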
beram
Posts: 1187
Joined: Wed Jan 06, 2010 3:11 pm

Re: Stockfish 4 running for the IPON

Post by beram »

lkaufman wrote:
mcostalba wrote:
lkaufman wrote:.

... My own testing agrees extremely well with Ingo's, in that I show that although Komodo 5.1 still has a lead over SF4 in direct play, it has shrunk from a large lead to a quite small one.... Obviously we need to improve Komodo or SF will soon pass us.
Well Larry I very much doubt that
or perhaps you test Komodo 51 with contempt = 0
or the SSE42
or something else you surely will come up with :roll:
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Stockfish 4 running for the IPON

Post by lkaufman »

beram wrote:
lkaufman wrote:
mcostalba wrote:
lkaufman wrote:.

... My own testing agrees extremely well with Ingo's, in that I show that although Komodo 5.1 still has a lead over SF4 in direct play, it has shrunk from a large lead to a quite small one.... Obviously we need to improve Komodo or SF will soon pass us.
Well Larry I very much doubt that
or perhaps you test Komodo 51 with contempt = 0
or the SSE42
or something else you surely will come up with :roll:
Sorry, I can't understand your point. You find it strange that my testing agrees with Ingo's?? Are you saying that his result was somehow wrong? Too high for SF or too low? Why?