Stockfish 4 running for the IPON

IWB · Post by **IWB** » Fri Aug 23, 2013 7:58 am

lkaufman wrote:... I think the explanation is "contempt"...

I fully agree here. Stockfish is so good that it would score better against a larger field of opponents with a contempt ...
On the other hand, that is something which backfires as soon as real contenders are on the board. Rybka is an example of that. Now that there are engines on the same level, its rating declines much more than necessary, and we get this "the emperor is naked" effect.

Anyhow, I test the default settings, as 99% of all users will run the engine that way.

Bye
Ingo

PS: Maybe K-CCT would do better against Houdini without contempt.

Uri Blass · Post by **Uri Blass** » Fri Aug 23, 2013 8:04 am

Note that contempt is only one explanation for doing better against weak opponents.

I did not watch the games, so I do not know, but I can think of an alternative explanation: maybe Stockfish is relatively worse at converting advantages into wins, regardless of contempt. I have seen cases where Stockfish evaluated a drawn endgame as winning.

For example, take the following position and give it to Stockfish to search.

Stockfish is happy to get it against weak opponents instead of winning against them.

[d]7k/8/8/8/8/7P/6K1/7B w - - 0 1
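For readers who want to see why this position is a draw: the pawn is a rook pawn, its promotion square h8 is dark, the bishop on h1 is light-squared, and the black king already sits in the corner. The sketch below checks just that pattern from a FEN string. It is only an illustrative heuristic, not Stockfish's evaluation code, and the function names (`parse_board`, `is_light`, `is_wrong_bishop_draw`) are made up for this example.

```python
def parse_board(fen):
    """Return {(file_idx, rank_idx): piece} from the placement field of a FEN string."""
    board = {}
    rows = fen.split()[0].split("/")
    for row, rank_str in enumerate(rows):
        rank_idx = 7 - row           # FEN lists rank 8 first
        file_idx = 0
        for ch in rank_str:
            if ch.isdigit():
                file_idx += int(ch)  # a digit is a run of empty squares
            else:
                board[(file_idx, rank_idx)] = ch
                file_idx += 1
    return board


def is_light(square):
    """a1 = (0, 0) is a dark square, so an odd coordinate sum means a light square."""
    file_idx, rank_idx = square
    return (file_idx + rank_idx) % 2 == 1


def is_wrong_bishop_draw(fen):
    """Heuristic (assumption, not a tablebase lookup): White has K + one B + pawn(s)
    on a single rook file, Black has a bare king within one square of the promotion
    corner, and the bishop does not control that corner."""
    board = parse_board(fen)
    white = {sq: p for sq, p in board.items() if p.isupper()}
    black = {sq: p for sq, p in board.items() if p.islower()}
    if sorted(black.values()) != ["k"]:
        return False
    bishops = [sq for sq, p in white.items() if p == "B"]
    pawns = [sq for sq, p in white.items() if p == "P"]
    others = [p for p in white.values() if p not in "KBP"]
    if len(bishops) != 1 or not pawns or others:
        return False
    pawn_files = {f for f, _ in pawns}
    if len(pawn_files) != 1 or pawn_files.isdisjoint({0, 7}):
        return False
    corner = (pawn_files.pop(), 7)   # promotion square of the rook pawn
    king_sq = next(iter(black))
    king_near = max(abs(king_sq[0] - corner[0]), abs(king_sq[1] - corner[1])) <= 1
    return king_near and is_light(bishops[0]) != is_light(corner)


print(is_wrong_bishop_draw("7k/8/8/8/8/7P/6K1/7B w - - 0 1"))
```

With the bishop moved to a dark square (for example g1) the same check no longer fires, which is the case where the extra material really does win.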

IWB · Post by **IWB** » Fri Aug 23, 2013 8:34 am

Uri Blass wrote: Note that contempt is only one explanation for doing better against weak opponents.

I did not watch the games, so I do not know, but I can think of an alternative explanation: maybe Stockfish is relatively worse at converting advantages into wins, regardless of contempt. I have seen cases where Stockfish evaluated a drawn endgame as winning.

For example, take the following position and give it to Stockfish to search.

Stockfish is happy to get it against weak opponents instead of winning against them.

[d]7k/8/8/8/8/7P/6K1/7B w - - 0 1

Good example! I saw quite a few of exactly this endgame!

Bye
Ingo

IWB · Post by **IWB** » Fri Aug 23, 2013 9:07 am

Stockfish 4 run is finished and online: http://www.inwoba.de

Very remarkable!

Bye
Ingo

mcostalba · Post by **mcostalba** » Fri Aug 23, 2013 9:37 am

lkaufman wrote:It seems at first that we have a bit of a mystery, in that Stockfish 4 beats Houdini 3 in your test but does worse against almost every other opponent.

Please let me help you to re-read the data. This is the table of the score improvements from SF 3 to SF 4, as downloaded from Ingo's site.

4 Stockfish 4               : 3016  3000 (+1660,=1113,-227), 73.9 %
8 Stockfish 3               : 2977  3450 (+1568,=1490,-392), 67.0 %

Houdini 3 STD                 : 37.3 -> 52.0
Komodo CCT                    : 41.3 -> 48.3 
Critter 1.4a                  : 51.7 -> 54.3
Deep Rybka 4.1                : 55.0 -> 59.3
Gull 2.1                      : 59.0 -> 61.0
Chiron 1.5                    : 68.0 -> 74.0
Protector 1.5.0               : 71.7 -> 77.7 
Naum 4.2                      : 71.3 -> 76.7
Hannibal 1.3                  : 72.7 -> 74.3 
Deep Fritz 13 32b             : 74.7 -> 77.3
HIARCS 14 WCSC 32b            : 71.3 -> 75.7
Deep Shredder 12              : 70.0 -> 77.3
Deep Sjeng c't 2010 32b       : 75.7 -> 81.7
Spike 1.4 32b                 : 82.0 -> 83.3
spark-1.0                     : 81.7 -> 80.3  *
Deep Junior 13.3              : 80.7 -> 82.0
Booot 5.2.0                   : 78.7 -> 80.3
Quazar 0.4                    : 84.7 -> 88.7
Toga II 3.0 32b               : 82.7 -> 87.7
Zappa Mexico II               : 81.0 -> 85.7

So SF has done far better against H3 and much better against Komodo, and has substantially improved against all the opponents (with the exception of spark). It is also interesting to note that the improvement is more or less equally distributed across the whole range, whether the opponent is strong or weak. For instance, we have also improved a lot against Toga, Deep Sjeng and Shredder.
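As a quick check of the two summary lines in the table, the overall percentages follow from the win/draw/loss counts with a draw counted as half a point. The snippet below is only a sketch for re-reading the numbers; `score_percent` is a name made up here, and the three per-opponent rows are copied from the table above.

```python
def score_percent(wins, draws, losses):
    """Score in percent, counting a draw as half a point."""
    games = wins + draws + losses
    return 100.0 * (wins + 0.5 * draws) / games


sf4 = score_percent(1660, 1113, 227)   # Stockfish 4, 3000 games
sf3 = score_percent(1568, 1490, 392)   # Stockfish 3, 3450 games
print(f"SF4 {sf4:.1f} %   SF3 {sf3:.1f} %")

# A few per-opponent scores (SF 3 -> SF 4) taken from the table.
changes = {
    "Houdini 3 STD": (37.3, 52.0),
    "Komodo CCT": (41.3, 48.3),
    "spark-1.0": (81.7, 80.3),
}
for name, (old, new) in changes.items():
    print(f"{name:15s} {new - old:+.1f}")
```

This reproduces the 73.9 % and 67.0 % shown in the table.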

Ingo, thanks a lot for running your tournament in a timely and efficient way as usual, and sorry if the binary issue has caused you some trouble. I will try to manage it differently in the future.

IWB · Post by **IWB** » Fri Aug 23, 2013 9:49 am

Hello Marco,

I would not put too much emphasis on the individual results. In the end, those are just 150 games per opponent ...
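To put a rough number on "just 150 games": the standard error of a match score can be estimated from the score and the draw ratio. The sketch below assumes independent games and uses the per-game variance s(1 - s) - d/4 for score s and draw ratio d. The draw ratio of individual matches is not given in this thread, so d = 0 is used as a worst case, and `score_standard_error` is a hypothetical helper name.

```python
import math


def score_standard_error(score, games, draw_ratio=0.0):
    """Standard error of a match score (as a fraction between 0 and 1).

    Assumes independent games. A game is worth 1, 0.5 or 0 points, which gives
    a per-game variance of score * (1 - score) - draw_ratio / 4.
    """
    if not 0.0 <= draw_ratio <= 2.0 * min(score, 1.0 - score):
        raise ValueError("draw ratio is not compatible with this score")
    variance = score * (1.0 - score) - draw_ratio / 4.0
    return math.sqrt(variance / games)


# 52.0 % over 150 games, worst case with no draws (assumed, not from the thread)
se = score_standard_error(0.52, 150)
print(f"standard error: {100 * se:.1f} percentage points")
```

Even with a realistic share of draws the error stays at a few percentage points per 150-game match, which is the same size as many of the individual differences in the table.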

Thanks for considering my proposal regarding the compiles.

Bye
Ingo

Vinvin · Post by **Vinvin** » Fri Aug 23, 2013 9:53 am

mcostalba wrote:
lkaufman wrote:It seems at first that we have a bit of a mystery, in that Stockfish 4 beats Houdini 3 in your test but does worse against almost every other opponent.
Please let me help you to re-read the data. This is the table of the score improvements from SF 3 to SF 4 as downloaded from Ingo site.
...

Marco, I think you missed the point. I think Larry pointed out what I saw too (his sentence was just missing a few words to be clear):
"It seems at first that we have a bit of a mystery, in that Stockfish 4 beats Houdini 3 in your test but does worse than Houdini 3 against almost every other opponent."

In other words: Stockfish 4 beat _ALL_ the opponents (including Houdini 3), but it is not at the top of the ranking list.

Congratulations on this exceptional release!!!
Vincent

lkaufman · Post by **lkaufman** » Fri Aug 23, 2013 4:03 pm

mcostalba wrote:
lkaufman wrote:It seems at first that we have a bit of a mystery, in that Stockfish 4 beats Houdini 3 in your test but does worse against almost every other opponent.
Please let me help you to re-read the data. This is the table of the score improvements from SF 3 to SF 4, as downloaded from Ingo's site.
4 Stockfish 4               : 3016  3000 (+1660,=1113,-227), 73.9 %
8 Stockfish 3               : 2977  3450 (+1568,=1490,-392), 67.0 %

Houdini 3 STD                 : 37.3 -> 52.0
Komodo CCT                    : 41.3 -> 48.3 
Critter 1.4a                  : 51.7 -> 54.3
Deep Rybka 4.1                : 55.0 -> 59.3
Gull 2.1                      : 59.0 -> 61.0
Chiron 1.5                    : 68.0 -> 74.0
Protector 1.5.0               : 71.7 -> 77.7 
Naum 4.2                      : 71.3 -> 76.7
Hannibal 1.3                  : 72.7 -> 74.3 
Deep Fritz 13 32b             : 74.7 -> 77.3
HIARCS 14 WCSC 32b            : 71.3 -> 75.7
Deep Shredder 12              : 70.0 -> 77.3
Deep Sjeng c't 2010 32b       : 75.7 -> 81.7
Spike 1.4 32b                 : 82.0 -> 83.3
spark-1.0                     : 81.7 -> 80.3  *
Deep Junior 13.3              : 80.7 -> 82.0
Booot 5.2.0                   : 78.7 -> 80.3
Quazar 0.4                    : 84.7 -> 88.7
Toga II 3.0 32b               : 82.7 -> 87.7
Zappa Mexico II               : 81.0 -> 85.7
 
So SF has done far better against H3 and much better against Komodo, and has substantially improved against all the opponents (with the exception of spark). It is also interesting to note that the improvement is more or less equally distributed across the whole range, whether the opponent is strong or weak. For instance, we have also improved a lot against Toga, Deep Sjeng and Shredder.

Ingo, thanks a lot for running your tournament in a timely and efficient way as usual, and sorry if the binary issue has caused you some trouble. I will try to manage it differently in the future.

I was comparing SF4 with Houdini 3, not with SF3. Sorry if I didn't make that clear. No one is disputing that SF4 is significantly stronger than SF3 regardless of opponent or time control. We now have a three-way race instead of a two-way one. Congratulations!
My own testing agrees extremely well with Ingo's, in that I show that although Komodo 5.1 still has a lead over SF4 in direct play, it has shrunk from a large lead to a quite small one. Obviously we need to improve Komodo or SF will soon pass us.
One cosmetic issue: why don't you use a larger divisor when converting your internal scores to centipawns for the user? In general, you score a pawn up position as way more than a pawn plus, closer to two pawns. Multiplying your current outputs by about 0.6 or so is necessary to get the standard meaning of a plus 100 score.
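To illustrate the suggestion, here is a minimal sketch of such a display conversion. The factor 0.6 is the rough figure given above; the function name `to_display_centipawns` and the sample value are made up for illustration and are not taken from the Stockfish source.

```python
def to_display_centipawns(reported_score, scale=0.6):
    """Rescale an engine's reported score so that +100 means roughly one pawn.

    `scale` is the approximate factor suggested in the post, not a measured constant.
    """
    return round(reported_score * scale)


# A pawn-up position reported as about +200 would then be shown as about +120.
print(to_display_centipawns(200))
```

Only the number shown to the user changes here; the search would keep working on the internal values.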

beram · Post by **beram** » Fri Aug 23, 2013 8:21 pm

lkaufman wrote:

... My own testing agrees extremely well with Ingo's, in that I show that although Komodo 5.1 still has a lead over SF4 in direct play, it has shrunk from a large lead to a quite small one.... Obviously we need to improve Komodo or SF will soon pass us.
Well, Larry, I very much doubt that
or perhaps you test Komodo 5.1 with contempt = 0
or the SSE42
or something else you surely will come up with

lkaufman · Post by **lkaufman** » Fri Aug 23, 2013 8:55 pm

beram wrote:
lkaufman wrote:

... My own testing agrees extremely well with Ingo's, in that I show that although Komodo 5.1 still has a lead over SF4 in direct play, it has shrunk from a large lead to a quite small one.... Obviously we need to improve Komodo or SF will soon pass us.
Well, Larry, I very much doubt that
or perhaps you test Komodo 5.1 with contempt = 0
or the SSE42
or something else you surely will come up with
Sorry, I can't understand your point. You find it strange that my testing agrees with Ingo's?? Are you saying that his result was somehow wrong? Too high for SF or too low? Why?
