Why can't my stockfish beat Komodo 8.0.....like ever?

TShackel · Post by **TShackel** » Fri Apr 24, 2015 9:45 pm

Hi all,

I've run many games at a long time control: 90 minutes plus a 30-second increment. My processor is a quad-core Haswell i7 at 2.5 GHz. Hash is only 1 GB per engine; maybe I should increase that, as my system has 16 GB of RAM, but I didn't want to bog things down, so I kept the hash low.

I have set up several tournaments with many participants, including many top engines, and Stockfish 6.0 has won or shared the lead in every tournament I have entered it in. However, Stockfish 6.0 has never been able to beat Komodo 8.0 on my system, and Komodo 8.0 has never beaten Stockfish on my system either. So it's always boring draws between them. Then I started some round robins with only the top three (Houdini, Komodo, and the latest Stockfish dev version instead of Stockfish 6.0). This is my first time using the dev version instead of Stockfish 6.0. In the first tournament Houdini beat both Komodo and Stockfish, winning the tournament. But in the second tournament, Stockfish beat Houdini in two games and everything else was drawn, so the Stockfish development version won.

So my question is: why can't Stockfish 6.0, or even the latest development version, beat Komodo 8.0 in any game, when Houdini has done it several times, Gull 3.0 has beaten it several times, and even Texel has beaten Komodo once? I wonder if it's just because both engines are so positional and safe that no tactics materialize. Perhaps Houdini is a bit more aggressive and manages to find ways to beat Komodo.

Any ideas?

Thanks,

Tim.

Frank Quisinsky · Post by **Frank Quisinsky** » Fri Apr 24, 2015 9:56 pm

???

Stockfish 6 BMI2 x64 - Komodo 8 x64 =1===1=======1=1=1=1===1=1101011001=11=============

15x won after 50 games
4x lost after 50 games

No chance for Komodo 8.
In a direct comparison we could speak of a different league. But if we compare the results against the other engines ... Stockfish loses more and more ground through all its quick draws.

1 Stockfish 6 BMI2 x64      : 3104  1650 (+1136,=484,- 30), 83.5 %

Zappa Mexico II x64           :  50 (+ 40,= 10,-  0), 90.0 %
Chiron 2.0 x64                :  50 (+ 31,= 17,-  2), 79.0 %
Senpai 1.0 SSE42 x64          :  50 (+ 39,= 10,-  1), 88.0 %
Hiarcs 14 WCSC w32            :  50 (+ 32,= 17,-  1), 81.0 %
Quazar 0.4 x64                :  50 (+ 46,=  4,-  0), 96.0 %
Spark 1.0 x64                 :  50 (+ 36,= 14,-  0), 86.0 %
Shredder 12 x64               :  50 (+ 35,= 15,-  0), 85.0 %
Gaviota 1.0 AVX x64           :  50 (+ 38,= 12,-  0), 88.0 %
Spike 1.4 Leiden w32          :  50 (+ 33,= 17,-  0), 83.0 %
Critter 0.90 SSE4 x64         :  50 (+ 29,= 20,-  1), 78.0 %
iCE 2.0 v2240 POP x64         :  50 (+ 43,=  7,-  0), 93.0 %
Vajolet2 1.45 POP x64         :  50 (+ 39,= 10,-  1), 88.0 %
SmarThink 1.70 SSE3 x64       :  50 (+ 34,= 16,-  0), 84.0 %
Protector 1.7.0 x64           :  50 (+ 33,= 17,-  0), 83.0 %
Komodo 8 x64                  :  50 (+ 15,= 31,-  4), 61.0 %
GullChess 3.0 BMI2 x64        :  50 (+ 15,= 31,-  4), 61.0 %
Junior 13.3.00 x64            :  50 (+ 37,= 12,-  1), 86.0 %
Fire 4 x64                    :  50 (+ 20,= 25,-  5), 65.0 %
Equinox 3.30 x64              :  50 (+ 23,= 25,-  2), 71.0 %
DiscoCheck 5.2.1 x64          :  50 (+ 40,= 10,-  0), 90.0 %
Deuterium 14.3.34.130 POP x64 :  50 (+ 39,= 11,-  0), 89.0 %
Fizbo 1.3.1 x64               :  50 (+ 41,=  9,-  0), 91.0 %
EXchess 7.51b x64             :  50 (+ 41,=  8,-  1), 90.0 %
Nirvanachess 2.0a x64         :  50 (+ 31,= 19,-  0), 81.0 %
Naum 4.6 x64                  :  50 (+ 37,= 12,-  1), 86.0 %
Andscacs 0.72 POP x64         :  50 (+ 40,=  9,-  1), 89.0 %
Cheng4 0.38 x64               :  50 (+ 38,= 12,-  0), 88.0 %
Texel 1.05 x64                :  50 (+ 30,= 18,-  2), 78.0 %
Atlas 3.80 x64                :  50 (+ 43,=  7,-  0), 93.0 %
Sting SF 4.8.4 x64            :  50 (+ 29,= 19,-  2), 77.0 %
Arasan 17.5 POP x64           :  50 (+ 40,=  9,-  1), 89.0 %
Hannibal 1.5 x64              :  50 (+ 30,= 20,-  0), 80.0 %
Sjeng c't 2010 w32            :  50 (+ 39,= 11,-  0), 89.0 %

With a handful of games there is nothing to see ...
It's always the same.

Komodo 8 and Stockfish 6 are quite different and produce a lot of 1-0 and 0-1 games. Put simply, Stockfish 6 is much stronger than Komodo 8 with fewer pieces on the board. It could be interesting to see where Komodo 9 gets +40 Elo.

Have a look here ...
http://www.amateurschach.de/main/060-29 ... me_s6c.htm

Stockfish has twice as many quick draws after the opening book moves. That costs Stockfish strength, and in a rating list it is an advantage for Komodo.

Komodo is extremely well optimized in the early phase of games. You can look for games where Komodo takes a draw ... the quick draws Komodo produces are absolutely OK, as you can see if you replay the games.

Very tricky ... Houdini does the same!
With such a trick Houdini is also around 15-20 Elo stronger.

Best
Frank
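As a rough illustration of what the Komodo 8 line in the table (+15 =31 -4) amounts to, here is a minimal Python sketch that turns a win/draw/loss record into a score percentage and an Elo difference with an approximate 95% interval. The function names are made up for this example, and the normal-approximation interval is an assumption, not necessarily what any rating-list tool uses.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_summary(wins: int, draws: int, losses: int):
    """Return (score, elo, elo_low, elo_high) for one match.

    Assumes the interval stays strictly between 0% and 100%,
    which holds for the record used below.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Variance of a single game result (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    margin = 1.96 * math.sqrt(variance / games)  # approx. 95% interval
    return (
        score,
        elo_from_score(score),
        elo_from_score(score - margin),
        elo_from_score(score + margin),
    )

if __name__ == "__main__":
    # Stockfish 6 vs Komodo 8 record from the table above.
    score, elo, low, high = match_summary(wins=15, draws=31, losses=4)
    print(f"score {score:.1%}, Elo {elo:+.0f} (about {low:+.0f} to {high:+.0f})")
```

For this record the sketch gives a 61% score, which corresponds to a bit under +80 Elo, but with only 50 games the interval is wide, running from roughly +20 to +140.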

TShackel · Post by **TShackel** » Fri Apr 24, 2015 11:45 pm

Hi,

Well, at least you produced several wins for Stockfish. I have produced none. Keep in mind, though, that 19 decisive games out of 50 means 31 games were drawn, which is quite a few. That's probably why I haven't seen any wins yet. Unlike you, I did not run a 50-game match. What I do is a round robin of three engines (Houdini, Komodo, and Stockfish) instead of direct matches between Stockfish and Komodo. Long matches between the same two engines get a little boring for me. For rating lists, ChessBase tells me it's better to have tournaments with many engines rather than matches between only two engines. But most rating lists I've seen run matches between engines and somehow work out the ratings from those. So which way is better? Could someone tell me how to compute a rating list from matches alone, and how to do that in the Fritz 12 ChessBase interface? It would help if you could explain why most rating lists use matches instead of tournaments, and how to build a rating list in ChessBase on that basis.

Thanks.

Sincerely,

Tim.
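On the question of building a rating list from matches alone: one common idea is to pool every game from every match and look for the ratings at which each engine's expected total score equals its actual total score. The sketch below does this with a simple iteration under the standard logistic Elo model. It is an illustration only, not what ChessBase or any particular rating list does; the function names are hypothetical, and the three match scores are Stockfish 6's results against Komodo 8, GullChess 3.0 and Texel 1.05 from the table earlier in the thread.

```python
import math

def expected_score(rating_diff: float) -> float:
    """Expected score for a side rated rating_diff above its opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

def fit_ratings(matches, iterations: int = 500):
    """Fit relative ratings from (engine_a, engine_b, points_a, games) tuples.

    Assumes no engine scored 0% or 100% overall; in that case the
    ratings would drift without converging.
    """
    names = sorted({name for a, b, _, _ in matches for name in (a, b)})
    ratings = {name: 0.0 for name in names}
    for _ in range(iterations):
        actual = {name: 0.0 for name in names}
        expected = {name: 0.0 for name in names}
        played = {name: 0 for name in names}
        for a, b, points_a, games in matches:
            e_a = expected_score(ratings[a] - ratings[b])
            actual[a] += points_a
            actual[b] += games - points_a
            expected[a] += games * e_a
            expected[b] += games * (1.0 - e_a)
            played[a] += games
            played[b] += games
        # Nudge each rating toward the value where expected == actual.
        for name in names:
            ratings[name] += 400.0 * (actual[name] - expected[name]) / played[name]
    # Only differences matter, so center the list on zero.
    mean = sum(ratings.values()) / len(ratings)
    return {name: rating - mean for name, rating in ratings.items()}

# Points scored by Stockfish 6 out of 50 games (wins + half the draws).
MATCHES = [
    ("Stockfish 6", "Komodo 8", 30.5, 50),
    ("Stockfish 6", "GullChess 3.0", 30.5, 50),
    ("Stockfish 6", "Texel 1.05", 39.0, 50),
]

if __name__ == "__main__":
    table = sorted(fit_ratings(MATCHES).items(), key=lambda kv: -kv[1])
    for name, rating in table:
        print(f"{name:15s} {rating:+7.1f}")
```

Because everything goes into one pool, it makes no difference to such a fit whether the games came from head-to-head matches or from a round robin; what matters is how many games there are and that every engine is connected to the others through some chain of opponents. Dedicated tools such as BayesElo or Ordo handle draws, colour and error bars more carefully than this sketch.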

TShackel · Post by **TShackel** » Fri Apr 24, 2015 11:58 pm

Frank,

By the way, what was the time control for your 50 game matches?

Tim.

Frank Quisinsky · Post by **Frank Quisinsky** » Sat Apr 25, 2015 12:14 am

Indeed, time control can be a reason!!

On the other hand, both engines are too different in style. When I analyze positions that are interesting to me, I often think ... that Komodo and Stockfish produce the same results. The fact is that most positions that are interesting to me are too boring for chess programs, and the results are clear and the same.

Not easy to find out more with the strength we have.
With stats it's easy to see the different styles, not with analysing positions.

The time control is 40 moves in 10 minutes, repeating ... around 45 minutes per game at 4.3 GHz. Have a look at my test conditions. More isn't possible if I want to test many engines with only two systems. I think two systems are enough to produce good ratings, and the time control I am using is very long for a rating list.

Best
Frank

TShackel · Post by **TShackel** » Sat Apr 25, 2015 12:26 am

Frank Quisinsky wrote: Indeed, time control can be a reason!!

On the other hand, both engines are too different in style. When I analyze positions that are interesting to me, I often think ... that Komodo and Stockfish produce the same results. The fact is that most positions that are interesting to me are too boring for chess programs, and the results are clear and the same.

Not easy to find out more with the strength we have.
With stats it's easy to see the different styles, not with analysing positions.

The time control is 40 moves in 10 minutes, repeating ... around 45 minutes per game at 4.3 GHz. Have a look at my test conditions. More isn't possible if I want to test many engines with only two systems. I think two systems are enough to produce good ratings, and the time control I am using is very long for a rating list.

Best
Frank

40 moves in 10 minutes, or 45 minutes or so per game, is probably why there are more decisive results. I use a longer time control, more like 3 hours per game, sometimes shorter if there's a short draw, and sometimes longer. My longer time control might be why Stockfish is having trouble beating Komodo. However, I use the same time control with Gull and Houdini, and they have beaten Komodo several times.

Can somebody tell me why most rating lists use matches against different opponents rather than tournaments with many engines, and build the rating list from those? And secondly, how do I work out ratings in the ChessBase GUI if I only run matches between two engines?

Thanks for your help.

Sincerely,

Tim.

Frank Quisinsky · Post by **Frank Quisinsky** » Sat Apr 25, 2015 12:39 am

Hi Tim,

As for the ChessBase GUI:
for me it is too buggy and lacks good support.
I gave up on ChessBase a long time ago.
I can use my time better with stronger software.

And to your other question:
From my point of view I am playing tournaments. Each test I run goes into one big 10,500-game tournament table (each engine vs. each other). Maybe I don't understand your question?

And to the time control:
With longer time controls the draw rate is higher. That isn't a secret. But the draw rate rises very slowly. That means if I doubled the time control, there would be perhaps 2 fewer decisive games (1:0 or 0:1) on average in a 50-game match.

This is easy to check against all the results we have produced over the years. Only if the engines' styles are quite different will the draw rate not rise with more time. If the styles are similar, the draw rate will be higher with more time. It's always the same.

Not universally, though ... have a look at what DiscoCheck does. DiscoCheck avoids draws, as does SmarThink in late middlegames. Vajolet likes to take a draw very quickly ... Protector looks for draws at the transition into the endgame.

With all the different styles it's not easy to work out a rule. Such a rule isn't possible if you compare only three engines: Houdini, Stockfish and Komodo. Only three styles ... will give you nothing but ... the ratio among themselves.

And to Houdini:
Houdini is nothing ... Version 1 was a 1:1 copy of Robolite and therefore not very interesting to me. I have other interests than to look at or promote such things. I don't miss the engine in my rating list; any 1,500 Elo engine is more interesting to me.

Best
Frank

mjlef · Post by **mjlef** » Sun Apr 26, 2015 5:11 am

Tim,

Longer time controls lead to more draws. They also lead to much better play. One example is the last TCEC Superfinal, available here:

http://tcec.chessdom.com/archive.php

In the 64-game Superfinal between Komodo and Stockfish, only 11 games were decisive. And this was with experts attempting to select openings the two programs disagreed about, in hopes of having fewer draws. Of course, the same thing happens in human chess near the top with long time controls.

It can be frustrating. When writing a strong engine, it is very hard to get a lot of long games in, and you need a lot of games to tell which engine is best. The stronger the engines, the greater the chance of a draw, so you need even more games. But with every game you play and let people know about, the closer to the truth we can get. I really appreciate it when people run lots of games, especially long ones.

We typically play a lot of games at fairly fast time controls, then fewer at a longer time control, and fewer still at even longer time controls. The fast games let us establish a fairly accurate Elo at that one time control, and we use the other games to get some idea of the scaling: with more search time and depth, how does the Elo difference change? But fewer games introduce more uncertainty. In general, we think Komodo scales better (gets stronger when searching with more time/depth) than other programs, although I understand the Stockfish team has made some recent changes that could mean they scale better now than in the past. The next TCEC will be very interesting.
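The link between game count and uncertainty can be put into rough numbers. The sketch below estimates the 95% Elo margin of a match between two engines, assuming they are equally strong and that results follow a fixed draw rate; both are simplifying assumptions, and the function name is made up for this example. The draw rate is taken from the Superfinal figure quoted in this post (11 decisive games out of 64).

```python
import math

# Slope of the logistic Elo curve at a 50% score, in Elo per unit of score.
ELO_PER_SCORE_AT_EVEN = 1600.0 / math.log(10)

def elo_margin_95(games: int, draw_rate: float) -> float:
    """Approximate 95% Elo margin for a match between two equal engines."""
    # For equal engines, wins and losses each occur with probability
    # (1 - draw_rate) / 2, so the variance of one game's score around
    # 0.5 is (1 - draw_rate) / 4.
    variance = (1.0 - draw_rate) / 4.0
    standard_error = math.sqrt(variance / games)
    return 1.96 * standard_error * ELO_PER_SCORE_AT_EVEN

if __name__ == "__main__":
    draw_rate = 53 / 64  # 64 games, 11 of them decisive
    for games in (64, 256, 1024, 4096):
        margin = elo_margin_95(games, draw_rate)
        print(f"{games:5d} games: +/- {margin:.1f} Elo")
```

Under these assumptions a 64-game match pins the difference down only to within about 35 Elo either way, and the margin shrinks with the square root of the game count, so halving it takes four times as many games.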

shrapnel · Post by **shrapnel** » Sun Apr 26, 2015 5:26 am

mjlef wrote: I understand the Stockfish team has made some recent changes that could mean they scale better now than in the past. The next TCEC will be very interesting.

Yes, I have confirmed from my online games that the latest Stockfish dev versions are DEFINITELY scaling better with time AND with increasing hardware strength, unlike earlier Stockfish versions.
Komodo 9 definitely has its work cut out if it wants to overtake Stockfish.

TShackel · Post by **TShackel** » Mon Apr 27, 2015 3:51 am

mjlef wrote: Tim,

Longer time controls lead to more draws. They also lead to much better play. One example is the last TCEC Superfinal, available here:

http://tcec.chessdom.com/archive.php

In the 64-game Superfinal between Komodo and Stockfish, only 11 games were decisive. And this was with experts attempting to select openings the two programs disagreed about, in hopes of having fewer draws. Of course, the same thing happens in human chess near the top with long time controls.

It can be frustrating. When writing a strong engine, it is very hard to get a lot of long games in, and you need a lot of games to tell which engine is best. The stronger the engines, the greater the chance of a draw, so you need even more games. But with every game you play and let people know about, the closer to the truth we can get. I really appreciate it when people run lots of games, especially long ones.

We typically play a lot of games at fairly fast time controls, then fewer at a longer time control, and fewer still at even longer time controls. The fast games let us establish a fairly accurate Elo at that one time control, and we use the other games to get some idea of the scaling: with more search time and depth, how does the Elo difference change? But fewer games introduce more uncertainty. In general, we think Komodo scales better (gets stronger when searching with more time/depth) than other programs, although I understand the Stockfish team has made some recent changes that could mean they scale better now than in the past. The next TCEC will be very interesting.

Thanks for your reply, Mark. I agree that long time controls unfortunately mean more draws. To get the maximum number of games for a test between these two engines, I wonder if it's better to run a direct match between them. Sometimes I get bored if two engines play against each other for too long, so I usually like to include other engines in round-robin tournaments, etc. But maybe I need patience in a match format to really determine which engine is better.

Sincerely,

Tim.

P.S. - Thanks for the subscription service for Komodo 9.0 and beyond.
