mjlef wrote: beram wrote: lkaufman wrote: S.Taylor wrote: lkaufman wrote: We have released Komodo 10.2 at komodochess.com. We estimate, based on our testing, that it is about 22 Elo stronger than Komodo 10.1 and about 30 Elo stronger than Komodo 10 at three minutes plus one second increment, on one or many threads. This is the largest Elo gain we have had between versions since going to the subscription model. There were so many changes that it is very difficult to say which ones should get most of the credit. Evaluation, search, and time management were all changed significantly. Features (UCI options) are the same as in Komodo 10.1. As usual, it is free to subscribers, and available at a 20% discount to those who bought Komodo 9 or a later version.
What _I_ want to see is how this new Komodo 10.2 does vs the SF that played in stage 3 of TCEC, OR the version of SF that played in Graham Banks' matches between SF and Komodo.
It is very frustrating to have two moving targets at the same time.
E.g., let's say Komodo beats ASM: how will we know whether the older SF version, which Graham used, wouldn't have crushed Komodo 10.2 like it did Komodo 10.1?
We should first test a moving target with a standing target before we go on to the second moving target.
I think we'll do OK relative to Stockfish in tests like Frank's, where each opponent plays against many other opponents, but we will not look so good in direct matches with Stockfish. The reasons for this are not all clear, although "Contempt" is a factor. It's up to each person to decide whether a direct match or a round-robin tournament is a better way to decide which of two engines is stronger. There are arguments both ways.
Well, perhaps you did relatively OK in the past, but nowadays... this was the latest result by Frank: a clear win for SF 180916 against all, Komodo 10 and 10.1 included
(and besides, Komodo 10.1 was performing 11 Elo points below K10 in his test)
http://www.amateurschach.de/main/cross-tab/v438.htm
 # Engine                        1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
 1 SF 18Sep2016 BMI2 x64 C10 xxxxx  33.0  39.0  38.5  36.0  40.0  37.0  36.0  39.5  42.0  39.0  42.5  41.5  42.5  42.5
 2 Komodo 10 x64              17.0 xxxxx  32.0  37.0  40.0  36.5  37.0  37.0  40.5  41.0  40.5  36.5  42.0  39.5  39.0
 3 Houdini 4 STD B x64        11.0  18.0 xxxxx  26.5  26.0  27.5  32.5  29.5  30.5  34.0  32.0  33.0  28.5  32.0  38.0
 4 Fire 4 x64                 11.5  13.0  23.5 xxxxx  23.5  25.0  25.0  26.5  32.0  28.5  31.0  29.0  32.5  31.5  32.0
 5 GullChess 3.0 BMI2 x64     14.0  10.0  24.0  26.5 xxxxx  31.0  23.5  24.5  28.0  30.5  30.0  32.0  26.5  31.5  32.5
 6 Andscacs 0.872 BMI2 x64    10.0  13.5  22.5  25.0  19.0 xxxxx  26.0  25.0  26.0  23.5  26.5  31.5  25.0  28.5  31.0
 7 Equinox 3.30 x64           13.0  13.0  17.5  25.0  26.5  24.0 xxxxx  24.0  23.5  26.0  29.0  27.5  28.5  32.5  30.5
 8 Fizbo 1.8 BMI2 x64         14.0  13.0  20.5  23.5  25.5  25.0  26.0 xxxxx  29.5  23.5  25.5  27.0  28.0  30.5  29.0
 9 Critter 1.6a x64           10.5   9.5  19.5  18.0  22.0  24.0  26.5  20.5 xxxxx  24.0  27.5  26.5  28.0  29.5  28.5
10 Fritz 15 x64                8.0   9.0  16.0  21.5  19.5  26.5  24.0  26.5  26.0 xxxxx  26.5  22.5  25.0  26.5  24.0
11 Nirvanachess 2.3 POP x64   11.0   9.5  18.0  19.0  20.0  23.5  21.0  24.5  22.5  23.5 xxxxx  26.5  27.0  28.5  22.5
12 Hannibal 1.7 x64            7.5  13.5  17.0  21.0  18.0  18.5  22.5  23.0  23.5  27.5  23.5 xxxxx  24.0  24.0  29.5
13 Chiron 3 x64                8.5   8.0  21.5  17.5  23.5  25.0  21.5  22.0  22.0  25.0  23.0  26.0 xxxxx  21.5  22.0
14 Protector 1.9.0 x64         7.5  10.5  18.0  18.5  18.5  21.5  17.5  19.5  20.5  23.5  21.5  26.0  28.5 xxxxx  24.0
15 Texel 1.06 x64              7.5  11.0  12.0  18.0  17.5  19.0  19.5  21.0  21.5  26.0  27.5  20.5  28.0  26.0 xxxxx
You are using as evidence a run with a May 2016 version of Komodo (Komodo 10) versus a September 2016 version of Stockfish?
Also, as I understand, in the recent run with K 10.2, Contempt was improperly set to 15. 15 is a reasonable value for running against a larger number of mostly weaker opponents. A match against Stockfish or other Komodo versions should use a Contempt of 0.
The issue, Mark, is that every single tester keeps coming out with results that clearly show that Stockfish is stronger than Komodo. I personally know of a private match between the latest asmFish and Komodo 10.1, and after about 700 games asmFish was up +90 Elo. I can confirm, based on what the tester told me, that it was a single-thread, Nunn-match tournament at four minutes plus two seconds increment. Then you have reputable testers such as Ipman and SPCC clearly showing that Stockfish is stronger. Error margins and the number of games played may be important; however, there are also things like common sense. The PATTERN of Stockfish's dominance has not changed for months. And that's exactly why Stockfish is now in the superfinal, and it's still undefeated in the TCEC tournament. Stockfish has gained massive strength this year. You should not just shrug off people's findings just because they may not be what you want to hear. The patterns speak for themselves.
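To put rough numbers on the error-margin question, here is a minimal sketch. It assumes the usual logistic Elo model and a worst-case variance (no draws at all), because the draw rates of these matches are not given; real margins with draws would be somewhat narrower. The function names are my own, purely for illustration.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (logistic model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

def score_from_elo(elo: float) -> float:
    """Expected score fraction for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def elo_interval(score: float, games: int, z: float = 1.96):
    """Approximate 95% interval for the Elo difference.

    Assumption: per-game variance is score * (1 - score), i.e. no draws.
    Draws only reduce the variance, so this is a conservative (wide) bound.
    """
    se = math.sqrt(score * (1.0 - score) / games)
    lo = max(score - z * se, 1e-9)
    hi = min(score + z * se, 1.0 - 1e-9)
    return elo_from_score(lo), elo_from_score(hi)

# One crosstable cell: SF scored 33.0 out of 50 against Komodo 10.
h2h_score = 33.0 / 50
h2h_elo = elo_from_score(h2h_score)
h2h_lo, h2h_hi = elo_interval(h2h_score, 50)
print(f"50 games:  {h2h_elo:+.0f} Elo, interval {h2h_lo:+.0f} .. {h2h_hi:+.0f}")

# The private match: about +90 Elo after about 700 games.
match_score = score_from_elo(90.0)
match_lo, match_hi = elo_interval(match_score, 700)
print(f"700 games: +90 Elo, interval {match_lo:+.0f} .. {match_hi:+.0f}")
```

A single 50-game pairing leaves a very wide interval, while around 700 games narrows it a lot, which is why the head-to-head cell alone says less than the longer match does.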
Look at some of what Ipman wrote concerning his Komodo development versions tested:
28-10-2016
"Getting here Komodo 1730.00 and after first 100games he has it difficult..need for sure a good result in next 100games..
Stockfish 221016 has in mean time 300games and still clear first engine!"
23-10-2016
"Komodo 1714.17 did it not so well and came at same level as K1702 (is removed) and is equal with K10.1
Komodo 1687 is still best version but has played with Dynamism=117
It's amazing but Komodo is already 50Elo behind best Stockfish! a lot improvement is needed!"
Then most recently: 30-10-2016
"Testing Komodo 1730.00 is stopped because not so good results and Komodo 10.2 is released!"
I love Komodo, man. In fact, I hope that both you and Larry keep improving it. I love its positional style and its excellent endgame play. That said, I think even Larry himself sees that Stockfish is on top now. But really, who cares? Just keep adding Elo. That is all that should matter. Stockfish has slowed down over the last few days; it is not impossible to catch up with Stockfish development over time.