LS-ratinglist: Hannibal 1.3 and Protector 1.5

pohl4711
Posts: 2889
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

LS-ratinglist: Hannibal 1.3 and Protector 1.5

Post by pohl4711 »

Hi,

the results of Hannibal 1.3 and Protector 1.5 are now online in a new section called "prospective freeware engines". These engines are too weak for the LS-ratinglist (score below 30%). Because of that, their games are not included in the LS-gamebase or in the Elo results of the opponent engines (in order to avoid Elo distortions). You can find the results below the main LS-ratinglist on the home screen.

http://ls-ratinglist.beepworld.de/

(You may have to clear your browser cache or reload the website.)
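
To give a feel for what a "score below 30%" cutoff means in rating terms, here is a minimal sketch that converts a score fraction into an Elo difference. It assumes the standard logistic Elo formula; the rating program behind the list may use a different model (for example one that accounts for draws), and the helper names `elo_diff_from_score` and `too_weak` are hypothetical.

```python
import math

def elo_diff_from_score(score):
    """Elo difference implied by a score fraction under the logistic Elo model.

    Assumption: plain logistic Elo, not necessarily the model used by the list.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

def too_weak(score, cutoff=0.30):
    """Hypothetical helper mirroring the 'score below 30%' rule from the post."""
    return score < cutoff

# A 30% score corresponds to roughly 147 Elo below the average opponent.
print(round(elo_diff_from_score(0.30)))
```

Under that assumption, an engine at the cutoff sits about 147 Elo below the field it played against.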

Stefan
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: LS-ratinglist: Hannibal 1.3 and Protector 1.5

Post by lucasart »

pohl4711 wrote:Hi,

the results of Hannibal 1.3 and Protector 1.5 are now online in a new section called "prospective freeware engines". These engines are too weak for the LS-ratinglist (score below 30%). Because of that, their games are not included in the LS-gamebase or in the Elo results of the opponent engines (in order to avoid Elo distortions). You can find the results below the main LS-ratinglist on the home screen.

http://ls-ratinglist.beepworld.de/

(You may have to clear your browser cache or reload the website.)

Stefan
Thanks for this list, it's interesting. And I like the testing conditions, especially the time control.

But the main problem of your list is that it is plagued with incestuous testing, which distorts the rating. Almost all of the engines you have tested are more or less based on Robbolito.

=> different versions of the same Robbo-based engine
1 Houdini 3 x64 3153 3 4 29000 69% 3013 40% (€)
2 Robodini 1.1 x64 3145 5 5 11000 64% 3041 43% (sc)(sp)
3 Houdini 3 tactical 3118 5 5 10000 63% 3025 44% (€)(set)
4 Houdini 2.0c x64 3102 5 5 11000 62% 3012 42% (€)
5 Houdini 1.5a x64 3082 5 5 10000 59% 3014 44% (best mp-freeware)

=> Strelka 5.0 is a reverse-engineered Rybka 3 (from the horse's mouth, Yuri Osipov). I don't know about 5.5, but I'm guessing it's not very far from Rybka 4.1
6 Strelka 5.5 x64 3068 3 3 33000 56% 3023 53% (sc)
18 Rybka 4.1 x64s 3009 4 4 33000 48% 3025 47% (€)

=> Critter: finally a nice and original engine that is not based on Robbolito, but again two versions of the same engine distort the rating list
7 Critter 1.6a x64 3067 3 3 32000 57% 3022 54%
8 Critter 1.4a x64s 3060 5 5 10000 53% 3038 53%

=> Original engine appearing only once (finally!)
9 Komodo 5 x64 3058 4 4 33000 55% 3023 44% (€)(sc)

=> IvanHoe and all its clones. It's interesting to note that they are *all* inferior to the original, which is no surprise to me...
10 Ivanhoe 50kQ x64s 3038 4 4 19000 52% 3021 57% (best open source)(cp)
11 Ivanhoe 46eQi x64s 3034 5 5 14000 49% 3039 58%
12 Robbolito 0.21Q x64s 3030 3 3 27000 51% 3026 59%
13 Fire trap120925 x64s 3029 5 5 11000 48% 3041 59%
14 Bouquet 1.6 x64s 3022 4 4 23000 50% 3024 58%
15 Ivanhoe 46h x64 3018 4 4 18000 48% 3034 54%
16 Bouquet 1.5 x64s 3017 4 4 15000 47% 3036 57%
17 Robbolito 0.10 x64s 3016 4 4 15000 48% 3030 55%
19 Vitruvius 1.11C x64s 3008 4 4 16000 46% 3038 56% (€)
20 PanChess 00.250 x64 3005 4 4 18000 46% 3031 54%
21 Akkad 0.52b x64s 3003 4 4 19000 47% 3025 55%
22 Robbolito 0.085g3 x64 3000 3 3 33000 46% 3025 56% (sc)(Ippolit 2009)
23 PanChess 00.240 x64 2999 5 5 11000 44% 3040 52%
25 Saros 3.3b x64s 2996 5 5 11000 44% 3040 54%
28 Saros 3.0 x64 2984 4 4 18000 43% 3035 48%
31 Saros 3.5a Distro x64s 2971 5 5 11000 40% 3041 50%
33 Vitruvius 1.11H x64s 2952 5 5 10000 37% 3045 45% (€)
34 Bouquet 1.4 x64s 2926 5 5 13000 34% 3042 42%

=> 2 versions of the same engine. Besides, Gull II, even though it's an original engine, replicates the eval function of IvanHoe (from the horse's mouth, again). So in terms of game play, it's going to be very similar to IvanHoe, even though the author deserves much more recognition than the cloners from the above list (at least he wrote an engine from scratch with an SMP implementation, which is no trivial task)
24 Gull 2 x64 2997 5 5 11000 44% 3040 54%
30 Gull 2 beta2 x64 2983 5 5 11000 42% 3035 53%

=> Original engine? I don't know, and the source code is closed. At least the similarity dendrogram seems to show that it's very close to the Robbo family. At best it's like Gull (original engine from scratch, but eval copied from IvanHoe), and at worst it's a butchered IvanHoe with bugs added (see the list of clones above, all weaker than the original)
32 Black Mamba 1.2b x64 2961 5 5 11000 39% 3040 49% (sc)(sp)

=> Stockfish: probably the only honest engine out there. But again, two versions distort the rating list.
26 Stockfish 2.3.1 x64s 2996 3 4 25000 46% 3026 46%
27 Stockfish 2.2.2 x64s 2992 5 5 11000 45% 3030 44%

29 Strong Chess 1v10 x64 2983 5 5 11000 42% 3041 56%

The only way to remove the version bias problem is to remove all games by previous versions when a new version is better. That means that the above list would be shrunk to about 5 engines only...
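
A minimal sketch of that pruning step, using a few rows copied from the list above. Treating the first word of the engine name as its "family" is an assumption made only for illustration; it would not group renamed derivatives together, which would need a hand-maintained mapping. The helper names are hypothetical.

```python
# Rows copied from the list above: (engine name, Elo).
rows = [
    ("Critter 1.6a x64", 3067),
    ("Critter 1.4a x64s", 3060),
    ("Komodo 5 x64", 3058),
    ("Stockfish 2.3.1 x64s", 2996),
    ("Stockfish 2.2.2 x64s", 2992),
]

def family(name):
    """Assumption: the first word of the name identifies the engine family."""
    return name.split()[0]

def best_per_family(entries):
    """Keep only the highest-rated version of each family, sorted by Elo."""
    best = {}
    for name, elo in entries:
        key = family(name)
        if key not in best or elo > best[key][1]:
            best[key] = (name, elo)
    return sorted(best.values(), key=lambda entry: -entry[1])

pruned = best_per_family(rows)
for name, elo in pruned:
    print(name, elo)
```

Note that this only drops rows from the table. Removing the older versions' games would also require recomputing every remaining rating from the reduced game set.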

It says a lot about the status of computer chess today, doesn't it :lol:
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
pohl4711
Posts: 2889
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: LS-ratinglist: Hannibal 1.3 and Protector 1.5

Post by pohl4711 »

Each engine plays only against the 11 engines of the top10-tournament (excluding its predecessor, if that is in the top10-tournament), so, for example, Stockfish never plays against Stockfish. For a distortion-free ratinglist, just take a look at the LS-top10-tournament (all games of previous engine versions are deleted there)...
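
The opponent rule can be sketched roughly as follows. The pool below is a hypothetical subset built from names in the list, not the actual top10-tournament field, and matching versions by the first word of the name is an assumption for illustration.

```python
def opponents(engine, pool):
    """Return the pool without any engine that shares the candidate's name prefix.

    Assumption: versions of the same engine share the first word of their name.
    """
    prefix = engine.split()[0]
    return [name for name in pool if name.split()[0] != prefix]

# Hypothetical pool, for illustration only.
pool = [
    "Houdini 3 x64",
    "Critter 1.6a x64",
    "Komodo 5 x64",
    "Stockfish 2.3.1 x64s",
]

print(opponents("Stockfish 2.2.2 x64s", pool))
```

With this pool, an older Stockfish version is paired with Houdini, Critter and Komodo, but never with the newer Stockfish.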

The problem that many top engines are based on Ippolit is smaller than most people believe. The Ippolit source is from 2009, and after more than 3 years the engines developed from that base are very different. That's what I call evolutionary engine development. And that's the situation in computer chess. I don't want to ignore/boycott this situation in my test work.

Stefan
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: LS-ratinglist: Hannibal 1.3 and Protector 1.5

Post by lucasart »

pohl4711 wrote:Each engine plays only against the 11 engines of the top10-tournament (excluding its predecessor, if that is in the top10-tournament), so, for example, Stockfish never plays against Stockfish. For a distortion-free ratinglist, just take a look at the LS-top10-tournament (all games of previous engine versions are deleted there)...
OK, so that partially solves the problem of rating distortion. Although it is still there because of different versions of Robbolito that call themselves different engines...
pohl4711 wrote: The problem that many top engines are based on Ippolit is smaller than most people believe. The Ippolit source is from 2009, and after more than 3 years the engines developed from that base are very different. That's what I call evolutionary engine development. And that's the situation in computer chess. I don't want to ignore/boycott this situation in my test work.
Oh, no. I certainly don't advocate censorship. I was one of the most vocal against this censorship in the Rybka/Ippolit days. But only the best IvanHoe should remain, otherwise the whole list is significantly distorted.