Testing Ivanhoe B46fb at 1 hour/game.

Tomcass
Posts: 786
Joined: Sun Apr 16, 2006 9:09 pm

Testing Ivanhoe B46fb at 1 hour/game.

Post by Tomcass »

TEST 1: 60 GAMES
GUI Fritz 12
Book Masterbook 3.1
60 minutes/game
Intel i7 9750 3.33 GHz. 12 GB RAM
4 real cores. No HT.
Ponder: Off.

Ivanhoe B46fb x64(x4)LP - Critter 1.2 64-bit SSE4(x4) 9.5 - 10.5 +2/=15/-3 47.50%
Ivanhoe B46fb x64(x4)LP - Deep Rybka 4.1 SSE42 x64(x4) 12.5 - 7.5 +5/=15/-0 62.50%
Ivanhoe B46fb x64(x4)LP - Houdini 2.0c Pro x64(x4) 10.0 - 10.0 +3/=14/-3 50.00%

http://www.megaupload.com/?d=L4H8CALL

TEST 2: 60 GAMES
GUI Fritz 12
Book Perfect 2012
60 minutes/game
Intel X980 3.33 GHz. 24 GB RAM
6 real cores. No HT.
Ponder: Off

Ivanhoe B46fb x64(x6)PPLP - Critter 1.2 64-bit SSE4x6bo 9.5 - 10.5 +2/=15/-3 47.50%
Ivanhoe B46fb x64(x6)PPLP - Deep Rybka 4.1 SSE42 x64 (x6) 11.0 - 9.0 +5/=12/-3 55.00%
Ivanhoe B46fb x64(x6)PPLP - Houdini 2.0c Pro x64(x6) 7.5 - 12.5 +2/=11/-7 37.50%

http://www.megaupload.com/?d=9HUI7ONO

JOINT RESULTS 120 games:

Ivanhoe B46fb x64(x4)LP - Critter 1.2 64-bit SSE4(x4) 19.0 - 21.0 +4/=30/-6 47.50%
Ivanhoe B46fb x64(x4)LP - Deep Rybka 4.1 SSE42 x64(x4) 23.5 - 16.5 +10/=27/-3 58.75%
Ivanhoe B46fb x64(x4)LP - Houdini 2.0c Pro x64(x4) 17.5 - 22.5 +5/=25/-10 43.75%
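
For anyone who wants to check the totals, here is a minimal sketch that recomputes the joint results from the win/draw/loss counts of the two tests. The shortened engine names and the helper functions `score` and `elo_diff` are made up for this illustration. The Elo figure assumes the usual logistic rating model, and with only 40 games per opponent it carries a wide error margin.

```python
from math import log10

def score(wins, draws, losses):
    """Return (points, games, percentage) from Ivanhoe's point of view."""
    games = wins + draws + losses
    points = wins + 0.5 * draws
    return points, games, 100.0 * points / games

def elo_diff(pct):
    """Elo difference implied by a score percentage (logistic model, an assumption)."""
    p = pct / 100.0
    return -400.0 * log10(1.0 / p - 1.0)

# (wins, draws, losses) for Ivanhoe against each opponent, copied from the two tests.
test1 = {"Critter 1.2": (2, 15, 3), "Deep Rybka 4.1": (5, 15, 0), "Houdini 2.0c": (3, 14, 3)}
test2 = {"Critter 1.2": (2, 15, 3), "Deep Rybka 4.1": (5, 12, 3), "Houdini 2.0c": (2, 11, 7)}

joint = {}
for name, (w1, d1, l1) in test1.items():
    w2, d2, l2 = test2[name]
    joint[name] = score(w1 + w2, d1 + d2, l1 + l2)

for name, (points, games, pct) in joint.items():
    print(f"{name}: {points:.1f}/{games} = {pct:.2f}% ({elo_diff(pct):+.0f} Elo)")
```

A percentage is simply (wins + draws/2) divided by games played, so any line in the tables can be verified the same way.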

Regards from Barcelona.

Tom.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Testing Ivanhoe B46fb at 1 hour/game.

Post by lkaufman »

I'm always glad to see engines tested at longer time controls, but I would just like to make one point about it: when you test closely related engines like the four you are now testing, there is no reason to expect a significant difference in "scaling", that is, in how much each benefits from additional time. It would be much more interesting to see the results of UNRELATED engines at long time controls; in this instance it would mean testing Ivanhoe against Stockfish and Komodo rather than against Ippo-inspired programs or Rybka. Then at least there is a likelihood that the results may differ sharply from blitz results.
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: Testing Ivanhoe B46fb at 1 hour/game.

Post by geots »

I fully agree with you, Larry, about testing engines in a family against each other. I don't do it myself and have never encouraged anyone to do it. However, I take issue with the idea that either the "IvanHoes" are in the Rybka family or "Rybka" is in the IvanHoe family. There may be a lot of similarities, but I am not writing this to be answered with a page full of "similar" or "identical" evidence. If you believe Ivanhoe should not be tested against Rybka, that is fine. But pressing the point only restarts the Rybka issue all over again, along with the claims that "IvanHoe" is too similar to Rybka to be clean. As I said, I don't have time for pages upon pages of similar code between them, and I am just not interested in them. I respect your feelings, even as I keep testing each "clean" IvanHoe against Rybka 4.1 as the baseline. And I do not for one minute think Rybka can be included in a "family of engines that are too similar to test". I have no problem whatsoever with your feelings, and I respect them.


Best,

george


PS: Of course I will not be testing them only against Rybka. Rybka remains only one of a number that will get their chance.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Testing Ivanhoe B46fb at 1 hour/game.

Post by lkaufman »

Certainly the similarity of Ivanhoe, Houdini, and Critter is much greater than the similarity of any of them to Rybka. So I have no particular objection to someone testing Ivanhoe against Stockfish, Komodo, and Rybka. But it is also beyond doubt that the similarity of the Ippos to Rybka is much greater than the similarity of either to Stockfish or Komodo, so I would say that someone wanting to evaluate Ivanhoe at long time controls should give priority to testing against Stockfish and Komodo, and then add Rybka next, followed by Critter, and only then Houdini. In other words, start with the most dissimilar engine and work your way towards those with progressively greater similarity to the engine to be tested. Of course this requires knowing enough about the engines to know how similar they are to each other.
Similarly, someone wanting to test a new Komodo at long time controls should start with the most dissimilar engines, which would be all the Ippo-related ones, then Rybka, and then Stockfish, since Komodo has more in common with Stockfish than with the others.
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: Testing Ivanhoe B46fb at 1 hour/game.

Post by geots »

Larry, when you put it this way, or better, with this explanation, I cannot find one word to disagree with. And if I were going to test only against Rybka and no others, I agree that would not be the best of choices. But with the variables of (1) time limits, (2) number of games, and (3) common opponents chosen, none of my outcomes are written in stone anyway. (Unless you want to go back to where IvanHoe T52E, which I will be posting on shortly, beat Fritz 13 by 40 - 18, ignoring draws, in a 100-game match. That outcome just ain't gonna change much by shuffling variables.) I am just trying to give people who might be interested in engines they haven't seen much of a bit of an idea of their strength, while enjoying the testing at the same time. My interest wasn't so much the one engine IvanHoe, but rather the difference between some of the compiles when there are so many to choose from. But when the dust settles, yes, I have to agree with everything you said.
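
One way to put a number on how decisive a 40 - 18 result is: the "likelihood of superiority" (LOS) estimate often used in engine testing, which looks only at wins and losses. This is a sketch under the usual normal approximation, with draws ignored; the function name is chosen for this example.

```python
from math import erf, sqrt

def likelihood_of_superiority(wins, losses):
    """Approximate probability that the side with `wins` is the stronger engine.

    Normal approximation based on decisive games only; draws carry no information here.
    """
    return 0.5 * (1.0 + erf((wins - losses) / sqrt(2.0 * (wins + losses))))

# IvanHoe T52E vs Fritz 13: 40 wins, 18 losses, remaining games drawn.
los = likelihood_of_superiority(40, 18)
print(f"LOS = {los:.3f}")
```

A value close to 1 means the lead is very unlikely to be a product of chance alone, though it says nothing about how other opponents or time controls would change the picture.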


Best,

george
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: Testing Ivanhoe B46fb at 1 hour/game.

Post by MM »

Hello,
if you could put your logic, and your skill at explaining it, into Komodo, it would certainly be the No. 1 engine. Thanks.

Regards
MM