Houdini 2.0c Pro x64 Vs Critter 1.4 64bit SSE4 300 games.

Post by **mwyoung** » Sat Dec 31, 2011 8:10 am

My first results for Critter 1.4.

i7 @ 2.1 GHz, 4 cores.
TC 4m+1s
GTB: 5-man Gaviota tablebases installed on a solid-state drive.
Games played = 300
Same generic book to 12 moves. Each side played each opening once as White and once as Black.
Ht=2048

1 Houdini 2.0c Pro x64 (4c) +29 +83/=159/-58 54.17% 162.5/300
2 Critter 1.4 64-bit SSE4 (4c) -29 +58/=159/-83 45.83% 137.5/300
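A minimal sketch of how the Elo figure in a result line like the one above can be derived from the W/D/L counts. It assumes the standard logistic Elo model and a normal approximation for the error bar; it is not necessarily how the rating tool used here computes its numbers.

```python
import math

def elo_from_score(p: float) -> float:
    """Elo difference implied by a score fraction p (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def match_elo(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) for one side's +W =D -L record.

    The interval uses a normal approximation to the mean per-game score,
    with the variance estimated from the observed win/draw/loss frequencies.
    Not valid for 0% or 100% scores or for very small samples.
    """
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - p) ** 2 + draws * (0.5 - p) ** 2 + losses * p ** 2) / n
    se = math.sqrt(var / n)
    return elo_from_score(p), elo_from_score(p - z * se), elo_from_score(p + z * se)

# Houdini 2.0c vs Critter 1.4, 300 games: +83 =159 -58
elo, low, high = match_elo(83, 159, 58)
print(f"{elo:+.0f} Elo, 95% interval {low:+.0f} .. {high:+.0f}")  # +29 Elo, 95% interval +2 .. +56
```

For the 300-game match this reproduces the +29 and gives a 95% interval of roughly +2 to +56 Elo: Houdini ahead, but with wide error bars.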

Post by **Hugo** » Sat Dec 31, 2011 11:41 am

Hello Mark

Nice comparison.
Very close to my own results. Of course, only 60 games, but at a long TC with ponder ON.
1 Houdini 2.0c x64 T4 (+23) +14/-10/=36 53.33% 32.0/60
2 Critter 1.4 64-bit SSE4 T4(-22) +10/-14/=36 46.67% 28.0/60

Regards, Clemens

Post by **mwyoung** » Sat Dec 31, 2011 12:24 pm

Too early to tell for sure about the rating of Critter 1.4, but my guess is that Critter 1.4 will be almost equal to Houdini 1.5a.

My next test will be Critter 1.4 vs. Houdini 1.5a.

Post by **MM** » Sat Dec 31, 2011 12:53 pm

I agree, 60 games mean almost nothing. With 60 games you have only a general idea of the strength of an engine.
In my opinion, to get a clear evaluation of an engine's Elo you need around 2000 games.

Regards
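How wide the error bars are for a given number of games can be estimated with a short sketch. It assumes two roughly equal engines, independent games, a normal approximation, and a 50% draw rate, which is within the range of draw rates reported in this thread (roughly 36% to 60%); `draw_rate` can be changed.

```python
import math

def elo_from_score(p: float) -> float:
    """Elo difference implied by a score fraction p (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_margin(games: int, draw_rate: float, z: float = 1.96) -> float:
    """Approximate 95% error bar, in Elo, for a match between equal engines."""
    # Equal engines: wins and losses each have probability (1 - d) / 2 and lie
    # 0.5 away from the mean score, so the per-game variance is (1 - d) / 4.
    sigma = math.sqrt((1.0 - draw_rate) / 4.0)
    half_width = z * sigma / math.sqrt(games)
    return elo_from_score(0.5 + half_width)

for n in (20, 60, 300, 2000, 10000):
    print(f"{n:6d} games: +/- {elo_margin(n, draw_rate=0.5):5.1f} Elo")
```

Under these assumptions the margin is roughly +/- 63 Elo at 60 games, +/- 28 at 300 and +/- 11 at 2000; it shrinks with the square root of the number of games. The normal approximation is crude for very small samples such as 20 games.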

Post by **mwyoung** » Sat Dec 31, 2011 5:37 pm

I have to smile here: 2000 games is ridiculous. I have been testing since the '80s, and we did a pretty good job of finding the correct ratings and rankings back then, playing 100% of the games by hand.

You only need that many games to detect very small Elo differences between programs. That is done for fine-tuning engine evaluation terms, not for rating chess computers.

Post by **MM** » Sat Dec 31, 2011 6:26 pm

So how many games would you need, in your opinion, to have a precise Elo rating? If 2000 makes you smile, please just name a number you consider enough.

Regards

Post by **mwyoung** » Sat Dec 31, 2011 6:56 pm

You never have enough data...

It also depends on what you mean by precise. Neither 10,000 games nor even a million will give you an exact rating. You need a grasp of statistics and of the level of certainty you want.

It depends on what you are trying to do.

An engine programmer may play 5000 games at ultra-fast time controls to find an improvement of 5 Elo or less in their eval terms.

That is one extreme...

At the other extreme, you may only be asking which program is better, without caring by how much.

You can find this out in as few as 7 games at 95% confidence.
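One way to check the "7 games at 95% confidence" remark is an exact sign test on the decisive games. This sketch assumes independent games and simply ignores draws; it gives the one-sided probability that two equally strong engines would produce a result at least this lopsided.

```python
from math import comb

def sign_test_p(wins: int, losses: int) -> float:
    """One-sided p-value: chance that two equal engines produce at least
    `wins` wins out of `wins + losses` decisive games (draws ignored)."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

for w, l in ((7, 0), (6, 1), (5, 0), (5, 2)):
    print(f"{w}-{l}: p = {sign_test_p(w, l):.4f}")
```

Under these assumptions a 7-0 sweep (p about 0.008) or even 5-0 (p about 0.031) is significant at the 5% level, while 6-1 (p about 0.063) is not, so very few games suffice only when the result is very one-sided.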

If what you say were true, I could claim I am stronger than GM Kasparov, since I have not played 2000 games and you don't yet have a grasp of my rating strength.

But this is ridiculous. If you understand the rating system and statistics, you can show with a very high degree of certainty that I am not stronger than GM Kasparov, even though I have never played him or played 2000 games.

You can start to get a grasp of a rating at around 20 games. After that, you are shrinking the error bars.

Have you ever looked at the rating list of one of the oldest testing groups, the SSDF?
No one disregards their data, and they do not test to 2000 or more games, for obvious reasons.

Note the number of games played and the error bars.

http://ssdf.bosjo.net/list.htm

Post by **MM** » Sat Dec 31, 2011 7:35 pm

I talked about a "clear evaluation of the Elo" of the engine. I believe that 2000 games are enough for that. This doesn't mean that the Elo will be 100% precise, of course, but the approximation is entirely acceptable.

That said, I agree that for a really precise Elo evaluation, probably even 10000 games wouldn't be enough.

As for a match between two engines and the number of games needed to identify which one is better, I think 500 games are good if the two engines are very close. If one is clearly stronger, you might need only 100 games or even fewer.
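The trade-off between the size of the gap and the number of games can be sketched with a standard power calculation. Assumptions: logistic Elo model, normal approximation, a 50% draw rate, and a one-sided test at the 5% level with 80% power (z values 1.645 and 0.842).

```python
import math

def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff: float, draw_rate: float,
                 z_alpha: float = 1.645, z_beta: float = 0.842) -> int:
    """Rough number of games needed to show, one-sided at the 5% level,
    that the stronger engine is better, with 80% power."""
    delta = expected_score(elo_diff) - 0.5
    # Per-game score spread, approximated at a 50% score with the given draw rate.
    sigma = math.sqrt((1.0 - draw_rate) / 4.0)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

for diff in (10, 30, 50, 100):
    print(f"{diff:4d} Elo: ~{games_needed(diff, draw_rate=0.5)} games")
```

This gives roughly 3700 games for a 10 Elo gap, about 420 for 30 Elo, about 150 for 50 Elo and about 40 for 100 Elo.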

As for the rest, I didn't understand any of it (what do Kasparov and zero games against him have to do with this discussion?).

Best Regards

Post by **mwyoung** » Sat Dec 31, 2011 7:45 pm

I think the post is clear. If you don't understand something, it is because you are unable or unwilling to understand.

Post by **Laskos** » Sat Dec 31, 2011 7:56 pm

My result at ultra-short time controls, with Houdini 1.5a rather than 2.0c, which roughly confirms mwyoung's 300-game result. I got it several days ago, when Critter 1.4 appeared, but forgot to post it:

```
Games Completed = 7215 of 10000 (Avg game length = 13.856 sec)
Settings = RR/16MB/5000ms+30ms/M 700000cp for 1000 moves, D 3000 moves/PGN:C:\Users\Ani\Downloads\LittleBlitzer\swcr.pgn(5120)
Time = 114221 sec elapsed, 44089 sec remaining
 1.  Houdini 1.5a             	3906.0/7215	2599-2002-2614  	(L: m=2000 t=2 i=0 a=0)	(D: r=1242 i=672 f=682 s=18 a=0)	(tpm=84.5 d=10.5 nps=1026066)
 2.  Critter 1.4              	3309.0/7215	2002-2599-2614  	(L: m=2598 t=1 i=0 a=0)	(D: r=1242 i=672 f=682 s=18 a=0)	(tpm=88.2 d=11.7 nps=996130)
```

```
    Program                            Score       %       Elo    +   -    Draws

  1 Houdini 1.5a                   : 3906.0/7215  54.1     3214    6   6   36.2 %
  2 Critter 1.4                    : 3309.0/7215  45.9     3186    6   6   36.2 %
```

28 +/- 6 Elo points between them (95% confidence). The result can be extrapolated pretty reliably to longer time controls, because Houdini 1.5 and Critter 1.4 scale similarly; in fact, they are similar in general.

Kai
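Another figure often quoted with such results is the likelihood of superiority (LOS), which is not part of the output above. A common approximation uses only the decisive games; this sketch assumes that formula and independent games.

```python
import math

def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability that the side with `wins` is really stronger.
    Draws are ignored; equivalent to Phi((W - L) / sqrt(W + L))."""
    if wins + losses == 0:
        return 0.5  # no decisive games, no information
    z = (wins - losses) / math.sqrt(2.0 * (wins + losses))
    return 0.5 * (1.0 + math.erf(z))

# Houdini 1.5a vs Critter 1.4: +2599 -2002 (2614 draws ignored)
print(f"7215 games: LOS = {likelihood_of_superiority(2599, 2002):.4f}")
# Houdini 2.0c vs Critter 1.4: +83 -58 (159 draws ignored)
print(f" 300 games: LOS = {likelihood_of_superiority(83, 58):.4f}")
```

For Kai's 7215 games the LOS is essentially 1; for the 300-game Houdini 2.0c vs Critter 1.4 match it comes out at about 98%.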
