Houdini 1.03a The New Nr 1 !!!

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Houdini 1.03a The New Nr 1 !!!

Post by mwyoung »

I could not disagree more......
Dann Corbit
Posts: 12804
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Houdini 1.03a The New Nr 1 !!!

Post by Dann Corbit »

IGarcia wrote:STaylor is right in several ways.

With a low number of games it is not possible to determine exactly the rating, Elo or any other measure. But if some games are played under equal conditions, as Martin is doing, we can allow for a moment that the result be taken as a representative measure (with a big +/- margin) of a more precise value.

If you flip a coin fairly, can it come up 2, 3, 4, 5, 6 heads in a row? Each outcome is a bit more unlikely than the last, but all of them happen often enough that you will see runs of 2, 3, 4, 5, 6 of a kind if you do an experiment to check. Whether the engines are strong or weak is irrelevant.

IGarcia wrote:Results like 16-16, 18-14 or even 20-12 speak about engines with similar strength.

They might. Or they might speak about engines where one is much stronger than the other. The more games you play, the less likely that such is the case, if the percentage of points won is about equal.

IGarcia wrote:And the losing engine can be a great rival, with excellent moves, even better than the winner's, but losing endgames because it has no endgame bases, or the difference can be an opening book.

No experiment that I have seen (and I have examined many) has shown a significant increase in strength from EGTB. All they do is make the engines play pretty chess. Perhaps if more games ended up in the endgame, or if we had huge flash drives, then it could make a difference. But most games end in the middlegame, and most drives are slow enough that the perfect information adds little to the win/loss total.

IGarcia wrote:So, for the pure rating go play 10,000 games, but a really good engine may not be the one winning those thousands of blitz games.

Why would you imagine that strong or weak bends the probability? If the trials are independent (e.g. learning is turned off) then all objects suffer from the same outrageous turns of probability that the others do.
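To put rough numbers on the coin-flip point, here is a minimal Python sketch (an illustration, not something posted in the thread). It computes the exact chance of seeing at least one run of k heads somewhere in n flips, assuming the flips are independent. The function name prob_run_at_least and the choice of 30 flips are hypothetical, picked only because 30 games is the match length being discussed.

```python
def prob_run_at_least(n_flips, run_length, p_heads=0.5):
    """Probability of at least one streak of `run_length` heads in `n_flips` flips.

    Assumes independent flips. state[j] holds the probability that no such
    streak has appeared yet and the current trailing streak of heads is j.
    """
    state = [0.0] * run_length
    state[0] = 1.0
    for _ in range(n_flips):
        nxt = [0.0] * run_length
        for j, prob in enumerate(state):
            nxt[0] += prob * (1.0 - p_heads)      # tails resets the streak
            if j + 1 < run_length:
                nxt[j + 1] += prob * p_heads      # heads extends the streak
            # otherwise the streak is complete and this mass leaves the table
        state = nxt
    return 1.0 - sum(state)


# Illustrative run: 30 fair flips, streak lengths 2 to 6
for k in range(2, 7):
    print(f"run of {k} heads in 30 flips: {prob_run_at_least(30, k):.3f}")
```

The same table works for an unfair coin by changing p_heads, which is one way to see that streaks appear whether the two sides are evenly matched or not.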
S.Taylor
Posts: 8514
Joined: Thu Mar 09, 2006 3:25 am
Location: Jerusalem Israel

Re: Houdini 1.03a The New Nr 1 !!!

Post by S.Taylor »

mwyoung wrote:
S.Taylor wrote:
Martin Thoresen wrote:
Dann Corbit wrote: Now, running 30 games is a great idea. But then don't imagine that you have decided who is strongest yet.
Dann, this is precisely what I am trying to explain to Mr. Taylor.

My matches at tournament time control are not in any way intended to give a final answer to whether engine A is stronger than engine B.

To reach such conclusions at the very slow time control I use, a large number of computers would have to be used in order to get the game numbers high enough within a humane time-frame.

Even with 1500 games the error margins are quite high.

My matches and upcoming tournaments are intended to provide the viewers with some quality chess entertainment, nothing else.


Best Regards,
Martin
I don't even agree that all games are absolutely useless and say absolutely nothing about how an engine or a human plays, before a few thousand games have been played.
If this is the case, there can hardly be much quality chess entertainment.
When a machine moves, you are also interested in how much you think it ought to know why it is moving there, especially if you don't understand it.
It's hardly entertaining when you keep being sure the machine has made a mistake, and you break your head trying to see what was behind it, only to realize that OF COURSE it was a stupid move.
You can't have much entertainment when it is nothing but disappointment over and over again. One wants to see things being proven over the board. It's not enough fun just to imagine and imagine no end, and mental masturbation etc. Then in the end you don't reach any conclusions about the positions and why this or that happened, as it is all just like random nonsense.
You want to see and appreciate quality, and to know if and when there was or was not quality.
And you don't only want to know who won and how many, but you want to know what it looked like.
When you have all that, you only need to watch a few games to get a better idea how strong the programs are, than simply hearing the results of a few hundred games.
If you have no other interest or feeling or understanding, then of course, just play 10,000 games. But 20 games with quality (and long TC) show more, in almost every way.
Didn't Kasparov say that if a human beats a computer even one game, it shows that the human is superior in understanding?
It's not too much different if you test a small number, if it's done well, and you can see how the games went.
At no point did Houdini or Stockfish get a plus score (I don't think); it was only Rybka inching up and up, steadily.
There is so much to argue about this.
The only thing the statistics do is to get an exact rating. But the likelihood of this being wildly different from the result after 20 quality games is very low.
OK?

With this logic you can make almost any program or anyone better than anyone else. This is a subjective fallacy.


"Didn't Kasparov say that if a human beats a computer even one game, it shows that the human is superior in understanding."

Since Kasparov's logic cannot be wrong, I will assume Chess Genius from 1994 was "Superior in chess Understanding" to GM Kasparov. They played 2 games and Chess Genius won 1 1/2 to 1/2.

Because: If (A) beats (B) in even one game. It shows (A) is superior in understanding to (B).

[Event "Intel Chess Grand Prix (active)"]
[Site "London (England)"]
[Date "1994.??.??"]
[EventDate "?"]
[Round "1"]
[Result "0-1"]
[White "Garry Kasparov"]
[Black "Genius (Computer)"]
[ECO "D11"]
[WhiteElo "?"]
[BlackElo "?"]
[PlyCount "120"]

1.c4 c6 2.d4 d5 3.Nf3 Nf6 4.Qc2 dxc4 5.Qxc4 Bf5 6.Nc3 Nbd7
7.g3 e6 8.Bg2 Be7 9.O-O O-O 10.e3 Ne4 11.Qe2 Qb6 12.Rd1 Rad8
13.Ne1 Ndf6 14.Nxe4 Nxe4 15.f3 Nd6 16.a4 Qb3 17.e4 Bg6 18.Rd3
Qb4 19.b3 Nc8 20.Nc2 Qb6 21.Bf4 c5 22.Be3 cxd4 23.Nxd4 Bc5
24.Rad1 e5 25.Nc2 Rxd3 26.Qxd3 Ne7 27.b4 Bxe3+ 28.Qxe3 Rd8
29.Rxd8+ Qxd8 30.Bf1 b6 31.Qc3 f6 32.Bc4+ Bf7 33.Ne3 Qd4
34.Bxf7+ Kxf7 35.Qb3+ Kf8 36.Kg2 Qd2+ 37.Kh3 Qe2 38.Ng2 h5
39.Qe3 Qc4 40.Qd2 Qe6+ 41.g4 hxg4 42.fxg4 Qc4 43.Qe1 Qb3+
44.Ne3 Qd3 45.Kg3 Qxe4 46.Qd2 Qf4+ 47.Kg2 Qd4 48.Qxd4 exd4
49.Nc4 Nc6 50.b5 Ne5 51.Nd6 d3 52.Kf2 Nxg4+ 53.Ke1 Nxh2 54.Kd2
Nf3+ 55.Kxd3 Ke7 56.Nf5+ Kf7 57.Ke4 Nd2+ 58.Kd5 g5 59.Nd6+ Kg6
60.Kd4 Nb3+ 0-1
Mark, I hope you read all I wrote and understand what I mean.
Obviously not everything that beats a human in one game shows any understanding. But beating a strong machine is obviously a lot different.
Kasparov could possibly miss something (absent-mindedly) and allow a beginner to deliver him mate in one, however unlikely. A machine cannot possibly allow any short-term combination visible to any regular strong player.

I could also say that if there is a match in which machine A beats machine B 20-0, it looks like some people would say it means absolutely nothing about the great chess perfection of the one that lost, since it might win the next 980 games.

I would say differently: that those 20 wins would be more evenly spread out, and not only that, but that there would be 40 draws among the 1000 games.

At least, if in the above case of 1000 games ending 980-20 the eventual loser gets 2 actual wins from the first 10 games, I would say with all certainty that the winner of that match will NEVER end up with 980 of the 1000.

Does someone who goes to a dentist tell himself that the dentist is very likely to break his most delicate teeth due to human failing, because statistically it happens 2% of the time even with the best of dentists (when they treat delicate teeth which are nonetheless savable, with enough care)? So therefore I am "prepared that those 2 teeth out of the hundred teeth that the dentist treats will be mine, and that dentist will still be my idol, as the most dependable dentist"?
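For anyone who wants to put a number on scenarios like these, here is a small Python sketch (added for illustration, not part of the post). It treats every game as an independent trial with a fixed win probability and ignores draws, which real engine matches only approximate. The helper name prob_at_least and the probabilities 20/1000 and 0.5 are assumptions taken from the figures in the examples above.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): at least k wins in n independent games."""
    return 1.0 - sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k))


# Assumption: the weaker engine wins each game independently with p = 20/1000.
p_weak = 20 / 1000
early_cluster = prob_at_least(2, 10, p_weak)
print(f"P(weaker side wins at least 2 of the first 10) = {early_cluster:.4f}")

# Assumption: two truly equal engines, decisive games only (p = 0.5 each).
shutout = 0.5 ** 20
print(f"P(one named side of an equal pair starts 20-0) = {shutout:.2e}")
```

Under these assumptions the early cluster of two wins is unlikely but not impossible, which is the quantity the "NEVER" claim turns on. With draws included, or with games that are not independent, the numbers would change.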
De Vos W
Posts: 431
Joined: Tue Dec 01, 2009 11:59 am

Re: Houdini 1.03a The New Nr 1 !!!

Post by De Vos W »

The Dr. Wael Deeb settings.

Pawn Hash: 64
Pawn Value: 120
Knight Value: 325
Bishop Value: 325
Rook Value: 600
Queen Value: 1170
Bishop Pair Value: 65
Verification Reduction: 8

With these settings FireBird 1.1 w32 (the strongest Fire engine) is in good condition.
S.Taylor
Posts: 8514
Joined: Thu Mar 09, 2006 3:25 am
Location: Jerusalem Israel

Re: Houdini 1.03a The New Nr 1 !!!

Post by S.Taylor »

Or does one think, no, they will not be MY teeth?
Anyway, please forget about this comparison for now, as it needs to be worked out a bit better to illustrate the point, or maybe not.
jpqy
Posts: 556
Joined: Thu Apr 24, 2008 9:31 am
Location: Belgium

Re: Houdini 1.03a The New Nr 1 !!!

Post by jpqy »

This is not a reply, but a question.

Maybe 10 years ago I always ran normal standard games: 40 moves in 2 hours, and so on. These were called quality games, because playing faster was not considered good enough to know how strong an engine is!
A few years ago, with my AMD 3400+, I ran 30-minute games. I got greater depth than in my 2-hour games, so the quality of these games was better, yes?

After my AMD I had an E6600 and then a Q6600, playing 15- and 10-minute games, and again I reached greater depth even though I used less time. So again better quality games, right?

And now I have a Core i7 running at 3.80 GHz playing 5-minute blitz games, and again I go deeper!
So I have gone from depth 6 when a game starts after the opening to depth 20 now. What do you call these games now, if the games of so many years ago were quality games, and these days our systems are so fast that we go much deeper even with only 5-minute games?

JP.
Uri Blass
Posts: 11100
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Houdini 1.03a The New Nr 1 !!!

Post by Uri Blass »

mwyoung wrote:
Because: If (A) beats (B) in even one game. It shows (A) is superior in understanding to (B).

I disagree with Kasparov, but Kasparov did not say that
If (A) beats (B) in even one game. It shows (A) is superior in understanding to (B).

Based on Kasparov's logic, it is a correct conclusion only if A is a human and B is a computer.

Uri
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Houdini 1.03a The New Nr 1 !!!

Post by mwyoung »

I know... :) I was being absurd.

But the logic is that there is no logic. You make up rules and twisted logic so the data will fit your views. I love it.
IGarcia
Posts: 543
Joined: Mon Jul 05, 2010 10:27 pm

Re: Houdini 1.03a The New Nr 1 !!!

Post by IGarcia »

If I give you a completely new engine: Unknown version 0.01

and you start testing, 10 games each:

against SOS: Unknown wins 10-0
against Spike 1.2: Unknown wins 10-0
against Naum: Unknown wins 8-2
against Houdini 1.02: Unknown wins 6-4
against Stockfish 1.8: Unknown wins 6-4
against Houdini 1.03a: Unknown loses 4.5-5.5
against Rybka4: Unknown loses 4-6

You do not need 10,000 games to figure out that "Unknown v0.01" is close to Stockfish 1.8, Houdini 1.03a and Rybka4.

You can't know the Elo exactly, but that is not flipping a coin.
As said, if you run 32 games and the result is 18-14, the engines are of about similar strength.

And this is not flipping a coin.

In short: if you test an unknown engine against Rybka and Rybka loses the first game, the first things one can think are:
a) it is a one-in-1000 event
b) we have a strong engine with the potential to fight strong engines.


It is too hard for me to write in English, and much harder to give explanations. There is no need to discredit me with your coin-flip arguments.
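The "similar strength, but with a big margin" idea can be made concrete with a short Python sketch (added for illustration, not part of the post). It converts a match score into an Elo difference with the standard logistic formula and attaches a rough 95% interval from a normal approximation. The function names elo_diff and elo_interval are hypothetical, and because the win/draw split is not given, the sketch uses s(1-s) as the per-game variance, which is an upper bound when there are draws.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction (standard logistic model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(points, games, z=1.96):
    """Return (low, estimate, high) Elo difference for `points` out of `games`.

    Assumption: independent games and a normal approximation. The per-game
    variance s*(1-s) ignores draws, so the interval errs on the wide side.
    """
    s = points / games
    half_width = z * math.sqrt(s * (1.0 - s) / games)
    eps = 1e-6  # keep the bounds inside (0, 1) so elo_diff stays finite
    low = min(max(s - half_width, eps), 1.0 - eps)
    high = min(max(s + half_width, eps), 1.0 - eps)
    return elo_diff(low), elo_diff(s), elo_diff(high)


# Same score fraction as 18-14, at increasing numbers of games
for games in (32, 1500, 10000):
    low, est, high = elo_interval(0.5625 * games, games)
    print(f"{games:>6} games: {est:+.0f} Elo, interval {low:+.0f} to {high:+.0f}")
```

Under these assumptions an 18-14 result points to roughly +44 Elo, but the interval runs from about -77 to about +177, so it fits "similar strength" and also fits a clear gap in either direction. The interval shrinks only as the number of games grows.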
De Vos W
Posts: 431
Joined: Tue Dec 01, 2009 11:59 am

Re: Houdini 1.03a The New Nr 1 !!!

Post by De Vos W »

Mark Mason wrote:Hi,

How do those results square with the ones posted here, which suggest Deep Rybka 4 tops Houdini?

http://www.talkchess.com/forum/viewtopi ... 7&start=20

The Dr. Wael Deeb settings for Fire 1.1 32 bit:

Pawn Hash: 64
Pawn Value: 120
Knight Value: 325
Bishop Value: 325
Rook Value: 600
Queen Value: 1170
Bishop Pair Value: 65
Verification Reduction: 8

With these settings FireBird 1.1 w32 (the strongest Fire engine) is in good condition.
In my personal tests FireBird is approximately 20 Elo stronger than Deep Rybka 4 32-bit!

I feel a lot of Elo is being saved for the rental version of Rybka, so Vasik Rajlich did not care much
about this Rybka 4. What a shame.