Houdini 3

mclane
Posts: 18911
Joined: Thu Mar 09, 2006 6:40 pm
Location: US of Europe, germany
Full name: Thorsten Czub

Re: Houdini 3

Post by mclane »

lkaufman wrote: Is it simply that they have a cruder eval that is faster but inferior, or is it that some aspect of the search makes them tactically stronger but positionally weaker? Only someone who has studied the Ippo code can answer this of course, especially if it is compared to SF code, as SF has the same general behavior as Komodo in this respect.

they do not evaluate this extensively.
What seems like a fairy tale today may be reality tomorrow.
Here we have a fairy tale of the day after tomorrow....
Eelco de Groot
Posts: 4669
Joined: Sun Mar 12, 2006 2:40 am
Full name:   Eelco de Groot

Re: Houdini 3

Post by Eelco de Groot »

mclane wrote:
lkaufman wrote:Basically I would like to know if Komodo (and Stockfish) are weak at bullet chess because we are doing something wrong, or whether we are stronger at slower chess because Ippo/Ivanhoe is doing something wrong (or both). Comments anyone?
komodo and stockfish are intelligent evaluation programs.
ippo/ivanhoe or rybka etc. are stupid search programs.

the intelligent programs are weak at bullet, and better at slow chess.
Maybe so, but I would not count Rybka 4.1 or even Rybka 3 as just more Fritz 5.32 clones on anabolic bitboard steroids. Larry put a lot of knowledge into Rybka 3, more than was actually needed, as Vas and Larry both admitted, in the hope that it would pay off at very long time controls. The reasoning is simple: the longer the time control, the less you can afford positional mistakes. Tactical mistakes still matter as well, but search is a bit better at correcting those.

It may just be a question of which set of evaluation features you need at which time control and on which hardware. Clearly older HIARCS versions had too many evaluation features for today's processors and had to scale back. Fruit was very good at all ranges but still had some limitations, in that the eval covered only the most important things. All the improvements in Toga over Fruit, however, were in the search; I don't remember any really good improvement in the evaluation. So the evaluation was Fruit's greatest strength, and that was what scaled best. All that also ended up in Rybka, but some of Larry's stuff was thrown out of the Ippolit clones. That made for very good blitzers, and I presume we still see some of that in Houdini today. Richard maybe disagrees?

But Tord Romstad I think was much more to the point here than my own explanation:
> If you are correct then Hiarcs at infinite time ought to beat Rybka at infinite time - and quoting you - "Rybka... would have no chance"
>
> I disagree with your assessment but I will try to test it: By matching Hiarcs 13.1 v Rybka 4 at ultra long time controls.


You got that completely backwards: On faster hardware or at longer time controls, Rybka's advantage over Hiarcs will grow exponentially, all the way until the programs search deep enough to get close to solving the game, and all games end in a draw. When the hardware gets sufficiently fast, the scalability of the search trumps everything else. If program X is 100 Elo points stronger than program Y when they both search only 50k nodes or so per move, but program Y gains 10 Elo points more every time the number of nodes doubles, program Y will massacre program X on sufficiently fast hardware. That's the situation we are talking about.

Therefore, on modern PC hardware, you can easily write a super strong program simply by focusing on almost nothing but search scalability. It doesn't matter if the opponent programs are much more intelligent -- as long as you out-search them sufficiently, you'll crush them. Thanks to the exponential nature of the advantage provided by good scalability, nothing else really matters much at fast hardware.

Writing a scalable search isn't very hard. The state of the art techniques are well understood, and fairly easy to implement. Programs like Rybka or Stockfish running on fast hardware are terrifyingly strong, but from a software engineering point of view, they are quite unremarkable. Playing strongly on devices like the first-generation iPhone, with a slow CPU and limited memory, is a much harder task. That's why I called Hiarcs the best chess program. "Best" is of course a subjective assessment, what I meant is that Hiarcs is the most impressive programming achievement. What it does is a lot more remarkable than what Stockfish does.
Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
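
The arithmetic in the Tord Romstad quote above can be checked with a small sketch. It takes the quoted numbers at face value (X leads by 100 Elo at 50k nodes, Y gains 10 Elo more per doubling) and assumes, purely for illustration, that the extra gain per doubling stays constant. The function names are made up for this example.

```python
BASE_NODES = 50_000  # nodes per move at which X leads, taken from the quoted example


def gap_after(initial_gap_elo: float, extra_gain_per_doubling: float, doublings: float) -> float:
    """X's remaining lead over Y (in Elo) after the given number of node doublings.

    Assumes Y's extra gain per doubling is constant, which is an assumption
    of this sketch and not a measured property of any engine.
    """
    return initial_gap_elo - extra_gain_per_doubling * doublings


def break_even_doublings(initial_gap_elo: float, extra_gain_per_doubling: float) -> float:
    """Number of doublings after which X and Y are level."""
    if extra_gain_per_doubling <= 0:
        raise ValueError("Y never catches up unless it gains more per doubling")
    return initial_gap_elo / extra_gain_per_doubling


def nodes_after(doublings: int, base_nodes: int = BASE_NODES) -> int:
    """Nodes per move after doubling the base node count `doublings` times."""
    return base_nodes * 2 ** doublings


if __name__ == "__main__":
    d = break_even_doublings(100, 10)
    print(f"level after {d:.0f} doublings, at {nodes_after(int(d)):,} nodes per move")
    for extra in (5, 10):
        print(f"{extra} more doublings: X's lead is {gap_after(100, 10, d + extra):+.0f} Elo")
```

Under these assumptions the two programs are level after 10 doublings, and every further doubling puts Y another 10 Elo ahead.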
Werewolf
Posts: 2027
Joined: Thu Sep 18, 2008 10:24 pm

Re: Houdini 3

Post by Werewolf »

He made that comment replying to one of my posts. I'd like to hear a clear explanation of "scalability" though.

What EXACTLY are we talking about? A low branching factor..?
bhlangonijr
Posts: 482
Joined: Thu Oct 16, 2008 4:23 am
Location: Milky Way

Re: Houdini 3

Post by bhlangonijr »

michiguel wrote:
:shock: We need more GMs in this forum, not less.

Miguel
+1

People are so sensitive here....
mclane
Posts: 18911
Joined: Thu Mar 09, 2006 6:40 pm
Location: US of Europe, germany
Full name: Thorsten Czub

Re: Houdini 3

Post by mclane »

the problem is, there are situations/positions where you cannot SOLVE the position with SEARCH. you can scale as much as you wish; in those situations KNOWLEDGE is the key.

if you don't have this knowledge, all your hardware or your good scaling will not help. of course these situations are not seen by humans who let the machines play 30.000 bullet games, because no human replays those games.

but when watching tournament games (40/120) or even email chess/correspondence chess games, you see that stockfish or komodo CAN work it out in such a situation while rybka and all clones cannot.

of course you cannot find these observations in the ratings of elo lists.
Uri Blass
Posts: 10889
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Houdini 3

Post by Uri Blass »

lkaufman wrote:
rvida wrote:
lkaufman wrote:
Houdini wrote:
melajara wrote:Hi Robert,

Don't forget to dedicate some resources for testing at long time controls too as some people here are implying Houdini 1.5 is stronger than version 2 at the regular OTB time control (40 moves in 120 minutes).

Best regards
Mel, some of the most vocal of the "some people here" have never actually tested Houdini 2 :roll:. If you have any doubts about Houdini 2 at long TC, check the neighboring thread, or the very interesting matches of various engines against Deep Junior 13. Houdini 2 scores about 10% better than its closest competitors.

As for Houdini 3, it will do quite well at long TC - some search algorithm improvements will have an increasing effect with higher search depths.

Robert
The first test you cite does not have either Komodo or Houdini 1.5, so it doesn't tell us anything about whether Houdini 2 scales as well as either of those two engines.
Of course, Komodo is the best scaling program out there. It is absolutely unthinkable that it scales as well as anything else, just has some problems with fast TC.
lkaufman wrote: Comparing to Critter and Ivanhoe just confirms my contention that neither of these scale any better than Houdini as they all have basically the Ippo search, with various tweaks.
Komodo has basically Ippo search too. (alpha-beta + null-move + IID + aggressive lmr)
Edit: ...with various tweaks of course :twisted:
Leaving Critter aside (so you won't take this as directed at you), it is clear that Komodo is weaker than Ivanhoe (or any acknowledged Ippo derivative) at bullet chess or faster, but stronger at blitz and much stronger at slow chess. If we compare Komodo to the very different (from Ivanhoe) program Stockfish, we don't see this pattern, Komodo is just slightly stronger at all levels. I still don't understand the reason for this scaling behavior vs. the Ippo programs, but I would very much like to get at the reason for it. Since Komodo is not open-source, I don't expect anyone to be able to solve this puzzle, but since Stockfish is open-source as is Ivanhoe perhaps someone can comment on why SF is weaker than Ivanhoe at bullet chess but stronger at normal time controls? I think it is something to do with the search, though it could also be caused by eval. Basically I would like to know if Komodo (and Stockfish) are weak at bullet chess because we are doing something wrong, or whether we are stronger at slower chess because Ippo/Ivanhoe is doing something wrong (or both). Comments anyone?
I do not know, but I am not sure that Stockfish is really stronger at longer time controls relative to Critter. Critter seems to be stronger than Stockfish at all levels on a single CPU and equal to Stockfish at all levels with 4 or 6 CPUs (when I compare 40/4 and 40/40).

Stockfish may be weak at bullet chess, but that does not mean that it scales better relative to Critter when we go from blitz to slower time controls and do not include bullet.

CCRL 40/4 has the following:

Critter 1.4 64-bit        3231  +17  −17
Stockfish 2.2.2 64-bit    3210  +17  −17

CCRL 40/40 has the following:

Critter 1.4 64-bit        3214  +26  −26  59.0%   −59.2  52.3%
Stockfish 2.2 64-bit      3189  +38  −38  57.9%   −49.2  53.5%

Critter 1.4 32-bit        3181  +20  −20  69.3%  −130.9  39.6%
Stockfish 2.2.2 32-bit    3163  +20  −20  67.5%  −114.8  44.8%

You can see a roughly 20 Elo advantage for Critter at all levels (I also included 32-bit for 40/40 because we do not have enough games for 64-bit at this time control).

If I look at multiple CPUs I see the following:

40/40

3‑4  Critter 1.4 64-bit 4CPU      3253  +25  −25  59.3%   −56.5
3‑4  Stockfish 2.2.2 64-bit 4CPU  3253  +27  −27  59.8%   −58.7

They do not test them with 6 CPUs at 40/40.

40/4

2‑3  Critter 1.4 64-bit 6CPU      3348  +16  −16  65.2%  −112.8
2‑3  Stockfish 2.2.2 64-bit 6CPU  3348  +21  −21  64.5%  −110.2

Unfortunately there is no data for 4 CPUs at 40/4.
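
The comparison above can be reproduced with a short sketch that takes the ratings and +/− margins exactly as listed and prints the Critter-minus-Stockfish gap for each pairing. The overlap check (gap no larger than the two margins added together) is a deliberately crude assumption of this sketch, not the way CCRL computes significance.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    rating: int
    margin: int  # the symmetric +/- value shown next to the rating


# (Critter, Stockfish) pairs copied from the lists quoted above.
PAIRS = {
    "40/4 64-bit 1CPU": (Entry(3231, 17), Entry(3210, 17)),
    "40/40 64-bit 1CPU": (Entry(3214, 26), Entry(3189, 38)),
    "40/40 32-bit 1CPU": (Entry(3181, 20), Entry(3163, 20)),
    "40/40 64-bit 4CPU": (Entry(3253, 25), Entry(3253, 27)),
    "40/4 64-bit 6CPU": (Entry(3348, 16), Entry(3348, 21)),
}


def gap_and_overlap(a: Entry, b: Entry) -> tuple[int, bool]:
    """Return (a.rating - b.rating, True if the two error intervals overlap)."""
    gap = a.rating - b.rating
    return gap, abs(gap) <= a.margin + b.margin


if __name__ == "__main__":
    for label, (critter, stockfish) in PAIRS.items():
        gap, overlap = gap_and_overlap(critter, stockfish)
        note = "intervals overlap" if overlap else "intervals separate"
        print(f"{label:20s} Critter - Stockfish = {gap:+d} Elo ({note})")
```

The single-CPU gaps come out at 21, 25 and 18 Elo, and in every pairing the listed error intervals overlap.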
Eelco de Groot
Posts: 4669
Joined: Sun Mar 12, 2006 2:40 am
Full name:   Eelco de Groot

Re: Houdini 3

Post by Eelco de Groot »

Werewolf wrote: He made that comment replying to one of my posts. I'd like to hear a clear explanation of "scalability" though.

What EXACTLY are we talking about? A low branching factor..?
Hello Carl,

Well, if I had to choose off the top of my head, I would rather take the Elo gain per doubling of time as the measurement. It is of course related to the branching factor, but it is not the same thing. The effective depth is still the overriding property that determines Elo; in other words, if you search deeper you will beat the more complete evaluation. Provided that there are no errors in your own evaluation, or at least not many, and that it lets you search more nodes because it is faster than your opponent's, you will likely win (on average). It will not even show in crude tactics, because the outsearching you do will make your opponent look dumb and your own play brilliant. Chris Whittington used to call this the "search gap" © Chris Whittington. Don't go searching for his post in the archives though, because we had to delete all copies of it for breach of his copyright. High scalability then would mean a consistently high Elo increase per doubling of search time, or per doubling of searched nodes.

Eelco
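
As a rough illustration of the "Elo per doubling" measurement described above, here is a minimal sketch. The two engines and all ratings in it are invented for the example; only the definition (Elo gained divided by the number of time doublings) comes from the post.

```python
import math


def elo_per_doubling(elo_short: float, elo_long: float,
                     time_short: float, time_long: float) -> float:
    """Average Elo gained per doubling of thinking time between two time controls.

    The times only need to be in the same unit; the ratings must come from
    comparable rating lists for the result to mean anything.
    """
    if time_short <= 0 or time_long <= time_short:
        raise ValueError("need 0 < time_short < time_long")
    doublings = math.log2(time_long / time_short)
    return (elo_long - elo_short) / doublings


if __name__ == "__main__":
    # Hypothetical engines: both measured at 1 and 8 time units (3 doublings).
    engine_a = elo_per_doubling(3000, 3120, 1, 8)
    engine_b = elo_per_doubling(3050, 3140, 1, 8)
    print(f"engine A: {engine_a:.1f} Elo per doubling")
    print(f"engine B: {engine_b:.1f} Elo per doubling")
    better = "A" if engine_a > engine_b else "B"
    print(f"engine {better} scales better in this made-up example")
```

In this invented example engine B is stronger at the short time control, but engine A gains more per doubling and so counts as the better-scaling engine in the sense used above.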
Uri Blass
Posts: 10889
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Houdini 3

Post by Uri Blass »

I can see that Stockfish on 1 CPU has a better rating at 120/40 on CEGT, but we clearly do not have enough games there to prove anything.

http://www.husvankempen.de/nunn//40120n ... liste.html

At 40/20, with more games, I can see only a 5 Elo advantage for Critter relative to Stockfish.

I do not know if Stockfish scales better than Critter (when we start from blitz and go to long time controls), but even if it does scale better, it seems that we have an estimate of something like 10 Elo for being 10 times faster, which is not much. CEGT 120/40 is also basically a different kind of time control because it is not 120/40 repeating, so I am not sure it is good to compare CEGT 120/40 with other time controls: factors that do not come from the slower time control may influence the result.
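
To put the "10 Elo for being 10 times faster" estimate above in perspective, this small sketch converts it into an expected score and into Elo per doubling. It uses the standard logistic Elo formula; the 10 Elo and factor 10 are the rough figures from the post, not measurements.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) for the side that is elo_diff points stronger,
    using the standard logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def gain_per_doubling(elo_gain: float, speed_factor: float) -> float:
    """Spread an Elo gain observed over a speed factor evenly across doublings."""
    if speed_factor <= 1:
        raise ValueError("speed_factor must be greater than 1")
    return elo_gain / math.log2(speed_factor)


if __name__ == "__main__":
    print(f"expected score with a 10 Elo edge: {expected_score(10):.1%}")
    print(f"relative gain per doubling: {gain_per_doubling(10, 10):.2f} Elo")
```

A 10 Elo edge corresponds to an expected score of a little over 51%, and 10 Elo spread over a factor of 10 in time is about 3 Elo per doubling.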
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: Houdini 3

Post by MM »

mclane wrote:

the problem is, there are situations/positions where you cannot SOLVE the position with SEARCH. you can scale as much as you wish; in those situations KNOWLEDGE is the key.

if you don't have this knowledge, all your hardware or your good scaling will not help. of course these situations are not seen by humans who let the machines play 30.000 bullet games, because no human replays those games.
+1
MM
Uri Blass
Posts: 10889
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Houdini 3

Post by Uri Blass »

I do not think that the better evaluation wins at slower time controls, and I do not think that the better evaluation wins at blitz.

There are evaluation components that help at blitz and have no effect at slow time controls, and there are evaluation components that help at slower time controls but have no effect at blitz.

I will give an example of an evaluation component that helps only at blitz.

Suppose, to keep things simple, that we have no tablebases and that we have 2 programs.

One does not know how to win KR vs K without a deep search, because it gives all KR vs K positions the same evaluation; the other knows how to win KR vs K even at small depth thanks to its evaluation.

If we play blitz, the program that does not know how to win may draw KR vs K games that it should win and lose rating points, but at long time controls it is always going to win KR vs K thanks to search, so the lack of knowledge in its evaluation is not a problem for it.

Of course, in practice KR vs K is not a problem today, but there may be other types of endgames that need a deep search to win and are no problem once you search deep enough. So knowledge in the evaluation about some specific endgames, when it is not about who wins the game but about how to improve the position, may be productive only at blitz.
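
The two programs in the KR vs K example can be sketched as two evaluation functions. This is only an illustration of the idea: the second function is a generic "mop-up" style evaluation (push the lone king to the edge, bring your own king closer), the weights are arbitrary, the rook's square is ignored, and none of it is taken from any engine mentioned in this thread.

```python
ROOK_VALUE = 500  # centipawns; an arbitrary but conventional choice


def file_rank(square: int) -> tuple[int, int]:
    """Split a 0..63 square index (a1 = 0, h8 = 63) into (file, rank)."""
    return square % 8, square // 8


def center_distance(square: int) -> int:
    """Manhattan distance from the four central squares (0 in the center, 6 in a corner)."""
    f, r = file_rank(square)
    return max(3 - f, f - 4) + max(3 - r, r - 4)


def king_distance(a: int, b: int) -> int:
    """Manhattan distance between two squares (at most 14)."""
    (fa, ra), (fb, rb) = file_rank(a), file_rank(b)
    return abs(fa - fb) + abs(ra - rb)


def flat_eval(strong_king: int, weak_king: int) -> int:
    """First program: every KR vs K position gets the same score,
    so only a deep enough search can find progress."""
    return ROOK_VALUE


def mopup_eval(strong_king: int, weak_king: int) -> int:
    """Second program: reward driving the lone king to the edge and
    bringing the attacking king closer. Weights are illustrative only."""
    return (ROOK_VALUE
            + 10 * center_distance(weak_king)
            + 4 * (14 - king_distance(strong_king, weak_king)))


if __name__ == "__main__":
    C3, A1, E4 = 18, 0, 28  # square indices for the strong king and two weak-king squares
    for name, weak in (("a1", A1), ("e4", E4)):
        print(f"weak king on {name}: flat={flat_eval(C3, weak)}, mop-up={mopup_eval(C3, weak)}")
```

With the flat evaluation a shallow search sees no difference between a cornered king and a centralized one; with the mop-up terms it prefers the cornered king even at depth 1, which is the kind of knowledge that matters mostly at fast time controls.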