beram wrote: At CCRL at LTC 40/40, Houdini 1.5a and Houdini 2.0c scored 49% and 50% against Komodo 5.
So Houdini 3 beta scoring 62%(!) against Komodo 5 at LTC is almost incredible.
But we have the games for proof.
I wouldn't be surprised if the improvement is less than 60 Elo, but my strong guess is that it is at least 40 Elo, and "even" that would be a great achievement.
The 3 long TC matches have now finished.
Houdini 3 - Komodo 5 match: +43 -19 =58
72-48 (+68 Elo ± 42 Elo).
Houdini 3 - Stockfish 2.3.1 match: +44 -16 =60
74-46 (+82 Elo ± 42 Elo).
Houdini 3 - Houdini 2.0c match: +48 -15 =57
76.5-43.5 (+94 Elo ± 42 Elo).
Download all the games: http://www.cruxis.com/download/Houdini3_LongTC_Matches.zip
Overall result: Houdini 3 scored 61.8% (+81 Elo ± 24 Elo) against the average of Houdini 2.0c, Komodo 5 and Stockfish 2.3.1.
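For readers who want to check match figures like the ones above, here is a minimal sketch of how an Elo estimate and an error margin can be derived from a win/loss/draw record. It assumes the standard logistic Elo model and a 95% interval built from the per-game score variance; the exact formula behind the quoted figures is not stated, so this sketch can land a few Elo away from them. The helper names elo_diff and match_elo are hypothetical.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by an expected score (0 < score < 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_elo(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (score fraction, Elo estimate, approximate +/- margin) for a W/L/D record."""
    n = wins + losses + draws
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0 points) around the mean score p.
    var = (wins * (1.0 - p) ** 2 + losses * p ** 2 + draws * (0.5 - p) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    # Map both ends of the score interval onto the Elo scale; the Elo interval is
    # slightly asymmetric, so report half of its total width as the margin.
    low = elo_diff(p - z * se)
    high = elo_diff(p + z * se)
    return p, elo_diff(p), (high - low) / 2.0


if __name__ == "__main__":
    matches = {
        "Komodo 5": (43, 19, 58),
        "Stockfish 2.3.1": (44, 16, 60),
        "Houdini 2.0c": (48, 15, 57),
    }
    for name, (w, l, d) in matches.items():
        p, elo, margin = match_elo(w, l, d)
        print(f"vs {name}: {100 * p:.1f}%  {elo:+.0f} Elo +/- {margin:.0f}")
    # Pooled record of all three matches (a simplification: different opponents).
    p, elo, margin = match_elo(135, 50, 175)
    print(f"overall: {100 * p:.1f}%  {elo:+.0f} Elo +/- {margin:.0f}")
```

Pooling the three matches into one record (135 wins, 50 losses, 175 draws) reproduces the 61.8% overall score; treating three different opponents as a single pool is a simplification, so the pooled margin is only a rough guide.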
Well, having the games is no proof of anything; this could just be the best of three results, for example. I'm not accusing anyone here (.....), just pointing out the need for independent tests. More likely Houdini 3 was just luckier (.....) than Houdini 2/1.5 against Komodo. What is the improvement if you do the same comparison for the Stockfish match? It looks like you picked out one of the two (non-self-play) results to make a point.
Well, that would imply a somewhat smaller gain than the Komodo data, but still an excellent one. Since Elo gains are almost always larger at faster time limits, these results would predict an IPON gain of maybe 75 Elo or more. If the actual IPON rating does show such a gain, I will be quite impressed and will offer my congratulations to Robert, but until then I just don't believe it. If Houdini 2 could only show a gain of around ten Elo over a year, it would be pretty remarkable if the next year showed a gain like 75.
I do not think that "almost always" is going to hold for Houdini; I think it depends on the type of improvements you make.
For example, better move ordering may make the program 10% faster at blitz but more than 10% faster at longer time controls.
My guess is that you will find Houdini 3 scales better than the current Komodo (not in terms of Elo difference, but in terms of the time advantage Komodo needs to score 50%).
Jouni wrote: According to that test, Stockfish 2.3.1 is stronger than Houdini 2.0c!!
That just demonstrates once again that self-play (i.e. playing related engines) overstates rating differences.
The new Houdini will be +40 Elo at best once it falls into the hands of testers who live in the real world........
Note that I wrote "at best"....
Dr.D
My opinion is different.
My guess is that Houdini 3 is going to be at least 41 Elo better than Houdini 2 in the IPON rating list.
I am going to wait for the IPON rating list before deciding whether to buy Houdini 3, and we will see who is right.
Fair enough, Uri.