VERY INTERESTING Stockfish vs. Houdini Match Has Begun!

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: And the Very Latest Houdini 3 v Stockfish 290613 UPDATE!

Post by mwyoung »

geots wrote:The last update was after 35 games, leaving 15 to be played- and Houdini finding himself needing to make up 5 games.

Each has his own take, but to me a rating list is like an FIDE Elo list for humans. First, no engine relinquishes his standing or his being the World champion because of blitz. To me, a championship match on my end must be at no faster control than 40/40. Preferably either 4 or 6 Cores for each. If Stockfish were to win this match- this version is the No.1 engine in the world in my eyes until Houdini or another engine can unseat him. Minimum 50 games and no speeds faster than 40/40. If I go with 40 moves in the 1st 2.5 hours, etc...... , then I would look at the best of 24, or something of that order.

Again, would a victory here mean this version of Stockfish is actually stronger than Houdini 3? That is moot. He would have- at the time it counted, played the better chess. And that is all that matters.

So here is an all-important update:



Alienware AURORA_R4
Intel i7 w/6 True Cores
Fritz 11 gui
6 Cores/64bit
256MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 2012b.ctg w/12-move limit
40/19 Repeating- benchmarked to adapt to 40/40
Match=50 games



Game 36- Stockfish plays the white pieces- game is drawn!

Game 37- Houdini plays the white pieces- game is drawn!

Game 38- Stockfish plays the white pieces- again, game is drawn!




Now it is really crunch time. Houdini is running out of time- he must now make up a 5 game deficit in 12 games.


Later-
Thanks George for running this test to let us all know how good Stockfish is at long time controls.

My only comment is,

I Am So Damn Tired Of Being Right!

George Speight Quotes in June 2013 regarding my Stockfish Test results:


"Try this asinine remark. You say it is beating all versions of Houdini @ 40/40. To say that with only the few games you have played tells me either you don't know jack about testing- or you are just delusional."


Let me be very clear. If you want to really see the difference between Houdini 3 and a Stockfish development version- run them against each other in a head-up match of 100 games or more. As I have done with the Komodo MP beta that will be released tomorrow- against the 2 strongest Stockfish Dev. versions, 010613 and 090613. Both of which Komodo MP beat. Don't confuse the issue with a gauntlet if "Stockfish - Houdini" is going to be your headline. Strange things happen to an engine when all it gets is a steady diet of Houdini game after game after game after game...... never getting a breather- just more Houdini on top of Houdini.

"Let me explain something to you. Anytime someone says such and such is better or worse at this control or that control- that is another way of saying the engine in question needs a lot of work. Because the "studs" don't give a shit. They will play you at midnight in a cornfield with extension cords. At any control. To the best- all that is irrelevant bullshit."

"Adam, it cannot beat Houdini 3. But IF it could, it still would not be Number 1 now- because I will bet any amount of money I can get my hands on there is not a development version right now that can beat Komodo MP. I have tested it too much. That is a promise from me to you.

As for 2 weeks from now, or next month- the way it is spitting out Stockfish versions- who knows. And Marco, besides being a friend, is no dummy. All I can tell you is what I see now. No Ouija board at my house.


"They are better, and they are strong- but they are no match for Houdini 3. I am sorry- but such is life. In fact- I will bet the farm that Komodo MP will easily come in at Number 2 in the world. I have run too many 100s of games with it to not know. You can put a stamp on that and mail it."
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: And the Very Latest Houdini 3 v Stockfish 290613 UPDATE!

Post by geots »

I really don't see anything much out of line with what I said. I conceded the Stockfish versions were coming out fast- and who knew in the next couple weeks. I still say you ran a couple games, got excited and flopped around like a goose in a shitstorm. Your main contention seems to be Komodo. I beta test for Don- and I still stand by what I said concerning Komodo.

Look, it is real simple. I don't like people who try to grandstand. I don't like your mouth, and I don't like you. I would appreciate it if you would refrain from ever having any further contact with me on this forum.


gs
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Stockfish, Houdini And a Monster UPDATE w/2 Monsters!

Post by geots »

We are getting down to the nitty gritty now. These 2 stalwarts are putting on quite a show. Houdini gets a win, Stockfish follows with a win of his own and Houdini comes back with another win. Problem is, as I have stated more than once- it is almost impossible to win a match like this when you are never able to put together back-to-back wins. And Stockfish has just not allowed Houdini to do it even once. Whereas Stockfish has managed to do it at least twice and once even put together "back-to-back-to-back."

As I type, Stockfish has an even position in the late middle game with the white pieces. If he at least can hold this game to a draw, that means Houdini will be playing from 4 games behind with 6 games remaining. If he wiggles out of that one, old Harry will be cheering from his grave.



Alienware AURORA_R4
Intel i7 w/6 True Cores
Fritz 11 gui
6 Cores/64bit
256MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 2012b.ctg w/12-move limit
40/19 Repeating- benchmarked to adapt to 40/40
Match=50 games


Code:

Stockfish 290613 64 SSE4.2    +11/-7/=25
Houdini 3 x64                 +7/-11/=25

I want to be extremely clear about this. The good testers do not have "favorites". And they do not let their emotions get involved. In matches I run, the only important issues are between the engines playing and my credibility as a tester. If the results, in some way, prove me right- that is fine. If they prove me wrong- that is fine as well. This match is not about me. It is about 2 of the best giving it all they have.



Now for a bit of rest-
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: And the Very Latest Houdini 3 v Stockfish 290613 UPDATE!

Post by mwyoung »

geots wrote:I really don't see anything much out of line with what I said. I conceded the Stockfish versions were coming out fast- and who knew in the next couple weeks. I still say you ran a couple games, got excited and flopped around like a goose in a shitstorm. Your main contention seems to be Komodo. I beta test for Don- and I still stand by what I said concerning Komodo.

Look, it is real simple. I don't like people who try to grandstand. I don't like your mouth, and I don't like you. I would appreciate it if you would refrain from ever having any further contact with me on this forum.


gs
I guess not, after you attack me. Where is your buddy Miguel, protecting the stupid from this speculation? I love you guys' double standards.

I called it then, and I am pointing it all out for us to see, nothing more. Man, I am tired of always being right!
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Stockfish v Houdini- An UPDATE At the 45 Game Mark!

Post by geots »

There are 5 games remaining, as we have taken a break for a couple of hours. Things are now very problematic for Houdini. To salvage the match, Houdini must run the table by winning all 5 games, which would retain his title via a drawn match. And the chances of that are.................................

(I can't make up my mind if game 45 was a blunderfest or some really great chess.) At any rate................



Alienware AURORA_R4
Intel i7 w/6 True Cores
Fritz 11 gui
6 Cores/64bit
256MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 2012b.ctg w/12-move limit
40/19 Repeating- benchmarked to adapt to 40/40
Match=50 games



Code:

Stockfish 290613 64 SSE4.2    +12/-7/=26
Houdini 3 x64                 +7/-12/=26
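
For a rough sense of scale, the running score can be converted into an Elo difference with the usual logistic formula. This is only a sketch: the function name is made up for illustration, and the result is a point estimate from 45 games, so it says nothing about how wide the error margin is.

```python
import math


def elo_from_score(wins: int, losses: int, draws: int) -> float:
    """Point estimate of the Elo difference implied by a match score.

    Uses the standard logistic Elo model, where the expected score is
    s = 1 / (1 + 10 ** (-diff / 400)), solved here for diff.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fraction of points scored
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    return 400.0 * math.log10(score / (1.0 - score))


# Stockfish's score after 45 games, taken from the table above.
stockfish_elo = elo_from_score(wins=12, losses=7, draws=26)
print(f"Stockfish - Houdini: {stockfish_elo:+.1f} Elo")
```

With +12/-7/=26 this comes out at roughly +39 Elo for Stockfish over these 45 games.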


The match will resume with the start of game 46 shortly-
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: And A Sidenote That Is Definitely Worth Mentioning!

Post by geots »

Stronger- weaker? I would have no clue. My first reaction is it certainly would not be weaker, but what do I know. It looks as if this Stockfish version I am running, 290613- is gone forever. If you don't already have it, the chances may be slim and none. Because when you download it from the Stockfish development site, you get 13062911. Which for me installs as 290613. But if I download the same executable now- 13062911- it now installs as 030713. When I asked Marco why this happens, he told me he wasn't sure, but it probably had to mean the executable had gone through a recompile. Which doesn't necessarily make it better or worse- just an interesting tidbit.

And the Houdini-Stockfish match is now down to 3 games remaining.


And..........
FriedmannC
Posts: 273
Joined: Fri Feb 10, 2012 7:58 pm
Location: SUCEAVA, ROMANIA

Re: And A Sidenote That Is Definitely Worth Mentioning!

Post by FriedmannC »

And where can I download this Stockfish version? :)
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: And the Very Latest Houdini 3 v Stockfish 290613 UPDATE!

Post by Don »

mwyoung wrote:
geots wrote:The last update was after 35 games, leaving 15 to be played- and Houdini finding himself needing to make up 5 games.

Each has his own take, but to me a rating list is like an FIDE Elo list for humans. First, no engine relinquishes his standing or his being the World champion because of blitz. To me, a championship match on my end must be at no faster control than 40/40. Preferably either 4 or 6 Cores for each. If Stockfish were to win this match- this version is the No.1 engine in the world in my eyes until Houdini or another engine can unseat him. Minimum 50 games and no speeds faster than 40/40. If I go with 40 moves in the 1st 2.5 hours, etc...... , then I would look at the best of 24, or something of that order.

Again, would a victory here mean this version of Stockfish is actually stronger than Houdini 3? That is moot. He would have- at the time it counted, played the better chess. And that is all that matters.

So here is an all-important update:



Alienware AURORA_R4
Intel i7 w/6 True Cores
Fritz 11 gui
6 Cores/64bit
256MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 2012b.ctg w/12-move limit
40/19 Repeating- benchmarked to adapt to 40/40
Match=50 games



Game 36- Stockfish plays the white pieces- game is drawn!

Game 37- Houdini plays the white pieces- game is drawn!

Game 38- Stockfish plays the white pieces- again, game is drawn!




Now it is really crunch time. Houdini is running out of time- he must now make up a 5 game deficit in 12 games.


Later-
Thanks George for running this test to let us all know how good Stockfish is at long time controls.

My only comment is,

I Am So Damn Tired Of Being Right!

George Speight Quotes in June 2013 regarding my Stockfish Test results:


"Try this asinine remark. You say it is beating all versions of Houdini @ 40/40. To say that with only the few games you have played tells me either you don't know jack about testing- or you are just delusional."


Let me be very clear. If you want to really see the difference between Houdini 3 and a Stockfish development version- run them against each other in a head-up match of 100 games or more.
Try 1000 games or more. Houdini is the strongest program still - I would like to believe that the ranking is Komodo, followed by Stockfish followed by Houdini but I have seen nothing to indicate that Houdini 3 is not still on top.

The truth of the matter is that both Komodo and Stockfish are strong enough to win short matches against each other or Houdini 3. I cannot prove this, but I estimate that any short match win against Houdini is far more likely to get reported than a lost match, and a 100-game match carries something on the order of 60 Elo of error margin. Beating Houdini 3 in a 100-game match is VERY possible for Stockfish if it has improved by 35 Elo. On the CCRL 40/40 list Stockfish 3 is 72 Elo behind Houdini 3 on 4 cores. If it has improved 35 Elo it is still about 37 Elo behind - plenty strong enough to be able to win 100-game matches with non-trivial probability. Even a 1000-game match has something like a 15 point error margin.
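
The error margins quoted here can be sanity-checked with a back-of-the-envelope calculation. The sketch below is not taken from any rating tool: the function name, the 95% z-value, the equal-strength assumption and the linearised Elo curve are all assumptions made for illustration.

```python
import math


def elo_error_margin(n_games: int, draw_rate: float, z: float = 1.96) -> float:
    """Approximate +/- Elo margin for a match between two equal engines.

    Assumptions (for illustration only): independent games, equal
    strength, and a linearised Elo curve around a 50% score.
    """
    # Per-game score is 1, 0.5 or 0; with equal strength its variance
    # works out to (1 - draw_rate) / 4.
    per_game_sd = math.sqrt((1.0 - draw_rate) / 4.0)
    score_margin = z * per_game_sd / math.sqrt(n_games)
    # Slope of Elo versus score at s = 0.5: 400 / (ln 10 * s * (1 - s)).
    elo_per_score = 400.0 / (math.log(10.0) * 0.25)
    return score_margin * elo_per_score


for n in (100, 1000):
    for d in (0.0, 0.5):
        margin = elo_error_margin(n, d)
        print(f"{n:5d} games, draw rate {d:.0%}: +/- {margin:.0f} Elo")
```

Under these assumptions a 100-game match comes out at roughly 50 to 70 Elo either way, depending on the draw rate, and a 1000-game match at roughly 15 to 22, which is in line with the figures above.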

As I have done with the Komodo MP beta that will be released tomorrow- against the 2 strongest Stockfish Dev. versions, 010613 and 090613. Both of which Komodo MP beat. Don't confuse the issue with a gauntlet if "Stockfish - Houdini" is going to be your headline. Strange things happen to an engine when all it gets is a steady diet of Houdini game after game after game after game...... never getting a breather- just more Houdini on top of Houdini.

"Let me explain something to you. Anytime someone says such and such is better or worse at this control or that control- that is another way of saying the engine in question needs a lot of work. Because the "studs" don't give a shit. They will play you at midnight in a cornfield with extension cords. At any control. To the best- all that is irrelevant bullshit."

"Adam, it cannot beat Houdini 3. But IF it could, it still would not be Number 1 now- because I will bet any amount of money I can get my hands on there is not a development version right now that can beat Komodo MP. I have tested it too much. That is a promise from me to you.

As for 2 weeks from now, or next month- the way it is spitting out Stockfish versions- who knows. And Marco, besides being a friend, is no dummy. All I can tell you is what I see now. No Ouija board at my house.


"They are better, and they are strong- but they are no match for Houdini 3. I am sorry- but such is life. In fact- I will bet the farm that Komodo MP will easily come in at Number 2 in the world. I have run too many 100s of games with it to not know. You can put a stamp on that and mail it."
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Ajedrecista
Posts: 2177
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: And a sidenote that is definitely worth mentioning!

Post by Ajedrecista »

Hello Teodoriu:
FriedmannC wrote:And where can I download this Stockfish version? :)
There are tons of development versions here. They are autocompiles (someone wrote a Python script to do it automatically).

Other links you may enjoy regarding SF development:

Stockfish Testing Framework.

Stockfish foreign engines regression test.

SF is improving very fast. :) Have a nice day!

Regards from Spain.

Ajedrecista.
Ajedrecista
Posts: 2177
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

And the very latest Houdini 3 vs. Stockfish 290613 UPDATE!

Post by Ajedrecista »

Hello Don:
Don wrote:Try 1000 games or more. Houdini is the strongest program still - I would like to believe that the ranking is Komodo, followed by Stockfish followed by Houdini but I have seen nothing to indicate that Houdini 3 is not still on top.

The truth of the matter is that both Komodo and Stockfish are strong enough to win short matches against each other or Houdini 3. I cannot prove this, but I estimate that any short match win against Houdini is far more likely to get reported than a lost match, and a 100-game match carries something on the order of 60 Elo of error margin. Beating Houdini 3 in a 100-game match is VERY possible for Stockfish if it has improved by 35 Elo. On the CCRL 40/40 list Stockfish 3 is 72 Elo behind Houdini 3 on 4 cores. If it has improved 35 Elo it is still about 37 Elo behind - plenty strong enough to be able to win 100-game matches with non-trivial probability. Even a 1000-game match has something like a 15 point error margin.
I calculated the probabilities using a trinomial distribution with my own Fortran tool. I calculated two cases: 100 games and 1000 games. I used your reported advantage of 37 Elo for Houdini 3 over the current SF development version. (In fact, I think that a +35 Elo gain between SF versions is inflated at 40/40 TC, although it is not inflated at bullet TC... anyway, I blindly trusted you ;)) I arbitrarily chose a draw probability of 50% for a single game:

Code:

Probabilities_in_a_trinomial_distribution, ® 2013.

--------------------------------------------------------------------
Probabilities of all possible scores in a match between two engines.
--------------------------------------------------------------------

Write down the number of games of the match (from 2 up to 50000):

100

Write down the engines rating difference (between -800 Elo and 800 Elo).
Elo(first player) - Elo(second player):

37

Write down the probability of a draw (%) between 0.0001 % and 89.3905 %

50

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

End of the calculations.
Approximated time spent in calculations:      3 ms.

The results will be saved into two different Notepads, at the same path of this
programme.

The results have been successfully saved into two files:

     Probabilities.txt
     Summary_of_probabilities.txt

Approximated total elapsed time:    971 ms.

Thanks for using Probabilities_in_a_trinomial_distribution. Press Enter to exit.

Code:

Probabilities for a match of   100 games (rounded up to 0.0001 %):
 
Rating difference (rounded up to 0.01 Elo):   37.00 Elo.
 
Probability of a win  = W ~ 30.3047 %
Probability of a draw = D ~ 50.0000 %
Probability of a lose = L ~ 19.6953 %

[...]

                           SUMMARY:

 Probability that the first player wins the match ~  92.5639 %
                      Probability of a tied match ~   1.8035 %
Probability that the second player wins the match ~   5.6326 %
 
--------------------------------------------------------------
 
 Prob.(first player wins) + 0.5*Prob.(tied match) ~  93.4656 %
Prob.(second player wins) + 0.5*Prob.(tied match) ~   6.5344 %
I input the data from Houdini's POV, so if the Elo difference and the single-game draw probability are accurate enough, SF should have around a 5% or 6% probability of winning a 100-game match. Going to a 1000-game match:

Code:

Probabilities_in_a_trinomial_distribution, ® 2013.

--------------------------------------------------------------------
Probabilities of all possible scores in a match between two engines.
--------------------------------------------------------------------

Write down the number of games of the match (from 2 up to 50000):

1000

Write down the engines rating difference (between -800 Elo and 800 Elo).
Elo(first player) - Elo(second player):

37

Write down the probability of a draw (%) between 0.0001 % and 89.3905 %

50

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

End of the calculations.
Approximated time spent in calculations:    172 ms.

The results will be saved into a Notepad, at the same path of this programme.

The results have been successfully saved into this file:

     Summary_of_probabilities.txt

Approximated total elapsed time:    419 ms.

Thanks for using Probabilities_in_a_trinomial_distribution. Press Enter to exit.

Code:

Probabilities for a match of  1000 games (rounded up to 0.0001 %):
 
Rating difference (rounded up to 0.01 Elo):   37.00 Elo.
 
Probability of a win  = W ~ 30.3047 %
Probability of a draw = D ~ 50.0000 %
Probability of a lose = L ~ 19.6953 %

[...]

                           SUMMARY:
 
 Probability that the first player wins the match ~  99.9999 %
                      Probability of a tied match ~   0.0000 %
Probability that the second player wins the match ~   0.0001 %
 
--------------------------------------------------------------
 
 Prob.(first player wins) + 0.5*Prob.(tied match) ~  99.9999 %
Prob.(second player wins) + 0.5*Prob.(tied match) ~   0.0001 %
The ~ 0.0001% probabilities for SF mostly (but not exclusively) come from 499 - 501 and 499.5 - 500.5 scores. These scores are at the limit of the granularity of my output.

Of course, more games help the stronger side (smaller error bars). I hope that this info with concrete numbers is useful.
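
For anyone who wants to reproduce numbers of this kind without the Fortran tool, the trinomial calculation can be sketched in a few lines of Python. This is not the tool's source: the function name and the dynamic-programming approach are assumptions made for illustration, and only the inputs (37 Elo, 50% draws, 100 games) come from the runs above.

```python
def match_outcome_probs(n_games: int, elo_diff: float, draw_prob: float):
    """Return (P(first wins), P(tie), P(second wins)) for an n-game match.

    Each game is treated as an independent win/draw/loss trial, which
    is the trinomial model described above.
    """
    expected = 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))  # logistic Elo model
    win = expected - 0.5 * draw_prob
    loss = 1.0 - draw_prob - win
    if win < 0.0 or loss < 0.0:
        raise ValueError("draw probability too high for this Elo difference")

    # dist[k] = probability that (wins - losses) == k - games_played.
    dist = [1.0]
    for _ in range(n_games):
        nxt = [0.0] * (len(dist) + 2)
        for k, p in enumerate(dist):
            nxt[k] += p * loss           # margin drops by one
            nxt[k + 1] += p * draw_prob  # margin unchanged
            nxt[k + 2] += p * win        # margin rises by one
        dist = nxt

    tie = dist[n_games]               # wins == losses
    second = sum(dist[:n_games])      # more losses than wins
    first = sum(dist[n_games + 1:])   # more wins than losses
    return first, tie, second


first, tie, second = match_outcome_probs(100, 37.0, 0.5)
print(f"first wins: {first:.4%}  tie: {tie:.4%}  second wins: {second:.4%}")
```

The match is decided by wins minus losses, so tracking that margin game by game is enough; draws only shift probability forward without changing it.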

Regards from Spain.

Ajedrecista.