New Houdini

tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: re: win adjudication/Re: New Houdini

Post by tpoppins »

Lyudmil Tsvetkov wrote:Poor Drawfish just in 3rd, way behind...

One of the reasons I would sometimes hate using SF as an analysis engine is that it sees too many draws, even in completely unbalanced positions.
I can confirm this after months of seeing hundreds of random (as opposed to those taken from the same game) positions analyzed on Let's Check. Since it lists three lines for every position you often get to see a second opinion (or two) by other engines, and Stockfish's 0.00 sometimes sticks out like a sore thumb.

It's like they hired Anish Giri to hand-tune SF's eval. ;)
Nay Lin Tun
Posts: 708
Joined: Mon Jan 16, 2012 6:34 am

Re: re: win adjudication/Re: New Houdini

Post by Nay Lin Tun »

This Houdini has scored 90 percent so far, 48/53. So Houdini is nearly 400 Elo above the average Elo. Impressive...
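The "nearly 400 Elo" figure can be checked with the standard logistic Elo relation between score fraction and rating difference. The sketch below is only an illustration: the function name is made up for this example, and treating the whole field as a single opponent of average rating is a simplification.

```python
import math

def elo_diff_from_score(score_fraction: float) -> float:
    """Elo difference implied by a score fraction under the logistic model.

    Assumption: the opposition is treated as one opponent of average
    rating, which is only a rough approximation for a mixed field.
    """
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

# 48 points out of 53 games, as quoted in the post above.
score = 48 / 53
print(f"{score:.1%} score -> {elo_diff_from_score(score):+.0f} Elo vs. average")
```

For 48/53 this comes out at roughly 393 Elo, in line with the estimate in the post.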
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: re: win adjudication/Re: New Houdini

Post by Laskos »

After 54-55 rounds out of 62 in the TCEC Rapid:

   # PLAYER                  : RATING  ERROR    POINTS  PLAYED     (%)   CFS(next)
   1 Houdini 200716          : 3341.7  121.5      50.0      55    90.9      88    
   2 Stockfish 030916        : 3247.0  104.6      47.0      55    85.5      53    
   3 Komodo 1692.19          : 3241.6  106.0      47.5      55    86.4      94    
   4 Fire 5                  : 3127.7   95.0      42.0      54    77.8      71    
   5 Jonny 8                 : 3094.0   89.8      40.0      54    74.1      52    
   6 Ginkgo 1.9h             : 3091.3   85.5      41.5      55    75.5      76    
   7 Gull 3                  : 3048.6   87.1      38.0      54    70.4      68    
   8 Rybka 4.1               : 3021.0   80.1      37.5      55    68.2      53    
   9 Andscacs 0.872b         : 3016.4   84.3      36.5      54    67.6      54    
  10 Nirvana 010916          : 3010.2   85.3      37.0      55    67.3      55    
  11 Chiron 030916           : 3002.3   82.4      35.5      54    65.7      59    
  12 Protector 1.9           : 2989.6   82.1      35.5      55    64.5      71    
  13 Texel 1.07a6            : 2958.1   83.2      34.0      54    63.0      70    
  14 Naum 4.6                : 2928.5   79.4      31.0      55    56.4      53    
  15 Hannibal 1.7            : 2925.0   78.7      32.0      55    58.2      52    
  16 Fizbo 1.8               : 2922.8   80.5      32.5      55    59.1      53    
  17 Critter 1.6a            : 2918.5   81.8      31.5      54    58.3      90    
  18 Bobcat 070916           : 2845.8   81.0      26.0      55    47.3      80    
  19 Raptor 2.3              : 2797.4   82.2      26.5      55    48.2      79    
  20 Vajolet2 2.2.15         : 2751.6   83.4      21.0      55    38.2      75    
  21 Fruit 070916            : 2710.5   87.2      20.5      55    37.3      56    
  22 Laser 280816            : 2700.8   87.7      20.5      55    37.3      66    
  23 Arasan 19.1             : 2675.1   91.6      17.5      55    31.8      59    
  24 Gaviota 1.01            : 2659.8   87.9      18.5      55    33.6      60    
  25 The Baron 3.40b         : 2643.7   91.4      18.5      54    34.3      64    
  26 DisasterArea 1.63       : 2619.7   97.4      16.0      55    29.1      65    
  27 Hakkapeliitta 210416    : 2592.0   96.1      16.5      55    30.0     100    
  28 Jellyfish 1.1           : 2354.1  131.5       9.5      55    17.3      91    
  29 Myrddin 0.87            : 2226.1  158.3       6.0      55    10.9      85    
  30 Delphil 3.3b2           : 2106.3  193.7       4.0      55     7.3      55    
  31 Firefly 2.7.0           : 2091.2  194.0       4.0      55     7.3      87    
  32 Fridolin 2              : 1941.5  238.0       2.0      55     3.6     ---    

White advantage = 32.74 +/- 11.28
Draw rate (equal opponents) = 54.94 % +/- 3.17
Houdini performs about 100 Elo points above SF and Komodo, with SF a bit above K, although it has fewer points. Houdini is 88% likely to be stronger than SF in the Rapid. That is not very high confidence, but I predict that the Superfinal won't be a walkover for SF over H.
shrapnel
Posts: 1339
Joined: Fri Nov 02, 2012 9:43 am
Location: New Delhi, India

Re: re: win adjudication/Re: New Houdini

Post by shrapnel »

Yup! Houdini will win. Stockfish has no chance against the Magician!
GO HOUDINI!!!
APassionForCriminalJustic
Posts: 417
Joined: Sat May 24, 2014 9:16 am

Re: re: win adjudication/Re: New Houdini

Post by APassionForCriminalJustic »

Laskos wrote: After 54-55 rounds out of 62 in the TCEC Rapid:

Houdini performs about 100 Elo points above SF and Komodo, with SF a bit above K, although it has fewer points. Houdini is 88% likely to be stronger than SF in the Rapid. That is not very high confidence, but I predict that the Superfinal won't be a walkover for SF over H.
I'm not sure what you are trying to prove here. The Rapid means absolutely nothing. The top engines have barely played each other. So what exactly can you possibly extract from the Rapid? Did you watch Stage Three, where Stockfish dominated? Once again, Stockfish has always had a higher draw rate, and Houdini has always done well versus inferior opposition. The Rapid is nothing more than "who can pop the most bubbles in the air". The shortest match in the Rapid thus far actually comes from Stockfish. Does that count for anything? I guess not.

So your prediction is that it will not be a "walk over"? Give a result out of 100 games then. Mine is what I have stated all along; the superfinal will be a comfortable win for Stockfish at, let us say, a score of 20-5-75. Have you already forgotten about the tournament where Stockfish effectively gutted Komodo 10.1? Here is the link: http://talkchess.com/forum/viewtopic.php?t=61484

This means that EVEN if Houdini development somehow reached Komodo 10.1's level (unlikely, but who really knows), that would not be even close to enough to make the superfinal realistically competitive.
APassionForCriminalJustic
Posts: 417
Joined: Sat May 24, 2014 9:16 am

Re: re: win adjudication/Re: New Houdini

Post by APassionForCriminalJustic »

shrapnel wrote: Yup! Houdini will win. Stockfish has no chance against the Magician!
GO HOUDINI!!!
Anil, you are going to be one very depressed individual 50 games into the final. Stockfish is a much stronger engine. There is absolutely no question about that. Guess who did not lose a single match in Stage Three? And that DOES mean something. Contrary to some public opinion(s), Stockfish gains significant strength at VLTC.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: re: win adjudication/Re: New Houdini

Post by Laskos »

APassionForCriminalJustic wrote:
Laskos wrote: After 54-55 rounds out of 62 in the TCEC Rapid:

Houdini performs about 100 Elo points above SF and Komodo, with SF a bit above K, although it has fewer points. Houdini is 88% likely to be stronger than SF in the Rapid. That is not very high confidence, but I predict that the Superfinal won't be a walkover for SF over H.
I'm not sure what you are trying to prove here. The Rapid means absolutely nothing. The top engines have barely played each other. So what exactly can you possibly extract from the Rapid? Did you watch Stage Three, where Stockfish dominated? Once again, Stockfish has always had a higher draw rate, and Houdini has always done well versus inferior opposition. The Rapid is nothing more than "who can pop the most bubbles in the air". The shortest match in the Rapid thus far actually comes from Stockfish. Does that count for anything? I guess not.

So your prediction is that it will not be a "walk over"? Give a result out of 100 games then. Mine is what I have stated all along; the superfinal will be a comfortable win for Stockfish at, let us say, a score of 20-5-75. Have you already forgotten about the tournament where Stockfish effectively gutted Komodo 10.1? Here is the link: http://talkchess.com/forum/viewtopic.php?t=61484

This means that EVEN if Houdini development somehow reached Komodo 10.1's level (unlikely, but who really knows), that would not be even close to enough to make the superfinal realistically competitive.
I already posted earlier in this thread for you the Ordo ratings in Stage 3:

   # PLAYER              : RATING  ERROR    POINTS  PLAYED     (%)   CFS(next) 
   1 Stockfish 160716    : 3332.9   47.8      39.0      56    69.6      89    
   2 Houdini 200716      : 3291.0   43.1      36.5      57    64.0      75    
   3 Komodo 10.1         : 3269.2   41.5      34.5      57    60.5      99    
   4 Fire 5              : 3199.3   37.6      28.5      56    50.9      98    
   5 Andscacs 0.872b     : 3137.9   40.0      23.5      56    42.0      72    
   6 Jonny 8             : 3119.1   41.4      22.0      56    39.3      73    
   7 Gull 3              : 3100.0   42.4      20.5      56    36.6      50    
   8 Rybka 4.1           : 3100.0   45.6      20.5      56    36.6     ---    

White advantage = 85.92 +/- 13.10 
Draw rate (equal opponents) = 92.63 % +/- 4.81 
Stockfish and Houdini are basically within 2 SD error margins of each other here. Taken together with the Rapid, one can speculate that 50 games are not enough to separate Stockfish and Houdini. That's why I suspect the Superfinal won't be too skewed, although in 100 games it could happen that one scores significantly higher than the other.
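A rough way to see why the two engines are hard to separate is to compare the rating gap with the two error margins added in quadrature. This is only a sketch: the helper name is invented for this example, and it assumes the two ERROR values are independent and symmetric, which is at best approximately true for ratings fitted from the same pool of games. It is not how Ordo computes its CFS column.

```python
import math

def gap_in_combined_margins(rating_a, error_a, rating_b, error_b):
    """Rating gap divided by the two error margins added in quadrature.

    Assumption: the margins are independent and symmetric. Ratings fitted
    from one shared pool are correlated, so treat this as a rough guide.
    """
    combined = math.hypot(error_a, error_b)
    return (rating_a - rating_b) / combined

# Stage 3 figures from the table above: Stockfish vs. Houdini.
ratio = gap_in_combined_margins(3332.9, 47.8, 3291.0, 43.1)
print(f"gap = {3332.9 - 3291.0:.1f} Elo, {ratio:.2f} combined margins")
```

The ratio comes out at about 0.65, so the gap is smaller than a single combined margin, whatever multiple of the standard deviation the ERROR column represents.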
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: re: win adjudication/Re: New Houdini

Post by Milos »

Laskos wrote:I already posted earlier in this thread for you the Ordo ratings in Stage 3:

Stockfish and Houdini are basically within 2 SD error margins of each other here. Taken together with the Rapid, one can speculate that 50 games are not enough to separate Stockfish and Houdini. That's why I suspect the Superfinal won't be too skewed, although in 100 games it could happen that one scores significantly higher than the other.
Trying to reason with an extreme kind of fanboy is nothing but a total waste of time. No amount of statistics or scientific data would change their mind.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: re: win adjudication/Re: New Houdini

Post by Laskos »

With 2 games still to play in round 58 of 62 of the TCEC Rapid, the Ordo logistic standings:

   # PLAYER                  : RATING  ERROR    POINTS  PLAYED     (%)   CFS(next)

   1 Houdini 200716          : 3341.8  117.4      52.5      58    90.5      90    
   2 Komodo 1692.19          : 3244.4  102.1      49.5      58    85.3      55    
   3 Stockfish 030916        : 3234.8  100.9      49.0      58    84.5      93    
   4 Fire 5                  : 3135.6   92.6      46.0      58    79.3      76    
   5 Jonny 8                 : 3091.7   85.2      43.0      58    74.1      64    
   6 Ginkgo 1.9h             : 3069.7   84.2      42.0      58    72.4      75    
   7 Gull 3                  : 3030.3   82.8      40.0      58    69.0      60    
   8 Andscacs 0.872b         : 3015.0   82.8      39.0      58    67.2      54    
   9 Rybka 4.1               : 3009.7   76.0      39.0      58    67.2      60    
  10 Nirvana 010916          : 2996.3   79.1      38.0      58    65.5      63    
  11 Protector 1.9           : 2977.6   78.7      37.5      58    64.7      56    
  12 Chiron 030916           : 2969.9   78.3      36.5      58    62.9      64    
  13 Texel 1.07a6            : 2950.7   76.9      36.0      58    62.1      62    
  14 Naum 4.6                : 2934.3   76.3      33.5      58    57.8      53    
  15 Critter 1.6a            : 2930.2   77.0      34.5      58    59.5      57    
  16 Hannibal 1.7            : 2920.1   79.1      34.5      58    59.5      65    
  17 Fizbo 1.8               : 2900.0   76.0      33.5      58    57.8      89    
  18 Bobcat 070916           : 2832.4   78.4      28.0      58    48.3      79    
  19 Raptor 2.3              : 2788.3   77.9      27.5      58    47.4      80    
  20 Vajolet2 2.2.15         : 2739.8   83.2      22.5      57    39.5      76    
  21 Fruit 070916            : 2698.5   83.6      22.0      57    38.6      53    
  22 Laser 280816            : 2693.9   86.3      22.0      58    37.9      57    
  23 Arasan 19.1             : 2683.2   86.9      20.0      58    34.5      73    
  24 Gaviota 1.01            : 2647.7   81.2      19.0      58    32.8      64    
  25 The Baron 3.40b         : 2626.0   87.1      19.5      58    33.6      51    
  26 DisasterArea 1.63       : 2625.0   91.1      17.5      57    30.7      74    
  27 Hakkapeliitta 210416    : 2585.0   90.3      17.5      58    30.2     100    
  28 Jellyfish 1.1           : 2340.7  123.1       9.5      58    16.4      95    
  29 Myrddin 0.87            : 2180.9  156.4       6.0      58    10.3      69    
  30 Delphil 3.3b2           : 2130.2  170.0       5.0      58     8.6      69    
  31 Firefly 2.7.0           : 2075.6  182.8       4.0      57     7.0      88    
  32 Fridolin 2              : 1928.6  232.0       2.0      58     3.4     ---    

White advantage = 35.28 +/- 11.48
Draw rate (equal opponents) = 55.17 % +/- 2.99
Houdini is about 100 Elo points above Komodo and Stockfish, with a 90% confidence of superiority over Komodo (and higher over Stockfish) in this Rapid.

The most striking peculiarity is that Houdini keeps more Queens on the board till the Win adjudication than Stockfish and Komodo. Average number of Queens at the Win adjudication:

Houdini: 1.06
Stockfish: 0.70
Komodo: 0.68

At about 2 standard deviations, there is roughly a 97.5% confidence that this is not a statistical fluke.
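The link between "2 standard deviations" and "97.5%" can be checked against the normal distribution. This is a minimal sketch that assumes the difference in average queen counts is approximately normally distributed; the post does not say which test was actually used.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative probability of a standard normal variable up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Compare the exact 97.5% point (z = 1.96) with a flat 2 SD.
for z in (1.96, 2.0):
    one_sided = normal_cdf(z)
    two_sided = 2.0 * one_sided - 1.0
    print(f"z = {z:.2f}: one-sided {one_sided:.4f}, two-sided {two_sided:.4f}")
```

A one-sided 97.5% corresponds to about 1.96 SD, so rounding it to 2 SD is a fair shorthand; read as a two-sided interval, 2 SD covers closer to 95%.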
clumma
Posts: 186
Joined: Fri Oct 10, 2014 10:05 pm
Location: Berkeley, CA

Re: re: win adjudication/Re: New Houdini

Post by clumma »

Amazing. How can someone leave the field for years and come back with this kind of performance straight away?