3Champs reloaded

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: 3Champs reloaded

Post by Laskos »

Milos wrote:
Martin Thoresen wrote:You fail to see my point: these two factors were the reason I believed contempt was disabled. Has nothing to do with contempt still being part of the Houdini code or not.
You based your belief on
1) an irrelevant fact
2) trust in the word of someone whom I personally take for a liar
I understand what you believe, I just don't share your beliefs and have arguments why.
But that is not the problem. The problem is that you present your beliefs as some kind of universal truth. It seems to me that you think that, because you are some kind of VIP thanks to running the TCEC tournament, your arguments hold better. I don't subscribe to this point of view.
It seems RH presented a crippled version to TCEC. Then he is such a liar that he won't divulge this secret of his. He intentionally lost when he could have comfortably won. The Apollo landing was a fake.
Martin Thoresen
Posts: 1833
Joined: Thu Jun 22, 2006 12:07 am

Re: 3Champs reloaded

Post by Martin Thoresen »

Milos wrote: I understand what you believe, I just don't share your beliefs and have arguments why.
Fair enough.
Milos wrote: But that is not the problem. The problem is that you present your beliefs as some kind of universal truth. It seems to me that you think that, because you are some kind of VIP thanks to running the TCEC tournament, your arguments hold better. I don't subscribe to this point of view.
Interesting, because for me there was never a problem anywhere in this discussion. Apparently though, you created a problem for yourself by thinking that I think I am "some kind of VIP".
That you don't believe in that point of view is totally irrelevant to me since that thought had yet to enter my mind before you brought it up just now.

And oh, a small notification: your thought is erroneous, I do not think I am "some kind of VIP". You do not need to thank me for solving your problem - let's just say you owe me one.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: 3Champs reloaded

Post by Milos »

Laskos wrote:It seems RH presented a crippled version to TCEC. Then he is such a liar that he won't divulge this secret of his. He intentionally lost when he could have comfortably won. The Apollo landing was a fake.
Nice try, I mean those "smart" hyperboles mixing apples and oranges; funny that you forgot to mention the JFK murder, Vatican and Masonic domination, and the NSA world surveillance myth. :lol: :lol: :lol:
This doesn't move you an inch closer to the most probable explanation of the topic discussed, but it still makes the discussion interesting ;).

The most probable explanation is that RH presented a version with medium contempt (his regular contempt is 20cp, which works best on rating lists), thinking it would be the best way to get through Stage 4 since the opponents were mixed, or he was too confident (high personal contempt :)) that H4 was still far enough ahead.
Well, in the end it was a screw-up, but mainly because you can't change the probability of a coin toss by blowing air during the toss ;), i.e. the others (SF and K) were too close.
Back to the topic: thanks to contempt you can gain, but also lose, 20 Elo, but in a format such as TCEC it can hardly have much impact.
However, with a higher number of games (or the bunch of tests that are presented on talkchess) you start noticing that difference.
ouachita
Posts: 454
Joined: Tue Jan 15, 2013 4:33 pm
Location: Ritz-Carlton, NYC
Full name: Bobby Johnson

Re: 3Champs reloaded

Post by ouachita »

Milos wrote:You know that correlation is a measurable quantity?
Please show me here how you would demonstrate, or where anyone has demonstrated, any scientific statistical relationship between these two sets of random results/data: 1+1, 90+2, or any data even remotely comparable. You are free to use any statistical or mathematical expression you choose.

And I invite RH, LK and MC to do the same.
SIM, PhD, MBA, PE
Vinvin
Posts: 5296
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: 3Champs reloaded

Post by Vinvin »

Automatic report with SCID

Code:

...
3. Result Trends

3.1 Result lengths and frequencies

                 Score      Game length             Frequency       
                          1-0    =-=    0-1    1-0     =-=     0-1  
 Report games    57.0%     72     76     78   24.6%   64.6%   10.6% 
 All games       57.0%     73     77     78   24.6%   64.6%   10.6% 

3.2 Shortest wins (White)

  1:  1-0(33) Stockfish 241113 64 SSE4.2 - Houdini 4 Pro x64 x12, ? 2013 [46]
  2:  1-0(34) Komodo 1142.00 64-bit x12-2 - Houdini 4 Pro x64 x12, ? 2013 [20]
  3:  1-0(38) Stockfish 241113 64 SSE4.2 - Houdini 4 Pro x64 x12, ? 2013 [43]
  4:  1-0(43) Komodo 1142.00 64-bit x12-2 - Stockfish 241113 64 SSE4.2, ? 2013 [3]
  5:  1-0(49) Komodo 1142.00 64-bit x12-2 - Houdini 4 Pro x64 x12, ? 2013 [13]

3.3 Shortest wins (Black)

  1:  0-1(47) Houdini 4 Pro x64 x12 - Komodo 1142.00 64-bit x12-2, ? 2013 [50]
  2:  0-1(48) Stockfish 241113 64 SSE4.2 - Houdini 4 Pro x64 x12, ? 2013 [59]
  3:  0-1(50) Stockfish 241113 64 SSE4.2 - Komodo 1142.00 64-bit x12-2, ? 2013 [54]
  4:  0-1(50) Komodo 1142.00 64-bit x12-2 - Stockfish 241113 64 SSE4.2, ? 2013 [37]
  5:  0-1(50) Houdini 4 Pro x64 x12 - Stockfish 241113 64 SSE4.2, ? 2013 [34]

4. Moves and Themes

4.1 Move orders reaching the report position

There was only one move order reaching this position:
  1:   (150)

4.2 Moves from the report position

    Move      Frequency    Score  AvElo Perf AvYear Draw ECO
 1: e4         64: 42.6%   53.1%              2013  63% B00a 
 2: d4         40: 26.6%   53.7%              2013  68% A40a 
 3: Nf3        24: 16.0%   66.6%              2013  67% A04  
 4: c4         22: 14.6%   63.6%              2013  64% A10  
__________________________________________________________________
TOTAL:        150:100.0%   57.0%              2013  65%

4.3 Positional Themes

Frequency of themes in the first 20 moves of each game:
   Same-side castling:             77%    White Isolated Queen Pawn:       7%
   Opposite castling:               7%    Black Isolated Queen Pawn:       9%
   Kingside pawn storm:            11%    White Pawn on 5/6/7th rank:     49%
   Queens exchanged:               40%    Black Pawn on 2/3/4th rank:     29%
   Only one side has Bishop pair:   4%    Open c/d/e file:                46%

4.4 Endgames

Material at the end of each game:
                   P     BN      R   R,BN      Q   Q,BN    Q,R Q,R,BN
 Report games     3%    26%    18%    27%     3%     5%     3%    15%
 All games        3%    26%    18%    27%     3%     5%     3%    15%

5. Theory Table

-------------------------------------------------------------------------------
  +37 =97 -16 (85.5/150: 57%)
-------------------------------------------------------------------------------
     1        2        3        4        5        6        7        8     
-------------------------------------------------------------------------------
 1  c4       Nf3[2]   g3       Nc3      d4       Bg2      dc5      Qb3      17:
    Nf6[1]   g6[3]    Bg7      OO       d6       c5       dc5      Nc6[4]   65%

 2  ...      g3[5]    Nc3[7]   Bg2      Nf3      d4       OO       d5        5:
    e5       d6[6]    Nc6      Nf6      Be7      OO       h6       Nb8[8]   60%

 3  Nf3      g3[10]   c4[11]   Bg2      OO       Qb3      h3       d4       10:
    d5[9]    c6       Bg4      e6       Nf6      Qb6      Bh5      Be7[12]  65%

 4  ...      d4[13]   e3[15]   c3       Bd3      OO       Ne5      de5      14:
    Nf6      d5[14]   c5       e6       Nc6      b6       Ne5      Nd7[16]  68%

 5  d4       c4[18]   f3[20]   cd5      e4       Nc3      Be3      Qd2       8:
    g6[17]   Nf6[19]  d5       Nd5      Nb6      Bg7      OO       Nc6[21]  56%

 6  ...      c4[22]   Nf3[24]  Nc3[25]  e3       Qc2      b3       Bd3      14:
    d5       c6[23]   Nf6      e6[26]   Nbd7     Be7      OO       b6[27]   50%

 7  ...      c4[28]   Nc3[30]  Bg5[32]  Bf4      bc3      e3       Nf3      18:
    Nf6      g6[29]   d5[31]   Ne4      Nc3      Bg7      OO       c6[33]   56%

 8  e4       d4       Nc3[35]  Ne4      Ng3      Nf3      h4       Ne5      13:
    c6[34]   d5       de4      Bf5      Bg6      e6[36]   h6       Bh7[37]  46%

 9  ...      ...      Nd2      Ne4      Nf6      c3       Nf3      Be3       7:
    ...      ...      de4      Nf6[38]  gf6[39]  e5[40]   Qe7      Nd7[41]  64%

10  ...      Nf3      Bb5[43]  Ba4      OO       d4[46]   Bb3      de5      18:
    e5       Nc6[42]  a6[44]   Nf6[45]  Ne4      b5       d5       Be6[47]  67%

11  ...      Nf3[48]  d4[49]   Nd4      c4[51]   Nc3      Be2      Qd4      10:
    c5       Nc6      cd4      g6[50]   Nf6      d6       Nd4      Bg7[52]  45%

12  ...      ...      d4       Nd4      Nc3      Be2[54]  Nb3      f4        8:
    ...      d6       cd4      Nf6      a6[53]   e5[55]   Be6[56]  Be7[57]  38%

13  ...      ...      d4       Nd4      Nc3      a3[60]   Be2      OO        8:
    ...      e6       cd4      Nf6[58]  Nc6[59]  Be7      OO       d5[61]   50%

-------------------------------------------------------------------------------
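
As a quick check on the report's headline figure, the +37 =97 -16 line in the theory table can be turned into a score percentage and an Elo equivalent for White. This is only a sketch: the function names are mine, not SCID's, and the Elo conversion assumes the usual logistic rating model.

```python
import math


def score_fraction(wins, draws, losses):
    """Fraction of points scored: a win counts 1, a draw 1/2."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic model, assumed)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# White's totals from the theory table: +37 =97 -16 over 150 games.
white = score_fraction(37, 97, 16)
print(f"White score: {white:.1%}, about {elo_from_score(white):.0f} Elo")
```

Under that model the 57% White score corresponds to roughly 49 Elo, which reflects the first-move advantage across all three engines rather than the strength of any one of them.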
PaulieD
Posts: 239
Joined: Tue Jun 25, 2013 8:19 pm

Re: 3Champs reloaded

Post by PaulieD »

No matter how long the time control, 100 games is insignificant statistically.

It's the same as 100 1 minute lightning matches statistically, it is just longer.

It is part of all the data that gets gathered to ultimately determine who is the best. This will take some time.

To gather enough LTC matches for significance takes a very, very long time.
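
To put a rough number on this, here is a sketch of the usual error-margin estimate for a match result. The win and draw rates are assumptions chosen for illustration (two equal engines, 65% draws), the helper names are mine, and the interval uses a plain normal approximation.

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic model, assumed)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin(games, win_rate, draw_rate, z=1.96):
    """Approximate half-width, in Elo, of the confidence interval on a match score."""
    score = win_rate + 0.5 * draw_rate
    # Per-game variance of the result, which is 1, 1/2 or 0 points.
    variance = win_rate + 0.25 * draw_rate - score ** 2
    half_width = z * math.sqrt(variance / games)
    low = elo_from_score(score - half_width)
    high = elo_from_score(score + half_width)
    return (high - low) / 2.0


# Hypothetical inputs: equal engines, 17.5% wins each way, 65% draws.
for n in (100, 1000, 10000):
    print(n, round(elo_margin(n, 0.175, 0.65), 1))
```

The time control does not appear anywhere in the formula; only the number of games and the win/draw/loss mix do. A higher draw rate does shrink the per-game variance, so the margin for the same number of games is somewhat smaller when draws are more frequent.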
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: 3Champs reloaded

Post by Laskos »

PaulieD wrote:No matter how long the time control, 100 games is insignificant statistically.

It's the same as 100 1 minute lightning matches statistically, it is just longer.

It is part of all the data that gets gathered to ultimately determine who is the best. This will take some time.

To gather enough LTC matches for significance takes a very, very long time.
SF has a LOS (likelihood of superiority) of 92% against H4 in this 150-game tourney.
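
For reference, a minimal sketch of the common normal-approximation formula for LOS, which depends only on the decisive games. The function name is mine, and the counts in the example are invented for illustration; they are not the actual SF vs H4 results.

```python
import math


def los(wins, losses):
    """Likelihood of superiority from decisive games only (normal approximation)."""
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


# Hypothetical head-to-head counts, for illustration only.
print(f"LOS = {los(10, 5):.1%}")
```

Draws drop out of the formula entirely, so a mini-match with few decisive games can still give a high LOS if the decisive games are lopsided.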
carldaman
Posts: 2287
Joined: Sat Jun 02, 2012 2:13 am

Re: 3Champs reloaded

Post by carldaman »

Laskos wrote:
PaulieD wrote:No matter how long the time control, 100 games is insignificant statistically.

It's the same as 100 1 minute lightning matches statistically, it is just longer.

It is part of all the data that gets gathered to ultimately determine who is the best. This will take some time.

To gather enough LTC matches for significance takes a very, very long time.
SF has a LOS of 92% against H4 in this 150-game tourney.
That's right -- even a small sample can be fairly significant if the measured rating difference is large enough.
lkaufman
Posts: 6256
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: 3Champs reloaded

Post by lkaufman »

ouachita wrote:
Milos wrote:You know that correlation is a measurable quantity?
Please show me here how you would demonstrate, or where anyone has demonstrated, any scientific statistical relationship between these two sets of random results/data: 1+1, 90+2, or any data even remotely comparable. You are free to use any statistical or mathematical expression you choose.

And I invite RH, LK and MC to do the same.
There is obviously some correlation between blitz and slow ratings; even a casual glance at the rating lists shows this. If the only information you have is that engine A is stronger than engine B at blitz, then you should bet on engine A at slow chess.
However, results at the actual time limit of interest should be the ones that count, provided that the difference between the slow and fast tests is statistically meaningful. So for example, if one looks at the relative ratings of Komodo TCEC and Houdini 4 on Aser's 90' + 30" rating list (600 game minimum) and the same difference on any of the blitz lists, the difference in their relative ratings is huge and surely significant beyond 99% (I leave it to the mathematicians to confirm this). So in this case the blitz ratings are almost totally irrelevant to 90 minute + increment strength.
Note that the above is totally independent of the contempt argument. If contempt were set to zero for both Komodo and Houdini in Aser's tests, I would guess it would raise Houdini by ten points or so and Komodo by five points. Houdini 4 would still be well back in third place, and Komodo would be very close to Stockfish DD.
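
For anyone who wants to check the "significant beyond 99%" claim, one way to do it is a two-sided z-test on the difference between the slow-list rating gap and the blitz-list rating gap. The sketch below assumes the two lists are independent and that each gap comes with a standard error (a published 95% margin divided by 1.96 would do). The function name is mine and every number in the example is hypothetical, not taken from any rating list.

```python
import math


def gap_difference_significance(gap_slow, se_slow, gap_fast, se_fast):
    """Two-sided z-test for whether two independent rating gaps differ."""
    z = (gap_slow - gap_fast) / math.sqrt(se_slow ** 2 + se_fast ** 2)
    # P(|Z| > |z|) for a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical inputs: engine A is +20 Elo ahead at slow chess (SE 12)
# but -30 Elo behind at blitz (SE 8).
z, p = gap_difference_significance(20.0, 12.0, -30.0, 8.0)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

A p-value below 0.01 would correspond to "significant beyond 99%" in the sense used above.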
ouachita
Posts: 454
Joined: Tue Jan 15, 2013 4:33 pm
Location: Ritz-Carlton, NYC
Full name: Bobby Johnson

Re: 3Champs reloaded

Post by ouachita »

lkaufman wrote:in this case the blitz ratings are almost totally irrelevant to 90 minute + increment strength.
I believe, and would hypothesize, that you are most likely correct. Truth is, we do not and may never know with any measure of certainty the extent to which blitz (STC) test results are relevant to 90+ or 40/120+ or 40/240 or . . . 40/4000, etc. results. Perhaps guessing will have to suffice.
SIM, PhD, MBA, PE