Milos wrote:
Martin Thoresen wrote: You fail to see my point: these two factors were the reason I believed contempt was disabled. It has nothing to do with whether contempt is still part of the Houdini code or not.
You based your belief on
1) an irrelevant fact
2) trust in the word of someone whom I personally take for a liar
I understand what you believe; I just don't share your beliefs, and I have arguments why.
But that is not the problem. The problem is that you present your beliefs as some kind of universal truth. It seems to me that you think that, because you are some kind of VIP thanks to running the TCEC tournament, your arguments hold better. I don't subscribe to this point of view.

It seems RH presented a crippled version to TCEC. Then he is such a liar that he won't divulge this secret of his. He intentionally lost when he could have won comfortably. The Apollo landing was a fake.
3Champs reloaded
Moderator: Ras
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: 3Champs reloaded
-
- Posts: 1833
- Joined: Thu Jun 22, 2006 12:07 am
Re: 3Champs reloaded
Milos wrote: I understand what you believe; I just don't share your beliefs, and I have arguments why.

Fair enough.

Milos wrote: But that is not the problem. The problem is that you present your beliefs as some kind of universal truth. It seems to me that you think that, because you are some kind of VIP thanks to running the TCEC tournament, your arguments hold better. I don't subscribe to this point of view.

Interesting, because for me there was never a problem anywhere in this discussion. Apparently, though, you created a problem for yourself by thinking that I think I am "some kind of VIP".
That you don't believe in that point of view is totally irrelevant to me since that thought had yet to enter my mind before you brought it up just now.
And oh, a small notification: your thought is erroneous, I do not think I am "some kind of VIP". You do not need to thank me for solving your problem - let's just say you owe me one.
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: 3Champs reloaded
Laskos wrote: It seems RH presented a crippled version to TCEC. Then he is such a liar that he won't divulge this secret of his. He intentionally lost when he could have won comfortably. The Apollo landing was a fake.

Nice try. I mean those "smart" hyperboles of apples and oranges; funny that you forgot to mention the JFK murder, Vatican and Masonic domination, and the NSA world surveillance myth.

This doesn't move you an inch closer to the most probable explanation of the topic discussed, but it still makes the discussion interesting.

The most probable explanation is that RH presented a version with medium contempt (his regular contempt is 20cp, which works best with rating lists), thinking it would get him through Stage 4 best since the opponents were mixed, or else he was too confident (high personal contempt).

Well, in the end it was a screw-up, but mainly because you can't change the probability of a coin toss by blowing air during the toss.

Back to the topic: thanks to contempt you can gain, but also lose, 20 Elo, but in a format such as TCEC that can hardly have much impact.
However, with a higher number of games (or the bunch of tests that are presented on talkchess) you start noticing that difference.
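To put a rough number on "a higher number of games", here is a minimal Python sketch. It assumes the standard logistic Elo model, independent games, a 65% draw rate (roughly the draw frequency in the SCID report posted in this thread), and a 1.96-sigma threshold; the function names are made up for illustration and none of this is anyone's actual test setup.

```python
import math

def expected_score(elo_diff):
    """Expected score under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, draw_rate, z=1.96):
    """Rough number of games before an edge of elo_diff exceeds z standard errors."""
    s = expected_score(elo_diff)
    win = s - draw_rate / 2.0              # win probability implied by score and draw rate
    var = win + draw_rate / 4.0 - s * s    # per-game variance of the score (1, 0.5 or 0)
    return math.ceil((z * math.sqrt(var) / (s - 0.5)) ** 2)

# Assumed inputs: a 20 Elo edge and a 65% draw rate
n_games = games_needed(20, 0.65)
print(n_games)
```

Under these assumptions the answer comes out at a few hundred games, which is why a 20 Elo swing is hard to see in a short match but shows up in longer test runs.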
-
- Posts: 454
- Joined: Tue Jan 15, 2013 4:33 pm
- Location: Ritz-Carlton, NYC
- Full name: Bobby Johnson
Re: 3Champs reloaded
Milos wrote: You know that correlation is a measurable quantity?

Please show me here how you would demonstrate, or where anyone has demonstrated, any scientific statistical relationship between these two sets of random results/data: 1+1, 90+2, or any data even remotely comparable. You are free to use any statistical or mathematical expression you choose.
And I invite RH, LK and MC to do the same.
SIM, PhD, MBA, PE
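For what "measurable" could mean in practice, one option is a plain Pearson correlation between the ratings the same engines get on a blitz list and on a slow list. The sketch below is only an illustration: the two rating lists are placeholders for five unnamed engines, not data from any real list, and `pearson` is a hand-written helper rather than a library call.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Placeholder ratings only, NOT taken from any real rating list
blitz = [3100, 3050, 2980, 2900, 2850]
slow = [3080, 3060, 2950, 2920, 2830]
r = pearson(blitz, slow)
print(round(r, 3))
```

With real lists, the engines would have to be matched by version and the rating error margins taken into account before reading much into the coefficient.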
-
- Posts: 5296
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: 3Champs reloaded
Automatic report with SCID
...
3. Result Trends
3.1 Result lengths and frequencies
               Score    Game length          Frequency
                        1-0  =-=  0-1     1-0    =-=    0-1
Report games   57.0%     72   76   78    24.6%  64.6%  10.6%
All games      57.0%     73   77   78    24.6%  64.6%  10.6%
3.2 Shortest wins (White)
1: 1-0(33) Stockfish 241113 64 SSE4.2 - Houdini 4 Pro x64 x12, ? 2013 [46]
2: 1-0(34) Komodo 1142.00 64-bit x12-2 - Houdini 4 Pro x64 x12, ? 2013 [20]
3: 1-0(38) Stockfish 241113 64 SSE4.2 - Houdini 4 Pro x64 x12, ? 2013 [43]
4: 1-0(43) Komodo 1142.00 64-bit x12-2 - Stockfish 241113 64 SSE4.2, ? 2013 [3]
5: 1-0(49) Komodo 1142.00 64-bit x12-2 - Houdini 4 Pro x64 x12, ? 2013 [13]
3.3 Shortest wins (Black)
1: 0-1(47) Houdini 4 Pro x64 x12 - Komodo 1142.00 64-bit x12-2, ? 2013 [50]
2: 0-1(48) Stockfish 241113 64 SSE4.2 - Houdini 4 Pro x64 x12, ? 2013 [59]
3: 0-1(50) Stockfish 241113 64 SSE4.2 - Komodo 1142.00 64-bit x12-2, ? 2013 [54]
4: 0-1(50) Komodo 1142.00 64-bit x12-2 - Stockfish 241113 64 SSE4.2, ? 2013 [37]
5: 0-1(50) Houdini 4 Pro x64 x12 - Stockfish 241113 64 SSE4.2, ? 2013 [34]
4. Moves and Themes
4.1 Move orders reaching the report position
There was only one move order reaching this position:
1: (150)
4.2 Moves from the report position
    Move   Frequency      Score   AvElo  Perf  AvYear  Draw  ECO
 1: e4      64:  42.6%    53.1%                 2013   63%   B00a
 2: d4      40:  26.6%    53.7%                 2013   68%   A40a
 3: Nf3     24:  16.0%    66.6%                 2013   67%   A04
 4: c4      22:  14.6%    63.6%                 2013   64%   A10
__________________________________________________________________
TOTAL:     150: 100.0%    57.0%                 2013   65%
4.3 Positional Themes
Frequency of themes in the first 20 moves of each game:
Same-side castling: 77% White Isolated Queen Pawn: 7%
Opposite castling: 7% Black Isolated Queen Pawn: 9%
Kingside pawn storm: 11% White Pawn on 5/6/7th rank: 49%
Queens exchanged: 40% Black Pawn on 2/3/4th rank: 29%
Only one side has Bishop pair: 4% Open c/d/e file: 46%
4.4 Endgames
Material at the end of each game:
                P    BN    R    R,BN   Q    Q,BN   Q,R   Q,R,BN
Report games    3%   26%   18%  27%    3%   5%     3%    15%
All games       3%   26%   18%  27%    3%   5%     3%    15%
5. Theory Table
-------------------------------------------------------------------------------
+37 =97 -16 (85.5/150: 57%)
-------------------------------------------------------------------------------
    1        2        3        4        5        6        7        8
-------------------------------------------------------------------------------
 1  c4       Nf3[2]   g3       Nc3      d4       Bg2      dc5      Qb3       17:
    Nf6[1]   g6[3]    Bg7      OO       d6       c5       dc5      Nc6[4]    65%
 2  ...      g3[5]    Nc3[7]   Bg2      Nf3      d4       OO       d5         5:
    e5       d6[6]    Nc6      Nf6      Be7      OO       h6       Nb8[8]    60%
 3  Nf3      g3[10]   c4[11]   Bg2      OO       Qb3      h3       d4        10:
    d5[9]    c6       Bg4      e6       Nf6      Qb6      Bh5      Be7[12]   65%
 4  ...      d4[13]   e3[15]   c3       Bd3      OO       Ne5      de5       14:
    Nf6      d5[14]   c5       e6       Nc6      b6       Ne5      Nd7[16]   68%
 5  d4       c4[18]   f3[20]   cd5      e4       Nc3      Be3      Qd2        8:
    g6[17]   Nf6[19]  d5       Nd5      Nb6      Bg7      OO       Nc6[21]   56%
 6  ...      c4[22]   Nf3[24]  Nc3[25]  e3       Qc2      b3       Bd3       14:
    d5       c6[23]   Nf6      e6[26]   Nbd7     Be7      OO       b6[27]    50%
 7  ...      c4[28]   Nc3[30]  Bg5[32]  Bf4      bc3      e3       Nf3       18:
    Nf6      g6[29]   d5[31]   Ne4      Nc3      Bg7      OO       c6[33]    56%
 8  e4       d4       Nc3[35]  Ne4      Ng3      Nf3      h4       Ne5       13:
    c6[34]   d5       de4      Bf5      Bg6      e6[36]   h6       Bh7[37]   46%
 9  ...      ...      Nd2      Ne4      Nf6      c3       Nf3      Be3        7:
    ...      ...      de4      Nf6[38]  gf6[39]  e5[40]   Qe7      Nd7[41]   64%
10  ...      Nf3      Bb5[43]  Ba4      OO       d4[46]   Bb3      de5       18:
    e5       Nc6[42]  a6[44]   Nf6[45]  Ne4      b5       d5       Be6[47]   67%
11  ...      Nf3[48]  d4[49]   Nd4      c4[51]   Nc3      Be2      Qd4       10:
    c5       Nc6      cd4      g6[50]   Nf6      d6       Nd4      Bg7[52]   45%
12  ...      ...      d4       Nd4      Nc3      Be2[54]  Nb3      f4         8:
    ...      d6       cd4      Nf6      a6[53]   e5[55]   Be6[56]  Be7[57]   38%
13  ...      ...      d4       Nd4      Nc3      a3[60]   Be2      OO         8:
    ...      e6       cd4      Nf6[58]  Nc6[59]  Be7      OO       d5[61]    50%
-------------------------------------------------------------------------------
-
- Posts: 239
- Joined: Tue Jun 25, 2013 8:19 pm
Re: 3Champs reloaded
No matter how long the time control, 100 games is insignificant statistically.
It's the same as 100 1 minute lightning matches statistically, it is just longer.
It is part of all the data that gets gathered to ultimately determine who is the best. This will take some time.
To gather enough LTC matches for significance takes a very, very long time.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: 3Champs reloaded
PaulieD wrote: No matter how long the time control, 100 games is insignificant statistically.
It's the same as 100 1 minute lightning matches statistically, it is just longer.
It is part of all the data that gets gathered to ultimately determine who is the best. This will take some time.
To gather enough LTC matches for significance takes a very, very long time.

SF has a LOS of 92% against H4 in this 150-game tourney.
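For readers who have not met LOS (likelihood of superiority) before, a minimal Python sketch of the usual normal approximation follows. It uses only decisive games, since draws do not shift the balance either way. The win and loss tallies are hypothetical, chosen only to show the scale of the number; they are not the actual SF vs H4 results from this tournament.

```python
import math

def los(wins, losses):
    """Likelihood of superiority from decisive games (normal approximation)."""
    if wins + losses == 0:
        return 0.5  # no decisive games, no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

# Hypothetical tallies, NOT the real head-to-head score
print(round(los(12, 6), 3))
```

The point is that LOS depends on the win/loss margin relative to the number of decisive games, so a short match can still produce a high value when the margin is lopsided.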
-
- Posts: 2287
- Joined: Sat Jun 02, 2012 2:13 am
Re: 3Champs reloaded
Laskos wrote:
PaulieD wrote: No matter how long the time control, 100 games is insignificant statistically.
It's the same as 100 1 minute lightning matches statistically, it is just longer.
It is part of all the data that gets gathered to ultimately determine who is the best. This will take some time.
To gather enough LTC matches for significance takes a very, very long time.
SF has a LOS of 92% against H4 in this 150 games tourney.

That's right: even a small sample can be fairly significant if the measured rating difference is large enough.
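As a rough illustration of a small sample still being significant, the sketch below takes the only complete W/D/L tally quoted in this thread, the SCID report's +37 =97 -16, and asks how many standard errors the score sits from 50%. Note that this tally is White's result over all 150 games, not any engine's result, and the helper `score_z` is a made-up name; independent games and a normal approximation are assumed.

```python
import math

def score_z(wins, draws, losses):
    """Return (score, standard error, z against an even 50% score) for a W/D/L tally."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - score ** 2   # per-game variance of the score
    se = math.sqrt(var / n)
    return score, se, (score - 0.5) / se

# White's totals from the SCID report above: +37 =97 -16
score, se, z = score_z(37, 97, 16)
print(f"score={score:.3f} se={se:.3f} z={z:.2f}")
```

A high draw rate shrinks the per-game variance, which is part of why 150 long games can carry more information than the raw count suggests.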
-
- Posts: 6257
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: 3Champs reloaded
ouachita wrote:
Milos wrote: You know that correlation is a measurable quantity?
Please show me here how you would demonstrate or where anyone has demonstrated any scientific statistical relationship between these two sets of random results/data: 1+1, 90+2, or any data even remotely comparable. You are free to use any statistical or mathematical expression you chose.
And I invite RH, LK and MC to do the same.

There is obviously some correlation between blitz and slow ratings; even a casual glance at the rating lists shows this. If the only information you have is that engine A is stronger than engine B at blitz, then you should bet on engine A at slow chess.
However, results at the actual time limit of interest should be the ones that count, provided that the difference between the slow and fast tests is statistically meaningful. So, for example, if one looks at the relative ratings of Komodo TCEC and Houdini 4 on Acer's 90' + 30" rating list (600 game minimum) and at the same difference on any of the blitz lists, the difference in their relative ratings is huge and surely significant beyond 99% (I leave it to the mathematicians to confirm this). So in this case the blitz ratings are almost totally irrelevant to 90 minute + increment strength.
Note that the above is totally independent of the contempt argument. If contempt were set to zero for both Komodo and Houdini in Aser's tests, I would guess it would raise Houdini by ten points or so and Komodo by five points. Houdini 4 would still be well back in third place, and Komodo would be very close to Stockfish DD.
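One way the "significant beyond 99%" check could be set up is a two-sample z-test on the rating gap between the same pair of engines as measured on two lists. The sketch below assumes the two lists are independent, that each gap comes with a 95% error margin (a half-width of 1.96 standard errors), and that a normal approximation holds. All numbers are placeholders and the function names are invented for illustration; nothing here is taken from a real rating list.

```python
import math

def gap_difference_z(gap_a, margin_a, gap_b, margin_b, z_margin=1.96):
    """z statistic for the difference between two independent rating gaps.

    Margins are assumed to be 95% half-widths, i.e. z_margin standard errors.
    """
    se_a = margin_a / z_margin
    se_b = margin_b / z_margin
    return (gap_a - gap_b) / math.sqrt(se_a ** 2 + se_b ** 2)

def two_sided_confidence(z):
    """Confidence that the difference is nonzero under a normal approximation."""
    return math.erf(abs(z) / math.sqrt(2.0))

# Placeholder numbers only: a +30 gap (margin 20) on one list, -20 (margin 25) on the other
z = gap_difference_z(gap_a=30, margin_a=20, gap_b=-20, margin_b=25)
print(round(z, 2), round(two_sided_confidence(z), 4))
```

With real data, the margin of a gap between two engines on the same list is not simply the two published margins combined, because ratings on one list are correlated, so the result should be read as a rough guide only.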
-
- Posts: 454
- Joined: Tue Jan 15, 2013 4:33 pm
- Location: Ritz-Carlton, NYC
- Full name: Bobby Johnson
Re: 3Champs reloaded
lkaufman wrote: in this case the blitz ratings are almost totally irrelevant to 90 minute + increment strength.

I believe that you are most likely correct, and I would hypothesize that you are most likely correct. Truth is, we do not and may never know with any measure of certainty the extent to which blitz (STC) test results are relevant to 90+ or 40/120+ or 40/240 or . . . 40/4000, etc. results. Perhaps guessing will have to suffice.
SIM, PhD, MBA, PE