Martin on the SF loss on time

Norm Pollock · Post by **Norm Pollock** » Wed Oct 21, 2015 10:16 pm


Welcome to the WBEC Ridderkerk homepages.

The WBEC logo is created by Wilhelm Hudetz!


For the Winboard Engine Info page use the "Engine-Info" link above!

The rules section below is

The 2nd edition of WBEC Ridderkerk:
The 3rd and 4th Division are finished.
The 2nd Division was started on 6 April 2002.
The 1st Division will start as soon as the 2nd is finished.

The 3rd edition of WBEC Ridderkerk:
The 5th Division is finished.
The 4th Division was started on 18 May 2002.
The 3rd Division will start as soon as the 4th is finished.

All crosstables and games (download) of the first edition can be found at the “History” page.

Tournament rules:
=================
The AmCom, 1st and 2nd Division will be played at my Dual P3-1266 computer, 
max memory usage for each engine: 128 MB (hashtables) + 8 MB egtb cache.
---
The 3rd, 4th and 5th Division will be played at my Dual P3-933 computer, 
max memory usage for each engine: 90 MB (hashtables) + 8 MB egtb cache.
---
All games will be played at 40 moves/40 minutes, Ponder=ON, all 3-, 4- and 5-man egtbs available.
Authors are free to send me their favorite settings as long as they do not exceed the above max memory usage!
P.S. I use Windows 2000 and Winboard 4.2.6, at least until there is a newer version.
Tournament manager: WBTM by Jori Ostrovskij.


Division rules:
===============
FreeCo Division: Will be a 4 or 6 RR tournament.
Participants: All commercial WB engines I own, all “private” WB engines I get from
their authors and the best 3 of the 1st Division (can be more, to bring the number of participants to at least 12).


1st Division: Will be a 4 RR tournament.
16 engines + the best 4 of Division 2. (total: 20 participants)
Engines finishing 1st, 2nd and 3rd (maybe more) will play in the next FreeCo Division!
Rank 17, 18, 19 and 20 will go down to the 2nd Division.


2nd Division: Will be a 4 RR tournament.
16 engines + the best 4 of Division 3. (total: 20 participants)
Engines finishing in the top 4 play in the next 1st Division tournament (which starts
immediately after the 2nd Division is finished).
Rank 16,17,18,19 and 20 will go down to the 3rd Division.


3rd Division: Will be a 4 RR tournament.
Participants: 14 engines + the best 6 of the 4th Division. (total 20 engines)
Engines finishing in the top 5 play in the next 2nd Division tournament. (which
starts immediately after the FreeCo Division is finished)
Rank 15,16,17,18,19 and 20 will go down to the 4th Division.


4th Division: Will be a 2 RR tournament.
Participants: 18 engines + the best 8 of the 5th Division.
Engines finishing in the top 6 will play in the 3rd Division
tournament (which starts immediately after the 4th Division is finished)!
Rank 17, 18, 19, 20, 21, 22, 23 and 24 will go down to the 5th Division.


5th Division: Will be a 2 RR tournament.
Participants: 12 engines + the best 12 of the 6th Division.
Engines finishing in the top 8 will play in the 4th Division
tournament (which starts immediately after the 5th Division is finished)!
The engines that finish at rank 13 and lower will go back to the 6th Division.


6th Division: Will be a Swiss tournament.
If there are fewer than 40 participants it will be a 1 RR; with 40 or more it will be a 20-round Swiss.
Participants: all free engines that comply with the tournament rules and are not in a higher Division!
Engines finishing in the top 12 will play in the 5th Division tournament (which starts immediately after the
6th Division is finished)!


Rules for engines to participate:
=================================
1. Compatibility:
The engine must play under winboard without problems!

2. Stability:
A] The engine should play a (test) series of 36 games with no more than one hangup...(on my computer :-) )



An engine in the "Test" tournament that:
A] 'hangs' in the background 2 or more times after a game is finished,
B] or crashes (or plays illegal moves) in more than 2 games
 will not start in the 'official' tournament!! (unless there is a bugfix, of course)
C] If an engine loses several games on time in the test tournament (40/5), I will test it first at 40/40 before I
make a decision!


When the 'official' Division tournament is started:
Each engine that 'hangs' 2x during the first half of the tournament will be taken out immediately; if it happens
3x or more in the whole tournament it will be taken out after the Division is finished!

Each engine that loses 4 games in the first half of a Division tournament because of 'lost on time, illegal moves
or crashes' will be taken out immediately!
Each engine that loses 6 games in the whole Division tournament because of 'lost on time, illegal moves
or crashes' will be taken out after the Division is finished!
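
The removal thresholds above can be sketched as a small decision function. This is only an illustration: the names `EngineRecord` and `removal_decision` are hypothetical, and it assumes "2x" and "4 games" mean "2 or more" and "4 or more".

```python
from dataclasses import dataclass


@dataclass
class EngineRecord:
    """Per-engine counters for one Division tournament (hypothetical structure)."""
    hangs_first_half: int = 0
    hangs_total: int = 0
    # Games lost on time, by illegal move, or by crash.
    forfeits_first_half: int = 0
    forfeits_total: int = 0


def removal_decision(rec: EngineRecord) -> str:
    """Return 'immediate', 'after_division', or 'stay'.

    Assumption: the thresholds are read as "at least N", which the
    rules text does not state explicitly.
    """
    if rec.hangs_first_half >= 2 or rec.forfeits_first_half >= 4:
        return "immediate"
    if rec.hangs_total >= 3 or rec.forfeits_total >= 6:
        return "after_division"
    return "stay"


print(removal_decision(EngineRecord(forfeits_first_half=1, forfeits_total=6)))
```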

Once a Division has started, 'bugfixes' are allowed until the engine has started its 4th round games (4RR)
or its 5th round games in a 2RR or Swiss (Qualify).

3. Time control: 
The engine must be capable of the usual time controls, like 40 in 40' - losing more than 9% on time would cause
me to take it out of the WBEC Ridderkerk... (as soon as a fixed version is released it will return to the lowest Division).
 
4. Draw claim: (force draw result to Winboard)
The engines should claim a draw due to the 50-move rule or 3-fold repetition...
insufficient material is not a must - anyway I would like to see it... :)

5. It must be freely available, at least in the future. (Exception: AmCom Division)

6. Only 1 engine per author allowed.

7. In the AmCom part, every "not free" WB engine I have will play (IF it supports rules 1-4 above!),
if it is at (let's say) 1st Division strength.
All authors of a "not free" WB engine who would like to see their engine playing in the AmCom division,
feel free to send me a message or a (beta) version of the engine to test.
(it will NOT be sent to others by me!)

8. All new freely available engines that comply with the tournament rules will start in the lowest Division.
(the first start after I get the engine)


Game rules:
===========
1. Lost on time - game is lost - exception: Opponent has insufficient material to mate - then draw.

2. wrong mate claiming - game is lost - same exception as 1.

3. wrong draw claiming - game is lost - same exception as 1.

4. illegal move - game is lost - same exception as 1.

5. crash - game is lost - same exception as 1.
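
The five game rules share one shape: the offending side loses unless the opponent has insufficient material to mate, in which case the game is drawn. A minimal sketch of that logic follows; the event names and the `adjudicate` function are hypothetical, not taken from WBTM or Winboard.

```python
# Events that forfeit the game under game rules 1-5 (labels are illustrative).
FORFEIT_EVENTS = {
    "time",              # rule 1: lost on time
    "false_mate_claim",  # rule 2: wrong mate claim
    "false_draw_claim",  # rule 3: wrong draw claim
    "illegal_move",      # rule 4
    "crash",             # rule 5
}


def adjudicate(event: str, opponent_can_mate: bool) -> str:
    """Result for the offending side: 'loss', or 'draw' under the shared exception."""
    if event not in FORFEIT_EVENTS:
        raise ValueError(f"not a forfeit event: {event!r}")
    # Same exception for every rule: no mating material for the opponent means a draw.
    return "loss" if opponent_can_mate else "draw"


print(adjudicate("time", opponent_can_mate=False))
```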

Norm Pollock · Post by **Norm Pollock** » Wed Oct 21, 2015 10:19 pm

Jesse Gersenson wrote:
Norm Pollock wrote:Makes me appreciate how great were Leo's WBEC tournaments. Unfortunately the details of his rules are no longer available online. But based on my memory:

He had a mini pre-tournament to weed out engines that didn't work properly.

He allowed replacement engines early on only. The replacement was usually a prior version that he already had.

He used standard equipment that developers would have, not some super 20 core unique machine that at best only professional developers would have.

He also allowed using "permanent brain" aka "ponder". Controversial but one good thing is that it helps engines in time trouble.

And of course his tournament involved a lot more than 22 engines. Over 100 for sure.
These rules Norm?
https://web.archive.org/web/20020613073 ... /INFO.html

Great find, Jesse! Thanx!

bob · Post by **bob** » Thu Oct 22, 2015 1:02 am

Michel wrote:Currently there are good tests (lazy_smp) going on to check how the normalized elo/sigma scales with TC

http://tests.stockfishchess.org/tests/v ... 5a47758469

ELO: 25.49 +-8.8 (95%) LOS: 100.0%
Total: 1311 W: 194 L: 98 D: 1019

TC=120+0.1

http://tests.stockfishchess.org/tests/v ... 0be39e1b39

ELO: 44.75 +-7.6 (95%) LOS: 100.0%
Total: 2069 W: 407 L: 142 D: 1520

TC=60+0.1

The first test isn't finished and the second was stopped. But let's ignore this.

Since fishtest uses 95% twosided confidence intervals we find for the normalized elo/sigma

Long TC: 25.49/(8.8/1.96)/(1311**.5)=0.157
Short TC: 44.75/(7.6/1.96)/(2069**.5)=0.254

(the exact computation using the trinomial distribution is very close).

One checks that the difference is very significant. So for this particular patch (assuming that the numbers don't change too drastically) testing resolution does indeed go down with longer TC.
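
The arithmetic quoted above can be reproduced with a few lines of Python. This is a sketch of the same formula only; the function name `normalized_elo` is an assumption, and it uses the normal approximation (95% two-sided interval = 1.96 standard errors), not the exact trinomial computation Michel mentions.

```python
import math


def normalized_elo(elo: float, ci95: float, games: int) -> float:
    """Elo divided by its per-game standard deviation.

    ci95 is the half-width of the 95% two-sided interval reported by
    fishtest, so ci95 / 1.96 is the standard error of the Elo estimate.
    """
    standard_error = ci95 / 1.96
    return elo / standard_error / math.sqrt(games)


print(round(normalized_elo(25.49, 8.8, 1311), 3))  # long TC  -> 0.157
print(round(normalized_elo(44.75, 7.6, 2069), 3))  # short TC -> 0.254
```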
Laskos wrote:I tried to investigate a bit more and to incorporate a draw model (assuming logistic ELO), but it seems the TC scaling has little to do with the usual Bayeselo and Davidson draw models. Then I took the "fastgm" website results at 10 minutes per game and 1 minute per game, for example Stockfish 6 performance looks like that:
60s + 0.6s
    Stockfish 6 64     3369      :   2250 (+1195, =859,  -196),  72.2 %

    vs.                          :  games (     +,    =,    -),   (%) :   Diff
    Komodo 9 64-bit              :    250 (    79,  118,   53),  55.2 :    +16
    Houdini 4 x64                :    250 (    98,  116,   36),  62.4 :    +66
    Gull 3 x64                   :    250 (   117,  102,   31),  67.2 :   +150
    Fire 4 x64                   :    250 (   128,  106,   16),  72.4 :   +163
    Equinox 3.30 x64mp           :    250 (   138,   93,   19),  73.8 :   +192
    Critter 1.6a 64-bit          :    250 (   127,  104,   19),  71.6 :   +192
    Bouquet 1.8 x64              :    250 (   155,   88,    7),  79.6 :   +218
    Deep Rybka 4.1 x64           :    250 (   171,   70,    9),  82.4 :   +263
    Hannibal 1.5 x64             :    250 (   182,   62,    6),  85.2 :   +369


600s + 6s
    Stockfish 6   3130 :   2700 (+1163,=1390,-147),  68.8 %

    vs.                :  games (    +,    =,   -),   (%) :   Diff
    Komodo 9           :    300 (   43,  210,  47),  49.3 :     -3
    Houdini 4          :    300 (  111,  154,  35),  62.7 :    +73
    Gull 3             :    300 (  109,  176,  15),  65.7 :   +115
    Fire 4             :    300 (  108,  179,  13),  65.8 :   +122
    Equinox 3.30       :    300 (  134,  160,   6),  71.3 :   +163
    Critter 1.6a       :    300 (  145,  147,   8),  72.8 :   +174
    Bouquet 1.8        :    300 (  167,  128,   5),  77.0 :   +201
    Rybka 4.1          :    300 (  174,  111,  15),  76.5 :   +208
    Hannibal 1.5       :    300 (  172,  125,   3),  78.2 :   +249
A global result for this data is:

60s + 0.6s
Stockfish 6 : 2250 (+1195, =859,-196), 72.2 %

d=0.382
w/l=6.10

w-l=0.444
sigma=0.6489
(w-l)/sigma = 0.684

600s + 6s
Stockfish 6 : 2700 (+1163,=1390,-147), 68.8 %

d=0.515
w/l=7.91

w-l=0.376
sigma=0.5863
(w-l)/sigma = 0.641

Resolution decreases with TC. Win/loss increases indeed with TC, but not enough to offset the diminishing (win-loss), which decreases more than sigma.
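
The global figures above can be recomputed from the raw win/draw/loss counts. The sketch below assumes that sigma is the per-game standard deviation of a score of +1 for a win, 0 for a draw and -1 for a loss; that assumption reproduces the quoted values to about three decimals, but the post does not spell out the formula. The function name `resolution` is hypothetical.

```python
import math


def resolution(wins: int, draws: int, losses: int) -> dict:
    """Draw rate, win/loss ratio, (w-l), sigma and (w-l)/sigma from raw counts."""
    n = wins + draws + losses
    w, d, l = wins / n, draws / n, losses / n
    diff = w - l
    # Variance of a +1/0/-1 outcome: E[X^2] - E[X]^2 = (w + l) - (w - l)^2.
    sigma = math.sqrt(w + l - diff ** 2)
    return {
        "d": d,
        "w/l": wins / losses,
        "w-l": diff,
        "sigma": sigma,
        "(w-l)/sigma": diff / sigma,
    }


short_tc = resolution(1195, 859, 196)   # 60s + 0.6s
long_tc = resolution(1163, 1390, 147)   # 600s + 6s
for name, r in (("60s + 0.6s", short_tc), ("600s + 6s", long_tc)):
    print(name, {k: round(v, 3) for k, v in r.items()})
```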

This data I would call "large strength difference" decrease in resolution with TC, and it is very well fitted by the model win/loss ~ C/(1-draw) for the value of C between 3 and 4. The plot of resolution versus draw rate looks like that:

If one picks only close in strength results from database, the resolution increases with TC, and now we have "small strength difference" increase in resolution with TC. It is described by the same win/loss ~ C/(1-draw), but now C is around 1:

I think that Miguel's wilos are better suited to describe these results than our usual elos.

For the record, those time controls are WAY out of whack. Larry (Kaufman) discussed this issue a while back. Typical times for human games look something like 5m+3s or 60m + 1m, or something in that range where the base time is about 60x the increment. 120 + 0.1 is out of whack, since the base time is 1200x the increment. That can lead to potential clock mismanagement. If the increment is too large, the base time becomes unimportant; if the increment is too small, the increment becomes unimportant.
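
For reference, the base-to-increment ratios mentioned above work out as follows. This is plain arithmetic on the time controls named in the post; the helper name is hypothetical.

```python
def base_to_increment_ratio(base_seconds: float, increment_seconds: float) -> float:
    """How many increments fit into the base time."""
    return base_seconds / increment_seconds


# (label, base in seconds, increment in seconds)
controls = [
    ("5m + 3s", 5 * 60, 3),
    ("60m + 1m", 60 * 60, 60),
    ("60 + 0.1", 60, 0.1),
    ("120 + 0.1", 120, 0.1),
]
for label, base, inc in controls:
    print(f"{label:>10}: base is about {base_to_increment_ratio(base, inc):.0f}x the increment")
```

The two human-style controls come out at 100x and 60x, against 600x and 1200x for the two fishtest controls.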

thekingman · Post by **thekingman** » Mon Oct 26, 2015 8:13 pm

Stockfish just lost on time again. And this is old Stockfish, confirming what Team SF had previously said: the bug has been in the code for a very long time, well before lazy-SMP, and for whatever reason, only seems to come out on the TCEC hardware. It seems even with the old Stockfish, something about this setup causes it to occasionally come out.

The good news is that a patch was made to fix this weeks ago, and Stockfish is still almost guaranteed to make the superfinal, where it will finally be allowed to implement the fix.

bnemias · Post by **bnemias** » Mon Oct 26, 2015 8:27 pm

thekingman wrote:...confirming what Team SF had previously said: the bug has been in the code for a very long time

Until it is understood what is going on, it's difficult to classify as a "bug" or anything else. It could well be related to the TCEC hardware for all anyone knows. Might want to revisit any other time losses by other engines this season if any. It'd be nice to know what happened and why before progressing, but logistics probably make that impossible.

In any event, I'm glad I don't have to decide what to do.

thekingman · Post by **thekingman** » Mon Oct 26, 2015 8:38 pm

bnemias wrote:
thekingman wrote:...confirming what Team SF had previously said: the bug has been in the code for a very long time
Until it is understood what is going on, it's difficult to classify as a "bug" or anything else. It could well be related to the TCEC hardware for all anyone knows. Might want to revisit any other time losses by other engines this season if any. It'd be nice to know what happened and why before progressing, but logistics probably make that impossible.

In any event, I'm glad I don't have to decide what to do.

That is very true. It never loses on time on other machines, so it is clearly some kind of interaction between the hardware and software. Difficult to know exactly what to call that.

I just checked all possible tiebreaks. Even if Gull wins its remaining games and Stockfish loses its remaining games, there is no combination of other results in which Stockfish doesn't win the tiebreak. However much drama this adds, the final result is not going to change. Team SF came up with a solution weeks ago, so after it is finally allowed to be implemented for the superfinal, it is not going to be a problem anymore.

zullil · Post by **zullil** » Mon Oct 26, 2015 9:30 pm

thekingman wrote:
bnemias wrote:
thekingman wrote:...confirming what Team SF had previously said: the bug has been in the code for a very long time
Until it is understood what is going on, it's difficult to classify as a "bug" or anything else. It could well be related to the TCEC hardware for all anyone knows. Might want to revisit any other time losses by other engines this season if any. It'd be nice to know what happened and why before progressing, but logistics probably make that impossible.

In any event, I'm glad I don't have to decide what to do.
That is very true. It never loses on time on other machines, so it is clearly some kind of interaction between the hardware and software. Difficult to know exactly what to call that.

Likely OS-related. Tried and failed to reproduce on similar hardware with Linux. Since I can't seem to link to the Fishcooking forum, I'll quote myself:

I was asked by someone here to attempt to reproduce a time-limit overstep on a 20-core machine. Note: I use Linux, not Windows.

I downloaded the source and used gcc-5 and "make build ARCH=x86-64-bmi2" to compile Stockfish-ee208868ee9764e2d43ca2edc92e666ce5eb330f.

Rather than playing a lot of games, I decided to reproduce the setting in which the second time loss occurred. I used 20 threads, a 32 GB hash and 5-man tables. I did ten searches, restarting the engine after each. Each time I sent two commands to the engine, one after the other:

position startpos moves c2c4 g8f6 g1f3 g7g6 g2g3 f8g7 f1g2 e8g8 e1g1 d7d6 b1c3 b8c6 d2d4 e7e5 d4e5 d6e5 c1g5 c8e6 d1c1 d8c8 f1d1 f8d8 f3d2 h7h6 g5f6 g7f6 d2e4 d8d1 c1d1 f6e7 c3d5 e7f8 d1d2 f7f5 e4f6 g8h8 e2e4 c8d8 a1e1 e6f7 e4f5 g6f5 g3g4 f7d5 f6d5 f5g4 d2e2 d8d7 d5f6 d7g7 f6h5 c6d4 e2e4 g7g5 e4b7 a8e8 h5g3 g5d2 e1f1 d2a5 b7e4 a5a6 e4g4 a6c4 g4h5 e8e7 b2b3 c4f7 h5d1 f7f4 d1d3 e7e8 g1h1 e8e6 d3c4 e6e7 c4a4 c7c5 a4a6 f4f7 f1e1 e7e6 a6c4 e6f6 g2d5 f7e7 e1g1 f6g6 d5e4 g6g7 c4d5 e7e8 e4g2 g7f7 g1f1 d4b5 h1g1 b5c3 d5d2 c3b5 d2e3 b5d4 f2f4 f8g7 f4f5 e8e7 g1h1 f7f8 e3c1 h8h7 g2d5 h7h8 f1f2 e7d6 d5e4 f8g8 e4g2 g7f8 c1c4 d6f6 g3e4 f6g7 f5f6 g7g4 h2h3 g4d1 h1h2 d1e1 c4d5 d4e2 f2f1 g8g2 h2g2 e2f4 f1f4 e5f4 d5d3 e1a5 a2a4 a5c7 d3d5 c7c8 d5f7 c8d8 a4a5 f4f3 g2f3 d8d3 f3f4 f8d6 e4d6 d3d6 f4f5 d6d3

go wtime 32607 btime 443426 winc 30000 binc 30000

Not once did the engine come close to over-stepping the limit. In fact, given the small variance in the data, I can't see how this binary on my hardware would ever overstep. What happened with the Windows binary on the TCEC server I cannot know. Here's the data. I can't see how further testing by me could help---but please let me know.

info nodes 2252743071 time 32531
info nodes 2170858978 time 32529
info nodes 2193849215 time 32531
info nodes 2424568304 time 32529
info nodes 2497563424 time 32531
info nodes 2118002176 time 32531
info nodes 2164228243 time 32528
info nodes 2276720043 time 32530
info nodes 2315695483 time 32528
info nodes 2312409490 time 32532
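
A short summary of the ten search times listed above, compared with the 32607 ms of white clock time passed in the `go` command. This is only a sketch over the numbers in the post; the variable names are made up for the example.

```python
import statistics

WTIME_MS = 32607  # white's remaining time from the "go" command above

# "time" values (ms) from the ten "info nodes ... time ..." lines.
times_ms = [32531, 32529, 32531, 32529, 32531,
            32531, 32528, 32530, 32528, 32532]

mean_ms = statistics.mean(times_ms)
spread_ms = max(times_ms) - min(times_ms)
stdev_ms = statistics.pstdev(times_ms)   # population standard deviation
margin_ms = WTIME_MS - max(times_ms)     # slack left in the slowest run

print(f"mean {mean_ms} ms, spread {spread_ms} ms, "
      f"stdev {stdev_ms:.2f} ms, worst-case margin {margin_ms} ms")
```

The searches differ by only a few milliseconds, and the slowest one still finishes 75 ms before the clock would run out.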

Norm Pollock · Post by **Norm Pollock** » Tue Oct 27, 2015 12:08 am

Nobody can reproduce the time bug on other machines? I don't get that.

Can't they just reduce the safety/overhead value within SF that forces a move with time running out? Reduce it just to the point where a time forfeit occurs on their 20CPU testing machine. Then they can compare the overhead differences between the two types of SMP.

Statistically lazy SMP had 2 time forfeits in 5 matches, regular SMP 1 time forfeit in about 56 matches.
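
To put a number on "statistically": a one-sided Fisher exact test on these counts can be written with the standard library alone. This is a sketch under loud assumptions: "matches" is read as games, "about 56" is taken as exactly 56, and the forfeits are treated as independent events. With so few games the result should be read cautiously.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(first-row events >= a) for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups, columns are (forfeit, no forfeit).
    Uses the hypergeometric distribution with all margins fixed.
    """
    row1 = a + b          # games in the first group
    col1 = a + c          # total forfeits
    n = a + b + c + d     # total games
    favourable = sum(
        comb(col1, x) * comb(n - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return favourable / comb(n, row1)


# lazy SMP: 2 forfeits in 5 games; regular SMP: 1 forfeit in 56 games (assumed exact).
p_value = fisher_one_sided(2, 3, 1, 55)
print(f"one-sided p = {p_value:.4f}")
```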

Dann Corbit · Post by **Dann Corbit** » Tue Oct 27, 2015 12:51 am

Norm Pollock wrote:Nobody can reproduce the time bug on other machines? I don't get that.

Can't they just reduce the safety/overhead value within SF that forces a move with time running out? Reduce it just to the point where a time forfeit occurs on their 20CPU testing machine. Then they can compare the overhead differences between the two types of SMP.

Statistically lazy SMP had 2 time forfeits in 5 matches, regular SMP 1 time forfeit in about 56 matches.

It just shows that the initial analysis by the Stockfish team was correct. They claimed that the timeout defect was also present in the YBW threading code.

It seems that the threading overhead (starting and/or ending the search) for LazySMP is higher than for YBW.

At any rate, so far as I can see, there is no correction in the code for the timeout problem. There is only code for a new threading model.

ernest · Post by **ernest** » Tue Oct 27, 2015 1:34 am

thekingman wrote:The good news is that a patch was made to fix this weeks ago, and Stockfish is still almost guaranteed to make the superfinal, where it will finally be allowed to implement the fix.

Where is that fix?
In http://abrok.eu/stockfish/ I cannot find it!

And apparently TCEC ruled that Stockfish could only present a version found in http://abrok.eu/stockfish/
