Martin on the SF loss on time

Uri Blass · Post by **Uri Blass** » Thu Oct 15, 2015 5:46 am

In my opinion it is better to decide later whether to use lazy SMP.

I suggest using lazy SMP in the final only if there is some evidence supporting the conjecture that lazy SMP is better at long time control.

In other words, use lazy SMP only if you see it score more than 50% against master at long time control; otherwise do not use it.

So far I have not seen more than 50%, and the only long time control match that I know of is the following one, in which lazy SMP got one loss and many draws. Unfortunately the starting positions of this match were not good and were too drawish. IMO it is better to use an opening like 1.e4 h5, where it is clear that white has an advantage but not clear whether white wins, so hopefully we will get 1.5:0.5 in part of the matches.

http://talkchess.com/forum/viewtopic.ph ... 1&start=10

Michel · Post by **Michel** » Thu Oct 15, 2015 7:22 am

elo differences generally shrink massively between 60" games and TCEC,

Do you mean elo difference, as measured by score? That is an incorrect measure. You have to compare apples with apples.

The correct measure is "resolution". I.e. elo difference (as measured by score) divided by error bars (mainly controlled by the draw ratio).

I am quite interested to know if resolution indeed goes down with TC.

Laskos · Post by **Laskos** » Thu Oct 15, 2015 8:57 am

Michel wrote:

I am quite interested to know if resolution indeed goes down with TC.

It's a good question, and the answer again seems to be empirical. A couple of years ago I ran a test with Houdini, and the win/loss ratio seemed either fairly constant with TC (at longer TC) or mildly increasing. For the Komodo-SF matches at TCEC, there was an apparent increase in win/loss compared to shorter TC. I quickly picked two parameters which seem reasonable to me:

draw_ratio -- monotonic with TC
win/loss

Then:

sigma = sqrt(1 - draw_ratio) -- for small difference between win and loss;
win - loss = (win/loss - 1 + draw_ratio - draw_ratio*win/loss)/(1 + win/loss)

1/ Constant win/loss (some empirical data suggest it)

(win-loss)/sigma:

2/ win/loss increases as 1/sqrt(1-draw_ratio) (some other empirical data suggest it, especially Komodo-SF matches)

(win-loss)/sigma:

To establish a more general rule, one has to take some database with engines at different TC, but I bet it will vary on engine and TC chosen.
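The two cases above can be tabulated directly from the formulas in this post. The following is a minimal sketch, not the original calculation: it uses the small-difference approximation sigma = sqrt(1 - draw_ratio), and the starting win/loss ratio `R0 = 1.2` is a hypothetical value chosen only for illustration.

```python
import math

def win_minus_loss(draw_ratio, win_loss):
    # Follows from win + loss = 1 - draw_ratio and win = win_loss * loss;
    # algebraically the same as the expanded expression in the post.
    return (win_loss - 1.0) * (1.0 - draw_ratio) / (1.0 + win_loss)

def resolution(draw_ratio, win_loss):
    # Small-difference approximation for sigma, as stated in the post.
    sigma = math.sqrt(1.0 - draw_ratio)
    return win_minus_loss(draw_ratio, win_loss) / sigma

R0 = 1.2  # hypothetical win/loss ratio at zero draws (assumption, not thread data)

def case_constant(draw_ratio):
    # Case 1: win/loss stays constant as the draw ratio grows.
    return resolution(draw_ratio, R0)

def case_growing(draw_ratio):
    # Case 2: win/loss grows as 1/sqrt(1 - draw_ratio).
    return resolution(draw_ratio, R0 / math.sqrt(1.0 - draw_ratio))

for d in (0.2, 0.4, 0.6, 0.8):
    print(f"draw={d:.1f}  constant={case_constant(d):.4f}  growing={case_growing(d):.4f}")
```

With these assumed values, (win-loss)/sigma falls with the draw ratio in case 1 and rises in case 2 over the range shown; a different `R0` changes the numbers and may change where the second curve turns over.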

syzygy · Post by **syzygy** » Thu Oct 15, 2015 10:09 pm

lucasart wrote:I think SF should go with the master branch, for the final. Lazy SMP is stronger than master, but in 3h games the difference should be small:
* We do not have statistically reliable data to know the elo gain at 3h games, and we never will (because statistically reliable means tens of thousand of games, which we cannot do at this tc).

If the Lazy SMP version scales well, the Elo difference on many cores should be large enough that a reasonable number of games would show it.

Laskos · Post by **Laskos** » Fri Oct 16, 2015 7:40 am

I tried to investigate a bit more and to incorporate a draw model (assuming logistic Elo), but it seems the TC scaling has little to do with the usual Bayeselo and Davidson draw models. Then I took the "fastgm" website results at 10 minutes per game and 1 minute per game. For example, the Stockfish 6 performance looks like this:

60s + 0.6s
    Stockfish 6 64     3369      :   2250 (+1195, =859,  -196),  72.2 %

    vs.                          :  games (     +,    =,    -),   (%) :   Diff
    Komodo 9 64-bit              :    250 (    79,  118,   53),  55.2 :    +16
    Houdini 4 x64                :    250 (    98,  116,   36),  62.4 :    +66
    Gull 3 x64                   :    250 (   117,  102,   31),  67.2 :   +150
    Fire 4 x64                   :    250 (   128,  106,   16),  72.4 :   +163
    Equinox 3.30 x64mp           :    250 (   138,   93,   19),  73.8 :   +192
    Critter 1.6a 64-bit          :    250 (   127,  104,   19),  71.6 :   +192
    Bouquet 1.8 x64              :    250 (   155,   88,    7),  79.6 :   +218
    Deep Rybka 4.1 x64           :    250 (   171,   70,    9),  82.4 :   +263
    Hannibal 1.5 x64             :    250 (   182,   62,    6),  85.2 :   +369


600s + 6s
    Stockfish 6   3130 :   2700 (+1163,=1390,-147),  68.8 %

    vs.                :  games (    +,    =,   -),   (%) :   Diff
    Komodo 9           :    300 (   43,  210,  47),  49.3 :     -3
    Houdini 4          :    300 (  111,  154,  35),  62.7 :    +73
    Gull 3             :    300 (  109,  176,  15),  65.7 :   +115
    Fire 4             :    300 (  108,  179,  13),  65.8 :   +122
    Equinox 3.30       :    300 (  134,  160,   6),  71.3 :   +163
    Critter 1.6a       :    300 (  145,  147,   8),  72.8 :   +174
    Bouquet 1.8        :    300 (  167,  128,   5),  77.0 :   +201
    Rybka 4.1          :    300 (  174,  111,  15),  76.5 :   +208
    Hannibal 1.5       :    300 (  172,  125,   3),  78.2 :   +249

A global result for this data is:

60s + 0.6s
Stockfish 6 : 2250 (+1195, =859,-196), 72.2 %

d=0.382
w/l=6.10

w-l=0.444
sigma=0.6489
(w-l)/sigma = 0.684

600s + 6s
Stockfish 6 : 2700 (+1163,=1390,-147), 68.8 %

d=0.515
w/l=7.91

w-l=0.376
sigma=0.5863
(w-l)/sigma = 0.641

Resolution decreases with TC. Win/loss increases indeed with TC, but not enough to offset the diminishing (win-loss), which decreases more than sigma.
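The global figures above can be recomputed from the raw game counts. This is a small sketch rather than the script actually used: the sigma values quoted for this data agree, to within rounding, with the full per-game expression sqrt((1 - d) - (w - l)^2) rather than the small-difference approximation sqrt(1 - d), so the sketch assumes that expression. The function name is made up for illustration.

```python
import math

def resolution_from_counts(wins, draws, losses):
    """Return (d, w/l, w-l, sigma, (w-l)/sigma) from raw game counts."""
    n = wins + draws + losses
    w, d, l = wins / n, draws / n, losses / n
    diff = w - l
    # Per-game standard deviation of the (+1, 0, -1) result variable.
    sigma = math.sqrt((1.0 - d) - diff * diff)
    return d, wins / losses, diff, sigma, diff / sigma

for label, counts in (("60s + 0.6s", (1195, 859, 196)),
                      ("600s + 6s", (1163, 1390, 147))):
    d, ratio, diff, sigma, res = resolution_from_counts(*counts)
    print(f"{label}: d={d:.3f} w/l={ratio:.2f} w-l={diff:.3f} "
          f"sigma={sigma:.4f} (w-l)/sigma={res:.3f}")
```

The same ratio can be obtained from the score: with s = w + d/2, (s - 0.5) divided by the per-game standard deviation of the score equals (w - l)/sigma, since both numerator and denominator are halved.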

I would call this data a "large strength difference" decrease in resolution with TC, and it is very well fitted by the model win/loss ~ C/(1-draw) with C between 3 and 4. The plot of resolution versus draw rate looks like this:

If one picks only results between engines close in strength from the database, the resolution increases with TC, and now we have a "small strength difference" increase in resolution with TC. It is described by the same win/loss ~ C/(1-draw), but now C is around 1:

I think that Miguel's wilos are better suited to describe these results than our usual elos.

bob · Post by **bob** » Fri Oct 16, 2015 8:28 pm

syzygy wrote:
lucasart wrote:I think SF should go with the master branch, for the final. Lazy SMP is stronger than master, but in 3h games the difference should be small:
* We do not have statistically reliable data to know the elo gain at 3h games, and we never will (because statistically reliable means tens of thousand of games, which we cannot do at this tc).
If the Lazy SMP version scales well, the Elo difference on many cores should be large enough that a reasonable number of games would show it.

It is not clear what "scales" means in this context. I have seen references to BOTH higher NPS, AND longer time-to-depth. If I had time, I'd run a few tests on it to see what it does on my 20 or 24 core boxes... But that's time away from working on my code, which is not exactly a good use of time.

Uri Blass · Post by **Uri Blass** » Sat Oct 17, 2015 3:43 am

bob wrote:
syzygy wrote:
lucasart wrote:I think SF should go with the master branch, for the final. Lazy SMP is stronger than master, but in 3h games the difference should be small:
* We do not have statistically reliable data to know the elo gain at 3h games, and we never will (because statistically reliable means tens of thousand of games, which we cannot do at this tc).
If the Lazy SMP version scales well, the Elo difference on many cores should be large enough that a reasonable number of games would show it.
It is not clear what "scales" means in this context. I have seen references to BOTH higher NPS, AND longer time-to-depth. If I had time, I'd run a few tests on it to see what it does on my 20 or 24 core boxes... But that's time away from working on my code, which is not exactly a good use of time.

"Scales well" means performing significantly better at longer time control.

The claim is that lazy SMP with many cores performs better than the previous algorithm at long time control.

If there is an advantage of at least 30 Elo at non-blitz time controls, then it is possible to show the Elo advantage with a test of some hundreds of games, and you do not need thousands of games.
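A rough back-of-the-envelope check of the "some hundreds of games" figure, as a sketch under stated assumptions: logistic Elo, independent games, a two-sigma criterion (`z=2.0`), and a draw ratio of 0.6. The draw ratio and the function names are assumptions for illustration, not measured values from this thread.

```python
import math

def expected_score(elo_diff):
    # Logistic Elo model: expected score of the stronger side.
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, draw_ratio, z=2.0):
    # delta is the expected score above 50%.
    delta = expected_score(elo_diff) - 0.5
    # Per-game variance of the score (1, 0.5, 0) for a given draw ratio.
    variance = (1.0 - draw_ratio) / 4.0 - delta * delta
    # Games required so that delta is z standard errors away from zero.
    return math.ceil(z * z * variance / (delta * delta))

for elo in (5, 10, 30):
    print(f"{elo:>2} Elo at 60% draws: about {games_needed(elo, 0.6)} games")
```

Under these assumptions a 30 Elo edge needs a few hundred games, while a 5 Elo edge needs several thousand, which is the distinction being drawn here. A higher draw ratio lowers the variance and so reduces the count for the same Elo difference.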

mcostalba · Post by **mcostalba** » Sat Oct 17, 2015 12:57 pm

syzygy wrote:
bob wrote:You did understand that they changed versions prior to this stage? Switching to the new lazy-smp version which apparently exhibits this bug while old versions with normal YBW did not.
Yes I know, but as far as I understand the lag parameter has always been set to 10ms and time management has not really been changed.

Apparently SF loses a bit of time upon finishing a search when it waits for all threads to stop. I would think the YBWC version also stops threads before sending the best move (or it would be cheating) and loses some time on that.

Of course it might simply take the lazy-smp version a bit longer to stop the threads than the previous version, and that difference might be just enough to lose on time under the wrong circumstances...

Indeed, we of course tested lazy_smp in our framework for many tens of thousands of games before submitting it to Martin, and we didn't experience any time loss. People stating that lazy smp was untested simply ignore how SF development works: every patch that goes in is very deeply and strictly tested, much more than in any other engine development that I am aware of.

The only time-related difference between the old version and the lazy one is the way the engine stops and waits for the slave threads to terminate the search before returning the best move.

So considering the above, my take is that stopping the threads in lazy smp requires more time on the particular TCEC hardware.

We didn't have the chance to test on that hardware, nor on similar hardware, given that very few people have access to such a powerful machine.
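The stop-and-wait step described above can be illustrated with a toy measurement. This is only a sketch using Python threads, not Stockfish code: the worker loop, the thread count, and the chunk length are hypothetical, and the 10 ms figure is the lag setting mentioned earlier in the thread.

```python
import threading
import time

LAG_BUFFER_S = 0.010  # the 10 ms lag setting mentioned in the thread

def worker(stop, work_chunk_s):
    # Simulated search loop: the stop flag is only polled between chunks of work,
    # so a thread may keep running for up to one chunk after the flag is set.
    while not stop.is_set():
        time.sleep(work_chunk_s)

def measure_stop_latency(n_threads=4, work_chunk_s=0.002):
    stop = threading.Event()
    threads = [threading.Thread(target=worker, args=(stop, work_chunk_s))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    time.sleep(0.05)           # let the "search" run for a moment
    start = time.perf_counter()
    stop.set()                 # signal every thread to stop
    for t in threads:
        t.join()               # wait for all of them before "sending the move"
    return time.perf_counter() - start

latency = measure_stop_latency()
print(f"stop latency: {latency * 1000:.2f} ms "
      f"({'over' if latency > LAG_BUFFER_S else 'within'} the 10 ms buffer)")
```

The point of the sketch is that the time between signalling the stop and the last thread finishing depends on the machine and the number of threads, so a buffer that is comfortable on one box may be too tight on another.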

mcostalba · Post by **mcostalba** » Sat Oct 17, 2015 1:06 pm

mcostalba wrote: We didn't have the chance to test on that hardware, nor on similar hardware, given that very few people have access to such a powerful machine.

BTW even as of today we were not able to reproduce the time loss on any of our machines, even the most powerful ones.

Louis tried hard to reproduce the time loss on his big hardware machine, but he failed even under the extreme conditions he threw at SF.

As of today, I am not aware of anybody who has had time losses with lazy_smp; the only case seems to be the TCEC machine... if this is not unfortunate, well, I don't know what unlucky means.

BubbaTough · Post by **BubbaTough** » Sat Oct 17, 2015 1:34 pm

bob wrote: It is not clear what "scales" means in this context. I have seen references to BOTH higher NPS, AND longer time-to-depth. If I had time, I'd run a few tests on it to see what it does on my 20 or 24 core boxes... But that's time away from working on my code, which is not exactly a good use of time.

I think scale does not (or should not) refer to either NPS or time-to-depth. I think it refers to program strength measured in elo. Which is unfortunate, since the two items you reference are much less time consuming to measure.

-Sam
