Stockfish 10 was released 29.11.2018

Zenmastur
Posts: 830
Joined: Sat May 31, 2014 6:28 am

Re: Stockfish 10 was released 29.11.2018

Post by Zenmastur » Thu Dec 12, 2019 6:30 pm

Uri Blass wrote:
Thu Dec 12, 2019 6:09 pm
Zenmastur wrote:
Thu Dec 12, 2019 12:42 pm
Uri Blass wrote:
Thu Dec 12, 2019 2:56 am

There are not enough games to know whether any single simplification is a regression or an improvement, but you can get an unbiased estimate of the average value of simplifications since Stockfish 10.

These are the first numbers; for that purpose you need to get more numbers from the link and calculate the average.
At least from the first numbers, it seems to me that the average is positive.

209.73->207.78(-1.95 elo) 1.12.2018 simplification
208.88->206.03(-2.85 elo) 6.12.2018 simplification
208.75->211.58(2.83 elo) 16.12.2018 simplification
214.03->216.00(1.97 elo) 16.12.2018 simplification
214.25->213.08(-1.17 elo) 24.12.2018 simplification
212.66->213.98(1.32 elo) 27.12.2018 simplification
209.88->210.54(0.66 elo) 4.1.2019 simplification
211.45->215.12(3.67 elo) 10.1.2019 simplification
215.12->212.84(-2.28 elo) 14.1.2019 simplification
212.84->212.17(-0.67 elo) 14.1.2019 simplification
212.17->216.75(4.58 elo) 17.1.2019 simplification
215.25->217.07(1.82 elo) 22.1.2019 simplification
216.10->215.10(-1 elo) 29.1.2019 simplification
215.10->221.39(6.29 elo) 31.1.2019 simplification
217.75->219.64(1.89 elo) 8.2.2019 simplification
219.64->220.48(0.84 elo) 21.2.2019 simplification
220.48->220.45(-0.03 elo) 21.2.2019 simplification
218.64->219.64(1 elo) 27.2.2019 simplification
220.93->218.38->220.45(-0.48 elo) 5.3 simplifications
219.49->220.93(+1.44 elo) 10.3 simplification
219.87->218.20(-1.67 elo) 20.3 simplification
221.09->218.53(-2.56 elo) 24.3 simplification
217.85->218.81(0.96 elo) 4.4 simplification
223.36->220.86->221.82(-1.54 elo) 13.4 simplifications
219.64->219.14->218.61->219.15(-0.49 elo) 16.4 simplifications
219.15->220.30(1.15 elo) 17.4 simplification
218.51->218.61(0.1 elo) 19.4 simplification
221.37->220.81->225.70(4.33 elo) 9.5 simplifications
I have no clue what all this is supposed to mean.
This is about the link

I will explain one line and you can understand the other lines based on the same logic
209.73->207.78(-1.95 elo) 1.12.2018 simplification

The following lines are from the link
https://nextchessmove.com/dev-builds
20181201-0929 20000 11146 433 8421 +207.78 +/- 3.64 Simplification
20181129-1517 20000 11271 478 8251 +209.73 +/- 3.69 Non Functional

209.73 is elo difference from stockfish7 before the simplification.
207.78 is elo difference from stockfish7 after the simplification.

-1.95 is the estimated elo change from the simplification (note that the statistical error of each measurement is above 3.6 elo).
1.12.2018 is the date of the simplification.
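
The arithmetic in this explanation can be sketched in Python. The column order (build, games, wins, losses, draws, elo, error, label) is an assumption inferred from the two rows quoted above, and the logistic Elo formula is the standard one rather than anything the site states here; `parse_ncm_line` and `elo_from_wdl` are hypothetical helper names.

```python
import math

# Two rows copied from the post above (nextchessmove.com/dev-builds).
AFTER = "20181201-0929 20000 11146 433 8421 +207.78 +/- 3.64 Simplification"
BEFORE = "20181129-1517 20000 11271 478 8251 +209.73 +/- 3.69 Non Functional"


def parse_ncm_line(line):
    """Split one row, ASSUMING the column order
    build, games, wins, losses, draws, elo, '+/-', error, label."""
    parts = line.split()
    return {
        "build": parts[0],
        "games": int(parts[1]),
        "wins": int(parts[2]),
        "losses": int(parts[3]),
        "draws": int(parts[4]),
        "elo": float(parts[5]),
        "error": float(parts[7]),
        "label": " ".join(parts[8:]),
    }


def elo_from_wdl(wins, losses, draws):
    """Standard logistic Elo difference implied by a match score."""
    score = (wins + 0.5 * draws) / (wins + losses + draws)
    return 400.0 * math.log10(score / (1.0 - score))


before = parse_ncm_line(BEFORE)
after = parse_ncm_line(AFTER)

# Elo change attributed to the simplification: after minus before.
delta = after["elo"] - before["elo"]
print(f"delta = {delta:+.2f} elo")

# Sanity check of the assumed column order: recompute elo from W/L/D.
recomputed = elo_from_wdl(after["wins"], after["losses"], after["draws"])
print(f"recomputed elo after = {recomputed:+.2f}")
```

Recomputing the rating from the win/loss/draw counts is only a check that the assumed column order is plausible; the delta itself uses the values as reported.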


The idea is that you can get an unbiased estimate of the elo that Stockfish gets from simplifications by summing all these numbers.
I did not calculate the sum of all of them, but at least the sum of the numbers I wrote, which cover only part of the simplifications, is above 0.

Maybe somebody can continue and calculate the sum of all the numbers (there are many simplifications after the one of 9.5.2019 for which I did not write the numbers, but you can get them from the link).
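
As a minimal sketch, the partial sum mentioned here can be reproduced from the entries listed earlier in this post. Only the numbers written in the thread are used, nothing is fetched from the site, and `PAIRS` is a hypothetical name.

```python
# (elo before, elo after) for each entry listed earlier in the post; for
# multi-step entries only the first and last values are kept.
PAIRS = [
    (209.73, 207.78), (208.88, 206.03), (208.75, 211.58), (214.03, 216.00),
    (214.25, 213.08), (212.66, 213.98), (209.88, 210.54), (211.45, 215.12),
    (215.12, 212.84), (212.84, 212.17), (212.17, 216.75), (215.25, 217.07),
    (216.10, 215.10), (215.10, 221.39), (217.75, 219.64), (219.64, 220.48),
    (220.48, 220.45), (218.64, 219.64), (220.93, 220.45), (219.49, 220.93),
    (219.87, 218.20), (221.09, 218.53), (217.85, 218.81), (223.36, 221.82),
    (219.64, 219.15), (219.15, 220.30), (218.51, 218.61), (221.37, 225.70),
]

# Elo change per entry, then the sum and the average over all entries.
deltas = [after - before for before, after in PAIRS]
total = sum(deltas)
mean = total / len(deltas)
print(f"{len(deltas)} entries, sum = {total:+.2f} elo, mean = {mean:+.2f} elo")
```

Each entry carries a measurement error of several elo, so the sign of this partial sum says little on its own.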
I did a spreadsheet for all simplifications on that site. I then set the June 1st ELO to zero and summed the ELO gain (or loss) for all simplifications from June 1st to December 1st. The net ELO loss was 24.54 ELO over a six-month period. IMHO this is too much ELO loss. Six months is too much time to wait for a statistical correction. The easiest way to fix this is to slightly alter the simplification bounds on the tests. E.g. changing the bounds from [-3.00, 1.00] to something like [-2.50, 1.50] would favor less ELO loss (or a shorter time for a correction to occur). It might make simplifications slightly harder to pass, but it reduces the chances of having long runs of simplifications that have a net ELO loss.

Regards,

Zenmastur
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.

Gabor Szots
Posts: 470
Joined: Sat Jul 21, 2018 5:43 am
Location: Szentendre, Hungary
Full name: Gabor Szots

Re: Stockfish 10 was released 29.11.2018

Post by Gabor Szots » Thu Dec 12, 2019 7:48 pm

Zenmastur wrote:
Thu Dec 12, 2019 6:30 pm
a net ELO loss.
We lost Elo a very long time ago now...
Gabor Szots
CCRL testing group

Michel
Posts: 2087
Joined: Sun Sep 28, 2008 11:50 pm

Re: Stockfish 10 was released 29.11.2018

Post by Michel » Thu Dec 12, 2019 7:49 pm

Zenmastur wrote: I did a spreadsheet for all simplifications on that site. I then set the June 1st ELO to zero and summed the ELO gain (or loss) for all simplifications from June 1st to December 1st. The net ELO loss was 24.54 ELO over a six-month period. IMHO this is too much ELO loss. Six months is too much time to wait for a statistical correction. The easiest way to fix this is to slightly alter the simplification bounds on the tests. E.g. changing the bounds from [-3.00, 1.00] to something like [-2.50, 1.50] would favor less ELO loss (or a shorter time for a correction to occur). It might make simplifications slightly harder to pass, but it reduces the chances of having long runs of simplifications that have a net ELO loss.
NCM has gigantic error bars (3.80). So you have to be careful interpreting your results.

I did the analysis for the results from July 1 on and got -15.66. There were 19 blocks of consecutive simplifications. So the error bars are sqrt(2*19)*3.80=23.42. So -15.66 is easily within error bars. Of course you can always pick and choose your period to get a worse result (somebody managed to "prove" using this method that global temperatures are actually declining).

If you really want to prove that simplifications lose (substantial) elo then you should do a more serious analysis.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

Ajedrecista
Posts: 1423
Joined: Wed Jul 13, 2011 7:04 pm
Location: Madrid, Spain.

Re: Stockfish 10 was released 29.11.2018.

Post by Ajedrecista » Thu Dec 12, 2019 8:31 pm

Hello:
Zenmastur wrote:
Thu Dec 12, 2019 6:30 pm
I did a spreadsheet for all simplifications on that site. I then set the June 1st ELO to zero and summed the ELO gain (or loss) for all simplifications from June 1st to December 1st. The net ELO loss was 24.54 ELO over a six-month period. IMHO this is too much ELO loss. Six months is too much time to wait for a statistical correction. The easiest way to fix this is to slightly alter the simplification bounds on the tests. E.g. changing the bounds from [-3.00, 1.00] to something like [-2.50, 1.50] would favor less ELO loss (or a shorter time for a correction to occur). It might make simplifications slightly harder to pass, but it reduces the chances of having long runs of simplifications that have a net ELO loss.

Regards,

Zenmastur
There are error bars (the numbers with ± before them). IIRC, a sum of N independent normal distributions, each with mean µ_i and standard deviation sigma_i, gives another normal distribution with mean = sum(µ_i) and standard deviation = sqrt[sum(sigma_i * sigma_i)], both sums running over i = 1, ..., N. In this case, the error bars we see are not sigma_i (confidence ~ 68.27%) but 95% confidence intervals (circa 1.96*sigma_i).

I also did a spreadsheet with only the simplifications and the tests immediately before them, from 2019/06/01 to 2019/12/01. I got -24.59 Elo instead of -24.54 Elo... in any case, it is sum(µ_i) over N = 36 simplifications. I also get sqrt[sum(1.96*sigma_i * 1.96*sigma_i)] ~ 23.07 Elo (95% confidence), which translates into a standard deviation of 23.07 / 1.96 ~ 11.77 Elo.

Doing a Z-test: Z = -24.59 / 11.77 ~ -2.09, which is slightly outside the 95% confidence error bars, that is, |Z| > 1.96. I do not think this result is highly significant, but I ask for help from Michel or anyone else who is skilled in statistics to say whether my math is correct. I see in the meantime that Michel was faster than me! I think the correct approach is to extend the period back to the start of SPRT[-3, 1] for simplifications, which should be before 2019/06/01, although I really do not know.
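
The Z-test arithmetic above can be sketched as follows, using only the figures quoted in this post. The individual 36 error bars are not listed in the thread, so the quadrature helper is shown on a clearly hypothetical input; `combine_in_quadrature` is an illustrative name.

```python
import math


def combine_in_quadrature(error_bars):
    """Error bar of a sum of independent measurements."""
    return math.sqrt(sum(e * e for e in error_bars))


# Figures reported in the post above.
total_elo = -24.59       # sum of the 36 simplification deltas
combined_95 = 23.07      # 95% error bar of that sum

sigma = combined_95 / 1.96          # back to one standard deviation
z = total_elo / sigma
# Two-sided p-value of a standard normal Z statistic.
p_value = math.erfc(abs(z) / math.sqrt(2.0))
print(f"sigma = {sigma:.2f}, Z = {z:.2f}, p = {p_value:.3f}")

# HYPOTHETICAL illustration: 36 equal error bars of 3.80 elo each.
print(f"{combine_in_quadrature([3.80] * 36):.2f}")
```

This reproduces the post's arithmetic only; it does not include the sqrt(2) factor for elo differences or the grouping into blocks that Michel describes in his reply.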

Regards from Spain.

Ajedrecista.

Michel
Posts: 2087
Joined: Sun Sep 28, 2008 11:50 pm

Re: Stockfish 10 was released 29.11.2018.

Post by Michel » Thu Dec 12, 2019 8:44 pm

Hi Ajedrecista,

What I did differently - I think - is to take into account that you are measuring elo differences so the error bars are multiplied by sqrt(2).

On the other hand, for the error calculation I counted blocks of consecutive simplifications as one. Indeed one has (X_1-X_2)+(X_2-X_3)+...+(X_{k-1}-X_k)=X_1-X_k. So the error of the difference is only determined by the errors of X_1 and X_k.
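
A minimal sketch of the telescoping argument and of the resulting error bar, assuming every test has the same 3.80 error bar and that the measurements are independent. The block values come from the 13.4 entry listed earlier in the thread; `total_error_bar` is a hypothetical helper name.

```python
import math

# One block of consecutive simplifications, taken from the 13.4 entry
# earlier in the thread: 223.36 -> 220.86 -> 221.82.
block = [223.36, 220.86, 221.82]

# Summing the step-by-step differences telescopes to last minus first.
step_sum = sum(b - a for a, b in zip(block, block[1:]))
endpoint_diff = block[-1] - block[0]
print(f"{step_sum:+.2f} {endpoint_diff:+.2f}")


def total_error_bar(n_blocks, per_test_error=3.80):
    """Error bar of the summed deltas when each block contributes two
    independent endpoint measurements with the same error bar."""
    return math.sqrt(2 * n_blocks) * per_test_error


print(f"{total_error_bar(19):.2f}")  # 19 blocks, the July 1 analysis
print(f"{total_error_bar(23):.2f}")  # 23 blocks, the June 1 to Dec 1 analysis
```

Because the intermediate measurements cancel, only the two endpoints of each block add noise, which is why the count under the square root is the number of blocks and not the number of simplifications.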

Unfortunately I accidentally started my analysis from July 1 (and ended Dec 8). So my analysis is not directly compatible with yours.

Alayan
Posts: 255
Joined: Tue Nov 19, 2019 7:48 pm
Full name: Alayan Feh

Re: Stockfish 10 was released 29.11.2018

Post by Alayan » Thu Dec 12, 2019 9:17 pm

NCM testing results vs SF7 are really volatile, I would avoid reading too much into them.

Also, on several occasions bundles of simplification reverts were tested together and ended up about neutral. So, claiming SF lost dozens of elo this year because of simplifications is misinformed. The truth is that, on average, elo-gaining patches do not give much, maybe 0.5 elo or so on the 8-move book, so even with a lot of them it takes time to see clear progress.

However, this doesn't mean that current simplification bounds are fine.

The 95% lower bound for simplifications is slightly below -0.5. Of course, the test needs to pass both STC and LTC. Nonetheless, it means that there is a significant likelihood of a [-3, 1] SPRT passing for code parts that are increasing strength by only about 0.5 elo (which is many things in SF).

This is especially problematic when, rather than removing a big chunk of code (leaving room for something better instead), the simplification is really trivial. For example, removing a division, or converting a ternary (if stuff) ? true_stuff : false_stuff into average_stuff. These do not help code clarity, do not help future development, are a waste of fishtest resources and damage SF's strength.

The 0 elo average for simplifications when testing bundles of them reverted most likely means that some together actually gained a few elo, while the others lost about the same. Now if you pick your time period appropriately, you can most likely find one with some significant net elo loss from simplifications, but that's not a correct way to run the experiment.

More aggressive bounds for minor simplifications, to greatly increase confidence that elo is not thrown away (at the cost of making a bunch of neutral minor simplifications fail from bad luck), would make sense considering how hard minor gains are to come by.
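
To give a rough feel for how SPRT bounds affect what passes, here is a sketch based on Wald's textbook approximation. It compares the [-3, 1] bounds with the [-2.50, 1.50] alternative suggested earlier in the thread. Everything here is an assumption: a normal approximation with overshoot ignored, alpha = beta = 0.05, and the true elo taken to be on the same scale as the bounds. It is not fishtest's actual calculation, and `sprt_pass_probability` is a hypothetical name.

```python
import math


def sprt_pass_probability(true_elo, elo0, elo1, alpha=0.05, beta=0.05):
    """Wald's approximation for the chance that SPRT(elo0, elo1) accepts H1.

    ASSUMPTIONS: normal approximation, overshoot ignored, alpha = beta = 0.05
    by default, and true_elo measured on the same scale as the bounds.
    """
    upper = (1.0 - beta) / alpha      # likelihood-ratio threshold to pass
    lower = beta / (1.0 - alpha)      # likelihood-ratio threshold to fail
    h = (elo0 + elo1 - 2.0 * true_elo) / (elo1 - elo0)
    if abs(h) < 1e-12:
        # Limit h -> 0 (true elo exactly midway between the bounds).
        fail = math.log(upper) / (math.log(upper) - math.log(lower))
    else:
        fail = (upper ** h - 1.0) / (upper ** h - lower ** h)
    return 1.0 - fail


for elo in (-2.0, -1.0, -0.5, 0.0, 0.5):
    current = sprt_pass_probability(elo, -3.0, 1.0)
    proposed = sprt_pass_probability(elo, -2.5, 1.5)
    print(f"true elo {elo:+.1f}: [-3, 1] {current:.2f}   [-2.5, 1.5] {proposed:.2f}")
```

The figures are for a single SPRT run; a patch that has to pass both STC and LTC faces two such tests, so its overall chance of being accepted is lower.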

Michel
Posts: 2087
Joined: Sun Sep 28, 2008 11:50 pm

Re: Stockfish 10 was released 29.11.2018

Post by Michel » Fri Dec 13, 2019 4:50 am

I redid the analysis for June 1 to Dec 1. Now the sum is -24.59. There were 36 simplifications in 23 blocks. Assuming 3.80 as error bars (the actual error bars vary a bit from test to test) we find total error bars of sqrt(2*23)*3.80=25.77. So 24.59 is still not significant, despite the fact that June contained some especially large outliers (perhaps this was the reason for the interval to be cherry picked in this way?).

Note this analysis is only about measurement errors. It does not even take into account another source of statistical noise caused by the fact that simplifications are not expected to be exactly zero elo. However it seems to be not worth the effort to go through a more complete analysis.
