Stockfish patches


Paloma
Posts: 1167
Joined: Thu Dec 25, 2008 9:07 pm
Full name: Herbert L

Stockfish patches

Post by Paloma »

SF dev.

Why are some (up to 6) patches released in one day (often within less than 1 hour),
and then again 10 or more days pass without a single patch?

Why so many within 1 or 2 hours?
Marek Soszynski
Posts: 581
Joined: Wed May 10, 2006 7:28 pm
Location: Birmingham, England

Re: Stockfish patches

Post by Marek Soszynski »

I fear that Axtens's "Enable popcount and prefetch for ppc-64" patch could slow down old x64 PCs by 8–10%.
Marek Soszynski
Eelco de Groot
Posts: 4561
Joined: Sun Mar 12, 2006 2:40 am

Re: Stockfish patches

Post by Eelco de Groot »

Paloma wrote: Thu Jul 11, 2019 3:24 pm SF dev.

Why are some (up to 6) patches released in one day (often within less than 1 hour),
and then again 10 or more days pass without a single patch?

Why so many within 1 or 2 hours?
It just depends on when the maintainer, Marco, has time to decide whether a patch passes and to update master. Before that, developers and testers can comment, and sometimes there is a discussion, so a patch is seldom applied to master on the same day it is published as a pull request. (You can view the pending pull requests under their own tab.) Sometimes it happens quickly, if there is nothing to discuss and Marco can apply it right away.
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
Uri Blass
Posts: 10269
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish patches

Post by Uri Blass »

Eelco de Groot wrote: Thu Jul 11, 2019 7:57 pm
Paloma wrote: Thu Jul 11, 2019 3:24 pm SF dev.

Why are some (up to 6) patches released in one day (often within less than 1 hour),
and then again 10 or more days pass without a single patch?

Why so many within 1 or 2 hours?
It just depends on when the maintainer, Marco, has time to decide whether a patch passes and to update master. Before that, developers and testers can comment, and sometimes there is a discussion, so a patch is seldom applied to master on the same day it is published as a pull request. (You can view the pending pull requests under their own tab.) Sometimes it happens quickly, if there is nothing to discuss and Marco can apply it right away.
I think that this is bad for Stockfish, because people do not test a patch against the latest version, and testing against the latest version might lead to a different result.

For example, 2 patches passed SPRT[0,4]:

1)Combo of statscore divisor and pawn psqt changes
2)Tweak capture scoring formula

If I understand correctly, it was practically A+patch1 against A and A+patch2 against A.
Now maybe A+patch1+patch2 does not work against A.

I often see no improvement in the regression tests, and I suspect that this may be the reason.

In the latest regression tests we have:

Regression/progression test against SF10 after "More bonus for free passed pawn" of June 20th.

ELO: 24.06 +-1.8 (95%) LOS: 100.0%
Total: 40000 W: 7313 L: 4547 D: 28140

Later

Regression/progression test against SF10 after "Bonus for double attacks on unsupported pawns" of June 27th.

ELO: 22.75 +-1.8 (95%) LOS: 100.0%
Total: 40000 W: 7260 L: 4644 D: 28096

and so far
Regression/progression test against SF10 after "Assorted trivial cleanups June 2019" of July 11th.
ELO: 23.69 +-2.4 (95%) LOS: 100.0%
Total: 24209 W: 4468 L: 2820 D: 16921
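The ELO and +- figures in these regression lines follow from the W/L/D counts. A minimal sketch of that arithmetic, assuming the logistic elo formula and a normal (delta-method) approximation for the 95% error bar; `elo_from_wld` is a hypothetical helper, and the tool that produced these lines may compute the error bar slightly differently.

```python
import math


def elo_from_wld(wins, losses, draws):
    """Return (elo, 95% half-width) for a W/L/D result.

    Assumes the logistic elo model; the error bar is a normal
    (delta-method) approximation, which may differ slightly from the
    tool that produced the regression lines.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n              # mean points per game
    # per-game variance of the result (1 for a win, 0.5 draw, 0 loss)
    var = (wins + 0.25 * draws) / n - score ** 2
    stderr = math.sqrt(var / n)                   # standard error of the score
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    # slope of the logistic elo curve at this score, to convert the error
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return elo, 1.96 * stderr * slope


for w, l, d in [(7313, 4547, 28140), (7260, 4644, 28096), (4468, 2820, 16921)]:
    elo, err = elo_from_wld(w, l, d)
    print(f"W: {w} L: {l} D: {d} -> ELO: {elo:.2f} +-{err:.1f}")
```

With these counts the sketch reproduces the posted elo values to two decimals, and its error bars come out within about 0.1 of the posted +-1.8 and +-2.4.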

I wonder what the reason for almost no improvement is, and I think the reason may be that Stockfish allows more than one change at the same time, even when a change was not tested against the latest version that they accepted.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Stockfish patches

Post by Dann Corbit »

A lot of software development is like that.
I might do a:
pacman -Syu
command and get nothing for a week.
And today,
pacman -Syu
gave me 56 project updates.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
daniel71
Posts: 146
Joined: Wed Aug 27, 2008 3:48 am

Re: Stockfish patches

Post by daniel71 »

I was wondering the same thing about the Stockfish development versions: nothing for 5 days, then a group of patches all at once.
Another thing I noticed on Fishtest is that many patches get labeled as improvements when in fact they score more losses than wins. Is somebody trying to ruin the improvements gained? I know they wrote that some patches may pass with a negative score if they give a speedup. You would think they wouldn't remove code that is a net gain of ELO and gives the program more knowledge.
Uri Blass
Posts: 10269
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish patches

Post by Uri Blass »

Simplifications can pass with a slightly negative score, but such cases are rare.

The main problem that I see is that the Stockfish team gives wrong information about elo.

They give an elo estimate for every change, and if you add up all the estimates that they publish, the total is clearly higher than what the regression tests show.

Here are the elo estimates for all the improvements since Stockfish 10 in December and January, looking only at the Stockfish framework results (I give only the estimate at long time control):

It seems that the Stockfish developers claim more than 20 elo of improvement per month if you ignore their regression tests.
I did not calculate the elo gain for months other than December 2018 and January 2019, but I guess the picture for other months may be similar.


1) Remove Overload bonus 1.12.2018 +0.06 elo
2) Penalize refuted killers in continuation history 1.12.2018 +3.06 elo (total improvement +3.12 elo)
3) Introduce concept of double pawn protection. 2.12.2018 +3.95 elo (total improvement +7.07 elo)
4) pseudo_legal() and MOVE_NONE 6.12.2018 reverted in 7, so I assume 0 elo
5) Simplify time manager in search() 6.12.2018 +0.88 elo at long time control (total improvement +7.95 elo)
6) Simplify Killer Move Penalty 6.12.2018 +0.81 elo at long time control (total improvement +8.76 elo)
7) Revert "pseudo_legal() and MOVE_NONE"
8) simplify opposite_colors 9.12.2018
9) add paren. 9.12.2018
10) remove parenthesis. 9.12.2018
11) remove extra line 9.12.2018
No results for 8-11; I assume no functional change and no elo change for them.
12) Tweak CMH pruning 9.12.2018 +2.22 elo at long time control (total improvement +10.98 elo)
13) Changes identified in RENAME/REFORMATTING thread (#1861) 11.12.2018 no functional change, 0 elo
14) Asymmetrical 8x8 Pawn PSQT 13.12.2018 passed long time control but has no estimate, so I will be conservative and estimate 0 elo
15) A combo of parameter tweaks 13.12.2018 +2.31 elo (total improvement +13.29 elo)
16) Remove Null Move Pruning material threshold 16.12.2018 +1.03 elo (total improvement +14.32 elo)
17) Start a TT resize only after search finished 16.12.2018 no functional change, 0 elo
18) Fix a segfault. 16.12.2018
19) Refactor king ring calculation: no testing at long time control, so I assume 0 elo
20) Use stronglyProtected 16.12.2018 +0.89 elo (total improvement +15.21 elo)
21) New voting formula for threads 18.12.2018 no functional change in single-thread mode, so I assume 0 elo
22) Tweak main killer penalty 18.12.2018 +2.59 elo (total improvement +17.80 elo)
23) Simplify KBNK endgame implementation 20.12.2018
24) Simplify generate_castling (#1885) 23.12.2018
25) Turn on random access for Syzygy files in Windows (#1840) 23.12.2018
26) Use a bit less code to calculate hashfull() (#1830) 23.12.2018
27) Update our continuous integration machinery (#1889) 23.12.2018
28) Improve endgame KBN vs K (#1877) 24.12.2018 claims no functional change, but that is wrong; I assume 0 elo
29) Extend stack to ss-5, and remove conditions 24.12.2018 -0.17 elo (total improvement +17.63 elo)
30) Fix crash in best_group() (#1891) 24.12.2018
31) Simplify SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX loop (#1892) 24.12.2018
32) Always initialize and evaluate king safety 27.12.2018 +3.59 elo (total improvement +21.22 elo)
33) Improve the Readme 29.12.2018
Total improvement in December: 21.22 elo



34) Remove a useless micro-optimization in pawns generation (#1915) 1.1.2019 (not tested at LTC, so I assume 0 elo)
35) Remove "Any" predicate filter (#1914) 1.1.2019
36) Remove openFiles in pawns. (#1917) 1.1.2019 (no testing at LTC, so I assume 0 elo)
37) Assorted trivial cleanups (#1894) 1.1.2019
38) Delay castling legality check 4.1.2019 +5.03 elo (5.03 elo total improvement in January)
39) Check tablebase files 4.1.2019
40) Introduce Multi-Cut 6.1.2019 +3.20 elo (8.23 elo total)
41) Flag critical search tree in hash table 9.1.2019 +3.21 elo (11.44 elo total)
42) Small improvements to the CI infrastructure 9.1.2019
43) Minor cleanup to recent 'Flag critical search tree in hash table' patch 10.1.2019
44) Remove pvExact 10.1.2019 +1.72 elo (13.16 elo total)
45) Simplify time management a bit 14.1.2019 +2.51 elo (15.67 elo total)
46) Simplify pawn moves (#1900) 14.1.2019 no testing at long time control
47) Remove AdjacentFiles 17.1.2019 no testing at long time control
48) Tweak initiative and Pawn PSQT (#1957) 20.1.2019 +1.74 elo (17.41 elo total)
49) Clean-up some shifting in space calculation (#1955) 20.1.2019
50) Simplify pvHit (#1953) 20.1.2019
51) Simplify pondering time management (#1899) no testing with ponder, so not relevant
52) Simplify TrappedRook 22.1.2019 +0.44 elo (17.85 elo total)
53) Use int8_t instead of int for SquareDistance[] 29.1.2019 pure speedup, so no testing at LTC and I assume 0 elo
54) Change pinning logic in Static Exchange Evaluation (SEE) 29.1.2019 +1.46 elo (19.31 elo total)
55) Don't update pvHit after IID 29.1.2019 +0.38 elo (19.69 elo total)
56) Simplify Stat Score bonus 31.1.2019 +1.34 elo (21.03 elo total)
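The running totals in this list are plain sums of the per-patch LTC estimates. A small sketch that recomputes them; it lists only the nonzero estimates from the list above, since the patches counted as 0 elo do not change the sums. `running_totals` is a hypothetical helper for illustration.

```python
# Nonzero per-patch LTC elo estimates, in list order (0-elo entries omitted).
december = [0.06, 3.06, 3.95, 0.88, 0.81, 2.22, 2.31, 1.03, 0.89, 2.59, -0.17, 3.59]
january = [5.03, 3.20, 3.21, 1.72, 2.51, 1.74, 0.44, 1.46, 0.38, 1.34]


def running_totals(estimates):
    """Cumulative sums, rounded to 2 decimals like the totals in the list."""
    total, out = 0.0, []
    for e in estimates:
        total += e
        out.append(round(total, 2))
    return out


print("December:", running_totals(december))
print("January:", running_totals(january))
```

The last entry of each printed list is the monthly total quoted above.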
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Stockfish patches

Post by Michel »

Uri wrote:They give elo estimate for every change and if you add all the estimates that they publish the elo improvement should be clearly higher.
You have a basic misunderstanding of statistics. The elo estimate calculated by fishtest is (median) unbiased over all patches (both failing and passing). In other words it corrects the inherent SPRT bias.

However, if you select only the passed tests, then you have a different kind of bias, which is called "selection bias". The elo estimate does not correct for that, and it is not at all obvious how to do so. It would be possible if an elo prior were available.

Note: some time ago an elo prior for fishtest was determined as a normal distribution with mu=-1.013 and sigma=1.101 (logistic elo units). This prior was obtained by minimizing the difference between the empirical distribution of the elo estimates (about 6000 patches) and the theoretical one calculated from the prior. See respectively the histogram and the continuous line in the following graph.

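[Graph: histogram of the elo estimates of about 6000 fishtest patches, with a continuous line showing the theoretical distribution calculated from the prior]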

As you see the match is visually rather good. No effort has been made however to see how this elo prior evolves over time.
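Michel's selection-bias point can be illustrated with a toy simulation. This is a sketch under loudly stated assumptions: true patch elos are drawn from the normal prior quoted above, each test reports the true elo plus Gaussian noise, and a patch "passes" when its estimate exceeds a fixed threshold. That fixed-noise threshold rule stands in for SPRT, and the noise level and threshold are made-up values. The correction step is standard normal-normal shrinkage (the posterior mean under the prior), not a fishtest feature.

```python
import random
import statistics

# Assumed model, for illustration only (not fishtest's SPRT).
PRIOR_MU, PRIOR_SIGMA = -1.013, 1.101   # prior quoted in the post
NOISE_SIGMA = 1.5                       # hypothetical measurement error (elo)
THRESHOLD = 1.0                         # hypothetical pass threshold (elo)


def simulate(n_patches=20000, seed=1):
    """Return per-patch estimation errors and (true, raw, shrunk) for passed patches."""
    rng = random.Random(seed)
    w_prior = 1.0 / PRIOR_SIGMA ** 2    # precision of the prior
    w_data = 1.0 / NOISE_SIGMA ** 2     # precision of one test
    all_err, passed = [], []
    for _ in range(n_patches):
        true = rng.gauss(PRIOR_MU, PRIOR_SIGMA)
        est = true + rng.gauss(0.0, NOISE_SIGMA)   # unbiased raw estimate
        all_err.append(est - true)
        if est > THRESHOLD:                        # selection step
            # posterior mean of the true elo given est (normal-normal model)
            shrunk = (w_prior * PRIOR_MU + w_data * est) / (w_prior + w_data)
            passed.append((true, est, shrunk))
    return all_err, passed


all_err, passed = simulate()
print("mean error over all patches:", round(statistics.fmean(all_err), 3))
print("passed patches:", len(passed))
print("mean true elo of passed:", round(statistics.fmean([t for t, _, _ in passed]), 2))
print("mean raw estimate of passed:", round(statistics.fmean([e for _, e, _ in passed]), 2))
print("mean prior-corrected estimate:", round(statistics.fmean([s for _, _, s in passed]), 2))
```

In this model the raw estimate is unbiased over all patches, yet its average over the passed patches clearly exceeds their average true elo; the prior-based estimate largely closes that gap, which is the sense in which a known prior would allow a correction.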
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Uri Blass
Posts: 10269
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish patches

Post by Uri Blass »

Michel wrote: Sat Jul 13, 2019 11:12 am
Uri wrote:They give elo estimate for every change and if you add all the estimates that they publish the elo improvement should be clearly higher.
You have a basic misunderstanding of statistics. The elo estimate calculated by fishtest is (median) unbiased over all patches (both failing and passing). In other words it corrects the inherent SPRT bias.

However, if you select only the passed tests, then you have a different kind of bias, which is called "selection bias". The elo estimate does not correct for that, and it is not at all obvious how to do so. It would be possible if an elo prior were available.

Note: some time ago an elo prior for fishtest was determined as a normal distribution with mu=-1.013 and sigma=1.101 (logistic elo units). This prior was obtained by minimizing the difference between the empirical distribution of the elo estimates (about 6000 patches) and the theoretical one calculated from the prior. See respectively the histogram and the continuous line in the following graph.

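[Graph: histogram of the elo estimates of about 6000 fishtest patches, with a continuous line showing the theoretical distribution calculated from the prior]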

As you see the match is visually rather good. No effort has been made however to see how this elo prior evolves over time.
I understand that the estimate for the passed patches is biased, and that is exactly the problem: they should not publish a biased estimate.
If they want to give a good estimate, then the only good way is simply to play a fixed number of games after they have decided to accept the patch.

Using an elo prior distribution for the patches relies on assumptions that we do not know to be correct.
Playing 40000 games for every 2 consecutive versions, even now, would give an unbiased estimate with an error of about 2 elo.

People may complain about the hardware time, but I think it is better for Stockfish to spend hardware on a truly unbiased estimate of the value of patches, because that knowledge may later help to understand which patches really helped Stockfish become stronger.

Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Stockfish patches

Post by Michel »

playing 40000 games for every 2 consecutive versions even now will give an unbiased estimate with a possible mistake of 2 elo.
Getting an accurate elo assessment of patches (or doing any kind of research at all) was never a goal of fishtest. The idea is that the considerable resources needed for such an assessment should instead be used to test more patches, or to run SPRT tests with narrower bounds, which allows patches with smaller elo gains to succeed more easily. This is simply a choice. One cannot argue with it.
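To make the SPRT bounds mentioned in this thread (e.g. SPRT[0,4]) concrete, here is a minimal sketch of how a sequential test decides. It assumes a normal approximation of the per-game score and treats elo0/elo1 as logistic elo; fishtest's own likelihood model and bound convention differ. `sprt_llr` and `sprt_decision` are hypothetical helpers, not fishtest code.

```python
import math


def logistic_score(elo):
    """Expected score for a logistic elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins, losses, draws, elo0, elo1):
    """Approximate log-likelihood ratio of H1 (elo1) against H0 (elo0).

    Uses a normal approximation of the per-game score with the observed
    variance; fishtest's actual model and bound convention differ.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - score ** 2
    if var <= 0.0:
        return 0.0  # degenerate sample (e.g. all draws): no information
    s0, s1 = logistic_score(elo0), logistic_score(elo1)
    # log of N(score; s1, var/n) / N(score; s0, var/n)
    return n * (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * var)


def sprt_decision(llr, alpha=0.05, beta=0.05):
    """Wald's stopping rule with error rates alpha and beta."""
    lower = math.log(beta / (1.0 - alpha))   # about -2.94
    upper = math.log((1.0 - beta) / alpha)   # about +2.94
    if llr >= upper:
        return "H1 accepted (pass)"
    if llr <= lower:
        return "H0 accepted (fail)"
    return "continue"


for label, (w, l, d) in [("regression data, 40000 games", (7313, 4547, 28140)),
                         ("even result, 40000 games", (5000, 5000, 30000)),
                         ("short test, 400 games", (60, 50, 290))]:
    llr = sprt_llr(w, l, d, elo0=0.0, elo1=4.0)
    print(f"{label}: LLR = {llr:.2f} -> {sprt_decision(llr)}")
```

The LLR tends to drift upward when the true score lies above the midpoint of the two bound scores, so lowering elo1 lets smaller-elo patches pass, at the cost of a slower drift and therefore more games per test.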