Quick comparison of improvement rates
Elo diff     4-5    5-6
ipon         101     39
ccrl40/40     64     32
cegt40/20     94     70
Release dates
https://chessprogramming.wikispaces.com ... se%20Dates
4-5 284 days
5-6 241 days
So I was wrong: the improvement rate since SF5 has not been higher than usual.
Elo/day      4-5    5-6
ipon         0.36   0.16
ccrl         0.23   0.13
cegt         0.33   0.29
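For anyone who wants to recheck the arithmetic, here is a minimal sketch that divides each Elo gain by the number of days between releases. The numbers are copied from the figures above; the names `elo_diff`, `days` and `elo_per_day` are made up for this illustration.

```python
# Elo gains per rating list, copied from the post: (SF4 -> SF5, SF5 -> SF6).
elo_diff = {
    "ipon": (101, 39),
    "ccrl40/40": (64, 32),
    "cegt40/20": (94, 70),
}

# Days between releases, copied from the post: (SF4 -> SF5, SF5 -> SF6).
days = (284, 241)


def elo_per_day(diffs, spans):
    """Divide each Elo gain by its release interval, rounded to 2 decimals."""
    return tuple(round(d / s, 2) for d, s in zip(diffs, spans))


rates = {name: elo_per_day(diffs, days) for name, diffs in elo_diff.items()}

for name, (rate_45, rate_56) in rates.items():
    print(f"{name:10s} {rate_45:.2f} {rate_56:.2f}")
```

On all three lists the 5-6 rate comes out lower than the 4-5 rate, which is the point of the comparison.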
-Carl
Why is SF6 so much stronger?
Re: Why is SF6 so much stronger?
clumma wrote: How did they make such a leap in one version? The code is available. Is the cause not understood? I looked here and on other forums, and didn't see discussion on this question.
-Carl
I don't think SF is strong. It barely makes legal moves.
Re: Why is SF6 so much stronger?
Here is the number of commits in each of the pull requests I identified earlier:
2 Add bonuses for minors attacking enemy pieces
? Tune trapped rook penalty
? Double mg bonus and half eg bonus
1 King-pawn threat bonus for endgames
2 Evaluate king safety when no queen is present
1 Change history reduction in LMR to be a full ply
3 Remove use of half-ply reductions
3 Add bonuses for each threat instead of max threat value
1 Be more optimistic in aspiration window
5 Halve StormDanger bonus for blocked pawn on A/H file
6 Avoid searching TT twice for the same key/position...
5 Big King Safety tuning
So something like 31/236 commits could be responsible for 50% of the improvement between versions 5 and 6.
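A quick sketch of where a figure like 31 could come from. The known counts are copied from the list above; treating each "?" entry as a single commit is an assumption made only for this illustration (the post does not say how the unknowns were counted), and the variable names are hypothetical.

```python
# Commits per pull request as listed in the post; None marks the "?" entries.
commits = [
    ("Add bonuses for minors attacking enemy pieces", 2),
    ("Tune trapped rook penalty", None),
    ("Double mg bonus and half eg bonus", None),
    ("King-pawn threat bonus for endgames", 1),
    ("Evaluate king safety when no queen is present", 2),
    ("Change history reduction in LMR to be a full ply", 1),
    ("Remove use of half-ply reductions", 3),
    ("Add bonuses for each threat instead of max threat value", 3),
    ("Be more optimistic in aspiration window", 1),
    ("Halve StormDanger bonus for blocked pawn on A/H file", 5),
    ("Avoid searching TT twice for the same key/position...", 6),
    ("Big King Safety tuning", 5),
]

ASSUMED_PER_UNKNOWN = 1  # assumption for illustration, not stated in the post
TOTAL_COMMITS = 236      # total between versions 5 and 6, from the post

known = sum(count for _, count in commits if count is not None)
unknown = sum(1 for _, count in commits if count is None)
estimate = known + unknown * ASSUMED_PER_UNKNOWN
share = estimate / TOTAL_COMMITS

print(f"known={known} unknown={unknown} estimate={estimate} share={share:.1%}")
```

Under that assumption the twelve pull requests cover roughly an eighth of the commits between the two versions.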
I'll try to put these 12 pull requests into four categories: search, SMP, new heuristic, tuning:
Add bonuses for minors attacking enemy pieces *new heuristic*
Tune trapped rook penalty *tuning*
Double mg bonus and half eg bonus *tuning*
King-pawn threat bonus for endgames *new heuristic*
Evaluate king safety when no queen is present *tuning*
Change history reduction in LMR to be a full ply *search*
Remove use of half-ply reductions *search*
Add bonuses for each threat instead of max threat value *tuning*
Be more optimistic in aspiration window *search*
Halve StormDanger bonus for blocked pawn on A/H file *new heuristic*
Avoid searching TT twice for the same key/position... *search*
Big King Safety tuning *tuning*
The tally is:
* search 4
* SMP 0
* new heuristic 3
* tuning 5
Only in the tuning bucket is the 'reason' for improvements usually unknown.
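The tally can be recomputed from the labels above with a few lines. The labels are copied from the categorized list; `labels` and `tally` are names invented for this sketch.

```python
from collections import Counter

# Category assigned to each of the 12 pull requests, in the order listed above.
labels = [
    "new heuristic",  # Add bonuses for minors attacking enemy pieces
    "tuning",         # Tune trapped rook penalty
    "tuning",         # Double mg bonus and half eg bonus
    "new heuristic",  # King-pawn threat bonus for endgames
    "tuning",         # Evaluate king safety when no queen is present
    "search",         # Change history reduction in LMR to be a full ply
    "search",         # Remove use of half-ply reductions
    "tuning",         # Add bonuses for each threat instead of max threat value
    "search",         # Be more optimistic in aspiration window
    "new heuristic",  # Halve StormDanger bonus for blocked pawn on A/H file
    "search",         # Avoid searching TT twice for the same key/position...
    "tuning",         # Big King Safety tuning
]

tally = Counter(labels)

# Counter returns 0 for a missing key, so the empty SMP bucket prints as 0.
for bucket in ("search", "SMP", "new heuristic", "tuning"):
    print(f"* {bucket} {tally[bucket]}")
```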
-Carl
Re: Why is SF6 so much stronger?
clumma wrote: The tally is:
* search 4
* SMP 0
* new heuristic 3
* tuning 5
To reliably verify that a patch is stronger than default requires a lot of resources; to reliably measure how much stronger it is requires even more.
We, in SF, consciously gave up on knowing the second answer and rely only on the first for development (this was done to optimize the use of resources/time and reduce the queue time for submitted tests, which is important to keep the "momentum" going).
That is the only fact; any other argument may be interesting, but only as discussion.
Re: Why is SF6 so much stronger?
mcostalba wrote: To reliably verify that a patch is stronger than default requires a lot of resources, reliably measuring how much it is stronger requires even more resources.
Of course. And the verification dynamically stops when a certain LLR is reached, correct? And the sooner this happens, the bigger the Elo improvement is likely to be?
I saw results like these in the commit logs:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 7647 W: 1356 L: 1214 D: 5077
and basically just took the number of wins, assuming that the smaller the number, the bigger the Elo difference of the change. I sometimes also estimated the win/loss ratio by eye. Am I making wrong assumptions here?
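For reference, a naive way to turn a W/L/D line like the one above into an Elo figure is the standard logistic conversion of the score, with a normal-approximation error margin. This is only a sketch: it treats the games as a fixed-size sample and ignores the fact that an SPRT stops at a data-dependent point, so it is not the method the Stockfish testing framework uses and the result is biased. The function names are made up here.

```python
import math


def score_to_elo(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def naive_elo_estimate(wins, losses, draws, z=1.96):
    """Return (low, point, high) Elo from W/L/D, ignoring SPRT stopping bias.

    z=1.96 gives an approximate 95% interval under a normal approximation.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (win=1, draw=0.5, loss=0) around the mean.
    variance = (
        wins * (1.0 - score) ** 2
        + losses * (0.0 - score) ** 2
        + draws * (0.5 - score) ** 2
    ) / games
    margin = z * math.sqrt(variance / games)
    return (
        score_to_elo(score - margin),
        score_to_elo(score),
        score_to_elo(score + margin),
    )


# W/L/D copied from the log line quoted in the post.
low, point, high = naive_elo_estimate(wins=1356, losses=1214, draws=5077)
print(f"Elo ~ {point:.1f} (approx. 95% interval {low:.1f} to {high:.1f})")
```

The interval is several Elo wide on each side of the point estimate, which already suggests that a single test result says little about the exact size of a patch's gain.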
Second question: Is the cause of the Elo regression with version cd065dd known? (top graph at http://tests.stockfishchess.org/regression ) Or is it just measurement error? The changes in that version look innocuous:
https://github.com/zamar/Stockfish/comp ... ...cd065dd
-Carl
Re: Why is SF6 so much stronger?
clumma wrote: Am I making wrong assumptions here?
Yes, your assumption can be very misleading. Although it is true that a test of a patch with a big Elo advantage stops earlier (on average) than a test of a patch with a small advantage, the SPRT statistic is complex: given two tests, one of which stops earlier than the other, very little can be said about the absolute Elo value of the two patches. So it is better not to make assumptions.
clumma wrote: Or is it just measurement error?
Yes, it is just measurement error; there is nothing in the patch that could regress. Testing patches is _not_ easy or trivial. It took SF a long process to reach a reliable testing methodology, and even today we occasionally get a wrong result: a bad patch that looks good (rarely) or a good patch that looks bad (more often; this too is a conscious trade-off).
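As background on the stopping rule, the (-2.94,2.94) pair in the log line quoted earlier in the thread is consistent with Wald's SPRT bounds for error rates alpha = beta = 0.05. That these are the error rates actually configured for that test is an assumption; the sketch below only shows the textbook bound formula, and `sprt_bounds` is a name invented for it.

```python
import math


def sprt_bounds(alpha, beta):
    """Wald's approximate SPRT stopping bounds on the log-likelihood ratio.

    alpha: accepted probability of passing a patch that is not better.
    beta:  accepted probability of rejecting a patch that is better.
    The test stops and rejects when LLR <= lower, accepts when LLR >= upper.
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper


# Assumed error rates; they reproduce the bounds shown in the quoted log line.
lower, upper = sprt_bounds(alpha=0.05, beta=0.05)
print(f"({lower:.2f},{upper:.2f})")
```

With nonzero error rates on both sides, occasional false positives and false negatives of the kind described above are expected by design.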
Re: Why is SF6 so much stronger?
Thanks. Back to the drawing board I guess.