Vinvin wrote: ↑Sat Feb 19, 2022 12:35 pm
I hope dev tests at 10s+0.1s will stop. Too short a TC can drive SF in the wrong direction, as this test shows.
Maybe the minimum time control should be 30s+0.3s.
I downloaded it and have to say I am intrigued.
It found and stayed with certain moves a lot faster than the version from a few days ago. Again, I just pull up a special file in Chessbase, open both... and go from position to position, observing for, say, 10 seconds to several minutes, usually with 1 or 2 PVs. If/when they move on from this, I think I may keep it close to me, because it clearly 'works' better on some positions.
The 10.2.2022 SF dev was able to solve some difficult positions at lower depths and in less time. Now the 17.2.2022 dev is unable to do so.
I guess the changes after 10th Feb affected the analytical capability of SF.
1. 2b1rk2/5p2/p1P5/2p2P2/2p5/7B/P7/2KR4 w - - 0 1
2. 6br/1KNp1n1r/2p2p2/P1ppRP2/1kP3pP/3PBB2/PN1P4/8 w - - 0 1
3. r5k1/p1pb2bp/3p3r/P1pPp1p1/2B1Pq2/1R2QPp1/1P4PP/5RBK b - - 0 1
4. 4q1kr/p6p/1prQPppB/4n3/4P3/2P5/PP2B2P/R5K1 w - - 0 1
5. 8/8/4kpp1/3p1b2/p6P/2B5/6P1/6K1 b - - 0 47 (Shirov's brilliant Bh3 move position)
Results in test suites are not a very good measure of improved results in games, and vice versa. But I guess it makes some sense: (lots) more extensions and double extensions can, I think, cause a move to get 'frozen in', as it were, if the score rises with more depth, or simply because the PV move is always tried first, or because of time issues (in games), or maybe transposition table size. Double extensions everywhere could maybe counter that a bit, but they are risky if you spend them in a nonsense portion of the search tree, and they make search explosions harder to control, etc.
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
Eelco de Groot wrote: ↑Sun Feb 20, 2022 9:54 am
Results in test suites are not a very good measure of improved results in games, and vice versa. But I guess it makes some sense: (lots) more extensions and double extensions can, I think, cause a move to get 'frozen in', as it were, if the score rises with more depth, or simply because the PV move is always tried first, or because of time issues (in games), or maybe transposition table size. Double extensions everywhere could maybe counter that a bit, but they are risky if you spend them in a nonsense portion of the search tree, and they make search explosions harder to control, etc.
There should be some test positions (maybe not from this test suite) where the 17.02 Stockfish performs better than the 10.02 Stockfish; otherwise there is no way it can gain Elo (unless the Elo improvement comes from better time management or from some learning that changes parameters during the game, and that is not the case).
bmp1974 wrote: ↑Sun Feb 20, 2022 8:05 am
The 10.2.2022 SF dev was able to solve some difficult positions at lower depths and in less time. Now the 17.2.2022 dev is unable to do so.
I guess the changes after 10th Feb affected the analytical capability of SF.
1. 2b1rk2/5p2/p1P5/2p2P2/2p5/7B/P7/2KR4 w - - 0 1
2. 6br/1KNp1n1r/2p2p2/P1ppRP2/1kP3pP/3PBB2/PN1P4/8 w - - 0 1
3. r5k1/p1pb2bp/3p3r/P1pPp1p1/2B1Pq2/1R2QPp1/1P4PP/5RBK b - - 0 1
4. 4q1kr/p6p/1prQPppB/4n3/4P3/2P5/PP2B2P/R5K1 w - - 0 1
5. 8/8/4kpp1/3p1b2/p6P/2B5/6P1/6K1 b - - 0 47 (Shirov's brilliant Bh3 move position)
Since Shirov's Bh3 is mentioned and the others are not, here are the sources of the positions, with corrected halfmove and fullmove numbers:
2b1rk2/5p2/p1P5/2p2P2/2p5/7B/P7/2KR4 w - - 1 4
Vasily Smyslov's study from 1938, position before 4.- f6. https://yacpdb.org/#277332
r5k1/p1pb2bp/3p3r/P1pPp1p1/2B1Pq2/1R2QPp1/1P4PP/5RBK b - - 3 28
Aaron Summerscale (0-1) Gawain Jones. British Championship (2009). Position before 28.- ..., Rxh2+. https://www.chessgames.com/perl/chessgame?gid=1552664
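The halfmove clock and the fullmove number are the fifth and sixth fields of a FEN string. As a minimal sketch, assuming only the standard six space-separated fields (the helper name `fen_counters` is made up for illustration), they can be read back from the two corrected strings like this:

```python
def fen_counters(fen: str) -> tuple[str, int, int]:
    """Return (side to move, halfmove clock, fullmove number) from a FEN string."""
    fields = fen.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 FEN fields, got {len(fields)}")
    side_to_move = fields[1]
    halfmove_clock = int(fields[4])   # plies since the last capture or pawn move
    fullmove_number = int(fields[5])  # starts at 1, incremented after Black moves
    return side_to_move, halfmove_clock, fullmove_number


smyslov = "2b1rk2/5p2/p1P5/2p2P2/2p5/7B/P7/2KR4 w - - 1 4"
jones = "r5k1/p1pb2bp/3p3r/P1pPp1p1/2B1Pq2/1R2QPp1/1P4PP/5RBK b - - 3 28"

print(fen_counters(smyslov))  # ('w', 1, 4)
print(fen_counters(jones))    # ('b', 3, 28)
```

This only splits the string; it does not check that the board field itself is legal.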
Eelco de Groot wrote: ↑Fri Feb 18, 2022 10:45 am
Suppose that would be somewhere around 5 minutes per game. Total of 4,000,000 (4 million) minutes = about 66,667 hours. Say 20 Watt per core, that is 1333 1/3 kWh. It is going to be expensive if you do that in Europe, with current electricity prices.
800,000 games is only the upper limit.
The actual number of games played was 4400.
Fishtest works differently. It doesn't run a fixed number of games to measure the Elo difference. Instead, it stops testing as soon as superiority has been shown to be likely enough.
Also, very often, authors and maintainers pause or terminate a test run if they feel it won't pass.
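To illustrate the "stop as soon as it is decided" idea, here is a textbook Wald SPRT on decisive games only. This is a simplified sketch, not Fishtest's code: the real test is formulated in Elo and accounts for draws, and the function name `sprt` and the hypotheses `p0`/`p1` are assumptions chosen for the example.

```python
import math


def sprt(results, p0=0.50, p1=0.55, alpha=0.05, beta=0.05):
    """Wald SPRT on a stream of decisive games (True = win, False = loss).

    H0: win probability is p0; H1: win probability is p1.
    Returns (verdict, games_used).
    """
    upper = math.log((1 - beta) / alpha)  # crossing this bound accepts H1
    lower = math.log(beta / (1 - alpha))  # crossing this bound accepts H0
    llr = 0.0
    n = 0
    for n, win in enumerate(results, start=1):
        # Add the log-likelihood ratio of this single game.
        llr += math.log(p1 / p0) if win else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "undecided", n


print(sprt([True] * 100))   # ('accept H1', 31)
print(sprt([False] * 100))  # ('accept H0', 28)
```

With alpha = beta = 0.05 the bounds are about +2.94 and -2.94, and a one-sided stream is decided after a few dozen games instead of a fixed number.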
20 W per core is also a very high estimate. 95% of Fishtest machines run at ridiculously slow nps, maybe 3 W per core.
People who own power-hungry 20 W per core machines are very unlikely to donate CPU time.
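For comparison, the two estimates can be put side by side. A rough sketch, assuming one core is busy for the whole game and that the 5 minutes per game from the quoted estimate also applies to the 4400 games (the function name `energy_kwh` is only illustrative):

```python
def energy_kwh(games: int, minutes_per_game: float, watts_per_core: float) -> float:
    """Energy in kWh, assuming one core is busy for the whole game."""
    hours = games * minutes_per_game / 60
    return hours * watts_per_core / 1000


upper_bound = energy_kwh(800_000, 5, 20)  # figures from the quoted estimate
actual = energy_kwh(4_400, 5, 3)          # games actually played, 3 W per core

print(f"{upper_bound:.1f} kWh")  # 1333.3 kWh
print(f"{actual:.1f} kWh")       # 1.1 kWh
```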
Hi Yuri, yes, I know that a short SPRT was used for the final test of the tuning. But I don't think you could do that for the tuning itself? Because then you don't test against master, but each time against a previous iteration. So that will take much longer than 4400 games (but not 400000 games).
Beyond that, it was only a very simple guess at the order of magnitude of the electricity use. I just used the number of 84 W TDP for my very new four cores only i5 4440 (https://nl.hardware.info/artikel/4992/i ... e-haswells). It's from 2013. My electricity bills have gone down a lot since my 2005 single-core Athlon died, I don't know exactly why. The i5 is not usually running on all four cores, but mainly it may be that my old computer was never cleaned of dust in 16 years. I don't know what else; otherwise it must have been the video card and/or the Soundblaster that made it seemingly use much more than my new refurbished Dell.
3(!) W. I did not know that! That seems very low. Is there no compensation for slow NPS, then? I think they do that at CCRL, but I don't know the details for Fishtest. I could look it up, it must be somewhere, but it was just a very rough estimate.
Eelco de Groot wrote: ↑Mon Feb 21, 2022 8:43 pm
3(!) W. I did not know that! That seems very low. Is there no compensation for slow NPS, then?
Old Zen1 Ryzen gives 1.20 MNps in fishtest per 10W PPT consumption
Newer Zen2 gives 1.60 MNps in fishtest per 5W PPT consumption
At the moment, 71 machines in Fishtest run at 1.20 MNps or higher.
Only 10 machines run at 1.60 MNps or higher.
171 more machines run at less than 1.20 MNps (of those, 131 machines run at less than 1 MNps).
CPUs drawing 10+ W (per Fishtest core!) do exist on the market, but it is highly unlikely that their owners will donate such CPUs to Fishtest. At the moment maybe 5 out of 240 machines draw 10+ W per worker.
I just used the number of 84 W TDP for my very new four cores only i5 4440
Despite its 84 W thermal design specification, under chess load it tops out at 27.6 W (7 W per Fishtest worker).
Under a special burner load (FPU + GPU) it tops out at 66 W.
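The same figures as throughput per watt, in a small sketch. It assumes the quoted PPT numbers are watts per Fishtest core and that the i5-4440 runs 4 workers (one per core); neither is stated outright in the posts.

```python
# (MNps per Fishtest core, watts per core) as quoted; "per core" is an assumption
cpus = {"Zen1": (1.20, 10.0), "Zen2": (1.60, 5.0)}

efficiency = {name: mnps / watts for name, (mnps, watts) in cpus.items()}
for name, value in efficiency.items():
    print(f"{name}: {value:.2f} MNps per watt")

# i5-4440: 27.6 W under chess load, spread over an assumed 4 workers
i5_watts_per_worker = 27.6 / 4
print(f"i5-4440: {i5_watts_per_worker:.1f} W per worker")
```

On these numbers Zen2 does a bit under three times the work per watt of Zen1, and the i5 comes out at about 6.9 W per worker, which matches the rounded 7 W.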