Re: New Tool
Posted: Sat Mar 14, 2020 8:41 am
by Rebel
Ferdy wrote: ↑Fri Mar 13, 2020 11:55 pm
Rebel wrote: ↑Fri Mar 13, 2020 8:32 pm
Thanks Ferdy, I will give it a try, but isn't it more precise to involve the solution time in the formula? For example, running a set at 5000ms, an engine that finds the best move at 200ms (and is constant) should receive more points than an engine that finds the move at 4700ms.
That is possible indeed, but these engines are also capable of changing their best move. They may like m1 at 200ms, but m2 at 4700ms. One idea is just to test engines at a lower time of, say, 200ms; it would then become a ranking of engines at that particular time on a particular system.
It's my experience that many engines don't handle go movetime correctly below one second: many take too much time, a few others move too fast. Hence I always test at 1000ms or more for reliable results. And even then there is no 100% guarantee; when I ran the latest Lc0 with MT=60000 it moved much too fast, playing at 32 seconds instead of one minute.
Re: New Tool
Posted: Sat Mar 14, 2020 9:11 am
by Ferdy
Rebel wrote: ↑Sat Mar 14, 2020 8:41 am
Ferdy wrote: ↑Fri Mar 13, 2020 11:55 pm
Rebel wrote: ↑Fri Mar 13, 2020 8:32 pm
Thanks Ferdy, I will give it a try, but isn't it more precise to involve the solution time in the formula? For example, running a set at 5000ms, an engine that finds the best move at 200ms (and is constant) should receive more points than an engine that finds the move at 4700ms.
That is possible indeed, but these engines are also capable of changing their best move. They may like m1 at 200ms, but m2 at 4700ms. One idea is just to test engines at a lower time of, say, 200ms; it would then become a ranking of engines at that particular time on a particular system.
It's my experience that many engines don't handle go movetime correctly below one second: many take too much time, a few others move too fast. Hence I always test at 1000ms or more for reliable results. And even then there is no 100% guarantee; when I ran the latest Lc0 with MT=60000 it moved much too fast, playing at 32 seconds instead of one minute.
Regarding Lc0 moving too fast: you can set the option SmartPruningFactor to 0 so that it uses the full search time.
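For scripted runs the option can be set over UCI before the search starts. A minimal sketch of the command sequence (the helper name `uci_setup` is my own; Lc0 spells the UCI option `SmartPruningFactor`):

```python
def uci_setup(movetime_ms, smart_pruning_factor=0):
    """Build a UCI command sequence that disables Lc0's smart pruning,
    so the engine searches for the full movetime instead of stopping
    early when the best move is unlikely to change."""
    return [
        "uci",
        f"setoption name SmartPruningFactor value {smart_pruning_factor}",
        "isready",
        "position startpos",
        f"go movetime {movetime_ms}",
    ]

for cmd in uci_setup(60000):
    print(cmd)
```

These lines would then be written to the engine's stdin by whatever harness drives the test.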
Re: New Tool
Posted: Sun Mar 15, 2020 10:28 am
by Rebel
Ferdy wrote: ↑Fri Mar 13, 2020 12:05 pm
Nice tool, but I would like to request a scoring feature that uses the performance-rate percentage as points. For example, if score1 is 100cp and score2 is 80cp, we may use something like:
Code:

def perf(cp):
    K = 0.7  # SF
    pr = 100 * 1 / (1 + 10 ** (-K * cp / 400))
    return pr
where K can be adjusted depending on the engine.
for cp = 100
pr = 59.94% or say 60 points
for cp = 80
pr = 47.95% or say 48 points
epd c0 "m1=60, m2=48 ...";
I tried your formula, but it didn't work for me. Looking at the MEA logfile, everything needed to include solution times in the calculation formula is present there.
Re: New Tool
Posted: Sun Mar 15, 2020 11:31 am
by Ferdy
Rebel wrote: ↑Sun Mar 15, 2020 10:28 am
Ferdy wrote: ↑Fri Mar 13, 2020 12:05 pm
Nice tool, but I would like to request a scoring feature that uses the performance-rate percentage as points. For example, if score1 is 100cp and score2 is 80cp, we may use something like:
Code:

def perf(cp):
    K = 0.7  # SF
    pr = 100 * 1 / (1 + 10 ** (-K * cp / 400))
    return pr
where K can be adjusted depending on the engine.
for cp = 100
pr = 59.94% or say 60 points
for cp = 80
pr = 47.95% or say 48 points
epd c0 "m1=60, m2=48 ...";
I tried your formula, but it didn't work for me.
Code:

cp = 100
K = 0.7
a = -K*cp/400 = -0.7*100/400 = -0.175
pr = 100/(1 + 10**a) = 100/(1 + 10**(-0.175)) = 100/(1 + 0.668) = 100/1.668 = 59.95, or 60
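The hand check above can be reproduced with a runnable version of the quoted `perf` function. Note that the 47.95 figure given for cp = 80 earlier in the thread appears to be computed from the difference with the best move (80 − 100 = −20) rather than from 80 directly, since perf(80) itself comes out near 58:

```python
def perf(cp, K=0.7):
    """Convert a centipawn score to an expected-score percentage.

    Logistic (Elo-style) curve; K is an engine-dependent scaling
    constant (0.7 was suggested for Stockfish)."""
    return 100 / (1 + 10 ** (-K * cp / 400))

print(round(perf(100), 2))  # ~59.94, i.e. roughly 60 points
print(round(perf(-20), 2))  # ~47.99, matching the ~48 points for the 80cp move
print(perf(0))              # 50.0: an even score maps to 50%
```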
Re: New Tool
Posted: Fri Mar 20, 2020 4:54 pm
by Rebel
UPDATE
I am working on version 1.1. It will support MultiPV 1-8 and include the time at which key moves are found in the formula that calculates the (bonus) points for positions. To give an impression, I created a small (10 positions), not-so-hard tactical set and let 11 engines run it at 10 seconds per move.
Version 1.0 will give:
Code:
EPD : epd\eddy.epd
Time : 10000ms
Top Top Max Total Time Hash
Engine Score Rating Hits Pos Rate Score Rate ms Mb Cpu
1 Stockfish 11 8 3200 8 10 0.800 10 0.800 10000 128 1
2 Xiphos 0.6 7 2799 7 10 0.700 10 0.700 10000 128 1
3 Andscacs 0.95 7 2799 7 10 0.700 10 0.700 10000 128 1
4 Wasp 3.75 6 2399 6 10 0.600 10 0.600 10000 128 1
5 Komodo 10 5 2000 5 10 0.500 10 0.500 10000 128 1
6 Ethereal 12 4 1600 4 10 0.400 10 0.400 10000 128 1
7 Laser 1.7 4 1600 4 10 0.400 10 0.400 10000 128 1
8 rofChade 2.2 4 1600 4 10 0.400 10 0.400 10000 128 1
9 Fire 7.1 4 1600 4 10 0.400 10 0.400 10000 128 1
10 RubiChess 1.4 2 800 2 10 0.200 10 0.200 10000 128 1
11 Arasan 21.3 2 800 2 10 0.200 10 0.200 10000 128 1
Created with MEA by Ferdinand Mosca
Stockfish is the winner with 8/10, Andscacs and Xiphos following with 7 found positions.
With (beta) version 1.1, which includes time in the formula, a key move found at 350ms now receives many more points than a key move found at 8000ms:
Code:
EPD : epd\eddy.epd
Time : 10000ms
Top Top Max Total Time Hash
Engine Score Rating Hits Pos Rate Score Rate ms Mb Cpu
1 Andscacs 0.95 210 2799 7 10 0.700 300 0.700 10000 128 1
2 Stockfish 11 155 2068 8 10 0.800 300 0.517 10000 128 1
3 Komodo 10 150 2000 5 10 0.500 300 0.500 10000 128 1
4 Xiphos 0.6 132 1760 7 10 0.700 300 0.440 10000 128 1
5 Fire 7.1 120 1600 4 10 0.400 300 0.400 10000 128 1
6 rofChade 2.2 120 1600 4 10 0.400 300 0.400 10000 128 1
7 Wasp 3.75 119 1588 6 10 0.600 300 0.397 10000 128 1
8 Ethereal 12 102 1360 4 10 0.400 300 0.340 10000 128 1
9 Laser 1.7 95 1268 4 10 0.400 300 0.317 10000 128 1
10 Arasan 21.3 51 680 2 10 0.200 300 0.170 10000 128 1
11 RubiChess 1.4 32 427 2 10 0.200 300 0.107 10000 128 1
Created with MEA by Ferdinand Mosca
Andscacs is on top now even though it found fewer key moves, because it solved them more quickly than Stockfish; for the same reason Komodo is only 5 points below Stockfish for second place.
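The totals are consistent with a per-position score of 10 base points plus a time bonus of up to 20 (the maximum is 300 for 10 positions, and a later post in this thread describes exactly that 10 + 0-20 split). The linear decay below is my own assumption; MEA's actual bonus curve is not documented here:

```python
def position_points(solved, time_ms, movetime_ms, base=10, bonus_max=20):
    """Points for one position: a base for finding the key move plus a
    time bonus that shrinks linearly as the solution time approaches
    the full movetime. The linear shape is an assumption, not MEA's
    documented formula."""
    if not solved:
        return 0
    return base + bonus_max * (1 - time_ms / movetime_ms)

# An instant solve is worth 30 points, a last-moment solve only 10:
print(position_points(True, 0, 10000))      # 30.0
print(position_points(True, 10000, 10000))  # 10.0
```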
The 10 positions for MEA:
Code:
r1bk1n1r/pp1n1q1p/2p2p1R/3p4/3PpN2/2NB2Q1/PPP2PP1/2K1R3 w - - bm Bxe4; c0 "Bxe4=1";
2rr2k1/1pqbbppp/p3p3/4p3/2P5/2N1P1P1/PP2QPBP/2RR2K1 w - - bm c5; c0 "c5=1";
r1b1k2r/p2n1ppp/1p2p3/3p2B1/P2P4/1Nn1P3/5PPP/R3KB1R w KQkq - bm f3; c0 "f3=1";
r3qrk1/4bppp/4p3/p2pP2Q/1p1B4/1PpPP3/P1P2RPP/5RK1 w - - bm Rf6; c0 "Rf6=1";
r1b1kb1r/1p1n1ppp/p2ppn2/6BB/2qNP3/2N5/PPP2PPP/R2Q1RK1 w kq - bm Nxe6; c0 "Nxe6=1";
1br1r1k1/1b1q1pp1/p1p2n1p/1p2n3/1P6/PNN1P3/2QBBPPP/4RRK1 b - - bm c5; c0 "c5=1";
2r1qrk1/pp1b1ppp/4pn2/n1b5/8/2NQ1NP1/PP1BPPBP/R2R2K1 w - - bm b4; c0 "b4=1";
r5k1/2p1b1p1/6bp/p4P2/q3pP2/2PnB2P/PP1N4/KR3Q1R b - - bm Nb4; c0 "Nb4=1";
5R2/2kp4/pr6/1p2P3/1p6/8/1PP2PPP/6K1 b - - bm a5; c0 "a5=1";
1R6/p3k1p1/7p/2b1pP2/P1r3P1/B7/7P/7K w - - bm Rc8; c0 "Rc8=1";
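These records use standard EPD syntax: four FEN-like position fields followed by semicolon-terminated opcodes such as `bm` (best move) and `c0` (a comment string, here carrying the points MEA reads). A minimal parsing sketch (the function name `parse_epd` is my own):

```python
def parse_epd(line):
    """Split an EPD record into its position string and an opcode dict."""
    fields = line.split()
    position = " ".join(fields[:4])  # placement, side to move, castling, ep
    ops = {}
    for op in " ".join(fields[4:]).split(";"):
        op = op.strip()
        if not op:
            continue
        name, _, value = op.partition(" ")
        ops[name] = value.strip().strip('"')  # drop quotes around c0 strings
    return position, ops

pos, ops = parse_epd('5R2/2kp4/pr6/1p2P3/1p6/8/1PP2PPP/6K1 b - - bm a5; c0 "a5=1";')
print(ops["bm"])  # a5
print(ops["c0"])  # a5=1
```

Note this simple split assumes opcode values themselves contain no semicolons, which holds for the records above.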
Re: New Tool
Posted: Tue Mar 24, 2020 5:41 am
by Rebel
I have given up on STS; it's not only outdated, it can't be updated into something useful.
STS in its original form:
Code:
EPD : epd\sts.epd
Time : 1000ms
Top Top Max Total Time Hash
Engine Score Rating Hits Pos Rate Score Rate ms Mb Cpu
1 Bouquet 1.5 35920 3192 1226 1500 0.817 45000 0.798 1000 128 1
2 Houdini 1.5 35869 3188 1223 1500 0.815 45000 0.797 1000 128 1
3 Stockfish 11 35124 3124 1208 1500 0.805 45000 0.781 1000 128 1
4 Komodo 10 34622 3076 1196 1500 0.797 45000 0.769 1000 128 1
5 Laser 1.7 34156 3036 1162 1500 0.775 45000 0.759 1000 128 1
6 Ethereal 1.2 33795 3004 1153 1500 0.769 45000 0.751 1000 128 1
7 Xiphos 0.6 33193 2951 1133 1500 0.755 45000 0.738 1000 128 1
8 rofChade 2.2 32768 2911 1119 1500 0.746 45000 0.728 1000 128 1
9 Wasp 3.75 32040 2847 1089 1500 0.726 45000 0.712 1000 128 1
10 Arasan 21.3 30190 2684 1040 1500 0.693 45000 0.671 1000 128 1
11 RubiChess 1.4 29900 2656 1024 1500 0.683 45000 0.664 1000 128 1
12 Fire 7.1 29310 2604 1019 1500 0.679 45000 0.651 1000 128 1
Okay, I already knew that a couple of years ago: the set was tuned with the Rybka family and its derivatives Houdini and Bouquet, the then strongest engines, and it can't be right that programs 300-400 Elo weaker outperform Stockfish and Komodo.
Code:
EPD : sts-sf11
Time : 1000ms
Top Top Max Total Time Hash
Engine Score Rating Hits Pos Rate Score Rate ms Mb Cpu
1 Houdini 1.5 30125 2676 1026 1500 0.684 45000 0.669 1000 128 1
2 Komodo 10 29570 2628 1023 1500 0.682 45000 0.657 1000 128 1
3 Xiphos 0.6 29569 2628 1011 1500 0.674 45000 0.657 1000 128 1
4 Ethereal 1.2 29103 2588 992 1500 0.661 45000 0.647 1000 128 1
5 Bouquet 1.5 29080 2584 992 1500 0.661 45000 0.646 1000 128 1
6 Laser 1.7 28806 2560 980 1500 0.653 45000 0.640 1000 128 1
7 rofChade 2.2 28693 2552 981 1500 0.654 45000 0.638 1000 128 1
8 Wasp 3.75 27597 2451 938 1500 0.625 45000 0.613 1000 128 1
9 Arasan 21.3 26379 2343 909 1500 0.606 45000 0.586 1000 128 1
10 RubiChess 1.4 25996 2311 889 1500 0.593 45000 0.578 1000 128 1
11 Fire 7.1 25731 2287 895 1500 0.597 45000 0.572 1000 128 1
Running TSC, allowing SF11 at 20 cores with MultiPV=4 at 60,000ms (1 minute) to create a more reliable points distribution, did not help either: Houdini 1.5 still tops, Bouquet is fifth.
It not only shows that STS was tuned with the Rybka family engines, but also that the positions were chosen to fit the wished-for outcome. I will no longer support it; it's beyond hope, although it will have value for starters.
----------------------------------------------------------------------------------------------------------
I switched to Jon's Arasan test suite. That helped:
Code:
EPD : epd\arasan.epd
Time : 1000ms
Top Top Max Total Time Hash
Engine Score Rating Hits Pos Rate Score Rate ms Mb Cpu
1 Stockfish 11 1684 1143 62 196 0.316 5880 0.286 1000 128 1
2 Xiphos 0.6 1335 908 48 196 0.245 5880 0.227 1000 128 1
3 Ethereal 1.2 672 456 24 196 0.122 5880 0.114 1000 128 1
4 Komodo 10 623 423 22 196 0.112 5880 0.106 1000 128 1
5 Wasp 3.75 619 419 22 196 0.112 5880 0.105 1000 128 1
6 Houdini 1.5 528 355 19 196 0.096 5880 0.089 1000 128 1
7 Fire 7.1 477 324 17 196 0.086 5880 0.081 1000 128 1
8 Laser 1.7 460 311 16 196 0.081 5880 0.078 1000 128 1
9 rofChade 2.2 393 264 14 196 0.071 5880 0.066 1000 128 1
10 Arasan 21.3 267 179 10 196 0.051 5880 0.045 1000 128 1
11 RubiChess 1.4 252 168 9 196 0.045 5880 0.042 1000 128 1
And at 5000ms (5 seconds):
Code:
EPD : epd\arasan.epd
Time : 5000ms
Top Top Max Total Time Hash
Engine Score Rating Hits Pos Rate Score Rate ms Mb Cpu
1 Stockfish 11 2556 1739 109 196 0.556 5880 0.435 5000 128 1
2 Xiphos 0.6 1828 1243 75 196 0.383 5880 0.311 5000 128 1
3 Komodo 10 1521 1036 60 196 0.306 5880 0.259 5000 128 1
4 Ethereal 1.2 1086 739 46 196 0.235 5880 0.185 5000 128 1
5 Houdini 1.5 1067 723 45 196 0.230 5880 0.181 5000 128 1
6 Wasp 3.75 948 644 40 196 0.204 5880 0.161 5000 128 1
7 rofChade 2.2 942 640 42 196 0.214 5880 0.160 5000 128 1
8 Laser 1.7 829 563 38 196 0.194 5880 0.141 5000 128 1
9 Fire 7.1 675 460 29 196 0.148 5880 0.115 5000 128 1
10 Arasan 21.3 511 343 24 196 0.122 5880 0.086 5000 128 1
11 RubiChess 1.4 322 215 13 196 0.066 5880 0.054 5000 128 1
Re: New Tool
Posted: Tue Mar 24, 2020 9:44 am
by Dann Corbit
Re:"Okay, I aready knew that a couple of years ago, the set is tuned with the Rybka family and its derivatives Houdini, Bouquet, the then strongest engines and it can't be that those 300-400 less programs outperform Stockfish and Komodo."
I don't think I ever used Bouquet.
At the start of the test, Rybka was used, but Houdini did not exist yet.
As soon as Komodo and Stockfish were at least the third strongest engines, they were used.
When I first started, the strongest engine available was Rybka. I did not have a 64-bit OS, so it was the 32-bit version that I used. About the middle of the test set, Rybka was dropped because it was no longer among the three strongest.
My formula for the analysis was to use the top three engines at one hour each for each position. It spanned a duration of 6 years or so. Hence the hardware and software were both on the order of 64 times stronger at the end of the test formation than at the beginning.
If the methodology of your test improvement system is correct, it should work for any random collection of positions.
Re: New Tool
Posted: Tue Mar 24, 2020 12:15 pm
by Terje
@Rebel You probably mean Ethereal 12, not 1.2?
Re: New Tool
Posted: Tue Mar 24, 2020 4:45 pm
by Rebel
Yep.
Re: New Tool
Posted: Tue Mar 24, 2020 5:27 pm
by Rebel
Dann Corbit wrote: ↑Tue Mar 24, 2020 9:44 am
Re:"Okay, I aready knew that a couple of years ago, the set is tuned with the Rybka family and its derivatives Houdini, Bouquet, the then strongest engines and it can't be that those 300-400 less programs outperform Stockfish and Komodo."
I don't think I ever used Bouquet.
At the start of the test, Rybka was used, but Houdini did not exist yet.
As soon as Komodo and Stockfish were at least the third strongest engines, they were used.
When I first started, the strongest engine available was Rybka. I did not have a 64-bit OS, so it was the 32-bit version that I used. About the middle of the test set, Rybka was dropped because it was no longer among the three strongest.
My formula for the analysis was to use the top three engines at one hour each for each position. It spanned a duration of 6 years or so. Hence the hardware and software were both on the order of 64 times stronger at the end of the test formation than at the beginning.
My admiration for your tireless efforts.
Dann Corbit wrote: ↑Tue Mar 24, 2020 9:44 am
If the methodology of your test improvement system is correct, it should work for any random collection of positions.
Yes, but as with STS, sets created on the basis of today's strongest engines will, I suppose, become outdated within 3-5 years and need refreshing.
What remains are those (mainly tactical) sets that have 100% correct best moves; engines will get 10 points for finding the move plus 0-20 points based on how quickly they find it.