New Tool

Discussion of anything and everything relating to chess playing software and machines.

Moderators: bob, hgm, Harvey Williamson

Rebel
Posts: 4984
Joined: Thu Aug 18, 2011 10:04 am

Re: New Tool

Post by Rebel » Wed Mar 11, 2020 6:58 am

I have been experimenting to see whether the upcoming automatic approach is (somewhat) better than, or at least equal to, the old handcrafted method. I ran Stockfish 1, 2, 3 ... Stockfish 11 on the standard 1500 STS positions, and did a second run on a 1500-position STS set created automatically with Lc0. In principle both runs should list the Stockfish versions in the right order.

STS (standard)

Code:

    EPD  : epd\sts.epd
    Time : 1000ms
                                    Top         Top   Max   Total   Time   Hash    
    Engine           Score Rating  Hits  Pos   Rate  Score   Rate    ms     Mb  Cpu
 1  sf9              13447  3584  1233  1500  0.822  15000  0.896   1000    64    1
 2  sf8              13360  3564  1221  1500  0.814  15000  0.891   1000    64    1
 3  sf10             13335  3556  1218  1500  0.812  15000  0.889   1000    64    1
 4  sf11             13265  3536  1204  1500  0.803  15000  0.884   1000    64    1
 5  sf6              13246  3532  1219  1500  0.813  15000  0.883   1000    64    1
 6  sf7              13107  3495  1190  1500  0.793  15000  0.874   1000    64    1
 7  sf5              13089  3491  1191  1500  0.794  15000  0.873   1000    64    1
 8  sf4              13054  3479  1190  1500  0.793  15000  0.870   1000    64    1
 9  sf3              12704  3387  1154  1500  0.769  15000  0.847   1000    64    1
10  sf2              12458  3323  1110  1500  0.740  15000  0.831   1000    64    1
11  sf1              11879  3168  1040  1500  0.693  15000  0.792   1000    64    1
SF11 at rank 4?

Close but no cigar.
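
Judging from the numbers, the two rate columns are simple ratios: Top Rate = Top Hits / Pos, and Total Rate = Score / Max Score, where Max Score is 10 points per position (1500 x 10 = 15000). The sketch below assumes exactly that (inferred from the table, not taken from the tsc source) and recomputes both rates for a few rows of the standard-STS table.

```python
# Hypothetical re-computation of the tsc rate columns, assuming
# Top Rate = Top Hits / Pos and Total Rate = Score / Max Score.
ROWS = [
    # (engine, score, top_hits, positions), copied from the STS (standard) table
    ("sf9", 13447, 1233, 1500),
    ("sf11", 13265, 1204, 1500),
    ("sf1", 11879, 1040, 1500),
]

POINTS_PER_POSITION = 10  # STS awards at most 10 points per position


def rates(score, top_hits, positions):
    """Return (top_rate, total_rate) rounded to three decimals like the table."""
    max_score = POINTS_PER_POSITION * positions
    return round(top_hits / positions, 3), round(score / max_score, 3)


for engine, score, hits, pos in ROWS:
    top_rate, total_rate = rates(score, hits, pos)
    print(f"{engine:5s} top={top_rate:.3f} total={total_rate:.3f}")
```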

STS (lc0)

Code:

    EPD  : epd\sts-lc0.epd
    Time : 1000ms
                                    Top         Top   Max   Total   Time   Hash    
    Engine           Score Rating  Hits  Pos   Rate  Score   Rate    ms     Mb  Cpu
 1  sf11             12846  3423  1121  1500  0.747  15000  0.856   1000    64    1
 2  sf10             12706  3387  1106  1500  0.737  15000  0.847   1000    64    1
 3  sf9              12572  3351  1088  1500  0.725  15000  0.838   1000    64    1
 4  sf8              12551  3347  1091  1500  0.727  15000  0.837   1000    64    1
 5  sf7              12271  3271  1059  1500  0.706  15000  0.818   1000    64    1
 6  sf6              12248  3267  1053  1500  0.702  15000  0.817   1000    64    1
 7  sf5              12017  3204  1029  1500  0.686  15000  0.801   1000    64    1
 8  sf4              11950  3188  1024  1500  0.683  15000  0.797   1000    64    1
 9  sf3              11626  3100  1004  1500  0.669  15000  0.775   1000    64    1
10  sf2              11468  3060   966  1500  0.644  15000  0.765   1000    64    1
11  sf1              10736  2863   896  1500  0.597  15000  0.716   1000    64    1
Nice!
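
Both runs "should list the Stockfish versions in the right order". One way to put a number on how close each run gets is to count inversions: pairs of versions where the older one scores higher than the newer one. The sketch below does this for the two Score columns above and derives Kendall's tau from the same count; the choice of metric is an illustration, not something tsc reports.

```python
from itertools import combinations

# Score column per Stockfish version, copied from the two tables above.
STS_STANDARD = {1: 11879, 2: 12458, 3: 12704, 4: 13054, 5: 13089, 6: 13246,
                7: 13107, 8: 13360, 9: 13447, 10: 13335, 11: 13265}
STS_LC0 = {1: 10736, 2: 11468, 3: 11626, 4: 11950, 5: 12017, 6: 12248,
           7: 12271, 8: 12551, 9: 12572, 10: 12706, 11: 12846}


def inversions(scores):
    """Count version pairs where the older version outscores the newer one."""
    return sum(1 for old, new in combinations(sorted(scores), 2)
               if scores[old] > scores[new])


def kendall_tau(scores):
    """Kendall's tau between version number and score (assumes no ties)."""
    pairs = len(scores) * (len(scores) - 1) // 2
    discordant = inversions(scores)
    return (pairs - 2 * discordant) / pairs


for name, scores in (("standard", STS_STANDARD), ("lc0", STS_LC0)):
    print(f"{name:8s} inversions={inversions(scores)} tau={kendall_tau(scores):.3f}")
```

With these numbers the standard set has 6 out-of-order pairs (all among SF6 to SF11), while the Lc0 set has none.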

Time to start on the release version; it will take one or two days.
90% of coding is debugging, the other 10% is writing bugs.

Rebel
Posts: 4984
Joined: Thu Aug 18, 2011 10:04 am

Re: New Tool

Post by Rebel » Thu Mar 12, 2020 8:08 am

Release at - http://rebel13.nl/download/tsc.html

Ferdy
Posts: 4196
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: New Tool

Post by Ferdy » Fri Mar 13, 2020 11:05 am

Rebel wrote:
Thu Mar 12, 2020 8:08 am
Release at - http://rebel13.nl/download/tsc.html

Nice tool, but I would like to request a scoring feature that uses the scoring-rate percentage as points. For example, if m1 scores 100cp and m2 scores 80cp, we could use:

Code:

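def perf(cp):
    """Convert a centipawn score (cp) into a scoring-rate percentage (0-100)."""
    K = 0.7  # scaling factor for Stockfish (SF); adjust per engine
    pr = 100 / (1 + 10 ** (-K * cp / 400))  # logistic curve
    return pr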

where K can be adjusted depending on the engine.

for cp = 100
pr = 59.94% or say 60 points

for cp = 80
pr = 57.99% or say 58 points

epd c0 "m1=60, m2=58 ...";
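
The whole proposal can be sketched end to end: take the centipawn score of each MultiPV move, convert it with perf(), and round it to whole points for the c0 field. build_c0 is a hypothetical helper, not part of tsc or MEA, and the moves are just the m1/m2 example from the text. Note that with K = 0.7, 80cp works out to roughly 58 points.

```python
def perf(cp, k=0.7):
    """Scoring rate (0-100) for a centipawn score; k=0.7 is the SF value above."""
    return 100 / (1 + 10 ** (-k * cp / 400))


def build_c0(multipv):
    """Turn [(move, cp), ...] into an EPD c0 opcode of rounded points."""
    parts = [f"{move}={round(perf(cp))}" for move, cp in multipv]
    return 'c0 "' + ", ".join(parts) + '";'


print(build_c0([("m1", 100), ("m2", 80)]))  # c0 "m1=60, m2=58";
```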

I did try rescoring STS but could not find the time to finish it. I use a higher MultiPV of 16 to cover weaker engines that may not find the top 5 moves or so. This method also targets tuning the UCI Elo setting of engines. Example output:

Code:

1qr1k2r/1p2bp2/pBn1p3/P2pPbpp/5P2/2P1QBPP/1P1N3R/R4K2 b k - acd 27; c0 "h4=66, Bd8=54, Rh7=51, Bf8=50, Rf8=50, Kf8=50, Rg8=49, Kd7=49, Bg6=47, gxf4=46, Rd8=45, Bh7=45, d4=43, Bc2=43, Rc7=39, O-O=36"; id "STS(v1) Undermine 007"; Ae "SF11";
In Rc7=39, 39 is the scoring percentage of that move. The top move gets 66 points and the second move gets 54 points. Both could be winning, but the top move, at 66 points, is closer to a win. With this gap in points we can still tell which engine is stronger even though both moves are winning.
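
Scoring an engine against such a c0 field is then a lookup: the engine's best move earns the points listed for it. A minimal sketch, assuming a move missing from c0 earns 0 points (as in STS); parse_c0 and score_move are illustrative names.

```python
import re


def parse_c0(epd):
    """Extract {move: points} from the c0 opcode of an EPD line."""
    match = re.search(r'c0 "([^"]*)"', epd)
    if match is None:
        return {}
    points = {}
    for item in match.group(1).split(","):
        move, _, value = item.strip().partition("=")
        points[move] = int(value)
    return points


def score_move(epd, move):
    """Points for the engine's move; unlisted moves score 0 (assumption)."""
    return parse_c0(epd).get(move, 0)


EPD = ('1qr1k2r/1p2bp2/pBn1p3/P2pPbpp/5P2/2P1QBPP/1P1N3R/R4K2 b k - acd 27; '
       'c0 "h4=66, Bd8=54, Rh7=51, Bf8=50, Rf8=50, Kf8=50, Rg8=49, Kd7=49, '
       'Bg6=47, gxf4=46, Rd8=45, Bh7=45, d4=43, Bc2=43, Rc7=39, O-O=36";')

for move in ("h4", "Bd8", "Rc7", "Qa8"):
    print(move, score_move(EPD, move))
```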

majkelnowaq
Posts: 17
Joined: Fri Aug 10, 2018 2:07 pm
Full name: D.S.

Re: New Tool

Post by majkelnowaq » Fri Mar 13, 2020 6:57 pm

Thanks Rebel for this useful tool (another one from you).
As Ferdy wrote, a "scoring feature, using scoring rate percentage as points" could be optimal.
I mean: if we have 4 moves like 1. +1.10; 2. 0.9; 3. 0.0; 4. -0.5, we can add 10 points to every move to handle negative scores (1. 11.1; 2. 10.9; 3. 10.0; 4. 9.5), then divide each move by the sum of all and round (11.1/41.5 = 0.27; 10.9/41.5 = 0.26; 10/41.5 = 0.24; 9.5/41.5 = 0.23). For very large or very small scores, or mate scores, we would have to figure out something else.
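
The shift-and-normalize idea above fits in a few lines. This sketch follows the post's example (a 10-pawn shift, two decimals) and, like the post, does not handle mate or extreme scores; normalize is an illustrative name.

```python
def normalize(evals, shift=10.0, digits=2):
    """Shift pawn evals so they are all positive, then divide each by their sum.

    Mate scores and very large evals are not handled, as the post notes.
    """
    shifted = [e + shift for e in evals]
    total = sum(shifted)
    return [round(s / total, digits) for s in shifted]


print(normalize([1.10, 0.9, 0.0, -0.5]))  # [0.27, 0.26, 0.24, 0.23]
```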

Another thing is that Stockfish doesn't reach the same depth for every MultiPV move when it is limited by time. Here:
1k1r2r1/1b4p1/p4n1p/1pq1pPn1/2p1P3/P1N2N2/1PB1Q1PP/3R1R1K b - - id "sts-new pos 2 MultiPV=1"; bm Bc6; ce 121; acd 29;
1k1r2r1/1b4p1/p4n1p/1pq1pPn1/2p1P3/P1N2N2/1PB1Q1PP/3R1R1K b - - id "sts-new pos 2 MultiPV=2"; bm Rge8; ce 116; acd 29;
1k1r2r1/1b4p1/p4n1p/1pq1pPn1/2p1P3/P1N2N2/1PB1Q1PP/3R1R1K b - - id "sts-new pos 2 MultiPV=3"; bm Rgf8; ce 127; acd 28;
1k1r2r1/1b4p1/p4n1p/1pq1pPn1/2p1P3/P1N2N2/1PB1Q1PP/3R1R1K b - - id "sts-new pos 2 MultiPV=4"; bm Rh8; ce 125; acd 28;

MultiPV 3 and 4 look "better" than MultiPV 1 (ce 127 and 125 versus 121) because lines 3 and 4 stopped at depth 28. Komodo reaches equal depths, but SF does not. It would help if we could force MultiPV by depth instead of by time. I tried something like set OPTIONS=depth=24 but it doesn't work (it works with MEA, but with tsc it doesn't produce correct output with MultiPV).
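
The depth mismatch can be detected automatically from EPD output like the lines above: read the acd (analysis depth) opcode of each MultiPV line and flag the position when the depths differ. A small sketch, assuming every line carries an acd opcode in the form shown above; opcode_int and depth_spread are illustrative names.

```python
import re

# The four MultiPV lines from the post, all for the same position.
FEN = "1k1r2r1/1b4p1/p4n1p/1pq1pPn1/2p1P3/P1N2N2/1PB1Q1PP/3R1R1K b - -"
LINES = [
    f'{FEN} id "sts-new pos 2 MultiPV=1"; bm Bc6; ce 121; acd 29;',
    f'{FEN} id "sts-new pos 2 MultiPV=2"; bm Rge8; ce 116; acd 29;',
    f'{FEN} id "sts-new pos 2 MultiPV=3"; bm Rgf8; ce 127; acd 28;',
    f'{FEN} id "sts-new pos 2 MultiPV=4"; bm Rh8; ce 125; acd 28;',
]


def opcode_int(epd, name):
    """Integer value of an EPD opcode such as 'acd 29;' or 'ce -48;'."""
    match = re.search(rf"\b{name} (-?\d+);", epd)
    return int(match.group(1)) if match else None


def depth_spread(lines):
    """(min, max) of the acd depths reported for one position's MultiPV lines."""
    depths = [opcode_int(line, "acd") for line in lines]
    return min(depths), max(depths)


low, high = depth_spread(LINES)
if low != high:
    print(f"Unequal MultiPV depths ({low}..{high}); ce values may not be comparable")
```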

More MultiPV lines, like 8, should be considered too. Tsc only has options for MultiPV 1 or 4 when creating EPD tests.

Rebel
Posts: 4984
Joined: Thu Aug 18, 2011 10:04 am

Re: New Tool

Post by Rebel » Fri Mar 13, 2020 7:32 pm

Thanks Ferdy, I will give it a try, but wouldn't it be more precise to include the solution time in the formula? For example, when running a set at 5000ms, an engine that finds the best move at 200ms (and keeps it) should receive more points than an engine that finds it at 4700ms.
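
The idea of involving solution time can be sketched with a purely hypothetical weighting: full points for a solution found instantly, scaled down linearly to a floor as the solution time approaches the time limit. Neither the 50% floor nor the linear shape comes from tsc; they are placeholders to show the idea.

```python
def time_weighted_points(points, solve_ms, limit_ms, floor=0.5):
    """Scale points from 100% (found at 0ms) down to `floor` (found at the limit).

    Hypothetical weighting for illustration only. A move not found within
    the limit (solve_ms is None or too large) gets 0 points here.
    """
    if solve_ms is None or solve_ms > limit_ms:
        return 0.0
    weight = 1.0 - (1.0 - floor) * solve_ms / limit_ms
    return points * weight


# 5000ms per position, best move (10 points) found at 200ms vs 4700ms.
print(round(time_weighted_points(10, 200, 5000), 2))   # 9.8
print(round(time_weighted_points(10, 4700, 5000), 2))  # 5.3
```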

Rebel
Posts: 4984
Joined: Thu Aug 18, 2011 10:04 am

Re: New Tool

Post by Rebel » Fri Mar 13, 2020 7:48 pm

Meanwhile I added mea-compare to the tsc package.

mea-compare can do the following:

1. Produce a text file that lists the EPD lines where an engine failed to find the best move.

Code:

Input  : epd\mats.epd                  
Input  : epd_out\mats_komodo_10.epd           
output.txt created.. Positions 23 | Found 17 | Failed 6

4rnk1/pp1q1ppp/2p4r/3p4/3P4/4P1NP/PPQ2PP1/R3R1K1 w - - c0 "b4=10, a4=8, Rab1=8, Qd1=7";
4rnk1/pp1q1ppp/2p4r/3p4/3P4/4P1NP/PPQ2PP1/R3R1K1 w - - bm Rac1; ce 14; acd 15;

r2q1rk1/pp2bppp/4pn2/3nN1B1/3P4/1B5Q/PP3PPP/3R1RK1 w - - c0 "f4=10, Rfe1=1, Rde1=0, Bc2=0";
r2q1rk1/pp2bppp/4pn2/3nN1B1/3P4/1B5Q/PP3PPP/3R1RK1 w - - bm Rfe1; ce 42; acd 16;

2b1k2r/1p2q1pp/p1p5/P7/3r1P2/1Q1N4/1P1bP1BP/R4RK1 b k - c0 "Be6=10, Rf8=0, Bf5=0, h5=0";
2b1k2r/1p2q1pp/p1p5/P7/3r1P2/1Q1N4/1P1bP1BP/R4RK1 b k - bm Bg4; ce 149; acd 15;

r2q1rk1/1p4pp/4p1n1/pNbpPp2/P1P5/3Q4/1P3PPP/R1B1R1K1 b - - c0 "Qh4=10, f4=23, Bb4=16, dxc4=5";
r2q1rk1/1p4pp/4p1n1/pNbpPp2/P1P5/3Q4/1P3PPP/R1B1R1K1 b - - bm Bb4; ce 60; acd 16;

b7/3knp2/p2p2pb/1p1Pp2p/1B2P1PP/1P1B1P2/P7/5NK1 b - - c0 "hxg4=10, Bc1=5, f6=5, Bf4=0";
b7/3knp2/p2p2pb/1p1Pp2p/1B2P1PP/1P1B1P2/P7/5NK1 b - - bm Bf4; ce -48; acd 19;

2kr3r/pp1n4/2pb2q1/3pp2p/2P5/1B1P1R2/PP1NQ1PP/5RK1 b - - c0 "Nc5=10, e4=7, Nb6=4, Qg4=0";
2kr3r/pp1n4/2pb2q1/3pp2p/2P5/1B1P1R2/PP1NQ1PP/5RK1 b - - bm e4; ce 15; acd 16;

2. Produce a text file that lists the EPD lines where an engine scored zero points.

Code:

Input  : epd\mats.epd                  
Input  : epd_out\mats_komodo_10.epd           
output.txt created.. Positions 23 | Zero point cases 3

4rnk1/pp1q1ppp/2p4r/3p4/3P4/4P1NP/PPQ2PP1/R3R1K1 w - - c0 "b4=10, a4=8, Rab1=8, Qd1=7";
4rnk1/pp1q1ppp/2p4r/3p4/3P4/4P1NP/PPQ2PP1/R3R1K1 w - - bm Rac1; ce 14; acd 15;

2b1k2r/1p2q1pp/p1p5/P7/3r1P2/1Q1N4/1P1bP1BP/R4RK1 b k - c0 "Be6=10, Rf8=0, Bf5=0, h5=0";
2b1k2r/1p2q1pp/p1p5/P7/3r1P2/1Q1N4/1P1bP1BP/R4RK1 b k - bm Bg4; ce 149; acd 15;

b7/3knp2/p2p2pb/1p1Pp2p/1B2P1PP/1P1B1P2/P7/5NK1 b - - c0 "hxg4=10, Bc1=5, f6=5, Bf4=0";
b7/3knp2/p2p2pb/1p1Pp2p/1B2P1PP/1P1B1P2/P7/5NK1 b - - bm Bf4; ce -48; acd 19;
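
Both mea-compare reports boil down to pairing each reference line with the engine's output line for the same position and comparing the engine's bm against the c0 points. The sketch below is not the mea-compare source; it assumes the best move is the first move listed in c0 (consistent with the Qh4 example above, where Bb4 has more points yet still counts as failed) and that a zero-point case is a move missing from c0 or listed with 0.

```python
import re


def fen_key(epd):
    """The first four EPD fields (board, side, castling, en passant) identify a position."""
    return " ".join(epd.split()[:4])


def c0_points(epd):
    """Ordered {move: points} from the c0 opcode; the first move is taken as best."""
    match = re.search(r'c0 "([^"]*)"', epd)
    points = {}
    if match:
        for item in match.group(1).split(","):
            move, _, value = item.strip().partition("=")
            points[move] = int(value)
    return points


def best_move_played(epd):
    """The engine's move from the bm opcode, or None if absent."""
    match = re.search(r"\bbm ([^;]+);", epd)
    return match.group(1).strip() if match else None


def compare(reference, engine_output):
    """Return (failed, zero): lists of (reference line, engine line) pairs."""
    by_position = {fen_key(line): line for line in engine_output}
    failed, zero = [], []
    for ref in reference:
        points = c0_points(ref)
        out = by_position.get(fen_key(ref))
        if not points or out is None:
            continue
        played = best_move_played(out)
        if played != next(iter(points)):
            failed.append((ref, out))
        if points.get(played, 0) == 0:
            zero.append((ref, out))
    return failed, zero


REFERENCE = [
    '4rnk1/pp1q1ppp/2p4r/3p4/3P4/4P1NP/PPQ2PP1/R3R1K1 w - - c0 "b4=10, a4=8, Rab1=8, Qd1=7";',
    'r2q1rk1/pp2bppp/4pn2/3nN1B1/3P4/1B5Q/PP3PPP/3R1RK1 w - - c0 "f4=10, Rfe1=1, Rde1=0, Bc2=0";',
    'b7/3knp2/p2p2pb/1p1Pp2p/1B2P1PP/1P1B1P2/P7/5NK1 b - - c0 "hxg4=10, Bc1=5, f6=5, Bf4=0";',
]
ENGINE = [
    '4rnk1/pp1q1ppp/2p4r/3p4/3P4/4P1NP/PPQ2PP1/R3R1K1 w - - bm Rac1; ce 14; acd 15;',
    'r2q1rk1/pp2bppp/4pn2/3nN1B1/3P4/1B5Q/PP3PPP/3R1RK1 w - - bm Rfe1; ce 42; acd 16;',
    'b7/3knp2/p2p2pb/1p1Pp2p/1B2P1PP/1P1B1P2/P7/5NK1 b - - bm Bf4; ce -48; acd 19;',
]

failed, zero = compare(REFERENCE, ENGINE)
print(f"Positions {len(REFERENCE)} | Failed {len(failed)} | Zero point cases {len(zero)}")
```

On these three positions from the listing, all three count as failed, and two (Rac1 unlisted, Bf4=0) are zero-point cases, matching the two reports above.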

If you think it's a useful feature, download again - https://rebel13.nl/download/tsc.html

Just double-click mea-compare from the MEA folder.

Damir
Posts: 2285
Joined: Mon Feb 11, 2008 2:53 pm
Location: Denmark
Full name: Damir Desevac

Re: New Tool

Post by Damir » Fri Mar 13, 2020 8:13 pm

majkelnowaq wrote:
Fri Mar 13, 2020 6:57 pm
Hi Majkel

When can we expect a new ThothFish? :) :) :D It has been some time since the last version. I am the only one who is playing with it on the Infinity Chess Server. :( :(

Rebel
Posts: 4984
Joined: Thu Aug 18, 2011 10:04 am

Re: New Tool

Post by Rebel » Fri Mar 13, 2020 8:56 pm

majkelnowaq wrote:
Fri Mar 13, 2020 6:57 pm
MultiPV 3 and 4 look "better" than MultiPV 1 (ce 127 and 125 versus 121) because lines 3 and 4 stopped at depth 28. Komodo reaches equal depths, but SF does not. It would help if we could force MultiPV by depth instead of by time. I tried something like set OPTIONS=depth=24 but it doesn't work (it works with MEA, but with tsc it doesn't produce correct output with MultiPV).

Ferdy is the MEA expert here, but you can use the depth option as follows:

Code:

rem MT = maximum time
set MT=10000
set OPTIONS="MultiPV=4, depth=24"

More MultiPV lines, like 8, should be considered too. Tsc only has options for MultiPV 1 or 4 when creating EPD tests.

I knew such a request would be coming :D

Ferdy
Posts: 4196
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: New Tool

Post by Ferdy » Fri Mar 13, 2020 10:55 pm

Rebel wrote:
Fri Mar 13, 2020 7:32 pm
Thanks Ferdy, I will give it a try, but wouldn't it be more precise to include the solution time in the formula? For example, when running a set at 5000ms, an engine that finds the best move at 200ms (and keeps it) should receive more points than an engine that finds it at 4700ms.

That is indeed possible, but engines can also change their best move. An engine may like m1 at 200ms but m2 at 4700ms. One idea is simply to test engines at a lower time, say 200ms; the result would then be a ranking of engines at that particular time on a particular system.
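
One way to account for an engine changing its mind is to take as solution time the moment the engine switches to a solution move and then keeps it until the end. A sketch, assuming the engine's best-move changes have been logged as (time_ms, move) pairs in chronological order; the log format and stable_solution_time are hypothetical.

```python
def stable_solution_time(changes, solutions):
    """Time (ms) from which the engine's best move stayed in `solutions`.

    `changes` lists (time_ms, move) each time the engine's best move changed,
    in chronological order. Returns None if the final move is not a solution.
    """
    found_at = None
    for time_ms, move in changes:
        if move in solutions:
            if found_at is None:
                found_at = time_ms
        else:
            found_at = None  # switched away: the earlier find does not count
    return found_at


# The engine likes m1 at 200ms but switches to m2 at 4700ms.
log = [(200, "m1"), (4700, "m2")]
print(stable_solution_time(log, {"m1"}))  # None
print(stable_solution_time(log, {"m2"}))  # 4700
```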

majkelnowaq
Posts: 17
Joined: Fri Aug 10, 2018 2:07 pm
Full name: D.S.

Re: New Tool

Post by majkelnowaq » Sat Mar 14, 2020 1:32 am

Thanks for the reply, Rebel. I'll try some tests with depth, and I'll be happy if more MultiPV options come.

Damir, I don't have a new ThothFish, but I decided to release an experimental version of an unfinished project. Maybe you'll enjoy it.

Here: http://www.talkchess.com/forum3/viewtop ... =2&t=73353
