New Tool

Rebel
Posts: 5686
Joined: Thu Aug 18, 2011 10:04 am

New Tool

Post by Rebel » Sun Mar 08, 2020 8:45 am

I have created a tool that can turn any EPD file into an STS-like test suite. It's not available yet (I need to write the documentation first and think of a suitable name), but I can present the first results.

The STS (Strategic Test Suite) was created by Swaminathan and Dann Corbit and contains 1500 carefully chosen positions, each with multiple good moves that earn bonus points ranging from 10 down to 1. The 1500 positions were checked and double-checked with the best software available at the time. And herein lies a problem: over the years engines have gained 300-400 Elo points and the set has become outdated, as one can see when running the STS set with 12 randomly picked strong engines. Running MEA we get:


    EPD  : sts.epd
    Time : 1000ms
                                    Top         Top   Max   Total   Time   Hash    
    Engine           Score Rating  Hits  Pos   Rate  Score   Rate    ms     Mb  Cpu
 1  Stockfish 11     13266  3536  1203  1500  0.802  15000  0.884   1000    64    1
 2  Komodo 10        13089  3491  1193  1500  0.795  15000  0.873   1000    64    1
 3  Ethereal 12      12876  3431  1171  1500  0.781  15000  0.858   1000    64    1
 4  Laser 1.7        12868  3431  1164  1500  0.776  15000  0.858   1000    64    1
 5  Xiphos 0.6       12815  3415  1149  1500  0.766  15000  0.854   1000    64    1
 6  Andscacs 0.95    12774  3407  1161  1500  0.774  15000  0.852   1000    64    1
 7  rofChade 2.2     12608  3363  1121  1500  0.747  15000  0.841   1000    64    1
 8  lc0              12595  3359  1123  1500  0.749  15000  0.840   1000    64    1
 9  Wasp 3.75        12417  3311  1099  1500  0.733  15000  0.828   1000    64    1
10  Arasan 21.3      12033  3208  1052  1500  0.701  15000  0.802   1000    64    1
11  Fire 7.1         11892  3172  1023  1500  0.682  15000  0.793   1000    64    1
12  RubiChess 1.4    11863  3164  1030  1500  0.687  15000  0.791   1000    64    1

                                    Created with MEA
                                          by
                                       Ferdinand
                                         Mosca
And note Lc0 in 8th place?
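For readers who want to see how the Score, Top Hits and Rate columns relate, here is a minimal sketch. It assumes the suite stores the move points in a `c0` operation of the form `c0 "move=points, ..."`, as STS-style EPD files commonly do; the two EPD lines and the function names `parse_points` and `summarize` are made up for illustration and are not taken from MEA.

```python
import re


def parse_points(epd_line):
    """Return {move: points} from the c0 operation of an STS-style EPD line."""
    match = re.search(r'c0\s+"([^"]*)"', epd_line)
    if not match:
        return {}
    points = {}
    for item in match.group(1).split(","):
        # rpartition keeps promotions such as "e8=Q" intact
        move, _, value = item.strip().rpartition("=")
        if move and value.isdigit():
            points[move] = int(value)
    return points


def summarize(suite, engine_moves):
    """suite: list of {move: points}; engine_moves: one SAN move per position."""
    score = 0
    max_score = 0
    top_hits = 0
    for points, move in zip(suite, engine_moves):
        best = max(points.values(), default=0)
        earned = points.get(move, 0)
        score += earned
        max_score += best
        if best > 0 and earned == best:
            top_hits += 1
    return {
        "score": score,
        "max": max_score,
        "top_hits": top_hits,
        "top_rate": round(top_hits / len(suite), 3),
        "total_rate": round(score / max_score, 3),
    }


# Two made-up positions, only to exercise the functions
demo = [
    'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4; '
    'id "demo.001"; c0 "e4=10, d4=9, Nf3=7, c4=6";',
    '4k3/4P3/8/8/8/8/8/4K1N1 w - - bm Nf3; '
    'id "demo.002"; c0 "Nf3=10, e8=Q=4";',
]
suite = [parse_points(line) for line in demo]
print(summarize(suite, ["d4", "Nf3"]))
```

The same arithmetic reproduces the first table row: 1203 / 1500 gives the Top Rate of 0.802 and 13266 / 15000 gives the Total Rate of 0.884.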

The tool tries to fix this problem: no more handcrafting, everything is automatic. How it works:
1. I ran the 1500 positions with SF11, MultiPV=4, 20 cores, 60 seconds per move, result: sts-sf11.epd
2. I ran the 1500 positions with Lc0, MultiPV=4, 60 seconds per move, result: sts-lc0.epd

The best move gets 10 points; the points for the next 3 moves are calculated from the evaluation score differences.

3. Combine sts-sf11.epd and sts-lc0.epd, result: sts-2020.epd. When both files list the same move its points are added together; otherwise the move is added as a new entry.
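The steps above can be sketched roughly as follows. The exact mapping from score differences to points is not given in this post, so the linear rule below (one point lost per `cp_per_point` centipawns behind the best move) is purely an assumption, as are the function names and the sample scores. The merge follows the description literally; any rescaling the tool may apply afterwards is not covered.

```python
def points_from_scores(lines, cp_per_point=10, top=10):
    """lines: [(move, score_cp), ...] sorted best first, e.g. from MultiPV=4.

    Assumed rule (not the tool's documented formula): the best move gets
    `top` points and every `cp_per_point` centipawns of deficit costs a point.
    """
    if not lines:
        return {}
    best_cp = lines[0][1]
    points = {}
    for move, cp in lines:
        pts = top - (best_cp - cp) // cp_per_point
        if pts > 0:  # moves that fall to zero are simply not listed
            points[move] = pts
    return points


def merge_points(a, b):
    """Add points when both engines list a move, otherwise add the move."""
    merged = dict(a)
    for move, pts in b.items():
        merged[move] = merged.get(move, 0) + pts
    return merged


# Made-up MultiPV output for one position
sf_points = points_from_scores([("f5", 85), ("Be5+", 62), ("Bf2", 20), ("Bg4", 5)])
lc0_points = {"f5": 10, "Qe4": 7}
print(sf_points)
print(merge_points(sf_points, lc0_points))
```

With these sample numbers the first engine's list becomes f5=10, Be5+=8, Bf2=4, Bg4=2, and after merging f5 is worth 20 while Qe4 is added with its 7 points.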

The results: first sts-sf11.epd, with SF11 of course excluded.


    EPD  : sts-sf11.epd
    Time : 1000ms
                                    Top         Top   Max   Total   Time   Hash    
    Engine           Score Rating  Hits  Pos   Rate  Score   Rate    ms     Mb  Cpu
 1  lc0              12410  3307  1066  1500  0.711  15000  0.827   1000    64    1
 2  Komodo 10        12210  3255  1032  1500  0.688  15000  0.814   1000    64    1
 3  Ethereal 12      11991  3196  1011  1500  0.674  15000  0.799   1000    64    1
 4  Xiphos 0.6       11903  3176  1008  1500  0.672  15000  0.794   1000    64    1
 5  Laser 1.7        11859  3164  1001  1500  0.667  15000  0.791   1000    64    1
 6  Andscacs 0.95    11850  3160   991  1500  0.661  15000  0.790   1000    64    1
 7  rofChade 2.2     11780  3140   986  1500  0.657  15000  0.785   1000    64    1
 8  Wasp 3.75        11417  3044   949  1500  0.633  15000  0.761   1000    64    1
 9  Arasan 21.3      11083  2955   910  1500  0.607  15000  0.739   1000    64    1
10  Fire 7.1         11032  2939   903  1500  0.602  15000  0.735   1000    64    1
11  RubiChess 1.4    10762  2867   891  1500  0.594  15000  0.717   1000    64    1

                                    Created with MEA
                                          by
                                       Ferdinand
                                         Mosca
And note Lc0 in its rightful first place.

Second, sts-lc0.epd, with Lc0 of course excluded.


    EPD  : sts-lc0.epd
    Time : 1000ms
                                    Top         Top   Max   Total   Time   Hash    
    Engine           Score Rating  Hits  Pos   Rate  Score   Rate    ms     Mb  Cpu
 1  Stockfish 11     12813  3415  1121  1500  0.747  15000  0.854   1000    64    1
 2  Ethereal 12      12265  3271  1057  1500  0.705  15000  0.818   1000    64    1
 3  Komodo 10        12250  3267  1055  1500  0.703  15000  0.817   1000    64    1
 4  Laser 1.7        12158  3244  1040  1500  0.693  15000  0.811   1000    64    1
 5  Xiphos 0.6       12052  3212  1043  1500  0.695  15000  0.803   1000    64    1
 6  rofChade 2.2     11997  3200  1028  1500  0.685  15000  0.800   1000    64    1
 7  Andscacs 0.95    11920  3180  1011  1500  0.674  15000  0.795   1000    64    1
 8  Wasp 3.75        11543  3080   978  1500  0.652  15000  0.770   1000    64    1
 9  Arasan 21.3      11334  3024   954  1500  0.636  15000  0.756   1000    64    1
10  RubiChess 1.4    11234  2995   942  1500  0.628  15000  0.749   1000    64    1
11  Fire 7.1         11150  2971   929  1500  0.619  15000  0.743   1000    64    1

                                    Created with MEA
                                          by
                                       Ferdinand
                                         Mosca
And finally sts-2020.epd, with SF11 and Lc0 excluded.


    EPD  : sts-2020.epd
    Time : 1000ms
                                    Top         Top   Max   Total   Time   Hash    
    Engine           Score Rating  Hits  Pos   Rate  Score   Rate    ms     Mb  Cpu
 1  Komodo 10        16194  3427  1025  1500  0.683  18888  0.857   1000    64    1
 2  Xiphos 0.6       16062  3399  1011  1500  0.674  18888  0.850   1000    64    1
 3  Ethereal 12      16044  3395  1010  1500  0.673  18888  0.849   1000    64    1
 4  Laser 1.7        16041  3395  1000  1500  0.667  18888  0.849   1000    64    1
 5  rofChade 2.2     16033  3395   986  1500  0.657  18888  0.849   1000    64    1
 6  Andscacs 0.95    15911  3367   988  1500  0.659  18888  0.842   1000    64    1
 7  Wasp 3.75        15654  3315   951  1500  0.634  18888  0.829   1000    64    1
 8  Fire 7.1         15588  3299   901  1500  0.601  18888  0.825   1000    64    1
 9  Arasan 21.3      15354  3251   909  1500  0.606  18888  0.813   1000    64    1
10  RubiChess 1.4    15045  3188   890  1500  0.593  18888  0.797   1000    64    1

                                    Created with MEA
                                          by
                                       Ferdinand
                                         Mosca
More in the next post.
90% of coding is debugging, the other 10% is writing bugs.

Rebel
Posts: 5686
Joined: Thu Aug 18, 2011 10:04 am

Re: New Tool

Post by Rebel » Sun Mar 08, 2020 9:12 am

Download at : http://rebel13.nl/dl/mea.7z

1. You can view the results in more detail.

2. You can run other pre-installed test suites, just run the batch files.

3. To increase the time control change the MT (MoveTime) parameter in the batch file.

Alayan
Posts: 429
Joined: Tue Nov 19, 2019 7:48 pm
Full name: Alayan Feh

Re: New Tool

Post by Alayan » Sun Mar 08, 2020 3:44 pm

A significant issue with this approach is that Leela's MultiPV is different from typical MultiPV. It outputs scores and moves, but it doesn't devote any additional resources to checking the other lines beyond what it would do anyway. This means that if it strongly prefers one move, the ordering and evaluation of the other moves are much more likely to be incorrect than if it had searched them with as many nodes.

EDIT: Also, how is Ethereal 12 beating Xiphos 0.6 in both the SF11 and the Lc0 STS, but losing in the merged one? I think the way points are awarded to moves when you merge could be improved.

Rebel
Posts: 5686
Joined: Thu Aug 18, 2011 10:04 am

Re: New Tool

Post by Rebel » Sun Mar 08, 2020 7:25 pm

Alayan wrote:
Sun Mar 08, 2020 3:44 pm
A significant issue with this approach is that Leela's MultiPV is different from typical MultiPV. It outputs scores and moves, but it doesn't devote any additional resources to checking the other lines beyond what it would do anyway. This means that if it strongly prefers one move, the ordering and evaluation of the other moves are much more likely to be incorrect than if it had searched them with as many nodes.
OK, if that's a fact then MultiPV makes no sense and only the best move should count. But I'd like to hear the opinions of others before taking action.
EDIT: Also, how is Ethereal 12 beating Xiphos 0.6 in both the SF11 and the Lc0 STS, but losing in the merged one? I think the way points are awarded to moves when you merge could be improved.
Look at the small differences between Ethereal and Xiphos, almost negligible. At 500ms or 1500ms it might all change, you know how it goes. And lastly, 1500 positions at 1000ms is not exactly an exact science.

Alayan
Posts: 429
Joined: Tue Nov 19, 2019 7:48 pm
Full name: Alayan Feh

Re: New Tool

Post by Alayan » Sun Mar 08, 2020 8:07 pm

Rebel wrote:
Sun Mar 08, 2020 7:25 pm
Look at the small differences between Ethereal and Xiphos, almost negligible. At 500ms or 1500ms it might all change, you know how it goes. And lastly, 1500 positions at 1000ms is not exactly an exact science.
I sort of assumed you ran the positions only once and counted the points in 3 different ways, but if you ran the positions three times at 1s/move, noise could explain the variability.

That said, depending on how often SF or Leela has a top-4 move that the other does not, it might make sense to run MultiPV=6 or so, to limit the abrupt cut-off to 0 points at the 5th move.
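How often the two engines disagree on their scored moves can be measured directly from the two generated suites. A minimal sketch, assuming each suite has already been loaded as a list of `{move: points}` dictionaries in the same position order (the function name and sample data are hypothetical):

```python
def count_unshared(suite_a, suite_b):
    """Count positions where one suite scores a move the other does not list."""
    return sum(1 for a, b in zip(suite_a, suite_b) if set(a) != set(b))


# Made-up data for three positions
sf = [{"e4": 10, "d4": 9}, {"Nf3": 10, "c4": 6}, {"f5": 10}]
lc0 = [{"e4": 10, "d4": 8}, {"Nf3": 10, "g3": 5}, {"f5": 10}]
unshared = count_unshared(sf, lc0)
print(unshared, "of", len(sf), "positions have a move only one engine lists")
```

A high share of such positions would suggest that a wider MultiPV is worth the extra analysis time.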

Rebel
Posts: 5686
Joined: Thu Aug 18, 2011 10:04 am

Re: New Tool

Post by Rebel » Sun Mar 08, 2020 8:17 pm

Alayan wrote:
Sun Mar 08, 2020 8:07 pm
Rebel wrote:
Sun Mar 08, 2020 7:25 pm
Look at the small differences between Ethereal and Xiphos, almost negligible. At 500ms or 1500ms it might all change, you know how it goes. And lastly, 1500 positions at 1000ms is not exactly an exact science.
I sort of assumed you ran the positions only once and counted the points in 3 different ways, but if you ran the positions three times at 1s/move, noise could explain the variability.

That said, depending on how often SF or Leela has a top-4 move that the other does not, it might make sense to run MultiPV=6 or so, to limit the abrupt cut-off to 0 points at the 5th move.
Or choose another second engine (Komodo for instance) that correctly supports MultiPV. Or maybe soon Ethereal :wink:

Ovyron
Posts: 4399
Joined: Tue Jul 03, 2007 2:30 am

Re: New Tool

Post by Ovyron » Mon Mar 09, 2020 5:12 am

Rebel wrote:
Sun Mar 08, 2020 7:25 pm
OK, if that's a fact then MultiPV makes no sense and only the best move should count. But I'd like to hear the opinions of others before taking action.
That's right, in some positions the only thing that matters is Leela's main PV; the rest of the MultiPV lines are random garbage.

However, you can gauge how nonsensical they are by their length: the closer their length is to the mainline's length, the more useful they are, to the point that a secondary PV line with the same length does have moves and scores that would count.

You can pretend Leela doesn't support MultiPV except where the lengths of the PVs match, though you can see them become more and more useful over time (there are points where a non-best PV has a better score than the others but not enough length, so you can watch live as it gets longer and eventually becomes the mainline; when it does, you know the line it overtook is also valid).

So you could check their length.
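A minimal sketch of such a length check, assuming the tool has access to the engine's raw UCI `info` lines. The standard UCI tokens `multipv`, `score` and `pv` are used; the function names, the `min_ratio` threshold and the sample lines are made up for illustration.

```python
def parse_info(line):
    """Extract MultiPV rank, score and PV moves from one UCI info line."""
    tokens = line.split()
    if "multipv" not in tokens or "pv" not in tokens:
        return None
    rank = int(tokens[tokens.index("multipv") + 1])
    score = None
    if "score" in tokens:
        i = tokens.index("score")
        score = (tokens[i + 1], int(tokens[i + 2]))  # ("cp", 35) or ("mate", 3)
    pv = tokens[tokens.index("pv") + 1:]
    return {"rank": rank, "score": score, "pv": pv}


def trusted_lines(info_lines, min_ratio=1.0):
    """Keep PV lines at least min_ratio times as long as the mainline."""
    latest = {}
    for line in info_lines:
        parsed = parse_info(line)
        if parsed:
            latest[parsed["rank"]] = parsed  # later output overrides earlier
    if 1 not in latest:
        return []
    main_len = len(latest[1]["pv"])
    return [latest[r] for r in sorted(latest)
            if len(latest[r]["pv"]) >= min_ratio * main_len]


# Made-up engine output
sample = [
    "info depth 12 seldepth 30 nodes 5000 score cp 35 multipv 1 pv e2e4 e7e5 g1f3 b8c6",
    "info depth 12 seldepth 28 nodes 5000 score cp 30 multipv 2 pv d2d4 d7d5 c2c4 e7e6",
    "info depth 3 seldepth 4 nodes 5000 score cp -10 multipv 3 pv a2a3",
]
kept = [entry["rank"] for entry in trusted_lines(sample)]
print(kept)
```

With `min_ratio=1.0` only lines as long as the mainline survive (ranks 1 and 2 in the sample); a lower ratio is a softer filter.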

Ovyron
Posts: 4399
Joined: Tue Jul 03, 2007 2:30 am

Re: New Tool

Post by Ovyron » Mon Mar 09, 2020 5:17 am

(note it makes sense to use MultiPV Leela because using it doesn't slow her down, so you get the extra info for free)

Rebel
Posts: 5686
Joined: Thu Aug 18, 2011 10:04 am

Re: New Tool

Post by Rebel » Mon Mar 09, 2020 9:18 pm

Ovyron wrote:
Mon Mar 09, 2020 5:17 am
(note it makes sense to use MultiPV Leela because using it doesn't slow her down, so you get the extra info for free)
I know, it's part of the MCTS.

Had a look at the Lc0 MultiPV; it's not garbage, although I am not entirely sure. I will keep using Lc0 for the moment. I created another test suite from the (mainly positional) positions I used in the 80's to improve Rebel, a time when there was no internet (imagine that), no PGN, no EPD, just typing in positions by hand and storing them in your own format. It has 635 positions and the result looks quite good; it is more difficult than STS.

rebel-sf11.epd - 635 positions analyzed with SF11 at 1 minute per move on 20 cores:


    EPD  : epd\rebel-sf11.epd
    Time : 1000ms
                                    Top         Top   Max   Total   Time   Hash    
    Engine           Score Rating  Hits  Pos   Rate  Score   Rate    ms     Mb  Cpu
 1  Komodo 10         4717  2971   394   635  0.620   6350  0.743   1000    64    1
 2  Xiphos 0.6        4637  2919   383   635  0.603   6350  0.730   1000    64    1
 3  Wasp 3.75         4555  2867   376   635  0.592   6350  0.717   1000    64    1
 4  Laser 1.7         4476  2819   370   635  0.583   6350  0.705   1000    64    1
 5  Ethereal 12       4422  2783   352   635  0.554   6350  0.696   1000    64    1
 6  Andscacs 0.95     4414  2779   367   635  0.578   6350  0.695   1000    64    1
 7  rofChade 2.2      4389  2763   350   635  0.551   6350  0.691   1000    64    1
 8  Arasan 21.3       4315  2720   359   635  0.565   6350  0.680   1000    64    1
 9  RubiChess 1.4     4193  2640   335   635  0.528   6350  0.660   1000    64    1
10  Fire 7.1          4135  2604   327   635  0.515   6350  0.651   1000    64    1

                                    Created with MEA
                                          by
                                       Ferdinand
                                         Mosca
rebel-lc0.epd - 635 positions analyzed with Lc0 at 1 minute per move on a GTX 1060 6GB:


    EPD  : epd\rebel-lc0.epd
    Time : 1000ms
                                    Top         Top   Max   Total   Time   Hash    
    Engine           Score Rating  Hits  Pos   Rate  Score   Rate    ms     Mb  Cpu
 1  Stockfish 11      5024  3164   412   635  0.649   6350  0.791   1000    64    1
 2  Xiphos 0.6        4813  3032   403   635  0.635   6350  0.758   1000    64    1
 3  Komodo 10         4807  3028   401   635  0.631   6350  0.757   1000    64    1
 4  Laser 1.7         4561  2871   383   635  0.603   6350  0.718   1000    64    1
 5  Wasp 3.75         4535  2855   383   635  0.603   6350  0.714   1000    64    1
 6  Ethereal 12       4506  2839   377   635  0.594   6350  0.710   1000    64    1
 7  rofChade 2.2      4493  2831   376   635  0.592   6350  0.708   1000    64    1
 8  Fire 7.1          4401  2771   366   635  0.576   6350  0.693   1000    64    1
 9  Andscacs 0.95     4318  2720   359   635  0.565   6350  0.680   1000    64    1
10  Arasan 21.3       4271  2692   357   635  0.562   6350  0.673   1000    64    1
11  RubiChess 1.4     4219  2656   347   635  0.546   6350  0.664   1000    64    1

                                    Created with MEA
                                          by
                                       Ferdinand
                                         Mosca
rebel-2020.epd - rebel-sf11.epd and rebel-lc0.epd combined:


    EPD  : epd\rebel-2020.epd
    Time : 1000ms
                                    Top         Top   Max   Total   Time   Hash    
    Engine           Score Rating  Hits  Pos   Rate  Score   Rate    ms     Mb  Cpu
 1  Komodo 10         6699  3104   393   635  0.619   8633  0.776   1000    64    1
 2  Xiphos 0.6        6665  3088   390   635  0.614   8633  0.772   1000    64    1
 3  Ethereal 12       6639  3076   376   635  0.592   8633  0.769   1000    64    1
 4  rofChade 2.2      6637  3076   376   635  0.592   8633  0.769   1000    64    1
 5  Laser 1.7         6578  3048   384   635  0.605   8633  0.762   1000    64    1
 6  Wasp 3.75         6483  3004   379   635  0.597   8633  0.751   1000    64    1
 7  Fire 7.1          6467  2995   366   635  0.576   8633  0.749   1000    64    1
 8  RubiChess 1.4     6409  2967   346   635  0.545   8633  0.742   1000    64    1
 9  Andscacs 0.95     6255  2899   358   635  0.564   8633  0.725   1000    64    1
10  Arasan 21.3       6253  2895   357   635  0.562   8633  0.724   1000    64    1

                                    Created with MEA
                                          by
                                       Ferdinand
                                         Mosca

Sergio Martinez
Posts: 1974
Joined: Fri Oct 05, 2012 12:35 pm
Location: Spain

Re: New Tool

Post by Sergio Martinez » Tue Mar 10, 2020 10:36 pm

Thank you very much, Ed. Your tools are always useful :)
Member of the CCRL Group. Write to me if you want me to test your engine.
