STS re-re-re-re-re-visited

Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: STS re-re-re-re-re-visited

Post by Ferdy »

criko wrote: Wed Oct 26, 2022 11:56 am
Hi all,

will it be possible to set other engine parameters within the batch file, e.g. GoldDigger for ShashChess?

cu Chris
You can view mea's help from the command prompt with:

```
./mea.exe --help
```

To set an engine parameter or option, use --eoption.

Examples:

Single option:

```
--eoption "GoldDigger=true"
```

Two options:

```
--eoption "GoldDigger=true,High Tal=true"
```

A typical command line with an engine option and logging enabled; examine the log file to check that the option is sent to the engine properly:

```
mea.exe --engine %EXE% --name %NAME% --hash %HASH% --threads %THREADS% --protocol %PROTOCOL% --epd %EPD% --movetime %MT% --eoption "GoldDigger=true" --log
```
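If you drive mea from a script instead of a batch file, the same command line can be assembled programmatically. This is a minimal sketch, not part of mea: the helper name `build_mea_command` and the defaults (128 MB hash, 1 thread, protocol "uci") are hypothetical; only the flags themselves come from the command line above. The movetime value is passed through unchanged, so use whatever unit `mea.exe --help` specifies.

```python
def build_mea_command(exe: str, name: str, epd: str, movetime: int,
                      options: dict[str, str], hash_mb: int = 128,
                      threads: int = 1, protocol: str = "uci",
                      log: bool = True) -> list[str]:
    """Return an argument list for mea.exe (hypothetical helper, not part of mea)."""
    # All engine options go into ONE --eoption value, comma-separated,
    # e.g. "GoldDigger=true,High Tal=true".
    eoption = ",".join(f"{key}={value}" for key, value in options.items())
    cmd = ["mea.exe", "--engine", exe, "--name", name,
           "--hash", str(hash_mb), "--threads", str(threads),
           "--protocol", protocol, "--epd", epd,
           "--movetime", str(movetime)]
    if eoption:
        cmd += ["--eoption", eoption]
    if log:
        # The log file lets you check whether the option reached the engine.
        cmd.append("--log")
    return cmd
```

Run it with `subprocess.run(cmd, check=True)`. Because the arguments are passed as a list, an option name containing a space such as "High Tal" needs no extra shell quoting.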
Ferdy

Post by Ferdy »

I am making an attempt to analyze STS at higher depths. I wrote a tool that analyzes an EPD file and writes its analysis to a CSV file. You can open the sts_positions sheet, where I try to track the progress of the analysis; I have also started to check which positions will be replaced. The tool's CSV output so far is in the csv folder, and also in the repository's analysis folder. If you want to increase the depth of an analysis that has already been done, you can do so.

If you want to help, especially with the analysis, message me and I will give you write access to the sheet, so you can upload CSV files to the csv folder. I may need your Gmail address.

Later I will use the CSV analysis output files to build the test suite with up to 10 top moves per position, similar to STS1-STS15_LAN_v6.epd.
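The grouping step of turning per-move CSV rows into suite lines could look like this. It is a sketch under assumptions: the columns are taken to be epd, move, eval, depth, pv, engine (the layout of the sample output posted in this thread), and the `bm ...; c0 "move=eval, ..."` output layout is a hypothetical choice, not the actual format of STS1-STS15_LAN_v6.epd.

```python
import csv
import io
from collections import defaultdict


def top_moves_by_position(csv_text: str, limit: int = 10) -> dict[str, list[tuple[str, int]]]:
    """Group analysis rows by EPD and keep the `limit` best moves by eval."""
    groups: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["epd"]].append((row["move"], int(row["eval"])))
    # Highest eval first, truncated to the top `limit` moves.
    return {epd: sorted(moves, key=lambda m: m[1], reverse=True)[:limit]
            for epd, moves in groups.items()}


def to_epd_line(epd: str, moves: list[tuple[str, int]]) -> str:
    """Format one suite line; the c0 "move=eval" layout is hypothetical."""
    best = moves[0][0]
    pairs = ", ".join(f"{move}={score}" for move, score in moves)
    return f'{epd} bm {best}; c0 "{pairs}";'
```

Converting the evals into STS-style points would be a further step on top of this.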
chrisw
Posts: 4317
Joined: Tue Apr 03, 2012 4:28 pm

Post by chrisw »

Ferdy wrote: Wed Oct 26, 2022 4:22 pm
I am making an attempt to analyze STS at higher depths. I wrote a tool that analyzes an EPD file and writes its analysis to a CSV file. [...]
It’s also a useful tool for testing dev versions, to make sure one’s engine isn’t suddenly broken in some way. It’s quite sensitive to mods and tuning, and it also seems to rank engines well (at least in the top range). With more test positions, it would be quite a reliable first-shot progress-o-meter.
criko
Posts: 138
Joined: Wed Feb 28, 2018 2:50 pm

Post by criko »

A short example with depth 33 and SF15.

But it would be better to have a command line for all EPD positions, or a whole EPD suite.
Just my 2 cents.

```
epd,move,eval,depth,pv,engine
1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - -,f4f5,559,33,f4f5 d5c7 d4g7 d7c5 f5f6 c4c3 f6f7,Stockfish 15
1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - -,d4f2,346,33,d4f2 c8c6 f4f5 c4c3 e2a6 c6a6 b2c3,Stockfish 15
1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - -,f4g5,191,33,f4g5 h6g5 d1e1 d5e7 f3g4 e7d5 d4f2,Stockfish 15
1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - -,b2b3,187,33,b2b3 g5f4 g3f4 c8c7 f4f5 d5c3 d4c3,Stockfish 15
1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - -,d4g7,136,33,d4g7 g5f4 f3d5 e6d5 d1d5 a6c6 d5b5,Stockfish 15
1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - -,f3g4,131,33,f3g4 c8c6 f4f5 c4c3 e2a6 c6a6 f5e6,Stockfish 15
1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - -,g1h1,65,33,g1h1 g5f4 g3f4 c8f8 d4g7 f8f5 d1d4,Stockfish 15
1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - -,d1e1,40,33,d1e1 g5f4 g3f4 c8f8 e1c1 f8f5 c1c4,Stockfish 15
1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - -,d4e3,32,33,d4e3 g5f4 e3f4 d5f4 g3f4 a6b6 g1g2,Stockfish 15
1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - -,h2h3,28,33,h2h3 g5f4 g3f4 c8f8 d1c1 f8f5 c1c4,Stockfish 15
```
Ferdy

Post by Ferdy »

chrisw wrote: Wed Oct 26, 2022 6:01 pm
It’s also a useful tool for testing dev versions, to make sure one’s engine isn’t suddenly broken in some way. [...] With more test positions, it would be quite a reliable first-shot progress-o-meter.
Checking other engines' analyses is certainly better, but for now the focus is on SF15. There is a lot of work to be done to improve STS. More test positions are always better.

One of the priorities here is the theme. Finding a good test position takes time, and classifying it is another issue. The kind of test position I try to find has a nice eval distribution, such as [300, 200, 150, 120, 80, 50, 20, 0, -20, -50], and a top move that is not easy to find under blitz conditions for an engine rated at least 2200 on the CCRL blitz list. For the eval distribution, I plan to use the standard deviation of the 10 evals: if it is high, or within a predefined threshold, the position is saved. One way to find good test positions is to scan games played by top engines and let a lower-rated engine evaluate some of their positions.
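The stdev criterion can be sketched in a few lines. This assumes the evals are centipawns of the top-10 moves of one position; the threshold band `low`/`high` is a hypothetical choice, not a value given in the thread, and the second criterion (a ~2200 engine failing at blitz) is not covered here.

```python
import statistics

# Example spread of the top-10 evals (centipawns) given in the post.
TARGET = [300, 200, 150, 120, 80, 50, 20, 0, -20, -50]


def eval_spread(evals: list[int]) -> float:
    """Population standard deviation of a position's top-move evals."""
    return statistics.pstdev(evals)


def keep_position(evals: list[int], low: float = 80.0, high: float = 200.0) -> bool:
    """Keep a candidate whose eval spread lies in a (hypothetical) threshold band."""
    if len(evals) < 10:  # need the full top-10 list
        return False
    return low <= eval_spread(evals) <= high


print(round(eval_spread(TARGET), 1))  # 103.4
```

A position where all ten moves score the same has a spread of 0 and is rejected. Note that the standard deviation alone does not check how the evals are spaced, so two very different distributions can give the same value.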
Ferdy

Post by Ferdy »

criko wrote: Wed Oct 26, 2022 6:05 pm
[...] but it would be better to have a command line with all epd positions, or a whole epd suite.
That can certainly be done, but I do not want to burden people with analyzing too many positions. Just take one or two, analyze them deeply, and go; come back if you still have time. Also, the idea of the current system is to analyze just one position and keep its analysis in a single file, which can be accessed easily when somebody wants to see the analysis of a certain position.
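Since a position's analysis lives in one file and may later be deepened, the files need to be merged at some point. A minimal sketch, assuming each file uses the columns epd, move, eval, depth, pv, engine and that the deepest row for each move should win; the function names are hypothetical.

```python
import csv
import io

FIELDS = ["epd", "move", "eval", "depth", "pv", "engine"]


def merge_deepest(csv_texts: list[str]) -> list[dict[str, str]]:
    """Merge analysis CSVs of one position, keeping the deepest row per move."""
    best: dict[tuple[str, str], dict[str, str]] = {}
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = (row["epd"], row["move"])
            if key not in best or int(row["depth"]) > int(best[key]["depth"]):
                best[key] = row
    # Best move first.
    return sorted(best.values(), key=lambda r: int(r["eval"]), reverse=True)


def to_csv(rows: list[dict[str, str]]) -> str:
    """Write the merged rows back out with the same header."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Mixing evals from different depths across moves is a simplification; re-running all top moves at the new depth gives a cleaner comparison.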
peter
Posts: 3186
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Post by peter »

Ferdy wrote: Thu Oct 27, 2022 5:25 am
That can certainly be done, but I will not burden people by analyzing too many positions. [...]
Once again, many thanks for your fine work, Ferdy!
One thing remains to be said from my side against the suite as such: the positions are solved too easily by modern strong engines, whichever way you evaluate them. Measuring in points makes the differences in results look bigger, but it's like playing games from unbalanced instead of balanced starting positions: the stronger engine's performance can be made to look better that way, but as soon as 1:1 pairs start replacing the draws, the error bar gets bigger than it would be with more draws. It's just fake Elo you produce like that. The same holds for positional testing with suites: too many positions solved by too many engines at a given hardware/TC raises the error bar of the test more than the performance differences.
That's why I took only about 590 positions out of the old STS, the ones that could be used as a kind of single-best-move positions. Here:

https://www.dropbox.com/s/yu7dnjpofa6bb8n/1024.epd?dl=0

The 1024 positions include those 590 from the old STS, which are also solved easily, but not too easily at TCs of 1" and less. The rest of the 1024 range from not-so-easy ones up to tactically more difficult ones, to get an average difficulty level that works with some statistical relevance for strong hardware with SMP at a TC of 1". I use them this way: positions solved by one engine only (in a 1:1 match) count as 1-0 results, positions solved by the other opponent only as 0-1, and the rest (solved by both or by neither of the two compared engines) as draws (half points).
Thus an Elo rating can easily be done too, if one wants to compare the results in Elo terms at all, and you can have the error bar given together with the results by e.g. EloStat.
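The 1:1-match scoring can be sketched as follows, assuming each engine's result is the set of position ids it solved. The Elo difference uses the standard logistic formula and a simple per-position standard error of the score. This is an illustrative sketch, not how EloStat computes its error bars.

```python
import math


def pairwise_result(solved_a: set[str], solved_b: set[str],
                    ids: list[str]) -> tuple[int, int, int]:
    """Score a suite as a 1:1 match: A-only = win, B-only = loss, both/neither = draw."""
    wins = losses = draws = 0
    for pid in ids:
        a, b = pid in solved_a, pid in solved_b
        if a and not b:
            wins += 1
        elif b and not a:
            losses += 1
        else:
            draws += 1
    return wins, losses, draws


def elo_and_error(wins: int, losses: int, draws: int) -> tuple[float, float]:
    """Logistic Elo difference and the standard error of the score."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is unbounded at 0% or 100%")
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    # Per-position variance of the result around the mean score.
    var = (wins * (1.0 - score) ** 2 + losses * score ** 2
           + draws * (0.5 - score) ** 2) / n
    return elo, math.sqrt(var / n)
```

Both 2 wins, 1 loss, 2 draws and 3 wins, 2 losses, 0 draws score 60% (about +70 Elo), but the version with draws has the smaller standard error. This mirrors the point above that decisive pairs replacing draws widen the error bar.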

Just my two cents again, regards
Peter.
peter

Post by peter »

Anyhow, I really like those HTML screens :)

The Fishes ran single-threaded (3.5 GHz CPU) at 100 ms/pos., LC0 (with two different nets) at 300 ms on a 3070 Ti GPU (initialization time is difficult to compare, but the Leela time ratio is that way anyhow, so what).
Peter.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Post by Rebel »

It seems you don't need the STS positions at all; this one is made from games of Alekhine.
peter

Post by peter »

Rebel wrote: Fri Oct 28, 2022 6:45 pm
Seems you don't need the STS positions at all, this is made from games of Alekhine.
:)
I'm going to have my 1024 positions from the link above scored like that:
Tactically more difficult positions get up to 100 points, single-best-move positions get the maximum points and, of course, no alternative "solutions"; easier ones get 1-10 points. Only positions with no more than 3 or 4 equivalent candidate moves are used. The points of the rewarded moves are proportioned to their eval ratios and scaled to the difficulty of the position, i.e. to the hardware time needed to solve it.
E.g. in a difficult position the best move gets 100 points and a second-best move with about 50% of the best move's engine eval (in cp) gets 50 points;
in an easy position the best move gets 10 points and a second-best move with 50% of the eval gets 5 points.
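As a formula, this scheme gives a move points = max_points × move_eval / best_eval. The sketch below assumes linear scaling, rounding to whole points, and 0 points for moves with a non-positive eval (where the ratio is not meaningful); those choices and the name `move_points` are hypothetical.

```python
def move_points(max_points: int, best_eval: int, move_eval: int) -> int:
    """Points for a move, scaled by its eval relative to the best move's eval (cp)."""
    if best_eval <= 0 or move_eval <= 0:
        return 0  # assumption: no reward without a positive eval ratio
    # Cap at max_points so a move can never outscore the best move.
    return round(max_points * min(move_eval, best_eval) / best_eval)


print(move_points(100, 300, 150))  # 50 (difficult position, second best)
print(move_points(10, 300, 150))   # 5  (easy position, second best)
```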

As for hardware/TC, I'd hope to get enough discrimination between the best engines' performances at about 1"/pos. while still keeping good statistical relevance (neither too many solved nor too many unsolved positions).
That will take some manpower and time again, of course, but the MEA tool itself is great enough to be worth such an effort once more, especially since I already know the 1500 STS positions from picking out the roughly 590 "hardest" ones for the 1024. Regards,
Peter.