Rebel wrote: ↑Sat Apr 25, 2020 7:38 pm
No problem here; make sure MEA.EXE is in the same folder as the Lc0 files.
Or in other words, install Lc0 in the TEMERE folder, not in the engines folder.
I found a way to avoid that MEA bug and still keep all engines in the engines folder. It works with Fat Fritz. All you have to do is give the full path to the engine EXE in MEA:
wrong (but works with Lc0 0.24.1 and Stockfish etc.):
set EXE=engines\FatFritz_cpu_1\lc0-fatfritz-blas.exe
right:
set EXE=C:\MEA\engines\FatFritz_cpu_1\lc0-fatfritz-blas.exe
(if MEA is directly on C:\)
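The underlying issue is generic: a relative path is resolved against whatever the current working directory happens to be, so it breaks as soon as MEA and the engine files live in different folders, while an absolute path resolves the same way from anywhere. A small illustration (this is not MEA code, just a sketch using Python's Windows-path module):

```python
import ntpath  # Windows path semantics, regardless of host OS

# Paths from the post above. The relative form only works if the
# process's working directory is the MEA folder; the absolute form
# is unambiguous from anywhere.
rel = r"engines\FatFritz_cpu_1\lc0-fatfritz-blas.exe"
absolute = r"C:\MEA\engines\FatFritz_cpu_1\lc0-fatfritz-blas.exe"

print(ntpath.isabs(rel))       # False - depends on the working directory
print(ntpath.isabs(absolute))  # True - resolves the same from anywhere
```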
Dann Corbit wrote: ↑Sun Apr 26, 2020 12:16 am
It must be one of those "laws of big numbers" things that makes it work so well.
If it can be used to make engines play better, then it is revolutionary.
Not only is the ranking of all LS 14 nets correct; additionally, the wider gap between LS 14.1 and LS 14.2 is correct. Awesome!
Great, more good news.
Because of the lack of strength options in today's top engines, I turned to my own. I compared a somewhat stronger ProDeo (10-15 Elo) against the last official release of 2016.
I want to do one more test before releasing the new tool; I am thinking of comparing the current Stockfish with version 11. Where can I download the source code, and how much stronger is it?
90% of coding is debugging, the other 10% is writing bugs.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
EPD : epd\45000.epd
Time : 100ms
                                   Solving   Max               Total  Score  Time  Hash
  Engine          Score  Used Time   Found   Pos        Time   Score   Rate    ms    Mb  Cpu  CCRL
1 sf11-april-26  951897 01:36:21.1   22370 45000  00:08:01.4 1350000  70.5%   100   128    1  2900
2 sf11-release   950327 01:36:18.6   22293 45000  00:07:54.8 1350000  70.4%   100   128    1  2900
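The Score Rate column appears to be the solving score divided by the maximum total score (1350000 for 45000 positions, i.e. 30 points per position); that is my reading of the numbers, not something taken from MEA's source. A quick check against the table above:

```python
# Assumed formula: rate = score / max_score, as a percentage.
def score_rate(score: int, max_score: int) -> float:
    """Return the solving score as a percentage of the maximum total score."""
    return 100.0 * score / max_score

# Values from the table: 45000 positions, max total score 1350000.
print(round(score_rate(951897, 1_350_000), 1))  # 70.5 (sf11-april-26)
print(round(score_rate(950327, 1_350_000), 1))  # 70.4 (sf11-release)
```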
90% of coding is debugging, the other 10% is writing bugs.
IMHO MEA is not good for testing really strong AB engines. Why? I tested Stockfish at 5"/position (with the huge 34844-position EPD set I use for NN testing) on a hexacore and got a much too high score rate of more than 87% (I believe that beyond 85% the results are no longer reliable). So the conclusion is that Stockfish should be tested only with a very short time control, as you did here. But what makes Stockfish so incredibly strong is that its search is very, very well tuned and tricky, and with only 100 ms of thinking time that strength cannot unfold. Because of this, MEA cannot measure the progress of Stockfish under those conditions.
MEA is good for testing weaker AB engines, and it is perfect for testing NNs (without any search, only 1 node/position). But for Stockfish I would not recommend using it.
I think that's a bit premature; all you have is the temere util, which is meant to create a reasonable ranking list (without Elo) with an error bar of -25/+25 Elo. The util to be released in a couple of days is an attempt to narrow that gap to 5-10 Elo, and it is meant for further improvement. I think the system has potential, but it will be a long ride to get the maximum out of it. It can also fail.
90% of coding is debugging, the other 10% is writing bugs.
I have 21,586 positions where we agree on the best move (temerity-arg.epd)
I have 13,212 positions where we disagree on the best move (temerity-dis.epd)
I do not have your data file, so I am not sure what the evaluations and depths are.
Hence, it is difficult for me to make contrasts and comparisons.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Dann Corbit wrote: ↑Tue Apr 28, 2020 4:39 am
Here is my data for the temere positions.
MEA creates perfect EPDs in the "epd_out" folder. For example:
1b1qrr2/1p4pk/1np4p/p3Np1B/Pn1P4/R1N3B1/1Pb2PPP/2Q1R1K1 b - - bm Bxe5; ce 203; acd 12;
1k1r2r1/1b4p1/p4n1p/1pq1pPn1/2p1P3/P1N2N2/1PB1Q1PP/3R1R1K b - - bm Nxf3; ce 124; acd 13;
1k1r3r/pb1q2p1/B4p2/2p4p/Pp1bPPn1/7P/1P2Q1P1/R1BN1R1K b - - bm Bxa6; ce 178; acd 14;
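These records use standard EPD opcodes: bm (best move), ce (centipawn evaluation), and acd (analysis depth). A minimal sketch of reading such a line (illustrative only, not part of MEA):

```python
# Split an EPD record into its four FEN-like position fields and a
# dict of opcode -> operand pairs (bm, ce, acd, ...).
def parse_epd(line: str):
    fields = line.strip().split(" ", 4)   # 4 position fields, then opcodes
    position = " ".join(fields[:4])
    opcodes = {}
    for op in fields[4].split(";"):
        op = op.strip()
        if op:
            key, _, value = op.partition(" ")
            opcodes[key] = value
    return position, opcodes

pos, ops = parse_epd(
    "1b1qrr2/1p4pk/1np4p/p3Np1B/Pn1P4/R1N3B1/1Pb2PPP/2Q1R1K1 b - - "
    "bm Bxe5; ce 203; acd 12;"
)
print(ops["bm"], ops["ce"], ops["acd"])  # Bxe5 203 12
```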
BTW, you must have noticed by now that many positions come from your 110-million-position EPD database, which is excellent for creating random sets.
90% of coding is debugging, the other 10% is writing bugs.