New Tool

Dann Corbit · Post by **Dann Corbit** » Tue Mar 24, 2020 5:36 pm

I think the main problem with test sets like STS is age. About 70% of the positions are still valid, but the oldest ones in particular simply have inadequate analysis. One hour of 32-bit Rybka at the start of the project is roughly one second of modern SF at the end of the project (a scale factor of about 4096, i.e. 64 * 64, due to improvements in hardware and software). And it is far less than that today.
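A quick check of that arithmetic, as a minimal sketch. The even 64x/64x split between hardware and software is an assumption read from the "64 * 64" remark, and the variable names are illustrative only.

```python
SECONDS_PER_HOUR = 3600

# Assumed split of the speedup: 64x from hardware, 64x from software.
hardware_speedup = 64
software_speedup = 64

scale = hardware_speedup * software_speedup

# How long the newer setup needs to match one hour of the older one.
equivalent_seconds = SECONDS_PER_HOUR / scale

print(f"scale factor: {scale}")
print(f"one old hour is about {equivalent_seconds:.2f} new seconds")
```

With a factor of 4096, an hour shrinks to a little under one second, which matches the "roughly one second" estimate.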

It reminds me of the CAP project where we had computers all over the world analyzing chess positions at 12 minutes each (scaled more or less depending on hardware).
In that time, we could achieve perhaps 15 plies, but often only 12 or 13. I remember analyzing the Evans Gambit for 16 hours and I got to ply 19.

In short, I can reproduce the entire effort of the CAP project, which used hundreds of computers for a few years, in a single hour on a single computer.

Ovyron · Post by **Ovyron** » Tue Mar 24, 2020 11:38 pm

Will that continue? Are computers of the future going to be able to reproduce in less than an hour what it'd take hundreds of computers to produce today?

Dann Corbit · Post by **Dann Corbit** » Wed Mar 25, 2020 8:12 am

Ovyron wrote: ↑Tue Mar 24, 2020 11:38 pm Will that continue? Are computers of the future going to be able to reproduce in less than an hour what it'd take hundreds of computers to produce today?

People often take the position that Moore's law is dead. But I guess a new technology will come along and rejuvenate it.
Ray Kurzweil has posted some interesting articles on that topic, and I think he's probably right.

Right now, the way they march forward with power is parallel threads and trace shrink. But at some point you would get traces one atom wide and by that time, quantum behavior would take the driver's seat and crash us into some wall that appears and disappears.

There are other technologies already identified for the next step.

flok · Post by **flok** » Wed Mar 25, 2020 3:28 pm

Rebel wrote: ↑Sun Mar 08, 2020 10:12 am Download at : http://rebel13.nl/dl/mea.7z

1. You can view the results in more detail.

2. You can run other pre-installed test suites, just run the batch files.

3. To increase the time control change the MT (MoveTime) parameter in the batch file.

That's not for Linux, is it?

jdart · Post by **jdart** » Wed Mar 25, 2020 4:03 pm

I think the main problem with test sets like STS is age

That's a valid point. Unless there's a forced mate or tablebase win, deeper search can always possibly find a hole in existing analysis. Although my understanding is STS is really designed to test positional understanding, not tactics, so that's harder to validate.

The bigger problem though, as I am sure you are aware, is that test suite results are just not a good indicator of performance in games. There is some rough correlation: very strong engines tend also to do very well on the Arasan test suite, for example, and weak engines do poorly. Add a few hundred rating points and you will get better results. But the test suites can't measure small differences. If you make a change that is worth 5 Elo, you might well do worse on a test suite.
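To put rough numbers on why a 5 Elo change can vanish in a test suite, here is a minimal sketch. It uses the standard logistic Elo formula for expected score and treats each test position as an independent pass/fail trial, which is a simplifying assumption. The suite size of 1500 and the 50% solve rate are illustrative values, and the function names are hypothetical.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected game score for a player rated elo_diff higher (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def solve_count_std(n_positions: int, solve_prob: float) -> float:
    """Std. dev. of the solved count, assuming independent pass/fail positions."""
    return math.sqrt(n_positions * solve_prob * (1.0 - solve_prob))


# Edge in expected game score that a 5 Elo improvement buys.
edge = expected_score(5.0) - 0.5

# One-sigma noise on the solved count for an assumed 1500-position suite.
noise = solve_count_std(1500, 0.5)

print(f"5 Elo edge in expected score: {edge:.4f}")
print(f"1-sigma noise on the solved count: {noise:.1f} positions")
```

Under these assumptions the edge is well under one percentage point of game score, while the suite result wobbles by roughly 19 positions from sampling noise alone, so a small real gain can easily show up as a worse suite score.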

--Jon

Dann Corbit · Post by **Dann Corbit** » Wed Mar 25, 2020 6:55 pm

To add to Jon's remarks:
Tactical strength is not a great measure of playing strength. There is some correlation, but I do not think it is a strong one.
Long ago, I tuned Scorpio to a large tactical test set using parabolic curve fitting and gradient search.
It could solve tactics like nobody's business, but it played about 50 Elo weaker than before the tuning.
Tactical test sets are full of things like sacrifices and zugzwang positions, which happen but are unusual.
When you tune to that, you do not necessarily make the engine stronger.
A positional test should be different, but it is going to age poorly due to the exponential advance of hardware and software.
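As a rough illustration of the parabolic curve fitting mentioned above, the sketch below fits a parabola through three samples of a test-set score taken at different values of a single engine parameter and returns the vertex as the next value to try. This is not Scorpio's actual tuner; `parabola_vertex` and the sample data are hypothetical.

```python
def parabola_vertex(p0, p1, p2):
    """Return the x-coordinate of the vertex of the parabola through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    if den == 0:
        # Collinear points (or repeated x values) do not define a unique vertex.
        raise ValueError("no unique vertex for these points")
    return x1 - 0.5 * num / den


# Hypothetical samples: (parameter value, test-set score).
samples = [(0.0, 1.0), (1.0, 6.0), (5.0, 6.0)]
best_guess = parabola_vertex(*samples)
print(best_guess)
```

A tuner built this way maximizes the test-set score itself, which is exactly why it can produce an engine that solves more tactics yet plays weaker games.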

I remember when I was working with Colin on Beowulf we got to 285 out of 300 on the WAC test on some wimpy hardware.
But Shredder, which was the strongest program in the world at the time, got 280 on the same hardware.
Yet if you played an actual game of chess against Shredder, it would tear you limb from limb.

Rebel · Post by **Rebel** » Fri Mar 27, 2020 10:17 am

flok wrote: ↑Wed Mar 25, 2020 3:28 pm
Rebel wrote: ↑Sun Mar 08, 2020 10:12 am Download at : http://rebel13.nl/dl/mea.7z

1. You can view the results in more detail.

2. You can run other pre-installed test suites, just run the batch files.

3. To increase the time control change the MT (MoveTime) parameter in the batch file.
That's not for Linux, is it?

Correct, Windows only.

flok · Post by **flok** » Fri Mar 27, 2020 10:20 am

Rebel wrote: ↑Fri Mar 27, 2020 10:17 am
flok wrote: ↑Wed Mar 25, 2020 3:28 pm
Rebel wrote: ↑Sun Mar 08, 2020 10:12 am Download at : http://rebel13.nl/dl/mea.7z

1. You can view the results in more detail.

2. You can run other pre-installed test suites, just run the batch files.

3. To increase the time control change the MT (MoveTime) parameter in the batch file.
That's not for Linux, is it?
Correct, Windows only.

Please consider releasing the source-code so that macos and linux users also can use it.

Rebel · Post by **Rebel** » Fri Mar 27, 2020 10:23 am

jdart wrote: ↑Wed Mar 25, 2020 4:03 pm
I think the main problem with test sets like STS is age
That's a valid point. Unless there's a forced mate or tablebase win, deeper search can always possibly find a hole in existing analysis. Although my understanding is STS is really designed to test positional understanding, not tactics, so that's harder to validate.

The bigger problem though, as I am sure you are aware, is that test suite results are just not a good indicator of performance in games. There is some rough correlation: very strong engines tend also to do very well on the Arasan test suite, for example, and weak engines do poorly. Add a few hundred rating points and you will get better results. But the test suites can't measure small differences. If you make a change that is worth 5 Elo, you might well do worse on a test suite.

--Jon

Correct.

One reason would be that 1500 positions amount to roughly (just) 20 games.

I am trying large random sets now at 100ms.

Meanwhile Arasan does well on the STS test.


    EPD  : epd\sts-lc0.epd
    Time : 100ms
                                                      Solving    Max  Total  Time  Hash
        Engine     Score   Used Time  Found   Pos        Time  Score   Rate    ms    Mb  Cpu  CCRL
     1  Arasan 22  24000  00:03:08.5    800  1500  00:00:16.3  45000  0.533   100   128    1  3144
     2  Arasan 21  22980  00:03:07.5    766  1500  00:00:12.5  45000  0.511   100   128    1  3042
     3  Arasan 20  22470  00:03:07.9    749  1500  00:00:13.6  45000  0.499   100   128    1  2946
     4  Arasan 19  22200  00:03:07.4    740  1500  00:00:13.1  45000  0.493   100   128    1  2908
     5  Arasan 18  20490  00:03:06.9    683  1500  00:00:11.0  45000  0.455   100   128    1  2861
     6  Arasan 17  20220  00:03:06.9    674  1500  00:00:16.7  45000  0.449   100   128    1  2812

Or, in a better-looking form: http://rebel13.nl/mea/sts-lc0-arasan-100ms.html
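The columns of the Arasan table can be cross-checked with a short sketch. Reading "Rate" as Score divided by Max Score, and noting that Score equals 30 times Found on every row, are inferences from the numbers themselves, not documented MEA behavior. The figures are copied from the table; the variable names are illustrative.

```python
# (engine, score, found, max_score, ccrl) copied from the table above
rows = [
    ("Arasan 22", 24000, 800, 45000, 3144),
    ("Arasan 21", 22980, 766, 45000, 3042),
    ("Arasan 20", 22470, 749, 45000, 2946),
    ("Arasan 19", 22200, 740, 45000, 2908),
    ("Arasan 18", 20490, 683, 45000, 2861),
    ("Arasan 17", 20220, 674, 45000, 2812),
]

for name, score, found, max_score, ccrl in rows:
    rate = score / max_score          # inferred meaning of the "Rate" column
    points_per_solution = score / found
    print(f"{name}: rate={rate:.3f}  points per solved position={points_per_solution:.0f}")

# CCRL rating gained per extra solved position, from Arasan 17 to Arasan 22.
best, worst = rows[0], rows[-1]
elo_per_position = (best[4] - worst[4]) / (best[2] - worst[2])
print(f"about {elo_per_position:.1f} CCRL Elo per extra solved position")
```

Across these six versions the suite score does track the rating list, but each extra solved position corresponds to only a few Elo, so differences of a handful of Elo sit at or below a single position.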

mar · Post by **mar** » Fri Mar 27, 2020 10:49 am

flok wrote: ↑Fri Mar 27, 2020 10:20 am Please consider releasing the source-code so that macos and linux users also can use it.

Why can't you simply use Wine? (unless you've already tried and it didn't work)
