New Tool

Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: New Tool

Post by Dann Corbit »

I think the main problem with test sets like STS is age. About 70% of the positions are still valid, but the oldest ones especially have inadequate analysis. One hour of 32-bit Rybka at the start of the project is roughly one second of modern SF at the end of the project (a scale factor of about 4096, i.e. 64 * 64, from improvements in hardware and software). And it is far less than that today.
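The arithmetic behind that comparison can be sketched in a few lines. This assumes the 64 * 64 is one ballpark factor of 64 each for hardware and software, which is a reading of the post and not a measurement; the function and constant names are made up for illustration.

```python
# Ballpark speedup arithmetic. The two 64x factors are an interpretation
# of the "64 * 64" figure quoted above, not measured values.
HARDWARE_SPEEDUP = 64
SOFTWARE_SPEEDUP = 64


def equivalent_seconds(old_seconds: float) -> float:
    """Seconds of modern analysis roughly matching old_seconds of old analysis."""
    return old_seconds / (HARDWARE_SPEEDUP * SOFTWARE_SPEEDUP)


combined = HARDWARE_SPEEDUP * SOFTWARE_SPEEDUP  # combined scale factor
one_hour = equivalent_seconds(3600)             # one old hour, in modern seconds
print(f"scale factor {combined}, one old hour ~ {one_hour:.2f} s today")
```

With these factors an hour of old analysis comes out at a bit under one second, which is where the "roughly one second" figure comes from.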

It reminds me of the CAP project where we had computers all over the world analyzing chess positions at 12 minutes each (scaled more or less depending on hardware).
In that time, we could achieve perhaps 15 plies, but often only 12 or 13. I remember analyzing the Evans Gambit for 16 hours and I got to ply 19.

In short, I can reproduce the entire effort of the CAP project which used hundreds of computers for a few years in a single hour on a single computer.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: New Tool

Post by Ovyron »

Will that continue? Are computers of the future going to be able to reproduce in less than an hour what it'd take hundreds of computers to produce today?
Dann Corbit

Re: New Tool

Post by Dann Corbit »

Ovyron wrote: Tue Mar 24, 2020 11:38 pm Will that continue? Are computers of the future going to be able to reproduce in less than an hour what it'd take hundreds of computers to produce today?
People often take the position that Moore's law is dead. But I guess a new technology will come along and rejuvenate it.
Ray Kurzweil has posted some interesting articles on that topic, and I think he's probably right.

Right now, the way they march forward with power is parallel threads and trace shrink. But at some point you would get traces one atom wide and by that time, quantum behavior would take the driver's seat and crash us into some wall that appears and disappears.

There are other technologies already identified for the next step.
flok
Posts: 481
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: New Tool

Post by flok »

Rebel wrote: Sun Mar 08, 2020 10:12 am Download at : http://rebel13.nl/dl/mea.7z

1. You can view the results in more detail.

2. You can run other pre-installed test suites, just run the batch files.

3. To increase the time control change the MT (MoveTime) parameter in the batch file.
That's not for Linux, is it?
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: New Tool

Post by jdart »

I think the main problem with test sets like STS is age
That's a valid point. Unless there's a forced mate or tablebase win, deeper search can always possibly find a hole in existing analysis. Although my understanding is STS is really designed to test positional understanding, not tactics, so that's harder to validate.

The bigger problem though, as I am sure you are aware, is that test suite results are just not a good indicator of performance in games. There is some rough correlation: very strong engines tend also to do very well on the Arasan test suite, for example, and weak engines do poorly. Add a few hundred rating points and you will get better results. But the test suites can't measure small differences. If you make a change that is worth 5 Elo, you might well do worse on a test suite.
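One way to see why small gains vanish in a suite score is to look at sampling noise. This is a minimal sketch, assuming each position is an independent pass/fail trial, which is a simplification since real positions are neither independent nor equally hard. The suite size of 1500 and the 50% solve rate are chosen purely for illustration, and the function name is hypothetical.

```python
import math


def solve_count_std_error(n_positions: int, solve_rate: float) -> float:
    """Binomial standard error of the number of solved positions."""
    return math.sqrt(n_positions * solve_rate * (1.0 - solve_rate))


n = 1500  # illustrative suite size (assumption)
p = 0.5   # illustrative solve rate (assumption)

se = solve_count_std_error(n, p)
margin_95 = 1.96 * se  # approximate 95% band on the solved count
print(f"standard error: {se:.1f} positions, 95% band: +/- {margin_95:.0f}")
```

Under these assumptions the band is a few dozen positions wide, so a change that moves the true solve count by only a handful of positions cannot be told apart from noise in a single run.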

--Jon
Dann Corbit

Re: New Tool

Post by Dann Corbit »

To add to Jon's remarks:
Tactical strength is not a great measure of playing strength. There is some correlation, but I do not think it is a strong one.
Long ago, I tuned Scorpio to a large tactical test set using parabolic curve fitting and gradient search.
It could solve tactics like nobody's business, but it played about 50 Elo weaker than before the tuning.
Tactical test sets are full of things like sacrifices and zugzwang positions, which happen but are unusual.
When you tune to that, you do not necessarily make the engine stronger.
A positional test should be different, but it is going to age poorly due to the exponential advance of hardware and software.

I remember when I was working with Colin on Beowulf we got to 285 out of 300 on the WAC test on some wimpy hardware.
Shredder, which was the strongest program in the world at the time, got 280 on the same hardware.
But if you played an actual game of chess against Shredder, it would tear you limb from limb.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: New Tool

Post by Rebel »

flok wrote: Wed Mar 25, 2020 3:28 pm
That's not for Linux, is it?
Correct, Windows only.
90% of coding is debugging, the other 10% is writing bugs.
flok

Re: New Tool

Post by flok »

Rebel wrote: Fri Mar 27, 2020 10:17 am
Correct, Windows only.
Please consider releasing the source code so that macOS and Linux users can also use it.
Rebel

Re: New Tool

Post by Rebel »

jdart wrote: Wed Mar 25, 2020 4:03 pm
I think the main problem with test sets like STS is age
That's a valid point. Unless there's a forced mate or tablebase win, deeper search can always possibly find a hole in existing analysis. Although my understanding is STS is really designed to test positional understanding, not tactics, so that's harder to validate.

The bigger problem though, as I am sure you are aware, is that test suite results are just not a good indicator of performance in games. There is some rough correlation: very strong engines tend also to do very well on the Arasan test suite, for example, and weak engines do poorly. Add a few hundred rating points and you will get better results. But the test suites can't measure small differences. If you make a change that is worth 5 Elo, you might well do worse on a test suite.

--Jon
Correct.

One reason would be that 1500 positions is roughly (just) 20 games.

I am trying large random sets now at 100ms.

Meanwhile Arasan does well on the STS test.

    EPD  : epd\sts-lc0.epd
    Time : 100ms
                                                      Solving    Max   Total   Time   Hash          
    Engine           Score   Used Time Found   Pos     Time     Score   Rate    ms     Mb  Cpu  CCRL
 1  Arasan 22        24000  00:03:08.5   800  1500  00:00:16.3  45000  0.533    100   128    1  3144
 2  Arasan 21        22980  00:03:07.5   766  1500  00:00:12.5  45000  0.511    100   128    1  3042
 3  Arasan 20        22470  00:03:07.9   749  1500  00:00:13.6  45000  0.499    100   128    1  2946
 4  Arasan 19        22200  00:03:07.4   740  1500  00:00:13.1  45000  0.493    100   128    1  2908
 5  Arasan 18        20490  00:03:06.9   683  1500  00:00:11.0  45000  0.455    100   128    1  2861
 6  Arasan 17        20220  00:03:06.9   674  1500  00:00:16.7  45000  0.449    100   128    1  2812
Or better looking : http://rebel13.nl/mea/sts-lc0-arasan-100ms.html
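
The figures in the table can be cross-checked quickly. In this sketch the numbers are copied from the table, the helper names are hypothetical, and the reading of Rate as Score divided by Max Score is inferred from the figures, not taken from the tool's documentation.

```python
# Rows copied from the table: (engine, score, found, ccrl_rating)
ROWS = [
    ("Arasan 22", 24000, 800, 3144),
    ("Arasan 21", 22980, 766, 3042),
    ("Arasan 20", 22470, 749, 2946),
    ("Arasan 19", 22200, 740, 2908),
    ("Arasan 18", 20490, 683, 2861),
    ("Arasan 17", 20220, 674, 2812),
]
MAX_SCORE = 45000


def rate(score: int, max_score: int = MAX_SCORE) -> float:
    """Assumed definition of the Rate column: score / max score, 3 decimals."""
    return round(score / max_score, 3)


rates = {name: rate(score) for name, score, _found, _ccrl in ROWS}

# Compare the ordering by suite score with the ordering by CCRL rating.
by_score = [r[0] for r in sorted(ROWS, key=lambda r: r[1], reverse=True)]
by_ccrl = [r[0] for r in sorted(ROWS, key=lambda r: r[3], reverse=True)]
same_order = by_score == by_ccrl

for name in by_score:
    print(f"{name}: rate {rates[name]:.3f}")
print("suite order matches CCRL order:", same_order)
```

For these six versions the two orderings agree, but adjacent versions here are tens of rating points apart or more, so this does not contradict the point about small differences.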
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: New Tool

Post by mar »

flok wrote: Fri Mar 27, 2020 10:20 am Please consider releasing the source-code so that macos and linux users also can use it.
Why can't you simply use Wine? (unless you've already tried and it didn't work)
Martin Sedlak