I think the main problem with test sets like STS is age. About 70% of the positions are still valid, but the oldest ones in particular simply have inadequate analysis. One hour of 32-bit Rybka at the start of the project is roughly one second of modern SF at the end of the project (a scale factor of 4096, or 64 * 64, due to improvements in hardware and software). And today it would be far less than one second.
It reminds me of the CAP project where we had computers all over the world analyzing chess positions at 12 minutes each (scaled more or less depending on hardware).
In that time, we could achieve perhaps 15 plies, but often only 12 or 13. I remember analyzing the Evans Gambit for 16 hours and I got to ply 19.
In short, on a single computer in a single hour I can reproduce the entire effort of the CAP project, which used hundreds of computers for a few years.
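The arithmetic behind those figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 64 * 64 factor is the one quoted in the post, and applying that same factor to the 12-minute CAP analyses is an assumption made here for illustration (the CAP machines were older still, so the real ratio would presumably be larger).

```python
# Back-of-the-envelope check of the speedup figures quoted in the post.
SECONDS_PER_HOUR = 3600
speedup = 64 * 64  # combined hardware * software factor from the post

# How many "modern" seconds correspond to one hour of the old setup?
old_hour_in_new_seconds = SECONDS_PER_HOUR / speedup
print(f"speedup factor: {speedup}")
print(f"1 old hour ~ {old_hour_in_new_seconds:.2f} modern seconds")

# CAP analysed each position for about 12 minutes.
# ASSUMPTION: the same speedup factor applies to CAP-era hardware.
cap_seconds_per_position = 12 * 60
new_seconds_per_position = cap_seconds_per_position / speedup
positions_per_hour = SECONDS_PER_HOUR / new_seconds_per_position
print(f"12 old minutes ~ {new_seconds_per_position:.3f} modern seconds")
print(f"equivalent positions per modern hour: {positions_per_hour:.0f}")
```

Note that 3600 / 4096 is a bit under 0.9, which is why "one hour is roughly one second" holds only approximately.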
New Tool
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: New Tool
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 4556
- Joined: Tue Jul 03, 2007 4:30 am
Re: New Tool
Will that continue? Are computers of the future going to be able to reproduce in less than an hour what it'd take hundreds of computers to produce today?
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: New Tool
People often take the position that Moore's law is dead. But I guess a new technology will come along and rejuvenate it.
Ray Kurzweil has posted some interesting articles on that topic, and I think he's probably right.
Right now, the way they march forward with power is parallel threads and trace shrink. But at some point you would get traces one atom wide and by that time, quantum behavior would take the driver's seat and crash us into some wall that appears and disappears.
There are other technologies already identified for the next step.
-
- Posts: 481
- Joined: Tue Jul 03, 2018 10:19 am
- Full name: Folkert van Heusden
Re: New Tool
Rebel wrote: ↑Sun Mar 08, 2020 10:12 am
Download at: http://rebel13.nl/dl/mea.7z
1. You can view the results in more detail.
2. You can run other pre-installed test suites, just run the batch files.
3. To increase the time control change the MT (MoveTime) parameter in the batch file.

That's not for Linux, is it?
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: New Tool
"I think the main problem with test sets like STS is age"

That's a valid point. Unless there's a forced mate or tablebase win, a deeper search can always find a hole in existing analysis. My understanding, though, is that STS is really designed to test positional understanding, not tactics, so that's harder to validate.
The bigger problem though, as I am sure you are aware, is that test suite results are just not a good indicator of performance in games. There is some rough correlation: very strong engines tend also to do very well on the Arasan test suite, for example, and weak engines do poorly. Add a few hundred rating points and you will get better results. But the test suites can't measure small differences. If you make a change that is worth 5 Elo, you might well do worse on a test suite.
--Jon
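A rough way to see why a suite cannot resolve small differences is to treat each position as an independent pass/fail trial and look at the sampling noise on the solved count. The sketch below is an illustration under that assumption only: the suite size of 1500 and the solve rate of 0.5 are hypothetical inputs, the function names are made up for this example, and the Elo formula gives an expected game score, which is not the same quantity as a solve rate.

```python
import math

def solve_count_std_dev(n_positions: int, solve_rate: float) -> float:
    """Standard deviation of the solved count under a binomial model.

    ASSUMPTION: positions are independent trials with the same solve rate.
    """
    return math.sqrt(n_positions * solve_rate * (1.0 - solve_rate))

def elo_expected_score(elo_diff: float) -> float:
    """Standard logistic Elo expectation for a given rating difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

n = 1500  # hypothetical suite size
p = 0.5   # hypothetical solve rate

sd = solve_count_std_dev(n, p)
print(f"std. dev. of solved count: {sd:.1f} positions")
print(f"as a fraction of the suite: {sd / n:.4f}")
print(f"expected game score for +5 Elo: {elo_expected_score(5):.4f}")
```

Under these assumptions the noise on the solve rate is larger than the shift in expected game score that a 5 Elo gain produces, which is consistent with the point that such a change can easily score worse on a suite.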
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: New Tool
To add to Jon's remarks:
Tactical strength is not a great measure of playing strength. There is some correlation, but I do not think it is a strong one.
Long ago, I tuned Scorpio to a large tactical test set using parabolic curve fitting and gradient search.
It could solve tactics like nobody's business, but it played about 50 Elo weaker than before the tuning.
Tactical test sets are full of things like sacrifices and zugzwang positions, which happen but are unusual.
When you tune to that, you do not necessarily make the engine stronger.
A positional test should be different, but it is going to age poorly due to the exponential advance of hardware and software.
I remember when I was working with Colin on Beowulf we got to 285 out of 300 on the WAC test on some wimpy hardware.
But Shredder, which was the strongest program in the world at the time, got 280 on the same hardware.
But if you played an actual game of chess against it, it would tear you limb from limb.
-
- Posts: 6995
- Joined: Thu Aug 18, 2011 12:04 pm
Re: New Tool
Rebel wrote: ↑Sun Mar 08, 2020 10:12 am
Download at: http://rebel13.nl/dl/mea.7z
1. You can view the results in more detail.
2. You can run other pre-installed test suites, just run the batch files.
3. To increase the time control change the MT (MoveTime) parameter in the batch file.

flok wrote: ↑Wed Mar 25, 2020 3:28 pm
That's not for Linux, is it?

Correct, Windows only.
90% of coding is debugging, the other 10% is writing bugs.
-
- Posts: 481
- Joined: Tue Jul 03, 2018 10:19 am
- Full name: Folkert van Heusden
Re: New Tool
Rebel wrote: ↑Sun Mar 08, 2020 10:12 am
Download at: http://rebel13.nl/dl/mea.7z
1. You can view the results in more detail.
2. You can run other pre-installed test suites, just run the batch files.
3. To increase the time control change the MT (MoveTime) parameter in the batch file.

flok wrote: ↑Wed Mar 25, 2020 3:28 pm
That's not for Linux, is it?

Rebel wrote: ↑Fri Mar 27, 2020 10:17 am
Correct, Windows only.

Please consider releasing the source code so that macOS and Linux users can also use it.
-
- Posts: 6995
- Joined: Thu Aug 18, 2011 12:04 pm
Re: New Tool
jdart wrote: ↑Wed Mar 25, 2020 4:03 pm
"I think the main problem with test sets like STS is age"
That's a valid point. Unless there's a forced mate or tablebase win, a deeper search can always find a hole in existing analysis. My understanding, though, is that STS is really designed to test positional understanding, not tactics, so that's harder to validate.
The bigger problem though, as I am sure you are aware, is that test suite results are just not a good indicator of performance in games. There is some rough correlation: very strong engines tend also to do very well on the Arasan test suite, for example, and weak engines do poorly. Add a few hundred rating points and you will get better results. But the test suites can't measure small differences. If you make a change that is worth 5 Elo, you might well do worse on a test suite.
--Jon

Correct.
One reason would be that 1500 positions is roughly the equivalent of (just) 20 games.
I am trying large random sets now at 100ms.
Meanwhile Arasan does well on the STS test.
EPD  : epd\sts-lc0.epd
Time : 100ms

                                              Solving     Max    Total  Time  Hash
   Engine     Score  Used Time   Found  Pos   Time        Score  Rate   ms    Mb    Cpu  CCRL
1  Arasan 22  24000  00:03:08.5  800    1500  00:00:16.3  45000  0.533  100   128   1    3144
2  Arasan 21  22980  00:03:07.5  766    1500  00:00:12.5  45000  0.511  100   128   1    3042
3  Arasan 20  22470  00:03:07.9  749    1500  00:00:13.6  45000  0.499  100   128   1    2946
4  Arasan 19  22200  00:03:07.4  740    1500  00:00:13.1  45000  0.493  100   128   1    2908
5  Arasan 18  20490  00:03:06.9  683    1500  00:00:11.0  45000  0.455  100   128   1    2861
6  Arasan 17  20220  00:03:06.9  674    1500  00:00:16.7  45000  0.449  100   128   1    2812
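The columns in this table can be cross-checked with a short script. The sketch below assumes that "Total Rate" is simply Score divided by Max Score and that each solved position is worth a fixed number of points; both are inferences from the numbers shown, not documented behaviour of the tool. The data is copied from the table and the variable names are made up for this example.

```python
# (engine, score, found, positions, max_score, rate, ccrl), copied from the table
rows = [
    ("Arasan 22", 24000, 800, 1500, 45000, 0.533, 3144),
    ("Arasan 21", 22980, 766, 1500, 45000, 0.511, 3042),
    ("Arasan 20", 22470, 749, 1500, 45000, 0.499, 2946),
    ("Arasan 19", 22200, 740, 1500, 45000, 0.493, 2908),
    ("Arasan 18", 20490, 683, 1500, 45000, 0.455, 2861),
    ("Arasan 17", 20220, 674, 1500, 45000, 0.449, 2812),
]

for engine, score, found, positions, max_score, rate, ccrl in rows:
    computed_rate = score / max_score        # ASSUMPTION: rate = score / max score
    points_per_solution = score / found      # ASSUMPTION: fixed points per solved position
    print(f"{engine}: rate {computed_rate:.3f} (table {rate}), "
          f"{points_per_solution:.0f} points per solved position")

# Does the suite ordering agree with the CCRL ordering for these six versions?
by_score = [r[0] for r in sorted(rows, key=lambda r: r[1], reverse=True)]
by_ccrl = [r[0] for r in sorted(rows, key=lambda r: r[6], reverse=True)]
same_order = by_score == by_ccrl
print("same order by score and by CCRL:", same_order)
```

For these six rows the two orderings agree, but the versions are spread over more than 300 CCRL points, while the gap between Arasan 19 and Arasan 20 (38 CCRL points) is only 9 solved positions out of 1500.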
-
- Posts: 2559
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: New Tool
Why can't you simply use Wine? (unless you've already tried and it didn't work)
Martin Sedlak