Re: 7 Man Syzygy and SSD

Posted: Wed Jul 31, 2019 2:16 am
by Zenmastur
syzygy wrote: Wed Jul 31, 2019 12:37 am
duncan wrote: Tue Jul 30, 2019 9:50 pm
syzygy wrote: Fri Dec 21, 2018 2:27 am
But perfect play once a 6-men ending has been reached on the board is only a small part of the story. The phase before it is much more important (especially at longer time controls where it will indeed be rare that the engine can't figure out a 6-men ending on its own).

My apologies for getting off topic, but is Lomonosov 2, with its 94 TB of RAM, capable of solving 8 pieces?

https://www.top500.org/system/178444
That should be more than enough RAM.
One would think! LOL!

I have a question. What happens when Stockfish uses 7-man TBs and some of them are missing? Does it handle this gracefully?

Regards,

Zenmastur

Re: 7 Man Syzygy and SSD

Posted: Wed Jul 31, 2019 7:04 am
by Nordlandia
What about the generation time for 8-man on Lomonosov 2?

Re: 7 Man Syzygy and SSD

Posted: Wed Jul 31, 2019 10:20 am
by Vinvin
Zenmastur wrote: Wed Jul 31, 2019 2:16 am
syzygy wrote: Wed Jul 31, 2019 12:37 am
duncan wrote: Tue Jul 30, 2019 9:50 pm
syzygy wrote: Fri Dec 21, 2018 2:27 am
But perfect play once a 6-men ending has been reached on the board is only a small part of the story. The phase before it is much more important (especially at longer time controls where it will indeed be rare that the engine can't figure out a 6-men ending on its own).

My apologies for getting off topic, but is Lomonosov 2, with its 94 TB of RAM, capable of solving 8 pieces?

https://www.top500.org/system/178444
That should be more than enough RAM.
One would think! LOL!

I have a question. What happens when Stockfish uses 7-man TBs and some of them are missing? Does it handle this gracefully?

Regards,

Zenmastur
Yes, Stockfish manages this very smoothly. I use only one 7-man table: KRPPKRP.

Re: 7 Man Syzygy and SSD

Posted: Wed Jul 31, 2019 11:11 am
by Zenmastur
Nordlandia wrote: Wed Jul 31, 2019 7:04 am What about the generation time for 8-man on Lomonosov 2?
What about them?

I personally didn't see any great need for the 7-man bases to be created.

5-man tablebases are relatively small and most of the files were useful, i.e. in a 1,000,000 game database ~90% of the endgames were seen at least once, which means ~10% weren't seen at all. Another 30% of the files were seen at a rate of less than 1 game in 50,000. Not exactly what you would call “high use”. But since they only consume about 1 GB of disk space this wasn't an issue. About 4.7% of all games include 5-man TB positions.

6-man tablebases are much larger at ~150 GB. Only about 40% of the endgames are seen at a rate greater than 1 game in 50,000, so about 90 GB of the files are essentially useless. About 6% of all games contain a 6-man TB position.

7-man tablebases are large enough that most people will have to make special accommodations to obtain and use them. Approximately 80% of these endgames are seen in fewer than 1 game in 50,000 and ~54% are seen in fewer than 1 game in a million. These files are huge! I don't know what the total size is, but I've counted 18,000 GB of files. That's quite a bit of data considering 80% (~14.4 TB) is of questionable value, i.e. it's a waste of disk space for most people's applications.

8-man seems like a waste for anything other than academic purposes or analysis. They would likely consume around 2 PB of disk space. From a practical point of view, there are only a couple of these files that are worth the effort to generate. I did an analysis about 5 years ago and the data I have suggests that SOME of these files are worth generating.

#	endgame		# occ.	Sum # occ.
1	krppkrpp	128,688	128,688
2	krpppkrp	55,780	184,468
3	krpppkrp	55,780	240,248
4	kpppkppp	48,227	288,475
5	kbppkbpp	23,218	311,693
6	kbpppkbp	10,955	322,648
7	kppppkpp	10,235	332,883
8	kpppkbpp	8,543	341,426
9	krppkrbp	8,266	349,692
10	kpppknpp	6,829	356,521
11	krppkrnp	6,429	362,950
12	krbppkrp	5,987	368,937
13	kpppkrpp	5,442	374,379
14	krnppkrp	4,899	379,278
15	krpppkpp	4,155	383,433
16	krpppkpp	4,155	387,588
17	kbpppkpp	4,022	391,610
18	knpppkpp	3,588	395,198
19	kpppkqpp	3,472	398,670
20	krppkqrp	2,146	400,816
21	krppppkr	2,056	402,872
22	kqpppkpp	2,025	404,897
23	krrppkrp	1,805	406,702
24	krppkrrp	1,681	408,383
25	krpppkrb	1,597	409,980
26	krpppkrn	1,374	411,354
27	kbppknbp	1,244	412,598
28	kbppkrbp	1,025	413,623
Keep in mind that this was from a 7M game database, so your mileage may vary. This wasn't taken from a complete analysis. It was data I already had on hand from a “first look” analysis. It would take a little more work to go back and verify these numbers and make sure the list is “relatively” complete as far as containing the endgames that will see the most occurrences in a given number of games. I would note that since the size of each file is unknown, it's not possible to rank them by number of occurrences per MB of file.

These files are likely to be 100 times the size of the 7-man files, so they aren't worth generating unless they have a large number of occurrences per 1M games. In this case, I listed those that occur with a frequency of at least 1 game in 7,000.
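
The two rules of thumb in this post can be checked with a few lines of Python. This is only a sketch: the 100x size ratio is the post's own rough estimate, the 18,000 GB figure is the file total counted earlier in the post, and the exact game count of 7,076,320 is the one given later in the thread. The variable names are made up for illustration.

```python
# Rough consistency check of the size estimate and the listing cutoff.
# All inputs are figures quoted in the thread; decimal units assumed.

SEVEN_MAN_GB = 18_000    # 7-man files counted so far (from the post)
SIZE_RATIO = 100         # assumed 8-man / 7-man size ratio (post's estimate)
TOTAL_GAMES = 7_076_320  # database size, quoted later in the thread
CUTOFF_ONE_IN = 7_000    # "at least 1 game in 7,000"

# 1 PB = 1,000,000 GB in decimal units
eight_man_pb = SEVEN_MAN_GB * SIZE_RATIO / 1_000_000

# Minimum number of occurrences an endgame needs to make the list
min_occurrences = TOTAL_GAMES / CUTOFF_ONE_IN

print(f"Estimated 8-man size: {eight_man_pb:.1f} PB")
print(f"Cutoff: about {min_occurrences:.0f} occurrences")
```

Under these assumptions the estimate lands a little under the "around 2 PB" figure, and the cutoff sits just below the smallest entry in the table (1,025 occurrences).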

I think what is needed is a utility like Finalgen that uses 6-man (or 7-man, since they are already generated) Syzygy TBs as its base and can generate specific TBs to solve specific problems.

Finalgen has several huge advantages. It doesn't generate the whole TB at once, just the pawn slices that correspond to the given problem, which makes it use much less RAM. It also has a GUI that anyone can use to extend its endgame tablebases and query the data in the data set.

It does have several limitations though. It doesn't use any pre-generated TBs as a base set. It can only handle positions with kings and one non-pawn piece per side plus pawns. It's not multi-threaded, it's 32-bit, it doesn't use a “standard” GUI as its interface, it has no ability to compress its data, and there is no way to see what data is already in its data set.

Something like Finalgen that uses SYZYGY as its base set of EGTBs and can build on the base by adding more pawns as needed would be a very handy gadget. It would be orders of magnitude better than blindly generating the complete set of 8-man TBs!

Regards,

Zenmastur

Re: 7 Man Syzygy and SSD

Posted: Wed Jul 31, 2019 12:01 pm
by duncan
Zenmastur wrote: Wed Jul 31, 2019 11:11 am


What about them?
He meant: how long should it take?


#	endgame		# occ.	Sum # occ.
1	krppkrpp	128,688	128,688
2	krpppkrp	55,780	184,468
3	krpppkrp	55,780	240,248
4	kpppkppp	48,227	288,475
5	kbppkbpp	23,218	311,693
6	kbpppkbp	10,955	322,648
7	kppppkpp	10,235	332,883
8	kpppkbpp	8,543	341,426
9	krppkrbp	8,266	349,692
10	kpppknpp	6,829	356,521
11	krppkrnp	6,429	362,950
12	krbppkrp	5,987	368,937
13	kpppkrpp	5,442	374,379
14	krnppkrp	4,899	379,278
15	krpppkpp	4,155	383,433
16	krpppkpp	4,155	387,588
17	kbpppkpp	4,022	391,610
18	knpppkpp	3,588	395,198
19	kpppkqpp	3,472	398,670
20	krppkqrp	2,146	400,816
21	krppppkr	2,056	402,872
22	kqpppkpp	2,025	404,897
23	krrppkrp	1,805	406,702
24	krppkrrp	1,681	408,383
25	krpppkrb	1,597	409,980
26	krpppkrn	1,374	411,354
27	kbppknbp	1,244	412,598
28	kbppkrbp	1,025	413,623


Good to know. Do you have the percentages?

Re: 7 Man Syzygy and SSD

Posted: Wed Jul 31, 2019 12:04 pm
by duncan
Zenmastur wrote: Wed Jul 31, 2019 11:11 am
7-man tablebases are large enough that most people will have to make special accommodations to obtain and use them. Approximately 80% of these endgames are seen in fewer than 1 game in 50,000 and ~54% are seen in fewer than 1 game in a million. These files are huge! I don't know what the total size is, but I've counted 18,000 GB of files. That's quite a bit of data considering 80% (~14.4 TB) is of questionable value, i.e. it's a waste of disk space for most people's applications.
I think it was said before: it is useful for the search.

Re: 7 Man Syzygy and SSD

Posted: Wed Jul 31, 2019 12:16 pm
by Paloma
Zenmastur wrote: Wed Jul 31, 2019 11:11 am ...
I personally didn't see any great need for the 7-man bases to be created.

5-man tablebases are relatively small and most of the files were useful, i.e. in a 1,000,000 game database ~90% of the endgames were seen at least once, which means ~10% weren't seen at all. Another 30% of the files were seen at a rate of less than 1 game in 50,000. Not exactly what you would call “high use”. But since they only consume about 1 GB of disk space this wasn't an issue. About 4.7% of all games include 5-man TB positions.

6-man tablebases are much larger at ~150 GB. Only about 40% of the endgames are seen at a rate greater than 1 game in 50,000, so about 90 GB of the files are essentially useless. About 6% of all games contain a 6-man TB position.

7-man tablebases are large enough that most people will have to make special accommodations to obtain and use them. Approximately 80% of these endgames are seen in fewer than 1 game in 50,000 and ~54% are seen in fewer than 1 game in a million. These files are huge! I don't know what the total size is, but I've counted 18,000 GB of files. That's quite a bit of data considering 80% (~14.4 TB) is of questionable value, i.e. it's a waste of disk space for most people's applications.

8-man seems like a waste for anything other than academic purposes or analysis. They would likely consume around 2 PB of disk space. From a practical point of view, there are only a couple of these files that are worth the effort to generate. I did an analysis about 5 years ago and the data I have suggests that SOME of these files are worth generating.
Your description seems logical.

Re: 7 Man Syzygy and SSD

Posted: Wed Jul 31, 2019 1:02 pm
by Zenmastur
duncan wrote: Wed Jul 31, 2019 12:01 pm
Zenmastur wrote: Wed Jul 31, 2019 11:11 am


What about them?
He meant: how long should it take?
If it could be done in memory, about 200 times as long as it took to do the 7-man TBs. If it needed to use disk space as a buffer, then much, much longer.

But I think you're missing the point. The top 5 in the list below would be worth generating. Maybe even the top 10. The rest are probably not worth the effort. Even if they were already generated, how long do you think it would take to download them across the internet? At 5 MB per second, 2 PB (i.e. a full 8-man set) would take over 12 years to download. So what's the point? Who could or would host the files? How would they support hundreds of users trying to download them? It would be a logistical nightmare. 5 or 10 files would at least be doable.
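
The 12-year figure is easy to reproduce. The sketch below assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and a perfectly sustained transfer rate with no protocol overhead or interruptions; the function name is made up for illustration.

```python
# Back-of-the-envelope download time for a full 8-man set.
# Assumptions: decimal units and a constant, uninterrupted transfer rate.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def download_years(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Return the transfer time in years at a constant rate."""
    return size_bytes / rate_bytes_per_s / SECONDS_PER_YEAR


full_set_years = download_years(2 * 10**15, 5 * 10**6)
print(f"2 PB at 5 MB/s: {full_set_years:.1f} years")
```

With these assumptions the result is about 12.7 years, in line with the "over 12 years" estimate. A real download would take longer once overhead and outages are counted.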

#	endgame		# occ.	Sum # occ.
1	krppkrpp	128,688	128,688
2	krpppkrp	55,780	184,468
3	krpppkrp	55,780	240,248
4	kpppkppp	48,227	288,475
5	kbppkbpp	23,218	311,693
6	kbpppkbp	10,955	322,648
7	kppppkpp	10,235	332,883
8	kpppkbpp	8,543	341,426
9	krppkrbp	8,266	349,692
10	kpppknpp	6,829	356,521
11	krppkrnp	6,429	362,950
12	krbppkrp	5,987	368,937
13	kpppkrpp	5,442	374,379
14	krnppkrp	4,899	379,278
15	krpppkpp	4,155	383,433
16	krpppkpp	4,155	387,588
17	kbpppkpp	4,022	391,610
18	knpppkpp	3,588	395,198
19	kpppkqpp	3,472	398,670
20	krppkqrp	2,146	400,816
21	krppppkr	2,056	402,872
22	kqpppkpp	2,025	404,897
23	krrppkrp	1,805	406,702
24	krppkrrp	1,681	408,383
25	krpppkrb	1,597	409,980
26	krpppkrn	1,374	411,354
27	kbppknbp	1,244	412,598
28	kbppkrbp	1,025	413,623

Good to know. Do you have the percentages?
I'm not sure what percentages you're looking for. If you mean the percentage of all games in the database that contain one of the 8-man positions in a given file, simply divide the number under “# occ.” by 7,076,320.
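
For anyone who wants the numbers without doing the division by hand, here is a minimal sketch using the first five rows of the table exactly as posted (including the repeated krpppkrp row). The helper name is hypothetical, and the figures are only as reliable as the "first look" counts they come from.

```python
# Convert "# occ." counts from the table into percentages of all games.

TOTAL_GAMES = 7_076_320  # database size quoted in the post

# First five rows of the table as posted: (endgame, occurrences).
# Row 3 repeats row 2 in the original table and is kept unchanged here.
ROWS = [
    ("krppkrpp", 128_688),
    ("krpppkrp", 55_780),
    ("krpppkrp", 55_780),
    ("kpppkppp", 48_227),
    ("kbppkbpp", 23_218),
]


def percent_of_games(occurrences: int, total: int = TOTAL_GAMES) -> float:
    """Share of games (in percent) for a given occurrence count."""
    return 100.0 * occurrences / total


running_total = 0
for endgame, occ in ROWS:
    running_total += occ
    print(f"{endgame}: {percent_of_games(occ):.2f}% "
          f"(cumulative {percent_of_games(running_total):.2f}%)")

top_pct = percent_of_games(ROWS[0][1])
top5_pct = percent_of_games(running_total)
```

The same loop works for all 28 rows; only the `ROWS` list needs extending.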

I think it would be much better to give end users a program so they can generate their own files as they are needed. That was the point of my monologue about Finalgen. It solves all the logistics issues with generating and distributing files. The only problem is getting one of the UCI GUIs (or Xboard GUIs) to support the program with additional interface options for generating, managing, and querying the files. Lastly, it would need a library that engines could use to access the generated EGTBs.

Regards,

Zenmastur

Re: 7 Man Syzygy and SSD

Posted: Wed Jul 31, 2019 8:59 pm
by Dann Corbit
If you examine the TCEC games, you will see that there are hundreds of millions of TB hits during search.
The bandwidth problem is an enormous problem today.
By the time the files are generated, it will be possible to download them in a reasonable time and store them at a reasonable cost.

That having been said, it is an incredibly difficult task to build them with very little reward for the builders.
It takes some kind of special dedication to do something like that.

We should be incredibly grateful for the efforts of Bojun Guo (aka noobpwnftw) and Ronald de Man for the stupendous effort of building the 7 man syzygy files and for making the code public and freely available for use.
It was very altruistic on the part of both parties.

Now, let's consider usefulness. I think that some things (like forming the file KQQQQQQk) may not have a lot of practical usefulness as far as winning the game. Such a formation will almost never occur, and when it does the dominant side has a colossal advantage. But calculation of this file is still incredibly interesting. How many draw chances does black have in total? How does it compare to his drawing chances with KQQQQQk?

There is a mathematical curiosity. I think some things (like going to the moon) do not have obvious monetary or pragmatic benefit on first glance. But over time, there will be things that drop out that are generally useful.

Consider the Mersenne prime search:
https://www.mersenne.org/
Seems a little silly to spend such a colossal amount of electricity to find titanic prime numbers whose primality is also not terribly difficult to detect due to the special form of the numbers. However, it turns out that these weird little buggers are incredibly useful. We use a Mersenne prime in a random number generator that has spectacularly good properties (not cryptographically secure, but produces a marvelously uniform distribution instead of fragmenting into planes like almost all other PRNGs).
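
To illustrate why primality is comparatively easy to check for numbers of the form 2^p - 1, here is a minimal sketch of the Lucas-Lehmer test in Python. It is a textbook version for small exponents, not the optimized arithmetic a real search would use, and the function name is made up for illustration.

```python
def is_mersenne_prime(p: int) -> bool:
    """Lucas-Lehmer test for M = 2**p - 1.

    Valid when p is prime: for an odd prime p, M is prime exactly when
    s == 0 after p - 2 steps of s -> s*s - 2 (mod M), starting from s = 4.
    """
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below needs odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


# Check the prime exponents up to 31.
small_primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31)
mersenne_exponents = [p for p in small_primes if is_mersenne_prime(p)]
print(mersenne_exponents)
```

The whole test is p - 2 modular squarings, which is why candidates with millions of digits can still be verified.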

So, something that seems useless can have a very valuable use. And even if it does not, there is still the scientific and mathematical curiosity angle.

I am very grateful that Ronald designed the code to allow for calculation of the ultra-lopsided bases and that Bojun Guo went ahead and calculated them. The chances are very high that they will never, ever be used in a game. But they provide valuable insight and mathematical completeness that is very satisfying for me.

As far as construction of 8 man files goes, I guess someone will start on them in about 5 years and will finish in about 12 years.
By the time he is done, the cost of storing them will be about the same as the cost of storing the 7 man files today (a lot but not prohibitive).

But I am a lousy prophet, so take my guesstimate with a tablespoon of salt.

Re: 7 Man Syzygy and SSD

Posted: Wed Jul 31, 2019 11:38 pm
by Zenmastur
Dann Corbit wrote: Wed Jul 31, 2019 8:59 pm If you examine the TCEC games, you will see that there are hundreds of millions of TB hits during search.
The bandwidth problem is an enormous problem today.
By the time the files are generated, it will be possible to download them in a reasonable time and store them at a reasonable cost.

That having been said, it is an incredibly difficult task to build them with very little reward for the builders.
It takes some kind of special dedication to do something like that.

We should be incredibly grateful for the efforts of Bojun Guo (aka noobpwnftw) and Ronald de Man for the stupendous effort of building the 7 man syzygy files and for making the code public and freely available for use.
It was very altruistic on the part of both parties.
I agree that both were selfless in their acts and deserve to be commended for their efforts!
Now, let's consider usefulness. I think that some things (like forming the file KQQQQQQk) may not have a lot of practical usefulness as far as winning the game. Such a formation will almost never occur, and when it does the dominant side has a colossal advantage. But calculation of this file is still incredibly interesting. How many draw chances does black have in total? How does it compare to his drawing chances with KQQQQQk?

There is a mathematical curiosity. I think some things (like going to the moon) do not have obvious monetary or pragmatic benefit on first glance. But over time, there will be things that drop out that are generally useful.

Consider the Mersenne prime search:
https://www.mersenne.org/
Seems a little silly to spend such a colossal amount of electricity to find titanic prime numbers whose primality is also not terribly difficult to detect due to the special form of the numbers. However, it turns out that these weird little buggers are incredibly useful. We use a Mersenne prime in a random number generator that has spectacularly good properties (not cryptographically secure, but produces a marvelously uniform distribution instead of fragmenting into planes like almost all other PRNGs).
As a side note, at one time I had generated a large fraction of all factors of non-prime numbers of the form 2^n-1 with moderately large to large N. So, I don't think there is anything silly about such efforts as I have participated in them myself. I do tend to be more practical in most things though.
So, something that seems useless can have a very valuable use. And even if it does not, there is still the scientific and mathematical curiosity angle.

I am very grateful that Ronald designed the code to allow for calculation of the ultra-lopsided bases and that Bojun Guo went ahead and calculated them. The chances are very high that they will never, ever be used in a game. But they provide valuable insight and mathematical completeness that is very satisfying for me.

As far as construction of 8 man files goes, I guess someone will start on them in about 5 years and will finish in about 12 years.
By the time he is done, the cost of storing them will be about the same as the cost of storing the 7 man files today (a lot but not prohibitive).
My point is that it's currently impracticable to generate and distribute a complete 8-man TB. When someone ignorant of the realities involved with such an effort asks “When will they be done?” it strikes me as rudely “entitled”. But, maybe that's just me.

Some of the 8-man files could justify the efforts required to generate, store, distribute, and use them. I would love to have them, but most of the files will be a waste in my opinion.
But I am a lousy prophet, so take my guesstimate with a tablespoon of salt.
I'm sure you are right. I've seen many things like this come to fruition in my lifetime, so I'm sure it will happen. I'm just not betting on when it will happen.

And thank you, Bojun Guo (aka noobpwnftw) and Ronald de Man (aka syzygy)! I'm sure it's something you don't hear often enough!

Regards,

Zenmastur