pawn enumeration

Discussion of chess software programming and technical issues.

Moderators: hgm, Dann Corbit, Harvey Williamson

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: pawn enumeration

Post by Daniel Shawul »

diep wrote:
hgm wrote:I don't think this is theoreticlly possible: the pawn advance is a function in a multi-dimensional spae, and it cannot be mapped by a continuous function on the one-dimensional space of numbers.


BS statement.

I solved that in Diep already in 1994 or so. I was 20 years old and just had started Diep.

Note that the best way to order moves is just using a picksort - i'm using that of course.

But solving just the pawns is so simple....

Note with nowadays bitboards you can solve it in a different manner.
Just enumerate for white the fields a8 as least significant, followed by b8 and h1 as most significant.

Now with a simple bitboard trick you can get out the first pawn.
then you fiddle with that and you got the pawn move.

You know this is highschool math...

It is 1 dimensional math...

But I don't understand why you would need this. Why not simply make a pawn hash table which lists all possible positions, and the name of the file on the HD where you stored the slice, if it is already computed? You can use a hashed perft on positions with Pawns only (and counts of the other pieces, so you know when lane-changing is still possible), starting with a placement step on 2nd rank. Then you can just run through the list to search for not-yet-computed slices that have all successors already computed, and start working on those. If it is not a pure Pawn ending, I don't think you would ever get in a situation where a linear search through the list of Pawn structures would take significant time compared to calculating a single slice.

This also makes it much easier to fully exploit the exchange symmetry of a large number of Pawns.

With pure Pawn endings slices would only be 64 positions, so for the sake of efficiency you would probably combine them to super-slices, e.g. only slice based on white Pawn locations.
On a GPU it makes sense, but he's making a mistake.

He should talk to Joost Buijs how he did do his ordering for nightmare in 90s...
I think you missed why it is difficult to enumerate. The question is given an index (a number) determine the locations of the n-bitbases. I like this approach a lot because it avoids the use of nested loops for each piece added. If I want to generate 10 men my function is still

Code: Select all

for&#40;i=0;i < n_positions;i++) &#123;
    sq&#91;0&#93;=f&#40;i&#41; , sq&#91;1&#93; = g&#40;i&#41;, ...
&#125;
This is much more difficult than you think. Even enumerating n alike pieces requires some weird math. Look in my other post for a code snippet. Now merging this for two groups of white and black pawns, you see it is a nightmare. Even though I think it may be possible to find an enumeration function it will be difficult.
I know the above code may be a little slower because when you go over inner loops, you still have to generate each square unlike the case shown below. I like the above code for its elegance but it may be slower ..

Code: Select all

for&#40;i=0;i<64;i++) &#123;
sq&#91;0&#93;=i;
for&#40;j=0;j<64;j++)
sq&#91;1&#93;=j;
...
&#123;
    if&#40;invalid&#41; continue; //or store invalid value anyway
&#125;
&#125;
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: pawn enumeration

Post by Daniel Shawul »

In that case dividing the pawn table into 4 files (based on the leading pawn) would reduce 13G to 3.25G < 4G, which should be sufficient.

My simple indexing scheme when using 3 bits per position would require 4.5G per file (2 x 6 x 64^5 x 3 / 8 bytes), assuming you need 3 bits for white to move and 3 bits for black to move. When eliminating positions with two pieces on the same square, you still need 2 x 6 x 63 x 62 x 61 x 60 x 59 x 3/8 = 3.54 G in the general case.

You can go even lower by eliminating positions with neighbouring kings, but that will give complicated indexing functions. Another option is keeping only part of the file in memory. For generating the slice with the leading (white) pawn on a2, you only need to access the slices with the leading pawn on a3 and a4. This cuts RAM requirements by a factor of 2, giving 2.25G with the simple indexing scheme.
Yes I did the slices now and the size of the bit bases becomes slow. I use all indexing tricks there is. Slicing all the possible pawn slices is interesting because the tables are very small and only n-slices need to be loaded at any one time. Btw I generate both sides to move bitbases at the same time (this 3bits/pos) is for both sides. I recall you can generate one side to move bitbase alone but I forgot how to do it . Also it would not be a significant saving when generating bitbases (max 2bits/pos including counters so max of 1/3 saving).

File slicing helps in two ways compared to exhaustive slicing. a) size is manageable and always 8 b) The slices are self-sufficient since you don't need to load other slices when pawn moves. The latter will require you to load the necessary helper slices manually. But exhaustive slicing could be advantageous for DTZ as Wylie pointed out.

Another topic I want to raise is parallelization of bitbase generation. Not the obvious generate multiple bitbases at the same time but that of efficiently generating a single bitbase. Despite my expectation it turns out to be very difficult to parallelize because I store 4 positions per byte. If I use retrograde analysis, there will be a "false sharing" effect when two positons taking the same byte are accessed. Because of this I think a position should atleast take a byte for its value which is actually the case for dtm. Since during an egtb generation a positons's score is always improved (alpha-beta), I don't think two writes to the same position will cause problems.Ofcourse I can use the simple forward generation method that is easily parallelizable but it may be too slow.
I'm generating what I call DTZ50+ tables. For positions that can be won under the 50-move rule, I calculate the number of plies to a zeroing move. The value for positions that cannot be won under the 50-move rule but can be won without it, is either the number of plies to a zeroing move into a position won under the 50-move rule, or 100 plus the number of plies to a zeroing move into a "won" position that cannot be won under the 50-move rule, whichever is smaller. For use during search I generate WDL50+ tables, which have 5 values: -2, -1, 0, 1, 2, where -1 and 1 indicate that the position is lost or won, but drawn under the 50-move rule.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: pawn enumeration

Post by diep »

Daniel Shawul wrote:
diep wrote:
hgm wrote:I don't think this is theoreticlly possible: the pawn advance is a function in a multi-dimensional spae, and it cannot be mapped by a continuous function on the one-dimensional space of numbers.


BS statement.

I solved that in Diep already in 1994 or so. I was 20 years old and just had started Diep.

Note that the best way to order moves is just using a picksort - i'm using that of course.

But solving just the pawns is so simple....

Note with nowadays bitboards you can solve it in a different manner.
Just enumerate for white the fields a8 as least significant, followed by b8 and h1 as most significant.

Now with a simple bitboard trick you can get out the first pawn.
then you fiddle with that and you got the pawn move.

You know this is highschool math...

It is 1 dimensional math...

But I don't understand why you would need this. Why not simply make a pawn hash table which lists all possible positions, and the name of the file on the HD where you stored the slice, if it is already computed? You can use a hashed perft on positions with Pawns only (and counts of the other pieces, so you know when lane-changing is still possible), starting with a placement step on 2nd rank. Then you can just run through the list to search for not-yet-computed slices that have all successors already computed, and start working on those. If it is not a pure Pawn ending, I don't think you would ever get in a situation where a linear search through the list of Pawn structures would take significant time compared to calculating a single slice.

This also makes it much easier to fully exploit the exchange symmetry of a large number of Pawns.

With pure Pawn endings slices would only be 64 positions, so for the sake of efficiency you would probably combine them to super-slices, e.g. only slice based on white Pawn locations.
On a GPU it makes sense, but he's making a mistake.

He should talk to Joost Buijs how he did do his ordering for nightmare in 90s...
I think you missed why it is difficult to enumerate. The question is given an index (a number) determine the locations of the n-bitbases. I like this approach a lot because it avoids the use of nested loops for each piece added. If I want to generate 10 men my function is still

Code: Select all

for&#40;i=0;i < n_positions;i++) &#123;
    sq&#91;0&#93;=f&#40;i&#41; , sq&#91;1&#93; = g&#40;i&#41;, ...
&#125;
This is much more difficult than you think. Even enumerating n alike pieces requires some weird math. Look in my other post for a code snippet. Now merging this for two groups of white and black pawns, you see it is a nightmare. Even though I think it may be possible to find an enumeration function it will be difficult.
I know the above code may be a little slower because when you go over inner loops, you still have to generate each square unlike the case shown below. I like the above code for its elegance but it may be slower ..

Code: Select all

for&#40;i=0;i<64;i++) &#123;
sq&#91;0&#93;=i;
for&#40;j=0;j<64;j++)
sq&#91;1&#93;=j;
...
&#123;
    if&#40;invalid&#41; continue; //or store invalid value anyway
&#125;
&#125;
Oh comeon diep has its own EGTBs. It's 1 lookup table to remove squares for n of the same pieces, i combine that with kings as the first layer (most significant). That's not for generation, that's the resulting EGTBs.

In the resulting EGTB i try to avoid positions with >= 2 pieces on the same squares (goes here and there wrong for en passant - i don't store castling rights). Nalimov goes further there removing also bunches of positions where the king can't stand legal for side not to move, yet that's white and black dependant - could save you 15% or so max. Read further why i don't do that.

After the up to a couple of hundreds of passes of bitmap generation, you convert the resulting bitmap to win/draw/loss. If you want DTM or something silly like that, then you need to add each bitmap pass to the DTM tables.

Either way, you'll be using 2 or 3 different EGTB formats. A simple bitmap format, if you want DTM or something more silly like correct 50 move rule, you'll need 3 formats. One to generate quickly. the same format for the win in N and then after it's all done you need to convert to the format that's using a better indexation removing more illegals.

DTM is not so clever nor anything similar. it's 1 or 2 bytes. If you use 50 move rule, you have still a 7 bits need an entry. That's way too much.

win/draw/loss compresses up to factor 1000 handsdown and for some even factor 5000+ is not a problem.

With the DTM's you'll get to factor 4 and the 50 move rule tables a tad under factor 4.

So with all 6 men in DTM format you need 1.3TB or so. Uncompressed wdl in Diep format is already 1.05TB (with TB as power of 10, not 1024) and with compression it's far smaller (not operational in Diep yet, maybe soon though might i get some 7 men done - but that's not a priority yet because of price of disks).

It's all about the compression.

So you're forced to WDL or you will need 1TB of SSD's.

Note that if you use WDL you don't need to do extreme effort to get a compact format - only mirroring is important and avoiding pieces on the same square/same pieces. So the big reductions that lose you a factor 2 if you don't do it . Positions where king is in check - that's a 0 in your format anyway - compresses well.

With DTM style format of course you better remove all positions here you can capture the king.

The whole question is: do you want 7 men tables one day?

If so - go WDL - if not, buy expensive SSD's and go get 6 men.

With spinning harddrives you can't use EGTBs in search you know other than what fits in your caches as you'll get too many nps in the first place.

So what fits in RAM or on a SSD, that's what you can use nowadays. Slowing down your nps too much is not gonna be happy.

You'll get 30 ply nominal anyway, so already in openingsposition you start probing EGTBs. That's usually KRPKRP and that's a rather useless EGTB to probe there.

So magnetic disks are too slow for that.

p.s. note biggest diff between generation format and resulting format: storing 5 positions a byte for the last and 1 bit a position for the generation format.
Last edited by diep on Wed Jul 25, 2012 12:36 am, edited 3 times in total.
syzygy
Posts: 5554
Joined: Tue Feb 28, 2012 11:56 pm

Re: pawn enumeration

Post by syzygy »

Daniel Shawul wrote:This is much more difficult than you think. Even enumerating n alike pieces requires some weird math. Look in my other post for a code snippet. Now merging this for two groups of white and black pawns, you see it is a nightmare. Even though I think it may be possible to find an enumeration function it will be difficult.
What makes it difficult are the symmetries if none of the sides have exactly 1 pawn.
If one side has exactly 1 pawn, all symmetry is eliminated by restricting this pawn to files a-d. The rest is a matter of placing k alike pieces in n squares, which you are already doing for pawnless tables anyway. First the pawns of the other side, then groups of pieces. Of course this doesn't yet solve the problem of preserving the partial ordering on pawn structures imposed by pawn moves, which you need for easy slicing.

To have a perfect index (i.e. no symmetry remaining and no broken positions with two pieces on the same square) with (say) white having two pawns and black either 0 or at least 2 pawns, you first have to distinguish between the case that the white pawns are in symmetric positions and the case that the white pawns are not in symmetric positions. The latter case is relatively easy, since indexing the white pawns breaks all symmetry. In the symmetric case, there is symmetry left that you can exploit when placing the remaining pawns or pieces. If black has no pawns you can use the white king for this. If black has pawns, you have to analyze a bit more. With 3 white pawns things actually get simpler, since the 3 pawns always break symmetry. So only KPPvKPP needs a bit more analysis. (Looking at my old code for KPPvPP in suicide chess, I distinguish 5 cases. The code for KPPvKPP would be similar. Not too bad.)
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Related: malloc virtual

Post by diep »

Daniel Shawul wrote:
jdart wrote:Shared memory basically does this. See mmap (Linux) or CreateFileMapping/MapViewOfFile (Windows).

--Jon
Indeed, that is what I needed. Now I can think of available physical RAM as a cache of a much bigger address space. Here I could not allocate 13G with malloc/new even though it is 64bits machine.I read somewhere there is a special version of malloc can do that, but anyway mmap() is a direct solution. With this method, I can generate any size bitbases (5,6,7...etc) without the fear of exceeding physical RAM limits. The performance of generation depends on the efficiency of paging which is better done by the OS.
Thanks
the biggest possible bitmap is somewhere in the 3.x GB. How the hell do you get to 13GB?

You want to do it the slow nalimov way?

Load both EGTBs in RAM in 2 bytes a position and then total trash your caches doing the lookups, so instead of getting 1 bit you get 2 bytes an entry?

Just put it in bitmaps and then your L1 and L2 get the big hit.

Google for Ken Thompson - he was also using the bitmap manner - though weirdly reversed for some weird reason. You can also do it forward of course which is faster and more efficient.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: pawn enumeration

Post by diep »

syzygy wrote:
Daniel Shawul wrote:This is much more difficult than you think. Even enumerating n alike pieces requires some weird math. Look in my other post for a code snippet. Now merging this for two groups of white and black pawns, you see it is a nightmare. Even though I think it may be possible to find an enumeration function it will be difficult.
What makes it difficult are the symmetries if none of the sides have exactly 1 pawn.
If one side has exactly 1 pawn, all symmetry is eliminated by restricting this pawn to files a-d. The rest is a matter of placing k alike pieces in n squares, which you are already doing for pawnless tables anyway. First the pawns of the other side, then groups of pieces. Of course this doesn't yet solve the problem of preserving the partial ordering on pawn structures imposed by pawn moves, which you need for easy slicing.

To have a perfect index (i.e. no symmetry remaining and no broken positions with two pieces on the same square) with (say) white having two pawns and black either 0 or at least 2 pawns, you first have to distinguish between the case that the white pawns are in symmetric positions and the case that the white pawns are not in symmetric positions. The latter case is relatively easy, since indexing the white pawns breaks all symmetry. In the symmetric case, there is symmetry left that you can exploit when placing the remaining pawns or pieces. If black has no pawns you can use the white king for this. If black has pawns, you have to analyze a bit more. With 3 white pawns things actually get simpler, since the 3 pawns always break symmetry. So only KPPvKPP needs a bit more analysis. (Looking at my old code for KPPvPP in suicide chess, I distinguish 5 cases. The code for KPPvKPP would be similar. Not too bad.)
You know - this is highschool type math. Not sure what courses you had there, but in my highschool we had some math regarding : "how many ways can i put n blue balls in there". Never had that?

The mirrorring is very well documented by Ken Thompson. There is nothing new to invent there.
syzygy
Posts: 5554
Joined: Tue Feb 28, 2012 11:56 pm

Re: pawn enumeration

Post by syzygy »

Daniel Shawul wrote:File slicing helps in two ways compared to exhaustive slicing. a) size is manageable and always 8 b) The slices are self-sufficient since you don't need to load other slices when pawn moves. The latter will require you to load the necessary helper slices manually. But exhaustive slicing could be advantageous for DTZ as Wylie pointed out.
I use exhaustive slicing, but only to make the generation more efficient, especially for DTZ but it should help as well for DTM and WDL For reducing memory requirements, I only use file slicing since that's enough. A file is smaller than a pawnless table, and I need to be able to do pawnless tables anyway.
Another topic I want to raise is parallelization of bitbase generation. Not the obvious generate multiple bitbases at the same time but that of efficiently generating a single bitbase. Despite my expectation it turns out to be very difficult to parallelize because I store 4 positions per byte.
Exactly. You need to be able to atomically update position values, which seems pretty hard to do efficiently when storing 2 bits per position.
If I use retrograde analysis, there will be a "false sharing" effect when two positons taking the same byte are accessed. Because of this I think a position should atleast take a byte for its value which is actually the case for dtm. Since during an egtb generation a positons's score is always improved (alpha-beta), I don't think two writes to the same position will cause problems.
They do in my scheme, since I do essentially two plies per iteration (I think what hgm calls leapfrogging). There is a danger that two threads attempt to set a position value at the same time, the first to value ply+1, the second to value ply. I solve this using a lock xchgcmpb instruction (which hardly slows down the generation on modern processors since practically all accesses are uncontested). Another point in the generation where this is needed is when retrograding captures.

I guess some inline assembly could be used to atomically change just two bits.
Ofcourse I can use the simple forward generation method that is easily parallelizable but it may be too slow.
Yes, that would be slow.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: pawn enumeration

Post by diep »

syzygy wrote:
Daniel Shawul wrote:File slicing helps in two ways compared to exhaustive slicing. a) size is manageable and always 8 b) The slices are self-sufficient since you don't need to load other slices when pawn moves. The latter will require you to load the necessary helper slices manually. But exhaustive slicing could be advantageous for DTZ as Wylie pointed out.
I use exhaustive slicing, but only to make the generation more efficient, especially for DTZ but it should help as well for DTM and WDL For reducing memory requirements, I only use file slicing since that's enough. A file is smaller than a pawnless table, and I need to be able to do pawnless tables anyway.
Another topic I want to raise is parallelization of bitbase generation. Not the obvious generate multiple bitbases at the same time but that of efficiently generating a single bitbase. Despite my expectation it turns out to be very difficult to parallelize because I store 4 positions per byte.
Exactly. You need to be able to atomically update position values, which seems pretty hard to do efficiently when storing 2 bits per position.
If I use retrograde analysis, there will be a "false sharing" effect when two positons taking the same byte are accessed. Because of this I think a position should atleast take a byte for its value which is actually the case for dtm. Since during an egtb generation a positons's score is always improved (alpha-beta), I don't think two writes to the same position will cause problems.
They do in my scheme, since I do essentially two plies per iteration (I think what hgm calls leapfrogging). There is a danger that two threads attempt to set a position value at the same time, the first to value ply+1, the second to value ply. I solve this using a lock xchgcmpb instruction (which hardly slows down the generation on modern processors since practically all accesses are uncontested). Another point in the generation where this is needed is when retrograding captures.

I guess some inline assembly could be used to atomically change just two bits.
Ofcourse I can use the simple forward generation method that is easily parallelizable but it may be too slow.
Yes, that would be slow.
Please don't tell me you give a course programming somewhere :)

Except the: "How my methods can slow you down at modern hardware - let's start writing in the same cacheline with different threads"

p.s. Forward generating is of course 60 times faster than any of your methods in normal OTB chess - google for Urban Koistinen.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: pawn enumeration

Post by diep »

I found it for you:

http://www.abc.se/~m10051/eg.txt

Go read that :)

It's 60x faster.

That's how on those P4's all those 7 men were generated by those 2 guys.

Basic concept is very simple. If white is to move,
then put the black king as the last piece.

Now you generate the index for the white pieces without the black king.

As every move that already can capture the black ing already was set to a win, it doesn't matter.

So you generate a legal movelist (index lookup) for 64 positions at the same time.

That beats everything.

Then for black to move, you put the black king as last piece.

Of course you generate forwards - i never understood all those guys who invented generating inverse moves.

A generator that generates forward is real easy to build. Inverse generation has a lot of pitfalls.

So this speeds up practical factor 60.
Last edited by diep on Wed Jul 25, 2012 1:04 am, edited 1 time in total.
syzygy
Posts: 5554
Joined: Tue Feb 28, 2012 11:56 pm

Re: pawn enumeration

Post by syzygy »

diep wrote:DTM is not so clever nor anything similar. it's 1 or 2 bytes. If you use 50 move rule, you have still a 7 bits need an entry. That's way too much.
For up to 5 pieces, my WDL50+ tables are 438MB and my DTZ50+ tables are 668MB. The DTZ50+ tables are half-sided and make use of the WDL50+ information. So draws in DTZ50+ can be compressed as don't cares and win/loss values can be stored without a sign. I also store distances in full moves unless I potentially need plies for perfect 50-move rule play. There are lots of further tricks.
It's all about the compression.
How much is 5-piece WDL for diep? (And do you store both wtm and dtm or half-sided?)
So you're forced to WDL or you will need 1TB of SSD's.
I completely agree that during search one should use WDL (on SSD or in RAM) and not DTM or DTZ.