pawn enumeration

Discussion of chess software programming and technical issues.

Moderator: Ras

Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Related: malloc virtual

Post by Daniel Shawul »

diep wrote:
Daniel Shawul wrote:
jdart wrote:Shared memory basically does this. See mmap (Linux) or CreateFileMapping/MapViewOfFile (Windows).

--Jon
Indeed, that is what I needed. Now I can think of available physical RAM as a cache of a much bigger address space. Here I could not allocate 13G with malloc/new even though it is a 64-bit machine. I read somewhere there is a special version of malloc that can do that, but anyway mmap() is a direct solution. With this method, I can generate any size of bitbases (5, 6, 7 men, etc.) without the fear of exceeding physical RAM limits. The performance of generation then depends on the efficiency of paging, which is better done by the OS.
Thanks
the biggest possible bitmap is somewhere in the 3.x GB. How the hell do you get to 13GB?
That was before I did pawn slicing. Now it is about 4.4GB for generation of the largest 6-men pawnless bitbases, which use 3 bits/pos per side: 2*(462x62x61x60x59)*3 bits/pos / 8 = 4.4GB. I may reduce this further by a third to 3G, but I really don't know how to do that while using 3 bits/pos. Can someone explain how to generate one-side bitbases only?
You want to do it the slow Nalimov way?

Load both EGTBs in RAM at 2 bytes a position and then totally trash your caches doing the lookups, so instead of getting 1 bit you get 2 bytes per entry?

Just put it in bitmaps and then your L1 and L2 get the big hit rate.

Google for Ken Thompson - he was also using the bitmap approach, though reversed for some weird reason. You can also do it forward of course, which is faster and more efficient.
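For reference, a minimal sketch of the mmap() approach mentioned above: a file-backed mapping lets the generator address a table larger than physical RAM and leaves the paging to the OS. The function name and error handling here are illustrative only, not anyone's actual generator code.

Code: Select all

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>

/* Map a (possibly larger-than-RAM) table onto a file; the OS pages it in/out. */
uint8_t* map_table(const char* path, size_t bytes) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return NULL; }
    if (ftruncate(fd, (off_t)bytes) != 0) { perror("ftruncate"); close(fd); return NULL; }
    void* p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    if (p == MAP_FAILED) { perror("mmap"); return NULL; }
    return (uint8_t*)p;               /* release later with munmap(p, bytes) */
}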
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Related: malloc virtual

Post by diep »

Daniel Shawul wrote:
diep wrote:
Daniel Shawul wrote:
jdart wrote:Shared memory basically does this. See mmap (Linux) or CreateFileMapping/MapViewOfFile (Windows).

--Jon
Indeed, that is what I needed. Now I can think of available physical RAM as a cache of a much bigger address space. Here I could not allocate 13G with malloc/new even though it is a 64-bit machine. I read somewhere there is a special version of malloc that can do that, but anyway mmap() is a direct solution. With this method, I can generate any size of bitbases (5, 6, 7 men, etc.) without the fear of exceeding physical RAM limits. The performance of generation then depends on the efficiency of paging, which is better done by the OS.
Thanks
the biggest possible bitmap is somewhere in the 3.x GB. How the hell do you get to 13GB?
That was before I did pawn slicing. Now it is about 4.4GB for generation of the largest 6-men pawnless bitbases, which use 3 bits/pos per side: 2*(462x62x61x60x59)*3 bits/pos / 8 = 4.4GB. I may reduce this further by a third to 3G, but I really don't know how to do that while using 3 bits/pos. Can someone explain how to generate one-side bitbases only?
You want to do it the slow Nalimov way?

Load both EGTBs in RAM at 2 bytes a position and then totally trash your caches doing the lookups, so instead of getting 1 bit you get 2 bytes per entry?

Just put it in bitmaps and then your L1 and L2 get the big hit rate.

Google for Ken Thompson - he was also using the bitmap approach, though reversed for some weird reason. You can also do it forward of course, which is faster and more efficient.
the easiest thing is to use bitmaps, 1 bit per position. That reduces the lookup problem considerably. At 1 bit per position you're suddenly at 3.x GB.

For what you stream through sequentially, it doesn't matter whether you store it in 1 bitmap or 2 bitmaps.

It's the lookup buffer that you want to keep as tiny as possible. That is your biggest speed win.
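To make the 1-bit-per-position idea concrete, here is a minimal sketch of bit accessors over such a bitmap; the names and the index-to-bit layout are illustrative assumptions, not code from either engine.

Code: Select all

#include <stdint.h>

/* 1 bit per position: the lookup table stays small enough that probes
   land mostly in L1/L2 instead of thrashing RAM. */
static inline int get_bit(const uint8_t* bm, uint64_t idx) {
    return (bm[idx >> 3] >> (idx & 7)) & 1;
}

static inline void set_bit(uint8_t* bm, uint64_t idx) {
    bm[idx >> 3] |= (uint8_t)(1u << (idx & 7));
}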
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Related: malloc virtual

Post by Daniel Shawul »

Mine is also a bitmap. It doesn't have to be 1 bit/pos to be called that. You need at least 1 more bit for marking a position as visited, otherwise the generator will spend too much time re-visiting. Anyway, what I have a problem with is how to generate one-side bitbases. Say I mark all the losses for white -> then wins for black -> losses for white, etc... It seems you need the bitmap for black to move as well, bringing your total to 3 bitmaps. I know there is a "leap-frog" or similar method that can generate one side only, but I can't figure it out right now...
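For what it's worth, a rough sketch of one alternating pass as described here, using a white-to-move loss bitmap and a black-to-move win bitmap. get_bit/set_bit are as in the earlier sketch; Position, Move, MAX_MOVES, decode_index, gen_legal_moves and index_after_move are hypothetical helpers, not Scorpio's actual code.

Code: Select all

#include <stdint.h>

/* One backward-induction pass: a black-to-move position is won for black if
   some legal black move reaches a white-to-move position already known lost.
   Returns 1 if any new position was marked, so the caller knows to iterate. */
int mark_black_wins(uint8_t* btm_win, const uint8_t* wtm_loss, uint64_t npos) {
    int changed = 0;
    for (uint64_t idx = 0; idx < npos; idx++) {
        if (get_bit(btm_win, idx)) continue;          /* already marked       */
        Position pos;
        decode_index(&pos, idx, BLACK);               /* hypothetical helper  */
        Move list[MAX_MOVES];
        int n = gen_legal_moves(&pos, list);          /* hypothetical helper  */
        for (int i = 0; i < n; i++) {
            uint64_t child = index_after_move(&pos, list[i]);  /* wtm index   */
            if (get_bit(wtm_loss, child)) {
                set_bit(btm_win, idx);
                changed = 1;
                break;
            }
        }
    }
    return changed;
}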
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: pawn enumeration

Post by Daniel Shawul »

diep wrote:
syzygy wrote:
Daniel Shawul wrote:This is much more difficult than you think. Even enumerating n alike pieces requires some weird math. Look in my other post for a code snippet. Now merging this for two groups of white and black pawns, you see it is a nightmare. Even though I think it may be possible to find an enumeration function, it will be difficult.
What makes it difficult are the symmetries, if neither side has exactly 1 pawn.
If one side has exactly 1 pawn, all symmetry is eliminated by restricting this pawn to files a-d. The rest is a matter of placing k alike pieces in n squares, which you are already doing for pawnless tables anyway. First the pawns of the other side, then groups of pieces. Of course this doesn't yet solve the problem of preserving the partial ordering on pawn structures imposed by pawn moves, which you need for easy slicing.

To have a perfect index (i.e. no symmetry remaining and no broken positions with two pieces on the same square) with (say) white having two pawns and black either 0 or at least 2 pawns, you first have to distinguish between the case that the white pawns are in symmetric positions and the case that the white pawns are not in symmetric positions. The latter case is relatively easy, since indexing the white pawns breaks all symmetry. In the symmetric case, there is symmetry left that you can exploit when placing the remaining pawns or pieces. If black has no pawns you can use the white king for this. If black has pawns, you have to analyze a bit more. With 3 white pawns things actually get simpler, since the 3 pawns always break symmetry. So only KPPvKPP needs a bit more analysis. (Looking at my old code for KPPvPP in suicide chess, I distinguish 5 cases. The code for KPPvKPP would be similar. Not too bad.)
You know - this is high-school-type math. Not sure what courses you had there, but in my high school we had some math regarding "how many ways can I put n blue balls in there". Never had that?

The mirroring is very well documented by Ken Thompson. There is nothing new to invent there.
Again you completely miss this one, but I don't want to dwell on it in light of the other useful discussion you make :)
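To illustrate the symmetry elimination syzygy describes above, here is a tiny sketch of the usual normalization step, assuming squares 0..63 with file = sq & 7: if the single reference pawn stands on files e-h, mirror every square left-right so it ends up on a-d. Function names are illustrative.

Code: Select all

/* Mirror a square about the d/e file boundary (a<->h, b<->g, ...). */
static inline int mirror_lr(int sq) { return sq ^ 7; }

/* Normalize a piece list so the reference pawn sits on files a-d. */
void normalize_files(int* squares, int npieces, int ref_pawn) {
    if ((squares[ref_pawn] & 7) >= 4) {          /* file e..h -> mirror all */
        for (int i = 0; i < npieces; i++)
            squares[i] = mirror_lr(squares[i]);
    }
}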
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: pawn enumeration

Post by diep »

Daniel Shawul wrote:
diep wrote:
syzygy wrote:
Daniel Shawul wrote:This is much more difficult than you think. Even enumerating n alike pieces requires some weird math. Look in my other post for a code snippet. Now merging this for two groups of white and black pawns, you see it is a nightmare. Even though I think it may be possible to find an enumeration function, it will be difficult.
What makes it difficult are the symmetries, if neither side has exactly 1 pawn.
If one side has exactly 1 pawn, all symmetry is eliminated by restricting this pawn to files a-d. The rest is a matter of placing k alike pieces in n squares, which you are already doing for pawnless tables anyway. First the pawns of the other side, then groups of pieces. Of course this doesn't yet solve the problem of preserving the partial ordering on pawn structures imposed by pawn moves, which you need for easy slicing.

To have a perfect index (i.e. no symmetry remaining and no broken positions with two pieces on the same square) with (say) white having two pawns and black either 0 or at least 2 pawns, you first have to distinguish between the case that the white pawns are in symmetric positions and the case that the white pawns are not in symmetric positions. The latter case is relatively easy, since indexing the white pawns breaks all symmetry. In the symmetric case, there is symmetry left that you can exploit when placing the remaining pawns or pieces. If black has no pawns you can use the white king for this. If black has pawns, you have to analyze a bit more. With 3 white pawns things actually get simpler, since the 3 pawns always break symmetry. So only KPPvKPP needs a bit more analysis. (Looking at my old code for KPPvPP in suicide chess, I distinguish 5 cases. The code for KPPvKPP would be similar. Not too bad.)
You know - this is high-school-type math. Not sure what courses you had there, but in my high school we had some math regarding "how many ways can I put n blue balls in there". Never had that?

The mirroring is very well documented by Ken Thompson. There is nothing new to invent there.
Again you completely miss this one, but I don't want to dwell on it in light of the other useful discussion you make :)
Well, I'm not missing anything here. I already implemented it. And it's simple.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Related: malloc virtual

Post by diep »

Daniel Shawul wrote:Mine is also a bitmap. It doesn't have to be 1 bit/pos to be called that. You need at least 1 more bit for marking a position as visited, otherwise the generator will spend too much time re-visiting. Anyway, what I have a problem with is how to generate one-side bitbases. Say I mark all the losses for white -> then wins for black -> losses for white, etc... It seems you need the bitmap for black to move as well, bringing your total to 3 bitmaps. I know there is a "leap-frog" or similar method that can generate one side only, but I can't figure it out right now...
Do you understand that if we are generating at light speed, we have a few bitmaps that we STREAM, and a few bitmaps that we probe in a random manner?

For the ones we probe randomly, if you use 64KB - 512KB segments you get close to the sequential speed of your hard drive. Yet it's crucial that the bitmap we look up into is as tiny as possible.

1 bit per position is the minimum without compression, so that's what you use.

Anything larger will slow you down of course.

As for the number of bitmaps:

You'll end up with a dozen bitmaps of course. In each given position you also maximize with the capture bitmap.

This all goes at warp speed: 64 positions at the same time.
Every bitmap you look up into, you need in two formats.

However, for the bitmaps you stream through sequentially it doesn't matter how many you have. That is what your processor has been optimized for: streaming.

So that doesn't eat much RAM.

The bitmap you look up into, that's what it's all about.
That one is in your caching system.

Your self-created caching system will handle it. In Diep I use a 64KB bucket size for caching. A number of megabytes in total is enough, but it increases a tad for the 6-men. You need more RAM for caching with 6-men than for 5-men.

64KB at the time was the minimum to get sequential behaviour out of hard drives. I'm not sure what's ideal for today's magnetic hard drives; you might have to move to 128KB or 256KB.

With 64KB on IDE (parallel) drives I get close to the full sequential speed.
So random probing, with the caching system, then gets nearly the same bandwidth out of the drive as when you stream to/from it.
Last edited by diep on Wed Jul 25, 2012 2:31 pm, edited 1 time in total.
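As an illustration of the 64KB-bucket idea, a minimal sketch of a direct-mapped read cache over an on-disk bitbase; the bucket count, the replacement policy and every name here are assumptions for the sketch, not Diep's actual code.

Code: Select all

#include <stdint.h>
#include <unistd.h>

#define BUCKET_SIZE (64 * 1024)        /* one aligned chunk read per miss */
#define NUM_BUCKETS 256                /* 16 MB of cache in total         */

typedef struct {
    int      valid;
    int64_t  base;                     /* file offset of the cached chunk */
    uint8_t  data[BUCKET_SIZE];
} Bucket;

static Bucket cache[NUM_BUCKETS];

/* Probe one bit of the on-disk bitbase through the cache. */
int probe_bit(int fd, uint64_t bit_index) {
    uint64_t byte = bit_index >> 3;
    int64_t  base = (int64_t)(byte / BUCKET_SIZE) * BUCKET_SIZE;
    Bucket*  b    = &cache[(uint64_t)(base / BUCKET_SIZE) % NUM_BUCKETS];
    if (!b->valid || b->base != base) {            /* miss: read whole chunk */
        pread(fd, b->data, BUCKET_SIZE, (off_t)base);
        b->base  = base;
        b->valid = 1;
    }
    return (b->data[byte - (uint64_t)base] >> (bit_index & 7)) & 1;
}

Reading a whole aligned 64KB chunk per miss is what turns random probing into near-sequential disk behaviour; the same structure works for larger bucket sizes.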
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: pawn enumeration

Post by Daniel Shawul »

Ok then, what is the function that will be used for the 7-men KPPPKpp? It should be able to give all the squares for each piece given the index. Let me give you the example that enumerates alike pieces for one side only, given an index.

Code: Select all

/* Decode a combinatorial index into the squares of N alike pieces.
   On return sq[0] < sq[1] < ... < sq[N-1].
   combination[s][k] is a precomputed table, assumed to hold C(s, k+1). */
void get_squares_like(int* sq, const int N, const int index) {
   int comb;
   sq[0] = index;                      /* remaining part of the index */
   for(int i = N - 1; i >= 1; i--) {
      sq[i] = i;                       /* smallest legal square for piece i */
      comb = 1;
      while(sq[0] >= comb) {           /* advance piece i while the index allows */
         sq[0] -= comb;
         comb = combination[++sq[i]][i - 1];
      }
   }
}
That is non-trivial code to say the least, and now I want a function like that which does it for two groups of alike pieces together. Not something that sorts a given list, which you and some others seem to harp on. That is not the question. The question is: given an index, how to get all the squares. I already said that you can use nested for loops over a broader range and skip some of the invalid ones, but I judged that as not elegant. If I were allowed to loop over a broader range, it would be high school math, but that is not the question.
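One possible answer, sketched under the assumption that symmetry and pawn-rank restrictions are handled separately: split the index into a part for the first group and a part for the second, decode each with the combinadic routine above, and then shift the second group's squares past those already taken by the first. combination[s][k] is assumed to hold C(s, k+1), as in get_squares_like; everything else is illustrative.

Code: Select all

/* Decode one index into squares for two groups of alike pieces placed on
   'range' squares (e.g. range = 48 for pawns on a2..h7).
   Total index space: C(range, nA) * C(range - nA, nB). */
void get_squares_two_groups(int* sqA, int nA, int* sqB, int nB,
                            int range, int index) {
    int placesB = combination[range - nA][nB - 1];  /* C(range - nA, nB) */
    int indexA  = index / placesB;
    int indexB  = index % placesB;

    get_squares_like(sqA, nA, indexA);   /* group A squares, ascending        */
    get_squares_like(sqB, nB, indexB);   /* group B squares, relative to the  */
                                         /* squares left free by group A      */

    /* Map relative squares to real squares: every group-A square at or below
       a group-B square pushes it up by one. sqA is ascending, so one pass
       per group-B square is enough. */
    for (int i = 0; i < nB; i++)
        for (int j = 0; j < nA; j++)
            if (sqA[j] <= sqB[i]) sqB[i]++;
}

Going the other way (squares to index) is the same arithmetic in reverse. The a-d file restriction and the partial ordering needed for pawn slicing would still have to be layered on top, which is where the real difficulty discussed above comes in.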
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Related: malloc virtual

Post by Daniel Shawul »

diep wrote:
Daniel Shawul wrote:Mine is also a bitmap. It doesn't have to be 1 bit/pos to be called that. You need at least 1 more bit for marking a position as visited, otherwise the generator will spend too much time re-visiting. Anyway, what I have a problem with is how to generate one-side bitbases. Say I mark all the losses for white -> then wins for black -> losses for white, etc... It seems you need the bitmap for black to move as well, bringing your total to 3 bitmaps. I know there is a "leap-frog" or similar method that can generate one side only, but I can't figure it out right now...
Do you understand that if we are generating at light speed, we have a few bitmaps that we STREAM, and a few bitmaps that we probe in a random manner?

For the ones we probe randomly, if you use 64KB - 512KB segments you get close to the sequential speed of your hard drive. Yet it's crucial that the bitmap we look up into is as tiny as possible.

1 bit per position is the minimum without compression, so that's what you use.

Anything larger will slow you down of course.

As for the number of bitmaps:

You'll end up with a dozen bitmaps of course. In each given position you also maximize with the capture bitmap.

This all goes at warp speed: 64 positions at the same time.
Every bitmap you look up into, you need in two formats.

However, for the bitmaps you stream through sequentially it doesn't matter how many you have. That is what your processor has been optimized for: streaming.

So that doesn't eat much RAM.

The bitmap you look up into, that's what it's all about.
That one is in your caching system.

Your self-created caching system will handle it. In Diep I use a 64KB bucket size for caching. A number of megabytes in total is enough, but it increases a tad for the 6-men. You need more RAM for caching with 6-men than for 5-men.

64KB at the time was the minimum to get sequential behaviour out of hard drives. I'm not sure what's ideal for today's magnetic hard drives; you might have to move to 128KB or 256KB.

With 64KB on IDE (parallel) drives I get close to the full sequential speed.
So random probing, with the caching system, then gets nearly the same bandwidth out of the drive as when you stream to/from it.
It is really hard to discuss with you when you go back to SSD/HD every minute. For the nth time, I want everything in RAM, so all this talk about streaming data is irrelevant. I have to ask whether you generate your 6-men in RAM or with optimized disk access...
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: pawn enumeration

Post by diep »

Daniel Shawul wrote:Ok then, what is the function that will be used for the 7-men KPPPKpp? It should be able to give all the squares for each piece given the index. Let me give you the example that enumerates alike pieces for one side only, given an index.

Code: Select all

void get_squares_like(int* sq,const int N,const int index) {
   int comb;
   sq[0] = index;
   for(int i = N - 1;i >= 1;i--) {
      sq[i] = i;
      comb = 1;
      while(sq[0] >= comb) {
         sq[0] -= comb;
         comb = combination[++sq[i]][i - 1];
      }
   }
}
That is non-trivial code to say the least, and now I want a function like that which does it for two groups of alike pieces together. Not something that sorts a given list, which you and some others seem to harp on. That is not the question. The question is: given an index, how to get all the squares. I already said that you can use nested for loops over a broader range and skip some of the invalid ones, but I judged that as not elegant. If I were allowed to loop over a broader range, it would be high school math, but that is not the question.
KPPPKPP in Diep is 28,269,629,520 entries in total. Note I calculate the kings and the pawn part of the index at the same time, starting with the king-king reduction.

Keeping track of the kings is interesting, as I don't want to end up with pawns on the square of a king.

The en passant handling I'm not doing very efficiently. Around 1,970,309,880 positions out of the above 28G EGTB go to en passant.

KPPPKPP would be an interesting EGTB to have, as it's a 1-pawn difference; also the EGTB is not too huge and I suspect supercompression will get it really tiny.

I suspect most 7-men will be extremely tiny with supercompression. Of course the 42p will be magically small as well, yet a smaller number of pieces is always trickier and it's 1 dimension smaller from a compression viewpoint. I look forward to doing that one day - when I have the time and priority to do it.

At the end of the 90s it seemed like EGTBs would become really important. We're in 2012 now and mistakes made in the first moves out of book are STILL totally dominant in computer chess... ...whoever can see the future role of EGTBs there can raise a hand though.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Related: malloc virtual

Post by diep »

Daniel Shawul wrote:
diep wrote:
Daniel Shawul wrote:Mine is also a bitmap. It doesn't have to be 1 bit/pos to be called that. You need at least 1 more bit for marking a position as visited, otherwise the generator will spend too much time re-visiting. Anyway, what I have a problem with is how to generate one-side bitbases. Say I mark all the losses for white -> then wins for black -> losses for white, etc... It seems you need the bitmap for black to move as well, bringing your total to 3 bitmaps. I know there is a "leap-frog" or similar method that can generate one side only, but I can't figure it out right now...
Do you understand that if we are generating at light speed, we have a few bitmaps that we STREAM, and a few bitmaps that we probe in a random manner?

For the ones we probe randomly, if you use 64KB - 512KB segments you get close to the sequential speed of your hard drive. Yet it's crucial that the bitmap we look up into is as tiny as possible.

1 bit per position is the minimum without compression, so that's what you use.

Anything larger will slow you down of course.

As for the number of bitmaps:

You'll end up with a dozen bitmaps of course. In each given position you also maximize with the capture bitmap.

This all goes at warp speed: 64 positions at the same time.
Every bitmap you look up into, you need in two formats.

However, for the bitmaps you stream through sequentially it doesn't matter how many you have. That is what your processor has been optimized for: streaming.

So that doesn't eat much RAM.

The bitmap you look up into, that's what it's all about.
That one is in your caching system.

Your self-created caching system will handle it. In Diep I use a 64KB bucket size for caching. A number of megabytes in total is enough, but it increases a tad for the 6-men. You need more RAM for caching with 6-men than for 5-men.

64KB at the time was the minimum to get sequential behaviour out of hard drives. I'm not sure what's ideal for today's magnetic hard drives; you might have to move to 128KB or 256KB.

With 64KB on IDE (parallel) drives I get close to the full sequential speed.
So random probing, with the caching system, then gets nearly the same bandwidth out of the drive as when you stream to/from it.
It is really hard to discuss with you when you go back to SSD/HD every minute. For the nth time, I want everything in RAM, so all this talk about streaming data is irrelevant. I have to ask whether you generate your 6-men in RAM or with optimized disk access...
Do you use the RAM in GPU computing a lot?

If not, why not?

Basically with EGTBs there are two things you can't avoid:

a) the files are on disk
b) you have to do things on the CPU

The smaller the RAM footprint you use, the faster your EGTB generator is.

RAM is a lot slower than your L1/L2.

Preferably we do everything in L1 of course and only stream through RAM.

What you want to do is the slow Nalimov way, where you are going to get hurt big time by slow RAM and TLB latencies.

That'll slow you down by a factor of 100.

Thanks,
Vincent