pawn enumeration

Discussion of chess software programming and technical issues.

Moderator: Ras

syzygy
Posts: 5880
Joined: Tue Feb 28, 2012 11:56 pm

Re: price of complex indexing

Post by syzygy »

hgm wrote:
syzygy wrote:(There might be a problem though with such cache lines mapping to the same cache entry, since they are at distances (a mutiple of) a power of two.)
This can be avoided by using 64-byte spacer regions in between the higher slices. If you don't want to do that in the index, (e.g. to keep it easy to mirror the board by bit-manipulation on the index), you can use an index->address translation just before every access, like

address = index*01000101 >> 18;

This is still much faster than the access itself, so it will not cause a noticeable slowdown. (As it will be computed in parallel.)
Yes, I will certainly be trying something like this if generation speed really drops for 6-piece tables.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: price of complex indexing

Post by Daniel Shawul »

syzygy wrote:
Daniel Shawul wrote:Just because you say so doesn't mean it is true for everyone. How about giving a reason for why it is?
Clearly the reason is that it is more efficient! Fewer instructions executed including fewer branches. Magic move generation must have good cache locality, because it is unbeatable by other techniques that use much smaller arrays.
Let me just remind you the fastest perfts are usually mailbox qpertt,iperft etc. The tablebases are even more appropriate because the only weeknesses of 0x88 move generation (capture generation) is gone. You generate all moves at once so capture any opponent piece that got on your way. However without the help of hardware one can atleast see that bitboard will be slower with the bitscans for non-capture generation. That has always be the case so that is the reason why I got surprized.
I mentioned that an int[128] will be in cache anyway and it is wrong to forward a reason that 0x88 is bulky.
That's not the reason I gave... You need to maintain it. That takes time. If you make that unnecessary, you save time. Saving time means the generation speeds up. Q.E.D.?
You also maintain bitboards with clear/set bit besides loosing big on cache hits.
Look, it's fine if you don't think it will work for your program, but just saying that it is all wrong and bullshit doesn't look too good on you, at least not in my eyes.
Ok sorry about that but try to understand why this did not go down well with me.
You are running a single instance of the program. Move generation in chess engines is fast with 0x88 especially when captures are not invovled. I can only see it helping attacks() which could be the cause if you see speedups...
Again, if you generate captures for tablebase generation you are doing something wrong. I'm not sure what you mean by "you are running a single instance of the program".
I clearly said captures are not necessary for tbs which is why I am even for 0x88 so I don't understand...
Running a single program instance means you can get away with pretty big data structures for your purpose. In our case we have one program generating the tbs. This is in very big contrast to doing any kind of search on the GPU. I recently had big fights here about using 0x88. I was against 0x88 because you run thousands of threads and you don't have space for board[128] or move lists of [256] etc.. So I used bitboards not majics however! Kindergarten bitboards require about 512bytes or so for attacks so I used that start thousands of threads working together which would be impossible with piece lists.
My generator generates and compresses all 3-4-5 piece WDL50+ and DTZ50+ tables in 2 hours 20 minutes on my laptop. Most of that time is spent on compression. Actual generation time is less than an hour (maybe 45 minutes or so). I have no numbers yet for 6 pieces.
That is great because mine sucks. It takes about 20 secs for each 4 men but 5 men could take anywhere between 6 - 12 minutes each. I always otpmized for compression infact the bitbases I distributed took about 5 months with the forward generator. I was lazy to study the retro grade analyzer about 4 years ago :(
syzygy
Posts: 5880
Joined: Tue Feb 28, 2012 11:56 pm

Re: price of complex indexing

Post by syzygy »

Daniel Shawul wrote:This sounds like a good way of doing the backward generating method with 1bit counter/visit bitmap.
As I understand it, you are using 3 bits per position, so you can have 8 possible values. I would propose using 7 values: unknown, illegal, win, loss, draw, potloss, newwin.
Algorithm:
1. all positions to unknown
2. retrograde from king captures to find illegal positions (i.e. side-to-move can capture the king)
3. retrograde from captures (captured piece by captured piece), setting positions to newwin, draw and potloss.
4. loop through each table (white, black) to find mates (set to loss). A position is mate if its value is unknown, the value of the corresponding position in the other table is illegal, and all non-capture moves lead to a position with value illegal. Set parents to newwin.
5. Loop through white, black, white, black, until you are finished:
- set positions with value newwin to win, set parents to potloss.
- verify positions with value potloss. if not lost, set to unknown. if lost, set to loss and set parents to newwin.
6. set positions with value unknown to draw.
syzygy
Posts: 5880
Joined: Tue Feb 28, 2012 11:56 pm

Re: price of complex indexing

Post by syzygy »

syzygy wrote:3. retrograde from captures (captured piece by captured piece), setting positions to newwin, draw and potloss.
To clarify: if a particular capture leads to a loss (i.e. the capture is winning), then each parent is set to newwin.
If the capture leads to a draw, then each parent is set to draw, unless it is already set to newwin (because of retrograding captures from another captured piece).
If the capture leads to a loss, then each parent is set to potloss, unless it is already set to newwin or draw.
syzygy
Posts: 5880
Joined: Tue Feb 28, 2012 11:56 pm

Re: price of complex indexing

Post by syzygy »

syzygy wrote:5. Loop through white, black, white, black, until you are finished:
- set positions with value newwin to win, set parents to potloss.
- verify positions with value potloss. if not lost, set to unknown. if lost, set to loss and set parents to newwin.
Of course parents should only be set to potloss if they have a value of unknown. Parents should only be set to newwin if they do not have a value of win or illegal (so if their value is one of unknown, draw, potloss; a value of loss will not happen).
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: price of complex indexing

Post by Daniel Shawul »

syzygy wrote:
Daniel Shawul wrote:This sounds like a good way of doing the backward generating method with 1bit counter/visit bitmap.
As I understand it, you are using 3 bits per position, so you can have 8 possible values. I would propose using 7 values: unknown, illegal, win, loss, draw, potloss, newwin.
Yes 3 bits but I have 2bits for score and 1bit for marking as visited separate. It may be better to bring them together if some states are impossible like a visited-illegal position. Let me just describe what I use for 6 men bitbases generation before we go on
I have 4 states
00-Draw
01-Win
10-Loss
11-Illegal
I generate white to move and black to move bitbases at the same time. You know I always get confused that it is possible to generate only one stm bitbases but what I didnt understand was that it usually requires a 1bit bitmap to store won positons for that side we didn't dedicate a full byte to. So for bitbases doing both at the same time is competitive. Anyway 2bit/pos for score wtm & btm each, 1bit/pos for vistis for wtm & btm as well.

Note that wtm & btm wins are done in the same pass.

a) mark all as ILLEGAL (differs from regular backward algo which mark them as LOSS)
b) first iteration -> generate all moves
i) captures/promotions -> get -score and store it
ii) Increment counter. If it is a loss && we have moves left set it back to ILLEGAL
iii) No moves mate or stalemate
c) backward pass
i) LOSS for any side -> WIN for the opposite side
ii) WIN for any side -> Potential LOSS for the other side and do verify
-> generate non-captures and break out on WIN or ILLEGAL (meaning not visited). Very fast because of that
d) Iterate on c. If I do sides separately it will not converge but doing both sides converges.
e) Final pass, if position has still ILLEGAL, then check if opponent attacks our king and it is his turn. If not illegal positions set it as draw, otherwise leave it as illegal for the compressor to decide what it sets it too.

That is it and it works as verified by the 4-5 men generation but it is significantly slower than using a byte and avoiding verfication step.

I will think about your proposal and get back to you. It seems it is much different than what I have.
cheers
Algorithm:
1. all positions to unknown
2. retrograde from king captures to find illegal positions (i.e. side-to-move can capture the king)
3. retrograde from captures (captured piece by captured piece), setting positions to newwin, draw and potloss.
4. loop through each table (white, black) to find mates (set to loss). A position is mate if its value is unknown, the value of the corresponding position in the other table is illegal, and all non-capture moves lead to a position with value illegal. Set parents to newwin.
5. Loop through white, black, white, black, until you are finished:
- set positions with value newwin to win, set parents to potloss.
- verify positions with value potloss. if not lost, set to unknown. if lost, set to loss and set parents to newwin.
6. set positions with value unknown to draw.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: price of complex indexing

Post by Daniel Shawul »

Now that I look in to it more carefully, I think the ideas are similar. A major difference is that you have the 3 bits together and that can actually give you more states than me. I was playing with 4 states so yours could be better. Newwin = not visited + win for me etc..
syzygy wrote:
Daniel Shawul wrote:This sounds like a good way of doing the backward generating method with 1bit counter/visit bitmap.
As I understand it, you are using 3 bits per position, so you can have 8 possible values. I would propose using 7 values: unknown, illegal, win, loss, draw, potloss, newwin.
Algorithm:
1. all positions to unknown
I set them as Illegal which is probably the same anyway. With the byte coutnter it is set to Loss however
2. retrograde from king captures to find illegal positions (i.e. side-to-move can capture the king)
I do this in the last step! If the position has still Illegal, I go and check by setting the postion and see if my king is attacked and opponent is to move. Otherwise I set it to draw.
3. retrograde from captures (captured piece by captured piece), setting positions to newwin, draw and potloss.
Yes I take care of that in the first pass. But potloss is set back to Illegal if there are some non-captures as the tb is not built yet to verify all of them.
4. loop through each table (white, black) to find mates (set to loss). A position is mate if its value is unknown, the value of the corresponding position in the other table is illegal, and all non-capture moves lead to a position with value illegal. Set parents to newwin.
I also work on two sides in the same pass. But I am not sure I understand the rest. later
5. Loop through white, black, white, black, until you are finished:
- set positions with value newwin to win, set parents to potloss.
- verify positions with value potloss. if not lost, set to unknown. if lost, set to loss and set parents to newwin.
This is the same for me I think.
6. set positions with value unknown to draw.
Only difference I check for illegality of positions here too.

So I would say the ideas are the same but mine could be worse since I have less states to work with since I separated them to 2bits score and 1bit visited. It is small implementations difference but the ideas are prety much similar.
syzygy
Posts: 5880
Joined: Tue Feb 28, 2012 11:56 pm

Re: price of complex indexing

Post by syzygy »

Daniel Shawul wrote:
syzygy wrote:
Daniel Shawul wrote:This sounds like a good way of doing the backward generating method with 1bit counter/visit bitmap.
As I understand it, you are using 3 bits per position, so you can have 8 possible values. I would propose using 7 values: unknown, illegal, win, loss, draw, potloss, newwin.
Yes 3 bits but I have 2bits for score and 1bit for marking as visited separate. It may be better to bring them together if some states are impossible like a visited-illegal position.
Also with my scheme you could keep the 2 bits and 1 bit separate. Unfortunately I don't immediately see a good encoding that allows you to only access one of the two when updating parents.
Note that wtm & btm wins are done in the same pass.
a) mark all as ILLEGAL (differs from regular backward algo which mark them as LOSS)
b) first iteration -> generate all moves
i) captures/promotions -> get -score and store it
ii) set score and increment counter. If it is a loss && we have moves left set it back to ILLEGAL[/quote]
This way of doing captures is slower than retrograding from (legal) captures.
What do you mean by incrementing counter?
c) backward pass
i) LOSS for any side -> WIN for the opposite side
ii) WIN for any side -> Potential LOSS for the other side and do verify
-> generate non-captures and break out on WIN or ILLEGAL (meaning not visited). Very fast because of that
This does many more verifications than my scheme needs. I basically delay the verification so that a position will often have been found to be potentially losing multiple times but only needs one verification. Also, a position that has been found to be potentially losing might turn out to be won before verification is performed. In addition, my approach seems to be more cache friendly since verification visits the same cache lines as marking positions as newwin or potloss.

Also you don't seem to be distinguishing between newly resolved positions and old positions? That would mean you do a lot of unnecessary double work. But I suppose you use that 1 bit/pos here?
d) Iterate on c. If I do sides separately it will not converge but doing both sides converges.
If alternating between white and black does not converge, there is a problem with your algorithm. I see no reason why it would not converge: the number of resolved positions is strictly increasing and cannot be larger than the total number of positions. Once it stops increasing, you are finished.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: price of complex indexing

Post by Daniel Shawul »

Also with my scheme you could keep the 2 bits and 1 bit separate. Unfortunately I don't immediately see a good encoding that allows you to only access one of the two when updating parents.
Well then if you separate them then it would be much like mine. I am saying yours is better because you could have multiple states together...
This way of doing captures is slower than retrograding from (legal) captures.
What do you mean by incrementing counter?
How can I retrograde from captures when I have those stored in separate tables and I access them through an egbbdll ? Anyway with the byte counter you still need to generate all moves so I use that for the 1bit case too. First iteration is slow in any case for me. And that is not a problem as it the only place I do parallel generation. Everything has a plus and dowside so it is a matter of choice. For me it sound weird that you retrograde from captures too...
This does many more verifications than my scheme needs. I basically delay the verification so that a position will often have been found to be potentially losing multiple times but only needs one verification.
Well then I don't understand your method. To delay for verfication of potential losses you need a counter or something you trigger verfication search up on reaching a threshold value. I have only one bit left so I just try verfication which is very fast btw because it returns when it finds an univisted postions and ofcourse a mate.
Also, a position that has been found to be potentially losing might turn out to be won before verification is performed. In addition, my approach seems to be more cache friendly since verification visits the same cache lines as marking positions as newwin or potloss.
It is too early to talk about cache optimizations ;) My guess is doing verifciation will help in convergence since it finds wins if there are any. You wait (through some means) for a retrograde to change your potential loss to win, I do it right away. So clearly mine wins if the potential loss is actually a win. If it is not then any univisited position will break the loop. The only problem is when most are visited and are lost leading to a real loss.
Also you don't seem to be distinguishing between newly resolved positions and old positions? That would mean you do a lot of unnecessary double work. But I suppose you use that 1 bit/pos here?
Ofcourse I forgot about that here but I clearly mentioned it in my previous post. Any win/loss that generated is marked as visited and never generates retro moves again. So each child position visits its parents only once and at the same time asks its parent for verification as well.
If alternating between white and black does not converge, there is a problem with your algorithm. I see no reason why it would not converge: the number of resolved positions is strictly increasing and cannot be larger than the total number of positions. Once it stops increasing, you are finished.
For the 5 men which use a byte counter can converge with the two sides back passed at the same time be it a win or loss, but when I use bits I can still backpass on the two sides but wins first, then losses. In code

Code: Select all

             GET(i,v1,ptab1);
				GET(i,v2,ptab2);
				if(!is_6man) {
					r1 = ((v1 == WIN && c1[i] != 0xff) || (v1 == LOSS && c1[i] == 0));
					r2 = ((v2 == WIN && c2[i] != 0xff) || (v2 == LOSS && c2[i] == 0));
				} else {
					if(!pass) {
						if(iteration > 1) {
							r1 = (v1 == WIN && !(c1[i >> 3] & (1 << (i & 7))));
							r2 = (v2 == WIN && !(c2[i >> 3] & (1 << (i & 7))));
						} else {
							r1 = (v1 == WIN);
							r2 = (v2 == WIN);
						}
					} else {
						if(iteration > 1) {
							r1 = (v1 == LOSS && !(c1[i >> 3] & (1 << (i & 7))));
							r2 = (v2 == LOSS && !(c2[i >> 3] & (1 << (i & 7))));
						} else {
							r1 = (v1 == LOSS);
							r2 = (v2 == LOSS);
						}
					}
				}
syzygy
Posts: 5880
Joined: Tue Feb 28, 2012 11:56 pm

Re: price of complex indexing

Post by syzygy »

Daniel Shawul wrote:
4. loop through each table (white, black) to find mates (set to loss). A position is mate if its value is unknown, the value of the corresponding position in the other table is illegal, and all non-capture moves lead to a position with value illegal. Set parents to newwin.
I also work on two sides in the same pass. But I am not sure I understand the rest. later
The test for whether a position is mate is as follows: if white is to move, the value in the white table is unknown, the value in the black position is illegal, and all non-capture moves bring you to a position (in the black table) with value illegal, then the position is mate. Otherwise, it is not.

So what I do is loop through white and black at the same time. If one value equals unknown and the other equals illegal, then I loop through the moves for the appropriate side and stop once I find a legal move. If I don't find legal moves, I set the position to from unknown to loss. (I know that there are no legal captures either, since otherwise the value of the position would have been potloss, newwin or draw and not unknown.)

As you see, I never need to verify capture moves. Not when testing for mate, not when verifying potloss cases. That might be different in your generator (I'm not sure). This is the reason for the draw state.