Indeed, in Xiangqi you have to be a bit more subtle, as moves can be blocked on squares that are not under attack themselves, and Cannon attacks can be 'activated' as well as blocked. But none of that takes really much extra time, it just takes more code to handle the larger variety of cases. After setting up a constellation for the winning side you have to mark squares where the opponent is under attack, can block attacks, or can activate attacks all separately, and have alternative handlers for each in these cases, which efficiently generate the affected squares.
The main point is that when you place the checked side's non-royal, it would affect any of the attackered squares only in a minority of cases. The chances that a piece blocks a Horse are pretty slim. Even a Rook at best attacks 17 squares out of 90. In the majority of cases placing the piece would change nothing, and you can simply mark the positions in the EGT corresponding corresponding to the attacked squares. Spending zero time on the majority of positions that were not in check. For the few placements of a non-royal piece that actually do change something, these again change only a minor fraction of the squares that were attacked. E.g. if the checker has a Horse, at most 4 placements out of 90 would block that Horse, and it would then only affect 2 King placements for each of these. That is a very negligible fraction.
Obviously I am not talking about 'trivial end-games'. For the problem to be considered solved, it must be solved for every end-game, no matter how complex or how much out of reach of current hardware. And that it is not generally solved does not mean that it is not solved for some sub-set of end-games. I don't think the paper describes such a general solution. Perhaps you developed one, I cannot judge that from here.
			
			
									
						
										
						Generate EGTB with graphics cards?
Moderator: Ras
- 
				hgm  
- Posts: 28395
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
- 
				noobpwnftw
- Posts: 694
- Joined: Sun Nov 08, 2015 11:10 pm
- Full name: Bojun Guo
Re: Generate EGTB with graphics cards?
Like you said, it just need more complicated to handle all cases. There aren't much difference in theory, consider this: My_Foul + My_Foul = Lose, My_Foul + Opp_Foul = Draw, Opp_Foul + Opp_Foul = Win, how to correctly define all of them are just specifics.
The main point of that paper is to introduce a secondary counter to push the progression of board states in case of perpetuals, so that one will not get stuck in a chain of loops, note that your problem with chasing different pieces and so on are no different from checking with different pieces, which would have happened in the aforementioned endgames.
I don't think over-engineering the generator is meaningful, but since labeling the perpetuals are complicated, i.e. finding discovered chases of rooks requires recursion, one obvious optimization is to not generate irrelevant moves when not necessary. I think you'd save more with generating only evasions when in chases and checks, when in_check() already initialized all attack tables.
I'm also not quite sure if I've covered every possible case of perpetuals in my tablebases, I have crossed checked most of them with another source: http://lpforth.forthfreak.net/endgame-en.html
However, it doesn't contain some more complicated endgames.
			
			
									
						
										
						The main point of that paper is to introduce a secondary counter to push the progression of board states in case of perpetuals, so that one will not get stuck in a chain of loops, note that your problem with chasing different pieces and so on are no different from checking with different pieces, which would have happened in the aforementioned endgames.
I don't think over-engineering the generator is meaningful, but since labeling the perpetuals are complicated, i.e. finding discovered chases of rooks requires recursion, one obvious optimization is to not generate irrelevant moves when not necessary. I think you'd save more with generating only evasions when in chases and checks, when in_check() already initialized all attack tables.
I'm also not quite sure if I've covered every possible case of perpetuals in my tablebases, I have crossed checked most of them with another source: http://lpforth.forthfreak.net/endgame-en.html
However, it doesn't contain some more complicated endgames.
- 
				hgm  
- Posts: 28395
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Generate EGTB with graphics cards?
Well, the original subject of this thread was whether EGT generation could be accelerated, and in particular whether GPUs would be useful in that respect. We seem to have drifted away from that completely, and although solving the chase + check or multi-chase problem generally is a topic that very much interest me, the original question is still unsolved. There still is the huge discrepancy between my observation that during generation my CPU is idle >99% of the time, while waiting for a data from DRAM to arrive, and phhnguyen's observation that he needs so much CPU power that it helps to have multiple threads divide the load. The only way I can think of to reconcile that is if he would need some 1000 times more CPU power as I do.
So the crucial question remains: how many positions per second do you resolve, while generating? FairyGen resolves about 2M/sec, Marcel van Kervinck's "PrettyFast" program does about 20M/sec. So how is that for your generators?
			
			
									
						
										
						So the crucial question remains: how many positions per second do you resolve, while generating? FairyGen resolves about 2M/sec, Marcel van Kervinck's "PrettyFast" program does about 20M/sec. So how is that for your generators?
- 
				noobpwnftw
- Posts: 694
- Joined: Sun Nov 08, 2015 11:10 pm
- Full name: Bojun Guo
Re: Generate EGTB with graphics cards?
KBNK on single thread:
Build DTC:
45ms
Build DTM:
30ms
WDL with compression:
22ms W:721B B:751B
DTC with compression:
16ms W:2748B B:3060B
DTM with compression:
24ms W:2746B B:3060B
			
			
									
						
										
						Build DTC:
45ms
Build DTM:
30ms
WDL with compression:
22ms W:721B B:751B
DTC with compression:
16ms W:2748B B:3060B
DTM with compression:
24ms W:2746B B:3060B
- 
				hgm  
- Posts: 28395
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Generate EGTB with graphics cards?
That is fast! (About twice the speed of the "PrettyFast" generator on my desktop.)
How is DTM different from DTC, in KBNK? And how can the times with compression be shorter than those without? I suppose you compress after the building is complete. Or is this just the extra time it takes to compress?
I guess for KBNK the building can be done entirely in (level-3) cache; even with one byte per position (as FairyGen uses) it would only be slightly larger than 2MB. A bitmap of it would be only 256KB, and even fit in the level-2 cache of modern Intel CPUs. But then you would need separate bitmaps for the won positions and each DTM/DTC.
In either case, the memory bottleneck would not occur yet, as DRAM would not be used at all. It is conceivable that multithreading would help when working from cache, as the bandwidth of level-3 to level-2 cache data traffic is orders of magnitude larger than DRAM to CPU, and not likely to be a bottleneck.
			
			
									
						
										
						How is DTM different from DTC, in KBNK? And how can the times with compression be shorter than those without? I suppose you compress after the building is complete. Or is this just the extra time it takes to compress?
I guess for KBNK the building can be done entirely in (level-3) cache; even with one byte per position (as FairyGen uses) it would only be slightly larger than 2MB. A bitmap of it would be only 256KB, and even fit in the level-2 cache of modern Intel CPUs. But then you would need separate bitmaps for the won positions and each DTM/DTC.
In either case, the memory bottleneck would not occur yet, as DRAM would not be used at all. It is conceivable that multithreading would help when working from cache, as the bandwidth of level-3 to level-2 cache data traffic is orders of magnitude larger than DRAM to CPU, and not likely to be a bottleneck.
- 
				noobpwnftw
- Posts: 694
- Joined: Sun Nov 08, 2015 11:10 pm
- Full name: Bojun Guo
Re: Generate EGTB with graphics cards?
They are time taken for each separate step, DTM is a bit faster because it reuses WDL tables and DTC usually requires more iterations.
Note that almost a half of the time are spent on compression because they will quickly become huge without it. You need also consider the time needed to decompress sub-tables, making it less and less attractive for a GPU port.
			
			
									
						
										
						Note that almost a half of the time are spent on compression because they will quickly become huge without it. You need also consider the time needed to decompress sub-tables, making it less and less attractive for a GPU port.
- 
				hgm  
- Posts: 28395
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Generate EGTB with graphics cards?
Just to make absolutely sure there is no miscommunication: with KBNK you mean the Chess end-game. Not Xiangqi KHEK, right?
			
			
									
						
										
						- 
				noobpwnftw
- Posts: 694
- Joined: Sun Nov 08, 2015 11:10 pm
- Full name: Bojun Guo
Re: Generate EGTB with graphics cards?
There! I meant Xiangqi tablebases, I have never wrote a generator for chess.
			
			
									
						
										
						- 
				hgm  
- Posts: 28395
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Generate EGTB with graphics cards?
Ah, you nearly fooled me!   
 
But KHEK only has 25K positions, with 2-fold symmetry (9*9*7*90/2), instead of 2M for KBNK with 8-fold symmetry (64*64*64*64/8). That is 80 times smaller.
So it is not so fast at all: in stead of 2x faster than Marcel's "PrettyFast" generator it is 40 times slower. And 4 times slower than FairyGen.
			
			
									
						
										
						 
 But KHEK only has 25K positions, with 2-fold symmetry (9*9*7*90/2), instead of 2M for KBNK with 8-fold symmetry (64*64*64*64/8). That is 80 times smaller.
So it is not so fast at all: in stead of 2x faster than Marcel's "PrettyFast" generator it is 40 times slower. And 4 times slower than FairyGen.
- 
				hgm  
- Posts: 28395
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Generate EGTB with graphics cards?
Perhaps KHEK is too small to be representative: with 25KB the entire EGT would easily fit in the level-2 cache, which is very fast. Could you also give me the building times for KRCKR? This has 59M positions. (Half of that when using symmetry. Does your generator make use of this symmetry?) This is much bigger than any cache, and should thus be more representative for those builds that rely on DRAM, such as Pawnless 5-men (128M positions) in Chess. FairyGen takes 104 sec for generating the (generally won) KRNKN (including all sub-tables), which should be representative. (KBBKN takes 72 sec, but the Bishop color binding causes that nothing happens at all in that half with like Bishops, which is not very representative.) A general draw like KRNKR takes only 42 sec. But I guess it is mostly the number of won positions that counts, for the generation time.
BTW, the 'Cloud Interface' you gave the link to indeed takes account of perpetual chasing, in KRHAEEKR. And of the fact that one-check, one-chase is allowed, in KRHEEKR. So there must exist generators that elaborate on what the paper describes.
			
			
									
						
										
						BTW, the 'Cloud Interface' you gave the link to indeed takes account of perpetual chasing, in KRHAEEKR. And of the fact that one-check, one-chase is allowed, in KRHEEKR. So there must exist generators that elaborate on what the paper describes.