Daniel Shawul wrote:
diep wrote:
What I do is a tad more work. I have a function that loops in the same manner as the indexation works, so the way it puts pieces on the board is exactly the same as how the indexation works. That's WAY FASTER. Then for each lookup you do, you figure out where it is incrementally. The thing you don't want to do is sophisticated tricks. What you can't avoid is mirroring and reducing for n of the same men, as the savings from those are too huge. Koistinen wanted to skip even the 63 * 62 * ... reduction and just use either 48 for pawns or 64 per piece there,
but that's something you have to test yourself. With a few tables you already get far there, and you just replace the tables.
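To make that concrete, here is a minimal sketch (a toy KPvK-style layout, not the actual Diep index): the positions are enumerated by nested loops in exactly the order the index is defined in, so during generation the index is nothing more than a counter that gets incremented once per position.

Code:
/* Toy sketch of incremental indexing (hypothetical layout): enumerate a
 * KPvK-like slice in exactly the order its index is defined in, so the
 * index is just a counter -- no per-position multiplications needed.   */
#include <stdint.h>
#include <stdio.h>

typedef void (*visit_fn)(uint64_t idx, int wk, int bk, int p);

static uint64_t enumerate_kpk(visit_fn visit)
{
    uint64_t idx = 0;
    for (int wk = 0; wk < 64; wk++) {
        if ((wk & 7) > 3) continue;              /* mirror wK onto files a-d   */
        for (int bk = 0; bk < 64; bk++) {
            if (bk == wk) continue;              /* legality checked elsewhere */
            for (int p = 8; p < 56; p++) {       /* white pawn on ranks 2-7    */
                if (p == wk || p == bk) continue;
                if (visit) visit(idx, wk, bk, p);
                idx++;                           /* the counter IS the index   */
            }
        }
    }
    return idx;                                  /* total number of entries    */
}

int main(void)
{
    printf("KPvK slice size: %llu entries\n",
           (unsigned long long)enumerate_kpk(NULL));
    return 0;
}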
For 6 men with no alike pieces I have 462 x 62 x 61 x 60 x 59 = 5.7G positions (with G = 2^30) and naive is 64^6 = 64G, which is a saving of about 11x.
Please read what I wrote.
I said you can't do without reducing for the kings or for identical pieces.
If we speak about 4 different pieces besides the kings, then the Koistinen proposal was 10 * 64^5.
That's quite a lot of overhead, as you can see:
in the pawnless case the Koistinen proposal is to do it in 10 * 64^5 = 10.7G entries, which is nearly a factor 2 of overhead.
Note I do it with a bit more calculation:
that's 10 * 63 * 62 * 61 * 60 * 64 = 9.1G entries.
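For reference, a quick check of the sizes being compared, printed in decimal billions (the 5.7G / 64G figures quoted above are the same counts in units of 2^30):

Code:
/* Back-of-the-envelope check of the pawnless 6-man index sizes above. */
#include <stdio.h>

int main(void)
{
    double tight     = 462.0 * 62 * 61 * 60 * 59;          /* ~6.2e9  */
    double koistinen = 10.0 * 64 * 64 * 64 * 64 * 64;      /* ~10.7e9 */
    double diep      = 10.0 * 63 * 62 * 61 * 60 * 64;      /* ~9.1e9  */
    double naive     = 64.0 * 64 * 64 * 64 * 64 * 64;      /* ~68.7e9 */

    printf("tight     %5.1fG\n", tight / 1e9);
    printf("koistinen %5.1fG  (%.2fx tight)\n", koistinen / 1e9, koistinen / tight);
    printf("diep      %5.1fG  (%.2fx tight)\n", diep / 1e9, diep / tight);
    printf("naive     %5.1fG  (%.1fx tight)\n", naive / 1e9, naive / tight);
    return 0;
}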
This doesn't eat 8GB of RAM, as you store everything in a bitmap at 1 bit per entry.
For the largest 6-men WITH PAWN you need roughly 64MB of RAM or so for this method to run well.
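A bitmap at 1 bit per entry is just this (a minimal sketch; the bit-addressing convention is illustrative, and for the big tables the bitmap itself lives on disk behind the caching system described below):

Code:
/* 1 bit per position; `index` is assumed to be the packed index. */
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t nbits;
    uint8_t *bits;
} bitmap_t;

static bitmap_t *bitmap_create(uint64_t nbits)
{
    bitmap_t *bm = malloc(sizeof *bm);
    bm->nbits = nbits;
    bm->bits  = calloc((nbits + 7) / 8, 1);     /* 1 bit per EGTB entry */
    return bm;
}

static void bitmap_set(bitmap_t *bm, uint64_t index)
{
    bm->bits[index >> 3] |= (uint8_t)(1u << (index & 7));
}

static int bitmap_get(const bitmap_t *bm, uint64_t index)
{
    return (bm->bits[index >> 3] >> (index & 7)) & 1;
}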
You do need to write a caching system then, of course, which you can also use for the program - you need that caching system anyway for your program's search, so you can reuse most of that code and/or make it generic.
If you have enough RAM you can put some of the bitmaps in a RAMDISK,
which will speed you up a tad, yet it's not really going to be that relevant; it's just a small percentage of total time, as the hard disk reads get prefetched anyway.
Be careful though about which filesystem you do this on.
Now in Nalimov format the biggest 6-men table, with all tricks included, is something like 16.1G entries or so (didn't really check).
The biggest bitmap in this unoptimized method is:
24 * 63 * 62 * 61 * 60 * 64 = 21.9G entries
So your overhead is just 21.9G / 16.1G = 1.36, i.e. about 36% extra.
That's not really much, huh?
So for the toughest-to-generate EGTBs the overhead is limited. It's a tad more for the pawnless ones though.
Realize there is another overhead: rewriting the EGTB.
With the caching system it simply doesn't matter which of the remaining pieces is the other king; of course the easiest is to take the last one.
So if we use for example KRBP vs. kq,
24 * 63 * 62 * 61 * 60 * 64 indexed as PRBqKk,
then for the next pass you have to rewrite that to PRBqkK.
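The rewrite pass itself is just a re-indexing copy. A toy illustration with plain 64^3 indexing for three pieces (x, K, k) and no reductions, reusing the bitmap helpers sketched above; the real pass would of course use the reduced index, but the pattern is the same:

Code:
/* Walk the source bitmap in xKk order, store each set bit at the matching
 * xkK index, so the next pass streams with the other king varying fastest. */
static uint64_t idx_xKk(int x, int K, int k) { return ((uint64_t)x * 64 + K) * 64 + k; }
static uint64_t idx_xkK(int x, int K, int k) { return ((uint64_t)x * 64 + k) * 64 + K; }

static void rewrite_king_order(const bitmap_t *src, bitmap_t *dst)
{
    for (int x = 0; x < 64; x++)
        for (int K = 0; K < 64; K++)
            for (int k = 0; k < 64; k++)
                if (bitmap_get(src, idx_xKk(x, K, k)))
                    bitmap_set(dst, idx_xkK(x, K, k));
}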
Koistinen's proposal, which arranges things in pages to avoid random access, is simply not needed; your caching system will already make things run at linear speed. You get the maximum bandwidth out of a hard drive when the pages you look up randomly are big enough. In fact the drive is implicitly already doing this.
However, I do not know how modern drives behave here. The 64KB pages that I use in Diep's caching system might be too small for terabyte drives. You might want to experiment with what page size you need to get the full bandwidth out of the drive.
Such benchmark software is easy to write. If I dig hard I might still have the test program I used 12 years ago for this.
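A rough version of such a benchmark could look like this (POSIX pread against a file much larger than RAM, otherwise the OS page cache flatters the numbers; this is only a sketch, not the old test program):

Code:
/* Measure random-read bandwidth for page sizes from 4KB to 4MB. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s bigfile\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    off_t fsize = lseek(fd, 0, SEEK_END);

    for (size_t page = 4096; page <= (size_t)4 << 20; page *= 4) {
        long npages = (long)(fsize / (off_t)page);
        if (npages < 2) break;
        char  *buf   = malloc(page);
        int    reads = 256;                /* random reads per page size   */
        double t0    = now_sec();
        for (int i = 0; i < reads; i++) {
            /* rand() is crude, but good enough for a rough bandwidth test */
            off_t off = (off_t)(rand() % npages) * (off_t)page;
            if (pread(fd, buf, page, off) < 0) perror("pread");
        }
        double sec = now_sec() - t0;
        printf("%8zu bytes per read: %7.1f MB/s\n", page, reads * page / sec / 1e6);
        free(buf);
    }
    close(fd);
    return 0;
}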
Be careful not to fall into the pitfall though: drives are up to 2 times faster in bandwidth at the outer edge than when they store data at the inside. This has a simple explanation, but you might want to ask HGM about that; the hardware-side explanation is more his expertise than mine.
At the time I did need to reformat my drive for this, by the way, as NTFS got messed up after a while.
The bandwidth was dominated more by NTFS than by the hardware.
Using an empty drive solved that problem for me though.
I hope you really realize by now that DOING RANDOM LOOKUPS IS NOT A PROBLEM.
If you read 64KB pages from the hard drive, then you get the full bandwidth out of the hard drive.
Did you get that message?
So if you need position 101919191 right now, you fetch from disk the 64KB page that this position is inside, and then you hope you will soon need the other positions in it. In practice this is the case.
Do you realize what I mean by a caching system now? I wrote that 10 times now, and each time you hammer on about 'random lookups'.
There simply are no random lookups to a drive if you fetch huge pages, as you then get 99.99% of the full bandwidth out of it.
There are endless ways to build up the caching system; there are many solutions. Using a FIFO system and a binary lookup tree is fastest. Note that for 6 men you can also make it O(1) with a few megabytes of RAM as a lookup table. That's what I created. The simple math is that 5T positions / 64K = 5G / 64 = a bit more than 78M pages of 64KB.
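A hedged sketch of such an O(1) cache: a flat page-to-slot array plays the role of the lookup table, a FIFO pointer picks the eviction victim, and pread/pwrite move whole 64KB pages. Names and sizes are illustrative, not Diep's actual code:

Code:
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define PAGE_BYTES (64 * 1024)
#define NO_SLOT    UINT32_MAX

typedef struct {
    int       fd;         /* file holding the table/bitmap                */
    uint64_t  npages;     /* total 64KB pages in that file                */
    uint32_t  nslots;     /* pages kept resident in RAM                   */
    uint32_t *slot_of;    /* page -> slot, NO_SLOT if not resident (O(1)) */
    uint64_t *page_in;    /* slot -> page, UINT64_MAX if slot is empty    */
    uint8_t  *dirty;      /* slot -> needs write-back before eviction     */
    uint8_t  *pool;       /* nslots * PAGE_BYTES of page storage          */
    uint32_t  fifo_next;  /* next slot to evict (plain FIFO)              */
} page_cache_t;

static void cache_init(page_cache_t *c, int fd, uint64_t npages, uint32_t nslots)
{
    c->fd = fd; c->npages = npages; c->nslots = nslots; c->fifo_next = 0;
    c->slot_of = malloc(npages * sizeof *c->slot_of);
    c->page_in = malloc((size_t)nslots * sizeof *c->page_in);
    c->dirty   = calloc(nslots, 1);
    c->pool    = malloc((size_t)nslots * PAGE_BYTES);
    for (uint64_t p = 0; p < npages; p++) c->slot_of[p] = NO_SLOT;
    for (uint32_t s = 0; s < nslots; s++) c->page_in[s] = UINT64_MAX;
}

/* Return a pointer to the in-RAM copy of `page`, loading it on a miss. */
static uint8_t *cache_page(page_cache_t *c, uint64_t page, int will_write)
{
    uint32_t slot = c->slot_of[page];
    if (slot == NO_SLOT) {                        /* miss: evict the FIFO slot */
        slot = c->fifo_next;
        c->fifo_next = (c->fifo_next + 1) % c->nslots;
        uint8_t *buf = c->pool + (size_t)slot * PAGE_BYTES;
        uint64_t old = c->page_in[slot];
        if (old != UINT64_MAX) {
            if (c->dirty[slot])
                pwrite(c->fd, buf, PAGE_BYTES, (off_t)(old * PAGE_BYTES));
            c->slot_of[old] = NO_SLOT;
        }
        pread(c->fd, buf, PAGE_BYTES, (off_t)(page * PAGE_BYTES));
        c->page_in[slot] = page;
        c->slot_of[page] = slot;
        c->dirty[slot]   = 0;
    }
    if (will_write)
        c->dirty[slot] = 1;
    return c->pool + (size_t)slot * PAGE_BYTES;
}

/* 1-bit access on top of the cache; `index` is the packed position index. */
static int cached_get(page_cache_t *c, uint64_t index)
{
    uint8_t *p = cache_page(c, index / (PAGE_BYTES * 8), 0);
    uint64_t b = index % (PAGE_BYTES * 8);
    return (p[b >> 3] >> (b & 7)) & 1;
}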
The slowest thing for me was rewriting the kK to Kk and vice versa, as you rewrite every bit there. I simply didn't optimize this well. Maybe you'll find some bitboard-fiddling trick that can do this FAST. I'd be grateful if you shared that one.
Most of that saving comes from the 8-fold symmetry. So you are right that mirroring is a must, but the rest is not. I think I will try to use his method for in-RAM generation, but that would need 8GB of RAM. My goal was to get 6 men on 32-bit computers with less than 4GB of RAM, but if I can get a lot of speed out of the generation it will be worth it.
Note that most profilers do not correctly reflect how much you would lose to RAM once the code gets optimized. Please don't forget this remark some time from now. The GCC/valgrind junk is notorious there; Intel's stuff does a tad better job.
P.S. Doing things backwards is too slow - do things forward, it's easier, faster and less bug-prone.
Even if I do it forward, the RAM access would still be random. If Koistinen's method does the 8-fold symmetry and still "congregates" the positions after we make a move, then forward may be better, but I doubt it. Also, since retro moves are almost the same as forward moves, they would still benefit from whatever tricks you have for forward moves. I really don't see how it could be faster. What I do is actually a combination of both. You can save a lot of effort by retro-moving lost positions to won, as that needs no verification at all. I also do retro moves from won positions to lost, followed by verification, which is time-consuming.
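A skeleton of that combination (all helpers such as for_each_retro_move() and all_moves_lead_to_won() are placeholders for a real generator's move and table routines, so this shows only the control flow):

Code:
#include <stdint.h>

/* Placeholders, assumed to be provided by the generator: */
extern uint64_t table_size(void);
extern void for_each_retro_move(uint64_t idx, void (*fn)(uint64_t parent));
extern int  all_moves_lead_to_won(uint64_t idx);        /* verification step */
extern int  get_won(uint64_t idx),   get_lost(uint64_t idx);
extern void set_won(uint64_t idx),   set_lost(uint64_t idx);
extern int  newly_won(uint64_t idx), newly_lost(uint64_t idx); /* this pass  */

static void promote_parent_to_won(uint64_t parent)
{
    if (!get_won(parent))
        set_won(parent);                 /* no verification needed at all     */
}

static void consider_parent_for_lost(uint64_t parent)
{
    if (!get_won(parent) && !get_lost(parent) && all_moves_lead_to_won(parent))
        set_lost(parent);                /* the verification is the slow part */
}

void retro_pass(void)
{
    uint64_t n = table_size();
    for (uint64_t i = 0; i < n; i++) {
        if (newly_lost(i))               /* parents of a lost position win    */
            for_each_retro_move(i, promote_parent_to_won);
        else if (newly_won(i))           /* parents of a won one may be lost  */
            for_each_retro_move(i, consider_parent_for_lost);
    }
}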