Fancy magic - find me the code for this pattern.

Discussion of chess software programming and technical issues.

Moderator: Ras

Cardoso
Posts: 363
Joined: Thu Mar 16, 2006 7:39 pm
Location: Portugal
Full name: Alvaro Cardoso

Re: Fancy magic - find me the code for this pattern.

Post by Cardoso »

Thank you for your work.
Daniel could you please post just the code for rook move generation (I don't need the scoring function).
I mean the one that doesn't need the big usual lookup table.

thanks,
Álvaro
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Fancy magic - find me the code for this pattern.

Post by dangi12012 »

Look at this beauty!

Fast on architectures where array lookups dont come cheap. Fast on architectures that dont have fast multiplication.
Fast on architectures that do have fast multiplication because clang will recompile all of this into the appropriate imul when its faster.

Faster than fancy magic fixed shift if we allow popcounts of 1 or 2 to also exist.
You can clearly see the pattern in the shifts. This pattern can be solved thats what I know now.

If you dont take advantage of the pattern you can still see magics that are shifted by 0 which is also free because m << 0 is a nop.

Problem is with magic numbers there are uncountably many patterns and ideas to take advantage of. But the brainworm has been defeated. For the cost of 666USD and another month of thinking besides other work on my part.

Code: Select all

uint64_t rook(int sq, uint64_t occ) { 
   uint64_t m = occ | rook_masks[sq];    
   switch(sq) {    
      case 0: return (attack_table_rook + 84)[((m << 14) + (m << 39) + (m << 55)) >> 46]; 
      case 1: return (attack_table_rook + 87)[((m << 13) + (m << 36) + (m << 54)) >> 46]; 
      case 2: return (attack_table_rook + 78)[((m << 12) + (m << 37) + (m << 55)) >> 46]; 
      case 3: return (attack_table_rook + 60)[((m << 11) + (m << 36) + (m << 55)) >> 46]; 
      case 4: return (attack_table_rook + 66)[((m << 10) + (m << 35) + (m << 55)) >> 46]; 
      case 5: return (attack_table_rook + 72)[((m << 9) + (m << 34) + (m << 55)) >> 46]; 
      case 6: return (attack_table_rook + 0)[((m << 8) + (m << 33) + (m << 56)) >> 46]; 
      case 7: return (attack_table_rook + 21)[((m << 7) + (m << 32) + (m << 55)) >> 46]; 
      case 8: return (attack_table_rook + 123)[((m << 15) + (m << 30) + (m << 47)) >> 46]; 
      case 9: return (attack_table_rook + 345)[((m << 13) + (m << 24) + (m << 46)) >> 46]; 
      case 10: return (attack_table_rook + 120)[((m << 13) + (m << 28) + (m << 47)) >> 46]; 
      case 11: return (attack_table_rook + 93)[((m << 10) + (m << 27) + (m << 46)) >> 46]; 
      case 12: return (attack_table_rook + 75)[((m << 9) + (m << 26) + (m << 46)) >> 46]; 
      case 13: return (attack_table_rook + 30)[((m << 8) + (m << 25) + (m << 46)) >> 46]; 
      case 14: return (attack_table_rook + 0)[((m << 9) + (m << 27) + (m << 49)) >> 46]; 
      case 15: return (attack_table_rook + 9)[((m << 7) + (m << 30) + (m << 46)) >> 46]; 
      case 16: return (attack_table_rook + 216)[((m << 14) + (m << 19) + (m << 39)) >> 46]; 
      case 17: return (attack_table_rook + 390)[((m << 13) + (m << 38) + (m << 63)) >> 46]; 
      case 18: return (attack_table_rook + 912)[((m << 8) + (m << 21) + (m << 40)) >> 46]; 
      case 19: return (attack_table_rook + 618)[((m << 3) + (m << 22) + (m << 41)) >> 46]; 
      case 20: return (attack_table_rook + 240)[((m << 11) + (m << 22) + (m << 39)) >> 46]; 
      case 21: return (attack_table_rook + 168)[((m << 2) + (m << 19) + (m << 41)) >> 46]; 
      case 22: return (attack_table_rook + 150)[((m << 2) + (m << 17) + (m << 40)) >> 46]; 
      case 23: return (attack_table_rook + 102)[((m << 8) + (m << 22) + (m << 39)) >> 46]; 
      case 24: return (attack_table_rook + 24)[((m << 13) + (m << 30) + (m << 46)) >> 46]; 
      case 25: return (attack_table_rook + 21)[((m << 14) + (m << 31) + (m << 53)) >> 46]; 
      case 26: return (attack_table_rook + 3)[((m << 11) + (m << 29) + (m << 50)) >> 46]; 
      case 27: return (attack_table_rook + 12)[((m << 11) + (m << 31) + (m << 52)) >> 46]; 
      case 28: return (attack_table_rook + 21)[((m << 10) + (m << 31) + (m << 51)) >> 46]; 
      case 29: return (attack_table_rook + 363)[((m << 10) + (m << 32) + (m << 48)) >> 46]; 
      case 30: return (attack_table_rook + 39)[((m << 9) + (m << 32) + (m << 56)) >> 46]; 
      case 31: return (attack_table_rook + 33)[((m << 8) + (m << 31) + (m << 53)) >> 46]; 
      case 32: return (attack_table_rook + 168)[((m << 7) + (m << 23) + (m << 46)) >> 46]; 
      case 33: return (attack_table_rook + 1836)[((m << 5) + (m << 24) + (m << 46)) >> 46]; 
      case 34: return (attack_table_rook + 225)[((m << 8) + (m << 24) + (m << 37)) >> 46]; 
      case 35: return (attack_table_rook + 144)[((m << 7) + (m << 23) + (m << 36)) >> 46]; 
      case 36: return (attack_table_rook + 111)[((m << 3) + (m << 25) + (m << 37)) >> 46]; 
      case 37: return (attack_table_rook + 96)[((m << 2) + (m << 25) + (m << 44)) >> 46]; 
      case 38: return (attack_table_rook + 51)[((m << 2) + (m << 24) + (m << 41)) >> 46]; 
      case 39: return (attack_table_rook + 90)[((m << 6) + (m << 22) + (m << 31)) >> 46]; 
      case 40: return (attack_table_rook + 6)[((m << 15) + (m << 38) + (m << 53)) >> 46]; 
      case 41: return (attack_table_rook + 27)[((m << 14) + (m << 37) + (m << 61)) >> 46]; 
      case 42: return (attack_table_rook + 186)[((m << 13) + (m << 19) + (m << 36)) >> 46]; 
      case 43: return (attack_table_rook + 8007)[((m << 12) + (m << 35) + (m << 42)) >> 46]; 
      case 44: return (attack_table_rook + 17133)[((m << 1) + (m << 16) + (m << 34)) >> 46]; 
      case 45: return (attack_table_rook + 1986)[((m << 0) + (m << 19) + (m << 34)) >> 46]; 
      case 46: return (attack_table_rook + 1041)[((m << 3) + (m << 17) + (m << 32)) >> 46]; 
      case 47: return (attack_table_rook + 180)[((m << 2) + (m << 17) + (m << 39)) >> 46]; 
      case 48: return (attack_table_rook + 3)[((m << 6) + (m << 29) + (m << 46)) >> 46]; 
      case 49: return (attack_table_rook + 0)[((m << 7) + (m << 30) + (m << 53)) >> 46]; 
      case 50: return (attack_table_rook + 0)[((m << 7) + (m << 29) + (m << 52)) >> 46]; 
      case 51: return (attack_table_rook + 0)[((m << 7) + (m << 28) + (m << 51)) >> 46]; 
      case 52: return (attack_table_rook + 3)[((m << 6) + (m << 26) + (m << 51)) >> 46]; 
      case 53: return (attack_table_rook + 108)[((m << 9) + (m << 26) + (m << 44)) >> 46]; 
      case 54: return (attack_table_rook + 6)[((m << 8) + (m << 33) + (m << 56)) >> 46]; 
      case 55: return (attack_table_rook + 12)[((m << 7) + (m << 32) + (m << 53)) >> 46]; 
      case 56: return (attack_table_rook + 0)[((m << 0) + (m << 14) + (m << 39)) >> 46]; 
      case 57: return (attack_table_rook + 45)[((m << 0) + (m << 13) + (m << 38)) >> 46]; 
      case 58: return (attack_table_rook + 33)[((m << 0) + (m << 12) + (m << 37)) >> 46]; 
      case 59: return (attack_table_rook + 12)[((m << 0) + (m << 11) + (m << 36)) >> 46]; 
      case 60: return (attack_table_rook + 6)[((m << 0) + (m << 10) + (m << 35)) >> 46]; 
      case 61: return (attack_table_rook + 27)[((m << 0) + (m << 9) + (m << 34)) >> 46]; 
      case 62: return (attack_table_rook + 45)[((m << 0) + (m << 25) + (m << 48)) >> 46]; 
      case 63: return (attack_table_rook + 54)[((m << 0) + (m << 15) + (m << 33)) >> 46]
      default: unreachable();
   }
}

Other updates and the results of the competition so far are here: https://www.talkchess.com/forum3/viewto ... &start=160

Offtopic:
I still own the community the publication of my fast NNUE impl.
I still own the community the publication of advanced Galoisfield movegen
I still own the community the publication of Gigantua V2 taking advantage of SIMD board - SIMD movegen and SIMD moveapplication for twice the PERFT of the one in my signature but without any templates. Repeating: No template machinery - twice the speed of Gigantua - no bulk counting
I still own the community the publication of NNUEv5 single file.
I still own Leorik the finishment of the C# NNUE codepath.
I still own the community a better explanation for the hypercube move algo. Its still a special one that has a cost of O(1) for the first square but all other squares are solved afterwards. Meaning you pay a cost for looking up a queen - but can lookup queens on the other 63 squares for free. With any other algorithm you cannot do that you always need to calculate something. Hypercube especially has been grossly misunderstood.

I will move all of this to a blog and a dedicated wiki page or a book. But I will of course also always link here in this forum and post here too.

Also the publication of my abstract syntax tree sifter where we can find alternative forms to short c++ code.
Essentially trying all permutations of 8 different C++ symbols like (paramter: square, +, -, >>, 1) etc.. for a known snippet.
Was handy here for instance: https://www.chessprogramming.org/Obstruction_Difference
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Fancy magic - find me the code for this pattern.

Post by dangi12012 »

Finally also found my old ssd. Software written in 2019 shows the magic patterns when each black pixel is a taken slot and each white pixel is empty.
A graphical view of the lookup table fancy magic is using.

Anyways I am happy with how whole this project turned out. Cost me 666USD after all.
Fancy magic fixed shift can be rewritten as this per square:
old:

Code: Select all

*(attack_table_rook + 75 + ((70368811287040ull * m) >> 46));
simple popcount3 mult:

Code: Select all

*(attack_table_rook + 75 + ((m << 9) + (m << 26) + (m << 46)) >> 46); 
as well as a correlated fancy magic where we essentially only need to store 8 and16 magics instead of 64 which I wont repeat again.
Happy happy happy and here a image of the form.

Now imagine 64 of those images fitting into the same space. All of a sudden its VERY white.
Image

As you can see the optical impression is mostly black. I wish I could share a gif here you could see a random noise along with clear lines dancing around in there which corresponds to one valid magic number for a particular square. line angles come from aliasing with the width of the image, the fact that lines exist show us that the hash function of fancy magic is not really random too which is expected since a single imul shift is really the bare possible minimum.

Special goodness comes from the fact that C# allows us to view a memory mapped image directly or alias a pinned array into the gui.

Code: Select all

public Form1()
{
	InitializeComponent();
	reference = new Integrations("..\\..\\..\\CUDA_CANDIDATES");
	reference.AddTxt("..\\..\\..\\x64\\Solve_17Bits_64c_3b_Dense");
	handle = GCHandle.Alloc(pixel, GCHandleType.Pinned);
	scan0 = handle.AddrOfPinnedObject();
	bmp = new Bitmap(512, 256, 512, System.Drawing.Imaging.PixelFormat.Format8bppIndexed, scan0);
	System.Drawing.Imaging.ColorPalette pal = bmp.Palette;
	for (int i = 0; i <= 255; i++)
	{
		// create greyscale color table
		pal.Entries[i] = Color.FromArgb(i, i, i);
	}
	bmp.Palette = pal;
}

Gotta restart my old system and try if I can crosscompile with clang for a Pentium-II system version somehow. I would LOVE to see C++20 Clang16 compiled chesscode on there and compare with MSVC C++ ala 1998.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer