I did conclusively solve that topic for my mind to be at peace, which took 2 years or so. I exceeded the density of the existing published fancy magic numbers, but not by much (a few %), and much denser is not feasible.
At first I used your scheme of trying rnd64() & rnd64() & rnd64(). I had this generate around 1E8 solutions (or was it 1E9, can't remember) for any given square, such that magic * (occ | mask) >> n does not overlap - or when it does, it does so with the same value. First realisation: directly hashing into ONE big power-of-2 table for all squares is much worse than hashing into per-square buckets of size 2^(at least the number of relevant bits), which is 2^12.
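As an illustration, here is a minimal sketch of that candidate test - not the actual code of this project; rnd64(), relevant_bits() and slow_rook_attacks() are placeholder helpers you have to supply, and I am reading "mask" in the formula as the complement of the relevant-blocker mask, so the irrelevant bits are forced to 1:

```cpp
#include <array>
#include <cstdint>

// Placeholder helpers, not from the post: supply your own PRNG, relevant-bit
// mask and slow reference attack generator.
uint64_t rnd64();
uint64_t relevant_bits(int square);                    // relevant blocker squares
uint64_t slow_rook_attacks(int square, uint64_t occ);  // slow reference generator

// Sparse random candidate, ~8 set bits on average.
uint64_t sparse_candidate() { return rnd64() & rnd64() & rnd64(); }

// Returns true if 'magic' maps every relevant occupancy of 'square' into a
// 2^12 bucket without a destructive overlap (overlaps that store the same
// attack set are allowed).
bool magic_works(int square, uint64_t magic, int shift /* e.g. 64 - 12 */)
{
    std::array<uint64_t, 1 << 12> attack{};
    std::array<bool,     1 << 12> used{};

    const uint64_t rel  = relevant_bits(square);
    const uint64_t mask = ~rel;           // assumed reading: irrelevant bits forced to 1

    uint64_t occ = 0;                     // Carry-Rippler: all subsets of rel
    do {
        const uint64_t idx = (magic * (occ | mask)) >> shift;
        const uint64_t att = slow_rook_attacks(square, occ);
        if (used[idx] && attack[idx] != att)
            return false;                 // two different attack sets collide
        used[idx]   = true;
        attack[idx] = att;
        occ = (occ - rel) & rel;
    } while (occ);
    return true;
}
```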
Second realisation: even in this simple scheme negative offsets are possible, because we actually index with offset + (magic * (occ | mask) >> n), and that offset is a free parameter when placing those buckets of 2^12.
The multiplication can consistently produce indices outside the bounds of a bucket, but with a negative offset they all fall back into its range.
Third realisation: we use (occ | mask) and not (occ & ~mask), because the many additional '1' bits help the multiplication on the lower rook corner squares. I found that one on my own but later saw it was common knowledge.
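To make the signed offset and the (occ | mask) key concrete, here is a minimal sketch of the lookup side - the struct and field names are my own, not taken from any particular engine:

```cpp
#include <cstdint>

// Sketch of a fancy-magic entry with a signed offset (field names are my own).
// The raw hash (magic * (occ | mask)) >> shift is not forced into 0..4095;
// instead a per-square offset, which may be negative, shifts the whole bucket
// so every reachable index lands inside that square's 2^12-wide window of the
// shared attack table.
struct MagicEntry {
    uint64_t magic;
    uint64_t mask;     // OR'ed onto the occupancy, as in the formula above
    int64_t  offset;   // signed free parameter, chosen during table packing
    uint8_t  shift;    // e.g. 64 - 12
};

inline uint64_t table_index(const MagicEntry& m, uint64_t occ)
{
    return uint64_t(int64_t((m.magic * (occ | m.mask)) >> m.shift) + m.offset);
}
```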
Fourth realisation:
And this one is published right now for the first time: rnd64() & rnd64() & rnd64() is not what you want. What you get is a random number with on average 1/2 * 1/2 * 1/2 * 64 = 8 bits set. This is a binomial sample centred around 8 set bits. After a few hundred million candidates you will get more and more duplicates on that binomial sampling curve, and after a few billion distinct numbers you are essentially not trying any new candidates - so you put them in a hashset, OR:
You can exhaustively enumerate all 64-bit numbers with 0..8 set bits, which is ncr(64,8) + ncr(64,7) + ... numbers in total,
which gives you many, many good candidates for almost free. Candidates with 9-16 set bits still have to be sampled, now with just rnd64() & rnd64(), and deduplicated with the C++ robin_hood hashset - which is very fast.
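A minimal sketch of that exhaustive enumeration using the classic next-bit-permutation (Gosper's hack); the function name and callback style are my own:

```cpp
#include <bit>
#include <cstdint>
#include <functional>

// Enumerate every 64-bit value with exactly k set bits in increasing order,
// using the next-bit-permutation (Gosper's hack). Running k = 1..8 covers all
// ncr(64,1) + ... + ncr(64,8) low-popcount candidates without duplicates.
void for_each_popcount(int k, const std::function<void(uint64_t)>& emit)
{
    if (k < 0 || k > 64) return;
    if (k == 0) { emit(0); return; }
    const uint64_t first = (k == 64) ? ~0ULL : (1ULL << k) - 1;  // lowest k bits set
    const uint64_t last  = first << (64 - k);                    // highest k bits set
    uint64_t v = first;
    for (;;) {
        emit(v);
        if (v == last) break;
        const uint64_t t   = v | (v - 1);       // fill the trailing zeros of v
        const uint64_t nt  = ~t;
        const uint64_t low = nt & (0 - nt);     // lowest zero bit of t, isolated
        v = (t + 1) | ((low - 1) >> (std::countr_zero(v) + 1));
    }
}
```

Every value comes out exactly once, so no hashset is needed for this part; only the sampled 9-16 bit candidates still need deduplication.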
Getting it together:
Once you have many hundreds of millions of known self-intersecting candidates (intersecting only on entries that hash to the same value, within a bucket of 0..2^12), each with a known offset, you just pick one, then the next one, and so iterate over all of them. Out of 100 million candidates for square 2 you pick the best one.
At least that is what I thought - but that is not quite right either. You need to order the squares so that the harder ones come first, i.e. the squares whose distribution of hashed indices is non-random and very packed (which is the case for the rook corner squares).
With this sorted square list you do the steps above: take the next square, iterate over all 1E8 of its candidates, and pick the best.
You will find that the table becomes very, very dense; I remember, for example, one square's 4096 bucket slots self-intersecting down to 1912 distinct slots, which then intersect further with the buckets of other squares in the bigger table, and so on.
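A schematic sketch of that greedy pass, under my own assumed data layout (the post gives no code): each candidate carries the set of global table slots its bucket would occupy, squares are sorted hardest first, and for each square the candidate that adds the fewest new slots wins.

```cpp
#include <algorithm>
#include <climits>
#include <cstdint>
#include <vector>

// Assumed data layout, not the post's code: a candidate magic plus the global
// table slots its 2^12 bucket would actually occupy (footprint and offset
// already resolved), grouped per square with a difficulty score.
struct Candidate {
    uint64_t magic;
    std::vector<uint32_t> slots;         // distinct global slots this magic touches
};

struct SquareWork {
    int sq;
    int difficulty;                      // packed, hard squares (rook corners) first
    std::vector<Candidate> candidates;   // e.g. ~1E8 precomputed candidates
};

// Greedy packing: hardest squares first, and per square pick the candidate
// that adds the fewest new slots to the already-occupied shared table.
void greedy_pack(std::vector<SquareWork>& squares, std::vector<bool>& taken)
{
    std::sort(squares.begin(), squares.end(),
              [](const SquareWork& a, const SquareWork& b)
              { return a.difficulty > b.difficulty; });

    for (auto& sw : squares) {
        const Candidate* best = nullptr;
        int bestAdded = INT_MAX;
        for (const auto& c : sw.candidates) {
            int added = 0;
            for (uint32_t s : c.slots)
                if (!taken[s]) ++added;          // slot not yet used by earlier squares
            if (added < bestAdded) { bestAdded = added; best = &c; }
        }
        if (best)
            for (uint32_t s : best->slots)       // commit the winning magic
                taken[s] = true;
    }
}
```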
TLDR:
You need to implement a sliding window over buckets of 4096 slots with known taken entries and place each bucket at the first position it fits into. The trick is to have many, many known buckets (candidate + initial offset, which can be negative) and increment that index by one until you get above 109000, or whatever your target total table size is.
Of course, even when slotting them into the first free position you run into local optima where the table becomes dense at the bottom but later squares don't fit anymore - showing that not even the first free slot is the right choice. My optimal solution used index += 17 (a small prime) instead of index++ and took the first fitting position there, to avoid early over-density and mismatching upper tables.
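A minimal sketch of that first-fit search with the += 17 stride, under assumed names (footprint = which of the 4096 bucket slots a candidate actually hits, taken = the global occupancy bitmap sized to the target table):

```cpp
#include <cstdint>
#include <vector>

// First-fit search with a strided start index (assumed names). 'footprint'
// lists which of the 4096 bucket slots a candidate actually hits, relative to
// the bucket base; 'taken' is the global occupancy bitmap, sized to the target
// total table size (~109000 in the post). Returns the first base index where
// the bucket fits without colliding, or -1 if it does not fit below the limit.
int first_fit(const std::vector<bool>& taken,
              const std::vector<uint16_t>& footprint,
              int step = 17)                     // small prime stride vs. step = 1
{
    const int limit = int(taken.size());
    for (int base = 0; base + 4096 <= limit; base += step) {
        bool collision = false;
        for (uint16_t off : footprint)
            if (taken[base + off]) { collision = true; break; }
        if (!collision)
            return base;                         // first strided position that fits
    }
    return -1;
}
```

On success the caller marks taken[base + off] for every offset in the footprint and records base as that magic's offset.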
High level background:
I got really into that topic for a year before even starting on move generation - I built a visualisation tool that rendered the seeds as BMP images (single black pixels) and showed how adding more entries into the table makes the image blacker and blacker until, at the end, only a few white pixels remain. That proved to me visually that the existing published tables are already very dense, with many, many overlaps. I finally got a few % below the published numbers and was happy with that.
At one point I had 400+ GB of square candidates to pick from and match as densely as possible. Bishops are so small and easy to distribute into the table that they are not worth mentioning, and even for the rooks 50% of the squares are "easy". This topic is also what sparked my interest in creating move generators faster than fancy magic - first going for faster pext, and now in 2023 we have Galois field, KISS, pext and fancy magic. Many move generators were tightly coupled to their engine code, but over the past few years we decoupled and modernized a whole list of movegens.
Take a look:
https://github.com/Gigantua/Chess_Movegen