I am not sure how exactly this was supposed to work. I know that there is a representation where the 'occupied' information is stored per ray, and you have board-size tables that indicate which 4 rays intersect a square. A change in occupancy then has to be recorded in 4 different rays, in a way very similar to rotated bitboards. The main difference is that in the latter, parallel rays are packed into the same machine word. When you don't do that, you can afford much larger boards. This method is therefore frequently used in Xiangqi engines, which need a 9x10 board (and only need 2 rays per square, as there are no diagonally moving sliders).
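To make the per-ray idea a bit more concrete, here is a rough sketch of how I imagine it. The names (BSIZE, RayRef, rays_of, toggle) and the 12x12 board size are just my own picks for illustration, not taken from any actual engine, and the bit numbering within a ray is only one of several possible choices.

```c
#include <assert.h>
#include <stdint.h>

#define BSIZE 12                         /* hypothetical board size; must be <= 32 */
#define NRAYS (2 * BSIZE + 2 * (2 * BSIZE - 1))

typedef struct { uint16_t ray; uint8_t bit; } RayRef;

static uint32_t occupied[NRAYS];         /* one occupancy word per ray */
static RayRef rays_of[BSIZE * BSIZE][4]; /* the 4 rays through each square */

static void set_ref(RayRef *t, int ray, int bit)
{
    t->ray = (uint16_t)ray;
    t->bit = (uint8_t)bit;
}

/* Fill the board-size table: for every square, which ray and which bit. */
void init_rays(void)
{
    for (int r = 0; r < BSIZE; r++) {
        for (int f = 0; f < BSIZE; f++) {
            RayRef *t = rays_of[r * BSIZE + f];
            set_ref(&t[0], r, f);                               /* rank */
            set_ref(&t[1], BSIZE + f, r);                       /* file */
            set_ref(&t[2], 2 * BSIZE + (r - f + BSIZE - 1), f); /* diagonal */
            set_ref(&t[3], 4 * BSIZE - 1 + (r + f), f);         /* anti-diagonal */
        }
    }
}

/* A piece appearing on or leaving a square flips one bit in each of 4 rays. */
void toggle(int sq)
{
    assert(sq >= 0 && sq < BSIZE * BSIZE);
    for (int i = 0; i < 4; i++)
        occupied[rays_of[sq][i].ray] ^= (uint32_t)1 << rays_of[sq][i].bit;
}
```

A move would then call toggle() for the from-square and, if it is not a capture, for the to-square. For Xiangqi you would simply drop the two diagonal entries.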
In principle this would work even for 32x32 boards on a 32-bit architecture, but of course using the occupancy of a ray as the index into a lookup table puts a practical limit on how many bits of the word can be used. But simply extracting the moves from each ray separately can still give a speedup over a plain mailbox board scan. You would need both BSF and BSR in that case, but this can be avoided by also storing the occupancy of the ray in the inverted direction, either duplicated in the same word (so you could go to 16x16 boards in 32-bit, or 18x18 actually, since there is no need to store the edge occupancy), or by using 8 rays through each square (in which case you could again go to 34x34).
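As an illustration of the two options, here is a minimal sketch, assuming a GCC or Clang compiler (the __builtin_ctz and __builtin_clz builtins stand in for the forward and reverse bit-scan instructions). The function names and the layout of the doubled word (forward copy in the low 16 bits, mirrored copy in the high 16 bits) are my own assumptions. A return value of -1 means no blocker was found, so the slider runs to the board edge, which is why the edge squares need no bits of their own.

```c
#include <assert.h>
#include <stdint.h>

/* Plain ray: bit i set means square i of the ray is occupied. */

/* Nearest occupied square beyond p: forward bit scan (BSF). */
int nearest_up(uint32_t occ, int p)
{
    assert(p >= 0 && p < 32);
    uint32_t above = occ & ~(((uint32_t)2 << p) - 1);   /* keep bits > p */
    return above ? __builtin_ctz(above) : -1;
}

/* Nearest occupied square before p: reverse bit scan (BSR). */
int nearest_down(uint32_t occ, int p)
{
    assert(p >= 0 && p < 32);
    uint32_t below = occ & (((uint32_t)1 << p) - 1);    /* keep bits < p */
    return below ? 31 - __builtin_clz(below) : -1;
}

/* Doubled ray of 16 squares: square i sits at bit i and also at bit 31 - i. */
uint32_t doubled_set(uint32_t ray, int i)
{
    assert(i >= 0 && i < 16);
    return ray | ((uint32_t)1 << i) | ((uint32_t)1 << (31 - i));
}

/* Upward scan on the forward copy: forward bit scan only. */
int doubled_up(uint32_t ray, int p)
{
    assert(p >= 0 && p < 16);
    uint32_t m = ray & 0xFFFFu & ~(((uint32_t)2 << p) - 1);
    return m ? __builtin_ctz(m) : -1;
}

/* Downward scan on the mirrored copy: again forward bit scan only. */
int doubled_down(uint32_t ray, int p)
{
    assert(p >= 0 && p < 16);
    uint32_t m = ray & ~(((uint32_t)2 << (31 - p)) - 1); /* keep bits > 31 - p */
    return m ? 31 - __builtin_ctz(m) : -1;
}
```

With 8 rays per square the mirrored copy would live in a word of its own instead of the upper half, so each copy could use all 32 bits.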
Unfortunately Taikyoku Shogi needs 36x36...
So for now I am stuck with the board scan. But this can also be sped up by keeping, for each square, a count of the attacks coming from each of the 8 directions, as 3-bit fields in a single word (with still a byte to spare for Knight attacks or similar info). Then you can suppress scans in directions from which there are no listed attacks.
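A minimal sketch of such a packed attack word, assuming a 32-bit word with direction d stored in bits 3d to 3d+2. The function names are hypothetical, and the sketch does nothing about counts above 7 other than asserting they do not occur.

```c
#include <assert.h>
#include <stdint.h>

/* Per-square attack word: eight 3-bit counters, one per direction,
 * in bits 0..23. Bits 24..31 stay free for other info. */

unsigned attacks_from(uint32_t w, int dir)
{
    assert(dir >= 0 && dir < 8);
    return (w >> (3 * dir)) & 7u;
}

uint32_t add_attack(uint32_t w, int dir)
{
    assert(attacks_from(w, dir) < 7u);   /* a 3-bit field holds at most 7 */
    return w + ((uint32_t)1 << (3 * dir));
}

uint32_t remove_attack(uint32_t w, int dir)
{
    assert(attacks_from(w, dir) > 0u);
    return w - ((uint32_t)1 << (3 * dir));
}

/* Collect only the directions that need a board scan; returns how many. */
int dirs_to_scan(uint32_t w, int out[8])
{
    int n = 0;
    if ((w & 0xFFFFFFu) == 0)            /* no slider attacks at all */
        return 0;
    for (int d = 0; d < 8; d++)
        if (attacks_from(w, d))
            out[n++] = d;
    return n;
}
```

The move generator or attack test would then loop over the directions returned by dirs_to_scan() instead of always scanning all 8.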