Have you tried using memcpy to populate the undo struct? I started on it but gave up; not enough momentum.
I don't think I have done any low-level optimization anywhere in the program, apart from trying to write the code in a simple and straightforward way and not doing too much unnecessary work. I doubt that a memcpy would be faster, though. The compiler should be able to optimize the assignments and a memcpy to the same code anyway.
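To make the comparison concrete, here is a minimal sketch of the two ways of filling an undo record. It is not Glaurung's actual code: the State and Position structs, their fields, and all function names are hypothetical, chosen only for illustration. Whether a given compiler really emits the same machine code for both versions depends on the compiler and its flags, so it is worth checking the generated assembly (for example with -O2 -S) rather than assuming it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical reversible state; field names are illustrative only.
struct State {
    std::uint64_t key;
    int castleRights;
    int epSquare;
    int rule50;
    int lastMove;
};

// Hypothetical position: the reversible state plus data that is not saved.
struct Position {
    State st;
    int board[64];
};

// memcpy is only well defined for trivially copyable types.
static_assert(std::is_trivially_copyable<State>::value,
              "State must be trivially copyable to be used with memcpy");

// Variant 1: copy the fields one by one.
void save_by_assignment(const Position& pos, State& u) {
    u.key = pos.st.key;
    u.castleRights = pos.st.castleRights;
    u.epSquare = pos.st.epSquare;
    u.rule50 = pos.st.rule50;
    u.lastMove = pos.st.lastMove;
}

// Variant 2: copy the whole struct in one call.
void save_by_memcpy(const Position& pos, State& u) {
    assert(&u != &pos.st);  // memcpy requires non-overlapping objects
    std::memcpy(&u, &pos.st, sizeof(State));
}

// Field-by-field comparison, so the result does not depend on padding bytes.
bool same(const State& a, const State& b) {
    return a.key == b.key && a.castleRights == b.castleRights &&
           a.epSquare == b.epSquare && a.rule50 == b.rule50 &&
           a.lastMove == b.lastMove;
}

// Returns true if both variants produce the same undo record.
bool both_methods_agree() {
    Position pos{};
    pos.st = {0x1234ABCDULL, 3, 20, 7, 42};
    State a{}, b{};
    save_by_assignment(pos, a);
    save_by_memcpy(pos, b);
    return same(a, b) && same(a, pos.st);
}
```

Calling both_methods_agree() only confirms that the two variants store identical data; any speed difference would have to be measured with a benchmark run.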
This is a bit strange; I got somewhat less here. I will check more carefully.
It might depend on the position. I did a "./glaurung bench 128 1" from the command line, which runs 15 benchmark positions for 60 seconds each, with 128 MB TT and one search thread.