Okay, so I fixed a big load of bugs and I'm now getting correct perft results for both the perft tests on the Chess Programming Wiki and the hardcore perft tests E Diaz linked here.
This batch of testing has helped uncover numerous bugs, which was awesome. I intend to introduce a number of unit tests (I already have an extensive system test which checks the consistency of cached data against calculated results) to make regression testing easier in future.
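For the regression side, one simple unit test is to compare perft counts from the standard initial position against the published reference values (20, 400, 8902, 197281, 4865609, 119060324 for depths 1 to 6). The sketch below is a minimal, hypothetical harness, not weak's actual test code: it assumes the caller has already run its own perft at depths 1..n and passes the results in as an array. The name `count_start_perft_mismatches` is made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Reference perft node counts from the standard initial position, depths 1..6. */
static const uint64_t START_PERFT[] = {
    20ULL, 400ULL, 8902ULL, 197281ULL, 4865609ULL, 119060324ULL
};

/* Hypothetical helper: results[i] holds the engine's perft count at depth i + 1.
   Prints each mismatch to stderr and returns how many depths failed (0 = pass). */
int count_start_perft_mismatches(const uint64_t *results, int n)
{
    int max = (int)(sizeof START_PERFT / sizeof START_PERFT[0]);
    if (n > max)
        n = max; /* no reference value beyond depth 6 in this table */

    int failures = 0;
    for (int i = 0; i < n; i++) {
        if (results[i] != START_PERFT[i]) {
            fprintf(stderr, "perft(%d): got %llu, expected %llu\n", i + 1,
                    (unsigned long long)results[i],
                    (unsigned long long)START_PERFT[i]);
            failures++;
        }
    }
    return failures;
}
```

The same table-driven shape extends naturally to other test positions: add one reference array per FEN and loop over them.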
I am now using GCC 4.7, which has improved speed for both weak and Stockfish, and I am getting Mnps figures very close to Stockfish's (e.g. perft 6 from the standard initial position runs at ~53 Mnps on my 2.13 GHz Core 2 Duo MacBook Air).
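As a sanity check on that figure: perft 6 from the initial position visits 119,060,324 leaf nodes, so ~53 Mnps corresponds to roughly 2.25 seconds of wall time. The two helpers below just encode that arithmetic; they are illustrative names, not functions from weak or Stockfish.

```c
#include <stdint.h>

/* Millions of nodes per second, given a node count and elapsed wall time. */
double mnps(uint64_t nodes, double seconds)
{
    return (double)nodes / seconds / 1e6;
}

/* Expected wall time in seconds for a given node count at a given Mnps rate. */
double seconds_at(uint64_t nodes, double mnps_rate)
{
    return (double)nodes / (mnps_rate * 1e6);
}
```

For example, `seconds_at(119060324ULL, 53.0)` gives about 2.246 s. Comparing Mnps between engines is only meaningful if both count nodes the same way for the same perft depth.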
Next steps: improve the currently terrible, hacky search code, create a non-skeletal evaluation function, add UCI support, add a transposition table... oh, the list goes on.

Still some way from v1, but getting there!