Until your perft counts are 100% correct, you can't make any assumptions with regard to the speed of your engine. If you don't have castling, you're omitting two calls to square_attacked(): you need to verify that your king is not in check and that the square next to the king, which it passes over, is not attacked. If you don't do promotions, you omit adding four moves (one per promotion piece) to your move list; this will obviously reduce the time spent in add_move(). Without castling and promotions you also get fewer positions to perft through, so you're calling make/unmake less often, which in turn means you call square_attacked() less often (you need to check somehow whether each move is legal; with fewer moves, there is less to check).
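To make the two castling checks concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the square numbering (a1 = 0 ... h8 = 63), the `occupied` set, and the `square_attacked` callable standing in for your engine's own square_attacked(). It only covers white castling kingside, and it leaves the destination square g1 to whatever legality test you already run after make().

```python
# Hypothetical square numbering: a1 = 0, b1 = 1, ..., h8 = 63.
E1, F1, G1 = 4, 5, 6

def can_castle_kingside_white(has_right, occupied, square_attacked):
    """Return True if white may generate O-O as a pseudo-legal move.

    has_right       -- castling right still available (king and rook unmoved)
    occupied        -- set of occupied square indices
    square_attacked -- callable(sq) -> bool, True if the opponent attacks sq
    """
    if not has_right:
        return False
    # The squares between king and rook must be empty.
    if F1 in occupied or G1 in occupied:
        return False
    # The two extra square_attacked() calls: the king may not be in check,
    # and the square it passes over (f1) may not be attacked.
    if square_attacked(E1) or square_attacked(F1):
        return False
    # g1 is assumed to be covered by the normal "king left in check" test.
    return True
```

An engine that skips castling never pays for these calls, which is one reason its perft timing is not comparable.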
make(), unmake(), square_attacked(), and add_move() are the top four functions used when running perft, so not generating all the possible moves gives you a huge (and misleading) speed advantage.
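To show where those four functions sit, here is a rough perft skeleton in Python. The position interface (`generate_moves`, `make`, `unmake`, `king_left_in_check`) is hypothetical and not taken from any particular engine; it assumes a pseudo-legal generator whose moves are filtered after make().

```python
def perft(pos, depth):
    """Count the leaf nodes reachable from pos in exactly `depth` plies."""
    if depth == 0:
        return 1
    nodes = 0
    # generate_moves() is where add_move() would be called for each move.
    for move in pos.generate_moves():
        pos.make(move)
        # Legality test: usually a square_attacked() call on the king square.
        if not pos.king_left_in_check():
            nodes += perft(pos, depth - 1)
        pos.unmake(move)
    return nodes
```

Every move that the generator skips removes an entire subtree of make/unmake and square_attacked() calls, so the time saved grows with depth.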
My advice would be to first finish your move generator and THEN compare with other engines or perft tools.
Oh, and perft needs to be checked in positions other than just the starting position... search for 'perftsuite.epd' on the internet, and look at this as well: https://www.chessprogramming.org/Perft_Results
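If you want to run such a suite automatically, a small parser is enough. The sketch below assumes the layout commonly seen in circulated copies of the file, one position per line as `FEN ;D1 n ;D2 n ...`; check your copy before relying on it. The function name is made up for this example.

```python
def parse_perft_epd(line):
    """Split 'FEN ;D1 n ;D2 n ...' into (fen, {depth: expected_nodes})."""
    fen, *fields = [part.strip() for part in line.split(";")]
    expected = {}
    for field in fields:
        if not field:
            continue  # tolerate a trailing ';'
        tag, count = field.split()
        expected[int(tag[1:])] = int(count)  # 'D3' -> depth 3
    return fen, expected


# Starting position with its well-known node counts for depths 1 to 4.
line = ("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 "
        ";D1 20 ;D2 400 ;D3 8902 ;D4 197281")
fen, expected = parse_perft_epd(line)
print(fen, expected)
```

Feeding each parsed FEN to your own perft and comparing against `expected` at every depth tells you which position breaks first.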
The "Kiwipete" position in particular catches a lot of mistakes in move generation. I thought my move generation and make/unmake were perfect at the beginning, and I _STILL_ forgot one edge case (one that I have never seen in an over-the-board game played by myself), which thus gave wrong perft results.
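A quick way to use Kiwipete is to find the shallowest depth at which your count goes wrong. The sketch below is only an illustration: `count_nodes(fen, depth)` is a placeholder for your own perft entry point, and the reference counts are the ones listed for this position on the Perft Results page linked above, so double-check them there.

```python
KIWIPETE_FEN = "r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1"
KIWIPETE_NODES = {1: 48, 2: 2039, 3: 97862, 4: 4085603}

def first_wrong_depth(count_nodes, fen, expected):
    """Return the shallowest depth whose node count differs, or None."""
    for depth in sorted(expected):
        if count_nodes(fen, depth) != expected[depth]:
            return depth
    return None
```

Usage would look like `first_wrong_depth(my_perft, KIWIPETE_FEN, KIWIPETE_NODES)`. A mismatch at depth 1 or 2 usually points at the generator itself; a mismatch that only appears deeper often involves state that make/unmake fails to restore.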
And make no mistake: if your perft results aren't perfect, your engine might very well, at some point, play an illegal move in an engine match, and then the GUI will kill it and award the win to your opponent.