Thank you!

The generator is fast mainly because it uses magic bitboards. It also uses the De Bruijn bit-scan heavily, and a few bit-twiddling tricks from "Hacker's Delight."
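For anyone who hasn't seen it, a De Bruijn bit-scan looks roughly like the sketch below. This is not the generator's actual code: the names (kDeBruijn64, DeBruijnTable, bitScanForward, popLsb) are made up for illustration, and the constant is one commonly published 64-bit De Bruijn sequence. The lookup table is derived from the constant at compile time rather than typed in by hand.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit De Bruijn sequence: each of its 64 six-bit windows is unique.
constexpr std::uint64_t kDeBruijn64 = 0x03f79d71b4cb0a89ULL;

// Lookup table built at compile time: slot (kDeBruijn64 << i) >> 58 holds i.
struct DeBruijnTable {
    int index[64];
    constexpr DeBruijnTable() : index{} {
        for (int i = 0; i < 64; ++i)
            index[(kDeBruijn64 << i) >> 58] = i;
    }
};
constexpr DeBruijnTable kBitScan{};

// Returns the index (0..63) of the least significant set bit of bb.
// Precondition: bb != 0.
inline int bitScanForward(std::uint64_t bb) {
    assert(bb != 0);
    const std::uint64_t lsb = bb & (0 - bb);  // isolate the lowest set bit
    return kBitScan.index[(lsb * kDeBruijn64) >> 58];
}

// Clears the lowest set bit of bb and returns its index.
// Handy for looping over every piece or target square in a bitboard.
inline int popLsb(std::uint64_t& bb) {
    const int sq = bitScanForward(bb);
    bb &= bb - 1;  // reset the lowest set bit
    return sq;
}
```

Multiplying by the isolated bit is just a left shift of the sequence, so the top six bits of the product identify which bit was set.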
I was originally compiling the code with g++, using the "-g" flag so that I could debug with gdb. The generator was only making 40,000,000 leaf moves per second, so I tried extra hard to optimize what I could.
I hinted "inline" for my small functions as much as possible (I wrote some assembly code last semester and it rubbed off on me lol).
I learned about a few of the uses of the "constexpr" keyword in early July. As a result, "Move" is a literal type, and the generator relies on lots of global constexpr tables to map squares to masks and exploit my laptop's big caches. It also uses pointers to constexpr aggregate-structs with "Alliance" defaults (directions, castling information, etc.).
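To make the constexpr idea concrete, here is a minimal sketch of a square-to-mask table and a pair of per-alliance default structs. Everything in it is hypothetical (the field names, kWhiteDefaults, KnightTable, and the a1 = 0 square numbering are assumptions for the example, not the generator's real layout); it only shows the general shape of the technique.

```cpp
#include <cassert>
#include <cstdint>

using Bitboard = std::uint64_t;
enum Alliance { White, Black };

// Hypothetical per-alliance defaults; the field names are illustrative only.
struct AllianceDefaults {
    int      pawnPush;       // square offset of a single pawn push
    Bitboard promotionRank;  // rank on which pawns promote
    int      kingStart;      // king's home square (a1 = 0, h8 = 63)
};

constexpr AllianceDefaults kWhiteDefaults{ 8, 0xFF00000000000000ULL,  4};
constexpr AllianceDefaults kBlackDefaults{-8, 0x00000000000000FFULL, 60};

// Callers can hold a pointer to the immutable defaults for their side.
constexpr const AllianceDefaults* defaultsFor(Alliance a) {
    return a == White ? &kWhiteDefaults : &kBlackDefaults;
}

// Square -> knight-attack mask, filled in by the compiler.
struct KnightTable {
    Bitboard mask[64];
    constexpr KnightTable() : mask{} {
        constexpr int df[8] = {1, 2, 2, 1, -1, -2, -2, -1};
        constexpr int dr[8] = {2, 1, -1, -2, -2, -1, 1, 2};
        for (int sq = 0; sq < 64; ++sq)
            for (int i = 0; i < 8; ++i) {
                const int f = sq % 8 + df[i], r = sq / 8 + dr[i];
                if (f >= 0 && f < 8 && r >= 0 && r < 8)
                    mask[sq] |= Bitboard{1} << (r * 8 + f);
            }
    }
};
constexpr KnightTable kKnightAttacks{};

// Checked by the compiler: a knight on a1 reaches only b3 and c2.
static_assert(kKnightAttacks.mask[0] == 0x20400ULL, "a1 knight attacks");

// Runtime accessor: a single array read, no computation.
inline Bitboard knightAttacks(int sq) {
    assert(sq >= 0 && sq < 64);
    return kKnightAttacks.mask[sq];
}
```

Because the table is a constant expression, mistakes in it can be caught with static_assert before the program ever runs.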
I liked seeing the pretty hex tables in Leela, so I decided to make some of my own. I think that many of these tables really paid off! I noticed that the Stockfish engineers decided to calculate and verify magics at runtime, while the Leela engineers printed their best magics and stashed them in a table. Leela's magic initialization time is definitely faster than Stockfish's. I feel like the Stockfish authors probably had a reason for their design, and I'm probably just missing it...
I avoided heap allocation everywhere except when building the magic database, since that step isn't time-critical and it seemed natural to make the magic entries immutable.
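One common way to keep move generation off the heap is a fixed-capacity move buffer that lives on the stack. The sketch below assumes that approach; the post doesn't say exactly how the generator stores its moves, so Move's 16-bit encoding, MoveList, and the capacity of 256 are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 16-bit move: from-square in bits 0-5, to-square in bits 6-11.
// It is a literal type, so moves can also appear in constant expressions.
struct Move {
    std::uint16_t bits;
    constexpr Move() : bits(0) {}
    constexpr Move(int from, int to)
        : bits(static_cast<std::uint16_t>(from | (to << 6))) {}
    constexpr int from() const { return bits & 63; }
    constexpr int to() const { return (bits >> 6) & 63; }
};

// Fixed-capacity buffer stored wherever it is declared (typically the stack).
// 256 is an assumed upper bound, above the 218 moves of the busiest known
// legal position.
class MoveList {
public:
    void push(Move m) {
        assert(count_ < kCapacity);
        moves_[count_++] = m;
    }
    std::size_t size() const { return count_; }
    const Move* begin() const { return moves_; }
    const Move* end() const { return moves_ + count_; }

private:
    static constexpr std::size_t kCapacity = 256;
    Move moves_[kCapacity];
    std::size_t count_ = 0;
};
```

Since begin() and end() return plain pointers, a MoveList works directly in a range-based for loop.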
I took lots of notes on Stockfish and imitated its template use, taking care of as much logic as possible at compile time. Lots of the functions and methods take enum constants as template parameters and rely on a single conditional in the caller to control flow. I believe templating like this has the same effect as writing a separate function (lots of redundant code) for each new parameter (or new combination of parameters), and I think it helps to mitigate the (admittedly rare) possibility of pipeline stalls due to branching.
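A minimal sketch of that template pattern, assuming an Alliance enum and a single-push pawn function (both names are invented for the example and are not taken from Stockfish or from the generator):

```cpp
#include <cassert>
#include <cstdint>

using Bitboard = std::uint64_t;
enum Alliance { White, Black };

// The alliance is a template parameter, so "A == White" is a compile-time
// constant inside each instantiation and the optimizer can drop the unused
// shift. The result is effectively two separate functions from one source.
template <Alliance A>
constexpr Bitboard pawnPushes(Bitboard pawns, Bitboard empty) {
    return (A == White ? pawns << 8 : pawns >> 8) & empty;
}

// The one runtime conditional lives in the caller and picks an instantiation.
inline Bitboard pawnPushesFor(Alliance side, Bitboard pawns, Bitboard empty) {
    assert(side == White || side == Black);
    return side == White ? pawnPushes<White>(pawns, empty)
                         : pawnPushes<Black>(pawns, empty);
}
```

In a full generator the same idea would apply one level up: the caller branches once on the side to move, and everything beneath that call is already specialized.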
I also separated pinned pieces from free pieces during generation, as H.G.M. suggested, and I used a path mask to filter moves when the king is in single-check. Additionally, the generator considers only king moves whenever the king is in double-check.
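The check handling can be sketched as a single target mask for the non-king pieces. This covers only the check part (not the pinned-piece split), and it is an illustration under assumptions: between() and nonKingTargets() are hypothetical names, squares are numbered a1 = 0 to h8 = 63, and the path is found by walking the ray, where a real generator would more likely read it from a precomputed table.

```cpp
#include <cassert>
#include <cstdint>

using Bitboard = std::uint64_t;

constexpr Bitboard bit(int sq) { return Bitboard{1} << sq; }

// Number of set bits (simple loop; fine for a sketch).
inline int popCount(Bitboard bb) {
    int n = 0;
    for (; bb; bb &= bb - 1) ++n;
    return n;
}

// Squares strictly between a and b on a shared rank, file, or diagonal.
// Returns 0 if the squares are not aligned (e.g. a knight check).
inline Bitboard between(int a, int b) {
    const int df = b % 8 - a % 8, dr = b / 8 - a / 8;
    if ((df == 0 && dr == 0) ||
        (df != 0 && dr != 0 && df != dr && df != -dr)) return 0;
    const int stepF = (df > 0) - (df < 0), stepR = (dr > 0) - (dr < 0);
    Bitboard path = 0;
    for (int f = a % 8 + stepF, r = a / 8 + stepR; r * 8 + f != b;
         f += stepF, r += stepR)
        path |= bit(r * 8 + f);
    return path;
}

// Squares that non-king pieces may move to, given the checking pieces.
//   no check     -> every square
//   single check -> capture the checker or block the path to the king
//   double check -> nothing; only the king may move
inline Bitboard nonKingTargets(int kingSq, Bitboard checkers) {
    assert(kingSq >= 0 && kingSq < 64);
    const int n = popCount(checkers);
    if (n == 0) return ~Bitboard{0};
    if (n >= 2) return 0;
    int checkerSq = 0;
    while (!(checkers & bit(checkerSq))) ++checkerSq;
    return checkers | between(kingSq, checkerSq);
}
```

Each non-king piece's move set is then ANDed with this mask, so the double-check case needs no special handling beyond the king's own moves.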
Anyhow, about a week ago, I took a look at the g++ assembly output for the generator and noticed that the compiler wasn't inlining my functions! I did some Pro Googling™ and replaced the "-g" flag with "-O3". I re-made the executable and ran it in bash. The generator gained 140,000,000+ leaf moves per second! It totally blew me away lol. So... the moral of this story is to turn on optimization (e.g. "-O3") whenever you aren't debugging: g++ defaults to "-O0" when no optimization flag is given, and that, rather than "-g" itself, is what keeps it from inlining. But most people already know that because they aren't dumb like me lol.
** I am still fairly new to C++, so please correct me if any of this is wrong (or reckless). I don't want to spread misinformation or bad practices, and I'd really like to learn how to do things the right way. **