"10K games! The computer that I'm using to run my tests isn't all that powerful. Currently, at low depth, it can finish a game in 2.5 minutes, so 10K games would take me about 2 weeks!"
10K is a counsel of perfection, for detecting tiny differences. 500 games will pick up important differences.
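To put rough numbers on 500 versus 10K games, here is a minimal Python sketch. It assumes every game is an independent win/loss trial at a 50% score (draws only shrink the variance, so the margins come out slightly pessimistic) and that games are played one at a time at 2.5 minutes each. The function names are mine, purely for illustration.

```python
import math

def elo_from_score(p):
    """Elo difference implied by a score fraction p (0 < p < 1)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_margin(games, score=0.5, z=1.96):
    """Approximate 95% half-width, in Elo, of a match result.

    Assumption: independent win/loss games with no draws, which
    overstates the margin a little for real chess matches.
    """
    se = math.sqrt(score * (1.0 - score) / games)
    hi = elo_from_score(min(score + z * se, 0.999))
    lo = elo_from_score(max(score - z * se, 0.001))
    return (hi - lo) / 2.0

def match_days(games, minutes_per_game=2.5):
    """Wall-clock days for a match played one game at a time."""
    return games * minutes_per_game / (60.0 * 24.0)

for n in (500, 10_000):
    print(f"{n:>6} games: +/-{elo_margin(n):.0f} Elo, {match_days(n):.1f} days")
```

Under these assumptions 500 games resolve a difference of roughly 30 Elo in under a day, while 10K games narrow that to about 7 Elo at the cost of around 17 days.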
It's not clear how far your engine has got with the 'big-ticket' algorithmic features (transposition table, null-move pruning, ...). Without these and a decent move generator, the engine will be slow, and testing will be limited to very shallow searches. It's not worth tuning anything under these conditions, because the tuning will likely have to be redone later.
Many people think it better to test an engine against a completely different one than to test a base version against a slight modification of itself. The reference engine should be stronger than yours. (If you and your opponent only have pikes as weapons, you won't learn how to defend against bows and arrows, let alone machine guns.) By trying various free engines, you can quickly find one that plays 150-250 Elo better than yours at a suitable time control. A counsel of perfection, irrelevant to most of us, is to test against a huge suite of engines.
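For a feel of what a 150-250 Elo gap means in match results, the standard logistic Elo formula can be sketched in a few lines of Python (the function name is just illustrative):

```python
def expected_score(elo_diff):
    """Expected score fraction for a player rated elo_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# Your engine is the weaker side, so the difference is negative.
for gap in (150, 250):
    print(f"{gap} Elo weaker: expected score {expected_score(-gap):.0%}")
```

So against such a reference your engine should score somewhere around 20-30%, which leaves room to measure improvement without every game being a foregone conclusion.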
To organise test matches I recommend cutechess-cli, with opening positions taken from gaviota-starters.pgn. This file has more than enough positions for practical testing (a maximum of 2400 games, or 4800 with the cutechess-cli -repeat option).
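As a rough illustration of such a match setup, the Python sketch below only builds and prints a cutechess-cli command line; it does not run anything. The engine paths, engine names, UCI protocol, time control and output file are all assumptions of mine, so substitute your own and check `cutechess-cli -help` for the options your version supports.

```python
import shlex

def cutechess_command(engine_cmd, reference_cmd, openings="gaviota-starters.pgn",
                      rounds=250, tc="10+0.1", pgn_out="match.pgn"):
    """Build (not run) a cutechess-cli argument list for a two-engine match.

    Assumptions: both engines speak UCI; paths, names and time control
    are placeholders to be replaced with your own.
    """
    return [
        "cutechess-cli",
        "-engine", f"cmd={engine_cmd}", "name=mine", "proto=uci",
        "-engine", f"cmd={reference_cmd}", "name=reference", "proto=uci",
        "-each", f"tc={tc}",
        # Openings are drawn from the PGN file in random order.
        "-openings", f"file={openings}", "format=pgn", "order=random",
        # Two games per round with -repeat: each opening is played
        # once with each colour, giving rounds * 2 games in total.
        "-games", "2", "-rounds", str(rounds), "-repeat",
        "-pgnout", pgn_out,
    ]

cmd = cutechess_command("./myengine", "./reference")
print(shlex.join(cmd))
```

With the default of 250 rounds this gives a 500-game match; paste the printed line into a shell, or pass the list to `subprocess.run`, once the paths are right.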
Robert P.