
The idea is to store the search tree from a pure alpha-beta search (no pruning beyond the alpha-beta cutoffs themselves, but with a normal q-search) up to depth X (say 10) for 1000 different positions.
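Something like this is what I have in mind — a minimal sketch assuming the python-chess library for move generation and a material-only eval (both are just placeholders; your own engine's move generator and eval slot in instead). It records every visited node by its move path, which is what makes the storing and diffing below possible. Note that depth 10 over 1000 positions is only realistic in a compiled engine, so keep the depth small if you try this in Python:

```python
import chess

PIECE_VALUES = {chess.PAWN: 100, chess.KNIGHT: 320, chess.BISHOP: 330,
                chess.ROOK: 500, chess.QUEEN: 900, chess.KING: 0}

def evaluate(board):
    """Material-only eval, from the side to move's point of view."""
    score = 0
    for piece_type, value in PIECE_VALUES.items():
        score += value * len(board.pieces(piece_type, chess.WHITE))
        score -= value * len(board.pieces(piece_type, chess.BLACK))
    return score if board.turn == chess.WHITE else -score

def qsearch(board, alpha, beta, path, tree):
    tree[" ".join(path)] = True              # record the visited node
    stand_pat = evaluate(board)
    if stand_pat >= beta:
        return stand_pat
    alpha = max(alpha, stand_pat)
    for move in board.legal_moves:
        if not board.is_capture(move):       # q-search tries captures only
            continue
        board.push(move)
        score = -qsearch(board, -beta, -alpha, path + [move.uci()], tree)
        board.pop()
        if score >= beta:
            return score
        alpha = max(alpha, score)
    return alpha

def alphabeta(board, depth, alpha, beta, path, tree):
    tree[" ".join(path)] = True
    if depth == 0:
        return qsearch(board, alpha, beta, path, tree)
    moves = list(board.legal_moves)
    if not moves:                            # mate or stalemate
        return -100_000 + len(path) if board.is_check() else 0
    for move in moves:                       # full width: no forward pruning
        board.push(move)
        score = -alphabeta(board, depth - 1, -beta, -alpha,
                           path + [move.uci()], tree)
        board.pop()
        if score >= beta:
            return score
        alpha = max(alpha, score)
    return alpha

def reference_search(fen, depth):
    board = chess.Board(fen)
    tree = {}
    score = alphabeta(board, depth, -10**9, 10**9, [], tree)
    return score, tree                       # store both per position
```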
Then run the search with the change you want to test over the same positions, and measure the resulting node count and evaluation correctness (it's interesting to trade these off: obviously pure alpha-beta has perfect evaluation correctness, but it can't reach nearly the same depth in the same time).
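The measurement step then looks something like the sketch below — `search_under_test` stands for any function with the same signature as `alphabeta` above but with your pruning in it, and defining correctness as an exact score match is just one choice (a score tolerance works too):

```python
def measure(positions, search_under_test, reference_scores, depth=10):
    """positions: list of FENs; reference_scores: fen -> pure alpha-beta score."""
    total_nodes, matches = 0, 0
    for fen in positions:
        board = chess.Board(fen)
        tree = {}
        score = search_under_test(board, depth, -10**9, 10**9, [], tree)
        total_nodes += len(tree)             # nodes this search visited
        matches += (score == reference_scores[fen])
    return total_nodes, matches / len(positions)
```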
Tuning search parameters works the same way, except you should probably tune on a random subset and then measure against the entire set to avoid overfitting.
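A sketch of that split, using `measure()` from above — `search_factory` is a hypothetical callable that builds a search function from a parameter set, and the 10% subset size and 95% correctness floor are arbitrary numbers picked for illustration:

```python
import random

def tune(positions, reference_scores, candidate_params, search_factory):
    """Pick params by node count on a subset, then validate on the full set."""
    train = random.sample(positions, max(1, len(positions) // 10))
    best_params, best_nodes = None, float("inf")
    for params in candidate_params:
        nodes, correctness = measure(train, search_factory(params),
                                     reference_scores)
        # the trade-off from above: cheapest search that stays accurate enough
        if correctness >= 0.95 and nodes < best_nodes:
            best_params, best_nodes = params, nodes
    if best_params is None:
        return None, None                    # nothing met the correctness floor
    # re-measuring on the entire set is the overfitting guard
    return best_params, measure(positions, search_factory(best_params),
                                reference_scores)
```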
If you have a tree browser, you can diff the two searches and see what effect the pruning actually had on the tree. This was the most useful part for me.
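Even without a proper tree browser, the move-path representation makes a crude diff easy: paths present in the reference tree but missing from the test tree are exactly the subtrees your pruning cut off, and the interesting ones are the shallowest cut points. A rough sketch (the `ref_tree`/`test_tree` names are hypothetical, coming from two `reference_search`-style runs):

```python
def diff_trees(reference_tree, test_tree):
    pruned_away = set(reference_tree) - set(test_tree)
    # keep only the root of each pruned subtree (its parent was still visited)
    roots = [p for p in pruned_away
             if " ".join(p.split()[:-1]) not in pruned_away]
    return sorted(roots, key=lambda p: (p.count(" "), p))

# e.g. print the 20 shallowest subtrees the pruning removed:
# for path in diff_trees(ref_tree, test_tree)[:20]:
#     print(path)
```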
Clearly this won't be as good as running 10k games, but it should get you in the right ballpark for parameters, and give a good indicator of how well your search is working.