Poor mans testing process please

lauriet
Posts: 197
Joined: Sun Nov 03, 2013 8:32 am

Poor mans testing process please

Post by lauriet » Wed Mar 18, 2020 11:41 pm

Hi all,
I have pretty much no facilities for testing changes in my program for gains or losses.
What is the best compromise that will get me in the ballpark? I can collect, and already have, a lot of stats, but is there a most logical/reliable way to determine whether a change is a plus or a minus?

Time to depth?
Node counts?
Cutoffs?
TT hits?
etc., etc.

Thanks
Laurie (LTchess2)

brianr
Posts: 424
Joined: Thu Mar 09, 2006 2:01 pm

Re: Poor mans testing process please

Post by brianr » Thu Mar 19, 2020 1:14 am

Depending on the current strength of your engine, early changes that result in large improvements (hopefully) are relatively easy to test. As the engine gets stronger, the improvements get smaller and harder to measure, and many more games are required. Fast time control games are helpful for search-related changes initially, and fixed-node games are OK for evaluation changes. I suggest using the Ordo 1.2.6 release. Position and test sets can be useful once you get a feel for how your engine behaves, but the best thing will be match games. From time to time I also suggest testing your test methodology by matching a copy of the engine against itself to make sure the result is still very close to 50/50. I cannot tell you how many times I have messed that up over the years. Finally, self-play is fine, but from time to time play against a pool of opponents. Self-play tends to find things that your engine can exploit, and those areas will be very different with a pool of engines.
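
As a rough illustration of the 50/50 sanity check, here is a minimal sketch that turns a win/draw/loss count into an Elo difference with an approximate 95% interval. It assumes the standard logistic Elo formula and a normal approximation for the error; the function names `elo_from_score` and `match_elo` are made up for this example and are not taken from Ordo or any other tool.

```python
import math

def elo_from_score(score):
    """Logistic Elo difference implied by a score fraction."""
    # Clamp so that 0% or 100% scores do not divide by zero or hit log10(0).
    score = min(max(score, 1e-6), 1.0 - 1e-6)
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_elo(wins, draws, losses, z=1.96):
    """Return (elo, low, high); z=1.96 gives a roughly 95% interval."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, where a game scores 1, 0.5 or 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - margin),
            elo_from_score(score + margin))

# Example: an engine against a copy of itself, 1000 games.
elo, low, high = match_elo(300, 400, 300)
print(f"{elo:+.1f} Elo, interval [{low:+.1f}, {high:+.1f}]")
```

For a self-match the interval should straddle zero. If it does not, the test setup (openings, time control, colour assignment) is the first thing to suspect.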

lauriet
Posts: 197
Joined: Sun Nov 03, 2013 8:32 am

Re: Poor mans testing process please

Post by lauriet » Thu Mar 19, 2020 7:05 am

I can't really play 10,000 self-play games, or that many games against an opponent.
Are the test set positions useful? Do they work? Are they accurate?
I really need a simple, overarching, ballpark test that can help me eliminate dumb ideas and illuminate potentially good ideas.

My ideal would be "time to depth", which gives an indication of how fast I'm moving through nodes.

hgm
Posts: 24656
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: Poor mans testing process please

Post by hgm » Thu Mar 19, 2020 9:19 am

Unfortunately there is no real alternative for playing games. Other methods will do more harm than good.

Upside is that initially you won't need nearly as many games as 10,000 to weed out dumb ideas. 1000 games can be enough.

It is also not clear to me why you cannot play 10,000 games. You could play 10,000 games in a few minutes, if you wanted.

Ratosh
Posts: 77
Joined: Mon Apr 16, 2018 4:56 pm

Re: Poor mans testing process please

Post by Ratosh » Thu Mar 19, 2020 10:40 am

Playing games is the best way to validate improvements, even better if you use SPRT. I'm using OpenBench, and it is pretty easy to set up, start tests, and let them run while you do other stuff.

You can have different Elo bounds:
You can try bigger bounds; something like [0,10] should be safe until you reach a decent CCRL rating. With this bound a +3 Elo patch has at least a 50% chance of passing, and all tests should take fewer than 10k games. Of course, small changes should fail, but you have decent confidence that all passing tests are improvements.

You can have different alpha and beta bounds:
The default SPRT bound is 5%, giving you 95% confidence that the test result is correct. You can set it to 10% and tests will finish faster, with 90% confidence. I used 20% until my engine was somewhat strong (both STC and LTC).

Notes:
- Changes have a bigger impact on weaker engines (you don't need many games to validate them).
- You don't really need to test at 2 different TCs to validate a change (especially on a weak engine).
- It is fine to have lower confidence that a change is an STC improvement, but you want decent confidence that it is an improvement at LTC.

Check this for an estimate of how many games you need: http://chess-sprt-calc.azurewebsites.net/
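
To make the role of the Elo bounds and of alpha/beta concrete, here is a minimal sketch of an SPRT stopping rule. It uses a common normal approximation of the log-likelihood ratio with bounds expressed in logistic Elo. This is an assumption made for illustration: testing frameworks may define the bounds in other units or use a different likelihood model, so their numbers can differ. All function names are hypothetical.

```python
import math

def expected_score(elo):
    """Expected score for a given logistic Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def sprt_bounds(alpha=0.05, beta=0.05):
    """Lower and upper stopping thresholds for the log-likelihood ratio."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)

def sprt_llr(wins, draws, losses, elo0=0.0, elo1=10.0):
    """Approximate LLR of H1 (gain is elo1) against H0 (gain is elo0)."""
    n = wins + draws + losses
    if n == 0:
        return 0.0
    score = (wins + 0.5 * draws) / n
    # Per-game variance: E[x^2] - E[x]^2 with x in {1, 0.5, 0}.
    var = (wins + 0.25 * draws) / n - score ** 2
    if var <= 0.0:
        return 0.0  # no spread in the results yet, nothing to conclude
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return n * (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * var)

def sprt_status(wins, draws, losses, elo0=0.0, elo1=10.0,
                alpha=0.05, beta=0.05):
    """Return 'accept H1', 'accept H0' or 'continue'."""
    lower, upper = sprt_bounds(alpha, beta)
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"

print(sprt_status(400, 400, 200))  # a clearly winning patch
print(sprt_bounds(0.20, 0.20))     # looser bounds, so tests stop sooner
```

Raising alpha and beta pulls the two thresholds closer to zero, which is why a test with 10% or 20% finishes in fewer games at the cost of more wrong verdicts.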

xr_a_y
Posts: 1189
Joined: Sat Nov 25, 2017 1:28 pm
Location: France

Re: Poor mans testing process please

Post by xr_a_y » Fri Mar 20, 2020 7:22 am

lauriet wrote:
Thu Mar 19, 2020 7:05 am
I can't really play 10,000 self-play games, or that many games against an opponent.
May I ask what your constraints are?

Using only xboard/winboard on a single low-tech core, you can easily run an engine-versus-engine test of 1000 games at 10 seconds per game.
If you are looking for a +20/30 Elo gain, this will be OK, and you can run many of those tests each day (even with 20-second games).
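
A back-of-the-envelope sketch of why about 1000 games is enough for gains of this size: it estimates how many games are needed before a given Elo gain stands out from the noise by roughly two standard errors. The 30% draw ratio is an assumed placeholder, and `games_needed` is a hypothetical helper, not part of xboard/winboard.

```python
import math

def expected_score(elo_diff):
    """Expected score for a given logistic Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_gain, draw_ratio=0.3, z=1.96):
    """Rough game count for `elo_gain` to exceed z standard errors."""
    edge = expected_score(elo_gain) - 0.5
    # Near a 50% score each decisive game contributes 0.25 of variance
    # and each draw contributes nothing.
    var = 0.25 * (1.0 - draw_ratio)
    return math.ceil((z / edge) ** 2 * var)

for gain in (30, 20, 10, 5):
    print(gain, games_needed(gain))
```

Under these assumptions a 20 to 30 Elo gain needs some hundreds of games, while a 5 Elo gain needs well over ten thousand, which matches the numbers quoted in this thread.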

lauriet
Posts: 197
Joined: Sun Nov 03, 2013 8:32 am

Re: Poor mans testing process please

Post by lauriet » Fri Mar 20, 2020 7:33 am

I'm afraid I have very limited knowledge and facilities to do any "hi-tech" testing.
I think the best I can do is to use test suites.
I have looked at 'STS'.
Can anyone explain how I can use this to give me a ballpark idea?
I'm not looking for +/- 10 Elo resolution, but would be happy to know if my engine is 1800 or 2200.

brianr
Posts: 424
Joined: Thu Mar 09, 2006 2:01 pm

Re: Poor mans testing process please

Post by brianr » Fri Mar 20, 2020 9:40 am

Read from here and follow the various links:
https://www.chessprogramming.org/Engine_Testing

Henk
Posts: 6417
Joined: Mon May 27, 2013 8:31 am

Re: Poor mans testing process please

Post by Henk » Fri Mar 20, 2020 11:01 am

lauriet wrote:
Wed Mar 18, 2020 11:41 pm
Hi all,
I have pretty much no facilities for testing changes in my program for gains or losses.
What is the best compromise that will get me in the ballpark? I can collect, and already have, a lot of stats, but is there a most logical/reliable way to determine whether a change is a plus or a minus?

Time to depth?
Node counts?
Cutoffs?
TT hits?
etc., etc.

Thanks
Laurie (LTchess2)
They say I am a bad tester. Play one game and see what errors it made. Try to correct these errors in your software. Extract positions. Always make sure that errors are reproducible.

hgm
Posts: 24656
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: Poor mans testing process please

Post by hgm » Fri Mar 20, 2020 11:19 am

lauriet wrote:
Fri Mar 20, 2020 7:33 am
I'm afraid I have very limited knowledge and facilities to do any "hi-tech" testing.
I think the best I can do is to use test suites.
I have looked at 'STS'.
Can anyone explain how I can use this to give me a ballpark idea?
I'm not looking for +/- 10 Elo resolution, but would be happy to know if my engine is 1800 or 2200.
Well, you won't ever get to know that from a test suite. You could know it, however, by playing a dozen games.
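
To see why a dozen games can separate 1800 from 2200, here is a minimal sketch using the standard logistic Elo formula. It assumes you play an opponent whose rating is already known; the 2000 figure and the function names are illustrative only.

```python
import math

def expected_score(own, opponent):
    """Expected score of a player rated `own` against `opponent`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - own) / 400.0))

def performance_rating(opponent_rating, points, games):
    """Rating that would make `points` out of `games` the expected result."""
    score = points / games
    # Assumed convention: treat a clean sweep as half a point short,
    # so the logarithm below stays finite.
    score = min(max(score, 0.5 / games), 1.0 - 0.5 / games)
    return opponent_rating - 400.0 * math.log10(1.0 / score - 1.0)

# Against a 2000-rated opponent, the two candidate strengths predict
# very different results over 12 games.
for own in (1800, 2200):
    print(own, round(12 * expected_score(own, 2000), 1), "points expected")

print(round(performance_rating(2000, 9, 12)))
```

The error bars after 12 games are still wide, but the expected results for 1800 and 2200 are far enough apart that the match outcome usually points clearly at one of them.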
