Simple self test code

Post by **sje** » Fri Mar 13, 2009 3:40 pm

For an inexpensive aid to peace of mind, Symbolic has a self-test routine that runs just after program initialization. At the moment this includes only pathway enumerations (perft), but these are sufficient to catch many kinds of programming errors. The program counts moves at a rate of about 50 MHz (50 million moves per second), so the entire test takes only a tenth of a second.

(Some positions are from http://www.chessbox.de/Compu/schachzahl4_e.html)

```cpp
// Path enumeration (perft): count the leaf nodes of the move tree to the given
// depth. At depth 1 the moves are counted directly instead of being executed.
ui64 Pos::SimpleEMP(const ui depth)
{
  ui64 count;

  if (depth == 0) count = 1;
  else
  {
    if (depth == 1) count = CountMoves();
    else
    {
      GMVec gmvec(*this);  // generate the moves for this position

      count = 0;
      for (ui i = 0; i < gmvec.GetCount(); i++)
      {
        Execute(gmvec[i]); count += SimpleEMP(depth - 1); Retract();
      };
    };
  };
  return count;
}

// Load one FEN position and compare its path counts at depths 1 through 4.
bool TensorTask::SelfTestMPGAux(
  const char *fenstr, const ui64 c1, const ui64 c2, const ui64 c3, const ui64 c4) const
{
  bool pass;
  Pos pos;

  if (!pos.LoadFromString(fenstr)) pass = false;
  else
    pass =
      (pos.SimpleEMP(1) == c1) && (pos.SimpleEMP(2) == c2) &&
      (pos.SimpleEMP(3) == c3) && (pos.SimpleEMP(4) == c4);
  if (!pass) std::clog << "Tensor::SelfTestMPGAux: Failed position: " << fenstr << '\n';
  return pass;
}

// Run all test positions, stopping at the first failure.
bool TensorTask::SelfTestMPG(void) const
{
  bool pass = true;

  if (pass) pass = SelfTestMPGAux("2qrr1n1/3b1kp1/2pBpn1p/1p2PP2/p2P4/1BP5/P3Q1PP/4RRK1 w - - 0 1", 44, 833, 35770, 766147);
  if (pass) pass = SelfTestMPGAux("8/2p5/3p4/KP5r/1R3p1k/8/4P1P1/8 w - - 0 1", 14, 191, 2812, 43238);
  if (pass) pass = SelfTestMPGAux("8/3K4/2p5/p2b2r1/5k2/8/8/1q6 b - - 0 1", 50, 279, 13310, 54703);
  if (pass) pass = SelfTestMPGAux("8/7p/p5pb/4k3/P1pPn3/8/P5PP/1rB2RK1 b - d3 0 1", 5, 117, 3293, 67197);
  if (pass) pass = SelfTestMPGAux("8/PPP4k/8/8/8/8/4Kppp/8 w - - 0 1", 18, 290, 5044, 89363);
  if (pass) pass = SelfTestMPGAux("r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1", 48, 2039, 97862, 4085603);
  if (pass) pass = SelfTestMPGAux("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", 20, 400, 8902, 197281);
  return pass;
}
```

Post by **steffan** » Sat Mar 14, 2009 12:26 am

Testing the correctness of an implementation can actually speed up the programming process rather than slow it down, as this affords the programmer greater confidence that incorrect changes will be quickly flagged. Writing unit tests is one way (of several) of testing correctness that gives you very quick feedback on the validity of your own code as you write it. As more code is created, more unit tests are added to the test suite. Of course it is infeasible to create test cases for every scenario, but if any test case fails that is a sure indication of a bug. Some discipline is required to ensure the program under development is suitably organised as testable units, each having good cohesion and loose coupling to other units.

When I program in Java, I use the JUnit framework for writing my unit test cases, and a build system (Maven 2) that automatically runs and reports my unit tests every time I do a build. Just one test case failure is enough to flag the build as FAILED. A test coverage tool (EMMA) measures test coverage: the proportion of program lines executed during the test run relative to the total number of program lines.

Here is a snippet of unit test code for a Bitboard class:

```java
package saw.chess.engine.core;

import static org.junit.Assert.*;
import org.junit.Test;
import saw.chess.engine.core.Bitboard;
import static saw.chess.engine.core.Bitboard.*;

/**
 * Test class for {@link saw.chess.engine.core.Bitboard}
 */
public class BitboardTest {

    /** Test method for {@link saw.chess.engine.core.Bitboard#shiftLeft(long)}. */
    @Test
    public void testShiftLeft() {
        assertEquals(0L, shiftLeft(A1));
        assertEquals(0L, shiftLeft(A8));
        assertEquals(G1, shiftLeft(H1));
        assertEquals(G8, shiftLeft(H8));
        assertEquals(F4 | A2, shiftLeft(A8 | G4 | B2));
    }
    // ...
}
```

Cheers,
Steffan

Post by **Aleks Peshkov** » Sat Mar 14, 2009 8:31 am

Unit testing methodology requires a specific program design. While it generally enforces better design, it seems to hurt performance, and that is what most chess programmers fear more than bugs.

Post by **Onno Garms** » Sat Mar 14, 2009 5:42 pm

I consider unit testing essential for a good program. It's not much fun to maintain a program that has no unit tests that can be run automatically whenever you want.

Some more hints from my point of view:
- Do not run the unit tests automatically at startup. Sure, they should run quickly, but not that quickly. Have some internal option to run the tests.
- If you have a strong non-open-source engine, do not compile the unit tests into your released executable. They might help a lot with reverse engineering. They also needlessly blow up the size of your executable and might introduce dependencies on test libraries.
- JUnit may be the standard, but avoid using CppUnit. I made that mistake twice, once at work and once in chess programming. At work we switched to boost::test; in chess programming I switched to a slim self-made framework.

Post by **mcostalba** » Sat Mar 14, 2009 7:20 pm

For Stockfish development, because most of the patches so far are rewrites and/or optimizations that do not affect functionality, I always test the program on a set of positions searched at a fixed depth.

If the node count differs before and after the patch, then something is wrong.

I consider this kind of cross-check ABSOLUTELY mandatory. Without it there is little point in developing further, because it becomes too frustrating and leads you to make only little tweaks and avoid the cleanup work that I am most interested in.

Post by **sje** » Sat Mar 14, 2009 8:28 pm

Symbolic has a random game generator that produces statistics of game termination results. The test is usually run manually and left to stew overnight. With a random game generation rate of about 4.3 kHz (4,300 games per second), more than a hundred million games can be played in eight hours using a single thread.

The idea is to have a simple way to test game termination detection code, particularly repetition draw detection, which can be tricky. For a sufficiently long run, the repetition draw rate should settle at about 2.56 percent.

Some data from a recent ten million game run:

```
Checkmate      1529265  (0.152926)
FiftyMoves     1997190  (0.199719)
Insufficient   5608448  (0.560845)
Repetition      254376  (0.0254376)
Stalemate       610721  (0.0610721)
```

Post by **sje** » Sat Mar 14, 2009 8:33 pm

> Onno Garms wrote:
> - Do not run the unit tests automatically at startup. Sure they should run quickly, but not that quickly. Have some internal option to run the tests.

Having a brief test run EVERY time at start-up works against the all too common operator error of forgetting to set a test option.

Post by **wgarvin** » Sun Mar 15, 2009 6:03 am

> sje wrote:
> > Onno Garms wrote:
> > - Do not run the unit tests automatically at startup. Sure they should run quickly, but not that quickly. Have some internal option to run the tests.
>
> Having a brief test run EVERY time at start-up works against the all too common operator error of forgetting to set a test option.

I think you're both right. I agree with Onno that if you are doing TDD and have a whole bunch of unit tests, the right time to run them is when the engine is compiled, not when it is run. However, not everybody uses unit tests. This startup test sounds especially useful for engine authors who don't have unit test suites that they run during every compile. It is not a complete test suite, just a quick sanity check based on perft. It rapidly exercises a lot of important functionality, such as the move generator and making and unmaking moves. If you change something and that stuff no longer works correctly, perft will probably catch it, and you will want to stop and fix that bug before doing anything else. Far better to find that out when the engine refuses to start than after a long debugging session trying to figure out why it missed an obvious legal move (or, even worse, thinks it could move itself into check or something). It seems well worth the 1/10th of a second it adds to the startup time!

Post by **Onno Garms** » Sun Mar 15, 2009 4:36 pm

I have a level option for my tests. The default level needs a few minutes to run all tests. Enhanced level 1 takes much longer, but still less than an hour. I can also replace the PGN input file for a specific test with a larger one to run that test many times overnight. Yes, I did have bugs that were detected only after hours of CPU time.
