Usage sprt / cutechess-cli


Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Usage sprt / cutechess-cli.

Post by Desperado »

Hi, Ilari,

thanks for your effort.

I will run the next tests tomorrow. I already used -ratinginterval 10, so I will use it again, and I will provide any information you need.

Second, I will run the test with both the new version 0.7.2 and version 0.7.1.
I may also check the concurrency option, using 1 instead of 4, and then repeat the test with 4.

This should make the results more reproducible. (Along the way it might also reveal an issue in this context; a very wild guess, of course.)

regards
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Usage sprt / cutechess-cli.

Post by Desperado »

Hello, Ilari,

While starting the next test, I seem to have tracked down an issue related to the given problem. But before I can conclude what is really going on,
just one question:

How are "no result" values handled?

The point is that the engine I used, "Omen0003" for example, is at a very early stage of development, so it is possible that a bug (an invalid move, for example)
produces "no result" games.

On the other hand, looking at the PGN file, there is nothing obvious to see.

(Please just ignore the quality of the game :-))

Code:

[Event "?"]
[Site "?"]
[Date "2015.09.06"]
[Round "1"]
[White "Omen0003"]
[Black "Omen0002"]
[Result "*"]
[ECO "A21"]
[Opening "English, Kramnik-Shirov counterattack"]
[PlyCount "92"]
[Termination "unterminated"]
[TimeControl "10000"]

1. c4 {book} e5 {book} 2. Nc3 {book} Bb4 {book} 3. g3 {book} Bxc3 {book}
4. dxc3 {book} d6 {book} 5. Bg2 {book} Nc6 {book} 6. Nf3 {book} h6 {book}
7. Be3 {+0.05/2 0.005s} Nf6 {-0.09/2 0.006s} 8. Nh4 {-0.07/2 0.005s}
e4 {+0.03/2 0.004s} 9. a4 {-0.15/2 0.004s} g5 {+0.13/2 0.007s}
10. Nf3 {-2.36/2 0.006s} exf3 {+2.36/2 0.005s} 11. Bxf3 {-2.60/2 0.005s}
Ne5 {+2.32/2 0.006s} 12. b3 {-2.56/2 0.005s} Ke7 {+2.37/2 0.006s}
13. Qc2 {-2.53/2 0.007s} Nxf3+ {+2.41/2 0.006s} 14. exf3 {-2.43/3 0.004s}
Bd7 {+2.43/2 0.004s} 15. Ke2 {-2.51/2 0.007s} c5 {+2.47/2 0.007s}
16. g4 {-2.60/2 0.006s} Qb6 {+2.56/2 0.004s} 17. a5 {-2.56/2 0.005s}
Qa6 {+2.52/2 0.006s} 18. h3 {-2.56/2 0.005s} Nd5 {+2.52/2 0.005s}
19. Qe4+ {-2.45/2 0.006s} Be6 {+1.45/2 0.006s} 20. Bd2 {-2.52/2 0.006s}
f6 {+2.50/2 0.007s} 21. Kd3 {-2.53/2 0.007s} Rac8 {+1.51/2 0.004s}
22. Kc2 {-2.48/2 0.004s} f5 {+1.43/2 0.003s} 23. Bxg5+ {-1.43/1 0.005s}
hxg5 {+4.01/2 0.006s} 24. gxf5 {-4.01/1 0.004s} Nxc3 {+1.48/2 0.003s}
25. Qxe6+ {+1.12/2 0.005s} Kd8 {-2.16/2 0.004s} 26. Kxc3 {+2.10/2 0.007s}
Re8 {-2.10/2 0.004s} 27. Qf7 {+2.07/2 0.007s} Rc7 {-2.03/2 0.005s}
28. Qg6 {+2.04/2 0.004s} d5 {-3.00/2 0.006s} 29. cxd5 {+3.00/2 0.005s}
Qxg6 {-3.00/2 0.003s} 30. fxg6 {+2.98/2 0.004s} Rf8 {-2.04/2 0.004s}
31. Rhe1 {+2.04/2 0.005s} Rxf3+ {-1.00/2 0.002s} 32. Re3 {+1.00/2 0.002s}
Rxf2 {-1.04/2 0.002s} 33. d6 {+1.02/2 0.003s} Rd7 {-1.04/2 0.005s}
34. Rd1 {+1.02/2 0.006s} b6 {-1.04/2 0.003s} 35. axb6 {+1.02/2 0.003s}
axb6 {-1.06/2 0.003s} 36. Re5 {+1.00/2 0.004s} Rf3+ {-1.02/2 0.006s}
37. Rd3 {+0.02/2 0.003s} Rxd3+ {-0.02/2 0.002s} 38. Kxd3 {-0.02/3 0.002s}
Rxd6+ {+0.96/3 0.002s} 39. Kc2 {-0.90/3 0.002s} Rxg6 {+0.96/3 0.002s}
40. Rd5+ {-0.90/3 0.004s} Kc7 {+1.00/3 0.002s} 41. Re5 {-0.92/3 0.004s}
Rg8 {+0.92/2 0.003s} 42. Re7+ {-0.92/2 0.003s} Kd6 {+1.00/3 0.002s}
43. Re1 {-1.00/2 0.003s} Ra8 {+1.00/3 0.003s} 44. Rd1+ {-1.00/2 0.002s}
Ke5 {+1.00/3 0.002s} 45. Re1+ {-1.00/2 0.003s} Kd4 {+1.00/3 0.002s}
46. Rd1+ {-1.00/3 0.003s} Ke3 {+1.00/3 0.002s, No result} *

[Event "?"]
[Site "?"]
[Date "2015.09.06"]
[Round "1"]
[White "Omen0002"]
[Black "Omen0003"]
[Result "*"]
[ECO "A21"]
[Opening "English, Kramnik-Shirov counterattack"]
[PlyCount "91"]
[Termination "unterminated"]
[TimeControl "10000"]

1. c4 {book} e5 {book} 2. Nc3 {book} Bb4 {book} 3. g3 {book} Bxc3 {book}
4. dxc3 {book} d6 {book} 5. Bg2 {book} Nc6 {book} 6. Nf3 {book} h6 {book}
7. Be3 {+0.05/2 0.007s} Nf6 {-0.09/2 0.005s} 8. Nh4 {-0.07/2 0.006s}
e4 {+0.03/2 0.005s} 9. a4 {-0.15/2 0.005s} g5 {+0.13/2 0.006s}
10. Nf3 {-2.36/2 0.005s} exf3 {+2.36/2 0.004s} 11. Bxf3 {-2.60/2 0.007s}
Ne5 {+2.32/2 0.007s} 12. b3 {-2.56/2 0.004s} Ke7 {+2.37/2 0.004s}
13. Qc2 {-2.53/2 0.007s} Nxf3+ {+2.41/2 0.005s} 14. exf3 {-2.43/3 0.006s}
Bd7 {+2.43/2 0.006s} 15. Ke2 {-2.51/2 0.004s} c5 {+2.47/2 0.005s}
16. g4 {-2.60/2 0.006s} Qb6 {+2.56/2 0.005s} 17. a5 {-2.56/2 0.004s}
Qa6 {+2.52/2 0.003s} 18. h3 {-2.56/2 0.003s} Nd5 {+2.52/2 0.004s}
19. Qe4+ {-2.45/2 0.007s} Be6 {+1.45/2 0.007s} 20. Bd2 {-2.52/2 0.004s}
f6 {+2.50/2 0.004s} 21. Kd3 {-2.53/2 0.004s} Rac8 {+1.51/2 0.004s}
22. Kc2 {-2.48/2 0.004s} f5 {+1.43/2 0.003s} 23. Bxg5+ {-1.43/1 0.004s}
hxg5 {+4.01/2 0.003s} 24. gxf5 {-4.01/1 0.003s} Nxc3 {+1.48/2 0.003s}
25. Qxe6+ {+1.12/2 0.003s} Kd8 {-2.10/3 0.004s} 26. Kxc3 {+2.10/2 0.006s}
Re8 {-2.10/2 0.006s} 27. Qf7 {+2.07/2 0.007s} Rc7 {-2.03/2 0.005s}
28. Qg6 {+2.04/2 0.004s} d5 {-3.00/2 0.005s} 29. cxd5 {+3.00/2 0.003s}
Qxg6 {-3.00/2 0.005s} 30. fxg6 {+2.98/2 0.003s} Rf8 {-2.04/2 0.003s}
31. Rhe1 {+2.04/2 0.005s} Rxf3+ {-1.00/2 0.003s} 32. Re3 {+1.00/2 0.005s}
Rxf2 {-1.04/2 0.002s} 33. d6 {+1.02/2 0.003s} Rd7 {-1.04/2 0.003s}
34. Rd1 {+1.02/2 0.005s} b6 {-1.04/2 0.004s} 35. axb6 {+1.02/2 0.003s}
axb6 {-1.06/2 0.002s} 36. Re5 {+1.00/2 0.003s} Rf3+ {-1.02/2 0.006s}
37. Rd3 {+0.02/2 0.003s} Rxd3+ {-0.02/2 0.002s} 38. Kxd3 {-0.02/3 0.002s}
Rxd6+ {+0.96/3 0.003s} 39. Kc2 {-0.90/3 0.002s} Rxg6 {+0.96/3 0.002s}
40. Rd5+ {-0.90/3 0.002s} Kc7 {+1.00/3 0.002s} 41. Re5 {-0.92/3 0.003s}
Rg8 {+0.92/2 0.005s} 42. Re7+ {-0.92/2 0.003s} Kd6 {+1.00/3 0.002s}
43. Re1 {-1.00/2 0.003s} Ra8 {+1.00/3 0.003s} 44. Rd1+ {-1.00/2 0.004s}
Ke5 {+1.00/3 0.002s} 45. Re1+ {-1.00/2 0.004s} Kd4 {+1.00/3 0.002s}
46. Rd1+ {-1.00/2 0.002s, No result} *
Of course I need to check what is going on with my newborn engines before I can draw any further conclusion.
The engines are 100% deterministic at this point, yet one time the move "Ke3" is accepted, and the next time I get the message

Code:

UnexpFinished game 8391 (Omen0003 vs Omen0002): * {No result}
ected move from "Omen0003"
So it works one time and not the next, but maybe this is an effect of the hyperfast time control ("nodes=10000").

I am going to debug my engine on this matter, but even if it produces "no result" games, I hope they do not influence the SPRT calculation. The -recover option is enabled, by the way.

regards
ilari
Posts: 750
Joined: Mon Mar 27, 2006 7:45 pm
Location: Finland

Re: Usage sprt / cutechess-cli.

Post by ilari »

Desperado wrote:Hello, Ilari,

starting the next test, it seems that i track down an issue in context to the given problem. But before i am able to conclude what really is going on,
just a question:

How are "no result" values handled ?
You can see the full SPRT implementation here: https://github.com/cutechess/cutechess/ ... c/sprt.cpp

The relevant part is the "addGameResult" function:

Code:

void Sprt::addGameResult(GameResult result)
{
	if (result == Win)
		m_wins++;
	else if (result == Draw)
		m_draws++;
	else if (result == Loss)
		m_losses++;
}
So "no result" values are ignored completely.
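To make the consequence concrete, here is a minimal sketch of a tally that skips "no result" games in the same way. The `Outcome` and `Tally` names are hypothetical and are not taken from cutechess; only the idea of counting wins, losses and draws and dropping everything else comes from the function above.

```cpp
#include <cassert>

// Outcome of one game from the first engine's point of view.
// NoResult stands for games reported as "*" (e.g. unterminated games).
enum class Outcome { Win, Loss, Draw, NoResult };

// Hypothetical tally type: only wins, losses and draws are counted.
struct Tally {
    int wins = 0;
    int losses = 0;
    int draws = 0;

    void add(Outcome o) {
        switch (o) {
        case Outcome::Win:      ++wins;   break;
        case Outcome::Loss:     ++losses; break;
        case Outcome::Draw:     ++draws;  break;
        case Outcome::NoResult: break; // ignored, as in addGameResult
        }
    }

    // Number of games that actually contribute to the statistics.
    int games() const { return wins + losses + draws; }

    // Score fraction: a win counts 1, a draw 1/2, a loss 0.
    double score() const {
        assert(games() > 0);
        return (wins + 0.5 * draws) / games();
    }
};
```

With a tally of 2139 wins, 1940 losses and 4314 draws, `games()` is 8393 and `score()` is about 0.512, no matter how many `NoResult` games are added on top.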
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Usage sprt / cutechess-cli.

Post by Desperado »

Hello, Ilari,

here are my analysis, results and conclusions.

1: Version 0.7.2(old) with concurrency 1

Code:

Started game 8391 of 35000 (Omen0003 vs Omen0002)
Finished game 8391 (Omen0003 vs Omen0002): 1/2-1/2 {Draw by 3-fold repetition}
Score of Omen0003 vs Omen0002: 2137 - 1939 - 4315  [0.512] 8391
Started game 8392 of 35000 (Omen0002 vs Omen0003)
Finished game 8392 (Omen0002 vs Omen0003): 1/2-1/2 {Draw by 3-fold repetition}
Score of Omen0003 vs Omen0002: 2137 - 1939 - 4316  [0.512] 8392
Started game 8393 of 35000 (Omen0003 vs Omen0002)
Finished game 8393 (Omen0003 vs Omen0002): 0-1 {Black mates}
Score of Omen0003 vs Omen0002: 2137 - 1940 - 4316  [0.512] 8393
Started game 8394 of 35000 (Omen0002 vs Omen0003)
Finished game 8394 (Omen0002 vs Omen0003): 0-1 {Black mates}
Score of Omen0003 vs Omen0002: 2138 - 1940 - 4316  [0.512] 8394
Started game 8395 of 35000 (Omen0003 vs Omen0002)
Finished game 8395 (Omen0003 vs Omen0002): 1-0 {White mates}
Score of Omen0003 vs Omen0002: 2139 - 1940 - 4316  [0.512] 8395
ELO difference: 8
SPRT: llr 4.64, lbound -4.6, ubound 4.6 - H1 was accepted
Finished match
Conclusion: SPRT looks reasonable. There aren't any unexpected moves.
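As a cross-check of the numbers in the log above, the following sketch derives the score fraction and the Elo difference from the final 2139 - 1940 - 4316 tally. It assumes the usual logistic Elo model; whether cutechess-cli 0.7.2 computes "ELO difference" in exactly this way is an assumption, and the function names are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Score fraction from the first engine's point of view:
// a win counts 1, a draw 1/2, a loss 0.
double scoreFraction(int wins, int losses, int draws)
{
    const int games = wins + losses + draws;
    assert(games > 0);
    return (wins + 0.5 * draws) / games;
}

// Elo difference implied by a score fraction under the logistic model:
// score = 1 / (1 + 10^(-diff / 400))  =>  diff = -400 * log10(1 / score - 1)
double eloDifference(double score)
{
    assert(score > 0.0 && score < 1.0); // undefined for 0% or 100%
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

For the tally above, `scoreFraction(2139, 1940, 4316)` is about 0.512 and `eloDifference` of that value is a little over 8, which agrees with the "[0.512]" and "ELO difference: 8" lines.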

2: Version 0.7.2 (latest, linked in the general topics forum thread) with concurrency 1

Code:

Started game 8390 of 35000 (Omen0002 vs Omen0003)
Finished game 8390 (Omen0002 vs Omen0003): 0-1 {Black mates}
Score of Omen0003 vs Omen0002: 2137 - 1939 - 4314  [0.512] 8390
ELO difference: 8
SPRT: llr 4.58, lbound -4.6, ubound 4.6
Started game 8391 of 35000 (Omen0003 vs Omen0002)
Finished game 8391 (Omen0003 vs Omen0002): 1/2-1/2 {Draw by 3-fold repetition}
Score of Omen0003 vs Omen0002: 2137 - 1939 - 4315  [0.512] 8391
Started game 8392 of 35000 (Omen0002 vs Omen0003)
Finished game 8392 (Omen0002 vs Omen0003): 1/2-1/2 {Draw by 3-fold repetition}
Score of Omen0003 vs Omen0002: 2137 - 1939 - 4316  [0.512] 8392
Started game 8393 of 35000 (Omen0003 vs Omen0002)
Finished game 8393 (Omen0003 vs Omen0002): 0-1 {Black mates}
Score of Omen0003 vs Omen0002: 2137 - 1940 - 4316  [0.512] 8393
Started game 8394 of 35000 (Omen0002 vs Omen0003)
Finished game 8394 (Omen0002 vs Omen0003): 0-1 {Black mates}
Score of Omen0003 vs Omen0002: 2138 - 1940 - 4316  [0.512] 8394
Started game 8395 of 35000 (Omen0003 vs Omen0002)
Finished game 8395 (Omen0003 vs Omen0002): 1-0 {White mates}
Score of Omen0003 vs Omen0002: 2139 - 1940 - 4316  [0.512] 8395
ELO difference: 8
SPRT: llr 4.64, lbound -4.6, ubound 4.6 - H1 was accepted
Finished match
Conclusion: SPRT looks reasonable. There aren't any unexpected moves.

3: Version 0.7.2(latest) concurrency 4

Code:

Started game 8393 of 35000 (Omen0003 vs Omen0002)
Started game 8394 of 35000 (Omen0002 vs Omen0003)
Finished game 8390 (Omen0002 vs Omen0003): 0-1 {Black mates}
Score of Omen0003 vs Omen0002: 2137 - 1939 - 4314  [0.512] 8390
ELO difference: 8
SPRT: llr 4.58, lbound -4.6, ubound 4.6
Finished game 8394 (Omen0002 vs Omen0003): 0-1 {Black mates}
Score of Omen0003 vs Omen0002: 2138 - 1939 - 4314  [0.512] 8391
Finished Unexpected move from "Omen0002"
Unexpected move from "Omen0002"
game 8393 (Omen0003 vs Omen0002): 0-1 {Black mates}
Score of Omen0003 vs Omen0002: 2138 - 1940 - 4314  [0.512] 8392
Finished game 8391 (Omen0003 vs Omen0002): * {No result}
Score of Omen0003 vs Omen0002: 2138 - 1940 - 4314  [0.512] 8392
Finished game 8392 (Omen0002 vs Omen0003): * {No result}
Score of Omen0003 vs Omen0002: 2138 - 1940 - 4314  [0.512] 8392
ELO difference: 8
SPRT: llr 4.58, lbound -4.6, ubound 4.6
Finished match
Conclusion:
* SPRT looks reasonable here too, but there is no report like "H1 was accepted".
* As in the previous version, there is an "Unexpected move from..." report, so I started to debug my engine, also out of self-interest.
The point is, there is no bug, and my engine does not generate any illegal moves. Besides some more detailed debugging, I checked the PGN games with the status "unterminated" (after converting the move lists into UCI move format).

Some analysis:

Code:

[Event "?"]
[Site "?"]
[Date "2015.09.07"]
[Round "1"]
[White "Omen0003"]
[Black "Omen0002"]
[Result "*"]
[ECO "A21"]
[Opening "English, Kramnik-Shirov counterattack"]
[TimeControl "10000"]
[Termination "unterminated"]
[PlyCount "111"]

c2c4 e7e5 b1c3 f8b4 g2g3 b4c3 d2c3 d7d6 f1g2 b8c6 g1f3 h7h6 c1e3 g8f6 f3h4 e5e4 a2a4 g7g5 h4f3 e4f3 g2f3 c6e5 b2b3 e8e7 d1c2 e5f3 e2f3 c8d7 e1e2 c7c5 g3g4 d8b6 a4a5 b6a6 h2h3 f6d5 c2e4 d7e6 e3d2 f7f6 e2d3 a8c8 d3c2 f6f5 d2g5 h6g5 g4f5 d5c3 e4e6 e7d8 c2c3 h8e8 e6f7 c8c7 f7g6 d6d5 c4d5 a6g6 f5g6 e8f8 h1e1 f8f3 e1e3 f3f2 d5d6 c7d7 a1d1 b7b6 a5b6 a7b6 e3e5 f2f3 d1d3 f3d3 c3d3 d7d6 d3c2 d6g6 e5d5 d8c7 d5e5 g6g8 e5e7 c7d6 e7e1 g8a8 e1d1 d6e5 d1e1 e5d4 e1d1 d4e3 d1e1 e3f2 e1d1 f2g3 d1d3 g3g2 d3e3 b6b5 e3d3 b5b4 d3e3 a8h8 e3e5 h8h3 e5g5 g2f1 g5c5 h3h1 c5a5 *

CTM : BLACK
EPT : --
woo : 0
wooo: 0
boo : 0
booo: 0
hmc : 2
-- -- -- -- -- -- -- --
-- -- -- -- -- -- -- --
-- -- -- -- -- -- -- --
wr -- -- -- -- -- -- --
-- bp -- -- -- -- -- --
-- wp -- -- -- -- -- --
-- -- wk -- -- -- -- --
-- -- -- -- -- bk -- br

info depth 1 seldepth 1 nodes 1 score cp -10 nps 1000 pv h1g1
info depth 1 seldepth 1 nodes 2 score cp -4 nps 2000 pv h1h2
info depth 1 seldepth 1 nodes 3 score cp -2 nps 3000 pv h1h3
info depth 1 seldepth 1 nodes 7 score cp 0 nps 7000 pv h1h6
info depth 2 seldepth 2 nodes 36 score cp -10 nps 36000 pv h1g1 a5d5
info depth 2 seldepth 2 nodes 41 score cp 0 nps 41000 pv h1h2 c2b1
info depth 3 seldepth 3 nodes 586 score cp 0 nps 36625 pv h1g1 a5a1 f1f2
info depth 4 seldepth 4 nodes 8394 score cp 0 nps 524625 pv h1g1 a5a1 f1f2 a1g1
info depth 3 seldepth 3 nodes 10240 score cp 0 nps 640000 pv h1g1 a5a1 f1f2

I got the debug fen 2nd position: "position fen rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 moves c2c4 e7e5 b1c3 f8b4 g2g3 b4c3 d2c3 d7d6 f1g2 b8c6 g1f3 h7h6 c1e3 g8f6 f3h4 e5e4 a2a4 g7g5 h4f3 e4f3 g2f3 c6e5 b2b3 e8e7 d1c2 e5f3 e2f3 c8d7 e1e2 c7c5 g3g4 d8b6 a4a5 b6a6 h2h3 f6d5 c2e4 d7e6 e3d2 f7f6 e2d3 a8c8 d3c2 f6f5 d2g5 h6g5 g4f5 d5c3 e4e6 e7d8 c2c3 h8e8 e6f7 c8c7 f7g6 d6d5 c4d5 a6g6 f5g6 e8f8 h1e1 f8f3 e1e3 f3f2 d5d6 c7d7 a1d1 b7b6 a5b6 a7b6 e3e5 f2f3 d1d3 f3d3 c3d3 d7d6 d3c2 d6g6 e5d5 d8c7 d5e5 g6g8 e5e7 c7d6"

Code:

[Event "?"]
[Site "?"]
[Date "2015.09.07"]
[Round "1"]
[White "Omen0002"]
[Black "Omen0003"]
[Result "*"]
[ECO "A21"]
[Opening "English, Kramnik-Shirov counterattack"]
[TimeControl "10000"]
[Termination "unterminated"]
[PlyCount "84"]

c2c4 e7e5 b1c3 f8b4 g2g3 b4c3 d2c3 d7d6 f1g2 b8c6 g1f3 h7h6 c1e3 g8f6 f3h4 e5e4 a2a4 g7g5 h4f3 e4f3 g2f3 c6e5 b2b3 e8e7 d1c2 e5f3 e2f3 c8d7 e1e2 c7c5 g3g4 d8b6 a4a5 b6a6 h2h3 f6d5 c2e4 d7e6 e3d2 f7f6 e2d3 a8c8 d3c2 f6f5 d2g5 h6g5 g4f5 d5c3 e4e6 e7d8 c2c3 h8e8 e6f7 c8c7 f7g6 d6d5 c4d5 a6g6 f5g6 e8f8 h1e1 f8f3 e1e3 f3f2 d5d6 c7d7 a1d1 b7b6 a5b6 a7b6 e3e5 f2f3 d1d3 f3d3 c3d3 d7d6 d3c2 d6g6 e5d5 d8c7 d5e5 g6g8 e5e7 c7d6 *

CTM : WHITE
EPT : --
woo : 0
wooo: 0
boo : 0
booo: 0
hmc : 6
-- -- -- -- -- -- br --
-- -- -- -- wr -- -- --
-- bp -- bk -- -- -- --
-- -- bp -- -- -- bp --
-- -- -- -- -- -- -- --
-- wp -- -- -- -- -- wp
-- -- wk -- -- -- -- --
-- -- -- -- -- -- -- --

info depth 1 seldepth 1 nodes 4 score cp -520 nps 4000 pv b3b4
info depth 1 seldepth 1 nodes 8 score cp -92 nps 8000 pv e7e1
info depth 2 seldepth 2 nodes 48 score cp -620 nps 3200 pv b3b4 d6e7
info depth 2 seldepth 2 nodes 87 score cp -100 nps 5800 pv e7e1 g8a8
info depth 3 seldepth 3 nodes 1640 score cp -620 nps 109333 pv b3b4 d6e7 b4c5
info depth 3 seldepth 3 nodes 3084 score cp -100 nps 205600 pv e7e1 g8a8 e1f1
info depth 3 seldepth 3 nodes 5800 score cp -92 nps 187097 pv e7b7 d6c6 b7a7
info depth 2 seldepth 2 nodes 10240 score cp -100 nps 330323 pv e7e1 g8a8
bestmove e7e1 ponder g8a8

Summary:

* I now have good reasons to say that SPRT itself probably works as desired, but the program flow/logic after a "Stop Match" with concurrency > 1 is unclean. My impression is that the games still running are stopped as well and no longer accept moves, which leads to the "Unexpected move" report. This is misleading, especially when writing a new engine: a report like this looks like an engine bug that has to be fixed no matter how long it takes, and the hunt takes even longer when no such bug exists.
* Another hint is that the "Unexpected move" report only occurs around the point where SPRT stops the match; there have never been reports of this kind before the "Stop Match" event.
* Finally, everything works cleanly with the concurrency 1 option.

Maybe my analysis will cause you to have a closer look into the "Stop Match / SPRT" logic.

If you need the opening PGN file, the engines, or anything else, just let me know; I can provide everything via e-mail. A complete UCI communication log was too much effort at this point, but of course it can be produced too. One test I might still perform is to shuffle the opening PGN, but I am sure (as sure as one can be without testing) that there won't be any "unexpected move" behaviour for the same positions/games.

regards
ilari
Posts: 750
Joined: Mon Mar 27, 2006 7:45 pm
Location: Finland

Re: Usage sprt / cutechess-cli.

Post by ilari »

At a first glimpse I'm not completely sure why the third test was stopped early (it was not stopped by SPRT). But "unexpected move" doesn't mean that something horrible happened. Usually it means that cutechess-cli asked an engine to stop playing/thinking and then the engine responded with a move. It's not an error that should cause cutechess-cli to end the match.

The "unexpected move" thing would be a lot easier to debug using cutechess-cli's "-debug" parameter.
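The situation described above can be pictured with a toy state machine. This is not cutechess-cli's code: the class and state names are hypothetical, and the sketch only assumes what the post says, namely that a move arriving after the engine was asked to stop gets reported as unexpected. In UCI an engine normally answers "stop" with a "bestmove", so such a late move does not by itself indicate an engine bug.

```cpp
#include <cassert>
#include <string>

// Hypothetical states of one engine as seen by the match manager.
enum class EngineState { Idle, Thinking, StopRequested };
enum class MoveStatus { Accepted, Unexpected };

class EngineProxy {
public:
    // The manager sends "go": a move from the engine is now expected.
    void go() { m_state = EngineState::Thinking; }

    // The manager sends "stop", e.g. because the match was ended early.
    void stop() {
        if (m_state == EngineState::Thinking)
            m_state = EngineState::StopRequested;
    }

    // The engine answers with "bestmove <move>".
    MoveStatus onBestMove(const std::string& move) {
        assert(!move.empty());
        const bool expected = (m_state == EngineState::Thinking);
        m_state = EngineState::Idle;
        if (!expected)
            return MoveStatus::Unexpected; // reported, but the move is dropped
        m_lastMove = move;
        return MoveStatus::Accepted;
    }

    // Last move that was actually accepted into the game.
    const std::string& lastMove() const { return m_lastMove; }

private:
    EngineState m_state = EngineState::Idle;
    std::string m_lastMove;
};
```

In this model a `go()` followed by `onBestMove("e7e1")` is accepted, while `go()`, `stop()` and then `onBestMove("h1g1")` is classified as unexpected and leaves `lastMove()` unchanged.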
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Usage sprt / cutechess-cli.

Post by Desperado »

ilari wrote:At a first glimpse I'm not completely sure why the third test was stopped early (it was not stopped by SPRT). But "unexpected move" doesn't mean that something horrible happened. Usually it means that cutechess-cli asked an engine to stop playing/thinking and then the engine responded with a move. It's not an error that should cause cutechess-cli to end the match.

The "unexpected move" thing would be a lot easier to debug using cutechess-cli's "-debug" parameter.
Hi, Ilari,

I really hadn't noticed the -debug parameter until now. I will check it out this evening. Thanks for the hint.
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Usage sprt / cutechess-cli.

Post by Desperado »

ilari wrote:At a first glimpse I'm not completely sure why the third test was stopped early (it was not stopped by SPRT). But "unexpected move" doesn't mean that something horrible happened. Usually it means that cutechess-cli asked an engine to stop playing/thinking and then the engine responded with a move. It's not an error that should cause cutechess-cli to end the match.

The "unexpected move" thing would be a lot easier to debug using cutechess-cli's "-debug" parameter.
Hello again.

The first thing is that debug mode works fine, and there is no issue with my engine.
The next thing is that I am relieved to hear that the "unexpected move" report is nothing serious, just part of the "normal" communication between cutechess-cli and the engine.

There is an SPRT stop:
The match with concurrency 4 is in fact stopped by SPRT; unfortunately, the report is missing. So I changed the -ratinginterval parameter to 1 and repeated the match with concurrency 4.

Here is the result:


Score of Omen0003 vs Omen0002: 2138 - 1940 - 4314 [0.512] 8392
ELO difference: 8
SPRT: llr 4.58, lbound -4.6, ubound 4.6
Started game 8396 of 35000 (Omen0002 vs Omen0003)
Finished game 8395 (Omen0003 vs Omen0002): 1-0 {White mates}
Score of Omen0003 vs Omen0002: 2139 - 1940 - 4314 [0.512] 8393
ELO difference: 8
SPRT: llr 4.64, lbound -4.6, ubound 4.6 - H1 was accepted
Finished game 8392 (Omen0002 vs Omen0003): * {No result}
Unexpected move from "Omen0003"
Score of Omen0003 vs Omen0002: 2139 - 1940 - 4314 [0.512] 8393
Unexpected move from "Omen0002"
ELO difference: 8
SPRT: llr 4.64, lbound -4.6, ubound 4.6 - H1 was accepted
Finished game 8391 (Omen0003 vs Omen0002): * {No result}
Score of Omen0003 vs Omen0002: 2139 - 1940 - 4314 [0.512] 8393
ELO difference: 8
SPRT: llr 4.64, lbound -4.6, ubound 4.6 - H1 was accepted
Finished game 8396 (Omen0002 vs Omen0003): * {No result}
Score of Omen0003 vs Omen0002: 2139 - 1940 - 4314 [0.512] 8393
ELO difference: 8
SPRT: llr 4.64, lbound -4.6, ubound 4.6 - H1 was accepted
Finished match

Well, now that I understand the "unexpected move" report and know that the output was simply missing, I am already happy.
Of course it would be less confusing if, in case of an SPRT stop, the "no result" and "unexpected move" reports were removed or replaced by something more intuitive like "match was stopped".

IMHO there is still something not in sync (concurrency > 4) when the match is stopped by SPRT. (This may be the cause of the missing final output and, in combination with the other points mentioned, of the confusion.)
Side note: seeing the game number 8393 four times in a row is suspicious too.

Finally, everything works, if you look carefully enough :).

Many thanks again.
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: Usage sprt / cutechess-cli.

Post by Robert Pope »

I decided I would give SPRT a try in order to put a bit more rigor in my process, so I am trying to make sure I understand what it is saying.

As a test, I tested a clearly stronger version (1602, +200 elo or so) against an old version (1505). So, I ran the following test in cutechess-cli 0.7.2:

-engine conf="Abbess1602" -engine conf="Abbess1505" ^
-sprt elo0=0 elo1=10 alpha=0.05 beta=0.05

The result was:
SPRT: llr 2.97, lbound -2.94, ubound 2.94 - H1 was accepted

According to the cutechess documentation, that translates to "Abbess1602 is stronger than Abbess1505 by at least 0 ELO points". Makes sense.

Now, if I test in the other order (i.e. 1505 is a new patch that performs very poorly), I get: SPRT: llr -2.96, lbound -2.94, ubound 2.94 - H0 was accepted

That would translate to "Abbess1505 is not stronger than Abbess1602 by at least 10 ELO points".

Is that the right interpretation? If we test a better patch, we will determine that it is better, but if we test a worse patch, all we can say is that it isn't a big improvement (might be worse, might be a small improvement)?
ilari
Posts: 750
Joined: Mon Mar 27, 2006 7:45 pm
Location: Finland

Re: Usage sprt / cutechess-cli.

Post by ilari »

Desperado wrote:IMHO there is still something not in sync (concurrency > 4) when stopping the match by SPRT. (This may be the cause for the final missing output and the confusion in combination with the other points mentioned)
marginalia: to see the game number 8393 four times in a row is suspicious too.
How did I miss this... I'm quite certain that the same game number being repeated multiple times happens because with concurrency > 1 at least one game is stopped right away when an SPRT hypothesis is accepted. I'll see if I can fix it.
ilari
Posts: 750
Joined: Mon Mar 27, 2006 7:45 pm
Location: Finland

Re: Usage sprt / cutechess-cli.

Post by ilari »

Robert Pope wrote:I decided I would give SPRT a try in order to put a bit more rigor in my process, so I am trying to make sure I understand what it is saying.

As a test, I tested a clearly stronger version (1602, +200 elo or so) against an old version (1505). So, I ran the following test in cutechess-cli 0.7.2:

-engine conf="Abbess1602" -engine conf="Abbess1505" ^
-sprt elo0=0 elo1=10 alpha=0.05 beta=0.05

The result was:
SPRT: llr 2.97, lbound -2.94, ubound 2.94 - H1 was accepted

According to the cutechess documentation, that translates to "Abbess1602 is stronger than Abbess1505 by at least 0 ELO points". Makes sense.

Now, if I test in the other order (i.e. 1505 is a new patch that performs very poorly), I get: SPRT: llr -2.96, lbound -2.94, ubound 2.94 - H0 was accepted

That would translate to "Abbess1505 is not stronger than Abbess1602 by at least 10 ELO points".

Is that the right interpretation? If we test a better patch, we will determine that it is better, but if we test a worse patch, all we can say is that it isn't a big improvement (might be worse, might be a small improvement)?
That is correct.
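For reference, the bounds in these reports follow from alpha and beta alone. The sketch below uses the standard Wald SPRT thresholds; the function names are hypothetical, and whether cutechess-cli compares with ">=" or ">" at the boundary is an assumption.

```cpp
#include <cassert>
#include <cmath>

enum class Decision { Continue, AcceptH0, AcceptH1 };

struct Bounds {
    double lower; // llr at or below this value: accept H0
    double upper; // llr at or above this value: accept H1
};

// Wald's SPRT thresholds for error rates alpha (false H1) and beta (false H0).
Bounds sprtBounds(double alpha, double beta)
{
    assert(alpha > 0.0 && alpha < 1.0 && beta > 0.0 && beta < 1.0);
    return { std::log(beta / (1.0 - alpha)),
             std::log((1.0 - beta) / alpha) };
}

// Compare the current log-likelihood ratio against the thresholds.
Decision decide(double llr, const Bounds& b)
{
    if (llr >= b.upper)
        return Decision::AcceptH1;
    if (llr <= b.lower)
        return Decision::AcceptH0;
    return Decision::Continue;
}
```

With alpha=0.05 and beta=0.05 this gives bounds of about -2.94 and 2.94, so llr 2.97 accepts H1 and llr -2.96 accepts H0, as in the two runs quoted above. The -4.6 / 4.6 bounds in Desperado's logs would correspond to alpha = beta = 0.01 (an inference from the numbers, since the command line is not shown in the thread), which is why llr 4.58 continues and llr 4.64 accepts H1.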