Ferdy wrote:
Linear regression.

A. Processor
Brand       : Intel(R) Celeron(R) CPU B800 @ 1.50GHz
Arch        : X86_64
Count       : 2

B. Engine settings
Threads     : 1
Hash (mb)   : 128
Time(s)/pos : 30.0

C. Test set
Filename    : tony-dcc-caleb.epd
NumPos      : 16

D. Results
Engine                   : Rating  Best  Score  SRate  Elap(s)
Stockfish 8 64           :   3334    10     86   0.82      451
Fire 5 x64               :   3132     8     82   0.78      451
Komodo 9.02 64-bit       :   3200     8     75   0.71      450
Bobcat v8.0              :   2816     8     70   0.67      428
Texel 1.06               :   2947     7     69   0.66      451
Hannibal 1.7 x64         :   2981     8     67   0.64      451
Cheng 4.39               :   2785     6     67   0.64      451
Deuterium v2017.1.35.431 :   2760     6     63   0.60      451
Arasan 20.2              :   2880     5     62   0.59      450
Rhetoric 1.4.3 x64       :   2631     6     61   0.58      429
Ethereal 8.19            :   2506     7     59   0.56      451
spark-1.0                :   2778     5     58   0.55      450
Gaviota v1.0             :   2716     4     55   0.52      450
Alaric 707               :   2479     3     54   0.51      453
Arminius 2014-01-18      :   2346     4     53   0.50      450
Cheese 1.9 64 bits       :   2558     4     52   0.50      450
Maverick 1.5 x64         :   2380     3     43   0.41      451

Estimated Rating = (2443 x ScoreRate) + 1306
ScoreRate = totalScore/maxScore

pedrox wrote:
For maxScore it seems that you have used 104, however adding on epd file I think I get 114.

Your observation is correct: there is a bug in my script, and the last position (16th) was not included.
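For readers who want to try the regression quoted above, here is a minimal Python sketch. The slope 2443 and intercept 1306 are taken from the post; the function names and the small least-squares helper are illustrative assumptions, not Ferdy's (unreleased) script. Since the right value of maxScore is exactly what is being discussed in this thread, it is left as a parameter.

```python
def score_rate(total_score, max_score):
    """ScoreRate = totalScore / maxScore, as defined in the post."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return total_score / max_score


def estimated_rating(total_score, max_score, slope=2443.0, intercept=1306.0):
    """Apply the posted line: rating = slope * ScoreRate + intercept."""
    return slope * score_rate(total_score, max_score) + intercept


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Hypothetical helper: it shows how a slope and intercept like the ones
    above could be derived from (ScoreRate, Rating) pairs.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Feeding the SRate and Rating columns of a results table into fit_line would give a refitted line once the maxScore bug is corrected; the numbers will of course depend on the engines and time control used.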
Tony's positional test suite
-
- Posts: 4840
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Sample regression
-
- Posts: 349
- Joined: Sat Aug 06, 2016 8:31 pm
- Location: United States
Re: Sample regression
Dann Corbit wrote:
Typically, there is a very poor regression between engine strength and EPD test suites.
I remember back in the day, when Shredder topped the Elo charts, it scored 285/300 on WAC which was very average.

It's encouraging to read this as RookieMonster had gotten up to nearly 60% on the STS, but recently dropped to 58% while simultaneously performing better than before in gauntlets against other engines. I kept those changes, but it was still disappointing to not see both measures improve.
-
- Posts: 11
- Joined: Sat Jul 22, 2017 2:50 am
- Location: New Zealand
Re: Sample regression
Excellent work, many thanks for producing this.
-
- Posts: 7257
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: Tony's positional test suite
Ferdy wrote:
This is now fully converted. Duplicates are also removed. Illegal moves are discarded and not replaced, if there is only one move and it is illegal, the epd is removed.

r3r1k1/1p3nqp/2pp4/p4p2/Pn3P1Q/2N4P/1PPR2P1/3R1BK1 w - - bm Ne2; c0 "positional scores are: Ne2=10, g4=6, Bd3=5, Rxd6=2, Re1=2, Qh5=1, Kh2=1, Be2=1"; id "rebel.pos.01";
4rrk1/pp1b2pp/5n2/3p1N2/8/2QB1qP1/PP3P1P/4RRK1 w - - bm Rxe8; c0 "positional scores are: Rxe8=10, Ne7+=7, Re3=6, Nd4=4"; id "rebel.pos.02";
r6r/p6p/1pnpkn2/q1p2p1p/2P5/2P1P3/P4PP1/1RBQKB1R w K - bm Rb3; c0 "positional scores are: Rb3=10, Qc2=7, Rxh5=7, Be2=7, Bd3=2, g4=2, e4=2, Rb5=1"; id "rebel.pos.03";

Download rebel.epd
https://drive.google.com/file/d/0BwAOsu ... sp=sharing

Sample run at 1s/pos

A. Processor
Brand       : Intel(R) Celeron(R) CPU B800 @ 1.50GHz
Arch        : X86_64
Count       : 2

B. Engine settings
Threads     : 1
Hash (mb)   : 128
Time(s)/pos : 1.0

C. Test set
Filename    : rebel.epd
NumPos      : 657

D. Results
Engine                   : Rating  Best  Score  SRate  Elap(s)
Stockfish 8 64           :   3334   345   3193   0.64      674
Deuterium v2017.1.35.431 :   2760   278   2650   0.53      673

Thanks for doing this. BTW, which interface (util) is used to run those EPD sets?
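Since the script used for these runs has not been released, the following is only a rough Python sketch of how the c0 field in the EPD records quoted above could be turned into a per-move score table. The function names are hypothetical, and two things are assumptions rather than anything stated in the thread: that the engine's move is compared as the same SAN string used in the EPD, and that a move not listed in c0 scores 0.

```python
import re

# Matches the c0 comment used in rebel.epd, e.g.
#   c0 "positional scores are: Ne2=10, g4=6, Bd3=5";
C0_PATTERN = re.compile(r'c0\s+"positional scores are:\s*([^"]*)"')


def parse_epd_scores(epd_line):
    """Return (position, scores) for one EPD record.

    position is the first four EPD fields (board, side, castling, en passant);
    scores maps each SAN move listed in c0 to its integer point value.
    """
    match = C0_PATTERN.search(epd_line)
    if match is None:
        raise ValueError("no positional scores found in c0")
    scores = {}
    for item in match.group(1).split(","):
        # rpartition keeps a promotion such as "e8=Q" intact in "e8=Q=10".
        move, _, points = item.strip().rpartition("=")
        scores[move] = int(points)
    position = " ".join(epd_line.split()[:4])
    return position, scores


def score_move(scores, engine_move):
    """Points for the engine's move; unlisted moves score 0 (an assumption)."""
    return scores.get(engine_move, 0)
```

Summing score_move over all positions would give a totalScore, and summing the highest value in each table would give a maxScore for the file, which is one way to cross-check the totals discussed earlier in the thread.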
-
- Posts: 4840
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Tony's positional test suite
Rebel wrote:
Thanks for doing this. BTW, which interface (util) is used to run those EPD sets?

I am using a script. I hope to release it after improving some output and command line arguments.
-
- Posts: 11
- Joined: Sat Jul 22, 2017 2:50 am
- Location: New Zealand
Re: Tony's positional test suite
Something pointed out in Robin Smith’s book “Modern Chess Analysis” (Gambit books, 2004) is that ‘ruler flat’ evaluations (or an evaluation that tends to ‘settle’ at approximately one value) indicate fortress draws.
This evaluation behavior is examined further, with later engines, in the paper “Detecting Fortresses in Chess” (Guid & Bratko, 2012).
For example, if an evaluation eventually stabilizes at approximately, say, +2.24 and stays there for some time, this behavior strongly indicates a fortress draw, despite the high evaluation for White.
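As a rough illustration of the ‘ruler flat’ idea, here is a small Python sketch that flags a series of evaluations (in pawns, for example one per search depth or one per move played) that has settled on a clearly non-zero value. The function name and all three thresholds are arbitrary assumptions chosen for the example; this is a heuristic hint in the spirit of the description above, not the method from the book or the paper.

```python
def looks_ruler_flat(evals, window=5, tolerance=0.10, min_abs_eval=1.0):
    """Return True if the last `window` evaluations are nearly constant
    and far enough from 0.00 to be suspicious.

    evals        : evaluations in pawns, oldest first
    window       : how many recent values must agree (assumed value)
    tolerance    : maximum allowed spread within the window (assumed value)
    min_abs_eval : ignore plateaus near equality (assumed value)
    """
    if len(evals) < window:
        return False
    recent = evals[-window:]
    spread = max(recent) - min(recent)
    mean = sum(recent) / window
    return spread <= tolerance and abs(mean) >= min_abs_eval
```

With the +2.24 example from the post, a series such as 1.2, 1.8, 2.1, 2.24, 2.25, 2.23, 2.24, 2.24 is flagged, while a steadily climbing evaluation is not. A flag like this only suggests that the position deserves a closer look; it does not prove a draw.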
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: Tony's positional test suite
first25plus5 wrote:
Something pointed out in Robin Smith’s book “Modern Chess Analysis” (Gambit books, 2004) are ‘ruler flat’ evaluations which indicate fortress draws. (or the evaluation tendency to ‘settle’ approximately so).
This evaluation behavior is further examined in a paper with later engines “Detecting Fortresses in Chess” (Guid & Bratko, 2012).
Example is if an evaluation eventually stabilizes at approximately say +2.24 and maintains this for some time then this behavior strongly indicates a fortress draw, despite a high evaluation for White.

From the article:

6 CONCLUSIONS
We introduce a novel idea for detecting fortresses in the game of chess. We demonstrate that a heuristic-search-based program is able to detect fortresses on the basis of backed-up values obtained at different levels of search. If a particular position is a fortress, the program is not able to show any progress towards a win and thus the backed-up values cease to change significantly from a certain search depth on.

Calling this idea "novel" in 2012 seems dubious, at best. Probably should not comment further...
-
- Posts: 407
- Joined: Sat May 05, 2012 2:48 pm
- Full name: Oliver Roese
Re: Tony's positional test suite
Thank you for that.
I looked over the test suite with analysis and diagrams on the web (http://privat.bahnhof.se/wb432434/pos.htm). These are all open positions, except for #14. That means that in the remaining 15 positions Stockfish should be irrefutable by humans. I checked that conjecture, and indeed in 8(!) out of 15 cases the commentators got it wrong or backwards. How many points would you give for that?
I personally enjoyed this rebuttal the most:
[d]1rN1r1k1/1pq2pp1/2p1nn1p/p2p1B2/3P4/4P2P/PPQ1NPP1/2R2RK1 b - - 0 1
1...Rbxc8 2.Nf4 (allegedly the refutation) Nxf4! 3.Bxc8 Nxg2!
In #14 the alleged best move 1.Nb1, played by Kasparov, is neutralized outright by 1...b5, and Black is fine.
[d]r3r1k1/ppqbbpp1/2pp1nnp/3Pp3/2P1P3/5N1P/PPBN1PP1/R1BQR1K1 w - - 0 1
In #16, after 34.Qxc5 (Stockfish), resigning is an option.
[d]2r2k2/5p2/2Bp1b1r/2qPp1pp/PpN1P3/1P2Q3/5PPP/4R1K1 w - - 0 1
Interestingly, with the help of Stockfish you might save even this position against a strong human master: after 34.Rc1(?) Qxe3 35.Nxe3(?!) Bd8 36.Rc4(?!) Ba5 37.Nc2(?!) g4 38.Nxb4(??) there follows 38...Rb8 39.Bb5 Bxb4 40.Rxb4 f5! and White is only minimally better (Stockfish).
Never trust your test suite.
-
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: Tony's positional test suite
zullil wrote:
Calling this idea "novel" in 2012 seems dubious, at best. Probably should not comment further...

Yes... it's one of those things that make me wonder how it got past the referee. As it is, the paper makes some obvious points and then offers no real idea for how to handle fortress detection.
Saying that the engines "detect" the fortress by having a flat eval seems rather generous; I'd call not returning a draw score a sign of not detecting the fortress.
Still, the paper has a list of interesting fortress positions that I might use if/when I go back to tinkering with fortress detection.