AndrewGrant wrote: ↑Tue Jun 08, 2021 4:44 am
If you see any elo gain at all, then the Network is working as intended. Not having a working Network would cost hundreds of elo, since the NNUE would essentially be spitting out random evals.
OK great news, thanks for confirming.
I am indeed running the Ethereal-13.00-pext-avx2 executable.
Then I'd expect your test to eventually end in the 50-90 range. But YMMV, and the opponent pool plays a role in whether it hits the upper end or the lower end. Ethereal tends to lose to Stockfish and its derivatives (NNUE wise, or Fire / Houdini) more so than it loses to the rest of the AB field. That can be seen on CCRL's elo diffs on individual breakdowns quite broadly over the last few releases. So a pool with stockfishes will deflate the rating. A pool with only Komodo, Xiphos, Laser, Igel (original net version), will inflate the rating.
:shrug:
Not something to shrug about. It's simply true. And not right. A pool of engines should be balanced. At the beginning of the GRL I had put Arasan 22.2 in an engine pool that was too strong (I had to start somewhere) and it got too low a rating because of that. The new 22.3 was put in a balanced pool and made a mighty elo jump while Jon said -- This version is modestly stronger than 22.2 in my testing --
Another example: ProDeo 3.1 was initially matched against a pool about 100 elo stronger at CCRL, resulting in a lower rating than 3.0 (impossible); later it played its games against ~equal opponents and eventually came out stronger.
Regarding the use of derivatives, it's simply unfair competition. Not on my list.
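To make the pool argument above concrete, here is a minimal sketch using the standard logistic Elo formula. Every rating, the 40-Elo matchup penalty, and the helper names are hypothetical and chosen only for illustration; the performance rating uses the simple "average opponent plus implied difference" approximation, not the method of any particular rating list.

```python
import math

def expected_score(elo_diff):
    """Expected score (0..1) for a player rated elo_diff above its opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_diff_from_score(score):
    """Inverse of expected_score; score must lie strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def performance_rating(results):
    """results: list of (opponent_rating, score) pairs, one per opponent.

    Approximation: average opponent rating plus the Elo difference
    implied by the average score.
    """
    avg_opp = sum(r for r, _ in results) / len(results)
    avg_score = sum(s for _, s in results) / len(results)
    return avg_opp + elo_diff_from_score(avg_score)

# Hypothetical engine: "true" strength 3300, but assumed to score as if it
# were 40 Elo weaker against one family of opponents.
TRUE_RATING = 3300.0
PENALTY = 40.0

def score_against(opp_rating, hard_matchup):
    diff = TRUE_RATING - opp_rating - (PENALTY if hard_matchup else 0.0)
    return expected_score(diff)

# Pool A: only stronger opponents from the "hard" family.
pool_a = [(3350.0, score_against(3350.0, True)),
          (3400.0, score_against(3400.0, True))]
# Pool B: only weaker opponents from outside that family.
pool_b = [(3250.0, score_against(3250.0, False)),
          (3200.0, score_against(3200.0, False))]

print(round(performance_rating(pool_a)), round(performance_rating(pool_b)))
```

Under these assumptions the same engine ends up with a lower performance rating in pool A than in pool B, which is the deflate/inflate effect both posters describe.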
Shall see. Not too concerned either way, as selfplay testing has proved itself reliable for years now.
For sanity's sake, here is the regression test with the 8movesv3 book and LTC (60s here, but effectively 100s due to OpenBench worker speeds, scaled to Fishtest). http://chess.grantnet.us/test/11256/ [ELO | 76.31 +- 6.21 (95%)] This method has been the predictor for Ethereal at CCRL for the last few years, and is generally in the ballpark, whereas most other lists see variations based on their book preferences.
90% of coding is debugging, the other 10% is writing bugs.
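A figure like "76.31 +- 6.21 (95%)" is an Elo difference with a 95% error margin. The sketch below shows one common way to get such a number from win/draw/loss counts, using a normal approximation on the per-game score. It is not claimed to be the exact computation OpenBench performs, and the W/D/L counts in the example call are made up.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an average score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, margin) where margin is half the width of the
    confidence interval (z=1.96 corresponds to roughly 95%)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Variance of a single game result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return elo_from_score(score), (high - low) / 2.0

# Hypothetical counts, for illustration only.
elo, margin = elo_with_margin(wins=1200, draws=2500, losses=600)
print(f"{elo:.2f} +- {margin:.2f}")
```

Because the standard error shrinks with the square root of the number of games, quadrupling the games roughly halves the margin, which is why the error bars tighten as a test runs longer.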
Modern Times wrote: ↑Tue Jun 08, 2021 12:54 pm
Will be interesting to see where it sits under all the various test conditions. This is what I have at FRC:
Lower than I expected, but I've double-checked everything.
Oh! Have we been talking about FRC this whole time? If so, those results are good and what I was looking for. SF > K > E, that's the idea, and that has Ethereal at #3, finally edging out Houdini. FRC gains are smaller than Standard gains, due to the nature of Networks essentially learning opening theory. I managed to push more elo than usual, by going with the FRC only net, but rough figures were something like (At 10s+.01s, unbalanced book): +40 Standard NNUE in FRC, +70 Fischer NNUE in FRC, +120 Standard NNUE in Standard, +50 Fischer NNUE in Standard.
And just to confirm off what Damir said: This is the e13.fischer.nnue Network? Looks like it, based off of the results.
AndrewGrant wrote: ↑Tue Jun 08, 2021 2:16 pm
And just to confirm off what Damir said: This is the e13.fischer.nnue Network? Looks like it, based off of the results.
40/15 1CPU - these results have it at about +38 Elo to 12.75 and the #4 engine. It jumps several places on the list as it is quite tight below the Top 3. I need to play further opponents below Pedone 3.1. The rating could change slightly and of course the error bars will reduce with more games.
Modern Times wrote: ↑Wed Jun 09, 2021 6:22 am
40/15 1CPU - these results have it at about +38 Elo to 12.75 and the #4 engine. It jumps several places on the list as it is quite tight below the Top 3. I need to play further opponents below Pedone 3.1. The rating could change slightly and of course the error bars will reduce with more games.
(and yes this is using the standard net)
There must be something wrong with the NNUE... It is 232 Elo under Stockfish and only 36 Elo points better than 12.75 without the net...
And on CCRL stronger engines like Cfish, Corchess, Shashchess, SugaR, Honey, Bluefish, Oki Maguro, Harmon, Black Diamond are missing while Houdini is reported as 5th... Maybe it would be better with a custom NNUE.
They aren't listed on CCRL because they're not unique engines. Rather, they're just a bunch of Stockfish derivatives which differ only slightly and will almost all play effectively identically to Stockfish, with nearly identical evaluations.
AlexChess wrote: ↑Wed Jun 09, 2021 8:01 am
There must be something wrong with the NNUE... It is 232 Elo under Stockfish and only 36 Elo points better than 12.75 without the net...
I don't think so, Andrew said if the net wasn't being used then it would perform several hundred Elo worse. And he is only expecting +50 to +60 Elo anyway. So it is only slightly worse than he is expecting.
We will see what other test results come out at, like CEGT, Ed's tests and the CCRL standard chess blitz results.
Modern Times wrote: ↑Wed Jun 09, 2021 9:31 am
I don't think so, Andrew said if the net wasn't being used then it would perform several hundred Elo worse.
That is correct. If I start the commercial Ethereal 13 avx2 without net access in console mode and run "go infinite", it wants to play 1.b4 in the starting position and the whole pv-line is complete nonsense. So, if the net is not found, the Elo performance of Ethereal 13 avx2 would be a complete mess.
The only problem I can imagine is using the wrong (FRC) NNUE net for classical chess instead of the standard net.
Before my test machine crashed (it is now in for repair at XMG), more than 1000 games of the 7000-game test run of Ethereal 13 avx2 were played and I got a performance around +75 Elo compared to Ethereal 12.75.
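The console check described above (start the engine, search the starting position, look at the pv) can be scripted over the UCI protocol. This is only a sketch: the engine path is whatever you pass in, the sample info line is invented rather than real Ethereal output, and it uses "go depth" instead of "go infinite" so the search stops by itself.

```python
import subprocess

def parse_pv(info_line):
    """Return the list of PV moves from a UCI 'info' line, or [] if none."""
    tokens = info_line.split()
    if (len(tokens) < 2 or tokens[0] != "info"
            or tokens[1] == "string" or "pv" not in tokens):
        return []
    # "multipv" is a separate token, so index("pv") finds the real PV marker.
    return tokens[tokens.index("pv") + 1:]

def search_startpos(engine_path, depth=12):
    """Run a fixed-depth search from the start position and return the
    last principal variation the engine reported."""
    proc = subprocess.Popen([engine_path], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    proc.stdin.write(f"uci\nisready\nposition startpos\ngo depth {depth}\n")
    proc.stdin.flush()
    last_pv = []
    for line in proc.stdout:
        line = line.strip()
        if line.startswith("info"):
            pv = parse_pv(line)
            if pv:
                last_pv = pv
        elif line.startswith("bestmove"):
            break
    proc.stdin.write("quit\n")
    proc.stdin.flush()
    proc.wait()
    return last_pv

# Invented sample line, only to show the parsing step.
sample = "info depth 12 seldepth 18 multipv 1 score cp 31 nodes 100 time 5 pv e2e4 e7e5 g1f3"
print(parse_pv(sample))
```

A first move such as b2b4 together with an incoherent line would be a quick hint that the engine is not finding its net, as described in the post.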