Hi Thomas,
More interesting than unbalanced openings is testing against engines with a larger Elo difference.
It has the same result / effect.
UHO is a good idea for looking at whether a draw is still possible from a bad starting position.
But better if the starting position is balanced.
To avoid too many draws, tests against engines around 100-200 Elo weaker make more sense in my opinion.
But 300 Elo is a lot for the time you work on it!
Congratulations ...
I saw at Christmas time how strongly your engine can play chess with HCE.
And everything I saw was very balanced between midgame / endgame ... the stats I made look good.
Now if you test this again with longer time controls (if you have the hardware for it), I am sure you will see ...
The difference with 10x more time is maybe 200+ and with 40x more time maybe 100+.
Should be the same for all balanced programs like your Leorik.
This is a big issue for me here.
This is why John Stanback can see much more gain for the dev version than I can for slower games.
v6.50 is so great, the strongest he ever made with very long time controls.
But testing that needs a lot of time and resources.
At the end of the day I think...
OK, what should I do when testing engines if the results at other time controls are quite different for too many of the engines?
Before neural networks I think everything was easier than today.
But +300 Elo is really a very strong start in the neural-network world. The graphic you added is very interesting, as are all your comments. I will study your complete thread next week (it is second on my to-do list). Good for the understanding of people like me.
Best
Frank
I will test your engine in Blitz later with 4/2. In my list are engines stronger than 3200 Elo. With about 300 Elo more, your engine must be stronger than 3200 Elo if, for example, Wasp 6.50 plays at 3300 Elo.
Devlog of Leorik
-
- Posts: 6829
- Joined: Wed Nov 18, 2009 7:16 pm
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Devlog of Leorik
lithander wrote: ↑Sat Jan 27, 2024 3:27 pm
And how strong is it? When using Stormphrax net, that was trained on billions of positions, the new Leorik was 300 Elo stronger than the HCE one!! Awesome considering how simple this evaluation approach is! Yes simple, I mean it. If you understand tapered PSQTs and use incremental updates already and know how to find the values with gradient descent you have understood all main concepts needed to completely understand what's powering a NNUE architecture as described above.
Dammit. I already thought from your description: that sounds like an advanced version of a tapered, incrementally updated PSQT. I can understand that. But you HAD to go and put "gradient descent" in there, didn't you? Now I don't understand because I only understand (stochastic) gradient descent for single functions, not for entire data-sets. At some point I'll have to look into that.
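On the question of gradient descent over an entire data set: the error over the whole set is just the average of the per-position errors, so its gradient is the average of the per-position gradients. Below is a minimal Python sketch of that idea in the spirit of Texel tuning. It is not code from Leorik or Rustic. The names (`predict`, `gradient_descent`), the scaling constant of 400, the learning rate and the toy data are all assumptions made up for illustration, and the labels are generated from hidden values instead of real game results.

```python
import math

SCALE = 400.0  # assumed constant mapping a centipawn score to a win probability


def predict(weights, features):
    """Expected result (0..1) for one position from a linear evaluation."""
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score / SCALE))


def mean_squared_error(weights, dataset):
    """Average squared difference between prediction and label over ALL positions."""
    return sum((predict(weights, f) - r) ** 2 for f, r in dataset) / len(dataset)


def gradient(weights, dataset):
    """Gradient of the data-set error: the average of the per-position gradients."""
    grad = [0.0] * len(weights)
    for features, result in dataset:
        p = predict(weights, features)
        # Chain rule: d(error)/d(score) for this single position.
        common = 2.0 * (p - result) * p * (1.0 - p) / SCALE
        for i, f in enumerate(features):
            grad[i] += common * f
    return [g / len(dataset) for g in grad]


def gradient_descent(weights, dataset, learning_rate=100_000.0, steps=2000):
    """Repeatedly step against the gradient; returns the tuned weights."""
    weights = list(weights)
    for _ in range(steps):
        grad = gradient(weights, dataset)
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


# Toy data: features are (pawn difference, knight difference).
# Labels come from hidden values (pawn=100, knight=300) the tuner should rediscover.
HIDDEN_VALUES = [100.0, 300.0]
positions = [(1, 0), (-1, 0), (0, 1), (0, -1), (2, -1), (-3, 1), (1, 1), (-1, -1)]
dataset = [(f, predict(HIDDEN_VALUES, f)) for f in positions]

start = [0.0, 0.0]
tuned = gradient_descent(start, dataset)
print("error before:", mean_squared_error(start, dataset))
print("error after: ", mean_squared_error(tuned, dataset))
print("tuned values:", [round(w) for w in tuned])
```

The stochastic variant would compute `gradient` on a random subset of the positions per step instead of on all of them; the update rule stays the same.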
Still, it seems like I can stick to my original plan: tune Rustic's tables (the Texel tuner is coming along now, finally), re-tune on self-play data, extend search and HCE until the engine hits 2895 but preferably 3000 (because at that point it will be able to go head-to-head with Fritz 11 and probably win a 1000 game match), add Lazy SMP to add another 120-150 Elo for 4 threads, then add a (simple) NNUE and then call it quits.
I'm sure I can get it done before 2035.
-
- Posts: 881
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
mvanthoor wrote: ↑Mon Jan 29, 2024 9:20 pm
Still, it seems like I can stick to my original plan: tune Rustic's tables (the Texel tuner is coming along now, finally), re-tune on self-play data, extend search and HCE until the engine hits 2895 but preferably 3000 (because at that point it will be able to go head-to-head with Fritz 11 and probably win a 1000 game match), add Lazy SMP to add another 120-150 Elo for 4 threads, then add a (simple) NNUE and then call it quits.
You have basically summarized my journey so far except for the "call it quits" part. Not yet!
Took me three years to get here. But that was with many weeks (sometimes months) of relative inactivity in between.
If you're goal oriented, not process oriented you can reach the 3000+ engine level much faster. Ultimately all the effort spent on the HCE could be considered a waste of time and can be skipped completely. The same goes for pretty much every "original" idea I've had so far. The same goes for writing yet another chess engine.
So that's not how I'm looking at it. Did I have fun? Yes. Okay, then it's all good.
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Devlog of Leorik
If I can hit 2850 - 3000 with HCE, then add 200-300 Elo because of NNUE and another 150 with SMP, then that'd put the engine at 3200-3450. That's good enough. The sweet spot for an analysis engine is something between 2800-3100 with a HCE, because it gives me moves much better than the ones I could make, but they are not impossible to understand. I never use something like Stockfish for analysis nowadays, because I can't understand the reasoning behind the moves it puts out.
lithander wrote:
Took me three years to get here. But that was with many weeks (sometimes months) of relative inactivity in between.
Same here. Because I now (finally) understood the last piece of the Texel tuning puzzle (without the gradient descent, still), I've picked up development again. I've also added quite a bit of material to the book. The hardest thing now is to decide: do I keep all the XBoard code? I dislike the protocol... but it's done 85% of the way, and already described in the documentation.
lithander wrote:
If you're goal oriented, not process oriented you can reach the 3000+ engine level much faster. Ultimately all the effort spent on the HCE could be considered a waste of time and can be skipped completely. The same goes for pretty much every "original" idea I've had so far. The same goes for writing yet another chess engine.
Hitting 3000 (on the old CCRL list that is) is only one goal. The 'real' goal is to write the book, and then turn Rustic into a chess library that can be used as a backend for a PicoChess (electronic chess board controller) replacement and a database/front-end app. The endgame is that I'll be able to build my own chess ecosystem, because IMHO, too many of the open-source options besides the engines are too old.
-
- Posts: 2557
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: Devlog of Leorik
Just... why? That's what UCI already allows, and even better, it also allows easily switching the engine.
Rasmus Althoff
https://www.ct800.net
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Devlog of Leorik
An electronic chess board controller such as PicoChess needs a board representation and a move generator to be able to check the legality of incoming moves. I intend to make separate crates (libraries) from this logic so I can include those in a different project than the engine. The chess engine you play against will be started by the controller too, and controlled by normal UCI.
And yes (if you have this in mind): I could write a controller that starts two engines: one to play against, and the other with a somewhat extended UCI-interface to check and report on move legality. I could also just pass any incoming move to the running engine which will then reject this move and have the controller react accordingly. The reason why I wouldn't do it like this is because every engine would need to react to illegal moves in the same way to make them work with the chess board controller. (This is the reason why PicoChess, written in Python, uses PyChess as its library for the board and move generator to reject incoming illegal moves before they reach the engine.)
-
- Posts: 2557
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: Devlog of Leorik
mvanthoor wrote: ↑Tue Jan 30, 2024 9:25 pm
An electronic chess board controller such as PicoChess needs a board representation and a move generator to be able to check legality of incoming moves. I intend to make separate crates (libraries) from this logic so I can include those in a different project than the engine.
Ah, that way. I'm not sure whether that's ideal, depending on how closely you tie your move generator to other parts of the engine. Such as static move ordering via MVV/LVA, but also history (which uses data from search), and updating mobility (used in eval). Even if you split it out somewhat cleanly, the implementation details will still be so engine specific that it won't be a library in that sense.
Rasmus Althoff
https://www.ct800.net
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Devlog of Leorik
Ras wrote: ↑Tue Jan 30, 2024 10:18 pm
Ah, that way. I'm not sure whether that's ideal, depending on how closely you tie your move generator to other parts of the engine. Such as static move ordering via MVV/LVA, but also history (which uses data from search), and updating mobility (used in eval). Even if you split it out somewhat cleanly, the implementation details will still be so engine specific that it won't be a library in that sense.
At the moment I don’t intend to have the move generator do anything but generate moves. It would also be possible to use a feature flag to get a clean move generator. I’ll see when I get there.
Back to Leorik, before we get shouted at
-
- Posts: 881
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik - New version 3.0
I have finally released Leorik 3.0!
The transition from HCE to NNUE warrants a "big" version change. This also gives me the chance to revisit the 2.X versions in the future but strength-wise NNUE is vastly superior.
The network was trained using Bullet on 622M labeled positions extracted from selfplay games. The architecture is (768->256)x2->1, which means that the network uses 768 inputs (2 colors x 6 piece-types x 64 squares) and one hidden layer of 256 neurons. It maintains two separate accumulators (from black's perspective and from white's perspective) and has two sets of output weights to allow the network to learn tempo. No originality here, but if you're a C# user you may find the manual vectorization interesting.
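To make the (768->256)x2->1 layout concrete, here is a minimal Python sketch of two incrementally updated accumulators and the two sets of output weights. It is not Leorik's code (which is vectorized C#). The square numbering (a1=0 .. h8=63), the piece numbering, the clipped-ReLU activation and the random integer weights standing in for a trained net are all assumptions chosen for illustration.

```python
import random

INPUTS, HIDDEN = 768, 256   # 2 colors x 6 piece types x 64 squares -> 256 hidden neurons
WHITE, BLACK = 0, 1
PAWN, KING = 0, 5           # assumed piece-type numbering 0..5


def feature_index(perspective, color, piece, square):
    """Input index of a piece as seen from one side (squares assumed a1=0 .. h8=63)."""
    if perspective == BLACK:
        # Black's view: swap the colors and flip the board vertically.
        color, square = 1 - color, square ^ 56
    return color * 384 + piece * 64 + square


def clipped_relu(x):
    return min(max(x, 0), 255)   # assumed activation function


class Network:
    """Random integer weights stand in for a trained, quantized network."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.feature_weights = [[rng.randint(-64, 64) for _ in range(HIDDEN)]
                                for _ in range(INPUTS)]
        self.feature_bias = [rng.randint(-64, 64) for _ in range(HIDDEN)]
        # Two sets of output weights: [0] for the side to move, [1] for the opponent.
        self.output_weights = [[rng.randint(-64, 64) for _ in range(HIDDEN)]
                               for _ in range(2)]
        self.output_bias = rng.randint(-64, 64)


class Position:
    """Keeps one accumulator per perspective and updates both incrementally."""

    def __init__(self, net):
        self.net = net
        self.accumulators = [list(net.feature_bias), list(net.feature_bias)]

    def _update(self, color, piece, square, sign):
        for perspective in (WHITE, BLACK):
            index = feature_index(perspective, color, piece, square)
            row = self.net.feature_weights[index]
            acc = self.accumulators[perspective]
            for i in range(HIDDEN):
                acc[i] += sign * row[i]

    def add_piece(self, color, piece, square):
        self._update(color, piece, square, +1)

    def remove_piece(self, color, piece, square):
        self._update(color, piece, square, -1)

    def evaluate(self, side_to_move):
        """Score from the side to move's point of view."""
        us = self.accumulators[side_to_move]
        them = self.accumulators[1 - side_to_move]
        w_us, w_them = self.net.output_weights
        total = self.net.output_bias
        for i in range(HIDDEN):
            total += w_us[i] * clipped_relu(us[i]) + w_them[i] * clipped_relu(them[i])
        return total


def from_pieces(net, pieces):
    """Build a position from (color, piece, square) triples."""
    position = Position(net)
    for color, piece, square in pieces:
        position.add_piece(color, piece, square)
    return position


net = Network()
position = from_pieces(net, [(WHITE, KING, 4), (BLACK, KING, 60), (WHITE, PAWN, 12)])
print(position.evaluate(WHITE))
```

Because a move only adds and removes a handful of inputs, the accumulators never have to be rebuilt from scratch during search, which is the same trick as an incrementally updated PSQT.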
Other notable changes since the previous released version (v2.5) are Aspiration Windows and a revamped Staged Move Generation that replaces the 2nd Killer with a Counter and FollowUp move if available. These came before the change to NNUE but only provided a cumulative ~50 Elo so I never released a separate version with them.
Together with the NNUE however we're looking at a ~350 Elo strength increase with this version! It's now playing among the "super-human" engines; something I never expected to achieve when I started this "one-weekend" project three years ago!
To give human players a chance I have added another UCI parameter: Temperature
It's based on an idea from Ronald de Man regarding how to apply randomness to root moves. Basically if you set Temperature to 100 then each root move gets assigned a random bonus in the range of [0..100], so the engine will play moves that are not complete blunders but also not necessarily the best.
I used it for training-data generation but you can also use it to weaken the engine:
- Temperature=0 results in the strongest play.
- Temperature=50 weakens the engine by ~60 Elo
- Temperature=100 weakens the engine by ~130 Elo
- Temperature=1000 weakens the engine by ~1000 Elo
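The effect of the Temperature bonus can be sketched in a few lines of Python. This is only an illustration of the idea as described above, not Leorik's implementation: the function name `pick_root_move`, the assumption that scores are in centipawns, and applying the bonus to already-known scores (rather than inside the search) are all simplifications made here.

```python
import random


def pick_root_move(scored_moves, temperature, rng=random):
    """Pick the root move with the highest score plus random bonus.

    scored_moves: list of (move, score) pairs, score assumed to be in centipawns.
    temperature:  each move gets a random integer bonus in [0..temperature].
    """
    best_move, best_total = None, None
    for move, score in scored_moves:
        total = score + rng.randint(0, temperature)
        if best_total is None or total > best_total:
            best_move, best_total = move, total
    return best_move


# Hypothetical root scores: three reasonable moves and one blunder.
root_moves = [("e2e4", 35), ("d2d4", 30), ("g1f3", 10), ("g2g4", -120)]

rng = random.Random(1)
choices = {pick_root_move(root_moves, 100, rng) for _ in range(1000)}
print(sorted(choices))
```

With Temperature=100 a move that scores more than 100 below the best one can never win the comparison, which is why complete blunders stay off the board while the near-equal alternatives all get played now and then.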
-
- Posts: 881
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
Since release I did a compatibility update (v. 3.0.1) to make Leorik play nice on older hardware and ARM CPUs.
Lars did an extensive test of 4300 30-minute games with Leorik 3 and the new version gained +309 Elo over the previous one!
[Lars] 108 Leorik 3 (1thr) M15 : 3203 8 8 4300 52.1 % 3188 33.2 %
[Lars] 216 Leorik 2.5 (1thr) M15 : 2894 17 17 1227 56.5 % 2848 27.6 %
Leorik 3 is also now listed on the CCRL Blitz list with a rating of 3277 Elo. That's +359 Elo over the previous version.
A CCRL 40/15 Gauntlet is scheduled for Wednesday, and the CEGT guys are also currently testing, with preliminary results suggesting a rating of 3124 Elo, which would be +358 Elo over v.2.5.
Leorik is also currently playing live in Division 5: https://ccrl.live/16064/
So thanks a lot everyone for giving Leorik some play!