niel5946 wrote: ↑Fri Apr 15, 2022 11:27 am
I appreciate your comments
tcusr wrote: ↑Wed Apr 13, 2022 4:26 pm
i've been browsing your code quickly and the only advice i have for you is to try not to use a C++ feature just because you can; for example, here a simple ternary operator is enough. also the extensive use of smart pointers is a bit weird (at least for me, if i'm not missing anything).

that's the perfect use of smart pointers; i was mainly referring to why use heap memory in the first place. but that's a matter of style i suppose and shouldn't affect performance at all.
but nonetheless the code looks much better, good luck.
I agree that the lambda is a little overkill when compared to the ternary operator. However, I don't think that an extensive use of smart pointers is bad or weird. This is primarily for the reason that it's more modern and less bug-prone. With normal heap-allocated memory, you always have to keep track of which things you have allocated and which you have freed. I have previously experienced a lot of bugs from either calling free() prematurely or never calling it at all, causing a memory leak.
Additionally, smart pointers have nearly no overhead. std::shared_ptr does have some, of course, but I intend to pass it by reference, which avoids incrementing the reference count.
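The pass-by-reference point can be demonstrated with a small sketch (the names here are illustrative, not from Loki's code): copying a std::shared_ptr bumps its atomic reference count, while passing it by const reference does not.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct position_t { int ply = 0; };   // hypothetical stand-in type

// By value: the parameter is a copy of the shared_ptr, so the atomic
// reference count is incremented for the duration of the call.
long use_count_by_value(std::shared_ptr<position_t> p) { return p.use_count(); }

// By const reference: no copy is made, so the refcount stays untouched.
long use_count_by_ref(const std::shared_ptr<position_t>& p) { return p.use_count(); }

// Observe both counts from a single owner.
inline std::pair<long, long> demo() {
    auto p = std::make_shared<position_t>();
    return { use_count_by_ref(p), use_count_by_value(p) };
}
```

This is why passing a shared_ptr by value inside a hot path (such as a search function) is worth avoiding, while by-reference access is essentially free.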
Progress on Loki
Moderator: Ras
-
- Posts: 325
- Joined: Tue Aug 31, 2021 10:32 pm
- Full name: tcusr
Re: Progress on Loki
-
- Posts: 325
- Joined: Tue Aug 31, 2021 10:32 pm
- Full name: tcusr
Re: Progress on Loki
dangi12012 wrote: ↑Fri Apr 15, 2022 12:12 pm
https://github.com/BimmerBass/Loki/blob ... ndex.h#L35
niel5946 wrote: ↑Fri Apr 15, 2022 11:27 am
I agree that the lambda is a little overkill when compared to the ternary operator. However, I don't think that an extensive use of smart pointers is bad or weird. This is primarily for the reason that it's more modern and less bug-prone. With normal heap-allocated memory, you always have to keep track of which things you have allocated and which you have freed. I have previously experienced a lot of bugs from either calling free() prematurely or never calling it at all, causing a memory leak.

I think the questioner misunderstood what this line of code does. It is not a lambda - it is a compile-time (constexpr) constant initialized by a lambda.
This could be done with a constexpr function, but this way is cleaner since the compile-time initializer stays anonymous.
The runtime cost is ZERO, and it's the same as writing two classes - one with const 4096 and one with const 512.

no no, that's exactly what i meant. one should prefer a simple solution to a more complex one: the principle of least power.
but don't get me wrong, i use immediately invoked lambdas in my code too to initialize constexpr arrays, but that's different from just assigning a number
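For readers who haven't seen the pattern under discussion, here is a minimal sketch of a compile-time constant initialized by an immediately invoked lambda (the 4096/512 sizes mirror the rook/bishop table sizes mentioned above; the class itself is made up for illustration):

```cpp
#include <cassert>

// Select a table size at compile time. The lambda runs during constant
// evaluation and leaves no named helper function behind.
template <bool IsRook>
struct magic_table {
    static constexpr int size = []() constexpr {
        return IsRook ? 4096 : 512;
    }();
};

// Both sizes are true compile-time constants.
static_assert(magic_table<true>::size == 4096);
static_assert(magic_table<false>::size == 512);
```

The lambda exists only inside the initializer, so no named constexpr helper leaks into the surrounding scope, which is the "anonymous compile-time initializer" point made above.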
-
- Posts: 174
- Joined: Thu Nov 26, 2020 10:06 am
- Full name: Niels Abildskov
Re: Progress on Loki
Update on the refactoring
I thought ahead and made all the values of the different evaluation terms hot-swappable - or that is at least my plan. All my evaluation parameters are stored in an abstract evaluation_params object whose deriving classes are required to implement an initialization function. Right now, I have only implemented a hard-coded class, but my hope is to some day be able to load the evaluation parameters from something like a JSON or XML file. This will allow me to specify which parameters I want tuned (by an external tuner - I'll have to come up with some sort of protocol for that...) during runtime instead of having to either tune ALL parameters, or re-compile a binary with the ones I want.
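A minimal sketch of the parameter scheme described above (class names, member names, and values are illustrative, not Loki's actual code):

```cpp
#include <cassert>
#include <map>
#include <string>

// Abstract parameter set: deriving classes must implement load(), which
// could be hard-coded, or read from a JSON/XML file, as described above.
class evaluation_params {
public:
    virtual ~evaluation_params() = default;
    virtual void load() = 0;
    int get(const std::string& name) const { return values.at(name); }
protected:
    std::map<std::string, int> values;
};

// The only implementation so far: hard-coded defaults.
class hardcoded_params : public evaluation_params {
public:
    void load() override {
        values["pawn_value"] = 512;        // 1 pawn = 512 internal units
        values["knight_value"] = 3 * 512;
    }
};
```

Since load() runs once before any search starts, the virtual-call and lookup overhead never appears in the hot path; the evaluation can copy the values it needs into plain integers up front.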
Which, if I understand correctly, means that the current Loki has an estimated rating of 1337 +/- 10 elo. Is this a correct assumption? One thing that stands out to me is that Loki had a majority of its wins playing with black, even though each engine played all openings from both sides. This makes me think there is a bug somewhere that makes Loki weaker when playing as white.
Hello everyone

I know it has been some time since I announced my big refactoring of Loki, but life got in the way, so I - sadly - didn't get to work as much on it as I had initially hoped.
With that being said, I have just as of today finished implementing the basic search function and evaluation, as well as determined an approximate rating. The current feature list is rather short but I have implemented the following:
- Search
- Iterative deepening (a bit too early IMO considering I don't have a transposition table yet)
- Fail-hard principal variation, alpha-beta search.
- Quiescence search
- Evaluation
- Material. I just used the classic Pawn = 1, Knight = Bishop = 3, Rook = 5 and Queen = 9 values, converted to units of 1/512th of a pawn.
- Piece-square tables. The values of these tables are taken as averages of the old Loki tables. This means that a square's value = (oldLoki.mg + oldLoki.eg) / 2, converted to the new units, since I haven't implemented real evaluation tapering yet.
There were two bugs from implementing the move-generator, I discovered while doing my initial play-testing:
- make_move() never reset the fifty-move counter, even though a capture or pawn-move had been performed, which led to erroneous draw-scores.
- move_generator was held by position as a std::shared_ptr<move_generator>, but it also held a std::shared_ptr<position> itself, which resulted in a position object never being freed after it had been allocated. This was solved by having a raw position pointer in the move_generator instead of a smart pointer.
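The second bug is the classic std::shared_ptr ownership cycle, which a few generic lines can reproduce (a sketch, not Loki's actual code): as long as both directions of the link are owning pointers, neither refcount can reach zero, so neither destructor runs. Replacing one direction with a raw pointer (or std::weak_ptr) breaks the cycle.

```cpp
#include <cassert>
#include <memory>

struct move_generator;

struct position {
    std::shared_ptr<move_generator> movegen;   // owning, one direction only
    ~position();
};

struct move_generator {
    // Non-owning back-pointer breaks the cycle; std::weak_ptr would too.
    position* pos = nullptr;
};

inline int destroyed = 0;
position::~position() { ++destroyed; }

inline void run_demo() {
    auto p = std::make_shared<position>();
    p->movegen = std::make_shared<move_generator>();
    p->movegen->pos = p.get();   // raw pointer: no ownership taken
}   // p's refcount hits zero here, so ~position actually runs
```

If the back-pointer were a std::shared_ptr<position> instead, `destroyed` would stay 0 and the position would leak, exactly as described above.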

It should be noted that no initialization/parameter-reading is done during a search. It is always done beforehand, so the overhead of having the parameters as objects instead of inlined constexpr variables is minimal (at least I think so..?)
I tested the new Loki against Sargon (1280 elo on CCRL), and got the following results:
Code: Select all
Score of Loki 4.0 vs Sargon Engine: 78 - 34 - 161 [0.581] 273
... Loki 4.0 playing White: 23 - 22 - 91 [0.504] 136
... Loki 4.0 playing Black: 55 - 12 - 70 [0.657] 137
... White vs Black: 35 - 77 - 161 [0.423] 273
Elo difference: 56.5 +/- 26.2, LOS: 100.0 %, DrawRatio: 59.0 %
SPRT: llr 3 (101.8%), lbound -2.94, ubound 2.94 - H1 was accepted
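As a sanity check on the output above: under the logistic model used by cutechess-style tools, the Elo difference implied by a score fraction s is -400 * log10(1/s - 1). A quick check against the numbers in this run (the helper function is just for illustration):

```cpp
#include <cassert>
#include <cmath>

// Elo difference implied by a score fraction s in (0, 1).
inline double elo_from_score(double s) {
    return -400.0 * std::log10(1.0 / s - 1.0);
}

// Loki's score above: (78 wins + 161/2 draws) / 273 games = ~0.581,
// which maps to roughly +56.5 Elo, matching the printed result.
```

The ±26.2 margin printed alongside it depends on the game count and draw ratio, which the formula above does not capture.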
Lastly, I don't really know what I should expect given the current features. Is a rating of 1330-1340 acceptable or is Loki supposed to be a lot stronger already? I will add that the speed of the refactored codebase is around 30% faster than Loki 3.5.0 (as measured in perft and tentatively in the search function)
-
- Posts: 174
- Joined: Thu Nov 26, 2020 10:06 am
- Full name: Niels Abildskov
Re: Progress on Loki
SMP and UCI-options done!
Of course, I can't say anything about the playing strength increase from multithreading, since it is too early for testing considering I do not have a hash table yet. But I like the way it is implemented, and that I was able to avoid using global variables or simple lists of std::thread's (having a look at Stockfish's great implementation of course helped, but I would say mine is original considering all the surrounding code).
Hello again everyone!
I have just finished implementing the new Lazy SMP function in Loki, and I am quite pleased with the result to say the least


Besides the multithreading, I have implemented an options_manager class that registers options and their callbacks to make new option implementations a walk in the park to introduce. Now, some of you may have checked out the source code of the refactor and wondered why I have chosen to make the engine more complex and object-oriented, and my reasons are two-fold:
- I hate global variables and - to a moderate extent - free functions with a passion! I think they clutter the codebase, and make the application state, while running, unpredictable and unmanageable. Therefore I have been insistent on removing all of them, wrapping everything in classes, and have one single "admin" class that is, in essence, an entirely independent instance of Loki. In addition, I want to be able to write unit-tests, whose worst enemy is a global application state.
- This is more of a personal reason, but after having looked at dozens of engines' source code, I wanted to design Loki my own way. Many other engines share the same project structure, control flow, and overall have a very similar architecture. I don't necessarily think that's bad, but I do believe it is a very C-like way of coding, which I personally do not enjoy, so Loki will be smothered in classes and smart pointers.
My plan for adding features to Loki looks something like the following:
- Simple move ordering: The refactored version of Loki currently has no move ordering at all, so I think this is the most natural starting point for some quick strength gains.
- MVV-LVA: I would like to experiment with putting losing captures last in the move list, but IIRC Loki lost elo last time I did that, so I might wait until my move ordering is more precise.
- Killer moves
- History heuristic
- PV-move and hash-move - the latter when a transposition table is implemented.
- Transposition Table: This one is obvious
Even though a hash table is hard to get right, I want to implement it early on since iterative deepening and - especially - Lazy SMP is so reliant on it. As part of this I would like to experiment with different amounts of buckets and replacement strategies.
- Search. I think I'm going to be rather conservative with the features I add in the search function since I 1) don't want it to become unnecessarily bloated, and 2) have burnt myself earlier by adding a lot of exotic pruning/reduction methods too hastily without proper testing.
- Null move pruning
- Check extensions
- Late move reductions. This one is not certain, since I might wait until my evaluation is better.
- Staged move generation. I'm not sure about this, since the last time I tried, it was a mess. But it might be worth a shot now
- Evaluation. Hopefully the previous points will have collected enough positions from playing that I can implement some sort of self-learning capability for Loki (otherwise I'll just use the 3000+ engines' games for the time being).
- Tapered evaluation.
- (perhaps) Terms like pawn-structure, mobility and king safety, but I haven't fully decided yet.
- Evaluation tuning. This will also be when I write code for loading the evaluation from a file. I think my tuner will be based on Andrew Grant's paper.
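Since MVV-LVA appears a few times in the plan above, a minimal sketch of the idea (the scoring scheme is generic, not Loki's): captures are ordered by most valuable victim first, with the least valuable attacker breaking ties.

```cpp
#include <cassert>

enum piece { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING };

// Victim value dominates the score; a cheaper attacker ranks higher
// among captures of the same victim.
inline int mvv_lva(piece victim, piece attacker) {
    static const int value[] = { 1, 3, 3, 5, 9, 0 };   // classic piece values
    return value[victim] * 100 - value[attacker];
}
```

With this scheme PxQ sorts ahead of QxQ, and any queen capture ahead of any pawn capture, which is usually a big cheap win for alpha-beta cutoffs.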
All in all, I would say the progress is promising

-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Progress on Loki
Welcome back. Same here: I've been away for over a year myself. Darn life.
niel5946 wrote:
I tested the new Loki against Sargon (1280 elo on CCRL), and got the following results:
Code: Select all
Score of Loki 4.0 vs Sargon Engine: 78 - 34 - 161 [0.581] 273
... Loki 4.0 playing White: 23 - 22 - 91 [0.504] 136
... Loki 4.0 playing Black: 55 - 12 - 70 [0.657] 137
... White vs Black: 35 - 77 - 161 [0.423] 273
Elo difference: 56.5 +/- 26.2, LOS: 100.0 %, DrawRatio: 59.0 %
SPRT: llr 3 (101.8%), lbound -2.94, ubound 2.94 - H1 was accepted
Which, if I understand correctly, means that the current Loki has an estimated rating of 1337 +/- 10 elo.

The result means that Loki is 56 Elo stronger than Sargon, with a margin of 26 Elo. So the difference is between +30 and +82 Elo for Loki.
niel5946 wrote:
Is this a correct assumption? One thing that stands out to me is that Loki had a majority of its wins playing with black, even though each engine played all openings from both sides. This makes me think there is a bug somewhere that makes Loki weaker when playing as white.

You should certainly investigate it if the difference between white and black scores is massive. White will statistically score a bit better because of the first move advantage. When you play your engine against itself, that would probably mean white scoring something like 54-56%.
niel5946 wrote:
Lastly, I don't really know what I should expect given the current features. Is a rating of 1330-1340 acceptable or is Loki supposed to be a lot stronger already? I will add that the speed of the refactored codebase is around 30% faster than Loki 3.5.0 (as measured in perft and tentatively in the search function)

You have only a move generator, alpha-beta, no move ordering and PVS, right?
Rustic Alpha 1 has a move generator, alpha-beta, MVV-LVA move ordering, and no PVS. Therefore I have made a version where I disabled all move ordering, and I'm running a 600 game test against Sargon 1.00 and 1.01 (I suspect you used 1.00 because that one has a rating on CCRL, but I wanted to see the playing strength difference between the two versions as well).
Compiling without move ordering:
Code: Select all
(/home/marcel/Code/rustic)
warning: associated function `add_score` is never used
--> src/movegen/defs.rs:137:12
|
137 | pub fn add_score(&mut self, value: u8) {
| ^^^^^^^^^
|
= note: `#[warn(dead_code)]` on by default
warning: associated function `get_mut_move` is never used
--> src/movegen/movelist.rs:79:12
|
79 | pub fn get_mut_move(&mut self, index: u8) -> &mut Move {
| ^^^^^^^^^^^^
warning: associated function `swap` is never used
--> src/movegen/movelist.rs:83:12
|
83 | pub fn swap(&mut self, a: usize, b: usize) {
| ^^^^
warning: constant `MVV_LVA` is never used
--> src/search/sorting.rs:30:11
|
30 | pub const MVV_LVA: [[u8; NrOf::PIECE_TYPES + 1]; NrOf::PIECE_TYPES + 1] = [
| ^^^^^^^
warning: associated function `score_moves` is never used
--> src/search/sorting.rs:41:12
|
41 | pub fn score_moves(ml: &mut MoveList) {
| ^^^^^^^^^^^
warning: associated function `pick_move` is never used
--> src/search/sorting.rs:49:12
|
49 | pub fn pick_move(ml: &mut MoveList, index: u8) {
| ^^^^^^^^^
warning: `rustic` (bin "rustic") generated 6 warnings
Finished release [optimized] target(s) in 16.32s
Here is a very early preliminary result. I'll post the final result when the test finishes:
Code: Select all
Rank Name Elo +/- Games Score Draw
0 rustic-1.3-no-sort 79 86 45 61.1% 33.3%
1 sargon-engine-1.00 -61 116 23 41.3% 39.1%
2 sargon-engine-1.01 -97 134 22 36.4% 27.3%
45 of 600 games finished.
Re @ previous post above:
First: Most people use something like 10s+0.1s in their gauntlets and SPRT testing.
Second: I'm of the opinion you're implementing features in the wrong order. How did you implement Lazy SMP without a TT? AFAIK, the key point of Lazy SMP is to rely on the TT for inter-thread communication. Also, you shouldn't be looking at functions such as LMR and LMP. It's easily possible to reach 2100+ Elo on CCRL with just the basics: alpha-beta, check extension, MVV-LVA, TT + TT Move Ordering, PVS (Rustic Alpha 3 feature set: 1920 Elo), and a tapered and tuned evaluation should add another 250 points or so on top of that. I expect Rustic 4 to hit at least 2150, or I'll be disappointed.
Good luck with the rest of the refactoring.
In my case: I've finally decided on a new system and I've got a 7950X languishing away in its box under the desk. This weekend I HAVE to get around to building that new system, because waiting for this test to finish is already irritating me. In the beginning it wasn't that big of a problem, but now that I'm at the point where Rustic "just needs some more features" to become stronger, I want to be able to test (and later, tune) the engine faster. The 6700K finally has to go after 7 years of service.
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Progress on Loki
This morning the gauntlet was at 556 games out of 600. I didn't feel like leaving it to run when I went to work and then have the computer idle for 6 hours or so, so I canceled it. These were the final standings at the point of cancellation:
It seems Rustic Alpha 1 without move ordering and without PVS is 184 +/-23 Elo stronger than Sargon 1.0.0. This would put it at around 1460 in the CCRL blitz list. To be entirely sure however, I'd need to run this version in a gauntlet against other engines (but I'm not going to). MAYBE I'll retroactively release an Alpha 0.5 so CCRL can test this and there's an even lower rated version of Rustic available. I will run this version against the normal Alpha 1 so I can do an SPRT-test to see how much Elo move ordering brings. (Seems to be about 240 Elo, because Alpha 1 is at 1700 in the CCRL list.)
Code: Select all
Rank Name Elo +/- Games Score Draw
0 rustic-1.3-no-sort 95 23 556 63.3% 36.0%
1 sargon-engine-1.00 -89 29 279 37.5% 49.1%
2 sargon-engine-1.01 -101 37 277 35.9% 22.7%
556 of 600 games finished.
-
- Posts: 174
- Joined: Thu Nov 26, 2020 10:06 am
- Full name: Niels Abildskov
Re: Progress on Loki
That is correct

mvanthoor wrote: ↑Thu Mar 30, 2023 9:48 pm
You should certainly investigate it if the difference between white and black scores is massive. White will statistically score a bit better because of the first move advantage. When you play your engine against itself, that would probably mean white scoring something like 54-56%.

As you can see in the match results, Loki 4.0 (which is Loki DEV really, but anyway) won a lot more games playing black than when playing white. But I think the issue is solved. I noticed that I had made what can best be described as a brain-fart while implementing the simple time-management scheme; the return value of the end-time method was based on the side to move of the current position and not the side to move of the root position, which of course led to losing a lot of games on time. Why this had a bigger influence while playing as white as opposed to black might have something to do with the average number of plies/nodes Loki DEV searched, and therefore when it would check the time.
I just ran two SPRT tests with the fixed DEV-version; One with 5s+0.1s tc and one with 10s+0.1s:
Code: Select all
For 5s+0.1s:
Score of Loki DEV vs Sargon 1978 1.01: 111 - 53 - 136 [0.597] 300
... Loki DEV playing White: 45 - 28 - 77 [0.557] 150
... Loki DEV playing Black: 66 - 25 - 59 [0.637] 150
... White vs Black: 70 - 94 - 136 [0.460] 300
Elo difference: 68.0 +/- 29.2, LOS: 100.0 %, DrawRatio: 45.3 %
SPRT: llr 2.95 (100.3%), lbound -2.94, ubound 2.94 - H1 was accepted
For 10s+0.1s:
Score of Loki DEV vs Sargon 1978 1.01: 109 - 46 - 100 [0.624] 255
... Loki DEV playing White: 64 - 15 - 48 [0.693] 127
... Loki DEV playing Black: 45 - 31 - 52 [0.555] 128
... White vs Black: 95 - 60 - 100 [0.569] 255
Elo difference: 87.7 +/- 33.7, LOS: 100.0 %, DrawRatio: 39.2 %
SPRT: llr 2.96 (100.7%), lbound -2.94, ubound 2.94 - H1 was accepted
Estimated elo of Loki DEV:
5s+0.1s: 1344 +/- 29
10s+0.1s: 1364 +/- 34

- The piece-square tables are tuned to another set of piece-values, which probably distorts them quite a lot.
- The value of a given square right now (since I don't have tapered eval) is an interpolation of Loki 3.5.0's middlegame and endgame values. It is computed as (sq.mgScore + sq.egScore) / 2, and then corrected for the new score granularity of Loki DEV. Now that I think about it, I should probably take a look at just giving my PSQT's pretty basic values and then tuning them later...
That seems like a reasonable time limit. As you can see from my tests earlier, there is also a ~20 elo difference between Loki playing on 5s+0.1s and 10s+0.1s.
I asked because it seems there is a big difference between the TCs engine developers use. I remember reading that some Stockfish tests used something like 1 second plus a little increment (not that I'm comparing Loki to SF... yet)

I'm thinking that I will start off with a mixture of 5s+0.1s and 10s+0.1s, and if their scores keep fluctuating by ~20 elo or something, I will use the former time control going forward. I will then calibrate Loki during my development by running gauntlets with longer TC, which will also hopefully reveal any problems there might be with my SPRT TC.
mvanthoor wrote: ↑Thu Mar 30, 2023 9:48 pm
Second: I'm of the opinion you're implementing features in the wrong order. How did you implement Lazy SMP without a TT? AFAIK, the key point of Lazy SMP is to rely on the TT for inter-thread communication. Also, you shouldn't be looking at functions such as LMR and LMP. It's easily possible to reach 2100+ Elo on CCRL with just the basics: alpha-beta, check extension, MVV-LVA, TT + TT Move Ordering, PVS (Rustic Alpha 3 feature set: 1920 Elo), and a tapered and tuned evaluation should add another 250 points or so on top of that. I expect Rustic 4 to hit at least 2150, or I'll be disappointed.

I think I might've called it Lazy SMP a bit too hastily, as I know it is based on a working TT. My point was that I had implemented the ability to perform searches on multiple threads simultaneously, which of course won't offer a strength gain in and of itself (since all threads will search the exact same nodes). My reason for this is that I wanted to make sure multithreading was out of the way before adding too many new features. SMP would have been a mess to implement if I had started writing Loki as a single-threaded engine and then changed my mind later on. And besides, as soon as I get around to creating a hash-table, Lazy SMP will basically be implemented automatically.


Regarding LMR and LMP, I tend to agree with you. My reason for adding them would be that I experienced a (IIRC) 150 elo gain previously, but that might as well have been the deeper search covering up for other parts of the old Loki lagging behind... I must say, I admire (and try to learn from) your patience as a developer


Thank you very much, and good luck to you with your re-started development of Rustic. I look forward to reading more of your blog posts

mvanthoor wrote: ↑Thu Mar 30, 2023 9:48 pm
In my case: I've finally decided on a new system and I've got a 7950X languishing away in its box under the desk. This weekend I HAVE to get around to building that new system, because waiting for this test to finish is already irritating me. In the beginning it wasn't that big of a problem, but now that I'm at the point where Rustic "just needs some more features" to become stronger, I want to be able to test (and later, tune) the engine faster. The 6700K finally has to go after 7 years of service.

That sounds like a fun weekend project. I share your experience with an old machine. All my work on Loki before the refactor was done on my school computer, and I remember the long wait times. Now that I have a 5900X on a gaming setup I built a year or so ago (even though I rarely play computer games anymore), the tests are much faster.
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Progress on Loki
niel5946 wrote: ↑Fri Mar 31, 2023 8:20 pm
If Rustic Alpha 1 without move sorting is 184 +/- 23 elo stronger than Sargon, I feel rather confident in my development version of Loki at the moment (but I think I'll have another look at the search just in case). This is because my evaluation is probably very badly configured.

You could disable PVS in your version, and then use Rustic 1's evaluation tables. They are written from white's point of view, and then use a flip table for the black counterpart tables (like most engines seem to do). I wrote these tables myself three years ago and they performed very well. (They also play a lot in my own style, which makes playing against Rustic Alpha 1 and 2 very weird for me. I can defeat it because I can often predict what it's going to do.)


niel5946 wrote:
That seems like a reasonable time limit. As you can see from my tests earlier, there is also a ~20 elo difference between Loki playing on 5s+0.1s and 10s+0.1s. I asked because it seems there is a big difference between the TCs engine developers use.

I forgot if I used 10s+1s or 10s+0.1s... I'll probably go with 10s+0.1s. On the new computer the NPS should be monstrous. On the old 6700K, Rustic runs perft at 32M positions per second, and that is including all incrementals such as zobrist and all evaluation scores. Somewhere this weekend I hope to see what the speed on the new computer will be.
niel5946 wrote:
I'm thinking that I will start off with a mixture of 5s+0.1s and 10s+0.1s, and if their scores keep fluctuating by ~20 elo or something, I will use the former time control going forward. I will then calibrate Loki during my development by running gauntlets with longer TC, which will also hopefully reveal any problems there might be with my SPRT TC.

Seems like a plan. I'll probably be running my gauntlets and SPRT testing at 1m+0.6s. I remember having used that time control in the past.
niel5946 wrote:
I think I might've called it Lazy SMP a bit too hastily, as I know it is based on a working TT. My point was that I had implemented the ability to perform searches on multiple threads simultaneously, which of course won't offer a strength gain in and of itself (since all threads will search the exact same nodes).

Ah, yes. That explains it. I implemented a (sloppy) multi-threaded perft in Rustic, which just splits the first 20 legal moves across the number of threads; so 20 moves/thread on 1 thread, 10 moves/thread on 2 threads, 5 moves/thread on 4 threads. That's not really optimal. I'll think about this later. Probably, just to make tests run faster (in case I start to tinker with the move generator), I'll implement bulk counting.
niel5946 wrote:
My reason for this is that I wanted to make sure multithreading was out of the way before adding too many new features. SMP would have been a mess to implement if I had started writing Loki as a single-threaded engine and then changed my mind later on.

In Rustic, I should be able to just create a vector of thread pools and then launch the iterative deepening function for each thread. Should work. Then I'd only need to think of a way to pick the best move out of the ones coming from that bunch of threads. Or should I go into iterative deepening and then launch a bunch of alpha-beta functions? I don't know yet. I'll read up about this after Rustic hits 3000 Elo. In another 40 years or so.
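The "launch iterative deepening per thread, then pick a winner" idea can be sketched generically (search() here is a dummy stand-in returning a fixed move and score; a real engine would run its actual iterative deepening and typically also prefer the deepest completed iteration):

```cpp
#include <cassert>
#include <thread>
#include <vector>

struct search_result { int score; int best_move; };

// Dummy per-thread search: stand-in for a real iterative deepening call.
inline search_result search(int thread_id) {
    return { 100 + thread_id % 2, 42 };
}

inline search_result lazy_smp(int threads) {
    std::vector<search_result> results(threads);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&results, t] { results[t] = search(t); });
    for (auto& th : pool) th.join();

    // Pick the result with the highest score.
    search_result best = results[0];
    for (const auto& r : results)
        if (r.score > best.score) best = r;
    return best;
}
```

Without a shared TT the threads duplicate each other's work, as noted above; the structure only pays off once they can exchange results through the hash table.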
niel5946 wrote:
And besides, as soon as I get around to creating a hash-table, Lazy SMP will basically be implemented automatically. But you're right, at the current moment, Loki's multithreaded abilities are nothing impressive.

Hey, let's calculate the best move 12 times. Wheeee!!

niel5946 wrote:
Regarding LMR and LMP, I tend to agree with you. My reason for adding them would be that I experienced a (IIRC) 150 elo gain previously, but that might as well have been the deeper search covering up for other parts of the old Loki lagging behind... I must say, I admire (and try to learn from) your patience as a developer.

Thanks for the props. However, you need two very special super-powers to be as patient as I am:
1. I am massively lazy. (As I write a chess engine to play chess FOR me instead of ME playing it myself.)
2. I am extremely averse to having to go back to code I wrote because it doesn't work right. That is one thing I absolutely HATE. The only reasons I go back to rewrite code are:
- To make it faster
- My requirements have changed so I need to change the code
- It has to change because of features that are being added.
Thus I research everything into great detail, then write a first version that works, and then I try to make it as fast and as clean as possible before I move on, so I never have to touch it again (unless it's because of one of the reasons mentioned above).
niel5946 wrote:
I really had (perhaps I still do, but sure hope I don't) a tendency to get carried away with all the new, fancy methods to speed up a search and make the eval better, which meant I never really took the time to study the ones I had in detail, and I think that has reflected itself in previous versions of Loki's performance.

Possibly... Probably. Not much to do about that, except becoming more lazy... eh... patient. Yeah. Patient. I should swap "lazy" for "patient" in my CV. Maybe we should even rename "Lazy SMP" to "Effortless SMP".
niel5946 wrote:
Thank you very much, and good luck to you with your re-started development of Rustic. I look forward to reading more of your blog posts.

First I'll have to build that computer. The parts are all in (and most were hell to get for a decent price), and now they're sitting against the wall in the box. I'm very... patient... with regard to building this computer, because I watched some (... 140 or so...) YouTube videos about people building computers, because I wanted to research this topic in great detail. (Some would possibly call it procrastination, but IMHO, they're just HASTY.)
niel5946 wrote:
That sounds like a fun weekend project.

I don't like building computers anymore, even though it costs a LOT less work and cable management than it used to. Basically, if you don't require 10-20-30 TB of data storage, everything is on the motherboard these days: CPU, RAM, 4 SSDs, and all peripherals. I intend to use this computer for close to a decade, except for MAYBE the graphics card (RX 6750 XT) if I ever run across a game that it doesn't run. If I had the money (and they didn't come with a 350W+ TDP), I would have built myself a 32-core Xeon with 128 GB RAM and an RX 6950 XT so I could use this computer for even longer.
niel5946 wrote:
I share your experience with an old machine. All my work on Loki before the refactor was done on my school computer, and I remember the long wait times. Now that I have a 5900X on a gaming setup I built a year or so ago (even though I rarely play computer games anymore), the tests are much faster.

The one requirement for this computer was: as many of the same cores as possible. None of that Efficiency-core stuff, because it'll just skew the results (or they have to be disabled). No X3D CPU either, because those CPUs have TWO die chiplets, but only ONE of them has the extra cache, which will also skew the results. As said: I looked into Xeons and Epycs to see if I could get 24 or more cores, but they would have been too slow per core and would draw too much power. I'm going to run the 7950X in Eco mode, which will basically tamp down on its boost clock (which I'll possibly even disable) and put its power draw in line with the 5950X. The base clock of the 7950X is almost as fast as the all-core boost of the 5950X; that should be enough.
If left unchecked, the 7950X will just keep boosting higher and higher until it hits 95C, and then it throttles. (So, better cooler = better performance.) That will make clock speeds unstable and thus skew testing.
I hope to have built and installed this system (with Debian 12 Bookworm) on Sunday. If so I'll post about it in my own thread. (Or probably, in the Ramblings section of the book and link to it.)
-
- Posts: 253
- Joined: Mon Aug 26, 2019 4:34 pm
- Location: Clearwater, Florida USA
- Full name: JoAnn Peeler
Re: Progress on Loki
I sped up the Texel local search by 16X using "mini-batches" and also by randomly adjusting parameters. I've dubbed it Stochastic Texel with Mini-Batches.

niel5946 wrote: ↑Wed May 05, 2021 10:25 am Although this local optimization is easiest to implement and works in a lot of cases, there are two things you should be aware of:
- As the name suggests, it is local. This means that the objective function can very well (and often does) end up settling in a local minimum. This is still an improvement, but it could be better.
- The time it takes is unnecessarily long. As you know, local optimization measures the objective function for every parameter, in every iteration, which is very time-consuming. This is especially true when using larger datasets.
Stochastic gradient descent will usually solve both of these problems:
- It uses random perturbations, which are usually larger than 1, in order to explore the space while also measuring the objective function. This has the effect that it usually finds either global minima or very strong local ones. When it does find a minimum, it won't recognize it the way local optimization does, but this isn't a problem since it'll just hover around that minimum.
- It only measures the objective function twice in every iteration, which means that tuning 6000 parameters will take the same time as tuning only 1. This results in a very significant speedup. It does lose some accuracy, since a perturbation of one parameter will affect another one, but this usually isn't a big issue.

My primary speed-up was recognizing that, as you say, "local optimization measures the objective function for every parameter, in every iteration, which is very time consuming", and instead only selecting a small sample of the data to evaluate whether a parameter should be increased or decreased. I only execute the full objective function once every pass, so I can keep precise control of where my error is. Since I only use a small sample (100-200K positions) to determine direction, I can easily run my tuner on well over 8 million positions and it will complete in about 30 minutes. The mini-batch technique does introduce some noise into the convergence, so I had to change its exit condition so that it only exits if it can't reduce the error after at least three tries. Since I use a different mini-batch on every pass, all of the data eventually affects the direction of the parameters. When I've used all of the mini-batches I just go through them again, and repeat until I have convergence. So far this works perfectly.
The "stochastic" change I mentioned is that I no longer adjust parameters in order. Instead, I go through them randomly, because adjusting one parameter sometimes affects how the next parameter can be adjusted. Because I go through the parameters in a different order on every pass, I have yet to stall in a local minimum, and I have always converged on a minimum within my chosen error margin.
Although I think Gradient Descent is superior, I have yet to understand the math or how to actually implement it. My changes to the Texel local search put off my need for GD a while longer.
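The mini-batch local search described above can be sketched roughly as follows. This is my own illustrative reconstruction, not JoAnnP38's actual tuner: eval_error stands in for the real Texel objective (here a toy quadratic so the snippet runs), and the ±1 step size, batch size, and pass count are made-up choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Stand-in for the Texel objective on a sample of positions. A real tuner
// would compute mean squared error between sigmoid(eval) and game results.
static double eval_error(const std::vector<double>& params,
                         const std::vector<int>& sample) {
    double e = 0.0;
    for (int i : sample)
        for (double p : params)
            e += (p - i % 3) * (p - i % 3); // toy quadratic objective
    return e;
}

std::vector<double> tune(std::vector<double> params,
                         const std::vector<int>& data,
                         std::size_t batch_size, int max_passes) {
    std::mt19937 rng(42);
    std::vector<std::size_t> order(params.size());
    std::iota(order.begin(), order.end(), 0);

    for (int pass = 0; pass < max_passes; ++pass) {
        // visit the parameters in a fresh random order every pass
        std::shuffle(order.begin(), order.end(), rng);
        // draw a new mini-batch for this pass; direction decisions use
        // only this small sample, never the full data set
        std::vector<int> batch(batch_size);
        std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
        for (auto& s : batch) s = data[pick(rng)];

        for (std::size_t i : order) {
            double base = eval_error(params, batch);
            params[i] += 1.0;                       // try +1
            if (eval_error(params, batch) < base) continue;
            params[i] -= 2.0;                       // try -1
            if (eval_error(params, batch) < base) continue;
            params[i] += 1.0;                       // neither helped: revert
        }
    }
    return params;
}
```

A real implementation would also evaluate the full objective once per pass (as described above) and stop after a few passes without improvement, rather than running a fixed number of passes.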
-
- Posts: 174
- Joined: Thu Nov 26, 2020 10:06 am
- Full name: Niels Abildskov
Re: Progress on Loki
Move ordering and transposition table done
This places Loki's elo at a very tentative 1913 +/- 188 (all changes' elo gains and uncertainties summed up), so I will run a gauntlet with engines ranging from 1700 to 2300 to test this.
Hi everyone

I have just finished an SPRT test of Loki's new transposition table, and the results are looking quite good!
Code:
Score of Loki DEV vs Loki DEV baseline: 83 - 24 - 69 [0.668] 176
... Loki DEV playing White: 50 - 9 - 30 [0.730] 89
... Loki DEV playing Black: 33 - 15 - 39 [0.603] 87
... White vs Black: 65 - 42 - 69 [0.565] 176
Elo difference: 121.2 +/- 40.9, LOS: 100.0 %, DrawRatio: 39.2 %
SPRT: llr 2.98 (101.3%), lbound -2.94, ubound 2.94 - H1 was accepted
Apart from TT (+ tt move ordering), I have been implementing the move ordering methods I listed in my last post. These consist of MVV-LVA, killer moves and history heuristic. Re-implementing them felt rather rewarding since I got to incrementally test the way Loki 3.5.0 did it, and as a result improve upon them.
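For readers unfamiliar with MVV-LVA (most valuable victim, least valuable attacker), the scoring idea boils down to something like this. The values are illustrative, not Loki's actual tables:

```cpp
#include <cassert>

// MVV-LVA ranks captures by most valuable victim first, with the least
// valuable attacker breaking ties, so PxQ sorts ahead of QxQ, which in
// turn sorts ahead of QxP.
enum Piece { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING };

constexpr int mvv_lva(Piece victim, Piece attacker) {
    // the victim dominates the score; the attacker only breaks ties
    return 10 * static_cast<int>(victim) - static_cast<int>(attacker);
}
```

In a full move-ordering scheme, captures scored this way would typically sort after the TT move, with killer moves and history scores filling in below the captures.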
This means my plan is now:
- Try to squeeze a little more elo out of the TT. Perhaps I will look into only storing the first 16 bits of the hash key, so that entries are smaller and more of them fit in a cache line. I would also like to experiment with different bucket sizes and replacement strategies.
- Make the search better with PVS, null move pruning and check extensions. My current architecture is built around implementing staged move generation some day, so I might try that (on the surface, it seems pretty easy compared to how I did it in the old Loki)
- Improve the evaluation. This will consist of a tapered eval with hand-crafted parameters, implementing a tuner, and perhaps testing some new terms like pawn structure, king safety, etc.
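The 16-bits-of-the-hash-key idea from the first list item could look roughly like this. This is a hypothetical layout, not Loki's actual table; the field names and the bucket size of 8 are assumptions:

```cpp
#include <cassert>
#include <cstdint>

// Keeping only 16 bits of the zobrist key makes the whole entry 8 bytes,
// so 8 entries fill exactly one 64-byte cache line.
struct TTEntry {
    std::uint16_t key16;    // upper 16 bits of the zobrist key
    std::int16_t  score;
    std::uint16_t move;     // packed from/to/promotion
    std::uint8_t  depth;
    std::uint8_t  flags;    // bound type + age
};

struct alignas(64) TTBucket {
    TTEntry entries[8];     // one probe touches a single cache line
};

// A probe compares only the stored 16 bits; the rare false match this
// allows must be caught by verifying the stored move is pseudo-legal
// before it is used.
inline bool matches(const TTEntry& e, std::uint64_t zobrist) {
    return e.key16 == static_cast<std::uint16_t>(zobrist >> 48);
}
```

The trade-off is exactly the one hinted at above: a shorter stored key raises the chance of a key collision within an index slot, which is why engines that do this validate the TT move before trusting it.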
I did just that, and it seems to have offered some elo points (although I tested this together with a repetition-detection bug-fix, so I don't know the exact elo gain). Soon I am going to write/tune my own evaluation parameters, but it was probably a good idea to give Loki some properly tested parameters as a starting point.

mvanthoor wrote: ↑Fri Mar 31, 2023 9:05 pm You could disable PVS in your version, and then use Rustic 1's evaluation tables. They are written from white's point of view, and then use a flip table for the black counterpart tables (like most engines seem to do). I wrote these tables myself three years ago and they performed very well. (They also play a lot in my own style, which makes playing against Rustic Alpha 1 and 2 very weird for me. I can defeat it because I can often predict what it's going to do
)
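The white-point-of-view table plus flip-table trick that the quote describes can be sketched like this. The pawn table values here are illustrative, not Rustic's:

```cpp
#include <cassert>

// Store piece-square values from white's point of view only, and mirror
// the square vertically for black. With a1 = 0 .. h8 = 63, a vertical
// flip is just XOR with 56 (it swaps the rank, keeps the file).
constexpr int flip(int sq) { return sq ^ 56; }

// Toy pawn table, listed rank by rank from white's side (a1..h8).
constexpr int PAWN_PST[64] = {
     0,  0,  0,  0,  0,  0,  0,  0,
     5, 10, 10,-20,-20, 10, 10,  5,
     5, -5,-10,  0,  0,-10, -5,  5,
     0,  0,  0, 20, 20,  0,  0,  0,
     5,  5, 10, 25, 25, 10,  5,  5,
    10, 10, 20, 30, 30, 20, 10, 10,
    50, 50, 50, 50, 50, 50, 50, 50,
     0,  0,  0,  0,  0,  0,  0,  0,
};

// One table serves both sides: black just looks up the mirrored square.
inline int pst_value(bool white, int sq) {
    return PAWN_PST[white ? sq : flip(sq)];
}
```

Some engines use an explicit 64-entry flip array instead of the XOR, which allows mirrorings other than the straight vertical one, but the effect is the same.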

Removing PVS only lost ~35 elo, but I think it will provide a bigger strength increase now that I have a transposition table. PVS will probably be the next thing I implement.
You could say Loki's multithreading is also just a vector of threads in essence. I have a single main_thread (inherits from search_thread) object which manages a vector of search_thread objects internally. I just like the feeling of having a wrapper class, so all I have to do to launch a search is make a single function call and provide some limits. Modularity and splitting of responsibility are the best!

mvanthoor wrote: ↑Fri Mar 31, 2023 9:05 pm In Rustic, I should be able to just create a vector of thread pools, and then launch the iterative deepening function for each thread. Should work. Then I'd only need to think of a way how to pick the best move out of the ones coming from that bunch of threads. Or should I go into iterative deepening and then launch a bunch of alpha-beta functions? I don't know yet. I'll read up about this after Rustic hits 3000 Elo. In another 40 years or so.

If you're going to use Lazy SMP, most engines seem to just have a designated main-thread that outputs the PV and bestmove info. Hopefully all threads will be more or less in agreement regarding the best moves.
I think you should launch each thread with its own iterative deepening loop. Having to re-launch all worker threads for each iteration seems a little too much, and it complicates the depth perturbations that (as far as I know) are essential to Lazy SMP.
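The shape being suggested, every thread running its own iterative deepening loop with a small depth perturbation and only the main thread reporting, might look roughly like this. search_stub, the odd/even depth offset, and the reporting scheme are placeholder assumptions, not any engine's actual code:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<bool> stop_flag{false};

// Placeholder for a real alpha-beta search at a given depth.
int search_stub(int depth) { return depth; }

void id_loop(int thread_id, int max_depth, std::atomic<int>& reported_depth) {
    // odd helpers start one ply deeper: a common depth perturbation that
    // desynchronizes the threads so they fill the shared TT differently
    int start = 1 + (thread_id % 2);
    for (int d = start; d <= max_depth && !stop_flag.load(); ++d) {
        int score = search_stub(d); // in a real engine, all threads share the TT
        (void)score;
        if (thread_id == 0)                 // only the main thread reports
            reported_depth.store(d);        // (PV, bestmove, info strings)
    }
}

int run_smp(int n_threads, int max_depth) {
    std::atomic<int> reported_depth{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < n_threads; ++t)
        pool.emplace_back(id_loop, t, max_depth, std::ref(reported_depth));
    for (auto& th : pool) th.join();
    return reported_depth.load();
}
```

Because each thread owns its whole iterative deepening loop, no thread ever waits at an iteration boundary; the shared transposition table is the only communication channel, which is what makes Lazy SMP so simple.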
That sounds like an interesting idea! I have not come around to tuning just yet in the refactor, but I plan on doing it within the next few weeks. I will be sure to try out this method. I remember the tuning and data-generation (even though I sucked at the latter) as being some of the most enjoyable things to work on in regards to Loki.

JoAnnP38 wrote: ↑Fri Mar 31, 2023 9:22 pm My primary speed-up was recognizing that, as you say, "local optimization measures the objective function for every parameter, in every iteration, which is very time consuming", and instead only selecting a small sample of the data to evaluate whether a parameter should be increased or decreased. I only execute the full objective function once every pass so I can keep a precise control of where my error is. Since I only use a small sample to determine direction 100-200K, I can easily run my tuner for well over 8 million positions and it will complete in about 30 minutes. The mini-batch technique does introduce some noise into the convergence so I had to change its exit condition so that it only exits if it can't reduce errors after at least three tries. Since every pass I use a different mini-batch, all of the data eventually affects the direction of the parameters. When I've used all of the mini-batches I just go through them again and repeat until I have convergence. So far this works perfectly.

My plan is to implement a tuning and data-generation GUI because I want to both make tuning easy in the future, but also learn more about .NET MAUI. I don't know how advanced it will be but I'm looking forward to giving it a shot
