Progress on Loki

tcusr · Post by **tcusr** » Fri Apr 15, 2022 12:39 pm

niel5946 wrote: ↑Fri Apr 15, 2022 11:27 am
tcusr wrote: ↑Wed Apr 13, 2022 4:26 pm i've been browsing your code quickly and the only advice i have for you is try not to use a C++ feature just because you can, for example here a simple ternary operator is enough. also the extensive use of smart pointers is a bit weird (at least for me, if i'm not missing anything).
but nonetheless the code looks much better, good luck.
I appreciate your comments
I agree that the lambda is a little overkill when compared to the ternary operator. However, I don't think that an extensive use of smart pointers are bad or weird. This is primarily for the reason that it's more modern, and less bug-prone. With normal heap-allocated memory, you have to always keep track of which things you have allocated and which you have freed. I have previously experienced a lot of bugs from either calling free() prematurely or never, causing a memory leak.
Additionally, smart pointers have nearly no overhead. Of course std::shared_ptr does, but I intend to pass it by reference, which will not make it increment the reference count.

that's the perfect use of smart pointers, i was mainly referring to why use heap memory in the first place. but that's a matter of style i suppose and shouldn't affect performance at all.

tcusr · Post by **tcusr** » Fri Apr 15, 2022 12:47 pm

dangi12012 wrote: ↑Fri Apr 15, 2022 12:12 pm
niel5946 wrote: ↑Fri Apr 15, 2022 11:27 am I agree that the lambda is a little overkill when compared to the ternary operator. However, I don't think that an extensive use of smart pointers are bad or weird. This is primarily for the reason that it's more modern, and less bug-prone. With normal heap-allocated memory, you have to always keep track of which things you have allocated and which you have freed. I have previously experienced a lot of bugs from either calling free() prematurely or never, causing a memory leak.
https://github.com/BimmerBass/Loki/blob ... ndex.h#L35

I think the questioner misunderstood what this line of code does. It is not a lambda - it is a compiletime (constexpr) constant initialized by a lambda.
This could be done with a constexpr function but this is much cleaner so your compiletime initializer is anonymous.
The runtime cost is ZERO and its the same as writing two classes - one with const 4096 and one with const 512.

no no, that's exactly what i meant. one should prefer a simple solution solution to one more complex, principle of least power.
but don't get me wrong i use immediately called lambdas in my code too to initialize constexpr arrays, but that's different than just assigning a number

niel5946 · Post by **niel5946** » Mon Mar 27, 2023 4:14 pm

Update on the refactoring
Hello everyone

I know it has been some time since I announced my big refactoring of Loki, but life got in the way, so I - sadly - didn't get to work as much on it as I had initially hoped.
With that being said, I have just as of today finished implementing the basic search function and evaluation, as well as determined an approximate rating. The current feature list is rather short but I have implemented the following:

Search
- Iterative deepening (a bit too early IMO considering I don't have a transposition table yet)
- Fail-hard principal variation, alpha-beta search.
- Quiescence search
Evaluation
- Material. I just used the classic Pawn = 1, Knight = Bishop = 3, Rook = 5 and Queen = 9 values, converted to units of 1 / 512'th of a pawn.
- Piece-square tables. The values of these tables are taken as averages from the old Loki tables. This means that a square = (oldLoki.mg + oldLoki.eg / 2) / 2, and converted to the new units, since I haven't implemented real evaluation tapering yet.

There is currently no move ordering whatsoever.

There were two bugs from implementing the move-generator, I discovered while doing my initial play-testing:

make_move() never reset the fifty-move counter, even though a capture or pawn-move had been performed, which led to erroneous draw-scores.
move_generator was held by position as a std::shared_ptr<move_generator>, but it also held a std::shared_ptr<position> itself, which resulted in a position object never being freed after it had been allocated.. This was solved by just having a raw position pointer in the move_generator instead of a smart pointer.

When implementing the evaluation function, I remembered all the trouble I had had earlier when implementing different tuning algorithms. So I (for once

) thought ahead and made all the values of the different evaluation terms hot-swappable - or that is at least my plan. All my evaluation parameters are stored in an abstract evaluation_params object whose deriving classes are required to implement an initialization function. Right now, I have only implemented a hard-coded class, but my hope is to some-day be able to load the evaluation parameters from something like a JSON or XML file. This will allow me to specify which parameters i want tuned (by an external tuner - I'll have to come up with some sort of protocol with that...) during runtime instead of having to either tune ALL parameters, or re-compiling a binary with the ones I want.
It should be noted that no initialization/parameter-reading is done during a search. It is always done beforehand, so the overhead of having the parameters as objects instead of inlined constexpr variables is minimal (at least I think so..?)

I tested the new Loki against Sargon (1280 elo on CCRL), and got the following results:

Code: Select all

Score of Loki 4.0 vs Sargon Engine: 78 - 34 - 161  [0.581] 273
...      Loki 4.0 playing White: 23 - 22 - 91  [0.504] 136
...      Loki 4.0 playing Black: 55 - 12 - 70  [0.657] 137
...      White vs Black: 35 - 77 - 161  [0.423] 273
Elo difference: 56.5 +/- 26.2, LOS: 100.0 %, DrawRatio: 59.0 %
SPRT: llr 3 (101.8%), lbound -2.94, ubound 2.94 - H1 was accepted

Which, if I understand correctly, means that the current Loki has an estimated rating of 1337 +/- 10 elo. Is this a correct assumption? One thing that stands out to me is that Loki had a majority of its wins playing with black, even though each engine played all openings from both sides. This makes me think there is a bug somewhere that makes Loki weaker when playing as white.
Lastly, I don't really know what I should expect given the current features. Is a rating of 1330-1340 acceptable or is Loki supposed to be a lot stronger already? I will add that the speed of the refactored codebase is around 30% faster than Loki 3.5.0 (as measured in perft and tentatively in the search function)

niel5946 · Post by **niel5946** » Thu Mar 30, 2023 6:45 pm

SMP and UCI-options done!
Hello again everyone!
I have just finished implementing the new Lazy SMP function in Loki, and I am quite pleased with the result to say the least

Of course, I can't say anything about the playing strength increase from multithreading, since it is too early for testing considering I do not have a hash table yet. But I like the way it is implemented, and that I was able to avoid using global variables or simple lists of std::thread's (having a look at Stockfish's great implementation of course helped, but I would say mine is original considering all the sorrounding code

).
Besides the multithreading, I have implemented an options_manager class that registers options and their callbacks to make new option implementations a walk in the park to introduce. Now, some of you may have checked out the source code of the refactor and wondered why I have chosen to make the engine more complex and object-oriented, and my reasons are two-fold:

I hate global variables and - to a moderate extent - free functions with a passion! I think they clutter the codebase, and make the application state, while running, unpredictable and unmanageable. Therefore I have been insistent on removing all of them, wrapping everything in classes, and have one single "admin" class that is, in essence, an entirely independent instance of Loki. In addition, I want to be able to write unit-tests, whose worst enemy is a global application state.
This is more of a personal reason, but after having looked at dozens of engines's source code, I wanted to design Loki my own way. Many other engines share the same project structure, control flow, and overall have a very similar architecture. I don't necessarily think that's bad, but I do believe it is a very C-like way of coding, which I personally do not enjoy, so Loki will be smothered in classes and smart pointers

Going forward I am determined to develop Loki in a more methodical and controlled fashion, with plenty of SPRT testing for every single feature, I introduce. In order to save some time, I am thinking a time-control of 5s+0.1s when testing between two different commit versions of Loki. What time control do other authors use when testing incremental changes?
My plan for adding features to Loki looks something like the following:

Simple move ordering: The refactored version of Loki currently has no move ordering at all, so I think this is the most natural starting point for some quick strength gains.
- MVV-LVA: I would like to experiment with putting losing captures last in the move list, but IIRC Loki lost elo last time I did that, so I might wait until my move ordering is more precise.
- Killer moves
- History heuristic
- PV-move and hash-move - the latter when a transposition table is implemented.
Transposition Table: This one is obvious Even though a hash table is hard to get right, I want to implement it early on since iterative deepening and - especially - Lazy SMP is so reliant on it. As part of this I would like to experiment with different amounts of buckets and replacement strategies.
Search. I think I'm going to be rather conservative with the features I add in the search function since I 1) don't want it to become unnecessarily bloated, and 2) have burnt myself earlier by adding a lot of exotic pruning/reduction methods too hastily without proper testing.
- Null move pruning
- Check extensions
- Late move reductions. This one is not certain, since I might wait until my evaluation is better.
- Staged move generation. I'm not sure about this, since the last time I tried, it was a mess. But it might be worth a shot now
Evaluation. Hopefully the previous points will have collected enough positions from playing that I can implement some sort of self-learning capability for Loki (otherwise I'll just use the 3000+ engines's games for the time being).
- Tapered evaluation.
- (perhaps)Terms like pawn-structure, mobility and king safety, but I haven't fully decided yet.
- Evaluation tuning. This will also be when I write code for loading the evaluation from a file. I think my tuner will be based on Andrew Grant's paper.

I don't know if Loki will be stronger than its previous release (3.5.0) after I have finished testing and implementing the above methods, so I can't say for certain whether or not I will release version 4.0 then.
All in all, I would say the progress is promising

mvanthoor · Post by **mvanthoor** » Thu Mar 30, 2023 9:48 pm

niel5946 wrote: ↑Mon Mar 27, 2023 4:14 pm I know it has been some time since I announced my big refactoring of Loki, but life got in the way, so I - sadly - didn't get to work as much on it as I had initially hoped.

Welcome back. Same here: I've been away for over a year myself. Darn life.

I tested the new Loki against Sargon (1280 elo on CCRL), and got the following results:
Code: Select all
Score of Loki 4.0 vs Sargon Engine: 78 - 34 - 161  [0.581] 273
...      Loki 4.0 playing White: 23 - 22 - 91  [0.504] 136
...      Loki 4.0 playing Black: 55 - 12 - 70  [0.657] 137
...      White vs Black: 35 - 77 - 161  [0.423] 273
Elo difference: 56.5 +/- 26.2, LOS: 100.0 %, DrawRatio: 59.0 %
SPRT: llr 3 (101.8%), lbound -2.94, ubound 2.94 - H1 was accepted
Which, if I understand correctly, means that the current Loki has an estimated rating of 1337 +/- 10 elo.

The result means that Loki is 56 Elo stronger than Sargon, with a margin of 26 Elo. So the difference is between +30 and +82 Elo for Loki.

Is this a correct assumption? One thing that stands out to me is that Loki had a majority of its wins playing with black, even though each engine played all openings from both sides. This makes me think there is a bug somewhere that makes Loki weaker when playing as white.

You should certainly investigate it if the difference between white and black scores is massive. White will statistically score a bit better because of the first move advantage. When you play your engine against itself, that would probably mean white scoring something like 54-56%.

Lastly, I don't really know what I should expect given the current features. Is a rating of 1330-1340 acceptable or is Loki supposed to be a lot stronger already? I will add that the speed of the refactored codebase is around 30% faster than Loki 3.5.0 (as measured in perft and tentatively in the search function)

You have only a move generator, alpha-beta, no move ordering and PVS, right?

Rustic Alpha 1 has a move generator, alpha-beta, MVV-LVA move ordering, and no PVS. Therefore I have made a version where I disabled all move ordering, and I'm running a 600 game test against Sargon 1.00 and 1.01 (I suspect you used 1.00 because that one has a rating on CCRL, but I wanted to see the playing strength difference between the two versions as well).

Compiling without move ordering:

Code: Select all

(/home/marcel/Code/rustic)
warning: associated function `add_score` is never used
   --> src/movegen/defs.rs:137:12
    |
137 |     pub fn add_score(&mut self, value: u8) {
    |            ^^^^^^^^^
    |
    = note: `#[warn(dead_code)]` on by default

warning: associated function `get_mut_move` is never used
  --> src/movegen/movelist.rs:79:12
   |
79 |     pub fn get_mut_move(&mut self, index: u8) -> &mut Move {
   |            ^^^^^^^^^^^^

warning: associated function `swap` is never used
  --> src/movegen/movelist.rs:83:12
   |
83 |     pub fn swap(&mut self, a: usize, b: usize) {
   |            ^^^^

warning: constant `MVV_LVA` is never used
  --> src/search/sorting.rs:30:11
   |
30 | pub const MVV_LVA: [[u8; NrOf::PIECE_TYPES + 1]; NrOf::PIECE_TYPES + 1] = [
   |           ^^^^^^^

warning: associated function `score_moves` is never used
  --> src/search/sorting.rs:41:12
   |
41 |     pub fn score_moves(ml: &mut MoveList) {
   |            ^^^^^^^^^^^

warning: associated function `pick_move` is never used
  --> src/search/sorting.rs:49:12
   |
49 |     pub fn pick_move(ml: &mut MoveList, index: u8) {
   |            ^^^^^^^^^

warning: `rustic` (bin "rustic") generated 6 warnings
    Finished release [optimized] target(s) in 16.32s

Loki has PVS, while Rustic Alpha 1 does not. Apart from that, the features are the same as far as I can see. When I added PVS, that feature gained 54 Elo, but that version of Rustic already had a TT (which gained about 150 Elo), and PVS takes advantage of that. I have never tested PVS without a TT, so I don't know how much Elo that feature would bring in this case.

Here is a very early preliminary result. I'll post the final result when the test finishes:

Code: Select all

   0 rustic-1.3-no-sort             79      86      45   61.1%   33.3% 
   1 sargon-engine-1.00            -61     116      23   41.3%   39.1% 
   2 sargon-engine-1.01            -97     134      22   36.4%   27.3% 

45 of 600 games finished.

So at this point, Rustic Alpha 1 (the .3 are just maintenance updates such as newer library versions: no code changes) without move sorting and without PVS is 140 Elo stronger than sargon-engine-1.00, while Sargon 1.00 is 36 Elo stronger than 1.01. (Take into account that the error bars are still massive at 86 Elo for Rustic, and over 100 for the Sargon versions.)

Re @ previous post above:

First: Most people use something like 10s+0.1s in their gauntlets and SPRT testing.

Second: I'm of the opinion you 're implementing features in the wrong order. How did you implement Lazy SMP without a TT? AFAIK, the key point of Lazy SMP is to rely on the TT for inter-thread communication. Also, you shouldn't be looking at functions such as LMR and LMP. It's easily possible to reach 2100+ Elo on CCRL with just the basics: alpha-beta, check extension, MVV-LVA, TT + TT Move Ordering, PVS (Rustic Alpha 3 feature set: 1920 Elo), and a tapered and tuned evaluation should add another 250 points or so on top of that. I expect Rustic 4 to hit at least 2150, or I'll be disappointed.

Good luck with the rest of the refactoring.

In my case: I've finally decided on a new system and I've got a 7950X languishing away in its box under the desk. This weekend I HAVE to get around to building that new system, because waiting for this test to finish is already irritating me. In the beginning it wasn't that big of a problem, but now that I'm at the point where Rustic "just needs some more features" to become stronger, I want to be able to test (and later, tune) the engine faster. The 6700K finally has to go after 7 years of service.

mvanthoor · Post by **mvanthoor** » Fri Mar 31, 2023 7:30 am

This morning the gauntlet was at 556 games out of 600. I didn't feel like leaving it to run when I went to work and then have the computer idle for 6 hours or so, so I canceled it. This was the final standings at the point of cancelation:

Code: Select all

Rank Name                          Elo     +/-   Games   Score    Draw 
   0 rustic-1.3-no-sort             95      23     556   63.3%   36.0% 
   1 sargon-engine-1.00            -89      29     279   37.5%   49.1% 
   2 sargon-engine-1.01           -101      37     277   35.9%   22.7% 

556 of 600 games finished.

It seems Rustic Alpha 1 without move ordering and without PVS is 184 +/-23 Elo stronger than Sargon 1.0.0. This would put it at around 1460 in the CCRL blitz list. To be entirely sure however, I'd need to run this version in a gauntlet against other engines (but I'm not going to). MAYBE I'll retroactively release an Alpha 0.5 so CCRL can test this and there's an even lower rated version of Rustic available. I will run this version against the normal Alpha 1 so I can do an SPRT-test to see how much Elo move ordering brings. (Seems to be about 240 Elo, because Alpha 1 is at 1700 in the CCRL list.)

niel5946 · Post by **niel5946** » Fri Mar 31, 2023 8:20 pm

mvanthoor wrote: ↑Thu Mar 30, 2023 9:48 pm You have only a move generator, alpha-beta, no move ordering and PVS, right?

That is correct

Apart from that my eval has material and PSQT's.

mvanthoor wrote: ↑Thu Mar 30, 2023 9:48 pm You should certainly investigate it if the difference between white and black scores is massive. White will statistically score a bit better because of the first move advantage. When you play your engine against itself, that would probably mean white scoring something like 54-56%.

As you can see in the match results, Loki 4.0 (which is Loki DEV really, but anyway) won a lot more games playing black than when playing white. But I think the issue is solved. I noticed that I had made what can best be described as a brain-fart while implementing the simple time-management scheme; the return value of the end-time method was based on the side to move of the current position and not the side to move of the root position, which of course led to losing a lot of games on time... why this had a bigger influence while playing as white as opposed to black might have something to do with the average amount of plies/nodes Loki DEV searched, and therefore when it would check.
I just ran two SPRT tests with the fixed DEV-version; One with 5s+0.1s tc and one with 10s+0.1s:

Code: Select all

For 5s+0.1s:

	Score of Loki DEV vs Sargon 1978 1.01: 111 - 53 - 136  [0.597] 300
	...      Loki DEV playing White: 45 - 28 - 77  [0.557] 150
	...      Loki DEV playing Black: 66 - 25 - 59  [0.637] 150
	...      White vs Black: 70 - 94 - 136  [0.460] 300
	Elo difference: 68.0 +/- 29.2, LOS: 100.0 %, DrawRatio: 45.3 %
	SPRT: llr 2.95 (100.3%), lbound -2.94, ubound 2.94 - H1 was accepted

For 10s+0.1s:

	Score of Loki DEV vs Sargon 1978 1.01: 109 - 46 - 100  [0.624] 255
	...      Loki DEV playing White: 64 - 15 - 48  [0.693] 127
	...      Loki DEV playing Black: 45 - 31 - 52  [0.555] 128
	...      White vs Black: 95 - 60 - 100  [0.569] 255
	Elo difference: 87.7 +/- 33.7, LOS: 100.0 %, DrawRatio: 39.2 %
	SPRT: llr 2.96 (100.7%), lbound -2.94, ubound 2.94 - H1 was accepted
	
Estimated elo of Loki DEV:
	5s+0.1s: 1344 +/- 29
	10s+0.1s: 1364 +/- 34

If Rustic Alpha 1 without move sorting is 184 +/- 23 elo stronger than Sargon, I feel rather confident in my development version of Loki at the moment (but I think I'll have another look at the search just in case

). This is because my evaluation is probably very badly configured. My piece values are based on the usual 100cp for pawns, 300cp for knights/bishops, 500cp for rooks and 900cp for queens, while my piece-square tables are a relic from Loki 3.5.0, which has two problems:

The piece-square tables are tuned to another set of piece-values, which probably distorts them quite a lot.
The value of a given square right now (since I don't have tapered eval) is an interpolation of Loki 3.5.0's middlegame and endgame values. It is computed as (sq.mgScore + sq.egScore / 2) / 2, and then corrected for the new score granularity of Loki DEV. Now that I think about it, I should probably take a look at just giving my PSQT's pretty basic values and then tuning them later...

I might try testing a version of Loki DEV without PVS against the current one to see if my implementation is correct..

mvanthoor wrote: ↑Thu Mar 30, 2023 9:48 pm First: Most people use something like 10s+0.1s in their gauntlets and SPRT testing.

That seems like a reasonable time limit. As you can see from my tests earlier, there is also a ~20 elo difference between Loki playing on 5s+0.1s and 10s+0.1s.
I asked because it seems there is a big difference between the tc's egine developers use. I remember reading that some Stockfish tests used something like 1 second plus a little increment (not that I'm comparing Loki to SF... yet

), whereas I saw someone on this forum - a while back though - who was determined to keep testing with 5 minute games. I don't remember who it was.
I'm thinking that I will start off with a mixture of 5s+0.1s and 10s+0.1s, and if their scores keep fluctuating by ~20 elo or something, I will use the former time control going forward. I will then calibrate Loki during my development by running gauntlets with longer TC, which will also hopefully reveal any problems there might be with my SPRT TC..

mvanthoor wrote: ↑Thu Mar 30, 2023 9:48 pm Second: I'm of the opinion you 're implementing features in the wrong order. How did you implement Lazy SMP without a TT? AFAIK, the key point of Lazy SMP is to rely on the TT for inter-thread communication. Also, you shouldn't be looking at functions such as LMR and LMP. It's easily possible to reach 2100+ Elo on CCRL with just the basics: alpha-beta, check extension, MVV-LVA, TT + TT Move Ordering, PVS (Rustic Alpha 3 feature set: 1920 Elo), and a tapered and tuned evaluation should add another 250 points or so on top of that. I expect Rustic 4 to hit at least 2150, or I'll be disappointed.

I think I might've called it Lazy SMP a bit too hastily, as I know it is based on a working TT. My point was that I had implemented the ability to perform searches on multiple threads simultaneously, which of course won't offer a strength gain in and of itself (since all threads will search the exact same nodes). My reason for this is that I wanted to make sure multithreading was out of the way before adding too many new features. SMP would have been a mess to implement if I had started writing Loki as a single-threaded engine and then changed my mind later on. And besides, as soon as I get around to creating a hash-table, Lazy SMP will basically be implemented automatically

But you're right, at the current moment, Loki's multithreaded abilities are nothing impressive

Regarding LMR and LMP, I tend to agree with you. My reason for adding it would be that I experienced a (IIRC) 150 elo gain previously, but that might as well have been the deeper search covering up for other parts of the old Loki lacking behind... I must say, I admire (and try to learn from) your patience as a developer

I really had (perhaps I still do, but sure hope I don't) a tendency to get carried away with all the new, fancy methods to speed up a search, and make the eval better, which meant I never really took the time to study the ones I had in detail, and I think that has reflected itself in previous versions of Loki's performance

mvanthoor wrote: ↑Thu Mar 30, 2023 9:48 pm Good luck with the rest of the refactoring.

Thank you very much, and good luck to you with your re-started development of Rustic. I look forward to reading more of your blog posts

mvanthoor wrote: ↑Thu Mar 30, 2023 9:48 pm In my case: I've finally decided on a new system and I've got a 7950X languishing away in its box under the desk. This weekend I HAVE to get around to building that new system, because waiting for this test to finish is already irritating me. In the beginning it wasn't that big of a problem, but now that I'm at the point where Rustic "just needs some more features" to become stronger, I want to be able to test (and later, tune) the engine faster. The 6700K finally has to go after 7 years of service.

That sounds like a fun weekend project. I share your experience with an old machine. All my work on Loki before the refactor was done on my school computer, and I remember the long wait times. Now that I have a 5900X on a gaming setup I built a year or so ago (even though I rarely play computer games anymore), the tests are much faster.

mvanthoor · Post by **mvanthoor** » Fri Mar 31, 2023 9:05 pm

niel5946 wrote: ↑Fri Mar 31, 2023 8:20 pm If Rustic Alpha 1 without move sorting is 184 +/- 23 elo stronger than Sargon, I feel rather confident in my development version of Loki at the moment (but I think I'll have another look at the search just in case ). This is because my evaluation is probably very badly configured.

You could disable PVS in your version, and then use Rustic 1's evaluation tables. They are written from white's point of view, and then use a flip table for the black counterpart tables (like most engines seem to do). I wrote these tables myself three years ago and they performed very well. (They also play a lot in my own style, which makes playing against Rustic Alpha 1 and 2 very weird for me. I can defeat it because I can often predict what it's going to do

)

That seems like a reasonable time limit. As you can see from my tests earlier, there is also a ~20 elo difference between Loki playing on 5s+0.1s and 10s+0.1s.
I asked because it seems there is a big difference between the tc's egine developers use.

I forgot if I used 10s+1s, or 10s+0.1s... I'll probably go with 10s+0.1s. On the new computer the NPS should be monstrous... On the old 6700K, Rustic runs perft at 32M positions per second, and that is including all incrementals such as zobrist and all evaluation scores. Somewhere this weekend I hope to see what the speed on the new computer will be.

I'm thinking that I will start off with a mixture of 5s+0.1s and 10s+0.1s, and if their scores keep fluctuating by ~20 elo or something, I will use the former time control going forward. I will then calibrate Loki during my development by running gauntlets with longer TC, which will also hopefully reveal any problems there might be with my SPRT TC..

Seems like a plan. I'll probably be running my gauntlets and SPRT testing at 1m+0.6s. I remember having used that time control in the past.

I think I might've called it Lazy SMP a bit too hastily, as I know it is based on a working TT. My point was that I had implemented the ability to perform searches on multiple threads simultaneously, which of course won't offer a strength gain in and of itself (since all threads will search the exact same nodes).

Ah, yes. That explains it. I implemented a (sloppy) multi-threaded perft in Rustic, which just split the first 20 legal moves into the number of threads; so 20 moves/thread on 1 thread, 10 moves/thread on 2 threads, 5 moves/thread on 4 threads. That's not really optimal. I'll think about this later. Probably, just to make tests run faster (in case I start to tinker with the move generator) I'll implement bulk counting.

My reason for this is that I wanted to make sure multithreading was out of the way before adding too many new features. SMP would have been a mess to implement if I had started writing Loki as a single-threaded engine and then changed my mind later on.

In Rustic, I should be able to just create a vector of thread pools, and then launch the iterative deepening function for each thread. Should work. Then I'd only need to think of a way how to pick the best move out of the ones coming from that bunch of threads. Or should I go into iterative deepening and then launch a bunch of alpha-beta functions? I don't know yet. I'll read up about this after Rustic hits 3000 Elo. In another 40 years or so.

And besides, as soon as I get around to creating a hash-table, Lazy SMP will basically be implemented automatically But you're right, at the current moment, Loki's multithreaded abilities are nothing impressive

Hey, let's calculate the best move 12 times. Wheeee!!

Regarding LMR and LMP, I tend to agree with you. My reason for adding it would be that I experienced a (IIRC) 150 elo gain previously, but that might as well have been the deeper search covering up for other parts of the old Loki lacking behind... I must say, I admire (and try to learn from) your patience as a developer

Thanks for the props. However, you need two very special super-powers to be as patient as I am:

1. I am massively lazy. (As I write a chess engine to play chess FOR me instead of ME playing it myself.)
2. I am extremely adverse having to go back to code I wrote because it doesn't work right. That is one thing I absolutely HATE. The only reasons I go back to rewrite code are:
- To make it faster
- My requirements have changed so I need to change the code
- It has to change because of features that are being added.

Thus I research everything into great detail, then write a first version that works, and then I try to make it as fast and as clean as possible before I move on, so I never have to touch it again (unless it's because of one of the reasons mentioned above).

I really had (perhaps I still do, but sure hope I don't) a tendency to get carried away with all the new, fancy methods to speed up a search, and make the eval better, which meant I never really took the time to study the ones I had in detail, and I think that has reflected itself in previous versions of Loki's performance

Possibly... Probably. Not much to do about that, except becoming more lazy... eh... patient. Yeah. Patient. I should swap "lazy" for "patient" in my CV. Maybe we should even rename "Lazy SMP" to "Effortless SMP".

Thank you very much, and good luck to you with your re-started development of Rustic. I look forward to reading more of your blog posts

First I'll have to build that computer. The parts are all in (and most were hell to get for a decent price), and now they're sitting against the wall in the box. I'm very... patient... with regard to building this computer because I watched some (... 140 or so...) YouTube video's about people building computers because I wanted to research this topic in great detail. (Some would possibly call it procrastination, but IMHO, they're just HASTY.)

That sounds like a fun weekend project.

I don't like building computers anymore, even though it costs a LOT less work and cable management than it used to. Basically, if you don't require 10-20-30 TB of data storage, everything is on the motherboard these days: CPU, RAM, 4 SSD's, and all peripherals. I intend to use this computer for close to a decade, except for MAYBE the graphics card (RX 6750 XT) if I ever run across a game that doesn't run it. If I had the money (and they didn't come with a 350w+ TDP), I would have built myself a 32-core Xeon with 128 GB RAM and an RX 6950 XT so I could use this computer for even longer.

I share your experience with an old machine. All my work on Loki before the refactor was done on my school computer, and I remember the long wait times. Now that I have a 5900X on a gaming setup I built a year or so ago (even though I rarely play computer games anymore), the tests are much faster.

The one requirement for this computer was: as many of the same cores as possible. None of that Efficiency-core stuff, because it'll just skew the results (or they have to be disabled). No X3D CPU either, because these CPU's have TWO die chiplets, but only ONE of them has the extra cache, which will also skew the results. As said: I looked into Xeon's and Epycs to see if I could get 24 or more cores, but they would have been too slow per core, and would draw too much power. I'm going to run the 7950X in Eco-mode, which will basically tamp down on its boost clock (which I'll possibly even disable), and put it's power draw in line with the 5950X. The base clock of the 7950X is almost as fast as the all-core boost of the 5950X; that should be enough.

If left unchecked, the 7950X will just keep boosting higher and higher until it hits 95C, and then it throttles. (So, better cooler = better performance.) That will make clock speeds unstable and thus skew testing.

I hope to have built and installed this system (with Debian 12 Bookworm) on Sunday. If so I'll post about it in my own thread. (Or probably, in the Ramblings section of the book and link to it.)

JoAnnP38 · Post by **JoAnnP38** » Fri Mar 31, 2023 9:22 pm

niel5946 wrote: ↑Wed May 05, 2021 10:25 am Although this local optimization is easiest to implement and works in a lot of cases, there are two things you should be aware of:

As the name suggests, it is local. This means that the objective function can very well (and often does) end up settling down in a local minimum. This is still an improvement, but it could be better.

The time it takes is unneccesarily long. As you know, local optimization measures the objective function for every parameter, in every iteration, which is very time consuming. This is especially true when using larger data-sets.

Stochastic gradient descent will usually solve both of these problems:

It uses random perturbations, which are usually larger than 1, in order to explore the space while also measuring the objective function. This has the effect that it usually finds either global minima or very strong local ones. When it does find a minimum, it won't be able to recognize it itself like local optimization does, but this isn't a problem since it'll just hover around that minimum.

It only measures the objective function twice in every iteration, which means that tuning 6000 parameters will take the same time as tuning only 1. This results in a very significant speedup. It does lose some accuracy though, since a perturbation of one parameter will affect another one, but this usually isn't a very big issue.

I sped up the Texel local search by 16X using "mini-batches" and also randomly adjusting parameters. I've dubbed it Stochastic Texel with Mini-Batches

My primary speed-up was recognizing that, as you say, " local optimization measures the objective function for every parameter, in every iteration, which is very time consuming", and instead only selecting a small sample of the data to evaluate whether a parameter should be increased or decreased. I only execute the full objective function once every pass so I can keep a precise control of where my error is. Since I only use a small sample to determine direction 100-200K, I can easily run my tuner for well over 8 million positions and it will complete in about 30 minutes. The mini-batch technique does introduce some noise into the convergence so I had to change its exit condition so that it only exits if it can't reduce errors after at least three tries. Since every pass I use a different mini-batch, all of the data eventually affects the direction of the parameters. When I've used all of the mini-batches I just go through them again and repeat until I have convergence. So far this works perfectly.

The "stochastic" change I mentioned is that I no longer adjust parameters in order. Instead, I go through them randomly because sometimes adjusting one parameter affects how the next parameter can be adjusted. Because I go through the parameters in a different order on every pass, I have yet to stall in a local minimum and I have always converged on a minimum within my chosen error margin.

Although I think Gradient Descent is superior, I have yet to understand the math or how to actually implement it. My changes to the Texel local search put off my need for GD a while longer.

niel5946 · Post by **niel5946** » Fri Apr 07, 2023 4:18 pm

Move ordering and transposition table done
Hi everyone

I have just finished an SPRT test of Loki's new transposition table, and the results are looking quite good!

Code: Select all

Score of Loki DEV vs Loki DEV baseline: 83 - 24 - 69  [0.668] 176
...      Loki DEV playing White: 50 - 9 - 30  [0.730] 89
...      Loki DEV playing Black: 33 - 15 - 39  [0.603] 87
...      White vs Black: 65 - 42 - 69  [0.565] 176
Elo difference: 121.2 +/- 40.9, LOS: 100.0 %, DrawRatio: 39.2 %
SPRT: llr 2.98 (101.3%), lbound -2.94, ubound 2.94 - H1 was accepted

This places Loki's elo at a very tentative 1913 +/- 188 (all changes's elo gain and uncertainty summed up), so I will run a gauntlet with engines ranging from 1700 to 2300 to test this.
Apart from TT (+ tt move ordering), I have been implementing the move ordering methods I listed in my last post. These consist of MVV-LVA, killer moves and history heuristic. Re-implementing them felt rather rewarding since I got to incrementally test the way Loki 3.5.0 did it, and as a result improve upon them.
This means my plan is now:

Try to squeeze a little more elo out of the TT. Perhaps I will look into only storing the first 16 bits of the hashkey, so an entry fits in a cacheline. I would also like to experiment with different amounts of buckets and replacement strategies.
Make the search better with PVS, null move pruning and check extensions. My current architecture is built around implementing staged move generation some day, so I might try that (on the surface, it seems pretty easy compared to how I did it in the old Loki)
Improve the evaluation. This will consist of tapered eval with hand-crafted parameters, implementing a tuner and perhaps testing some new terms like pawn-structure, king safety etc..

mvanthoor wrote: ↑Fri Mar 31, 2023 9:05 pm You could disable PVS in your version, and then use Rustic 1's evaluation tables. They are written from white's point of view, and then use a flip table for the black counterpart tables (like most engines seem to do). I wrote these tables myself three years ago and they performed very well. (They also play a lot in my own style, which makes playing against Rustic Alpha 1 and 2 very weird for me. I can defeat it because I can often predict what it's going to do )

I did just that, and it seems to have offered some elo points (although I tested this and a repetition-detection bug-fix, so I don't know the exact elo gain). Soon I am going to write/tune my own evaluation parameters, but it was probably a good idea to give Loki some parameters that were properly tested as a starting point

Removing PVS only lost ~35 elo, but I think it will provide a bigger strength increase now that I have a transposition table. PVS will probably be the next thing I implement.

mvanthoor wrote: ↑Fri Mar 31, 2023 9:05 pm In Rustic, I should be able to just create a vector of thread pools, and then launch the iterative deepening function for each thread. Should work. Then I'd only need to think of a way how to pick the best move out of the ones coming from that bunch of threads. Or should I go into iterative deepening and then launch a bunch of alpha-beta functions? I don't know yet. I'll read up about this after Rustic hits 3000 Elo. In another 40 years or so.

You could say Loki's multithreading is also just a vector of threads in essence. I have a single main_thread (inherits from search_thread) object which manages a vector of search_thread's internally. I just like the feeling of having a wrapper-class so all I have to do to launch a search is to make a single function call and provide some limits. Modularity and splitting of responsibility is the best!

If you're going to use Lazy SMP, most engines seem to just have a designated main-thread that outputs the PV and bestmove info. Hopefully all threads will be more or less in agreement regarding the best moves.
I think you should launch each thread with its own iterative deepening loop. Having to re-launch all worker threads for each iteration seems a little too much, and it complicates the depth-perturbations that (as far as i know) are essential to Lazy SMP.

JoAnnP38 wrote: ↑Fri Mar 31, 2023 9:22 pm My primary speed-up was recognizing that, as you say, " local optimization measures the objective function for every parameter, in every iteration, which is very time consuming", and instead only selecting a small sample of the data to evaluate whether a parameter should be increased or decreased. I only execute the full objective function once every pass so I can keep a precise control of where my error is. Since I only use a small sample to determine direction 100-200K, I can easily run my tuner for well over 8 million positions and it will complete in about 30 minutes. The mini-batch technique does introduce some noise into the convergence so I had to change its exit condition so that it only exits if it can't reduce errors after at least three tries. Since every pass I use a different mini-batch, all of the data eventually affects the direction of the parameters. When I've used all of the mini-batches I just go through them again and repeat until I have convergence. So far this works perfectly.

That sounds like an interesting idea! I have not come around to tuning just yet in the refactor, but I plan on doing it within these next few weeks. I will be sure to try out this method. I remember the tuning and data-generation (even though i sucked at the latter

) as being some of the most enjoyable things to work on in regards to Loki.
My plan is to implement a tuning and data-generation GUI because I want to both make tuning easy in the future, but also learn more about .NET MAUI. I don't know how advanced it will be but I'm looking forward to giving it a shot

Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki