Progress on Rustic

Mike Sherwin · Post by **Mike Sherwin** » Mon Jun 21, 2021 3:30 am

When someone successfully implements realtime learning it will be seen just how limiting pregame learning really is.

Given today's IPC, clock speeds, and core counts, realtime learning is not only possible but preferable. Realtime learning can gather information from far beyond the horizon, even to the end of the game. SF 13 on my system reaches 36 ply deep in the original position in 30 seconds. In 60 seconds it reaches 38 ply deep. In most positions that extra 2 ply means absolutely nothing.

What if the first 30 seconds was spent playing many thousands of games at lower ply searches with Reinforcement Learning and storing the positions of those games with their RL values in the hash table? The RL values would then be used to modify the scores of virtually all the positions in the hash either directly or indirectly with information far beyond the horizon. Why stay stuck in yesterday? It is time to create tomorrow!

Move ordering will benefit greatly and the second 30 seconds of normal search will reach even greater depths than the full 60 seconds of normal search. LMR will be more accurate. Many horizon oversights would be avoided by a more intelligently guided search. Etc.

mvanthoor · Post by **mvanthoor** » Mon Jun 28, 2021 11:06 pm

Tapered evaluation has been implemented, but I haven't written a tuner yet. For now, I'm using MinimalChess 0.4.5's PSTs to get a feeling for Rustic's performance after tuning. My current test version is performing at 2100 - 2185 Elo, depending on the engine it's playing against.
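For readers new to the term, a tapered evaluation blends a middlegame score and an endgame score according to how much material is left. The following is only a minimal sketch of that idea, not Rustic's actual code; `MAX_PHASE`, `tapered_score`, and the 0..=24 phase scale are assumptions made for illustration.

```rust
/// Hypothetical phase scale: 24 = full middlegame material, 0 = bare endgame.
const MAX_PHASE: i32 = 24;

/// Linearly interpolate between a middlegame score `mg` and an endgame
/// score `eg`. A `phase` outside 0..=MAX_PHASE is clamped into that range.
fn tapered_score(mg: i32, eg: i32, phase: i32) -> i32 {
    let phase = phase.clamp(0, MAX_PHASE);
    (mg * phase + eg * (MAX_PHASE - phase)) / MAX_PHASE
}
```

With a phase of 24 the result is the pure middlegame score, with 0 the pure endgame score, and everything in between is a weighted mix of the two PST sets.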

Rustic dev vs:

Vice 1.1 (2045) +75
BBC 1.1 (2095) +85
BikJump 2.04 (2100) +10
Prophet3 (2115) -15
Mora 1.0 (2165) -55

Average TPR: 2124 Elo.
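The 2124 figure is consistent with taking each opponent's rating plus the listed difference and averaging the results: (2120 + 2180 + 2110 + 2100 + 2110) / 5 = 2124. A small sketch of that arithmetic, assuming this is how the average was derived; `average_performance` is a hypothetical helper name.

```rust
/// Each tuple is (opponent rating, Elo difference measured against that
/// opponent). Returns the mean of rating + difference, or None for an
/// empty list.
fn average_performance(results: &[(i32, i32)]) -> Option<i32> {
    if results.is_empty() {
        return None;
    }
    let sum: i32 = results.iter().map(|&(rating, diff)| rating + diff).sum();
    Some(sum / results.len() as i32)
}
```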

I expect Alpha 3.0.0 to perform somewhere around 1865 if/when CCRL tests it, so it would be an improvement of roughly 260 Elo.

I also fixed a bug in ordering my killer moves, which I found by accident because I was looking through the code I used for Alpha 3. While running through the list of moves, and then looking if the current move is in the list of killer moves, I calculated the ordering position by using "i" (index of the move list) instead of "n" (location in the killer list). It basically makes the order and location of the killer moves random, because "i" and "n" have nothing to do with one another.
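The "i" versus "n" mix-up can be sketched as follows. This is an illustrative reconstruction under assumptions, not the actual Rustic code: moves are plain `u32` values, and `KILLER_BASE`, `MAX_KILLERS`, `killer_score`, and `score_moves` are hypothetical names.

```rust
/// Hypothetical base score for killers, assumed to sit below all capture scores.
const KILLER_BASE: u32 = 100;
const MAX_KILLERS: usize = 2;

/// Score a quiet move: the first killer slot scores highest, non-killers score 0.
fn killer_score(mv: u32, killers: &[u32; MAX_KILLERS]) -> u32 {
    for (n, &killer) in killers.iter().enumerate() {
        if mv == killer {
            // Use `n` (slot in the killer list). Using the move-list index
            // here instead is the kind of bug described above.
            return KILLER_BASE - n as u32;
        }
    }
    0
}

/// Score every move in the list; the move-list index plays no role in the score.
fn score_moves(moves: &[u32], killers: &[u32; MAX_KILLERS]) -> Vec<u32> {
    moves.iter().map(|&mv| killer_score(mv, killers)).collect()
}
```

Because the score depends only on the killer slot, a killer gets the same ordering value no matter where it happens to appear in the generated move list.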

The currently running SPRT-test is showing around a +20 Elo improvement in self-play at this point.

I'm not yet sure if I'm going to release an Alpha 3.1.0 version for this.

mvanthoor · Post by **mvanthoor** » Mon Jun 28, 2021 11:32 pm

And in hindsight, it probably doesn't really matter where the killer moves are sorted, because they're always below the MVV-LVA moves, and I have no history heuristic yet. Suddenly the new version lost almost 20 games in a row (give or take a few draws and an occasional win), and the new version is now 2 points behind. I have a feeling this SPRT-test is going to run forever until it cancels without a conclusion.

Still, I'm going to commit that code change, because when I implement history, that WOULD be impacted by this bug.

mvanthoor · Post by **mvanthoor** » Sun Aug 01, 2021 3:51 pm

A small update.

In the last few weeks I've been testing several setups of the transposition table so I could refactor out my crazy replacement scheme. This scheme replaced the last entry in a bucket that happened to have a lower search depth than the incoming data, instead of replacing the entry having the lowest depth.

This strange replacement scheme did work though (at least for a larger TT with more than 2 buckets, where the replacement scheme is almost irrelevant), so I tested the new scheme, which does actually replace the entry having the lowest search depth, with a 16 MB TT and 1-4 entries in a bucket. It turned out best to have 3 entries per bucket. With both 2 and 4, the engine loses a few Elo (think ~10 Elo +/- 5) when having a very small TT.
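A minimal sketch of a "replace the lowest-depth entry" scheme, under stated assumptions: the `Entry` fields, the `Bucket` layout, and the choice to always overwrite (rather than only when the incoming depth is higher) are illustrative and not taken from Rustic. Matching on an existing key is left out for brevity.

```rust
const SLOTS_PER_BUCKET: usize = 3;

/// Hypothetical TT entry; real engines store more (best move, bound type, ...).
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Entry {
    key: u64,
    depth: i8,
    score: i16,
}

#[derive(Debug, Default)]
struct Bucket {
    slots: [Entry; SLOTS_PER_BUCKET],
}

impl Bucket {
    /// Overwrite the slot holding the lowest search depth.
    /// On ties, the first such slot is chosen.
    fn store(&mut self, incoming: Entry) {
        let mut lowest = 0;
        for i in 1..SLOTS_PER_BUCKET {
            if self.slots[i].depth < self.slots[lowest].depth {
                lowest = i;
            }
        }
        self.slots[lowest] = incoming;
    }
}
```

The older behaviour described above would correspond to overwriting the last slot found with a lower depth than the incoming entry, instead of tracking the minimum.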

I'm also trying to remove some stuff from alpha-beta that shouldn't be necessary, such as saving the best move that raised alpha last, even if alpha is not improved. It should not gain anything in theory, but in Rustic, it gains 40 Elo. If I can't find out why, I'll just leave this in, because the rest of the construct converts alpha/beta to fail-soft instead of fail-hard. (Basically, I'm doing a fail-soft, but then return alpha and beta, like fail-hard, instead of best_eval.)
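For reference, the difference between the two conventions shows up only in what an inner node returns. The toy negamax below is a sketch on an artificial tree, not Rustic's search; `Node`, `fail_soft`, `fail_hard`, and `INF` are names invented for this example.

```rust
const INF: i32 = 30_000;

/// A toy game tree. Leaf values are from the point of view of the side
/// to move at that leaf.
enum Node {
    Leaf(i32),
    Inner(Vec<Node>),
}

/// Fail-soft: return the best score found, even if it lies outside
/// the (alpha, beta) window.
fn fail_soft(node: &Node, mut alpha: i32, beta: i32) -> i32 {
    match node {
        Node::Leaf(value) => *value,
        Node::Inner(children) => {
            let mut best = -INF;
            for child in children {
                let score = -fail_soft(child, -beta, -alpha);
                best = best.max(score);
                alpha = alpha.max(score);
                if alpha >= beta {
                    break; // beta cutoff
                }
            }
            best
        }
    }
}

/// Fail-hard: an inner node never returns a value outside [alpha, beta].
fn fail_hard(node: &Node, mut alpha: i32, beta: i32) -> i32 {
    match node {
        Node::Leaf(value) => *value,
        Node::Inner(children) => {
            for child in children {
                let score = -fail_hard(child, -beta, -alpha);
                if score >= beta {
                    return beta; // beta cutoff, clamped to the bound
                }
                alpha = alpha.max(score);
            }
            alpha
        }
    }
}
```

With a full window both functions agree; with a narrow window, fail-soft reports how far outside the window the score fell, while fail-hard reports only the bound that was hit.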

This is currently testing.

I've also switched Rustic's development over to Linux (Debian Stable / upcoming Bullseye + KDE), and everything is going well, because I've been moving to open source software for about 15 years already, and I use MSYS2 (and before that, Cygwin) on Windows.

The only programs that are going to be hard to replace, because they're best in their class are Capture One (C1, impossible to replace), MP3Tag, and FooBar. I'll see how far I'll get with games using Lutris. (My newest game is The Witcher 3 from 2015, so I can hardly be called a gamer these days.) After a bit of tinkering with KDE, it's actually hard to notice that I'm not working in Windows... my workflow basically hasn't changed, and all the programs are the same or so similar that there is no practical difference.

This time, after almost 10 attempts of switching to Linux in about 2 decades, I might actually make it, because I've been able to split off my DGT-board from Windows using PicoChess, and because I don't game as much as I did 20 years ago. (Back then, I was big into the RPG revival.) And, obviously, because Linux has improved.

The things I've noticed that are still the same are:

- If Lennart Poettering gets his hands onto a piece of software, it's going to be incredibly complex (PulseAudio, systemd, Avahi, etc)
- Gnome still breaks all its third party extensions on about every release.
- GTK-desktops don't give a shit with regard to integrating QT-apps (needs tinkering with Kvantum, etc) where GTK-apps work perfectly well out of the box in KDE.
- Integration of QT-flatpaks in GTK desktops either doesn't work, is broken, or isn't supported. I _can't_ get the Calibre (e-book editor) flatpak to actually theme in a dark theme under a GTK-desktop. It only works when installing from the repository. But for Calibre, I don't want that because it updates every week.
- If you need something outside of the repository and there isn't a DEB-file for it, you're stuffed.
- If you want to install the latest nVidia driver, you're stuffed too, because it doesn't do dependency checks. It just borks at several different stages in weird ways.
- If you want to install the nVidia driver and you have Secure Boot enabled (and set to "Windows" instead of "Others"), you're stuffed thrice over.

In short, if you stick to the repository, use KDE because it integrates GTK better than Gnome/Cinnamon integrate QT, use Flatpaks for fast-updating software, and don't try to do 'special' things with Poettering's parts of the software (so you don't have to get into its enormous complexity), everything just seems to work.

So, I might actually get to the point where I use Linux / open source software for everything, except for photo editing, because nothing out there compares to Capture One (and Eizo Color Navigator doesn't run on Linux), and maybe, for the occasional game.

Tearth · Post by **Tearth** » Sat Nov 13, 2021 4:49 pm

Excellent thread, I read all 31 pages - Rustic was one of the inspirations to start my own Rust-based engine Inanis a few months ago, so it will always have a special place in my heart.

However, when combining traits (interfaces / abstract classes), generics (templates) with threads and shared objects, the compiler has been punching me in the face for three weeks. ("This interface is not thread-safe because...", "Your object also needs Send+Sync", "You can't use X if you also use Y"....) Now I'm not quite sure anymore: I either knew less than I thought, or I just need more practice with Rust's way of dealing with these concepts.

I had similar struggles with the borrow checker when trying to move search into a separate thread or make a shared hash table between threads, and one thing I learned: everything is solvable, it's just a matter of the proper amount of Arcs, UnsafeCells and a lot of manual pointer operations.

Of course it's a bit against language's philosophy, but using Mutex is a big no-no, as you said earlier in the thread because of the performance.

mvanthoor · Post by **mvanthoor** » Sat Nov 13, 2021 8:07 pm

Tearth wrote: ↑Sat Nov 13, 2021 4:49 pm Excellent thread, I read all 31 pages - Rustic was one of the inspirations to start my own Rust-based engine Inanis a few months ago, so it will always have a special place in my heart.

Thanks

Impressive that you read the entire thread of ramblings.

I have already added Inanis to my list of engines to watch. Basically I'm watching most new engines that are written from scratch, but especially the ones in Rust.

Tearth wrote: ↑Sat Nov 13, 2021 4:49 pm I had similar struggles with borrow checker when trying to move search into a separate thread or make a shared hash table between threads, and one thing I learned: everything is solvable, it's just a matter of the proper amount of Arcs, UnsafeCells and a lot of manual pointer operations. Of course it's a bit against language's philosophy, but using Mutex is a big no-no, as you said earlier in the thread because of the performance.

Well; I use Arc<Mutex<T>> for everything that is shared right now, because Rustic does not yet have a multi-threaded search. For sharing the TT between threads I may go unsafe, or not use a mutex, but for passing around UCI / XBoard commands between threads Arc<Mutex<T>> is perfectly fine.
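As an illustration of why a mutex is harmless for this kind of traffic, here is a minimal sketch of handing protocol commands from one thread to another through an `Arc<Mutex<T>>`. It is not Rustic's Comm module; the `VecDeque<String>` queue and the `collect_commands` function are assumptions made for this example.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Push every input line onto a shared queue from a producer thread,
/// then drain the queue on the calling thread.
fn collect_commands(inputs: Vec<String>) -> Vec<String> {
    let queue: Arc<Mutex<VecDeque<String>>> = Arc::new(Mutex::new(VecDeque::new()));

    // The producer gets its own handle to the same queue.
    let producer_queue = Arc::clone(&queue);
    let producer = thread::spawn(move || {
        for command in inputs {
            // The lock is held only for the duration of one push.
            producer_queue.lock().unwrap().push_back(command);
        }
    });
    producer.join().unwrap();

    // Drain in arrival order.
    let received: Vec<String> = queue.lock().unwrap().drain(..).collect();
    received
}
```

A handful of lock operations per incoming command is negligible next to the search itself, which is a different situation from locking on every TT probe.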

With regard to performance: AFAIK, Rustic's perft is about 13% faster than that of Inanis: ~32 M leaves/sec vs. 28 M leaves/sec. I do not know how much incremental stuff Inanis is now keeping during move execution; maybe a lot more than Rustic is, so the numbers alone don't tell the entire story. Rustic is keeping PST's, game phase, material, and Zobrist keys incrementally.

Now that I've been writing in this thread anyway, I can just as well give a bit of an update. Rustic 4.0 (without the "Alpha" tag) is almost ready: I'm now finishing up the XBoard protocol, and the last thing left is to write the tuner for the tapered evaluation so I don't have to use other engines' PSTs.

This is a preliminary changelog:

- New features:
  - Tapered and tuned evaluation.
  - Support for XBoard-protocol version 2 <sup>(1)</sup>.
- Improvements:
  - TT Clear function: properly clear TT, instead of recreating it.
  - Fix inaccuracy in TT replacement scheme. (+5 Elo for tiny TT's).
  - Fix inaccuracy in TT mate handling (+20 Elo).
  - Drop from 4 to 3 buckets for a bit more speed (+8 Elo).
  - Simplify time management (+30 Elo).
  - pick_move() speed improvement (+3 Elo).
- Refactor:
  - Restructured Comm to be in line with the rest of the modules.
  - Switch alpha/beta from a strange mix of fail-hard and fail-soft to
    fully fail-soft. No Elo improvement, but the code is cleaner and more
    readable.
  - Better privacy and namespacing for several modules.
  - Made "Entry" the TT index, containing "Buckets" instead of the other
    way around, to be more in line with other engines.
  - Renamed a lot of variables and functions for more consistency.
- Update:
  - "rand" crate to 0.8.4.
  - "rand_chacha" crate to 0.3.1.
  - "if_chain" crate to 1.0.2.

The estimated playing strength increase over version Alpha 3.0.0 is +300 Elo, for a CCRL Blitz rating of ~2160. (Against some engines, Rustic is performing up to 2210 Elo... but against others, it sits around 2130.) The engine gained about 250 Elo due to the addition of a tapered and tuned evaluation, and 50 Elo due to fixing inaccuracies and code improvements.

Rustic 4 still doesn't have any pruning, staged move generation, or evaluation terms. That stuff is planned for version 5 and up.

One of the things I've started doing during the implementation of the XBoard-protocol is to implement "Display" where appropriate, instead of a custom "to_string()" function. Before version 4 is done I'll try to do that throughout the engine to make it more Rust-idiomatic. I'm happy to get back to actually improving the engine itself after I finish XBoard and the tuner; and I probably need to refactor some more stuff into an EPD-reader to read EPD's from files instead of from an array... but when all that is done, the basis is completely done.
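Implementing `Display` instead of a hand-written `to_string()` looks roughly like this. The `Square` newtype and its 0 = a1 .. 63 = h8 numbering are assumptions made for the example, not Rustic's actual types.

```rust
use std::fmt;

/// Hypothetical square type: 0 = a1, 7 = h1, 63 = h8.
/// Values above 63 are not handled by this sketch.
struct Square(u8);

impl fmt::Display for Square {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let file = (b'a' + self.0 % 8) as char;
        let rank = (b'1' + self.0 / 8) as char;
        write!(f, "{}{}", file, rank)
    }
}
```

Any type that implements `Display` gets `to_string()` for free through the standard library's blanket `ToString` implementation, and it also works directly in `println!` and `format!`.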

Oh, and for people who're interested in it: I've also given Rustic's website / docs an update. Complete spell check, typo fixes, rewrite of weird sentences, and some more information. Probably tomorrow I'll add a big part about the communication protocols:

https://rustic-chess.org/

PS: I saw that you seem to have three hash tables: one for perft, one for search, and one for pawns. I spent A LOT of time to figure out how I could create a TT which would hold generic data. There may have been an even better way, but you could take a look at engine/transposition.rs to see what/how I did it. I can now create a TT that holds any <T> I want, so I can create a tt_search<SearchData>, but also tt_perft<PerftData> without duplicating code.
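The general shape of a table that is generic over its payload can be sketched as below. This is not the code in engine/transposition.rs; the always-replace, one-slot-per-index layout and the `PerftData` / `SearchData` field choices are simplifying assumptions for illustration.

```rust
/// A table generic over its payload type `T`.
struct TT<T> {
    slots: Vec<Option<(u64, T)>>,
}

impl<T: Copy> TT<T> {
    fn new(size: usize) -> Self {
        assert!(size > 0, "table needs at least one slot");
        Self { slots: vec![None; size] }
    }

    fn index(&self, key: u64) -> usize {
        (key % self.slots.len() as u64) as usize
    }

    /// Always-replace insert.
    fn insert(&mut self, key: u64, data: T) {
        let i = self.index(key);
        self.slots[i] = Some((key, data));
    }

    /// Return the payload only if the stored key matches.
    fn probe(&self, key: u64) -> Option<T> {
        match self.slots[self.index(key)] {
            Some((stored_key, data)) if stored_key == key => Some(data),
            _ => None,
        }
    }
}

/// Hypothetical payloads for two different tables.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PerftData {
    depth: u8,
    leaf_nodes: u64,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct SearchData {
    depth: i8,
    score: i16,
}
```

The same `TT` code then serves as `TT<PerftData>` and `TT<SearchData>` without duplication; only the payload struct differs.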

Tearth · Post by **Tearth** » Sat Nov 13, 2021 11:04 pm

mvanthoor wrote: ↑Sat Nov 13, 2021 8:07 pm Well; I use Arc<Mutex<T>> for everything that is shared right now, because Rustic does not yet have a multi-threaded search. For sharing the TT between threads I may go unsafe, or not use a mutex, but for passing around UCI / XBoard commands between threads Arc<Mutex<T>> is perfectly fine.

I think it's fine to use Mutex in non-critical routines like handling communication. The biggest problem I've spotted was when I tried to use it in the perft hash table, which must be shared between search threads - can't recall the exact values, but it was for sure at least several dozen percent of performance hit.

mvanthoor wrote: ↑Sat Nov 13, 2021 8:07 pm With regard to performance: AFAIK, Rustic's perft is about 13% faster than that of Inanis: ~32 M leaves/sec vs. 28 M leaves/sec. I do not know how much incremental stuff Inanis is now keeping during move execution; maybe a lot more than Rustic is, so the numbers alone don't tell the entire story. Rustic is keeping PST's, game phase, material, and Zobrist keys incrementally.

Without examining both engines it's a bit hard to compare - not sure if you did some specific optimizations for perft itself, because I didn't for sure.

I use it as move generator tester and as an important part of unit tests, so didn't focus too much on things which aren't affecting search later.

mvanthoor wrote: ↑Sat Nov 13, 2021 8:07 pm PS: I saw that you seem to have three hash tables: one for perft, one for search, and one for pawns. I spent A LOT of time to figure out how I could create a TT which would hold generic data. There may have been an even better way, but you could take a look at engine/transposition.rs to see what/how I did it. I can now create a TT that holds any <T> I want, so I can create a tt_search<SearchData>, but also tt_perft<PerftData> without duplicating code.

Indeed, I remember I was wondering a few weeks ago how to make a more general version of hash table which would apply to every needed type - I will check it for sure, thanx.

mvanthoor · Post by **mvanthoor** » Sun Nov 14, 2021 3:09 pm

Tearth wrote: ↑Sat Nov 13, 2021 11:04 pm I think it's fine to use Mutex in non-critical routines like handling communication, the biggest problem I've spotted was when I tried to use it in perft hash table which must be shared between search threads - can't recall the exact values, but it was for sure at least several dozen percents of performance hit.

I've been avoiding unsafe code on principle in Rustic, except where it's really straightforward:

- Creating an empty move list and not having it initialized, because it will be filled on the next line by generating the moves.
- Swapping two structs by just swapping the pointers in memory so there's no copying of the entire struct.
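The first case, a move list whose backing array is never initialized, can be sketched with `std::mem::MaybeUninit`. This is an assumption about how such a list might look, not Rustic's implementation; `Move`, `MoveList`, and `MAX_MOVES` are hypothetical names.

```rust
use std::mem::MaybeUninit;

const MAX_MOVES: usize = 255;

/// Hypothetical packed move.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Move(u32);

struct MoveList {
    moves: [MaybeUninit<Move>; MAX_MOVES],
    count: usize,
}

impl MoveList {
    /// Creating the list writes no move data at all.
    fn new() -> Self {
        Self {
            moves: [MaybeUninit::uninit(); MAX_MOVES],
            count: 0,
        }
    }

    fn push(&mut self, mv: Move) {
        assert!(self.count < MAX_MOVES, "move list overflow");
        self.moves[self.count] = MaybeUninit::new(mv);
        self.count += 1;
    }

    fn len(&self) -> usize {
        self.count
    }

    fn get(&self, index: usize) -> Option<Move> {
        if index < self.count {
            // SAFETY: every slot below `count` was written by `push`.
            Some(unsafe { self.moves[index].assume_init() })
        } else {
            None
        }
    }
}
```

The only `unsafe` left is the read in `get`, and it is guarded by the `count` check, which keeps the unsafe surface small and easy to audit.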

I'll have to look into unsafe access for the TT and maybe lockless hashing when I get to implementing multi-threaded search far into the future. I don't expect to look into this until the engine hits at least 2850+ (i.e., stands a chance of exceeding 3000 Elo after implementing multi-threading.)

Tearth wrote: ↑Sat Nov 13, 2021 11:04 pm Without examining both engines it's a bit hard to compare - not sure if you did some specific optimizations for perft itself, because I didn't for sure. I use it as move generator tester and as an important part of unit tests, so didn't focus too much on things which aren't affecting search later.

Rustic has no perft-specific optimizations. I just tried to make the move generator, make, and unmake as fast as possible. At top speed the engine ran perft with 42 M leaves/second (on an i7-6700K) including incrementally keeping the Zobrist hash, but this number started dropping (obviously) as I started to keep more information incrementally during make/unmake.

Keeping information incrementally is good for chess playing, not for perft. But Rustic is not a perft tool.
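To make "keeping the Zobrist hash incrementally" concrete: instead of rehashing the whole board after a move, the key is updated with two XORs. The sketch below is illustrative only; the `Zobrist` struct, the 12 x 64 key layout, and the SplitMix64-based key generation are assumptions, not Rustic's code.

```rust
const PIECES: usize = 12;
const SQUARES: usize = 64;

struct Zobrist {
    keys: [[u64; SQUARES]; PIECES],
}

impl Zobrist {
    /// Fill the key table using SplitMix64 steps from a fixed seed.
    fn new(mut seed: u64) -> Self {
        let mut keys = [[0u64; SQUARES]; PIECES];
        for piece_keys in keys.iter_mut() {
            for key in piece_keys.iter_mut() {
                seed = seed.wrapping_add(0x9E37_79B9_7F4A_7C15);
                let mut z = seed;
                z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
                z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
                *key = z ^ (z >> 31);
            }
        }
        Self { keys }
    }

    /// Hash from scratch: XOR the key of every piece on the board.
    fn full_hash(&self, board: &[Option<usize>; SQUARES]) -> u64 {
        let mut hash = 0;
        for (square, piece) in board.iter().enumerate() {
            if let Some(piece) = piece {
                hash ^= self.keys[*piece][square];
            }
        }
        hash
    }

    /// Incremental update for a quiet move: remove the piece from `from`,
    /// add it on `to`.
    fn move_piece(&self, hash: u64, piece: usize, from: usize, to: usize) -> u64 {
        hash ^ self.keys[piece][from] ^ self.keys[piece][to]
    }
}
```

Every extra value kept this way (PST totals, material, game phase) adds a little work to make/unmake, which is why perft speed drops while search benefits.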

Ras · Post by **Ras** » Sun Nov 14, 2021 3:10 pm

mvanthoor wrote: ↑Sat Nov 13, 2021 8:07 pm For sharing the TT between threads I may go unsafe

That would be more reasonable than trying to fight the language because race conditions like that are exactly what the borrow checker is designed to reject. Even if you find a suitable loophole in the language right now, expect that to be closed later because that's the point of Rust. Concurrent TT access without mutex is inherently "unsafe".

What's more, it's even platform dependent what can work because e.g. atomicity is different on 32 bit vs. 64 bit, and memory ordering is different on x86 vs. ARM so that the latter may require some additional memory barriers.
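One commonly described form of the lockless hashing mentioned in this thread stores each entry as two 64-bit words, with the key XORed against the data so that a torn write fails verification on probe. The sketch below uses `std::sync::atomic::AtomicU64` with relaxed ordering as one possible way to express it; `LocklessTT` and the packed `u64` payload are assumptions, and `AtomicU64` itself is not available on every target, which ties in with the platform caveat above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Two words per slot: `key ^ data` and `data`.
struct LocklessSlot {
    key_xor_data: AtomicU64,
    data: AtomicU64,
}

struct LocklessTT {
    slots: Vec<LocklessSlot>,
}

impl LocklessTT {
    fn new(size: usize) -> Self {
        assert!(size > 0, "table needs at least one slot");
        let slots = (0..size)
            .map(|_| LocklessSlot {
                key_xor_data: AtomicU64::new(0),
                data: AtomicU64::new(0),
            })
            .collect();
        Self { slots }
    }

    fn slot(&self, key: u64) -> &LocklessSlot {
        &self.slots[(key % self.slots.len() as u64) as usize]
    }

    /// Takes `&self`, so several threads can store without a lock.
    fn store(&self, key: u64, data: u64) {
        let slot = self.slot(key);
        slot.key_xor_data.store(key ^ data, Ordering::Relaxed);
        slot.data.store(data, Ordering::Relaxed);
    }

    /// If the two words come from different writes, the XOR check
    /// (almost always) fails and the probe reports a miss.
    /// Note: in this sketch an empty slot verifies for key 0.
    fn probe(&self, key: u64) -> Option<u64> {
        let slot = self.slot(key);
        let data = slot.data.load(Ordering::Relaxed);
        let check = slot.key_xor_data.load(Ordering::Relaxed);
        if check ^ data == key {
            Some(data)
        } else {
            None
        }
    }
}
```

This trades exactness for speed: a mismatched pair is treated as a miss rather than prevented, which is acceptable for a TT but would not be for most other shared data.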

mvanthoor · Post by **mvanthoor** » Sun Nov 14, 2021 3:15 pm

PS: After I'm done with it, I'll probably never implement XBoard ever again. Each time I think I'm almost done, there's another i to dot or another T to cross for one edge-case or another. Some things are in seconds, others in hundredths of a second, some are in minutes. The "level" TC has base time in two different notations, and may have a floating point for increment, but maybe not. I still need to figure out how "time" goes with "level"; I'll have to look into what Cutechess and Arena actually send to the engine. Reporting stats has two different formats (one without a tab, one with tab), where the tabbed version doesn't seem to be supported by any user interface I've tested; except maybe XBoard itself. I still need to test that.
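The "level" quirks can be made concrete with a small parser sketch. It assumes the `level MPS BASE INC` form, where BASE is either whole minutes or `MIN:SEC` and INC is seconds that may carry a fraction; the `Level` struct and `parse_level` function are hypothetical, and everything is normalized to milliseconds.

```rust
#[derive(Debug, PartialEq)]
struct Level {
    moves_per_session: u32,
    base_ms: u64,
    increment_ms: u64,
}

/// Parse e.g. "level 40 5 0" or "level 0 2:30 0.5".
/// Returns None for anything that does not fit the expected shape.
fn parse_level(line: &str) -> Option<Level> {
    let mut parts = line.split_whitespace();
    if parts.next()? != "level" {
        return None;
    }
    let moves_per_session = parts.next()?.parse::<u32>().ok()?;

    // Base time: "5" means five minutes, "2:30" means 2 min 30 sec.
    let base = parts.next()?;
    let base_seconds = match base.split_once(':') {
        Some((minutes, seconds)) => minutes
            .parse::<u64>()
            .ok()?
            .checked_mul(60)?
            .checked_add(seconds.parse::<u64>().ok()?)?,
        None => base.parse::<u64>().ok()?.checked_mul(60)?,
    };

    // Increment: seconds, possibly fractional.
    let increment_seconds = parts.next()?.parse::<f64>().ok()?;
    if !increment_seconds.is_finite() || increment_seconds < 0.0 {
        return None;
    }

    Some(Level {
        moves_per_session,
        base_ms: base_seconds.checked_mul(1000)?,
        increment_ms: (increment_seconds * 1000.0).round() as u64,
    })
}
```

Converting to one unit at the protocol boundary keeps the minutes / seconds / fractional-seconds mix out of the time management code.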

This feels like a quadratic version of the imperial system. #%%&*(!#$%* I tried to refrain from being biased and went into implementing XBoard with an open mind, but I can see why the entire world switched to UCI even though it takes away the engine's autonomy.
