Can principal variation search be worth no Elo?

mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Can principal variation search be worth no Elo?

Post by mvanthoor »

algerbrex wrote: Wed Oct 20, 2021 11:09 pm I believe from retuning I got about 5-ish Elo, so not too much. But I only went through 1-2 tuning sessions, on the two halves of the Zurichess quiet position training set. So I'm sure 10-20 Elo from retuning is certainly possible.

And hitting 2200 on the CCRL would be quite nice with only Rustic's current feature set and better tables :D PSTs are surprisingly powerful (at least they were for me).
The tuner encodes some general chess knowledge in the PSTs; the tuning process makes the engine put pieces on the squares where they contribute most to winning games.
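To make that concrete, this is roughly the idea behind such a Texel-style tuner. It's only a sketch: the types, the linear feature representation, and the function names are made up for illustration, and neither Rustic's nor Blunder's actual tuner necessarily looks like this.

Code: Select all

// Minimal Texel-style tuning sketch (illustrative only). The static evaluation
// is treated as a linear function of the tunable parameters (the PST entries),
// and the tuner looks for parameter values that minimize the mean squared error
// between predicted and actual game results.

struct TrainingPosition {
    result: f64,                 // 1.0 = white win, 0.5 = draw, 0.0 = black win
    features: Vec<(usize, f64)>, // (parameter index, coefficient), e.g. "white knight on e5"
}

// Hypothetical evaluation: dot product of feature coefficients and parameters.
fn evaluate(pos: &TrainingPosition, params: &[f64]) -> f64 {
    pos.features.iter().map(|&(i, c)| c * params[i]).sum()
}

// Map a centipawn score to an expected score in [0, 1].
fn sigmoid(score_cp: f64, k: f64) -> f64 {
    1.0 / (1.0 + 10f64.powf(-k * score_cp / 400.0))
}

// The quantity the tuner minimizes, e.g. by local search or gradient descent.
fn mean_squared_error(positions: &[TrainingPosition], params: &[f64], k: f64) -> f64 {
    let total: f64 = positions
        .iter()
        .map(|p| {
            let e = p.result - sigmoid(evaluate(p, params), k);
            e * e
        })
        .sum();
    total / positions.len() as f64
}

Evaluating every training position on each iteration is the expensive part, which is why a naive single-threaded pass over a large quiet-position set takes so long.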

And yes, if Rustic makes 2150, I'd be satisfied. 2180 would be at the top of my expectation. If it makes >=2200, that would be really wild. It would probably mean that I was lucky with regard to the engines it tested against. I found one engine around 2130, against which the current dev version performs almost exactly equal (so also ~2130). A newer version of that engine is in the list at roughly 2260, and the same dev version scores -20. That would mean a performance rating of 2240. The other engine improved about 130 points, but against Rustic, only 20 points remain.

I've always found stuff like that strange, but it's always been like this with engine testing. A > B > C does not automatically mean A > C...
I'd eventually like to build something similar, so I can use my Pixelbook for programming and schoolwork, and the other system for running tests and parameter tuning.

And one nice thing about Go is that it's quite simple to use multithreading via goroutines, but that's a part of the language I'd need to learn a bit more about before using it in any extensive manner. But I'm sure I could eventually get that up and running so I could speed up the tuning process.

And yup, I copied the naive implementation from CPW. So I already knew it'd be slow, which I wasn't too worried about for this initial version, since the goal was to get a working tuner, not a fast one. But a faster tuner will definitely be needed in the future to speed up the development process.
Aren't goroutines "green threads", i.e. multiple logical threads running within a single OS thread? (I'm not 100% sure about this.) In Rust, a thread is a "real" OS thread. Rust doesn't have "green" threads, but it can simulate them with async/await (and there are 50 million libraries that do this, which I personally dislike).
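For reference, a "real" thread in Rust is just a native OS thread: each std::thread::spawn call goes straight to the OS scheduler, with no green-thread runtime in between. A throwaway sketch, not code from Rustic:

Code: Select all

use std::thread;

fn main() {
    // Each spawn creates a native OS thread; the OS scheduler spreads them
    // over the available cores. There is no green-thread runtime in std Rust.
    let handles: Vec<_> = (0..4)
        .map(|i| thread::spawn(move || format!("worker {} done", i)))
        .collect();

    for h in handles {
        println!("{}", h.join().unwrap());
    }
}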

I don't intend to put a huge amount of money into the testing system, nor a huge amount of time into the tuner. I don't expect to tune more often than once per feature. If a tuning run takes 12 hours, and I can cut that down to 1.5 hours by using Rayon, the "simple" tuner is good enough for me.
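For what it's worth, on top of a sketch like the tuning snippet earlier in this post, parallelizing the error sum with Rayon is close to a one-line change. Again illustrative: it reuses the hypothetical types from that snippet and assumes rayon = "1" in Cargo.toml.

Code: Select all

use rayon::prelude::*;

// Same mean squared error as the sequential version, but the per-position
// evaluation (the expensive part) is spread over all cores by Rayon's
// work-stealing pool of real OS threads.
fn mean_squared_error_par(positions: &[TrainingPosition], params: &[f64], k: f64) -> f64 {
    let total: f64 = positions
        .par_iter() // the only real change: iter() -> par_iter()
        .map(|p| {
            let e = p.result - sigmoid(evaluate(p, params), k);
            e * e
        })
        .sum();
    total / positions.len() as f64
}

Since each position's error is independent, the sum parallelizes almost perfectly, which is where that hoped-for 12-hours-to-1.5-hours kind of speedup would come from.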

When / if I hit 3000 (and certainly at 3200, if I can manage that) with Rustic (without NNUE), I actually intend to divert part of my time to writing my own user interface and a Rust-based replacement for Picochess. (I like Picochess, but I completely dislike the fact that it is written in Python, uses an old version of PyChess, and is stuck at Python 2 because no one wants to update it. And I HATE Python. I won't voluntarily touch that language when I'm not getting paid for it.)

In short, if Rustic ever reaches 3200+, I don't see a lot of reason to continue development, short of NNUE, because improving beyond that will take exponentially more testing time and computing power. I'd rather add something like MCTS and/or NNUE as alternative search / evaluation options.

As soon as the engine reaches 2850, I'll consider it strong enough to make it one of my analysis engines. Stockfish 14 is cool, but half the time I don't understand its moves, because the point of a move only becomes visible 25 plies into the future...
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Can principal variation search be worth no Elo?

Post by algerbrex »

mvanthoor wrote: Thu Oct 21, 2021 12:09 am The tuner encodes some general chess knowledge in the PSTs; the tuning process makes the engine put pieces on the squares where they contribute most to winning games.

And yes, if Rustic makes 2150, I'd be satisfied. 2180 would be at the top of my expectation. If it makes >=2200, that would be really wild. It would probably mean that I was lucky with regard to the engines it tested against. I found one engine around 2130, against which the current dev version performs almost exactly equal (so also ~2130). A newer version of that engine is in the list at roughly 2260, and the same dev version scores -20. That would mean a performance rating of 2240. The other engine improved about 130 points, but against Rustic, only 20 points remain.

I've always found stuff like that strange, but it's always been like this with engine testing. A > B > C does not automatically mean A > C...
These are the issues I've frequently encountered as well. For example, many versions of Blunder I've tested are often stronger than a version of MinimalChess with a higher Elo. I always just try to make sure I have a semi-decent variety of opponents so I can get a rough estimate of Blunder's strength.
mvanthoor wrote: Thu Oct 21, 2021 12:09 am Aren't goroutines "green threads", i.e. multiple logical threads running within a single OS thread? (I'm not 100% sure about this.) In Rust, a thread is a "real" OS thread. Rust doesn't have "green" threads, but it can simulate them with async/await (and there are 50 million libraries that do this, which I personally dislike).

I don't intend to put a huge amount of money into the testing system, nor a huge amount of time into the tuner. I don't expect to tune more often than once per feature. If a tuning run takes 12 hours, and I can cut that down to 1.5 hours by using Rayon, the "simple" tuner is good enough for me.
I believe so, yes. And I don't plan to spend much either (not like I could even if I did want to), just enough to have something reasonable to run hardware-intensive things like engine testing and parameter tuning. I've honestly just considered buying a desktop and a cheap monitor and calling it good, since that would already be a good improvement over what I have now. Pixelbooks are essentially just a glorified monitor that can reliably connect to the internet, which was fine for my purposes when I got it about 5-6 months ago, and I didn't anticipate my future self becoming a chess programming addict :lol:
mvanthoor wrote: Thu Oct 21, 2021 12:09 am When / if I hit 3000 (and certainly at 3200, if I can manage that) with Rustic (without NNUE), I actually intend to divert part of my time to writing my own user interface and a Rust-based replacement for Picochess. (I like Picochess, but I completely dislike the fact that it is written in Python, uses an old version of PyChess, and is stuck at Python 2 because no one wants to update it. And I HATE Python. I won't voluntarily touch that language when I'm not getting paid for it.)

In short, if Rustic ever reaches 3200+, I don't see a lot of reason to continue development, short of NNUE, because improving beyond that will take exponentially more testing time and computing power. I'd rather add something like MCTS and/or NNUE as alternative search / evaluation options.

As soon as the engine reaches 2850, I'll consider it strong enough to make it one of my analysis engines. Stockfish 14 is cool, but half the time I don't understand its moves, because the point of a move only becomes visible 25 plies into the future...
I'm somewhat of the same opinion. The 3000 Elo mark, for me, would be much more than I ever thought I could reach when I started writing my humble little chess engine some months ago, so I'd have already far exceeded my original goal of breaking 2000+ Elo, which, again, seemed pretty impossible several months ago.

At that point, I'd likely follow the same path you outline. I was absolutely fascinated by MCTS after reading about it a couple of weeks ago, and it was definitely something I wanted to explore in the future in Blunder. NNUEs as well.

But I know I'll still be working on Blunder, hopefully making small improvements here or there, and making it more human-friendly, so it can be of use to more than just programmers and engine testers.

I also suppose Blunder is at the point where it could be decent at teaching me how to play better chess, since the whole reason I started writing an engine was that I, to put it frankly, suck very badly. I think my highest rating is something like 650 Elo. So perhaps Blunder can help get those numbers up :lol:
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Can principal variation search be worth no Elo?

Post by mvanthoor »

algerbrex wrote: Thu Oct 21, 2021 12:51 am ...and I didn't anticipate my future self becoming a chess programming addict :lol:
Here you go. With my compliments.
I'm somewhat of the same opinion. The 3000 Elo mark, for me, would be much more than I ever thought I could reach when I started writing my humble little chess engine some months ago, so I'd have already far exceeded my original goal of breaking 2000+ Elo, which, again, seemed pretty impossible several months ago.
I never doubted that I could reach 1700 Elo. I targeted 1700 for Alpha 1 (and actually almost made it, with 1695 after the first test run... before it got paired against TSCP), and even got a bit over-confident, hoping that the TT alone would already get me beyond 2000. Well.... no. But the tapered evaluation did.
At that point, I'd likely follow the same path you outline. I was absolutely fascinated by MCTS after reading about it a couple of weeks ago, and it was definitely something I wanted to explore in the future in Blunder. NNUEs as well.
One thing I'm going to try to build into Rustic at some point is a level option (_not_ uci_limitstrength, because I find that useless; I don't have a testing pool big enough to make it comparable to human ratings, so I'll probably calibrate it against CCRL ratings instead).
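The simplest version of that, and just a sketch rather than what Rustic will necessarily do, would be to map each level to a search budget (nodes per move here; the option name and the numbers are placeholders) and then calibrate those numbers with gauntlets against engines that have known CCRL ratings.

Code: Select all

// Hypothetical mapping from a custom "Level" option to a node budget per move.
// The numbers are made up; the real values would come from calibration matches
// against engines with known CCRL ratings.
fn nodes_for_level(level: u8) -> Option<u64> {
    match level {
        0 => Some(5_000),
        1 => Some(25_000),
        2 => Some(100_000),
        3 => Some(500_000),
        4 => Some(2_000_000),
        _ => None, // no limit: full strength
    }
}

// e.g. after parsing "setoption name Level value 2", the search would be
// started with a node limit of nodes_for_level(2).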
But I know I'll still be working on Blunder, hopefully making small improvements here or there, and making it more human-friendly, so it can be of use to more than just programmers and engine testers.
I'll see what I'll do at that point. If Rustic reaches 3000+ and the book on the website is complete, the goal for the engine has been reached. I'll be able to use my own engine and documentation to write a chess engine in any language I want in the future, without consulting a single other website besides my own. So _everybody_ with intermediate to early-advanced programming experience should be able to code a chess engine from scratch.
I also suppose Blunder is at the point where it could be decent in teaching me how to play better chess since the whole reason I started writing an engine was that I, to put it frankly, suck very badly. I think my highest writing is something like 650 Elo. So perhaps Blunder can help get those numbers up :lol:
Don't count on it. Engines are very strong, but they play "strange" chess. What works for a chess engine looking 20+ plies into the future won't work for a human. An engine is especially useful for pointing out where you went wrong tactically, sometimes positionally (if it has a very good evaluation such as HIARCS), but it can't teach you how to "read" a position and make plans accordingly. Better to buy something like the "Steps Method" and study that, and some Jeremy Silman books.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Can principal variation search be worth no Elo?

Post by algerbrex »

mvanthoor wrote: Thu Oct 21, 2021 1:15 am I never doubted that I could reach 1700 Elo. I targeted 1700 for Alpha 1 (and actually almost made it, with 1695 after the first test run... before it got paired against TSCP), and even got a bit over-confident, hoping that the TT alone would already get me beyond 2000. Well.... no. But the tapered evaluation did.
In hindsight that makes sense, but I remember the spot I was in a couple of months ago. I had absolutely no understanding of Elo, of how engines worked or got stronger, or of what a decent chess engine rating looked like. I think I just assumed it'd be quite difficult to crack 2000 Elo, and all but impossible to reach 3000+.
mvanthoor wrote: Thu Oct 21, 2021 1:15 am One thing I'm going to try to build into Rustic at some point is a level option (_not_ uci_limitstrength, because I find that useless; I don't have a testing pool big enough to make it comparable to human ratings, so I'll probably calibrate it against CCRL ratings instead).
I'd like to do that too at some point, but some of the discussions surrounding that topic seem pretty confusing, and it'd take me a while to fully grok all the moving pieces involved.
mvanthoor wrote: Thu Oct 21, 2021 1:15 am Don't count on it. Engines are very strong, but they play "strange" chess. What works for a chess engine looking 20+ plies into the future won't work for a human. An engine is especially useful for pointing out where you went wrong tactically, sometimes positionally (if it has a very good evaluation such as HIARCS), but it can't teach you how to "read" a position and make plans accordingly. Better to buy something like the "Steps Method" and study that, and some Jeremy Silman books.
Hmm, thanks. I'll check those resources out. That makes sense, though. I've seen even skilled chess players baffled by a computer move, only to see 20 plies later how strong it was. So I'd much rather learn to make moves whose reasoning I can actually understand.