New engine: Stash

Gabor Szots
Posts: 1362
Joined: Sat Jul 21, 2018 7:43 am
Location: Szentendre, Hungary
Full name: Gabor Szots

Re: New engine: Stash

Post by Gabor Szots »

mhouppin wrote: Fri Apr 10, 2020 7:19 pm The matches against version 13 show a gain of 120 +- 30 Elo, so I would expect this version to play at around 2200 Elo at CCRL Blitz.

Good evening to everyone :D
Hi Morgan,

Yes, my first results seem to confirm your hopes of 14 surpassing 13 considerably. I am not sure it is going to reach 2200, though there are still a lot of games to play.
Owing to shortness of time, v14 results will not make it into this week's update; you are going to see only v13 results.
Gabor Szots
CCRL testing group
mhouppin
Posts: 115
Joined: Wed Feb 12, 2020 5:00 pm
Full name: Morgan Houppin

Re: New engine: Stash

Post by mhouppin »

Gabor Szots wrote: Fri Apr 10, 2020 10:12 pm
mhouppin wrote: Fri Apr 10, 2020 7:19 pm The matches against version 13 show a gain of 120 +- 30 Elo, so I would expect this version to play at around 2200 Elo at CCRL Blitz.

Good evening to everyone :D
Hi Morgan,

Yes, my first results seem to confirm your hopes of 14 surpassing 13 considerably. I am not sure it is going to reach 2200, though there are still a lot of games to play.
Owing to shortness of time, v14 results will not make it into this week's update; you are going to see only v13 results.
Thanks a lot for the testing! Yeah, I am a bit optimistic with the 2200, but I am almost sure that v14 can score a decent 2160-2170 Elo.
The Elo-limiting factor may be the eval function (PST + 15cp initiative + 100cp castling rights in the middlegame seems a bit too light to allow for sharp play)...
mhouppin
Posts: 115
Joined: Wed Feb 12, 2020 5:00 pm
Full name: Morgan Houppin

Re: New engine: Stash

Post by mhouppin »

Hi there, version 15 is finally out !

This time it has taken quite a lot of new features to get enough Elo gain to do a release (I got disappointed by version 13 and 14 results at CCRL, not that they were bad, but the Elo gain was lower than before). New things:

- More aggressive Null Move Pruning reductions and Late Move Reductions (both now take the search depth into account) and disabled recursive verification for Null Move Search (if the depth is high, the search is redone at reduced depth without the null move and without allowing for sub-NMP);
- Killer Heuristic implemented;
- TT writing and probing extended to Qsearch nodes (always scored with depth 0);
- Improved evaluation function (added mobility, backward pawns, passed pawns and candidate passed pawns);
- Added insufficient mating material rescoring (in KB versus KP, the side with the bishop shouldn't get a positive score; at best it gets a draw).
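
The insufficient-material rule in the last item can be sketched roughly as follows. This is only an illustration in Python, not Stash's actual code: `Material`, `can_mate` and `rescore` are hypothetical names, and the "a king with at most one minor piece cannot mate" test is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass
class Material:
    """Piece counts for one side, king excluded (hypothetical helper)."""
    pawns: int = 0
    knights: int = 0
    bishops: int = 0
    rooks: int = 0
    queens: int = 0


def can_mate(m: Material) -> bool:
    """Simplified test: a pawn (which may promote), a rook, a queen,
    or at least two minor pieces counts as mating material."""
    if m.pawns or m.rooks or m.queens:
        return True
    return m.knights + m.bishops >= 2


def rescore(score_cp: int, white: Material, black: Material) -> int:
    """Clamp a score (centipawns, White's point of view) so that a side
    without mating material never gets better than a draw."""
    if score_cp > 0 and not can_mate(white):
        return 0
    if score_cp < 0 and not can_mate(black):
        return 0
    return score_cp


# KB versus KP: White has the bishop, Black has the pawn.
white = Material(bishops=1)
black = Material(pawns=1)
print(rescore(200, white, black))   # 0   (the bishop side cannot win)
print(rescore(-50, white, black))   # -50 (the pawn side still can)
```

The clamp is one-sided on purpose: the side with the pawn keeps any advantage it has, since the pawn may still promote.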

I'm starting regression tests to get an idea of the Elo improvement (I'm guessing +30 Elo at minimum, and hoping for +50 Elo).

Stay safe, and good afternoon to everyone ! :)
Gabor Szots
Posts: 1362
Joined: Sat Jul 21, 2018 7:43 am
Location: Szentendre, Hungary
Full name: Gabor Szots

Re: New engine: Stash

Post by Gabor Szots »

mhouppin wrote: Sun Apr 26, 2020 4:14 pm This time it has taken quite a lot of new features to get enough Elo gain to do a release (I got disappointed by version 13 and 14 results at CCRL, not that they were bad, but the Elo gain was lower than before).
Morgan, it should come as no surprise to you that the stronger Stash gets, the harder it is to keep up the pace of improvement.
Gabor Szots
CCRL testing group
mhouppin
Posts: 115
Joined: Wed Feb 12, 2020 5:00 pm
Full name: Morgan Houppin

Re: New engine: Stash

Post by mhouppin »

Gabor Szots wrote: Sun Apr 26, 2020 4:48 pm
mhouppin wrote: Sun Apr 26, 2020 4:14 pm This time it has taken quite a lot of new features to get enough Elo gain to do a release (I got disappointed by version 13 and 14 results at CCRL, not that they were bad, but the Elo gain was lower than before).
Morgan, it should come as no surprise to you that the stronger Stash gets, the harder it is to keep up the pace of improvement.
Yeah, I was expecting it to get harder to gain Elo as the engine got stronger. I also noticed that self-play Elo tests should not be used alone to evaluate a gain in performance (as shown by versions 13 and 14, where the expected gains were +120 and +100 Elo and the actual gains came out at around +60 Elo, which I'm still pretty happy about, by the way).
I'm currently generating matches against AICE (which I would expect to be rated 120-180 Elo higher than the current Stash version) to get a real idea of what progress has been made.
Hoping to hit the 3000 Elo milestone one day ^^
Gabor Szots
Posts: 1362
Joined: Sat Jul 21, 2018 7:43 am
Location: Szentendre, Hungary
Full name: Gabor Szots

Re: New engine: Stash

Post by Gabor Szots »

mhouppin wrote: Sun Apr 26, 2020 5:19 pm I'm currently generating matches against AICE (which I would expect to be rated 120-180 Elo higher than the current Stash version) to get a real idea of what progress has been made.
Hoping to hit the 3000 Elo milestone one day ^^
If you ask me, I would abandon self-tests altogether and would select 3-5 different opponents to assess progress.

As for that 3000, I guess you will have a job for the next couple of years.
Gabor Szots
CCRL testing group
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: New engine: Stash

Post by mvanthoor »

mhouppin wrote: Sun Apr 26, 2020 4:14 pm Hi there, version 15 is finally out !

This time it has taken quite a lot of new features to get enough Elo gain to do a release (I got disappointed by version 13 and 14 results at CCRL, not that they were bad, but the Elo gain was lower than before).
Wait until you get into the 3200+ ELO range. At those levels, people are very happy if they make a change and their engine has gained +10 ELO outside the error margins after playing 10,000 test games :lol:
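
To put a number on "outside the error margins", here is a minimal sketch in Python of how an Elo difference and an approximate 95% interval can be derived from a win/draw/loss record. It assumes the standard logistic Elo model and a normal approximation for the mean score; the function names are made up for this example and the win/draw/loss split is invented, not taken from any real test.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a mean score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) using a normal approximation of the mean score.

    Breaks down if the score minus/plus the margin leaves the open
    interval (0, 1), e.g. for very lopsided or very short matches.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, which takes the values 1, 0.5 or 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - margin),
            elo_from_score(score + margin))


# Hypothetical 10,000-game test that ends dead even with 50% draws.
elo, low, high = elo_with_margin(2500, 5000, 2500)
print(f"{elo:+.1f} Elo, 95% interval [{low:+.1f}, {high:+.1f}]")
```

Under these assumptions the interval for 10,000 games is on the order of ±5 Elo, which is why a +10 Elo gain needs that many games before it stands clear of the noise.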

You have already made it to over 2000 ELO with your engine. As you started it from scratch, that is quite an achievement.

I am myself now just (slowly) creeping up on finishing the first version of Rustic. (It could have been done already, but I spent almost a month chipping away at make, unmake, and related functions to speed them up after I discovered that Weiss was 35% faster while using exactly the same techniques...)

I'll be happy if my engine in its very first incarnation reaches 1200 ELO :lol:

I will definitely be using (older) versions of Stash in my first tests.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: New engine: Stash

Post by Ras »

mhouppin wrote: Sun Apr 26, 2020 5:19 pm I also noticed that self-play Elo tests should not be used alone to evaluate a gain in performance (as shown by versions 13 and 14, where the expected gains were +120 and +100 Elo and the actual gains came out at around +60 Elo)
That's pretty OK actually - as a rough rule of thumb, the real Elo gain is 50-70% of the Elo gain in self-play if the patch is valid. Testing against other engines is still needed because it may be that a patch has a regression that one's own engine doesn't see and exploit, but other engines do. I use self-play for initial validation: if I don't see some Elo gain even in self-play, then I abandon the idea and don't bother with testing against other engines.
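
As a quick worked example of that rule of thumb, here is a small Python sketch. The 50-70% factors come from the post above; the function name is invented for illustration, and the result is a rough expectation, not a prediction.

```python
def expected_real_gain(self_play_gain: float,
                       low_factor: float = 0.5,
                       high_factor: float = 0.7):
    """Scale a self-play Elo gain by the rough 50-70% rule of thumb."""
    return self_play_gain * low_factor, self_play_gain * high_factor


# Self-play gains reported earlier in the thread for versions 13 and 14.
for gain in (120, 100):
    low, high = expected_real_gain(gain)
    print(f"self-play +{gain} Elo -> roughly +{low:.0f} to +{high:.0f} Elo")
```

For the +120 and +100 self-play gains this gives roughly +60 to +84 and +50 to +70 Elo, so an actual gain of around +60 sits within the expected range.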
Rasmus Althoff
https://www.ct800.net
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: New engine: Stash

Post by mvanthoor »

mhouppin wrote: Sun Apr 26, 2020 5:19 pm Yeah, I was expecting it to get harder to gain Elo as the engine got stronger. I also noticed that self-play Elo tests should not be used alone to evaluate a gain in performance (as shown by versions 13 and 14, where the expected gains were +120 and +100 Elo and the actual gains came out at around +60 Elo, which I'm still pretty happy about, by the way).
Self-tests are almost useless for determining the rating your engine would get against other engines. The higher rating only proves that your new version knows something your old version doesn't. The reason is that both engines are exactly the same, except for one new function in the new version. If that function works, the new version will be able to do things the old version can't. Not having this function is therefore a weakness in the old version, and the new version will exploit it over and over again. That way, your ELO gain will be magnified.

If you use a pool of 9 engines of different strengths, and then have your engine run a gauntlet against them, you should see each new version ending higher and higher in the tournaments.
I'm currently generating matches against AICE (which I would expect to be rated 120-180 Elo higher than the current Stash version) to get a real idea of what progress has been made.
Hoping to hit the 3000 Elo milestone one day ^^
Running matches against only one engine is also a bad idea. You could end up optimizing your engine to defeat that particular engine; that will probably hurt its results against other engines, so an ELO rating calculated against AICE alone would be inflated.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
mhouppin
Posts: 115
Joined: Wed Feb 12, 2020 5:00 pm
Full name: Morgan Houppin

Re: New engine: Stash

Post by mhouppin »

mvanthoor wrote: Sun Apr 26, 2020 9:48 pm Running matches against only one engine is also a bad idea. You could end up optimizing your engine to defeat that particular engine; that will probably hurt its results against other engines, so an ELO rating calculated against AICE alone would be inflated.
You are right. I struggled a bit to find engines in the CCRL list with available Windows 64-bit binaries, but I created a small pool of 6 chess engines to estimate Stash's rating: BikJump (2102 Elo), KnockOut (2108 Elo), Embla (2115 Elo), FoxSEE (2136 Elo), Protej (2176 Elo), and NG-Play (2195 Elo). I stopped the tests with AICE, as the Elo gap might distort the ratings. 30 matches at 3+2 against each, using an 8-ply book with reversed colors. Results soon in the thread!
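
For readers who want to turn such a gauntlet result into a rating estimate, here is a minimal Python sketch. It assumes the usual logistic Elo formula applied to the average opponent rating, which is only an approximation (rating tools such as those used by CCRL fit all games jointly). The pool ratings are the ones listed above; the function name and the example scores are invented for illustration.

```python
import math

# Pool ratings as listed in the post above.
POOL = {
    "BikJump": 2102,
    "KnockOut": 2108,
    "Embla": 2115,
    "FoxSEE": 2136,
    "Protej": 2176,
    "NG-Play": 2195,
}


def performance_rating(opponent_ratings, score_fraction: float) -> float:
    """Approximate performance rating from the average opponent rating
    and the overall score fraction (strictly between 0 and 1)."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score_fraction must be strictly between 0 and 1")
    ratings = list(opponent_ratings)
    average = sum(ratings) / len(ratings)
    return average + 400.0 * math.log10(score_fraction / (1.0 - score_fraction))


# Hypothetical overall scores against the whole pool.
for score in (0.45, 0.50, 0.55):
    print(f"score {score:.0%}: about {performance_rating(POOL.values(), score):.0f} Elo")
```

A 50% score against this pool corresponds to its average rating of about 2139 Elo, so the pool is centred close to the 2160-2170 range hoped for earlier in the thread.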