Progress on Rustic

hgm · Post by **hgm** » Sat Oct 24, 2020 7:36 pm

There are many variants that use only the FIDE army (Suicide, Losers, Atomic, 3checks) or that can be played with a standard Chess set if you imagine the pieces move differently from usual (e.g. Shatranj, Knightrider Chess, Berolina Chess, Spartan Chess, Chess with Different Armies, Ultima). But the latter is usually awful, almost impossible for a good Chess player.

Indeed, WinBoard/XBoard supports many variants, and can be configured (even by the engine itself) to play more. The engine can redefine the piece moves and initial position through engine->GUI commands, up to quite complicated moves if necessary (e.g. sliders with bent trajectories, pieces that must hop over something, pieces that can do a double capture).

mvanthoor · Post by **mvanthoor** » Sun Oct 25, 2020 4:39 pm

Finally, a milestone has been hit.

Rustic can now play in movetime, nodes, and depth modes. I still have to implement time per game/increments ("gametime" mode in my engine). Maybe I'll do MateMode (go mate <x>) as well, for completeness' sake.

It works perfectly fine in Arena. The engine answers at exactly the right time (when it hits the number of nodes or the depth, or when the move time is up). It actually plays fairly decent but beginner-like chess. (Couldn't expect anything else with no evaluation function but piece counting and PSQTs, and no QSearch.) Speed is good; it reaches depth 7 instantly, and then depth 8-10 depending on the position.

Only one snag... it doesn't work in the Fritz 17 GUI. Chessbase is FOREVER tinkering with its time system. I've seen it being mentioned in many updates through the years. It just doesn't work. When I set 5 seconds per move, it says "5secs/move" above the clock. However, it seems to treat the game like a bullet game with 10 seconds base time and 5 seconds increment. The clock starts at 10 seconds, and then counts down. When you move, the opponent gets 5 seconds added to the game. I didn't ask for a game with base time; I asked for a 5 seconds per move game.

I assume Fritz is sending "go wtime btime winc binc", etc., which I've not yet implemented, instead of "go movetime" like Arena does. So, I'm not going to search for a bug that possibly doesn't exist; I'll just implement GameTime mode, and then I'll see if it works in Fritz 17 as well.
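To make the difference between the two command shapes concrete, here is a minimal sketch of parsing the UCI `go` parameters mentioned above. The names `GoParams` and `parse_go` are hypothetical and are not taken from Rustic's source; only the UCI keywords themselves (`movetime`, `wtime`, `btime`, `winc`, `binc`, `depth`, `nodes`) come from the protocol.

```rust
/// Subset of the limits a UCI "go" command can carry (times in milliseconds).
/// Hypothetical type for illustration only.
#[derive(Debug, Default, PartialEq)]
pub struct GoParams {
    pub movetime: Option<u64>,
    pub wtime: Option<u64>,
    pub btime: Option<u64>,
    pub winc: Option<u64>,
    pub binc: Option<u64>,
    pub depth: Option<u64>,
    pub nodes: Option<u64>,
}

/// Parse a "go ..." line. Unknown keywords are skipped, and a value that
/// does not parse as an unsigned integer leaves the field as `None`.
pub fn parse_go(line: &str) -> GoParams {
    let mut params = GoParams::default();
    let mut tokens = line.split_whitespace().peekable();

    // Accept the line with or without the leading "go".
    if tokens.peek() == Some(&"go") {
        tokens.next();
    }

    while let Some(key) = tokens.next() {
        // Pick the field this keyword fills; skip anything unrecognized.
        let slot = match key {
            "movetime" => &mut params.movetime,
            "wtime" => &mut params.wtime,
            "btime" => &mut params.btime,
            "winc" => &mut params.winc,
            "binc" => &mut params.binc,
            "depth" => &mut params.depth,
            "nodes" => &mut params.nodes,
            _ => continue,
        };
        // The next token is the value belonging to the keyword.
        *slot = tokens.next().and_then(|t| t.parse().ok());
    }
    params
}
```

With this shape, a GUI that sends `go wtime 10000 btime 10000 winc 5000 binc 5000` leaves `movetime` as `None`, which is the case an engine without a game-time mode cannot handle yet.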

Next up:

- GameTime mode
- QSearch
- Threefold repetition and 50-move rule (or should I go for the new 75-move rule? Any GUI and/or engine using this?)
- Transposition table and move picking according to TT

And then we'll have Rustic 1.0; the baseline on top of which search and evaluation improvements can be built.

hgm · Post by **hgm** » Sun Oct 25, 2020 7:28 pm

Engines could be made aware of the 50-move rule, taking it into account when evaluating positions. (My personal theory is that this only weakens them, though.) For UCI engines there is no way to claim or offer draws, however. The 75-move rule is more something for GUIs; it is a referee decision, not a player's decision.

Beating Fairy-Max is always a good first benchmark. It is an engine virtually without knowledge (not even real PST), without real move sorting (just TT move first). So if your engine is bug-free, it should be easy to beat it.

Terje · Post by **Terje** » Sun Oct 25, 2020 8:29 pm

hgm wrote: ↑Sun Oct 25, 2020 7:28 pm Engines could be made aware of the 50-move rule, taking it into account when evaluating positions. (My personal theory is that this only weakens them, though.) For UCI engines there is no way to claim or offer draws, however. The 75-move rule is more something for GUIs; it is a referee decision, not a player's decision.

Beating Fairy-Max is always a good first benchmark. It is an engine virtually without knowledge (not even real PST), without real move sorting (just TT move first). So if your engine is bug-free, it should be easy to beat it.

UCI engines can (and good ones do) still know about the 50-move rule during search: they check whether the 50-move limit has been reached when entering a node and score the node as a draw if so, even if they can't claim a draw.
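A minimal sketch of that node-entry check, assuming the engine keeps a half-move clock that counts plies since the last capture or pawn move. All names here are hypothetical. The mate exception reflects the rule that a checkmating move ends the game before any draw claim matters.

```rust
/// The 50-move rule applies after 100 plies without a capture or pawn move.
pub const FIFTY_MOVE_PLIES: u32 = 100;
/// Score returned for a drawn node (assumption: draws are scored as 0).
pub const DRAW_SCORE: i32 = 0;

/// Update the half-move clock after a move has been made:
/// captures and pawn moves reset it, everything else increments it.
pub fn next_halfmove_clock(clock: u32, is_capture: bool, is_pawn_move: bool) -> u32 {
    if is_capture || is_pawn_move {
        0
    } else {
        clock + 1
    }
}

/// Called on entering a search node. Returns `Some(DRAW_SCORE)` when the
/// node should be scored as a draw, unless the side to move is checkmated.
pub fn draw_by_fifty_moves(clock: u32, side_to_move_is_mated: bool) -> Option<i32> {
    if clock >= FIFTY_MOVE_PLIES && !side_to_move_is_mated {
        Some(DRAW_SCORE)
    } else {
        None
    }
}
```

The search would call `draw_by_fifty_moves` before generating moves and return early when it yields a score.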

hgm · Post by **hgm** » Sun Oct 25, 2020 9:33 pm

Sure, that is what "taking it into account when evaluating a position" means, right?

My conjecture was that this would decrease their playing strength, though. So the 'good ones' are probably not as good as they could have been.

Uri Blass · Post by **Uri Blass** » Sun Oct 25, 2020 10:41 pm

hgm wrote: ↑Sun Oct 25, 2020 9:33 pm Sure, that is what "taking it into account when evaluating a position" means, right?

My conjecture was that this would decrease their playing strength, though. So the 'good ones' are probably not as good as they could have been.

I think that the problem is that the 50-move rule does not come into play very often.
I think that it may be interesting to make 50 a variable rather than a fixed number in chess programs, and to hold competitions with smaller values, starting from unbalanced positions. With a 1-move rule it would be very hard to win from a balanced position, but all the pieces can probably still win against a lone king even under the 1-move rule, because I guess it is possible to win using only pawn moves in that case.

mvanthoor · Post by **mvanthoor** » Sun Oct 25, 2020 10:46 pm

hgm wrote: ↑Sun Oct 25, 2020 7:28 pm Engines could be made aware of the 50-move rule, taking it into account when evaluating positions. (My personal theory is that this only weakens them, though.) For UCI engines there is no way to claim or offer draws, however. The 75-move rule is more something for GUIs; it is a referee decision, not a player's decision.

Beating Fairy-Max is always a good first benchmark. It is an engine virtually without knowledge (not even real PST), without real move sorting (just TT move first). So if your engine is bug-free, it should be easy to beat it.

I just implemented QSearch. (PS: because those are small nodes, the nodes/second speed is now through the roof at 12.1 million NPS. Should nodes in QSearch be counted? Some say yes, some say no...)

After implementing QSearch, the play and evaluation are much smoother now. I assume the engine is bug-free. It passes all 172 perft tests, and the debug function that checks whether incrementally kept values stay equal to values computed from scratch gives no errors. The only thing it actually does for evaluation is count material and use a set of 6 PSQTs (all kept incrementally).
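For readers who have not met QSearch before, the idea can be sketched on a toy tree: at the search horizon, take the static evaluation as a "stand pat" lower bound and then try captures only. The `Node` type below is a hypothetical stand-in for a real board and is not Rustic's data structure; `static_eval` is from the point of view of the side to move at that node, and `captures` holds the positions reachable by capturing moves.

```rust
/// Toy position: a static score plus the positions reachable by captures.
/// Hypothetical stand-in for a real board representation.
pub struct Node {
    pub static_eval: i32,
    pub captures: Vec<Node>,
}

/// Fail-hard quiescence search in negamax form.
pub fn qsearch(node: &Node, mut alpha: i32, beta: i32) -> i32 {
    // Stand pat: the side to move may decline every capture.
    let stand_pat = node.static_eval;
    if stand_pat >= beta {
        return beta;
    }
    if stand_pat > alpha {
        alpha = stand_pat;
    }

    // Only captures are searched, so the recursion ends in quiet positions.
    for child in &node.captures {
        let score = -qsearch(child, -beta, -alpha);
        if score >= beta {
            return beta;
        }
        if score > alpha {
            alpha = score;
        }
    }
    alpha
}
```

In this toy model, a capture that wins 300 centipawns with no recapture raises the root score to 300, while a capture that loses material to a recapture is declined and the root keeps its stand-pat score.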

I tried it against Nero 6.1, which is only 1450 Elo in CCRL... and Rustic draws each and every game. The pattern is always the same:

- Completely destroy Nero 6.1 with regard to search speed. Nero reaches 6-8 ply, where Rustic reaches 7-11 ply, even without a hash table, without killer moves and without heuristics.
- Using this speed advantage and a quite aggressive PSQT, it takes over the center and locks down the other engine. Often, Nero's mobility is almost nothing.
- Nero often thinks that taking a pawn or piece somewhere is safe. A human would not take it because "the queen could easily get trapped"; and so it often happens. Nero grabs something somewhere, the piece gets stuck and destroyed, and Rustic is 200 - 800 centipawns up.
- It proceeds to slowly crush Nero....
- And then, because I haven't implemented three-fold repetition yet, it runs into a perpetual check.
- Sometimes it just doesn't see how to proceed: it leaves a rook out of play because it doesn't know how to open lines, or it keeps defending a set of 2 pawns where I can easily see that giving them up to create a passed pawn on the other side of the board is more valuable.

So in short, if I play it against a weak engine like Nero, it actually completely outplays it, and then it either doesn't know how to finish the game (it can see mates in 8, though, at the very least; I tested this), or it runs into some sort of perpetual check. I think I can resolve that by first implementing the threefold repetition rule.
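One common way to detect repetitions, sketched here under assumptions: the engine keeps a history of position hash keys (for example Zobrist keys) and a half-move clock. Only positions since the last capture or pawn move can repeat, and only those with the same side to move, so the scan steps back two plies at a time. The function names are hypothetical.

```rust
/// Count how often `current` already occurred in `history`.
/// `history` holds the keys of earlier positions, oldest first, and does
/// not include the current position. `halfmove_clock` is the number of
/// plies since the last capture or pawn move.
pub fn count_repetitions(history: &[u64], current: u64, halfmove_clock: usize) -> usize {
    history
        .iter()
        .rev()                // most recent position first
        .take(halfmove_clock) // nothing before an irreversible move can repeat
        .skip(1)              // one ply ago the other side was to move
        .step_by(2)           // 2, 4, 6, ... plies ago: same side to move
        .filter(|&&key| key == current)
        .count()
}

/// Threefold repetition: the current position occurred at least twice before.
pub fn is_threefold(history: &[u64], current: u64, halfmove_clock: usize) -> bool {
    count_repetitions(history, current, halfmove_clock) >= 2
}
```

Inside the search, many engines already score a single earlier occurrence as a draw, which is enough to steer away from a perpetual check when ahead; that variant would test `count_repetitions(..) >= 1` instead.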

mvanthoor · Post by **mvanthoor** » Mon Oct 26, 2020 12:55 am

I just remembered that I had to put in some pretty extreme values in the PSQT to make it play half-decent openings when it didn't have a qsearch. Basically, Rustic put all its pieces on good squares, then declared "I am best!" and then..."so, what to do now?" The pieces were completely stuck in place due to the extreme PSQT values.

I've made the values more normal, and the engine still plays decent openings because it now has QSearch. It's now able to easily wipe the floor with Nero in most games... as long as it doesn't run into a perpetual check trap while being a rook up. (I must get to implementing this third-rep rule.)

I've watched several games, and Rustic just patiently outplays Nero. In the beginning, Rustic's speed advantage is only 1, sometimes 2 ply, but as the number of pieces dwindles, this advantage increases to 4-5 plies, and then Rustic outcalculates Nero. Often Nero thinks everything is fine, except that it didn't see it can't hold on to a pinned piece. (Or actually, Rustic pinned it with the express purpose of capturing it. Nero sees this and tries to defend, but because of its lower depth it fails to see that it doesn't have enough defensive capability until it's too late.)

Rustic also uses pins, threats of skewers and forks to get out of tight situations. It's quite fun to watch.

Pity that Nero has some sort of bug; after it loses a game and Arena starts a new one, Nero hangs. It's not Rustic hanging, because if Rustic is white, it makes a move and then never gets a reply. If Nero is white, the first move never comes. So I'll have to find a different engine.

hgm · Post by **hgm** » Mon Oct 26, 2020 11:38 am

My experience is also that it is completely pointless to test an engine that doesn't recognize repetitions for strength. Better play just means it reaches a devastating advantage faster, but it still draws every game that it should have won.

In micro-Max I solved this by changing the score in the hash entry of the root to 0 after a search, and the depth to MAXDEPTH, while protecting entries with this depth from replacement in the hash store code. That solved the practical problem, but did not make it possible to plan for future repetitions.
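The trick described above can be sketched roughly as follows. This is an interpretation written in Rust for illustration, not micro-Max's actual code: the table layout, the always-replace policy, and the names `Table`, `store`, `probe` and `mark_played` are all assumptions. After a move is chosen, the root position is stored with score 0 at a sentinel depth, and the store routine refuses to overwrite such entries, so any later search that reaches the position again sees it as a draw.

```rust
/// Sentinel depth marking a position that occurred in the game (assumed value).
pub const MAX_DEPTH: u8 = 255;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Entry {
    pub key: u64,
    pub score: i32,
    pub depth: u8,
}

/// Toy always-replace hash table; `size` must be greater than zero.
pub struct Table {
    slots: Vec<Option<Entry>>,
}

impl Table {
    pub fn new(size: usize) -> Self {
        Table { slots: vec![None; size] }
    }

    fn index(&self, key: u64) -> usize {
        (key % self.slots.len() as u64) as usize
    }

    /// Normal store from the search: never overwrites a protected entry.
    pub fn store(&mut self, key: u64, score: i32, depth: u8) {
        let i = self.index(key);
        if let Some(existing) = self.slots[i] {
            if existing.depth == MAX_DEPTH {
                return;
            }
        }
        self.slots[i] = Some(Entry { key, score, depth });
    }

    /// Look up a position; returns the entry only if the full key matches.
    pub fn probe(&self, key: u64) -> Option<Entry> {
        self.slots[self.index(key)].filter(|e| e.key == key)
    }

    /// After the search: mark the root as a draw at maximum depth.
    pub fn mark_played(&mut self, root_key: u64) {
        let i = self.index(root_key);
        self.slots[i] = Some(Entry { key: root_key, score: 0, depth: MAX_DEPTH });
    }
}
```

Because protected entries are never replaced in this sketch, they accumulate over a long game; a real table would need some way to release them, for example after an irreversible move.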

mvanthoor · Post by **mvanthoor** » Mon Oct 26, 2020 12:34 pm

hgm wrote: ↑Mon Oct 26, 2020 11:38 am My experience is also that it is completely pointless to test an engine that doesn't recognize repetitions for strength. Better play just means it reaches a devastating advantage faster, but it still draws every game that it should have won.

In micro-Max I solved this by changing the score in the hash entry of the root to 0 after a search, and the depth to MAXDEPTH, while protecting entries with this depth from replacement in the hash store code. That solved the practical problem, but did not make it possible to plan for future repetitions.

Yesterday and tonight I ran a tournament against Nero 6.1 (it seems to work if I have Arena restart the engines on every game) and Fairy-Max. I ran only 20 games against each, at 10 seconds per move (which Fairy-Max doesn't seem to use).

Observations:
- Nero can hold out against Rustic, until the search depth disadvantage gets too big, and then it'll make a mistake somewhere and lose. Rustic scored 75%: It won 10 of the 20 games and drew 10 (many of which it didn't have to draw).

- Something similar happens against Fairy-Max. Fairy-Max doesn't use all of its time, but in the time it does use, it reaches the same depth as Rustic in the middle game (7-9 ply). Rustic plays well, and often gets into advantageous positions, but as the endgame gets closer, Fairy-Max accumulates a huge search depth advantage due to its TT, often more than 15 ply. Rustic gets outplayed. Because it also doesn't recognize three-fold reps yet, it draws won positions, sometimes loses drawn positions, and because of the search depth deficit, it sometimes makes tactical mistakes, squandering a won position. Fairy-Max won half of the games (many of which it shouldn't have), and drew the other half (some of which it should have lost). Fairy-Max scored 75%, which makes it roughly 200 Elo stronger than Rustic is now.

That means that Rustic is somewhere around 1700 Elo now, +/- 50 or so. (Very rough estimate).
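The step from a 75% score to "about 200 Elo" follows from the standard logistic Elo model, where the rating difference for an expected score s is 400 * log10(s / (1 - s)). A small sketch (the function name is hypothetical); with only 20 games per opponent the error margin is large, so the result is no more than a rough indication:

```rust
/// Rating difference implied by a score fraction under the logistic Elo
/// model. `score` must lie strictly between 0 and 1.
pub fn elo_diff(score: f64) -> f64 {
    400.0 * (score / (1.0 - score)).log10()
}
```

For a score of 0.75 this gives 400 * log10(3), about 191 Elo, and a score of 0.5 gives a difference of 0.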

I'll add three-fold reps, 50-move rule, and a TT and see what it can do then. (And sending a PV, so I can actually see what it thinks apart from the first move.)
