A lot of creativity can be used here to complete a game. Don't overthink it: just random moves to get it working. You could also build a "game tree" in which games with a common ancestry all share the same moves up to the point where they branch off. You can generate a lot of games that way that start the same but end differently!
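For illustration, here is a rough C sketch of that branching idea, assuming hypothetical engine hooks (GenMoves, MakeMove, UnmakeMove, GameOver, RecordResult) that stand in for whatever move generator and game-over test you already have; none of this is code from the thread.
Code:
#include <stdlib.h>

#define MAX_PLY 512

typedef int Move;                  /* placeholder move type               */
extern int  GenMoves(Move *list);  /* fills list, returns move count      */
extern void MakeMove(Move m);
extern void UnmakeMove(Move m);
extern int  GameOver(void);        /* nonzero when the game has ended     */
extern void RecordResult(void);    /* tally W/L/D for the finished game   */

/* Play random moves from the current position until the game ends,
   then unwind back to where we started. */
static void RandomPlayout(void)
{
    Move line[MAX_PLY];
    int  n = 0;

    while (n < MAX_PLY && !GameOver()) {
        Move list[256];
        int  count = GenMoves(list);
        if (count == 0) break;
        line[n] = list[rand() % count];
        MakeMove(line[n++]);
    }
    RecordResult();
    while (n > 0) UnmakeMove(line[--n]);
}

/* "Game tree" idea: play one shared random prefix of prefixPly moves,
   then let several games branch off from that common ancestor. */
static void BranchedGames(int prefixPly, int branches)
{
    Move prefix[MAX_PLY];
    int  n = 0;

    while (n < prefixPly && n < MAX_PLY && !GameOver()) {
        Move list[256];
        int  count = GenMoves(list);
        if (count == 0) break;
        prefix[n] = list[rand() % count];
        MakeMove(prefix[n++]);          /* the common ancestry */
    }
    for (int g = 0; g < branches; g++)
        RandomPlayout();                /* same start, different endings */

    while (n > 0) UnmakeMove(prefix[--n]);
}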
Games/second Engines
Moderator: Ras
-
- Posts: 66
- Joined: Mon Jan 16, 2017 6:28 pm
Re: Games/second Engines
-
- Posts: 1524
- Joined: Wed Apr 21, 2010 4:58 am
- Location: Australia
- Full name: Nguyen Hong Pham
Re: Games/second Engines
I have tried to play several games per second. However, I think the main problem with such games is that time measurement and event handling are not precise enough for such short periods per move. Almost all systems with GUIs are designed for humans, whose actions take much longer. I have had bad experiences fighting the system's sudden delays, which may be as long as seconds. Heavy background tasks may delay everything. Accessing hard disks (say, for logging, probing books, Syzygy…) takes time too and is sometimes delayed significantly.
Furthermore, almost all of our engines, tools, and GUIs have been tested down to bullet games, but not faster.
I think you may make it work by using your own tournament manager on a system without a GUI (such as a Linux server)… but that may take too much effort and would be hard for other users/testers to apply.
If you need a large number of games, IMO it's better to play multiple bullet games (say, a few seconds per game) concurrently (say, 10 games).
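As a rough illustration of playing several bullet games concurrently, here is a sketch using POSIX threads; PlayOneBulletGame is a hypothetical stand-in for however a single engine-vs-engine game gets launched, not an API from any particular GUI or tournament manager.
Code:
#include <pthread.h>
#include <stdio.h>

#define CONCURRENCY 10      /* e.g. 10 games at a time, as suggested above */
#define TOTAL_GAMES 1000

/* Hypothetical: play one complete bullet game and return +1/0/-1
   for a white win / draw / white loss. */
extern int PlayOneBulletGame(int gameIndex);

static int results[TOTAL_GAMES];

static void *Worker(void *arg)
{
    long id = (long)arg;
    /* Each worker takes every CONCURRENCY-th game, so the workload is
       split evenly without any locking on the result array. */
    for (int g = (int)id; g < TOTAL_GAMES; g += CONCURRENCY)
        results[g] = PlayOneBulletGame(g);
    return NULL;
}

int main(void)
{
    pthread_t tid[CONCURRENCY];

    for (long i = 0; i < CONCURRENCY; i++)
        pthread_create(&tid[i], NULL, Worker, (void *)i);
    for (int i = 0; i < CONCURRENCY; i++)
        pthread_join(tid[i], NULL);

    int wins = 0, draws = 0, losses = 0;
    for (int g = 0; g < TOTAL_GAMES; g++) {
        if (results[g] > 0)       wins++;
        else if (results[g] == 0) draws++;
        else                      losses++;
    }
    printf("W %d  D %d  L %d\n", wins, draws, losses);
    return 0;
}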
https://banksiagui.com
The chess GUI with the most features, based on the open-source Banksia chess tournament manager
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Games/second Engines
I have been yelling why since 2006 but no one hears me! To do real time reinforcement learning (RTRL). RomiChess has demonstrated from January of 2006 that it could learn to win from a specific position against any of the top engines from that time in less than 100 training games and in fact win a one hundred game match. From ~2012 AlphaZero also demonstrated the value of 100 games of reinforcement learning on the most popular opening positions. Naysayers say that learning from fast time controls has very little value in longer time controls. I have lost track of how many times I have heard that. They have been proven wrong!
lithander wrote: ↑Fri Jul 01, 2022 12:59 pm Many engine programmers use ultra fast time controls for testing. With 1 second + 100ms strong engines still play at grandmaster level.
If you want to play dozens of hundreds of games per second I would assume the whole overhead of using a GUI and an engine thread that you talk to with text commands becomes a bottleneck and needs to be removed.
I could write a program in C# in a day that utilizes my chess engine's search routine directly to play hundreds of games per second utilizing all available cores... theses games would be pretty shallow in terms of search depth but otherwise they will just be normal chess games like you'd expect them from an engine with 200 Elo less when given a more normal amount of time. But.. why would I want to do that?
So what can be expected? Well using the ten Nunn positions RomiChess played ten matches against Glaurung2 for 20 games per match. In the first match RomiChess scored 5%. In the tenth match RomiChess scored 95%. Against Bob's humongous opening book and the top ten engines RomiChess showed a +50 ELO increase for the last thousand games of 6,000 games played.
One can do RTRL by playing internally as many games as can be played in ~half the available time. After each game the positions (moves) of the game are placed in the TT. A bonus/penalty value is updated for each position. All the positions of the winning side get a small bonus and all the positions of the losing side get a small penalty. In drawn games all positions get a tiny penalty. Good moves can get a penalty and bad moves can get a bonus but these are corrected over the course of many games. The bonus/penalty value is then used to modify the score saved in the TT.
Learned RL values work well while searching because they give the engine information from far beyond the horizon, even unto the end of many games. But RL values do not dominate the search. They only nudge the main search in more promising directions. If all the shallow searches miss deep tactics the main search will still find those tactics and act accordingly. Back in 2006 I estimated that RomiChess fully trained (after a million games) would perform at least 1000 Elo better. So maybe you might want to do that! idk, maybe.
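The following is not RomiChess code, just a minimal sketch of the bonus/penalty bookkeeping described above, with an invented LearnEntry record and HashLookup helper, "winning side" interpreted as the side to move in each stored position, and arbitrary constants.
Code:
#include <stdint.h>

typedef struct {
    uint64_t key;        /* position hash                              */
    int16_t  score;      /* score stored for the position              */
    int16_t  rlAdjust;   /* accumulated reinforcement bonus/penalty    */
} LearnEntry;

extern LearnEntry *HashLookup(uint64_t key);   /* find/create an entry */

enum { WIN_BONUS = 4, LOSS_PENALTY = 4, DRAW_PENALTY = 1 };

/* After each internal game: every position of the winning side gets a
   small bonus, every position of the losing side a small penalty, and
   in a draw every position gets a tiny penalty. */
void UpdateAfterGame(const uint64_t *keys, const int *whiteToMove,
                     int nPositions, int result /* +1 white won, 0 draw, -1 black won */)
{
    for (int i = 0; i < nPositions; i++) {
        LearnEntry *e = HashLookup(keys[i]);
        if (!e) continue;

        if (result == 0)
            e->rlAdjust -= DRAW_PENALTY;
        else if (whiteToMove[i] == (result > 0))
            e->rlAdjust += WIN_BONUS;      /* the side to move here won  */
        else
            e->rlAdjust -= LOSS_PENALTY;   /* the side to move here lost */
    }
}

/* When the search probes the table it sees the stored score nudged by
   the accumulated adjustment, so the learned results steer the search
   without dictating the move. */
int AdjustedScore(const LearnEntry *e)
{
    return e->score + e->rlAdjust;
}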

-
- Posts: 28353
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Games/second Engines
The point is what you want to learn. What you do makes the engine learn an opening book: the parameters changed by your learning algorithm are the weights of the book moves in the various positions. Very nice, of course, if you want to build an opening book, but useless if you want to tune an evaluation, because there is no feedback from the results to the evaluation parameters.
But this is a bit off topic, since the issue was how fast you can generate games. Not what you can use these game results for.
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Games/second Engines
The bonus/penalty values do not dictate what move to play. All they do is adjust the scores stored in the TT. They can only influence what the search returns as the best move. For example, if a particular move leads to many losses, then even though it might evaluate as good, the penalties might cause the search to play a different, closely scored move instead.
hgm wrote: ↑Sat Jul 02, 2022 11:36 am The point is what you want to learn. What you do makes the engine learn an opening book: the parameters changed by your learning algorithm are the weights of the book moves in the various positions. Very nice, of course, if you want to build an opening book, but useless if you want to tune an evaluation, because there is no feedback from the results to the evaluation parameters.
But this is a bit off topic, since the issue was how fast you can generate games. Not what you can use these game results for.
Thomas asked why he'd want to do that. I answered his question. When the evaluation function does not get it right, RL helps the evaluation function out. And there is feedback from the results of the games for all moves of the games, not just for what one might consider the opening moves. In RomiChess there are two types of learning melded together: monkey see monkey do, and Pavlov's dog experiments. Only monkey see monkey do is a type of pseudo book creation, and Romi does play the learned move if the stats are good enough. But the RL is different, because before the search the entire subtree with all the RL values is loaded into the TT, and those values only influence the search. It has nothing to do with an opening book.
Here is the learn file to TT function. Please show me where there is anything that resembles a book?
Code:
moves Learn2Hash() {
    s32 Ply;
    moves m;
    poshashs *p;

    Ply = ply - base;
    SetThread();                        /* find the learn-file record for the current position */
    m.m = 0;
    if(!pondering && thread) m = GetBookMove(thread);
    if(m.m) return m;                   /* a learned ("monkey see, monkey do") move with good
                                           enough stats is simply played */

    /* Otherwise walk the learn-file subtree below the current position,
       following child (one ply deeper) and sibling (alternative move) links,
       and seed the TT with the stored, RL-adjusted scores. */
    while(Ply > 0 || thread) {
        while(Ply < 40 && thread) {     /* descend at most 40 plies */
            LoadRecord(thread, &learnRecord);
            m.fs = learnRecord.fs;
            m.ts = learnRecord.ts;
            m.typ = learnRecord.type;
            m.flag = learnRecord.flags;
            if(!VerifyMove(&m)) goto dontmove;
            h->thread = thread;
            if(learnRecord.depth) {
                p = PosLook();
                /* only overwrite a TT entry that is not deeper than the learned one */
                if(!p || learnRecord.depth >= p->depth) {
                    PosStore(learnRecord.score, learnRecord.depth, EXACT, &noMove);
                }
            }
            MakeMove(&m);
            Ply++;
            thread = learnRecord.child;
        }
        TakeBack();                     /* back up one ply ... */
        Ply--;
        thread = h->thread;
        LoadRecord(thread, &learnRecord);
dontmove:
        thread = learnRecord.sibling;   /* ... and continue with the next alternative move */
    }
    m.m = 0;
    return m;                           /* no move is forced; the search now uses the seeded TT */
}
-
- Posts: 28353
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Games/second Engines
Well, it still only affects what you would play in positions from which the position with the adjusted score could be reached, which is equivalent to an opening book. That changing the score of a position in the TT can alter the move choice in all preceding positions is just a technical issue; you could have reached exactly the same effect by backpropagating the score alteration to all positions leading up to it, changing their weights too.
This type of learning doesn't do anything for positions from which none of the positions in the played games can be reached.
-
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Games/second Engines
I'll try to modify Mike's suggestion to address HGM's criticism. Please excuse me if all I achieve is to demonstrate my ignorance. I can hardly follow what you guys have been discussing!
So our engine is getting tasked to find the best move on a position in X seconds via UCI/Xboard protocol. We take half the available time to play as many games as possible from that position with our current evaluation. We will gather a random sampling of positions from each game and store the outcome with it.
Such a set of annotated positions is used by many engines to find good weights for their evaluation with something like Texel's tuning or gradient descent tuning. But usually the set of positions has to represent all possible positions an engine could face from the start position. But the newly generated positions are only a subset of that, specifically from the tree that can be reached from the position the engine is currently tasked to search.
After we have gathered this new training set, which is highly tailored to the current position, we use it to make a few training passes adjusting the weights of our evaluation. E.g. in a very simple PESTO-style engine we would adjust the values in the PSQTs and the phase thresholds.
Then we use the remaining time to conduct a normal search using these adjusted weights. We would hope to achieve better results despite having only half the amount of time because our evaluation is now tuned to the subtree we are going to search.
For example we don't have to find a compromise of values that work equally well in all castling scenarios. After white has castled king-side and black has castled queen-side it's perfectly fine to let the evaluation weights change to better support that kind of position even if it wouldn't work well for now unreachable positions like the starting position. We can adjust more and more specifically to an ever shrinking subset of "all legal chess positions".
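To make the tuning step concrete, here is a self-contained sketch of a Texel-style gradient update on PSQT weights, assuming each sampled position has already been reduced to per-(piece, square) feature counts from white's point of view; the names, feature layout and constants are invented for illustration.
Code:
#include <math.h>
#include <stddef.h>

/* One training sample: per-(piece, square) feature counts seen from
   white's point of view (white occupancy minus mirrored black
   occupancy) plus the game outcome from white's point of view. */
typedef struct {
    double features[6][64];   /* pawn..king x square                  */
    double outcome;           /* 1.0 win, 0.5 draw, 0.0 loss          */
} Sample;

static double psqt[6][64];    /* the weights being tuned              */

static double Sigmoid(double evalCp, double k)
{
    return 1.0 / (1.0 + exp(-k * evalCp));
}

static double Evaluate(const Sample *s)
{
    double e = 0.0;
    for (int p = 0; p < 6; p++)
        for (int sq = 0; sq < 64; sq++)
            e += psqt[p][sq] * s->features[p][sq];
    return e;
}

/* One pass of gradient descent on the squared error between the
   predicted win probability and the observed game outcome. */
void TunePass(const Sample *samples, size_t n, double k, double lr)
{
    for (size_t i = 0; i < n; i++) {
        double pred = Sigmoid(Evaluate(&samples[i]), k);
        /* d(error)/d(weight) for error = (pred - outcome)^2 */
        double common = 2.0 * (pred - samples[i].outcome)
                            * pred * (1.0 - pred) * k;
        for (int p = 0; p < 6; p++)
            for (int sq = 0; sq < 64; sq++)
                psqt[p][sq] -= lr * common * samples[i].features[p][sq];
    }
}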

-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Games/second Engines
Thanks Thomas,
lithander wrote: ↑Sun Jul 03, 2022 11:59 am I'll try to modify Mike's suggestion to address HGM's criticism. Please excuse me if all I achieve is to demonstrate my ignorance. I can hardly follow what you guys have been discussing!
So our engine is getting tasked to find the best move on a position in X seconds via UCI/Xboard protocol. We take half the available time to play as many games as possible from that position with our current evaluation. We will gather a random sampling of positions from each game and store the outcome with it.
Such a set of annotated positions is used by many engines to find good weights for their evaluation with something like Texel's tuning or gradient descent tuning. But usually the set of positions has to represent all possible positions an engine could face from the start position. But the newly generated positions are only a subset of that, specifically from the tree that can be reached from the position the engine is currently tasked to search.
After we have gathered this new training set, which is highly tailored to the current position, we use it to make a few training passes adjusting the weights of our evaluation. E.g. in a very simple PESTO-style engine we would adjust the values in the PSQTs and the phase thresholds.
Then we use the remaining time to conduct a normal search using these adjusted weights. We would hope to achieve better results despite having only half the amount of time because our evaluation is now tuned to the subtree we are going to search.
For example we don't have to find a compromise of values that work equally well in all castling scenarios. After white has castled king-side and black has castled queen-side it's perfectly fine to let the evaluation weights change to better support that kind of position even if it wouldn't work well for now unreachable positions like the starting position. We can adjust more and more specifically to an ever shrinking subset of "all legal chess positions".
That sounds like a good modification, combining the way I described it with the way you already do things in your engine. One slight detail may be a misconception, though. The sampling is not random. It is an "intelligently guided" examination of the search space. The first "fast" game is stored in the TT with an RL adjustment. The second fast game already uses that RL adjustment when it searches. And the third fast game uses the adjustments from the previous two games. So in each new fast game the search simply searches the most promising line as it now understands it. Do that 1000 times and "we" have the most promising lines searched all the way to the end of the game, which gives the engine information far beyond the horizon! One optimization done in RomiChess's after-game RL is to give larger bonuses/penalties to deeper moves, so as to cause deeper moves to change more frequently. When the regular search then starts, it uses the RL data just like the fast searches do, except it searches all lines deeper, so it still finds tactics that the fast searches miss. And the fast searches do find tactics that are even deeper in the tree than the horizon, because they "stumble" upon them, although the tactics found there may be relatively shallow.
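Here is one possible reading of that loop as a sketch, not RomiChess code: spend roughly half the move time on fast internal games, each of which already searches with the adjustments left behind by the previous games, and weight the adjustments by depth. PlayFastGame and AdjustTTEntry are invented helpers and the scaling is arbitrary.
Code:
#include <time.h>
#include <stdint.h>

#define MAX_PLY 512

/* Invented helpers: play one fast internal game (its search already
   probes the TT and therefore sees earlier adjustments), recording the
   hash key of every position it passes through, and apply a
   bonus/penalty to the TT entry of one position. */
extern int  PlayFastGame(uint64_t keys[], int *length);  /* +1 white, 0 draw, -1 black */
extern void AdjustTTEntry(uint64_t key, int delta);

/* Spend roughly half of the allotted move time on fast internal games.
   Each new game follows the currently most promising line, because it
   searches with the adjustments accumulated from earlier games. */
void RealTimeRL(double secondsForMove)
{
    clock_t  start  = clock();
    double   budget = secondsForMove / 2.0;
    uint64_t keys[MAX_PLY];

    while ((double)(clock() - start) / CLOCKS_PER_SEC < budget) {
        int length = 0;
        int result = PlayFastGame(keys, &length);

        for (int ply = 0; ply < length; ply++) {
            /* Larger adjustments deeper in the line, so deep moves are
               revised more readily (arbitrary scaling).  Assumes white
               is to move at ply 0. */
            int delta    = 1 + ply / 16;
            int moverWon = ((ply % 2 == 0) == (result > 0));

            if (result == 0)   AdjustTTEntry(keys[ply], -1);
            else if (moverWon) AdjustTTEntry(keys[ply], +delta);
            else               AdjustTTEntry(keys[ply], -delta);
        }
    }
    /* The remaining half of the time goes to the normal search, which
       probes the same TT and is nudged by these adjustments. */
}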
-
- Posts: 66
- Joined: Mon Jan 16, 2017 6:28 pm
Re: Games/second Engines
I do remember the method(s) you mentioned, Mike, and they do have merit, as does what you stated about AlphaZero.
Michael J Sherwin wrote: I have been yelling why since 2006 but no one hears me! To do real time reinforcement learning (RTRL). RomiChess has demonstrated from January of 2006 that it could learn to win from a specific position against any of the top engines from that time in less than 100 training games and in fact win a one hundred game match. From ~2012 AlphaZero also demonstrated the value of 100 games of reinforcement learning on the most popular opening positions. Naysayers say that learning from fast time controls has very little value in longer time controls. I have lost track of how many times I have heard that. They have been proven wrong!
So what can be expected? Well using the ten Nunn positions RomiChess played ten matches against Glaurung2 for 20 games per match. In the first match RomiChess scored 5%. In the tenth match RomiChess scored 95%. Against Bob's humongous opening book and the top ten engines RomiChess showed a +50 ELO increase for the last thousand games of 6,000 games played.
One can do RTRL by playing internally as many games as can be played in ~half the available time. After each game the positions (moves) of the game are placed in the TT. A bonus/penalty value is updated for each position. All the positions of the winning side get a small bonus and all the positions of the losing side get a small penalty. In drawn games all positions get a tiny penalty. Good moves can get a penalty and bad moves can get a bonus but these are corrected over the course of many games. The bonus/penalty value is then used to modify the score saved in the TT.
Learned RL values work well while searching because they give the engine information from far beyond the horizon, even unto the end of many games. But RL values do not dominate the search. They only nudge the main search in more promising directions. If all the shallow searches miss deep tactics the main search will still find those tactics and act accordingly. Back in 2006 I estimated that RomiChess fully trained (after a million games) would perform at least 1000 Elo better. So maybe you might want to do that! idk, maybe.
Games per second is exactly what it says: games per second. Just to get it working, it doesn't need learning, bonuses, or penalties.
The result of each game simply adds to a counter for the root move (win, loss, draw), used for sorting.
Finish a game for each root move that is possible, e.g., 20 from the start position ("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR").
Each move will have a counter with the denominator being the games played.
The speed at which it runs is measured in Games/Sec (GPS).
Similar to a book or database, the best PV line would be the best moves at each ply.
How you decide to reach the end of the game is totally up to you, but I wouldn't overthink or complicate it before you even get it to play games per second. 1-ply searches, or even random moves, just to see how it scales, are a good starting point.
Whether it's fractional games/sec, 5k, or 50k games/sec, just explore the idea.
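A rough sketch of that counting scheme follows, with hypothetical engine hooks (GenRootMoves, MakeMove, UnmakeMove, FinishGameSomehow) standing in for your own code; how FinishGameSomehow completes a game (1-ply searches, random moves, ...) is left entirely open, as suggested above.
Code:
#include <stdio.h>
#include <time.h>

typedef int Move;

/* Hypothetical engine hooks standing in for your own code. */
extern int  GenRootMoves(Move *list);   /* e.g. 20 at the start position        */
extern void MakeMove(Move m);
extern void UnmakeMove(Move m);
extern int  FinishGameSomehow(void);    /* +1 win, 0 draw, -1 loss for the side
                                           that played the root move            */

typedef struct { int wins, draws, losses, games; } RootStat;

void RootMovePlayouts(int rounds)
{
    Move     list[256];
    RootStat stat[256] = {0};
    int      n = GenRootMoves(list);
    long     gamesPlayed = 0;
    clock_t  start = clock();

    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < n; i++) {
            MakeMove(list[i]);
            int result = FinishGameSomehow();
            UnmakeMove(list[i]);

            stat[i].games++;                 /* denominator: games played */
            if (result > 0)      stat[i].wins++;
            else if (result < 0) stat[i].losses++;
            else                 stat[i].draws++;
            gamesPlayed++;
        }
    }

    double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("%ld games, %.1f games/sec (GPS)\n",
           gamesPlayed, elapsed > 0.0 ? gamesPlayed / elapsed : 0.0);
    for (int i = 0; i < n; i++)              /* W/L/D counter per root move */
        printf("root move %d: +%d =%d -%d / %d\n", i,
               stat[i].wins, stat[i].draws, stat[i].losses, stat[i].games);
}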
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Games/second Engines
Thank you for the recognition.
jhaglund2 wrote: ↑Wed Jul 06, 2022 12:27 am I do remember the method(s) you mentioned, Mike, and they do have merit, as does what you stated about AlphaZero.
Michael J Sherwin wrote: I have been yelling why since 2006 but no one hears me! To do real time reinforcement learning (RTRL). RomiChess has demonstrated from January of 2006 that it could learn to win from a specific position against any of the top engines from that time in less than 100 training games and in fact win a one hundred game match. From ~2012 AlphaZero also demonstrated the value of 100 games of reinforcement learning on the most popular opening positions. Naysayers say that learning from fast time controls has very little value in longer time controls. I have lost track of how many times I have heard that. They have been proven wrong!
So what can be expected? Well using the ten Nunn positions RomiChess played ten matches against Glaurung2 for 20 games per match. In the first match RomiChess scored 5%. In the tenth match RomiChess scored 95%. Against Bob's humongous opening book and the top ten engines RomiChess showed a +50 ELO increase for the last thousand games of 6,000 games played.
One can do RTRL by playing internally as many games as can be played in ~half the available time. After each game the positions (moves) of the game are placed in the TT. A bonus/penalty value is updated for each position. All the positions of the winning side get a small bonus and all the positions of the losing side get a small penalty. In drawn games all positions get a tiny penalty. Good moves can get a penalty and bad moves can get a bonus but these are corrected over the course of many games. The bonus/penalty value is then used to modify the score saved in the TT.
Learned RL values work well while searching because they give the engine information from far beyond the horizon, even unto the end of many games. But RL values do not dominate the search. They only nudge the main search in more promising directions. If all the shallow searches miss deep tactics the main search will still find those tactics and act accordingly. Back in 2006 I estimated that RomiChess fully trained (after a million games) would perform at least 1000 Elo better. So maybe you might want to do that! idk, maybe.
Games per second is exactly what it says: games per second. Just to get it working, it doesn't need learning, bonuses, or penalties.
The result of each game simply adds to a counter for the root move (win, loss, draw), used for sorting.
Finish a game for each root move that is possible, e.g., 20 from the start position ("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR").
Each move will have a counter with the denominator being the games played.
The speed at which it runs is measured in Games/Sec (GPS).
Similar to a book or database, the best PV line would be the best moves at each ply.
How you decide to reach the end of the game is totally up to you, but I wouldn't overthink or complicate it before you even get it to play games per second. 1-ply searches, or even random moves, just to see how it scales, are a good starting point.
Whether it's fractional games/sec, 5k, or 50k games/sec, just explore the idea.
