Rustic vs. Shallow Blue: one of us is weird somehow

Discussion of chess software programming and technical issues.

Moderators: hgm, Dann Corbit, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
User avatar
mvanthoor
Posts: 598
Joined: Wed Jul 03, 2019 2:42 pm
Full name: Marcel Vanthoor

Rustic vs. Shallow Blue: one of us is weird somehow

Post by mvanthoor » Fri Oct 30, 2020 12:06 am

Hi :)

People here may know the engine Shallow Blue by Rhys Rustad-Elliot. I got interested in it when I found its website, where the author has a nice explanation on magic bitboards, that helped me to understand this concept:

https://rhysre.net/fast-chess-move-gene ... oards.html

Shallow Blue has a rating on CCRL that fluctuates around 1720. Now I'm actually at a point where I can take on an engine like this, so I did.

I'm noticing two things:

1. Rustic has no features beyond MVV-LVA in the alpha/beta search, material counting and a PSQT. Compared to this, Shallow Blue has a massive list of features, including search optimizations, tapered evaluation and even a transposition table.

2. Shallow Blue is SLOW. It is so slow, that I'm wondering if I did something wrong, or the author of SB did. Look at the screenshot below...

Image

My engine calculates 3.000K - 5.000K (3-5 million) positions per second. Shallow Blue tops out at 300K. I've quickly looked through search.cc in SB's repository, and I can only find "_nodes++" in QSearch... it would explain the low node count, if the author forgot to increment nodes in alpha/beta. (I increment the node count when I actually go and do some work, in the node, so just before the move loop, both in alpha/beta, and in qsearch; so I don't count a node if it exists immediately.)

Still, according to the feature list, Shallow Blue has a TT (even though it's UCI command doesn't report it); it also looks like it's using it, because it has 62 MB of RAM in use. (And this rises all the time... maybe a memory leak.) The strange thing is, even IF SB is using its TT, Rustic clobbers it with regard to search depth; it's often 2-3 ply ahead.

I've played about a 50 games against this engine, with different time controls, and Rustic scores about 40%. This would put it about 70 Elo below Shallow Blue, at +/- 1650. (I know this is a very tentative score, as it's only one engine and only 50 games.) The questions for me are:

- Am I counting my nodes wrong? I suppose not... perft runs at 36 million moves/sec (including all the incremental stuff), so I can believe that the engine reaches 3-5 million nodes / sec in a normal game, as it doesn't have much of an evaluation yet. Take into account that SB possibly counts its nodes wrong, this could explain the massive speed difference.

- If SB has a TT and search optimizations that Rustic does not have, how can it still be behind in the depth it reaches? Bugs... or very slow implementations much copying of data instead of referencing it? (My engine did a lot of unnecessary zero-initializations, which Fabian helpfully pointed out how to circumvent in Rust, which increased my engine's speed by from perft @ 10 million nodes/s to 30 milion.)

- It seems to somehow make up for the lack of speed with knowledge, having a tapered evaluation, a pawn hash table, and many evaluation terms that Rustic doens't yet have. How much would it be possible to compensate slowness with knowledge, or the other way around?

In the end, I find it strange that an engine such as ShallowBlue has only a 1720 rating, with all of those features; I also find it exciting that my own engine (tentatively) seems to sit at around 1650 with basically NO features.

If ShallowBlue was YOUR engine, where would you first look to improve it?

User avatar
hgm
Posts: 25570
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: Rustic vs. Shallow Blue: one of us is weird somehow

Post by hgm » Fri Oct 30, 2020 9:47 am

I suppose it is buggy. Or poorly tuned. No amount of features is going to help if you have poor heuristic evaluation. It just makes the engine more clever in finding ways to do itself in.

Micro-Max / Fairy-Max has ~1950 Elo CCRL, and hardly has any features at all. Not even full MVV-LVA sorting. The only deviation from move-generation order is that it puts the best move of the previous iteration (as a duplicat) in front of all other moves. And it does a preparatory iteration to identify the move that is best MVV-LVA-wise. It does have a TT to speed up that process a bit.

BTW, there also exists a micro-Max 1.6, which doesn't have a TT, and should be around 1500 Elo.

My engines typically run only at ~1Mnps (single thread). TT probing slows them down very much. Micro-Max 1.6 reaches 6 Mnps. But it is far weaker than the versions without TT, as it is not able to use the nodes well, revisiting each tree position many times.

A non-buggy engine with TT should be able to beat micro-Max easily.

Guenther
Posts: 3674
Joined: Wed Oct 01, 2008 4:33 am
Location: Regensburg, Germany
Full name: Guenther Simon
Contact:

Re: Rustic vs. Shallow Blue: one of us is weird somehow

Post by Guenther » Fri Oct 30, 2020 10:01 am

mvanthoor wrote:
Fri Oct 30, 2020 12:06 am
Hi :)

People here may know the engine Shallow Blue by Rhys Rustad-Elliot. I got interested in it when I found its website, where the author has a nice explanation on magic bitboards, that helped me to understand this concept:

https://rhysre.net/fast-chess-move-gene ... oards.html

Shallow Blue has a rating on CCRL that fluctuates around 1720. Now I'm actually at a point where I can take on an engine like this, so I did.

I'm noticing two things:

1. Rustic has no features beyond MVV-LVA in the alpha/beta search, material counting and a PSQT. Compared to this, Shallow Blue has a massive list of features, including search optimizations, tapered evaluation and even a transposition table.

2. Shallow Blue is SLOW. It is so slow, that I'm wondering if I did something wrong, or the author of SB did. Look at the screenshot below...



My engine calculates 3.000K - 5.000K (3-5 million) positions per second. Shallow Blue tops out at 300K. I've quickly looked through search.cc in SB's repository, and I can only find "_nodes++" in QSearch... it would explain the low node count, if the author forgot to increment nodes in alpha/beta. (I increment the node count when I actually go and do some work, in the node, so just before the move loop, both in alpha/beta, and in qsearch; so I don't count a node if it exists immediately.)

Still, according to the feature list, Shallow Blue has a TT (even though it's UCI command doesn't report it); it also looks like it's using it, because it has 62 MB of RAM in use. (And this rises all the time... maybe a memory leak.) The strange thing is, even IF SB is using its TT, Rustic clobbers it with regard to search depth; it's often 2-3 ply ahead.

I've played about a 50 games against this engine, with different time controls, and Rustic scores about 40%. This would put it about 70 Elo below Shallow Blue, at +/- 1650. (I know this is a very tentative score, as it's only one engine and only 50 games.) The questions for me are:

- Am I counting my nodes wrong? I suppose not... perft runs at 36 million moves/sec (including all the incremental stuff), so I can believe that the engine reaches 3-5 million nodes / sec in a normal game, as it doesn't have much of an evaluation yet. Take into account that SB possibly counts its nodes wrong, this could explain the massive speed difference.

- If SB has a TT and search optimizations that Rustic does not have, how can it still be behind in the depth it reaches? Bugs... or very slow implementations much copying of data instead of referencing it? (My engine did a lot of unnecessary zero-initializations, which Fabian helpfully pointed out how to circumvent in Rust, which increased my engine's speed by from perft @ 10 million nodes/s to 30 milion.)

- It seems to somehow make up for the lack of speed with knowledge, having a tapered evaluation, a pawn hash table, and many evaluation terms that Rustic doens't yet have. How much would it be possible to compensate slowness with knowledge, or the other way around?

In the end, I find it strange that an engine such as ShallowBlue has only a 1720 rating, with all of those features; I also find it exciting that my own engine (tentatively) seems to sit at around 1650 with basically NO features.

If ShallowBlue was YOUR engine, where would you first look to improve it?
Yes, it is relatively slow around 100kn/s on my old quadcore.

There is a new engine called Drofa, which started as a fork/derivate of ShallowBlue and this one was very open about the origin
and also is well documented in its lots of real changes and is much more in the spirit of opensource than a couple of recently ones.
(It also still adds the original author at startup)

Here is the changelog for Drofa 1.0 from the original ShallowBlue base.

https://github.com/justNo4b/Drofa/commi ... c74d0da0e0

You can look at the sources what he did for speeding it up.
Drofa 1.31 reaches much higher depths now and around 800 kn/s on my machine.
I want to mention though that speed alone is ofc not the holy grail.

https://github.com/justNo4b/Drofa

BTW it seems to me that ShallowBlue 2.00 might have a memory leak?
At least when starting from cmd and doing 'go infinite' it eats up more than 1MB per second or so?
Need to check this in playing games.

User avatar
mvanthoor
Posts: 598
Joined: Wed Jul 03, 2019 2:42 pm
Full name: Marcel Vanthoor

Re: Rustic vs. Shallow Blue: one of us is weird somehow

Post by mvanthoor » Fri Oct 30, 2020 10:17 am

hgm wrote:
Fri Oct 30, 2020 9:47 am
I suppose it is buggy. Or poorly tuned. No amount of features is going to help if you have poor heuristic evaluation. It just makes the engine more clever in finding ways to do itself in.

Micro-Max / Fairy-Max has ~1950 Elo CCRL, and hardly has any features at all. Not even full MVV-LVA sorting. The only deviation from move-generation order is that it puts the best move of the previous iteration (as a duplicat) in front of all other moves. And it does a preparatory iteration to identify the move that is best MVV-LVA-wise. It does have a TT to speed up that process a bit.

BTW, there also exists a micro-Max 1.6, which doesn't have a TT, and should be around 1500 Elo.

My engines typically run only at ~1Mnps (single thread). TT probing slows them down very much. Micro-Max 1.6 reaches 6 Mnps. But it is far weaker than the versions without TT, as it is not able to use the nodes well, revisiting each tree position many times.

A non-buggy engine with TT should be able to beat micro-Max easily.
So MicroMax with TT is 450 Elo stronger than MicroMax without TT? Wow. But you did report somewhere that your engines add about +100 or so Elo for each move they can go deeper. So if a TT adds +4 or +5 ply for your engine, it could indeed be the case that a TT adds 450 Elo.
Guenther wrote:
Fri Oct 30, 2020 10:01 am
...
Thanks :) I've seen Drofa, and I know it's derived from Shallow Blue. It seems true that SB has a memory leak. In its features it mentions a TT, but it doesn't report it through UCI, and its speed doesn't suggest that it uses it (or uses it poorly).

I have to finish GameTime mode in my engine and then create a pool of engines between 1500 and 1800 to see where my engine sits in its current state, with no features but MVV-LVA and PSQT's.

Against Shallow Blue it came up 70 Elo short, which would put it at ~1650 CCRL.
Against Pulse (1648 CCL) however, it obtained a +200 score, which would put it at ~1850 CCRL.

So maybe 1750 CCRL tentatively, with huge error bars for now.

If adding a TT adds up to 450 Elo indeed, Rustic could end up in the 2100+ range without doing anything else. I would be very happy starting out like that. (I've decided to add the TT after the killer and heuristics though; first make the tree a little smaller :))

User avatar
hgm
Posts: 25570
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: Rustic vs. Shallow Blue: one of us is weird somehow

Post by hgm » Fri Oct 30, 2020 11:10 am

mvanthoor wrote:
Fri Oct 30, 2020 10:17 am
So MicroMax with TT is 450 Elo stronger than MicroMax without TT? Wow. But you did report somewhere that your engines add about +100 or so Elo for each move they can go deeper. So if a TT adds +4 or +5 ply for your engine, it could indeed be the case that a TT adds 450 Elo.
That is not the only difference. Almost everything was taken out of it for creating 1.6, as this was an attempt to break the record for the smallest chess program irrespective of playing strength. The normal design criterion for micro-Max is to optimize Elo/character.

So micro-Max 1.6 (the stand-alone version running in console mode) measures only 1433 characters (ignoring comment and non-essential white space):

Code: Select all

/***************************************************************************/
/*                               micro-Max,                                */
/* A chess program smaller than 2KB (of non-blank source), by H.G. Muller  */
/***************************************************************************/
/* version 1.6 (1433 non-blank characters) features:                       */
/* - recursive negamax search                                              */
/* - quiescence search with recaptures                                     */
/* - recapture extensions                                                  */
/* - (internal) iterative deepening                                        */
/* - best-move-first 'sorting'                                             */
/* - full FIDE rules and move-legality checking                            */

/* accepts under-promotions: type 1,2,3 (=R,B,N) after input move          */
/* (input buffer c[] & *P made global, K and N encoding swapped for this)  */

#define W while

int M=136,S=128,I=8e3,C=799,Q,O,K,N;   

char L,*P,
w[]={0,1,1,-1,3,3,5,9},                      
o[]={-16,-15,-17,0,1,16,0,1,16,15,17,0,14,18,31,33,0,
     7,-1,6,11,8,3,6,                          
     6,4,5,7,3,5,4,6},                         
b[129],

n[]=".?+knbrq?*?KNBRQ",

c[9];

D(k,q,l,e,E,z,n)        
int k,q,l,e,E,z,n;      
{                       
 int j,r,m,v,d,h,i,F,G,s;
 char t,p,u,x,y,X,Y,H,B;

 q--;                                          
 d=X=Y=0;                                      

 W(d++<n||                                     
   z==8&K==I&&(N<1e6&d<98||                    
   (K=X,L=Y&~M,d=2)))                          
 {x=B=X;                                       
  h=Y&S;                                   
  m=d>1?-I:e;                                  
  N++;                                         
  do{u=b[x];                                   
   if(u&k)                                     
   {r=p=u&7;                                   
    j=o[p+16];                                 
    W(r=p>2&r<0?-r:-o[++j])                    
    {A:                                        
     y=x;F=G=S;                                
     do{                                       
      H=y=h?Y^h:y+r;                        
      if(y&M)break;                            
      m=E-S&&b[E]&&y-E<2&E-y<2?I:m;    /* castling-on-Pawn-check bug fixed */
      if(p<3&y==E)H^=16;                       
      t=b[H];if(t&k|p<3&!(y-x&7)-!t)break;       
      i=99*w[t&7];                             
      m=i<0?I:m;                       /* castling-on-Pawn-check bug fixed */
      if(m>=l)goto C;                          

      if(s=d-(y!=z))                           
      {v=p<6?b[x+8]-b[y+8]:0;
       b[G]=b[H]=b[x]=0;b[y]=u|32;             
       if(!(G&M))b[F]=k+6,v+=30;               
       if(p<3)                                 
       {v-=9*((x-2&M||b[x-2]-u)+               
              (x+2&M||b[x+2]-u)-1);            
        if(y+r+1&S)b[y]|=7,i+=C;               
       }
       v=-D(24-k,-l,m>q?-m:-q,-e-v-i,F,y,s);   
       if(K-I)                                 
       {if(v+I&&x==K&y==L&z==8)                
        {Q=-e-i;O=F;
         if(b[y]-u&7&&P-c>5)b[y]-=c[4]&3;        /* under-promotions */
         return l;
        }v=m;                                   
       }                                       
       b[G]=k+6;b[F]=b[y]=0;b[x]=u;b[H]=t;     
       if(v>m)                         
        m=v,X=x,Y=y|S&F;                       
       if(h){h=0;goto A;}                            
      }
      if(x+r-y|u&32|                           
         p>2&(p-3|j-7||                        
         b[G=x+3^r>>1&7]-k-6                   
         ||b[G^1]|b[G^2])                      
        )t+=p<5;                               
      else F=y;                                
     }W(!t);                                   
  }}}W((x=x+9&~M)-B);                          
C:if(m>I-M|m<M-I)d=98;                         
  m=m+I?m:-D(24-k,-I,I,0,S,S,1);    
 }                                             
 return m+=m<e;                                
}

main()
{
 int k=8;

 K=8;W(K--)
 {b[K]=(b[K+112]=o[K+24]+8)+8;b[K+16]=18;b[K+96]=9;
  L=8;W(L--)b[16*L+K+8]=(K-4)*(K-4)+(L-3.5)*(L-3.5); 
 }                                                   

 W(1)                                                
 {N=-1;W(++N<121)
   printf(" %c",N&8&&(N+=7)?10:n[b[N]&15]);          
  P=c;W((*P++=getchar())>10);
  K=I;                                               
  if(*c-10)K=*c-16*c[1]+C,L=c[2]-16*c[3]+C;          
  k^=D(k,-I,I,Q,O,8,2)-I?0:24;
 }
}

Uri Blass
Posts: 8921
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Rustic vs. Shallow Blue: one of us is weird somehow

Post by Uri Blass » Fri Oct 30, 2020 12:09 pm

mvanthoor wrote:
Fri Oct 30, 2020 10:17 am
hgm wrote:
Fri Oct 30, 2020 9:47 am
I suppose it is buggy. Or poorly tuned. No amount of features is going to help if you have poor heuristic evaluation. It just makes the engine more clever in finding ways to do itself in.

Micro-Max / Fairy-Max has ~1950 Elo CCRL, and hardly has any features at all. Not even full MVV-LVA sorting. The only deviation from move-generation order is that it puts the best move of the previous iteration (as a duplicat) in front of all other moves. And it does a preparatory iteration to identify the move that is best MVV-LVA-wise. It does have a TT to speed up that process a bit.

BTW, there also exists a micro-Max 1.6, which doesn't have a TT, and should be around 1500 Elo.

My engines typically run only at ~1Mnps (single thread). TT probing slows them down very much. Micro-Max 1.6 reaches 6 Mnps. But it is far weaker than the versions without TT, as it is not able to use the nodes well, revisiting each tree position many times.

A non-buggy engine with TT should be able to beat micro-Max easily.
So MicroMax with TT is 450 Elo stronger than MicroMax without TT? Wow. But you did report somewhere that your engines add about +100 or so Elo for each move they can go deeper. So if a TT adds +4 or +5 ply for your engine, it could indeed be the case that a TT adds 450 Elo.
Guenther wrote:
Fri Oct 30, 2020 10:01 am
...
Thanks :) I've seen Drofa, and I know it's derived from Shallow Blue. It seems true that SB has a memory leak. In its features it mentions a TT, but it doesn't report it through UCI, and its speed doesn't suggest that it uses it (or uses it poorly).

I have to finish GameTime mode in my engine and then create a pool of engines between 1500 and 1800 to see where my engine sits in its current state, with no features but MVV-LVA and PSQT's.

Against Shallow Blue it came up 70 Elo short, which would put it at ~1650 CCRL.
Against Pulse (1648 CCL) however, it obtained a +200 score, which would put it at ~1850 CCRL.

So maybe 1750 CCRL tentatively, with huge error bars for now.

If adding a TT adds up to 450 Elo indeed, Rustic could end up in the 2100+ range without doing anything else. I would be very happy starting out like that. (I've decided to add the TT after the killer and heuristics though; first make the tree a little smaller :))

TT add more elo if the engine has a bad order of moves without TT.

No4b
Posts: 57
Joined: Thu Jun 18, 2020 1:21 pm
Location: Moscow
Full name: Alexander Litov

Re: Rustic vs. Shallow Blue: one of us is weird somehow

Post by No4b » Fri Oct 30, 2020 12:38 pm

So, regarding ShallowBlue strength.
I worked a lot to clear up all of bugs of it when working at Drofa 1.0 and some more further (at ~Drofa 1.2 i suppose there is none of legacy SB bugs, only my own), so there is the approximate list:

1) SB move gen is legal, but its very badly implemented. During movegen it is do a makemove on each move, than check if the position legal, and add move too the legalmove list. As you can imagine, it is much faster to just make pseudo-legal movegen and check moves as they go in the move loop.
2) HashTable is very weirdly implemented. It is a list, and positions just adds up here as the search go. Every search instance new list is created (as far as i understand), but the old one is never cleared, leading not only to a wasted of previous search iteration, but memory leaks as well.
3) PawnHashUpdate has a bug where it is not updated when pawn is captured, hence bad play, esp in the endgames.
4) Repetitions are not properly detected in the search
5) DrawRule is only 50ply instead of 100ply.

It is the major ones i could remember, i suppose there were more.

User avatar
mvanthoor
Posts: 598
Joined: Wed Jul 03, 2019 2:42 pm
Full name: Marcel Vanthoor

Re: Rustic vs. Shallow Blue: one of us is weird somehow

Post by mvanthoor » Fri Oct 30, 2020 12:46 pm

hgm wrote:
Fri Oct 30, 2020 11:10 am
... Micro-Max ...
I'm not even going to pretend that I can ever understand this.

Make a contest out of it. Strip all comments and the header, change one of the numbers somewhere, and then ask people to "please help me to debug this" :lol:

Sorry. Couldn't resist.
No4b wrote:
Fri Oct 30, 2020 12:38 pm
So, regarding ShallowBlue strength.
I worked a lot to clear up all of bugs of it when working at Drofa 1.0 and some more further (at ~Drofa 1.2 i suppose there is none of legacy SB bugs, only my own), so there is the approximate list:

1) SB move gen is legal, but its very badly implemented. During movegen it is do a makemove on each move, than check if the position legal, and add move too the legalmove list. As you can imagine, it is much faster to just make pseudo-legal movegen and check moves as they go in the move loop.
2) HashTable is very weirdly implemented. It is a list, and positions just adds up here as the search go. Every search instance new list is created (as far as i understand), but the old one is never cleared, leading not only to a wasted of previous search iteration, but memory leaks as well.
3) PawnHashUpdate has a bug where it is not updated when pawn is captured, hence bad play, esp in the endgames.
4) Repetitions are not properly detected in the search
5) DrawRule is only 50ply instead of 100ply.

It is the major ones i could remember, i suppose there were more.
Thanks. So it's just one bug on top of another. You'll never get a strong engine that way indeed. I've seen a 200 point Elo jump (tentative estimate) by fixing a three-fold repetition bug. Fortunately, that was exactly the function I was working on. I'm convinced that going slow, and extensively (play)testing changes is key to making a chess engine that can actually be very strong some day.

No4b
Posts: 57
Joined: Thu Jun 18, 2020 1:21 pm
Location: Moscow
Full name: Alexander Litov

Re: Rustic vs. Shallow Blue: one of us is weird somehow

Post by No4b » Fri Oct 30, 2020 1:09 pm

mvanthoor wrote:
Fri Oct 30, 2020 12:46 pm
Thanks. So it's just one bug on top of another. You'll never get a strong engine that way indeed. I've seen a 200 point Elo jump (tentative estimate) by fixing a three-fold repetition bug. Fortunately, that was exactly the function I was working on. I'm convinced that going slow, and extensively (play)testing changes is key to making a chess engine that can actually be very strong some day.
Indeed. My guess is that ShallowBlue author did not playtest it enough, because most of this bugs are obvious once you look at the games.

At the same time i must say that rigorous testing can be quite frustrating, because when the engine is already decent new features are often add up close to no elo.
Right now i`m having 8th test to include passed pawn extention in the lategames (and each of my test is ~6-10 k games), and it seems to not go well once again :|

User avatar
mvanthoor
Posts: 598
Joined: Wed Jul 03, 2019 2:42 pm
Full name: Marcel Vanthoor

Re: Rustic vs. Shallow Blue: one of us is weird somehow

Post by mvanthoor » Fri Oct 30, 2020 4:08 pm

No4b wrote:
Fri Oct 30, 2020 1:09 pm
Indeed. My guess is that ShallowBlue author did not playtest it enough, because most of this bugs are obvious once you look at the games.

At the same time i must say that rigorous testing can be quite frustrating, because when the engine is already decent new features are often add up close to no elo.
Right now i`m having 8th test to include passed pawn extention in the lategames (and each of my test is ~6-10 k games), and it seems to not go well once again :|
In my case, my engine starts to play really weird if I add a principal variation at the point where it creates the best moves. I've been testing this for a few evenings now...

Post Reply