Open Source Blitz Rating List: Rodent 0.10

lucasart · Post by **lucasart** » Sat Jan 21, 2012 3:34 pm

Wow, impressive improvement over the Sungorus codebase (Rodent is derived from Sungorus): +330 Elo!
It's still too early to say precisely (84 games played), but it seems to be at the level of Crafty, if not stronger!

Rank Name                  Elo    +    - games score oppo. draws 
   1 Critter 1.4          3247   33   32   350   74%  3049   34% 
   2 IvanHoe 999946h      3190   31   31   350   64%  3066   37% 
   3 Stockfish 2.2.1      3171   31   30   400   66%  3020   32% 
   4 Protector 1.4        2918   34   34   350   46%  2954   22% 
   5 Umko 1.2             2864   28   28   500   50%  2875   24% 
   6 Toga 1.4.1           2841   26   26   600   55%  2818   22% 
   7 Daydreamer 1.75      2738   27   27   450   60%  2665   29% 
   8 Fruit 2.1            2700   26   26   500   51%  2694   27% 
   9 Rodent 0.10          2679   70   65    84   75%  2491   19% 
  10 Crafty 23.4          2664   29   29   450   37%  2778   24% 
  11 GNU Chess 5.07.173b  2647   27   27   450   46%  2675   26% 
  12 Arasan 13.4          2634   31   31   350   43%  2687   24% 
  13 Pepito 1.59          2586   28   28   450   53%  2560   21% 
  14 Sloppy 0.2.2         2538   24   25   600   44%  2577   23% 
  15 Greko 9.0            2508   24   24   634   47%  2539   21% 
  16 Pawny 0.3.1          2480   26   26   500   51%  2479   21% 
  17 DoubleCheck 2.4      2415   29   29   400   46%  2442   21% 
  18 Olithink 5.3.0       2393   33   33   300   48%  2406   21% 
  19 EXchess 6.0.2        2363   31   31   350   42%  2429   19% 
  20 Sungorus 1.4         2347   29   30   400   36%  2451   21% 
  21 Jazz 501             2323   33   34   300   37%  2418   23%
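
As a rough sanity check on the Rodent row, the score and average opponent rating can be turned into a performance estimate with the standard logistic Elo formula. This is only a sketch: the function names are made up for illustration, and the rating tool behind the list fits all engines jointly and accounts for draws, so its figure (2679) will not match this back-of-the-envelope number exactly.

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by an expected score in (0, 1) under the logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def performance_rating(avg_opponent_elo: float, score: float) -> float:
    """Average opponent rating plus the Elo difference implied by the score."""
    return avg_opponent_elo + elo_diff_from_score(score)


# Rodent 0.10 after 84 games: 75% against an average opposition of 2491.
print(round(performance_rating(2491, 0.75)))  # 2682
```

A 75% score corresponds to roughly +191 Elo over the opposition, which is in line with the 2679 shown in the list.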

PK · Post by PK » Sat Jan 21, 2012 3:57 pm

I suppose it will go down a bit, but still the result is surprising. On my computer I barely get 26% against Fruit 2.1 at 30 s per game. Does increment or non-default hash size make such a difference? Does Rodent gain something from scaling? I'll definitely have to run some tests.

lucasart · Post by **lucasart** » Sat Jan 21, 2012 4:15 pm

PK wrote: I suppose it will go down a bit, but still the result is surprising. On my computer I barely get 26% against Fruit 2.1 at 30 s per game. Does increment or non-default hash size make such a difference? Does Rodent gain something from scaling? I'll definitely have to run some tests.

I'm running a gauntlet tournament at 1 min + 1 sec/move. I'm using a relatively fast computer, so maybe my 1+1 equals 2+2 on your machine or whatever. Rodent is playing 50 games against each engine from Pawny all the way up to Fruit. Right now it's playing Sloppy (103 games played).
Note that cutechess-cli doesn't restart the challenging engine of the gauntlet (which is intentional). So Rodent will have a persistent hash for the whole 400 games. It doesn't make that much difference, and I've applied the same rule to everyone else, so it shouldn't introduce any bias.
If you're curious to see whether the results are going down, here's a refresh:

Rank Name                  Elo    +    - games score oppo. draws 
   1 Critter 1.4          3247   32   32   350   74%  3049   34% 
   2 IvanHoe 999946h      3190   31   31   350   64%  3066   37% 
   3 Stockfish 2.2.1      3171   31   30   400   66%  3020   32% 
   4 Protector 1.4        2918   34   34   350   46%  2954   22% 
   5 Umko 1.2             2864   28   28   500   50%  2875   24% 
   6 Toga 1.4.1           2841   26   26   600   55%  2818   22% 
   7 Daydreamer 1.75      2738   27   27   450   60%  2665   29% 
   8 Fruit 2.1            2700   26   26   500   51%  2694   27% 
   9 Rodent 0.10          2672   62   58   103   74%  2495   19% 
  10 Crafty 23.4          2664   29   29   450   37%  2778   24% 
  11 GNU Chess 5.07.173b  2647   27   27   450   46%  2675   26% 
  12 Arasan 13.4          2634   31   31   350   43%  2687   24% 
  13 Pepito 1.59          2586   28   28   450   53%  2560   21% 
  14 Sloppy 0.2.2         2538   24   24   603   44%  2577   23% 
  15 Greko 9.0            2507   24   24   650   46%  2542   21% 
  16 Pawny 0.3.1          2479   26   26   500   51%  2478   21% 
  17 DoubleCheck 2.4      2414   29   29   400   46%  2442   21% 
  18 Olithink 5.3.0       2393   33   33   300   48%  2406   21% 
  19 EXchess 6.0.2        2363   31   31   350   42%  2429   19% 
  20 Sungorus 1.4         2347   29   30   400   36%  2450   21% 
  21 Jazz 501             2323   33   34   300   37%  2417   23%

What typically scales well (from my experience in DoubleCheck 2.4) is aggressive LMR, i.e. reducing late moves by more than one ply. Fruit doesn't do that: it only reduces late moves by one ply (Fabien called it History pruning, and LMR is nothing more than aggressive History pruning).

When I tested double reductions in DoubleCheck, I started with fixed-node testing (super fast, 100,000 nodes per move) and found it was actually... weaker.

Then I did 30 sec + 0.5 sec and it performed significantly better, then 1 min + 1 sec. From very simplistic reasoning on complexity, I conjecture that the gain scales as log(T) asymptotically.

Also, I see you have a state-of-the-art, refined king safety evaluation now. According to Don Dailey's testing, this typically scales well at long time controls, and can perform worse at very short time controls.

So yes, I suspect Rodent scales better than Fruit at long time controls. However, it's perfectly possible that Rodent scores badly against Fruit, and that calculating Elo against a varied population of engines gives a better result than doing it with only Fruit as an opponent. But that's what rating lists are for.
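
To make the one-ply versus multi-ply distinction concrete, here is a minimal sketch of how a search might pick the reduction for a late move. Everything in it is an assumption for illustration: the function name, the thresholds (3 full-depth moves, minimum depth 3, double reduction from the 9th move at depth 6 or more) and the two-level scheme are not taken from Fruit, DoubleCheck or Rodent.

```python
def late_move_reduction(depth: int, move_index: int, aggressive: bool) -> int:
    """Return how many plies to reduce the search of a late, quiet move.

    depth      -- remaining search depth at this node
    move_index -- 0-based position of the move in the ordered move list
    aggressive -- False: never reduce by more than one ply;
                  True: allow a two-ply reduction for very late moves
    All thresholds below are hypothetical.
    """
    FULL_DEPTH_MOVES = 3  # the first few moves are always searched at full depth
    MIN_DEPTH = 3         # no reductions close to the leaves

    if depth < MIN_DEPTH or move_index < FULL_DEPTH_MOVES:
        return 0
    if aggressive and move_index >= 8 and depth >= 6:
        return 2
    return 1


# Compare the two schemes for the 11th move at depth 8.
print(late_move_reduction(8, 10, aggressive=False))  # 1
print(late_move_reduction(8, 10, aggressive=True))   # 2
```

A real engine would also exempt captures, checks and other tactical moves, and would re-search at full depth whenever the reduced search comes back above alpha.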

lucasart · Post by **lucasart** » Sat Jan 21, 2012 4:25 pm

PK wrote: Does increment or non-default hash size make such a difference?

Increment perhaps, and I suspect longer time in general, but not the non-default hash size. Fruit uses 16 MB and Rodent 32 MB by default. In my testing they both use 64 MB, so the explanation wouldn't be there.

lucasart · Post by **lucasart** » Sat Jan 21, 2012 4:30 pm

Oh, and another thing that could explain the difference (a tiny bit) is the speed of the compiled binary. I used gcc 4.6.1 with full compile-time and link-time optimizations, and AFAIK nothing beats that, including MSVC++ and ICC. Do you use MSVC++ 64-bit Pro or whatever gives efficient compiles? Did you make sure you were using all the compiler optimisations, and not leaving in C++ exception crap, run-time bounds checking, run-time type information, or whatever else they enable by default?
=> never trust micro$oft

I can do a 64-bit Windows compile on my Linux machine, using MinGW (the gcc port for Windows). I'll send it to you and you can see which is faster, and whether the speed difference is significant. Interested?

lucasart · Post by **lucasart** » Sat Jan 21, 2012 5:34 pm

Result after 150 games

Rank Name                  Elo    +    - games score oppo. draws 
   1 Critter 1.4          3247   33   32   350   74%  3049   34% 
   2 IvanHoe 999946h      3190   31   31   350   64%  3066   37% 
   3 Stockfish 2.2.1      3171   31   30   400   66%  3020   32% 
   4 Protector 1.4        2918   34   34   350   46%  2954   22% 
   5 Umko 1.2             2864   28   28   500   50%  2875   24% 
   6 Toga 1.4.1           2841   26   26   600   55%  2818   22% 
   7 Daydreamer 1.75      2738   27   27   450   60%  2665   29% 
   8 Fruit 2.1            2700   26   26   500   51%  2694   27% 
   9 Crafty 23.4          2664   29   29   450   37%  2778   24% 
  10 GNU Chess 5.07.173b  2647   27   27   450   46%  2675   26% 
  11 Rodent 0.10          2643   49   47   150   69%  2507   21% 
  12 Arasan 13.4          2634   31   31   350   43%  2687   24% 
  13 Pepito 1.59          2585   28   28   450   53%  2559   21% 
  14 Sloppy 0.2.2         2541   23   23   650   44%  2581   23% 
  15 Greko 9.0            2504   24   24   650   46%  2539   21% 
  16 Pawny 0.3.1          2475   26   26   500   51%  2474   21% 
  17 DoubleCheck 2.4      2413   29   29   400   46%  2440   21% 
  18 Olithink 5.3.0       2390   33   33   300   48%  2403   21% 
  19 EXchess 6.0.2        2361   31   32   350   42%  2427   19% 
  20 Sungorus 1.4         2345   29   30   400   36%  2449   21% 
  21 Jazz 501             2320   33   34   300   37%  2415   23%

lucasart · Post by **lucasart** » Sun Jan 22, 2012 3:06 am

Final result after 400 games

Rank Name                  Elo    +    - games score oppo. draws 
...
   8 Fruit 2.1            2700   25   25   550   52%  2683   27% 
   9 Crafty 23.4          2661   29   29   450   37%  2775   24% 
  10 GNU Chess 5.07.173b  2641   26   26   500   47%  2666   27% 
  11 Arasan 13.4          2631   29   29   400   44%  2674   23% 
  12 Rodent 0.10          2615   28   28   400   55%  2580   25% 
  13 Crafty_23.4          2589   78   79    50   46%  2615   24% 
  14 Pepito 1.59          2583   26   26   500   53%  2559   22% 
  15 Sloppy 0.2.2         2535   23   24   650   44%  2574   23% 
  16 Greko 9.0            2497   24   24   650   46%  2531   21% 
  17 Pawny 0.3.1          2467   26   26   500   51%  2465   21% 
...
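
The error bars on Rodent shrank from +70/-65 at 84 games to ±28 at 400 games. A rough way to anticipate that, assuming the margin behaves like a standard error and so falls as 1/sqrt(games), is sketched below. The function name is made up, and the real margins also depend on the score and the draw rate, so this is only an approximation.

```python
import math


def scaled_margin(margin: float, games_now: int, games_later: int) -> float:
    """Extrapolate an Elo error margin, assuming it shrinks as 1/sqrt(games)."""
    if games_now <= 0 or games_later <= 0:
        raise ValueError("game counts must be positive")
    return margin * math.sqrt(games_now / games_later)


# Start from the +70 margin reported after 84 games.
for games in (103, 150, 400):
    print(games, round(scaled_margin(70, 84, games), 1))
```

The extrapolation gives about 32 for 400 games, against the 28 actually reported, so the early margins were a fair warning of how much the rating could still move.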
