Trouble with the new Arasan 23.1

Rebel
Posts: 7387
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Trouble with the new Arasan 23.1

Post by Rebel »

Arasan 23.1 : -40 elo

Code: Select all

Gambit Rating List
Running      : Gauntlet Arasan 23.1
Time Control : 40/120
Games        : 1000

Results from file gauntlet-arasan.pgn:

No. Name            Win Draw Loss Unf.  Score Games       %
-----------------------------------------------------------
  1 Arasan 23.1    +191 =402 -407   *0  392.0  1000   39.2%
  2 Berserk 5       +89  =75  -36   *0  126.5   200   63.2%
  3 Ethereal 12.50  +87  =78  -35   *0  126.0   200   63.0%
  4 Pedone 3.1      +82  =82  -36   *0  123.0   200   61.5%
  5 SlowChess 2.4   +73  =91  -36   *0  118.5   200   59.2%
  6 Komodo 11       +76  =76  -48   *0  114.0   200   57.0%

Total Games:    1000
White Wins:      291 (29.1%)
Black Wins:      307 (30.7%)
Draws:           402 (40.2%)
Unfinished:        0 (0.0%)

Estimated elo gain for Arasan_23.1
Elo pool : 3345
Arasan 23.0.1 : 3321.0
Arasan_23.1 : 3280.8
Difference : -40.2

I don't understand.

A bullet match (40/10) ended in favor of the previous Arasan :?:

Code: Select all

No. Name           Win Draw Loss Unf.  Score Games       %
----------------------------------------------------------
  1 Arasan 23.0.1  +85  =74  -41   *0  122.0   200   61.0%
  2 Arasan 23.1    +41  =74  -85   *0   78.0   200   39.0%
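
As a rough cross-check, a head-to-head score like the one above can be converted into an Elo difference with the standard logistic Elo formula. This is only a sketch: the function name `elo_diff_from_score` is made up for illustration, and it is not necessarily how the gauntlet tool above computes its own estimate.

```python
import math

def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (logistic Elo model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

# Arasan 23.1 in the 40/10 match above: +41 =74 -85
wins, draws, losses = 41, 74, 85
games = wins + draws + losses
score = (wins + 0.5 * draws) / games  # draws count as half a point

print(f"score = {score:.3f}")
print(f"implied Elo difference = {elo_diff_from_score(score):+.1f}")
```

For the 39.0% score this comes out at roughly -78 Elo for 23.1, before any error margin is considered.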

If you are reading, Jon...

Both versions used the AVX2 compile.
90% of coding is debugging, the other 10% is writing bugs.
pohl4711
Posts: 2811
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: Trouble with the new Arasan 23.1

Post by pohl4711 »

I started my rating-list test run of Arasan 23.1 (avx2, of course). If I see a clear regression, too, I will report it (and abort the test run)...

What I can say is that the annoying message cutechess-cli prints after each game played by Arasan, "EngineProcess: Process destroyed while engine still running", is not fixed. Arasan is the only engine for which this warning is printed. Again, and again, and again... Very annoying!
pohl4711
Posts: 2811
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: Trouble with the new Arasan 23.1

Post by pohl4711 »

Test run of Arasan 23.1 aborted: Ed Schroeder measured a huge regression relative to Arasan 23.0.1, and so did I: -25 Elo for Arasan 23.1 after 350 games compared to Arasan 23.0.1.
Gabor Szots
Posts: 1466
Joined: Sat Jul 21, 2018 7:43 am
Location: Budapest, Hungary
Full name: Gabor Szots

Re: Trouble with the new Arasan 23.1

Post by Gabor Szots »

Here 23.1 is beating 23.0.1 handsomely, but only a couple of games have been played. However, the network should be set explicitly; by default the engine does not use one and displays a message to that effect.

EDIT: They look about equal now, too few games to form an opinion.
Gabor Szots
CCRL testing group
Wolfgang
Posts: 989
Joined: Sat May 13, 2006 1:08 am

Re: Trouble with the new Arasan 23.1

Post by Wolfgang »

Started a "quick and dirty" test at 3'+1" without ponder vs.
Koivisto 7.0
Revenge 1.0
Xiphos 0.6

100 games each, 5-move openings, Intel i5 Haswell @ 3.3 GHz with the avx2 compile. The net is set automatically by BanksiaGUI.

200 games played: roughly +30 Elo.
Too few games, of course... :wink: But if this is confirmed after the third match, I'll start a complete test.
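
How few is "too few" can be put into numbers with a simple normal approximation of the error margin. The sketch below is only an illustration: the function names are made up, it treats all games as independent (which paired openings do not strictly satisfy), and since the win/draw/loss split of this test is not given, it uses the 200-game result posted at the top of the thread (+85 =74 -41 for Arasan 23.0.1) as example input.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high); z = 1.96 gives an approximate 95% interval."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    stderr = math.sqrt(variance / games)
    # Clamp so the logarithm stays defined for very lopsided results.
    low = max(score - z * stderr, 1e-9)
    high = min(score + z * stderr, 1.0 - 1e-9)
    return elo_from_score(score), elo_from_score(low), elo_from_score(high)

# Example input (assumption: the 40/10 match from the first post).
elo, low, high = elo_interval(85, 74, 41)
print(f"Elo {elo:+.1f}, approx. 95% interval {low:+.1f} to {high:+.1f}")
```

With 200 games the interval spans roughly +40 to +117 Elo, so a measured +30 or +24 after 200 to 300 games can easily sit inside the noise.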
Best
Wolfgang
CEGT-Team
www.cegt.net
www.cegt.forumieren.com
Wolfgang
Posts: 989
Joined: Sat May 13, 2006 1:08 am

Re: Trouble with the new Arasan 23.1

Post by Wolfgang »

After 300 games: +24
These matches were played with the Linux version under Mint 20.2 Uma and BanksiaGUI for Linux.

I'll repeat them with my "normal" setup under Windows 10, ShredderClassic 13 and the same test suite. The PC is comparable (a bit faster), i7 Haswell @ 3.8 GHz.
Best
Wolfgang
CEGT-Team
www.cegt.net
www.cegt.forumieren.com
Gabor Szots
Posts: 1466
Joined: Sat Jul 21, 2018 7:43 am
Location: Budapest, Hungary
Full name: Gabor Szots

Re: Trouble with the new Arasan 23.1

Post by Gabor Szots »

I regret to report that my recent test, using the CCRL blitz time control, also shows a regression.
Gabor Szots
CCRL testing group
jdart
Posts: 4408
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Trouble with the new Arasan 23.1

Post by jdart »

I am also doing a little more testing. FYI, Linux is the primary development and testing environment now, and 23.1 scored about +30 Elo above 23.0 in that environment (2000 games). However, a short test I did on Windows showed 23.0 scoring above 23.1 (not significant, though). This is puzzling to me, since "bench" shows 23.1 is a little faster, and the new code + network scored well on Linux.
RubiChess
Posts: 644
Joined: Fri Mar 30, 2018 7:20 am
Full name: Andreas Matthies

Re: Trouble with the new Arasan 23.1

Post by RubiChess »

Hi Jon.

My tests show that arasan-d9-10-20211029.nnue (23.1) is clearly worse than arasan-d8-9-20210827.nnue (23.0.1).
The speed gain of 23.1 seems to be bigger in the Linux build than in the Windows build, so the weaker network eats up the speed gain in the Windows build but not (completely) in the Linux one.

Regards, Andreas
pohl4711
Posts: 2811
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: Trouble with the new Arasan 23.1

Post by pohl4711 »

jdart wrote: Sat Nov 13, 2021 9:20 pm
I am also doing a little more testing. FYI, Linux is the primary development and testing environment now, and 23.1 scored about +30 Elo above 23.0 in that environment (2000 games). However, a short test I did on Windows showed 23.0 scoring above 23.1 (not significant, though). This is puzzling to me, since "bench" shows 23.1 is a little faster, and the new code + network scored well on Linux.

To be sure about the regression, I played more than 2200 games (vs. 7 opponents). In my rating list, it would look like this:

Code: Select all

     Program                     Elo    +    - Games     Score Av.Op.    Draws
  16 Seer 2.3.0 avx           : 3374    6    6  9324    56.1 %   3329   51.4 %
  17 Arasan 23.0.1 avx2       : 3347    6    6  9000    55.8 %   3304   49.6 %
  18 Scorpio 3.0.14d cpu      : 3339    7    7  7000    64.0 %   3236   44.6 %
  19 Gogobello 3 avx2         : 3305    6    6 10000    49.6 %   3309   53.8 %
  20 Arasan 23.1 avx2         : 3305   11   11  2278    44.6 %   3346   44.2 %
  21 Minic 3.17 znver3        : 3295    6    6  8326    50.4 %   3293   46.7 %

So, a clear regression: -42 Elo. All games played were OK, though (no time losses, crashes or disconnects in cutechess-cli).
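
The gap between the two Arasan rows can also be set against their error bars. This is a rough sketch only: `rating_gap` is a made-up helper, and it assumes the two error margins are independent and at the same confidence level, which a shared rating pool does not strictly guarantee.

```python
import math

def rating_gap(elo_a: int, err_a: float, elo_b: int, err_b: float):
    """Return (a - b, combined margin), assuming independent error margins."""
    return elo_a - elo_b, math.hypot(err_a, err_b)

# Values taken from the list above: Arasan 23.1 (3305 +/- 11), 23.0.1 (3347 +/- 6)
gap, margin = rating_gap(3305, 11, 3347, 6)
print(f"Arasan 23.1 vs 23.0.1: {gap:+d} Elo, combined margin about {margin:.1f}")
```

Under those assumptions the -42 Elo gap is more than three times the combined margin of about 12.5 Elo, which fits the "clear regression" reading.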