Berserk 11 released. Number 2 free engine!?

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

jhonnold
Posts: 122
Joined: Wed Feb 17, 2021 3:16 pm
Full name: Jay Honnold

Re: Berserk 11 released. Number 2 free engine!?

Post by jhonnold »

This was a sanity check to make sure I didn't severely hurt incremental time controls. I'm sure this would have finished at 0.
Uri Blass
Posts: 11108
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Post by Uri Blass »

Jouni wrote: Tue Feb 21, 2023 3:10 pm Also small gain:

STC Inc Regression

ELO | 1.78 +- 5.79 (95%)
SPRT | 8.0+0.08s Threads=1 Hash=8MB
LLR | 1.46 (-2.94, 2.94) [-5.00, 0.00]
GAMES | N: 6456 W: 1529 L: 1496 D: 3431
I do not see a gain in this result, only the absence of a regression.

In most cases I expect a score above 50% in an SPRT test with these bounds.

1.78 +- 5.79 is simply wrong.
These error bars may be correct for a fixed number of games, and I do not understand why people report them for SPRT tests.
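For readers who want to see where numbers like these come from: the "ELO | 1.78 +- 5.79" line is the usual fixed-sample estimate computed from the win/loss/draw counts, while the LLR line belongs to the sequential test. Below is a minimal Python sketch, assuming a trinomial normal approximation and logistic Elo; it is not the code of the tool that produced the output above, and all function names are hypothetical. Its margin lands close to the quoted 5.79 but need not match it to the last digit. The LLR uses the common GSPRT normal approximation with the bounds read as logistic Elo; for the same counts it gives about 2.4 rather than the quoted 1.46, so the tool likely interprets its bounds differently (for example, in another Elo model).

```python
import math


def elo_to_score(elo: float) -> float:
    """Expected score for a logistic Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def score_to_elo(score: float) -> float:
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def wdl_stats(wins: int, losses: int, draws: int) -> tuple[int, float, float]:
    """Games played, mean score per game and per-game score variance."""
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mean = w + 0.5 * d
    var = w * (1.0 - mean) ** 2 + l * mean ** 2 + d * (0.5 - mean) ** 2
    return n, mean, var


def elo_with_margin(wins: int, losses: int, draws: int,
                    z: float = 1.959964) -> tuple[float, float]:
    """Fixed-sample estimate: Elo and the half-width of a ~95% interval."""
    n, mean, var = wdl_stats(wins, losses, draws)
    stderr = math.sqrt(var / n)
    lower = score_to_elo(mean - z * stderr)
    upper = score_to_elo(mean + z * stderr)
    return score_to_elo(mean), (upper - lower) / 2.0


def sprt_llr(wins: int, losses: int, draws: int, elo0: float, elo1: float) -> float:
    """GSPRT log-likelihood ratio (normal approximation), H1: elo1 vs H0: elo0."""
    n, mean, var = wdl_stats(wins, losses, draws)
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)


def sprt_decision(wins: int, losses: int, draws: int, elo0: float, elo1: float,
                  alpha: float = 0.05, beta: float = 0.05) -> str:
    """Compare the LLR with the Wald bounds; alpha = beta = 0.05 gives about (-2.94, 2.94)."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    llr = sprt_llr(wins, losses, draws, elo0, elo1)
    if llr >= upper:
        return "H1"  # accept H1 (with bounds [-5, 0]: no regression)
    if llr <= lower:
        return "H0"  # accept H0 (regression)
    return "continue"  # keep playing games


if __name__ == "__main__":
    elo, margin = elo_with_margin(1529, 1496, 3431)
    print(f"ELO | {elo:.2f} +- {margin:.2f} (95%)")
    print("LLR:", round(sprt_llr(1529, 1496, 3431, -5.0, 0.0), 2))
    print("decision:", sprt_decision(1529, 1496, 3431, -5.0, 0.0))
```

Because the test stops as soon as the LLR crosses a bound, the number of games depends on the results themselves, so a fixed-sample margin reported at the stopping point does not automatically keep its nominal 95% coverage.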
lkaufman
Posts: 6279
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Post by lkaufman »

Wolfgang wrote: Tue Feb 21, 2023 2:23 pm Berserk 11.1 with a major bug-fix for anyone testing without an increment or in cyclical TCs.
https://github.com/jhonnold/berserk/releases/tag/11.1

Fortunately we currently test with 4'+2" instead of 40/3 or 40/4 repeated as we did in the past... 8-)
Are you saying that CEGT has switched from 40/3 (or 4) to 4' +2" for all blitz testing? I don't see any mention of this on the website. If so, have you also switched to increment testing to replace 40/20? I can't think of any reason to use increment for blitz but not for Rapid, although I believe CCRL does this (no idea why). Also, doesn't this make the 3' + 1" and 5' + 3" lists rather redundant? I know that they use ponder, but that does not seem to make any meaningful difference when comparing engines, as long as they all have a ponder option.
Komodo rules!
Werner
Posts: 3009
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Post by Werner »

At the moment I play all the games for our 40/20 list, and I have not switched to increment testing.
I have not started testing Berserk 11 either, so no problems here.
Last edited by Werner on Tue Feb 21, 2023 6:24 pm, edited 2 times in total.
Werner
Graham Banks
Posts: 45043
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Post by Graham Banks »

lkaufman wrote: Tue Feb 21, 2023 5:08 pm
Are you saying that CEGT has switched from 40/3 (or 4) to 4' +2" for all blitz testing? I don't see any mention of this on the website. If so, have you also switched to increment testing to replace 40/20? I can't think of any reason to use increment for blitz but not for Rapid, although I believe CCRL does this (no idea why). Also, doesn't this make the 3' + 1" and 5' + 3" lists rather redundant? I know that they use ponder, but that does not seem to make any meaningful difference when comparing engines, as long as they all have a ponder option.
I test with repeating time controls because many older engines don't understand increments, and in order to maintain a meaningful rating list, there must be correlation between newer and older engines.

Also, I prefer repeating time controls for consistency of quality through the opening, middlegame and endgame phases.
gbanksnz at gmail.com
lkaufman
Posts: 6279
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Post by lkaufman »

Graham Banks wrote: Tue Feb 21, 2023 6:23 pm
I test with repeating time controls because many older engines don't understand increments, and in order to maintain a meaningful rating list, there must be correlation between newer and older engines.

Also, I prefer repeating time controls for consistency of quality through the opening, middlegame and endgame phases.
I thought that CCRL had totally switched blitz testing from 40/x to increment; is that wrong? Perhaps you just don't do any blitz testing. The reasons you mention don't seem to have anything to do with blitz vs Rapid, so I don't see any logical reason for running blitz with increment but Rapid with 40/x. Am I missing something?

Regarding your last point, engines generally search much deeper in the endgame in the same amount of time, so by using increment you are actually making the search depths more consistent in the endgame than with 40/x. Human GMs know that it is usually best to take much more time per move in the middlegame than in the endgame (if the TC allows for this), and engine tests show the same thing. It is very clear that you get a much higher average quality of search with increment than with 40/x for an equal average total game time.

But anyway, this is all the same for blitz and Rapid. My main question is why they follow different policies. Or is it just up to the individual tester?
Komodo rules!
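The time-allocation argument above can be illustrated with two toy schedules. This is a minimal Python sketch under loudly hypothetical policies, not how Komodo or any other engine manages time: under base+increment the allocator spends clock/horizon plus the increment each move (horizon = 25 is an arbitrary assumption), which front-loads time into the opening and middlegame; under a repeating control it splits each period evenly, so moves 41-60 get the same per-move time as moves 1-40.

```python
from statistics import mean


def increment_schedule(base: float, inc: float, n_moves: int,
                       horizon: float = 25.0) -> list[float]:
    """Seconds spent per move under base+inc with a hypothetical allocator.

    Each move spends clock/horizon plus the increment (capped at half the
    clock), so early moves get more time and late moves drift toward the
    increment.
    """
    clock = base
    spent = []
    for _ in range(n_moves):
        t = min(clock / horizon + inc, 0.5 * clock)
        clock = clock - t + inc  # increment is credited after the move
        spent.append(t)
    return spent


def repeating_schedule(period: float, moves_per_period: int,
                       n_moves: int) -> list[float]:
    """Seconds spent per move under a repeating moves/time control.

    Hypothetical policy: split the time left in the current period evenly
    over the moves left in it; a fresh period's time is added every
    moves_per_period moves.
    """
    clock = 0.0
    spent = []
    for move in range(n_moves):
        if move % moves_per_period == 0:
            clock += period  # unused time, if any, carries over
        moves_left = moves_per_period - move % moves_per_period
        t = clock / moves_left
        clock -= t
        spent.append(t)
    return spent


if __name__ == "__main__":
    inc = increment_schedule(240.0, 2.0, 60)  # 4'+2"
    rep = repeating_schedule(180.0, 40, 60)   # 40/3 repeating
    print(f"4'+2\": moves 1-40 avg {mean(inc[:40]):.2f}s, moves 41-60 avg {mean(inc[40:]):.2f}s")
    print(f"40/3:  moves 1-40 avg {mean(rep[:40]):.2f}s, moves 41-60 avg {mean(rep[40:]):.2f}s")
```

With these assumptions the increment schedule spends clearly more per move before move 40 than after it, while the even-split repeating schedule spends exactly the same per move throughout.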
Uri Blass
Posts: 11108
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Post by Uri Blass »

CCRL uses increment only for blitz.
As I understand it, Graham Banks does not test at blitz.

From the CCRL page:

1) 2'+1" is our fast "blitz" time control as from January 2020, with previous games played at 40/2 repeating time control.

CCRL has the right to do what it wants, but this makes the list less reliable, because as I understand it the same engine can have games at different types of time control.

Engine ratings may differ between types of time control, because some engines may have poor time management for one of them.
Uri Blass
Posts: 11108
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Post by Uri Blass »

I also believe that you get better search quality with increment than with 40/x, but I think it would be clearer if somebody made a list with different types of time control, where every engine has both an average time per game and a rating. Then people could see that you get a higher rating with increment at a similar average time per game (it would be interesting to see the rating difference).

There are interfaces that allow testing with different types of time control for different engines.
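Comparing time controls "at similar average time per game" needs a common yardstick. Below is a minimal Python sketch of the nominal per-side budget, under stated assumptions: a game length of 60 moves (an illustrative choice, not a measured average), an engine that uses its whole increment budget (so base+inc is an upper bound), and pro-rata use of a partial repeating period. The function names are hypothetical; a real comparison would measure actual average time per game from the game logs, as suggested above.

```python
def increment_budget(base_s: float, inc_s: float, moves: int) -> float:
    """Upper bound on one side's thinking time: base plus every increment earned."""
    return base_s + moves * inc_s


def repeating_budget(period_s: float, moves_per_period: int, moves: int) -> float:
    """Time used if each period is spent evenly (pro rata for a partial period)."""
    return period_s * moves / moves_per_period


def matching_base(period_s: float, moves_per_period: int, inc_s: float,
                  moves: int) -> float:
    """Base time giving base+inc the same nominal budget as the repeating control."""
    return repeating_budget(period_s, moves_per_period, moves) - moves * inc_s


if __name__ == "__main__":
    moves = 60  # assumed game length, for illustration only
    print("40/4 repeating:", repeating_budget(240, 40, moves), "s")
    print("4'+2\":", increment_budget(240, 2, moves), "s")
    print("40/2 repeating:", repeating_budget(120, 40, moves), "s")
    print("2'+1\":", increment_budget(120, 1, moves), "s")
    print("base matching 40/3 with +2\":", matching_base(180, 40, 2, moves), "s")
```

Under these assumptions, 40/4 repeating and 4'+2" both come to 360 s per side, and 40/2 repeating and 2'+1" both come to 180 s. That is arithmetic only, not a statement about why any rating list picked those controls.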
lkaufman
Posts: 6279
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Post by lkaufman »

Uri Blass wrote: Tue Feb 21, 2023 8:27 pm I also believe that you get better search quality with increment than with 40/x, but I think it would be clearer if somebody made a list with different types of time control, where every engine has both an average time per game and a rating. Then people could see that you get a higher rating with increment at a similar average time per game (it would be interesting to see the rating difference).

There are interfaces that allow testing with different types of time control for different engines.
We did such tests with Komodo a couple of years ago, and it was very clear that increment beat repeating TC at equal average time per game. I don't have the results anymore.
Komodo rules!
lkaufman
Posts: 6279
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Post by lkaufman »

Uri Blass wrote: Tue Feb 21, 2023 8:15 pm CCRL uses increment only for blitz.
As I understand it, Graham Banks does not test at blitz.

From the CCRL page:

1) 2'+1" is our fast "blitz" time control as from January 2020, with previous games played at 40/2 repeating time control.

CCRL has the right to do what it wants, but this makes the list less reliable, because as I understand it the same engine can have games at different types of time control.

Engine ratings may differ between types of time control, because some engines may have poor time management for one of them.
I know they do this, but I have not seen any explanation or justification for still using repeating TC for Rapid 3 years after making the switch for blitz. I know the arguments for both methods, but they seem unrelated to Rapid vs blitz.
Komodo rules!