Intel i7-4770k Quad
ChessGUI
256mb hash each where possible
3-4-5 piece tablebases
Ponder off
LowDraw100.cgb book
40 moves in 16 minutes repeating (adapted for CCRL)
6 cycles 54 rounds
New opening book that is fair, but expected to produce a lowish draw rate.
Alternatively, if you install TLCV (Tom's Live Chess Viewer) on your computer, you can watch the games live move by move. You'll also be able to chat to others following the tournament in the chatroom there. http://kirill-kryukov.com/chess/discuss ... p?id=42959
Host - GrahamCCRL.dyndns.org Port - 16092
chrisw wrote: ↑Mon Sep 04, 2023 5:33 pm
Why was this game declared a draw? They were shuffling, but the "winning" side was going to make a pawn move and change everything sooner or later.
Hi Chris,
I've probably mentioned this before, but I've always used the default ChessGUI adjudication settings:
98% of the time, they're pretty much right.
You just made that statistic up, no?!
Those default values are bonkers and they penalise the "winning" engine in the pair, probably the stronger engine. Result? The entire rating list 40/15 is skewed against the stronger engines, Elos are depressed at the top end.
There were plenty of examples of premature draws I noticed in the past days, but this is the first one I checked on, and, Wow, I'm flabbergasted.
What happens in the games where strong engines shuffle and then resolve matters by pushing a pawn as rule 50 approaches? A pawn push changes everything. They're all declared as draws though. This is a really bad inaccuracy and skew bias.
Add to that the elimination of games where an arbitrary lower eval limit isn't breached in the opening and you again skew the rating list by decreasing the possibility that a stronger engine (in the aborted game) then gets a win the next game.
So, two effects we know about:
1. The limit rules are increasing the draw rate
2. Stronger engines are being penalised
You have the Komodo assistant author, the Berserk author and the CSTal author all telling you the limit rules you're using are crazy. The CSTal author is telling you the Elo results of the rating lists prepared using these rules are skewed. I suspect Jay and Larry are going to agree with that too.
If you compare the CCRL and CEGT lists, you'll see a similar scenario, in that Stockfish and Dragon by Komodo leave a good gap back to the next 10 or so engines, which are all compressed into a 55-70 Elo range.
Stefan Pohl's list using the Hert500 book also shows this pattern.
That's just the way that things are.
I looked at the first 97 games in this tournament, and find these games that fit into the category you're referring to. It's unfortunate that they both feature CS Tal 2.00:
Graham Banks wrote: ↑Tue Sep 05, 2023 12:10 pm
If you compare the CCRL and CEGT lists, you'll see a similar scenario, in that Stockfish and Dragon by Komodo leave a good gap back to the next 10 or so engines, which are all compressed into a 55-70 Elo range.
Stefan Pohl's list using the Hert500 book also shows this pattern.
That's just the way that things are.
I looked at the first 97 games in this tournament, and find these games that fit into the category you're referring to. It's unfortunate that they both feature CS Tal 2.00:
Graham, it's not relevant what CEGT, Pohl, Hert500, do or show, to the central critique that you are adjudicating draws at 80 cp. Nor is it relevant that "this is the way that it is".
First off, 80 cp is way off any draw value, and second, engines nowadays are not eval-normalised, so 80 cp is in no way 80 cp across all engines.
It's completely nuts to adjudicate draws at 80 cp. You can adjudicate a draw if both engines show below 4 or 5 cp for N moves. That's the way it is.
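To make the difference between the two thresholds concrete, here is a minimal sketch of an eval-based draw rule. It is not ChessGUI's actual code; the function name, the input layout and the defaults (10 consecutive moves, only after move 60, as the settings are described elsewhere in this thread) are assumptions for illustration.

```python
from typing import Sequence, Tuple


def is_adjudicated_draw(
    evals_cp: Sequence[Tuple[int, int]],
    threshold_cp: int,
    window: int = 10,      # assumed: consecutive full moves required
    min_move: int = 60,    # assumed: rule only active after this move
) -> bool:
    """Return True if both engines report |eval| below threshold_cp for
    `window` consecutive full moves after move `min_move`.

    evals_cp[i] holds (white engine eval, black engine eval) in
    centipawns for full move i + 1.
    """
    run = 0
    for move_no, (white_cp, black_cp) in enumerate(evals_cp, start=1):
        quiet = abs(white_cp) < threshold_cp and abs(black_cp) < threshold_cp
        if move_no > min_move and quiet:
            run += 1
            if run >= window:
                return True
        else:
            run = 0  # any move outside the band restarts the count
    return False


# A shuffling game where one side sits at a steady 60 cp advantage.
shuffling = [(60, -60)] * 80
print(is_adjudicated_draw(shuffling, threshold_cp=80))  # cut short as a draw
print(is_adjudicated_draw(shuffling, threshold_cp=5))   # game continues
```

With an 80 cp band the shuffling game is stopped before the side with the edge can try a pawn break; with a 5 cp band only positions both engines score as dead level are stopped.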
It also makes no sense to be culling games on your "70 cp out of book for N moves" criterion. Evals are not consistent. 70 cp is not 70 cp. You should leave it up to the book to decide if its lines are okay or not. I know many books are just amateur affairs and could contain anything - if you want a book with start moves controlled to some normalised cp value, send me the pgns or epds and I'll filter them for you using SF15 or some other accurate/normalised evaluator.
chrisw wrote: ↑Tue Sep 05, 2023 11:31 am
Those default values are bonkers and they penalise the "winning" engine in the pair, probably the stronger engine.
I totally agree that the 80cp value is bonkers.
chrisw wrote: ↑Tue Sep 05, 2023 11:31 am
Result? The entire rating list 40/15 is skewed against the stronger engines, Elos are depressed at the top end.
That depends entirely on the number of incorrect adjudications. If the number is tiny, then the effect may not be as great as you think. Graham probably doesn't have any facts to back up his "98% are OK" number, but I don't think you have any facts either to be making claims such as "the entire ratings list is skewed" etc etc.
Maybe Graham might be convinced to change that 80cp value going forward at least.
Chris doesn't state how skewed the list is due to the 80 cp adjudication; clearly it is skewed by it, but perhaps not dramatically so. Yes, the solution is to change the 80cp value to some tiny number or even 0 in the future, but I think this should be done simultaneously with switching to increment play for the Rapid games (as is already done for blitz), because the two issues are related. Using a minimal adjudication threshold will result in many more super-long games, which will be very boring and a huge waste of resources with 40/x repeating time controls. One of the big benefits of increment is that games can be played to the end without need for adjudication. We already benefitted from this in human chess; I had a major tournament game adjudicated by none other than Bobby Fischer around 1965; now with increment we just play to the end. Perhaps standardizing the opening book to whatever you settle on here could also be done at the same time, with the old list archived as you do with blitz. Mixing games played with a low-draw book together with games played with normal books is statistically unsound. If all three issues are fixed at once, the new list would be much sounder and very highly respected.
I have no inclination or desire to start, or to be involved with, a new rating list.
If/when I stop my CCRL testing, I'll happily continue to run my Amateur Series tournaments for those still interested, although probably with an incremental time control.
I run engine v engine testing because I enjoy it - the tournaments in particular.
I like watching some of the games, which is why bullet or blitz hold no interest for me whatsoever.
Graham Banks wrote: ↑Wed Sep 06, 2023 2:07 am
I have no inclination or desire to start, or to be involved with, a new rating list.
If/when I stop my CCRL testing, I'll happily continue to run my Amateur Series tournaments for those still interested, although probably with an incremental time control.
I run engine v engine testing because I enjoy it - the tournaments in particular.
I like watching some of the games, which is why bullet or blitz hold no interest for me whatsoever.
No one was suggesting blitz games to replace Rapid. My suggestion was to retain the current pace of your 40/15 games for the first 60 moves or so, then use increment to speed up the long endgames (which are usually drawn) a bit, especially if the adjudication rules are tightened. Maybe it wouldn't even need a new list, perhaps the change would be deemed minor enough to combine them. I would think that would make them more enjoyable to watch with no loss in average quality.
Do you think that 30 minutes with 10 second increments would be on a par with 40/15 games, which on average take 50 minutes to complete (if using 40/16)?
On my 5950x, I use 40/11 repeating, with the average games taking around 35 minutes, so I'm guessing that would be about 20 minutes with 7 second increments?
I could also drop the draw adjudication from 10 consecutive moves past move 60 with less than 80, to less than 30.
I do draw the line at using what I perceive to be unfair opening lines though.
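For the time-control comparison asked about above, a rough sketch of the arithmetic may help. It only computes the maximum total clock time both sides can use for a game of a given length, not the average duration, and the function names and the sample game lengths are assumptions for illustration.

```python
import math


def max_minutes_repeating(moves: int, moves_per_period: int = 40,
                          minutes_per_period: float = 16.0) -> float:
    """Upper bound on total clock time (both sides) for a repeating
    control such as 40 moves in 16 minutes."""
    periods = math.ceil(moves / moves_per_period)
    return 2 * periods * minutes_per_period


def max_minutes_increment(moves: int, base_minutes: float = 30.0,
                          increment_seconds: float = 10.0) -> float:
    """Upper bound on total clock time (both sides) for a base time
    plus per-move increment control such as 30 minutes + 10 seconds."""
    return 2 * (base_minutes + moves * increment_seconds / 60.0)


for game_length in (60, 120):  # hypothetical game lengths in full moves
    print(game_length,
          max_minutes_repeating(game_length),
          max_minutes_increment(game_length))
```

For a 60-move game the bounds are 64 minutes for 40/16 and 80 minutes for 30+10; for a 120-move game they are 96 and 100 minutes. How close real games come to these bounds depends on how the engines budget their time, so measured averages are still needed to answer the question properly.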
You want to negotiate the value of a draw down from 80 to 30? This is like you say 2+2=5, we say 2+2=4 and you offer to compromise on 2+2=4.5
Draw = 0.0, maybe +/-5 if using engine output, since some engines appear to add a small random value to 0.0 during search.
In the context of a back to back opening book, there is no concept of an unfair opening line. The correct term is unbalanced. Unbalanced opening books are perfectly "fair", as long as each side gets to play each opening from both black and white perspectives.
You can use the term grotesquely unbalanced, where the white side will always win - these lines are useless in an opening book because they provide no information, so there is a case for culling grotesquely unbalanced lines.
Data on openings? Well you can do an SF15 eval on the exit point. You can get a result-based value from a suitable number of samples of the line found in comp-comp games. Two values in fact, whitebias = (W+D) / (W+D+L), and nondrawrate = (W+L) / (W+D+L)
Criteria for being in book:
Sufficient number of samples (I am using 50 plus)
The highest nondrawrates
If SFeval < BORING_EVAL_LO then cull
if SFeval > GROTESQUE_EVAL_HI then cull
if whitebias > TOO_MUCH_WHITEBIAS then cull
Then we don't need to do any culling where you stop the game in the first N moves and restart with another opening, because that culling is done at the book building stage.
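The criteria above can be sketched roughly as follows. The two formulas and the 50-sample minimum are taken from the post as written; the numeric values of BORING_EVAL_LO, GROTESQUE_EVAL_HI and TOO_MUCH_WHITEBIAS, the LineStats record and the function names are placeholders I have assumed, and the SF eval is assumed to be in centipawns from White's point of view.

```python
from dataclasses import dataclass
from typing import List

MIN_SAMPLES = 50            # "50 plus" in the post
BORING_EVAL_LO = 20         # hypothetical value, centipawns
GROTESQUE_EVAL_HI = 150     # hypothetical value, centipawns
TOO_MUCH_WHITEBIAS = 0.95   # hypothetical value


@dataclass
class LineStats:
    name: str
    wins: int        # White wins in comp-comp games with this line
    draws: int
    losses: int      # White losses
    sf_eval_cp: int  # evaluation at the book exit point (assumed White POV)

    @property
    def samples(self) -> int:
        return self.wins + self.draws + self.losses


def white_bias(s: LineStats) -> float:
    # whitebias = (W+D) / (W+D+L), as given in the post
    return (s.wins + s.draws) / s.samples


def non_draw_rate(s: LineStats) -> float:
    # nondrawrate = (W+L) / (W+D+L), as given in the post
    return (s.wins + s.losses) / s.samples


def passes_culls(s: LineStats) -> bool:
    if s.samples < MIN_SAMPLES:
        return False
    if s.sf_eval_cp < BORING_EVAL_LO:
        return False
    if s.sf_eval_cp > GROTESQUE_EVAL_HI:
        return False
    if white_bias(s) > TOO_MUCH_WHITEBIAS:
        return False
    return True


def build_book(lines: List[LineStats], size: int) -> List[LineStats]:
    """Keep lines that survive the culls, highest nondrawrate first."""
    kept = [s for s in lines if passes_culls(s)]
    kept.sort(key=non_draw_rate, reverse=True)
    return kept[:size]


candidates = [
    LineStats("A", 30, 20, 10, 60),
    LineStats("B", 10, 45, 5, 10),    # culled: eval too low
    LineStats("C", 58, 2, 0, 300),    # culled: eval too high
    LineStats("D", 10, 5, 5, 60),     # culled: too few samples
    LineStats("E", 25, 30, 15, 40),
]
print([s.name for s in build_book(candidates, 2)])
```

Doing the culling here, once, at book-building time is what removes the need to abort and restart games during the tournament itself.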