AlphaZero No Castling Chess

pohl4711 · Post by **pohl4711** » Mon Dec 09, 2019 5:31 am

Nordlandia wrote: ↑Mon Dec 09, 2019 5:10 am Engines need to know that they're playing armageddon. So they need to be taught playing that mode.

That is false. In Blackmageddon, Black is one pawn ahead. The engine playing White effectively has a huge negative contempt, because Black has an extra pawn and the engine's evaluation is clearly negative. The engine playing Black effectively has a huge positive contempt, because Black has an extra pawn and the engine's evaluation is clearly positive. So White will try to reach a forced draw and Black will try to avoid it...
That is one big advantage of Blackmageddon: a material advantage is much more stable in its effect on the engine evals, and it works like a huge positive contempt for Black and a huge negative contempt for White.
So Blackmageddon works fine with engines that do not know they are playing Blackmageddon, because they do know how to count material on the board. My test runs prove that it works. The results are just amazing.

lkaufman · Post by **lkaufman** » Mon Dec 09, 2019 5:57 am

Laskos wrote: ↑Mon Dec 09, 2019 2:16 am
lkaufman wrote: ↑Sun Dec 08, 2019 5:10 pm
Laskos wrote: ↑Sun Dec 08, 2019 8:36 am
lkaufman wrote: ↑Sun Dec 08, 2019 6:50 am
Laskos wrote: ↑Sat Dec 07, 2019 11:36 pm
I am on the phone and TeamViewer now, but the result with

Contempt = 75
White Contempt = True

is almost identical:

60+0.6
White wins: 99/200
Draws: 94/200
Black Wins: 7/200

The openings were built with White Contempt = 50 (and no Black castling), so I would not worry too much about them. An interesting result; I guess a much larger White Contempt would even harm White. So, Komodo seems to consider this a very simple and balanced Armageddon variant.
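For readers who want to reproduce the Komodo setup quoted above, here is a minimal sketch of how the two settings could be sent as UCI `setoption` commands. The option names "Contempt" and "White Contempt" are taken from the post as written; the helper `uci_setoption` is hypothetical, and the exact option spelling should be checked against the engine's own `uci` output.

```python
def uci_setoption(name: str, value) -> str:
    """Format one UCI 'setoption' command line (helper name is illustrative)."""
    if isinstance(value, bool):
        # UCI check options take the literal strings "true" / "false".
        value = "true" if value else "false"
    return f"setoption name {name} value {value}"


# Settings as quoted in the post; names are assumed to match the engine exactly.
komodo_options = {"Contempt": 75, "White Contempt": True}

commands = [uci_setoption(name, value) for name, value in komodo_options.items()]
for line in commands:
    print(line)
```

Most match runners accept the same name/value pairs in their own engine configuration, so the strings above are only needed when talking to the engine directly.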
OK, so far quite remarkable that this ultra-simple rule could be so balanced. I downloaded your nbc book and will run my own tests on it; the big question is whether it will tilt too much one way with more time (although your forty games at 240 +2.4" said no). I can also test it on Komodo MCTS and on Stockfish (since Stockfish also has similar contempt settings); I guess there's less point to checking it out on Lc0 without a way to make it go for wins as White and draws as Black. I checked out the initial position with several engines. Komodo gives it a bit over +1, and SF about +1.3, both of which are well above what I consider the win/draw threshold on those engines (roughly 0.7 on K and 1.0 on SF). However the nn engines showed about 71% win prob. (72.5% for two top Lc0 networks, 69% for Fat Fritz), which is right about where the win/draw line should fall. I still think that in human play White will score well over 50%, but that is due to psychological reasons rather than objective ones; it's simply more fun to look for wins than to try to avoid them, and being unhappy is not conducive to best results. But we can't model this, so I think the best we can do is to hope that White scores a bit under 50% in engine play with more time, so that the human bias for pleasant positions won't give White too big a score in human play.
I tested at a longer TC in 100 games overnight; it came out completely even, with 0 Black wins.

240+2.4
White wins: 50/100
Draws: 50/100
Black Wins: 0/100

One issue is that these are self-play games between identical Komodos. But since the performance is so stable across TC, I think that even if strong humans do better as White, White's performance in human games would hover at, say, 60% irrespective of time control and even of strength (well, for strong and very strong humans anyway).

I am now testing the resolving power of this simple variant by pitting Komodo at 60+0.6 against Komodo at 40+0.4 and comparing the result to the standard chess result. The White performance will again be interesting to see with these unequal opponents. Probably 400 games. I am now controlling my PC remotely from the phone, so there might be trouble copying and pasting here.
I also ran it overnight, and my results agree remarkably well with yours. I used 300 +3"; my i7 laptop runs at 4.9 GHz (is yours also pretty fast?). My result was 106 White wins, 103 draws, 5 Black wins, so 106 to 108 with Armageddon scoring. So there is no real trend in the results with more time, and Black's trivial margin is at least in the direction I hoped, since White will probably do better in human play.
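The Armageddon scoring used here (draws count as Black wins) can be sketched in a few lines. The function name `armageddon_score` is made up for illustration; the input numbers are the ones reported in the post.

```python
def armageddon_score(white_wins: int, draws: int, black_wins: int):
    """Return (white_points, black_points, white_fraction) with draws scored for Black."""
    white_points = white_wins
    black_points = draws + black_wins  # Black "wins" every drawn game
    total = white_points + black_points
    return white_points, black_points, white_points / total


# Result reported above: 106 White wins, 103 draws, 5 Black wins.
w, b, frac = armageddon_score(white_wins=106, draws=103, black_wins=5)
print(f"{w} to {b}, White scores {frac:.1%}")
```

The same function applies to any of the win/draw/loss tallies quoted in this thread.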

This is really remarkable. I gave up on the idea too quickly when I first thought of it because the SF and Komodo evals were too high, but it seems that they are just wrong. Looks like the NNs got it right. So we now have a variant of chess that appears to be perfectly balanced (at least between engines), has no draws, and can be explained in one short sentence ("Black cannot castle but wins draws"). I guess the NBC Armageddon name is the best one. The fact that Kramnik and AlphaZero have just promoted no castling chess in a big way makes this incredibly timely!

I'll have to test it with K mcts and with SF.
With SF and Lc0 there is the issue that they have no Colored Contempt. Lc0 has no Contempt at all and, from my past experience, doesn't excel in variants deviating from Chess, although here the deviation is small. For SF I used the default Contempt; setting it much higher would make Black chase only wins too, which is bad. I got a very similar result (I even wonder how all these results fall so close to one another, with hardly any statistical fluctuation):

200 games at 60 + 0.6, SF self-play, default Contempt, NBC_473 suite:

White wins: 98
Draws: 96
Black Wins: 6

This thing seems remarkably stable. In my patzer view, without any "tradition" in playing serious Chess, this variant seems better than Chess.
Isn't it exciting? No draws. White cannot "play safe" with the usual "if not a win, a draw will do", and Black finds itself in constant danger, having to play very accurately to neutralize White. The asymmetry is also appealing, especially in side-and-reversed play, as it shows different faces of the players. But side-and-reversed doesn't even seem to be more necessary than in regular Chess, as the Armageddon score is close enough to 50% for equal players on both sides (55% for White in regular Chess). Just ensure that over the whole tournament each player gets a fair share of White and Black.

I hope that these experiments hold up in human games, where psychology is a large factor. I guess the level of play of Komodo and SF at these time controls is quite high, but top humans can develop opening theory for this variant well beyond the opening play of the engines at short TC, and get a somewhat different view, aside from psychological factors like the taste for playing as White.

Also note that NBC Armageddon has binomial-type scoring per game, but with side-and-reversed openings a trinomial variance can be applied to pairs of games (results of 0, 1, or 2 per pair for one of the players), especially in engine matches, which will significantly reduce the error margins for these unbalanced openings. The resolving power of NBC Armageddon seems significantly higher than that of regular Chess.
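To make the binomial-versus-trinomial point concrete, here is a rough sketch comparing the two standard errors. The pair counts are purely hypothetical (they are not from any test in this thread) and both function names are illustrative. It uses the plain population variance of the pair totals.

```python
import math


def per_game_se_binomial(wins: int, games: int) -> float:
    """Standard error of the per-game score, treating every game as independent."""
    p = wins / games
    return math.sqrt(p * (1 - p) / games)


def per_game_se_paired(pairs_0: int, pairs_1: int, pairs_2: int) -> float:
    """Standard error of the per-game score from pair totals (0, 1 or 2 points)."""
    n = pairs_0 + pairs_1 + pairs_2
    mean = (pairs_1 + 2 * pairs_2) / n
    var = (pairs_0 * (0 - mean) ** 2
           + pairs_1 * (1 - mean) ** 2
           + pairs_2 * (2 - mean) ** 2) / n
    # Divide by 2 to convert the per-pair error to a per-game error.
    return math.sqrt(var / n) / 2


# Hypothetical data: 100 opening pairs, most of them split 1-1.
pairs = (10, 80, 10)
wins = pairs[1] + 2 * pairs[2]
games = 2 * sum(pairs)

se_binomial = per_game_se_binomial(wins, games)
se_paired = per_game_se_paired(*pairs)
print(f"binomial SE: {se_binomial:.4f}, paired SE: {se_paired:.4f}")
```

The more often a pair is split 1-1 (as one would expect from unbalanced openings played with colors reversed), the more the paired estimate shrinks relative to the binomial one.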

My test with Komodo MCTS on four threads at 1' +.6" was a bit disappointing in that White clearly won the match. Result 186 White wins, 25 Draws, 130 Black wins, so 186 to 155 or 54.6%. Still, combined with the normal Komodo results we have about 52%, quite an acceptable score. When I tried actually having several versions of Komodo, SF, and Lc0 play it out with no book at 5" per move and watched the games, my general impression was that White was rarely keeping the large claimed initial advantage (excluding any Contempt), and that if I had to bet I would bet that the start position is a theoretical draw rather than a White win. Although the psychological advantage is surely with White, I see now that White has to play with great energy to avoid watching his edge gradually slip away, and I'm no longer so sure which side I'd choose in a million dollar game. It partly depends on the time limit; in fast human play there are few draws, so White is surely for choice, but at more serious time limits it could tilt the other way.

Although I've invested nearly sixty years in playing literally thousands of standard tournament games, I'm somewhat inclined to share your "patzer view". I would certainly love to play in a serious tournament of NBC (maybe we don't need to say Armageddon, I suppose it's rather implied, why else would you play NBC except perhaps for engine tests with color reversal?). Although new opening theory would of course develop, it wouldn't develop nearly so quickly and accurately as in normal chess, because right now the engines aren't good at judging what is enough to win and what is not, they are much better at picking the most promising moves when none of them are winning yet. Of course if NBC caught on then eventually engines would be optimized to play it well, and then opening analysis would be of high quality. But there is hardly any chance of the game getting played out the way many openings are in normal chess; all variations would lead to clear advantages for White but probably not to clearly winning advantages; the middlegame will always be on the knife-edge between a win and a draw.

At the very least, NBC should replace time-odds Armageddon as the tiebreak in human chess, at some rapid but not blitz time control.

Ovyron · Post by **Ovyron** » Mon Dec 09, 2019 7:34 am

Sorry, NBC is already taken

So I think we need a new catchy name for this variant, especially if, as I hope, it ends up replacing chess. What about something related to Armageddon? Apocalypse? Doomsday? Ragnarok? Eschaton? Fimbulwinter? Götterdämmerung? There has to be something.

Armageddon doesn't have "draw odds" included in the name, so the name of this shouldn't need to mention castling.

Javier Ros · Post by **Javier Ros** » Mon Dec 09, 2019 11:11 am

Nordlandia wrote: ↑Mon Dec 09, 2019 5:10 am Engines need to know that they're playing armageddon. So they need to be taught playing that mode.

I agree. In the same way that AlphaZero has been trained to play No Castling Chess, Lc0 must learn No Castling Chess or BNC with draw advantage, while alpha-beta programs must also be modified. A new opening repertoire will be created for each variation of chess. The contempt factor for each side is not enough.

The experiments of Laskos and Larry Kaufman are very interesting but when the programs take into account the draw advantage the results will vary.
Anyway the proposal seems balanced and acceptable.

lkaufman · Post by **lkaufman** » Mon Dec 09, 2019 5:37 pm

Javier Ros wrote: ↑Mon Dec 09, 2019 11:11 am
Nordlandia wrote: ↑Mon Dec 09, 2019 5:10 am Engines need to know that they're playing armageddon. So they need to be taught playing that mode.
I agree. In the same way that AlphaZero has been trained to play No Castling Chess, Lc0 must learn No Castling Chess or BNC with draw advantage, while alpha-beta programs must also be modified. A new opening repertoire will be created for each variation of chess. The contempt factor for each side is not enough.

The experiments of Laskos and Larry Kaufman are very interesting but when the programs take into account the draw advantage the results will vary.
Anyway the proposal seems balanced and acceptable.

Although knowing the Armageddon rule is ideal, I'm pretty sure that using a White Contempt of 75 in Komodo comes close enough to this for most practical purposes. I think that the results indicate that the exact value doesn't matter much, because as the value goes higher, both sides modify their play more towards the Armageddon rule, and the effects cancel out.
I added Fat Fritz to the experiment, although it doesn't have Contempt so you may consider the result less reliable than the Komodo results. It ran on an RTX 2080 at 1' + 1". Result: 177 White wins, 8 Black wins, 185 draws, so 177/370 points = 47.8%. So this brings the overall results for all engines tested down to between 50% and 51% (depending on how you weight them). It is really amazing how fair this variant appears to be, at least between engines!
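As a rough check of the "between 50% and 51%" figure, the sketch below combines three Armageddon results in two ways. It assumes, without confirmation from the post, that the "overall results" refer to the three tests lkaufman reported in this thread: Komodo (106 of 214), Komodo MCTS (186 of 341) and Fat Fritz (177 of 370).

```python
# (White wins, total games) per engine; which tests are included is an assumption.
results = {
    "Komodo": (106, 214),
    "Komodo MCTS": (186, 341),
    "Fat Fritz": (177, 370),
}

# Weight every game equally.
pooled = sum(w for w, _ in results.values()) / sum(g for _, g in results.values())

# Weight every engine equally.
unweighted = sum(w / g for w, g in results.values()) / len(results)

print(f"pooled by games:  {pooled:.1%}")
print(f"mean over engines: {unweighted:.1%}")
```

Under that assumption, both weightings land between 50% and 51%, consistent with the statement above.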

lkaufman · Post by **lkaufman** » Mon Dec 09, 2019 5:40 pm

Ovyron wrote: ↑Mon Dec 09, 2019 7:34 am Sorry, NBC is already taken

So I think we need a new catchy name for this variant, especially if, as I hope, it ends up replacing chess. What about something related to Armageddon? Apocalypse? Doomsday? Ragnarok? Eschaton? Fimbulwinter? Götterdämmerung? There has to be something.

Armageddon doesn't have "draw odds" included in the name, so the name of this shouldn't need to mention castling.

I think as long as we call it NBC chess, we are safe from copyright issues. I like that the name tells people how to play it.

lkaufman · Post by **lkaufman** » Tue Dec 10, 2019 7:07 am

I realized that it may be a flaw in the testing of this idea to run only identical or very similar engines against each other. In the real world, engines and humans don't generally play against clones, so this is not a proper test. So I'm running some matches between unrelated engines. They don't have to be of equal strength; as long as they are within a hundred Elo or so of each other this should still work, since each side gets half White and half Black. Of course, if the engines were a thousand Elo apart, the result would come out 50% for each color, since the stronger engine would win all the games, half with each color, but with moderate Elo gaps any White-Black bias should show up.
My first test was Stockfish 10 on 6 fast CPU cores vs. Fat Fritz on an RTX 2080, at 1' +0.6". Stockfish won the match 60 to 40, but that's not what matters. White won 58 games, with 42 draws and not a single Black win for either engine! This is a bit worrisome, as 58 to 42 is rather significant. We'll have to see how other unrelated pairings come out. It may turn out that NBC Armageddon isn't as fair as we thought, in which case we can fall back on NBSC, no Black short castling, which would obviously raise Black's prospects. But let's wait for results first.
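How unusual a 58 to 42 split is can be estimated with an exact binomial test. This is only a sketch under simplifying assumptions: every game is treated as an independent coin flip with a 50% chance for White, which ignores the paired openings and the strength gap between the two engines. The function name is illustrative.

```python
from math import comb


def two_sided_binomial_p(successes: int, trials: int) -> float:
    """Exact two-sided p-value against a fair-coin null, doubling the larger tail."""
    k = max(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


# 58 White "wins" out of 100 games under Armageddon scoring.
p_value = two_sided_binomial_p(58, 100)
print(f"two-sided p-value: {p_value:.3f}")
```

A p-value in this range falls short of the usual 5% threshold, so a 100-game sample on its own leaves a fair amount of room for chance.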

Laskos · Post by **Laskos** » Tue Dec 10, 2019 8:32 am

lkaufman wrote: ↑Tue Dec 10, 2019 7:07 am
I realized that it may be a flaw in the testing of this idea to run only identical or very similar engines against each other. In the real world, engines and humans don't generally play against clones, so this is not a proper test. So I'm running some matches between unrelated engines. They don't have to be of equal strength; as long as they are within a hundred Elo or so of each other this should still work, since each side gets half White and half Black. Of course, if the engines were a thousand Elo apart, the result would come out 50% for each color, since the stronger engine would win all the games, half with each color, but with moderate Elo gaps any White-Black bias should show up.
My first test was Stockfish 10 on 6 fast CPU cores vs. Fat Fritz on an RTX 2080, at 1' +0.6". Stockfish won the match 60 to 40, but that's not what matters. White won 58 games, with 42 draws and not a single Black win for either engine! This is a bit worrisome, as 58 to 42 is rather significant. We'll have to see how other unrelated pairings come out. It may turn out that NBC Armageddon isn't as fair as we thought, in which case we can fall back on NBSC, no Black short castling, which would obviously raise Black's prospects. But let's wait for results first.

I had 600-700 games of Komodo against SF and Lc0, and the result was again White at very close to 50% overall. You will probably see a similar thing; 100 games are simply too few.
I will also try to see how weaker, Master-level adversaries of, say, 2000 Elo behave. One way to simulate that is to limit top AB engines to depths of, say, 5-7.

By the way, you seem to have a monster laptop :) . 4.9 GHz cores and RTX 2080 compared to my desktop 4.0 GHz cores and RTX 2070 (OC-ed). But I use 2 coolers and additionally 3 case fans. I can only wonder how your laptop works.

Ovyron · Post by **Ovyron** » Tue Dec 10, 2019 8:49 am

lkaufman wrote: ↑Mon Dec 09, 2019 5:40 pm I like that the name tells people how to play it.

No, NBC doesn't say anything, people would just ask "what does NBC stand for?" Except they won't because it already stands for something.

So how about a nice game of National Broadcasting Company?

(this isn't about copyright, it's about clarity)

Laskos · Post by **Laskos** » Tue Dec 10, 2019 1:26 pm

lkaufman wrote: ↑Tue Dec 10, 2019 7:07 am
I realized that it may be a flaw in the testing of this idea to run only identical or very similar engines against each other. In the real world, engines and humans don't generally play against clones, so this is not a proper test. So I'm running some matches between unrelated engines. They don't have to be of equal strength; as long as they are within a hundred Elo or so of each other this should still work, since each side gets half White and half Black. Of course, if the engines were a thousand Elo apart, the result would come out 50% for each color, since the stronger engine would win all the games, half with each color, but with moderate Elo gaps any White-Black bias should show up.
My first test was Stockfish 10 on 6 fast CPU cores vs. Fat Fritz on an RTX 2080, at 1' +0.6". Stockfish won the match 60 to 40, but that's not what matters. White won 58 games, with 42 draws and not a single Black win for either engine! This is a bit worrisome, as 58 to 42 is rather significant. We'll have to see how other unrelated pairings come out. It may turn out that NBC Armageddon isn't as fair as we thought, in which case we can fall back on NBSC, no Black short castling, which would obviously raise Black's prospects. But let's wait for results first.

I had this in the morning, a RR of 600 games with top 3 engines at 60 + 0.6:

Code: Select all

   # PLAYER        : RATING  ERROR    POINTS  PLAYED     (%)   CFS(next)
   1 SF_9          :  46.58  19.48     232.5     400    58.1      95    
   2 Houdini_6     :  19.35  18.85     213.5     400    53.4     100    
   3 Komodo_131    : -65.93  19.52     154.0     400    38.5     ---    

White advantage = 157.98 +/- 11.35
Draw rate (equal opponents) = 44.33 % +/- 2.40

Komodo surprisingly performs quite poorly, although I put White Contempt = 75, and the default small Contempt for the other two, as they have no Colored Contempt. I hope high Contempt in Komodo doesn't harm its performance and doesn't skew the overall result.

Here is the important aspect:

Code: Select all

Games        : 600 (finished)

White Wins   : 314 (52.3 %)
Black Wins   : 70 (11.7 %)
Draws        : 216 (36.0 %)
Unfinished   : 0

White Score  : 70.3 %
Black Score  : 29.7 %

White wins are 52.3%, which is not that bad. My theory was that, for determining the borderline at longer TC, both the White score (52.3%, the border is 50%) and the White performance in normal scoring (70.3%, the border is 75%) should be considered. At shorter TC more "accidents" happen, and a Black win (there are many here) is often an accident between engines of similar strength. But "accidents" happen the other way around too; some of the White wins are also "accidents". At the same time, the performance at 60 + 0.6 in normal scoring is significantly below 75%, which, combined with 52.3% White wins, means it is debatable what sort of opening this is at much longer TC.
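Both percentages discussed above follow directly from the game counts. A small sketch (the function name `white_rates` is illustrative; the 50% and 75% borders are the ones stated in the post):

```python
def white_rates(white_wins: int, draws: int, black_wins: int):
    """Return (White win rate, White score in normal scoring with draws = 0.5)."""
    games = white_wins + draws + black_wins
    win_rate = white_wins / games
    normal_score = (white_wins + 0.5 * draws) / games
    return win_rate, normal_score


# Counts from the 600-game round robin above.
win_rate, normal_score = white_rates(white_wins=314, draws=216, black_wins=70)

print(f"White wins (Armageddon score): {win_rate:.1%}  (border 50%)")
print(f"White normal-scoring result:   {normal_score:.1%}  (border 75%)")
```

The two numbers sit on opposite sides of their respective borders, which is the tension the post describes.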

I am now testing the same RR at 240 + 2.4 in 600 games. It will probably take almost a day, but if my theory stands, White might score even below 52.3% at longer TC, while White's performance in normal scoring might be above 70.3% (though maybe not above the 75% threshold). That would be due to fewer Black wins at longer TC and, possibly, fewer White wins. Let's see, but the statistical fluctuations are not that small even in 600 games, so one has to be cautious about inferring too much.
