Stockfish Handicap Test

Discussion of computer chess matches and engine tournaments.

Moderators: hgm, Rebel, chrisw

Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Stockfish Handicap Test

Post by Milos »

Ovyron wrote:Yeah? Show the code that Stockfish has for books and EGTBs, its "domain knowledge".
You clearly don't understand what domain knowledge is, and can't really understand code, so why bother?
MonteCarlo
Posts: 188
Joined: Sun Dec 25, 2016 4:59 pm

Re: Stockfish Handicap Test

Post by MonteCarlo »

As long as we're nitpicking, there were at least 3 distinct openings (depending on how you count subvariations) played in the 10 published games (Ruy Lopez, Classical French, Queen's Indian).

:D
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish Handicap Test

Post by syzygy »

MonteCarlo wrote:As long as we're nitpicking, there were at least 3 distinct openings (depending on how you count subvariations) played in the 10 published games (Ruy Lopez, Classical French, Queen's Indian).

:D
And even with just two, the plural is the correct form...
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish Handicap Test

Post by syzygy »

Dann Corbit wrote:True, the Cerebellum approach adds a great deal of Elo.
But this Elo is just a DB lookup. You will admit, I hope, that there is no calculation going on whatsoever. And any engine can strap it on (though this requires code changes).
SF itself cannot strap it on without modification (stock SF can NOT use the special book; that requires the modified version. Stock SF CAN use the polyglot version, but that is also neither here nor there).
Stock SF cannot even use the polyglot version, since the polyglot book is no longer in stock SF.

Btw, I may be missing something, but as far as I understand, at least the latest releases of the Cerebellum book are in the polyglot format.
http://www.zipproth.de/#Brainfish_download
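Since the book is in polyglot format, "just a DB lookup" is quite literal: probing it is a keyed search on the position's Zobrist hash. Here is a minimal sketch of such a probe, assuming the python-chess library and a hypothetical local copy of the book named cerebellum.bin (neither is part of the Cerebellum release itself):

```python
# Sketch of a polyglot book probe (illustrative, not Cerebellum-specific code).
# A polyglot book is a sorted table of (zobrist_key, move, weight) entries;
# "playing from the book" amounts to finding the entries for the current key.
import chess
import chess.polyglot

board = chess.Board()  # the starting position
with chess.polyglot.open_reader("cerebellum.bin") as reader:  # hypothetical filename
    try:
        entry = reader.find(board)  # highest-weighted entry for this position
        print("Book move:", board.san(entry.move))
    except IndexError:
        print("Position not in book; the engine has to search.")
```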

Stock Houdini can use the (polyglot) Cerebellum book, and this is even documented in the manual:
The engine implements repetition checks to avoid playing moves that generate 3-fold repetitions; this is particularly useful for Cerebellum Polyglot books with multiple moves.
So I guess a match between Houdini and Stockfish is fair only if Houdini gets to use the Cerebellum book.
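A repetition check like the one the manual describes could look roughly like this: walk the book's candidate moves in weight order and prefer the first one that does not complete a threefold repetition. A sketch under the same python-chess assumption as above, not Houdini's actual implementation:

```python
import chess
import chess.polyglot

def book_move_avoiding_repetition(board, reader):
    """Best-weighted book move that does not create a 3-fold repetition."""
    entries = sorted(reader.find_all(board),   # all book entries for this position,
                     key=lambda e: e.weight,   # sorted by weight ourselves, since
                     reverse=True)             # file order is not guaranteed
    for entry in entries:
        board.push(entry.move)
        repeats = board.is_repetition(3)       # would this move complete a repetition?
        board.pop()
        if not repeats:
            return entry.move
    return entries[0].move if entries else None  # every book move repeats: take the best
```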
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish Handicap Test

Post by syzygy »

Ovyron wrote:But this would have lost the MultiProcessor randomness, so all those games would have looked the same, with SF and A0 being deterministic. How do you solve this problem? Whatever method solves this would also have been a good one to use in the actual match, in case SF without a book played very similar openings and we only got to see a performance difference in a very small subset of the actual games.
The problem can be solved by letting the engines play from a series of short book lines, each engine playing each opening once with white and once with black.
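For concreteness, that pairing scheme is trivial to generate. A minimal sketch with placeholder engine names, reusing the three openings mentioned earlier in the thread (a real match runner such as cutechess-cli would then play out these pairings):

```python
# Colour-balanced pairing: every opening is played twice, with the engines
# swapping colours, so neither engine's opening preferences bias the match.
def paired_schedule(openings, engine_a, engine_b):
    """Yield (white, black, opening), colours reversed for each opening."""
    for opening in openings:
        yield engine_a, engine_b, opening
        yield engine_b, engine_a, opening

if __name__ == "__main__":
    openings = ["Ruy Lopez", "Classical French", "Queen's Indian"]
    for game, (white, black, line) in enumerate(
            paired_schedule(openings, "AlphaZero", "Stockfish"), start=1):
        print(f"Game {game}: {white} (White) vs. {black}, from the {line}")
```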
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Stockfish Handicap Test

Post by Ovyron »

syzygy wrote:The problem can be solved by letting the engines play from a series of short book lines, each engine playing each opening once with white and once with black.
So the question is, how would A0 have fared in such a scenario? If the answer had been something like 72 draws and 28 wins, then SF wasn't handicapped and the match was fine.

Otherwise, this is philosophical: it's the same question you have to ask when deciding whether to test with generic books or with a tournament book, whether you want to emulate the rating lists, which aim for the fairest conditions possible, or the world championship, where a skilled bookmaker can give a push to the engine.

Because on one side, you don't want the book to take over and just hand your engine winning positions, which would ruin the test.

But on the other, if A0 turned out to be really clumsy with 1.e4 and that's why it played 1.d4 every time, why should 1.e4 be forced on it? If it were human it could play anything it wanted, so from this perspective, varied short lines don't make sense.

It's as if A0 and Stockfish are entirely different kinds of chess-playing entities altogether, and playing a fair match is impossible in principle, or we have conflicting definitions of "fair".

Because if any user could somehow get access to A0 and wanted to match it against Stockfish at home, they would have one set of conditions (where using Cerebellum makes no sense).

But if A0 and Stockfish met in the final of the World Championship, you'd have completely different conditions; Stockfish would avoid playing the openings we know have failed against A0.

With DeepMind's test we got neither.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish Handicap Test

Post by syzygy »

Ovyron wrote:
syzygy wrote:The problem can be solved by letting the engines play from a series of short book lines, each engine playing each opening once with white and once with black.
So the question is, how would A0 have fared in such a scenario? If the answer had been something like 72 draws and 28 wins, then SF wasn't handicapped and the match was fine.
For that we could look at the reported results of the 12 x 100 games played from 12 frequent opening lines. It seems AlphaZero won those 12 matches a bit less overwhelmingly, but still convincingly.

(It is not clear to me whether those games were also played at 1 minute per move, but I suspect they were. At 1 second per move (which was used to determine that AlphaZero surpassed Stockfish after about 4 hours of training), the two would seem to be relatively close in strength (one of the graphs in the preprint suggests that some but not that much strength was gained after the 4th hour).)
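As a rough yardstick for such scores: under the standard logistic rating model, a match score maps directly to an Elo difference, so the hypothetical 28 wins and 72 draws above (a 64% score) would be worth roughly 100 Elo. A quick check:

```python
import math

def elo_diff(score):
    """Elo difference implied by a match score in (0, 1), logistic model."""
    return 400 * math.log10(score / (1 - score))

# 28 wins + 72 draws out of 100 games: (28 + 0.5 * 72) / 100 = 0.64
print(round(elo_diff(0.64)))  # about 100 Elo in the winner's favour
```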
But on the other, if A0 turned out to be really clumsy with 1.e4 and that's why it played 1.d4 every time, why should 1.e4 be forced on it? If it were human it could play anything it wanted, so from this perspective, varied short lines don't make sense.
I see your point.

I guess one should ask first what AlphaZero is supposed to achieve. Great play from the opening position? Or great play from any reasonable position?

The goal of Stockfish is great play from any reasonable position and not specifically from the opening position (although the opening position is obviously very important). So a match of AlphaZero against a "bare" Stockfish should ideally cover more than just the opening position.

If playing great chess from the opening position were AlphaZero's only goal (allowing it to restrict itself to 1.d4 as white, etc.), then it would be fair to test it against an engine playing with a very good book.

The preprint, in particular those 12 matches played from 12 frequent openings, gives me the impression that their goal is to play good chess in general. That also seems in line with the more general goal of DeepMind. You don't want a self-driving car that only knows how to drive from Google HQ to the local Pizza Hut.

(Anyway, it seems unlikely that AlphaZero could learn to beat Stockfish from the opening position entirely from self-play, without also getting good at playing from essentially arbitrary positions.)
With DeepMind's test we got neither.
We did get something, but there's always more to learn.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Stockfish Handicap Test

Post by Ovyron »

syzygy wrote:I guess one should ask first what AlphaZero is supposed to achieve.
I think what they tried to achieve wasn't specific to chess, but developing something that in 4 hours can learn any game you feed it, and play at a level matching something that took 70 years of human ingenuity to develop (counting from Norbert Wiener's minimax).

Did they achieve it?

I'd say yes.

If they were to fight against the latest dev, with the strongest book, the fastest computer, the biggest endgame tablebases, actual time controls instead of fixed ones, and whatever else it was claimed crippled Stockfish, and they only won 50.5-49.5, they would still have achieved their goal.

Discussions about a better match are moot if the guys at DeepMind didn't even care about the game they played and only used chess as an example of the power of their approach.

Perhaps they don't care about providing the strongest chess entity on the planet, but more about being like Alan Turing and putting a chess program on paper, except their approach can be applied to any goal. The goal just happened to be "checkmate the opponent's king before they checkmate yours", and that's what got us interested, but we may not even be the intended audience.
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish Handicap Test

Post by Uri Blass »

Ovyron wrote:
syzygy wrote:I guess one should ask first what AlphaZero is supposed to achieve.
I think what they tried to achieve wasn't specific to chess, but developing something that in 4 hours can learn any game you feed it, and play at a level matching something that took 70 years of human ingenuity to develop (counting from Norbert Wiener's minimax).

Did they achieve it?

I'd say yes.

If they were to fight against the latest dev, with the strongest book, the fastest computer, the biggest endgame tablebases, actual time controls instead of fixed ones, and whatever else it was claimed crippled Stockfish, and they only won 50.5-49.5, they would still have achieved their goal.

Discussions about a better match are moot if the guys at DeepMind didn't even care about the game they played and only used chess as an example of the power of their approach.

Perhaps they don't care about providing the strongest chess entity on the planet, but more about being like Alan Turing and putting a chess program on paper, except their approach can be applied to any goal. The goal just happened to be "checkmate the opponent's king before they checkmate yours", and that's what got us interested, but we may not even be the intended audience.
Being able to learn chess and some games is not the same as being able to learn any game.

I think that they only learned games where all information is known.
They did not show that they can also win in games like Stratego, where the players do not know where the flag is.

I do not know if there is a program that can be competitive with good humans in that game.
I am also not sure that there are no games with complete information that they cannot learn with their approach well enough to be competitive with the best humans.