AlphaZero Chess is not that strong ...

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: AlphaZero Chess is not that strong ...

Post by cdani »

hgm wrote:This is surely weird, and suggests that Stockfish gets weaker with more hash.
Is it possible that testing of Stockfish is mostly done with little hash? If so, the engine is tuned for it.
syzygy
Posts: 5728
Joined: Tue Feb 28, 2012 11:56 pm

Re: AlphaZero Chess is not that strong ...

Post by syzygy »

hgm wrote:This is surely weird, and suggests that Stockfish gets weaker with more hash.

Note that in my graph the load factor is not defined w.r.t. hashfull, but w.r.t. node count, which should be much higher, as many nodes are for the same position.

I am also not sure whether Stockfish counts hash cutoffs as nodes or not.
Stockfish counts calls to do_move(), so hash cutoffs are counted.

SF only tries singular extensions for TT moves, so if a position is not in the TT, it will not check if a TT move is a singular move and will not extend the search if it would have been. Perhaps this explains some of what is being observed.
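To make the TT dependence concrete, here is a rough sketch of a singular-extension check of the kind described above; probe_tt(), search(), the margin and the depth limits are illustrative stand-ins, not the actual Stockfish code or values.

Code: Select all

// Hedged sketch: how a singular-extension decision depends on a TT move.
// probe_tt(), search(), the margin and depth limits are stand-ins for
// illustration only, not the real Stockfish API.
#include <cstdint>

struct TTEntry { int move; int value; int depth; bool lower_bound; };

TTEntry* probe_tt(std::uint64_t key);                          // stand-in
int search(int alpha, int beta, int depth, int excluded_move); // stand-in

int singular_extension(std::uint64_t key, int depth)
{
    TTEntry* tte = probe_tt(key);
    if (!tte || !tte->lower_bound || tte->depth < depth - 3 || depth < 8)
        return 0;   // no usable TT move -> the extension is never even tried

    // Search all moves except the TT move at reduced depth, with a window
    // just below the TT score. If nothing comes close, the TT move is
    // "singular" and its search gets extended by one ply.
    int singular_beta = tte->value - 2 * depth;
    int v = search(singular_beta - 1, singular_beta, depth / 2, tte->move);
    return v < singular_beta ? 1 : 0;
}

With a tiny hash the entry for the current position is often already overwritten, so the first branch fires and the extension is silently skipped.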
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: AlphaZero Chess is not that strong ...

Post by Laskos »

syzygy wrote:
hgm wrote:This is surely weird, and suggests that Stockfish gets weaker with more hash.

Note that in my graph the load factor is not defined w.r.t. hashfull, but w.r.t. node count, which should be much higher, as many nodes are for the same position.

I am also not sure whether Stockfish counts hash cutoffs as nodes or not.
Stockfish counts calls to do_move(), so hash cutoffs are counted.

SF only tries singular extensions for TT moves, so if a position is not in the TT, it will not check if a TT move is a singular move and will not extend the search if it would have been. Perhaps this explains some of what is being observed.
Hmm.. that would mean that time-to-depth is not the exact measure to get the total effect of hash size for SF, only time-to-strength, right?

First, I think the Shredder GUI restarts the engine for each position tested, and it counts the time to initialize the hash tables in the time used, so my earlier result was probably misleading. I took another approach, playing actual games to depth=21 to see the difference between hash = 1 MB and a close-to-optimal hash = 128 MB. Two observations: time-to-depth in the optimal case is about 20% lower, and with LOS = 99.0% it seems the fixed depth=21 results are not equal for different hash sizes (meaning that time-to-depth is indeed not the exact measure in SF's case), although I would have liked a bit more confidence.

Code: Select all

Games Completed = 1000 of 1000 (Avg game length = 185.349 sec)
Settings = Gauntlet/0MB/100000ms per move/M 400cp for 3 moves, D 80 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 47002 sec elapsed, 0 sec remaining
 1.  SF 1 MB                  	483.5/1000	84-117-799  	(L: m=0 t=0 i=0 a=117)	(D: r=543 i=38 f=0 s=1 a=217)	(tpm=1885.0 d=21.00 nps=1804812)
 2.  SF 128 MB                	516.5/1000	117-84-799  	(L: m=0 t=0 i=0 a=84)	 (D: r=543 i=38 f=0 s=1 a=217)	(tpm=1523.1 d=21.00 nps=1820235)
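As a side check, the LOS = 99.0% figure can be reproduced from the win/loss counts alone, assuming the usual normal approximation LOS = Phi((W - L) / sqrt(W + L)):

Code: Select all

// Reproducing the LOS figure from the 117-84-799 result above.
#include <cmath>
#include <cstdio>

int main()
{
    double wins = 117.0, losses = 84.0;   // draws drop out of the LOS formula
    double z    = (wins - losses) / std::sqrt(wins + losses);
    double los  = 0.5 * (1.0 + std::erf(z / std::sqrt(2.0)));
    std::printf("z = %.2f, LOS = %.1f%%\n", z, 100.0 * los);   // ~2.33, ~99.0%
    return 0;
}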
hgm
Posts: 28390
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: AlphaZero Chess is not that strong ...

Post by hgm »

So Stockfish loses ~11 Elo in self-play by making the hash 128 times smaller than 'optimal'?
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: AlphaZero Chess is not that strong ...

Post by Laskos »

hgm wrote:So Stockfish loses ~11 Elo in self-play by making the hash 128 times smaller than 'optimal'?
No. That was the result at the same fixed depth=21. So, in total (strength at fixed time), it seems to lose about 20% in time-to-depth + 11 Elo points at fixed depth.
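The ~11 Elo follow directly from the 516.5/1000 fixed-depth score via the standard logistic model:

Code: Select all

// Converting the fixed-depth score into an Elo difference,
// elo = -400 * log10(1/score - 1).
#include <cmath>
#include <cstdio>

int main()
{
    double score = 516.5 / 1000.0;
    double elo   = -400.0 * std::log10(1.0 / score - 1.0);
    std::printf("Elo difference at fixed depth: %.1f\n", elo);   // ~11.5
    return 0;
}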
Leo
Posts: 1104
Joined: Fri Sep 16, 2016 6:55 pm
Location: USA/Minnesota
Full name: Leo Anger

Re: AlphaZero Chess is not that strong ...

Post by Leo »

I looked up the world's strongest chess playing entity and found this:

Komodo -- the brainchild of Don Dailey (who died in November of 2013), GM Larry Kaufman, and Mark Lefler -- is now universally recognized as the strongest chess-playing entity on the planet.

Not knocking Komodo, it's just what popped up first in the search.
Advanced Micro Devices fan.
Eelco de Groot
Posts: 4671
Joined: Sun Mar 12, 2006 2:40 am
Full name:   Eelco de Groot

Re: AlphaZero Chess is not that strong ...

Post by Eelco de Groot »

syzygy wrote:
hgm wrote:This is surely weird, and suggests that Stockfish gets weaker with more hash.

Note that in my graph the load factor is not defined w.r.t. hashfull, but w.r.t. node count, which should be much higher, as many nodes are for the same position.

I am also not sure whether Stockfish counts hash cutoffs as nodes or not.
Stockfish counts calls to do_move(), so hash cutoffs are counted.

SF only tries singular extensions for TT moves, so if a position is not in the TT, it will not check if a TT move is a singular move and will not extend the search if it would have been. Perhaps this explains some of what is being observed.
It is not so relevant for the question of strength, but the phenomenon of sometimes getting better tactical results with a small hash table is well known. It applies to engines without singular extensions as well, but you are probably correct, Ronald, that SE in PV nodes strengthens the effect. A possible explanation, for me, is that with the added overwriting the PV does not get the time to grow strong quickly; the score is lower, especially when you would apply a singular extension otherwise but cannot find the move, so other moves have a better chance. Or: with enough hash, the first twenty iterations are done very quickly and they all fit in the hash table, but any time an entry is not found the engine has to start searching again. This is a form of IID, and any new internal search might find transpositions from other places that can push the search over the horizon with what Robert Hyatt called "grafting". With more IID the tactical results would in theory improve, but there is less time for pushing the PV over the horizon, so the Elo drops...
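For readers who have not met the mechanism referred to here: internal iterative deepening amounts to something like the sketch below (probe_tt() and search() are stand-ins for illustration, not any particular engine's code). When the hash entry has been overwritten, a reduced-depth search of the same node is run first, and whatever it stores in the TT then guides the full-depth search.

Code: Select all

// Hedged sketch of internal iterative deepening (IID).
#include <cstdint>

struct TTEntry { int move; int value; int depth; };

TTEntry* probe_tt(std::uint64_t key);                          // stand-in
int search(std::uint64_t key, int alpha, int beta, int depth); // stand-in

// Returns the move to try first at this node, using IID when the TT is empty.
int move_to_try_first(std::uint64_t key, int alpha, int beta, int depth)
{
    TTEntry* tte = probe_tt(key);
    if (!tte && depth >= 6)
    {
        // No TT move (e.g. the entry was overwritten in a small table):
        // search this node at reduced depth first. The result lands in the
        // TT and may pick up transpositions stored elsewhere, "grafting"
        // deeper results into this subtree.
        search(key, alpha, beta, depth - 2);
        tte = probe_tt(key);
    }
    return tte ? tte->move : 0;   // 0 = no hint, fall back to static ordering
}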
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
hgm
Posts: 28390
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: AlphaZero Chess is not that strong ...

Post by hgm »

Laskos wrote:
hgm wrote:So Stockfish loses ~11 Elo in self-play by making the hash 128 times smaller than 'optimal'?
No. That was the result at the same fixed depth=21. So, in total (strength at fixed time), it seems to lose about 20% in time-to-depth + 11 Elo points at fixed depth.
So tpm means 'time per move', and Stockfish was using 1.5 or 1.8 sec/move? This contradicts the results you posted before, where it took even longer to reach d=21 with 128MB than with 1MB.

At 1.8Mnps that would be 2.7M nodes/search, requiring 27MB to store the entire tree even if every node is different. So isn't 128MB far too large? One would not expect a significant difference between 128MB and 8MB hash under these conditions.

Testing with games at fixed depth could be a bit tricky, as the time per move could vary wildly depending on the game phase. So it is difficult to conclude how much the TT was overloaded in the decisive phase of the game from the average time per move of the entire game.
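The 27MB figure is straightforward to reproduce; a small calculation assuming roughly 10 bytes of TT storage per stored position (the figure the numbers above imply):

Code: Select all

// Back-of-the-envelope tree size per search, as in the estimate above.
#include <cstdio>

int main()
{
    double nps              = 1.8e6;   // from the nps column in the results
    double seconds_per_move = 1.5;     // roughly the tpm of the 128 MB run
    double bytes_per_entry  = 10.0;    // assumed TT storage per position

    double nodes = nps * seconds_per_move;         // ~2.7M nodes
    double mb    = nodes * bytes_per_entry / 1e6;  // ~27 MB (MB = 10^6 bytes)
    std::printf("%.1fM nodes/search, ~%.0f MB if every node is distinct\n",
                nodes / 1e6, mb);
    return 0;
}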
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: AlphaZero Chess is not that strong ...

Post by Laskos »

hgm wrote:
Laskos wrote:
hgm wrote:So Stockfish loses ~11 Elo in self-play by making the hash 128 times smaller than 'optimal'?
No. That was the result at the same fixed depth=21. So, in total (strength at fixed time), it seems to lose about 20% in time-to-depth + 11 Elo points at fixed depth.
So tpm means 'time per move', and Stockfish was using 1.5 or 1.8 sec/move? This contradicts the results you posted before, where it took even longer to reach d=21 with 128MB than with 1MB.

At 1.8Mnps that would be 2.7M nodes/search, requiring 27MB to store the entire tree even if every node is different. So isn't 128MB far too large? One would not expect a significant difference between 128MB and 8MB hash under these conditions.

Testing with games at fixed depth could be a bit tricky, as the time per move could vary wildly depending on the game phase. So it is difficult to conclude how much the TT was overloaded in the decisive phase of the game from the average time per move of the entire game.
I already wrote that my previous result was probably wrong, as the Shredder GUI restarts the engine after each position, and the time to initialize the hash is counted in the time used.

During the first, say, 20-25 moves, the time per move is roughly double the average, so the loading is around 54 MB, and 40% hashfull was found to be close to optimal in an earlier thread (Mark Lefler also confirmed that). So a 128 MB hash table is close to optimal for the first 20-25 moves, and those moves are the most important ones, probably determining close to 70-80% of outcomes. Fixed depth is somewhat similar to playing games with a large base and a small increment. I wanted the fixed-depth result first to see the average time-to-depth, and second to see whether the fixed-depth strength differs with the size of the hash.
hgm
Posts: 28390
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: AlphaZero Chess is not that strong ...

Post by hgm »

54MB is the size of the tree, and would only be 40% hashfull if all nodes of the tree were different, which would imply the hash table is a write-only data sink, never used for anything other than burning a few memory cycles. More typically a tree size of 54MB would mean 10% hashfull.
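Putting rough numbers on that, with the same ~10 bytes per stored position as above and the fraction of distinct positions treated as a free parameter (the 25% below is an assumed figure, purely for illustration):

Code: Select all

// Rough hashfull estimate from tree size vs. table size.
#include <cstdio>

int main()
{
    double tree_mb     = 54.0;    // tree size per move, from the discussion
    double table_mb    = 128.0;   // hash size used in the test
    double unique_frac = 0.25;    // assumed share of nodes that are distinct

    std::printf("all nodes distinct: ~%.0f%% hashfull\n",
                100.0 * tree_mb / table_mb);                   // ~42%
    std::printf("25%% of nodes distinct: ~%.0f%% hashfull\n",
                100.0 * unique_frac * tree_mb / table_mb);     // ~11%
    return 0;
}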