Rebel wrote:
petero2 wrote:
Rebel wrote:
Daniel Shawul wrote:
That is a start for sure -- proving a NN evaluation could be competitive or even much better than a hand-crafted evaluation function. The latency of evaluating the NN can be countered with a combination of hardware (GPU/TPU) and software (async evaluations), which is what Google did for AlphaGo. Giraffe used only three layers of NN with chess-specific inputs such as attack maps, while AlphaZero used many more layers of CNN with just the rules of the game as input. Texel actually replaced its evaluation function with Giraffe's NN and showed that the eval is actually better, but it would need time odds to be competitive on the same hardware.
Statements like these could make me a believer.
The post describing this test is here.
Nice idea, a few points.
I tried the STS test and saw hardly any similarity between Giraffe (2016) and Texel GI, so I ran the good old similarity test.
Code:
Positions 8238                           Gira   Texe
{Giraffe w64 (time: 100 ms scale: 1.0)}  -----  4.72
{Texel Gi    (time: 100 ms scale: 1.0)}   4.72  -----
Only 4.72%, where 65%+ was to be expected? I have never seen such a low percentage; I am running it again now at 1 second per move.
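For context, the similarity test referenced above essentially asks each engine for its best move in a large set of positions and reports the percentage of positions where the two engines agree. A minimal sketch of that idea (the StubEngine class and its best_move interface are hypothetical stand-ins for real engine bindings, not the actual tool's code):

```python
# Tiny demo with stub "engines" that just look up a fixed move per position.
class StubEngine:
    def __init__(self, book):
        self.book = book  # maps position -> chosen move

    def best_move(self, fen, movetime_ms):
        # A real engine would search for movetime_ms; the stub just looks up.
        return self.book[fen]

def similarity(engine_a, engine_b, positions, movetime_ms=100):
    """Percentage of positions where both engines choose the same move."""
    matches = sum(1 for fen in positions
                  if engine_a.best_move(fen, movetime_ms)
                  == engine_b.best_move(fen, movetime_ms))
    return 100.0 * matches / len(positions)

positions = ["pos1", "pos2", "pos3", "pos4"]
a = StubEngine({"pos1": "e2e4", "pos2": "d2d4", "pos3": "g1f3", "pos4": "c2c4"})
b = StubEngine({"pos1": "e2e4", "pos2": "e2e4", "pos3": "g1f3", "pos4": "b1c3"})
print(similarity(a, b, positions))  # 2 of 4 moves agree -> 50.0
```

Two engines sharing an evaluation would normally agree far more often than two unrelated ones, which is why the 4.72% figure above looks so surprising.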
Lastly, I started a match at TC=40/60. I stopped after 40 games at 37.5 - 2.5 in favor of Texel GI, while NPS favored Giraffe (2016) by approx. 20-25%.
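For scale, a 37.5 - 2.5 score (93.75%) corresponds to a very large rating gap under the standard logistic Elo model, though 40 games leave wide error bars. A quick check of that arithmetic:

```python
import math

def elo_diff(score_fraction):
    """Elo difference implied by a score fraction, logistic Elo model."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

p = 37.5 / 40  # 0.9375
print(round(elo_diff(p)))  # roughly +470 Elo
```

This ignores draw modeling and sample size, but it shows the match was nowhere near close.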
------
Unless I have done something wrong, I don't see how one can conclude the NN is on par with SF or your evaluation.
Giraffe contains a non-standard search that is quite weak and sometimes causes huge blunders. For example, in the following position Giraffe calculated for 3.1 seconds to depth 15 and played Ba6??, dropping a bishop for no compensation, due to a 1-ply tactical "combination":
[D]3r3k/1b3pbp/1p5n/2p1p2p/4PP2/2P2KP1/R3NN1P/1B6 b - - 1 30
For this reason I did not use Giraffe itself to draw my conclusions; I compared "Texel GI" with standard Texel, so Giraffe's search code was not involved at all.
Also, I did not conclude that the Giraffe evaluation was "on par" with SF or the Texel evaluation. I only concluded that if it had been 10x faster, it would have made Texel around 100 Elo stronger than Texel using its own evaluation, and I guessed that this would be roughly equal to SF's evaluation.
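As a sanity check on such time-odds reasoning, a common rule of thumb (not stated in this thread, and engine- and time-control-dependent) is a roughly fixed Elo gain per doubling of effective speed, often quoted around 50-70 Elo. Under that assumption a full-engine speed factor can be converted to Elo like this:

```python
import math

def elo_from_speedup(factor, elo_per_doubling=60.0):
    """Approximate Elo value of a speed factor, assuming a fixed gain
    per doubling of speed (60 is a rough, commonly quoted figure)."""
    return elo_per_doubling * math.log2(factor)

print(round(elo_from_speedup(10)))  # ~199 Elo under these assumptions
```

Note this models speeding up the whole engine; speeding up only the evaluation inside a hybrid (as in the 10x Texel GI thought experiment above) would yield a different, smaller figure, so the two numbers are not directly comparable.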