Daniel Shawul wrote: A GTX 1080 Ti is 11 TFLOPS, and 64 cores is 1 TFLOPS, so that is an 11x hardware advantage. Why not use the same 64 CPU cores for it and see if it will beat Stockfish?

mirek wrote: Ah, this again? So comparing performance per $ or performance per watt obviously doesn't concern you, right? I can see a team of scientists deciding whether to run their simulation on 10x 1080 Ti or 7000 CPU cores and in the end going for the CPUs, because while the GPUs would be cheaper, they would also provide an unfair advantage over the CPUs.

jkiliani wrote: To accommodate Daniel's valid concerns, we should run both alpha-beta engines and neural net engines on the abacus.

I have no objection to that; in fact I have mentioned several times that the only metric that would make sense to me is a per-dollar/per-watt comparison. However, I would like to know whether the success of A0 came from a hardware or a software improvement. Deep Blue was a hardware success story, and they used FPGAs to accelerate their eval.
You have elevated the minimum hardware requirement for A0 to be competitive with Stockfish to 11 x 64 ≈ 700 CPU cores, each thinking for 1 min per move. So if you used one core, which is the standard in chess rating lists, you would have to use a time control of about 11 hours per move.
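The arithmetic behind this, taking the quoted TFLOPS figures at face value (a quick sketch of the scaling argument, not a claim about real-world throughput):

```python
# Quoted figures from the thread: GTX 1080 Ti ~ 11 TFLOPS, 64 CPU cores ~ 1 TFLOPS.
gpu_tflops = 11.0
cores_per_tflops = 64

# CPU cores needed to match the GPU's raw throughput
equivalent_cores = int(gpu_tflops * cores_per_tflops)  # 704, i.e. roughly 700

# At 1 minute per move on ~700 cores, a single core (the rating-list
# standard) would need the whole compute budget serialized:
minutes_per_move = equivalent_cores * 1          # ~704 minutes
hours_per_move = minutes_per_move / 60           # ~11.7 hours, i.e. "11 hours"

print(equivalent_cores, round(hours_per_move, 1))
```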
We will see what hardware and engines correspondence players, for example, will use once LC0 gets to A0 level. If you want to insist that 1 CPU core is the only correct metric on which to measure engine strength, then have it your way. But don't be surprised if in a correspondence game you get completely smashed by an engine which, according to your "correct 1-core rating", is something like 500 Elo weaker than your alpha-beta searcher.
Daniel Shawul wrote: In any case you are backing off from your "bet" that the net is going to improve the tactics. Now you insist on some form of hardware advantage to cover for tactical weakness. Which is it?

I am not backing off. The test is independent of hardware: we can just let it calculate roughly as many nodes as were used in the game before the blunder was played, and see how it goes with new weights, e.g. at weekly intervals, until the net gets resized again.
Seriously, equivalent power use is a fair metric. Otherwise what would be the point of improving hardware at all? If alpha-beta crunchers find a good way to use modern GPUs, they should by all means implement it...
In the case of A0, the eval is a bulky NN that is hugely accelerated with specialty hardware.
Comparing scorpio-mcts-min with LC0, which use more or less similar algorithms, I get results that are largely in favour of Scorpio, at least up to 320+2 tc. Scorpio has an eval that is 100x faster than the 10x128 NN eval, and yet it still beats LC0. LC0 chose a bulky eval, so it has to pay for its slowness on the same hardware; otherwise it would be an unfair comparison. When CCLS ran LC0 vs scorpio-mcts-min, Leela beat it, but LC0 was running on a 1080.
At 20+0.1 tc:
Code:
Score of lczero vs scorpio-mcts-min: 2 - 40 - 3 [0.078] 45
Elo difference: -429.59 +/- 252.08
SPRT: llr -3.01, lbound -2.94, ubound 2.94 - H0 was accepted
Finished match
Code:
Score of lczero vs scorpio-mcts-min: 3 - 35 - 8 [0.152] 46
Elo difference: -298.39 +/- 125.93
SPRT: llr -3.03, lbound -2.94, ubound 2.94 - H0 was accepted
Finished match
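The ±2.94 bounds in the SPRT lines above are the usual log-likelihood-ratio stopping thresholds; assuming error rates of alpha = beta = 0.05 (an assumption on my part, but it is what those bounds imply), they come out as:

```python
import math

alpha = 0.05  # type I error rate (assumed from the printed bounds)
beta = 0.05   # type II error rate (assumed from the printed bounds)

lower = math.log(beta / (1 - alpha))  # ~ -2.94: stop and accept H0
upper = math.log((1 - beta) / alpha)  # ~ +2.94: stop and accept H1

print(round(lower, 2), round(upper, 2))
```

An LLR of -3.01 or -3.03 falling below the lower bound is why H0 (no Elo gain for lczero) was accepted in both runs.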
Code:
Finished game 36 (scorpio-mcts-min vs lczero): * {No result}
Score of lczero vs scorpio-mcts-min: 0 - 30 - 5 [0.071] 35
Elo difference: -445.58 +/- 206.80
Finished match
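For reference, the Elo differences printed above follow directly from the match scores via the standard logistic model (a minimal sketch; draws count as half a point):

```python
import math

def elo_diff(wins, losses, draws):
    """Elo difference implied by a match score under the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400 * math.log10(1 / score - 1)

# The three lczero vs scorpio-mcts-min matches above:
print(round(elo_diff(2, 40, 3), 2))   # -429.6, matching the -429.59 printed
print(round(elo_diff(3, 35, 8), 2))   # -298.39
print(round(elo_diff(0, 30, 5), 2))   # -445.58
```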
Code:
Name              Elo    +    - games score  oppo draws
scorpio-mcts-min   91  183  183    15 73.3%   -90 13.3%
lczero            -90  183  183    15 26.7%    91 13.3%