Daniel Shawul wrote: Almost everybody, even ones who used NN evals like Giraffe, cared about performance on a single core up to now. It is simply the most widely used computing hardware now.
That is why AlphaZero was called a 'paradigm change'. 20 years ago everyone had a fixed phone line at home. Now they all have cell phones...
That everybody was using the same flawed metric was only because they were all doing essentially the same thing, with minuscule differences. If I want to describe a collection of meatballs, it doesn't matter whether I use height, width or length as a metric of which is more nutritious. But now someone brings in a Frankfurter sausage, and a Hamburger, which forces me to abandon my evil ways.
Most graphics cards are integrated ones, on which LCzero performs no better than on the CPU, or worse. You are asking for hardware that is only available to gamers or people who do number crunching.
Computer Chess is number crunching. What people have now and what people will have in the future might be very different. Only catering to what they have now might very well be betting on a dead horse. Species that completely specialize in consuming a dwindling food source usually go extinct together with that food source.
Good graphics cards are not more expensive than extra CPUs and the sockets for them on the motherboard, which is what people who really care about engine performance have now. Ten years ago all CPUs were single-core, and no one cared about SMP. Now everyone has at least 4 cores, and if your engine does not support SMP it will be considered worthless, no matter how well it does in single-CPU tests.
A question: do you see an algorithmic improvement in Deep Blue's acceleration of its eval? If not, what makes it different from A0's? That GPU cards are easily accessible is irrelevant from an algorithmic point of view.
I am not very familiar with Deep Blue's algorithm, but I always thought it was just a hardware implementation of existing eval techniques (like PST, mobility, etc.). So not a different algorithm at all, perhaps even a simplified one because of hardware constraints.
If you want to compare MCTS vs alpha-beta, you use the same eval for both and see which fares better in chess/Go. Similarly, to compare evals, you use the same search, and see how well the NN eval fares against the hand-made one on the same hardware.
The latter is not a well-defined procedure, because the outcome will be completely dependent on what hardware you choose. What if this 'equal hardware' were a neural network, which you had to train to compute your hand-crafted evaluation?
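To make it concrete, the proposed procedure is roughly the C sketch below (hypothetical names, toy evals): one fixed search driver with a swappable eval callback. Note that nothing in it pins down the hardware on which the two callbacks have to run.

/* Minimal sketch (hypothetical names, toy data) of the 'same search,
 * different eval' setup: one fixed search driver, two interchangeable
 * evaluation callbacks. */
#include <stdio.h>

typedef struct { int material; int mobility; } Position;   /* toy position */
typedef int (*EvalFn)(const Position *);

static int handcrafted_eval(const Position *p) { return p->material + p->mobility; }
static int network_eval(const Position *p)     { return 2 * p->material; } /* stand-in for NN inference */

/* Stand-in for the fixed search; a real engine would recurse over moves and
 * call eval() only at the leaves, identically for both candidates. */
static int search(const Position *p, EvalFn eval) { return eval(p); }

int main(void)
{
    Position p = { 3, 5 };
    printf("hand-crafted: %d\n", search(&p, handcrafted_eval));
    printf("network     : %d\n", search(&p, network_eval));
    return 0;
}

Both runs then play the same games, and only the eval differs; but the score you get still depends entirely on what 'the same hardware' happens to be.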
This is a lot like comparing engines only available as binaries, one built for x64 PC, the other for ARM. They can of course run on both tablets and PCs, through the applicable emulator. Now how much good will this 'equal hardware' comparison do you? If you test on a phone, the ARM program will win. If you test on a PC, the Windows .exe will win.
Now an ARM and an x64 are sufficiently similar that you can hope to get your hands on the algorithms in high-level source form, compile both for ARM or PC, and conduct the test on either one of them, hoping that the results would not depend too much on whether you use ARM or x64. But if the architectures are really different, this is hopeless.

I had that in the 90s, when I was competing on a Pentium-II PC with people using a Cray YMP supercomputer. Every programming technique that made it fast on the PC slowed it down on the Cray, and vice versa. So the high-level algorithm had to be completely different. Closer to home, try to compare a magic-bitboard engine against a mailbox engine on the 'equal hardware' of an 8-bit microprocessor like the 6502 (which does not even have an 8-bit multiply instruction). Which algorithm do you think would come out as highly superior?
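To see why, consider the core of a magic-bitboard move generator, sketched below in C (the magic constant and shift are just arbitrary stand-ins, and the attack-table lookup itself is omitted). The whole trick is one 64-bit multiply plus a shift, a single instruction on x64; on a 6502, which has no multiply instruction at all, that same line turns into something like the shift-and-add loop shown, and even that flatters the 6502, which would have to do the 64-bit arithmetic 8 bits at a time.

/* The heart of a magic-bitboard generator is one line:
 *     index = ((occupied & mask) * magic) >> shift;
 * followed by an attack-table lookup (omitted here).  The shift-and-add
 * multiply below is roughly what a CPU without a multiply instruction is
 * reduced to. */
#include <stdint.h>
#include <stdio.h>

/* 64-bit multiply built from shift-and-add: up to 64 iterations of
 * test, add and shift, instead of one hardware instruction. */
static uint64_t mul64_shift_add(uint64_t a, uint64_t b)
{
    uint64_t product = 0;
    while (b) {
        if (b & 1)
            product += a;
        a <<= 1;
        b >>= 1;
    }
    return product;
}

int main(void)
{
    uint64_t occupied = 0x00000000000000FFULL;   /* toy occupancy              */
    uint64_t mask     = 0x000000000000007EULL;   /* toy relevance mask         */
    uint64_t magic    = 0x0080001020400080ULL;   /* arbitrary stand-in 'magic' */
    int      shift    = 52;

    uint64_t index = mul64_shift_add(occupied & mask, magic) >> shift;
    printf("attack-table index: %llu\n", (unsigned long long)index);
    return 0;
}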
If there is no hardware equivalence in comparing the evals, the comparison is of course meaningless, as one can just add more knowledge without worry. In fact, a NN is probably the most inefficient tool for that, though it is a more generic one.
It wastes too many FLOPs doing unnecessary multiplications for not-so-important features that a handmade eval would probably ignore. No one doubts (even without using a NN) that you can increase your Elo to your satisfaction by adding more knowledge -- the question is: can you get a better-quality eval with the same FLOPs?
But if FLOPs are nearly free, who cares? The NN does not need any branches, so from its point of view the hand-crafted eval does many unnecessary, poorly predictable branches. You get a completely lopsided comparison if you just stress one aspect of the algorithm, ignoring all other essential parts. You have to weigh in the cost of branching, caching and out-of-order execution in a realistic way.
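To illustrate the difference in instruction mix (a toy C sketch with made-up weights and features, not code from any engine): a hand-crafted term is a tangle of data-dependent branches the predictor has to guess, while a network layer is a fixed, branch-free stream of multiply-adds that a SIMD unit or GPU can pipeline without ever mispredicting.

#include <stdio.h>

#define N_FEATURES 8

/* Hand-crafted style: every term hides conditions on the position data. */
static int handcrafted_term(const int f[N_FEATURES])
{
    int score = 0;
    if (f[0] > 2)                      /* e.g. passed pawn?           */
        score += 30;
    if (f[1] && !f[2])                 /* e.g. open file, no blocker? */
        score += 15;
    else if (f[3] > f[4])              /* ...or some other case       */
        score -= 10;
    return score;
}

/* Network style: the same work for every position, no branches on the data. */
static int dense_layer(const double f[N_FEATURES])
{
    static const double w[N_FEATURES] = { 0.5, -1.2, 0.3, 0.0, 0.7, -0.4, 1.1, 0.2 };
    double acc = 0.0;
    for (int i = 0; i < N_FEATURES; i++)
        acc += w[i] * f[i];            /* pure multiply-add, fully pipelineable */
    return (int)(100.0 * acc);
}

int main(void)
{
    int    fi[N_FEATURES] = { 3, 1, 0, 2, 1, 0, 1, 1 };
    double fd[N_FEATURES] = { 3, 1, 0, 2, 1, 0, 1, 1 };
    printf("hand-crafted term: %d\n", handcrafted_term(fi));
    printf("dense layer      : %d\n", dense_layer(fd));
    return 0;
}

Which of the two is 'efficient' depends entirely on whether the hardware you run them on is better at predicting branches or at streaming multiply-adds.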