LCZero Accomplishments and Goals Thus Far

Albert Silver · Post by **Albert Silver** » Thu May 03, 2018 9:06 pm

mhull wrote:
jp wrote:
mhull wrote:
Daniel Shawul wrote:Sigh..wake me up when it is 2800 elo running on single CPU core, which is what every other engine uses in rating lists. As far as I am concerned, it is still a 2100 elo engine there.
Your demand for uniform platform comparison is commutative. Why not demand all the other engines run on a GPU?

Then it would be "equal".
So if I teach my pet frog to play chess, I should demand you play it underwater rather than on land, because demands should be "commutative"?
You don't demand the frog should use equal hardware though, which is the point.

The hardware is equal. What is not equal is the ability for both programs to exploit it. A subtle, but important difference. I have a GTX1060 for reasons that had nothing to do with viewing a crystal ball and anticipating Leela. By this I mean that it is a normal computer component that has many uses: gaming, graphics work, photography and video rendering, cryptocurrency mining, protein folding, and more. It is not a special 'Leela card' installed in the computer with the only purpose of giving it an edge. The fact that an engine was designed that is also able to leverage its power is a testament to lateral thinking, not sneaky cheating.

mhull · Post by **mhull** » Thu May 03, 2018 9:07 pm

jp wrote:
mhull wrote:
jp wrote:...
So if I teach my pet frog to play chess, I should demand you play it underwater rather than on land, because demands should be "commutative"?
You don't demand the frog should use equal hardware though, which is the point.
No, that's not right. Neither you nor my frog are allowed to use any hardware. That would be cheating. My frog can play in two playing environments, but not equally comfortably. You can only play in one. If there's to be a match, which environment should it be in?

In that analogy, it's equally unfair to demand the majority engines play in their not-comfortable environment as it is to demand the minority project to play in its not-comfortable environment.

Each project should play in its most comfortable environment. It is sour grapes (and unfair) to demand the ascendant project play in its most uncomfortable environment, just to make it "fair".

jp · Post by jp » Thu May 03, 2018 9:18 pm

See the shark & sea lion & seal stuff above too.
Maybe the conclusion is no 'fair' match is possible.

mhull · Post by **mhull** » Thu May 03, 2018 10:08 pm

Albert Silver wrote:
mhull wrote:
jp wrote:
mhull wrote:
Daniel Shawul wrote:Sigh..wake me up when it is 2800 elo running on single CPU core, which is what every other engine uses in rating lists. As far as I am concerned, it is still a 2100 elo engine there.
Your demand for uniform platform comparison is commutative. Why not demand all the other engines run on a GPU?

Then it would be "equal".
So if I teach my pet frog to play chess, I should demand you play it underwater rather than on land, because demands should be "commutative"?
You don't demand the frog should use equal hardware though, which is the point.
The hardware is equal. What is not equal is the ability for both programs to exploit it. A subtle, but important difference. I have a GTX1060 for reasons that had nothing to do with viewing a crystal ball and anticipating Leela. By this I mean that it is a normal computer component that has many uses: gaming, graphics work, photography and video rendering, cryptocurrency mining, protein folding, and more. It is not a special 'Leela card' installed in the computer with the only purpose of giving it an edge. The fact that an engine was designed that is also able to leverage its power is a testament to lateral thinking, not sneaky cheating.

This gets to the point I have often criticized, the obsession with uniformity in "testing" and then the further perceived "leveling" by forcing all projects to use the same book. As much as they've worked hard to test equally, the perception of uniformity was/is an illusion:

Not all projects equally leverage various CPU features like SIMD (i.e. MMX, SSEn...) or even 64-bits, SMP/NUMA, etc. Projects that could were/are deliberately dumbed-down.
Testing with ponder off cripples programs that do pondering better than others.
The uniform book is helping some programs while hurting others in unknown ways.
Some projects play sans-book better than other projects, which is NEVER allowed to affect the test, instead arbitrarily crippling stronger openers.

The advent of Zero projects like Leela is forcing the issue by leveraging something besides a CPU. But look how the testers, giving in on hardware, continue to force Leela to play openings it sees as wrong, because spectators don't like watching Leela's favoritism for some particular opening. So now spectators are allowed to affect the test, but none of them think it's unfair to Leela, and so they think the test is valid.

But even with this crippling of Leela, some remain unsatisfied. They want to force Leela to run using scalar hardware instead of vector hardware -- again with the crippling to make "fair". A project designer cannot win with these people, because they immediately want to cripple your project if it dares to think outside the box. Of course, there is nothing fair when even their testing of A/B searchers failed to level the playing field equitably.

joseolv23 · Post by **joseolv23** » Fri May 04, 2018 3:27 am

I believe that energy consumption would be a suitable criterion for establishing equal conditions between GPU and CPU, along with the process node (28 nm, 16 nm, 14 nm, etc.).

Evert · Post by **Evert** » Fri May 04, 2018 4:17 pm

Albert Silver wrote: You're right of course that you cannot make a straightforward comparison. So how about this? Leela is 2900 CCRL when paired with a GTX 1060. The CPU and the rest is identical so that's the only difference. And since Leela is designed to make use of a GPU, without which it is quite clearly crippled, there's also little point in making a straight CPU to CPU comparison. Unless the purpose is to show how necessary a GPU is to Leela.

That's not quite good enough, I think, because it refers to a specific piece of hardware. What's needed is some sort of benchmark that measures the performance of the card that can be used to scale time control, analogous to the way CCRL time control is scaled based on the benchmark of a particular build of Crafty.
It's probably too early to settle on a standard for that, but it's worth thinking about.
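To make the idea concrete, here is a minimal sketch of benchmark-based time scaling. It assumes a single throughput score per machine (higher means faster) and simple inverse-proportional scaling; the function name and the example scores are hypothetical and do not describe CCRL's actual procedure or any agreed GPU benchmark.

```python
def scale_time_control(base_seconds, reference_score, measured_score):
    """Scale a time control so a faster machine gets proportionally less time.

    base_seconds    -- time control defined on the reference machine
    reference_score -- benchmark score of the reference machine (assumed unit)
    measured_score  -- benchmark score of the machine being tested
    """
    if reference_score <= 0 or measured_score <= 0:
        raise ValueError("benchmark scores must be positive")
    # Twice the benchmark throughput means half the thinking time.
    return base_seconds * reference_score / measured_score


# Hypothetical numbers: a card scoring twice the reference gets half the time.
print(scale_time_control(300, reference_score=1000, measured_score=2000))
```

The open question for a GPU engine is what the benchmark itself should measure, since the scaling step is trivial once a score exists.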

Milos · Post by **Milos** » Fri May 04, 2018 4:46 pm

Albert Silver wrote:As mentioned in a reply to someone else, the real question of course is the time control. If you make it fast enough it will beat anybody even now. But at 5 minutes it's a different story since a good player should be able to bring in tactics at multiple ply levels which can swing things radically. That said, I did just test LCZ in Easy Mode in a 5-minute game and though I didn't have too much trouble beating it, I was forced to think a lot more and be more attentive than I had expected so my pessimism about its 1 playout strength is misplaced. I don't think it's a 2000 yet, at least not in five minute games, but it plays a lot better than I expected and its 'instinctual' positional moves compensate for a lot.

Your 1ply pessimism is well founded; you cannot objectively evaluate it simply by playing against it. Problem is ppl don't do proper testing and don't understand how things work.
In my tests - 1000 games from proper opening positions, LZ0 1ply against SFdev depth=1 - LZ0 managed to finally beat SF (and that one by a very small margin) only when larger network (15x192) was introduced.
SF depth=1 search is only QS and it goes roughly through 100 nodes (so on a powerful CPU you can execute even a million of those searches every second). If you took this into account you'd realise that optimism regarding 1ply LC0 strength is totally unfounded.
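For readers who want to put a number on "a very small margin", a match score can be converted to an Elo difference with the standard logistic formula. This is only a sketch: the win/draw/loss counts below are made up for illustration and are not the results of the 1000-game test described above.

```python
import math


def match_score(wins, draws, losses):
    """Return the score fraction (0..1), counting a draw as half a point."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / games


def elo_difference(score):
    """Elo difference implied by a score fraction under the logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Illustrative counts only (hypothetical, 1000 games in total).
score = match_score(wins=400, draws=220, losses=380)
print(f"score = {score:.3f}, Elo difference = {elo_difference(score):+.1f}")
```

With these hypothetical counts the score is 51%, which works out to roughly +7 Elo, so a narrow match win of this kind corresponds to only a few rating points.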

jp · Post by jp » Sat May 05, 2018 2:29 pm

Milos wrote:. Problem is ppl don't do proper testing and don't understand how things work.
In my tests - 1000 games from proper opening positions, LZ0 1ply against SFdev depth=1 - LZ0 managed to finally beat SF (and that one by a very small margin) only when larger network (15x192) was introduced.

Did you create your own opening test suite or use an existing one?
What interface do you use for testing?
