Milos wrote: Gian-Carlo's hand-written OpenCL implementation is really not a match for NVIDIA's specialized libraries.

Ras wrote: I guess the OpenCL version also works with AMD GPUs while CUDA does not. Since the project depends on voluntary contribution, it may make sense not to shut out a considerable number of potential volunteers.

Except that if we assume a 50:50 distribution of discrete AMD vs. Nvidia GPUs, even excluding AMD and speeding up Nvidia 4-8x would speed up the learning process considerably. I think in one of the GitHub threads they were discussing licensing issues with cuDNN; I am not sure what the status of that is. Otherwise I don't see a reason why there shouldn't be an Nvidia-specific binary for people who want to contribute with Nvidia hardware and an OpenCL binary for everyone else, in the official download section of lc0, and preferably without requiring the user to install the Nvidia developer tools (if that's possible).
how good is a GeForce GTX 1060 6GB for Leela ?
Moderators: hgm, Rebel, chrisw
-
- Posts: 52
- Joined: Sat Mar 24, 2018 4:18 pm
Re: how good is a GeForce GTX 1060 6GB for Leela ?
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: how good is a GeForce GTX 1060 6GB for Leela ?
Milos wrote: Gian-Carlo's hand-written OpenCL implementation is really not a match for NVIDIA's specialized libraries.

Ras wrote: I guess the OpenCL version also works with AMD GPUs while CUDA does not. Since the project depends on voluntary contribution, it may make sense not to shut out a considerable number of potential volunteers.

If efficiency is your goal, then of course. First, I believe there are at least twice as many NVIDIA GPU users as AMD GPU users, but even if the numbers were equal, with cuDNN you would typically get an 8x speed-up compared to OpenCL. So even if you cut off half the users and the other half all used the cuDNN version, you would get an overall speed-up of 4x, meaning you could reach 44 million games in a month.
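That 4x figure is simple throughput arithmetic: drop one half of the contributors, make the other half 8x faster, and compare against everyone running OpenCL at 1x. A minimal sketch (the function name is mine, and it assumes equal per-user game throughput and the 8x cuDNN speed-up quoted above):

```python
def overall_speedup(kept_fraction, per_user_speedup):
    """Total game-throughput multiplier relative to a baseline
    where every contributor runs the 1x (OpenCL) version."""
    return kept_fraction * per_user_speedup

# Half the users kept, each running cuDNN at 8x:
print(overall_speedup(0.5, 8))  # 4.0
# Everyone kept on OpenCL, no speed-up:
print(overall_speedup(1.0, 1))  # 1.0
```

Under these toy assumptions, the break-even point is a per-user speed-up of 2x: anything above that outweighs losing half the contributor pool.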
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: how good is a GeForce GTX 1060 6GB for Leela ?
mirek wrote: I think in one of the GitHub threads they were discussing licensing issues with cuDNN. I am not sure what the status of that is.

There are no licensing issues, just Gian-Carlo's paranoia and hurt pride:
http://www.talkchess.com/forum/viewtopi ... 944#759944
And the cuDNN library clearly falls under the System Library exception of GPL v3, so it is totally fine to include in any GPLed project.
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: how good is a GeForce GTX 1060 6GB for Leela ?
Dann Corbit wrote: Titan V has tensor cores.

Werewolf wrote: The Titan V also "only" has 640 of them. I suspect its successor will cram in many more.

Dann Corbit wrote: They can multiply a small matrix in a single cycle (the tensor cores).

Milos wrote: A 4x4 one, to be precise. Since the LC0 kernel is 3x3, there is roughly only 1/3 efficiency ((27+9*2)/(81+16*3) operations) when running LC0 only on tensor cores, assuming of course that they are fully loaded and that the cuDNN libs are efficient for them, which is a big question mark at the moment.

Dann Corbit wrote: I guess that you have to program specifically for the tensor cores.

jkiliani wrote: 3x3 kernels are extremely common in machine learning for good reason, so I think both the design engineers and the driver programmers at Nvidia are way ahead of Milos there. They probably use Winograd transforms to compute 3x3 kernels, just like the Leela Zero OpenCL implementation (which is not nearly as bad as Milos claims, I might add).

Dann Corbit wrote: I think that the point Milos was making is that the tensor cores do not perform a 3x3 multiply. They perform a 4x4 multiply.

Correct; most probably the tensor core matrix multiply implementation is direct, i.e. not using any FFT, because latency minimization is the goal, not area savings.
But they are fiddly to use and you have to program specifically for them.
If they were going to run on that hardware, it certainly makes sense to change to a 4x4 kernel. That is a very good point.
Now, a 3x3 matrix fits into a 4x4 matrix, so you can still multiply it in one cycle. But the thing that is missing is that with a 4x4 kernel you would get:

2 * (4 * 4 * 4) - 4 * 4 = 112 operations in one cycle

versus

2 * (3 * 3 * 3) - 3 * 3 = 45 operations in one cycle

So you are getting 45/112 = 40% of the compute power available.

This assumes that square matrix multiply is 2N^3 - N^2 operations (the typical count, and I doubt you can do better on such a small matrix).
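The count above can be checked quickly (the function name is mine): an NxN multiply computes N^2 output entries, each requiring N multiplications and N-1 additions, giving N^2 * (2N - 1) = 2N^3 - N^2 operations.

```python
def matmul_ops(n):
    """Operation count for an n x n matrix multiply:
    each of the n*n outputs needs n multiplies and n-1 adds,
    i.e. n^2 * (2n - 1) = 2n^3 - n^2 operations total."""
    return n * n * (2 * n - 1)

print(matmul_ops(4))  # 112 operations for a 4x4 multiply
print(matmul_ops(3))  # 45 operations for a 3x3 multiply

# Utilization when a 3x3 kernel is zero-padded into a
# 4x4 tensor-core multiply:
print(matmul_ops(3) / matmul_ops(4))  # ~0.40
```

So a padded 3x3 kernel leaves roughly 60% of the 4x4 unit's arithmetic doing useless work, which is the point being made about tensor-core efficiency for LC0.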
-
- Posts: 1346
- Joined: Sat Apr 19, 2014 1:47 pm
Re: how good is a GeForce GTX 1060 6GB for Leela ?
I have the same NPS as Nay here:
https://docs.google.com/spreadsheets/d/ ... =857482380
NPS: 1167 with GTX 1060 3GB, 3 GHz, 4 cores, i5 7400
So I guess it's ok for my set up.
I have a GTX 1060 6GB, using 2 cores of an i5-3570, using ID 227.
-
- Posts: 219
- Joined: Thu May 29, 2014 5:58 pm
What does LC0 use?
"Even with an old i5-2500K and GTX1060 I get about 2250 NPS in the benchmark I described."

Doesn't LC0 use either the GPU or the CPU, but not both?
In which case, should one test both versions of LC0, or is that a waste of time, since for comparably priced processing units the GPU will be faster than the CPU?
-
- Posts: 219
- Joined: Thu May 29, 2014 5:58 pm
Re: how good is a GeForce GTX 1060 6GB for Leela ?
How is lczero7.exe different from lczero.exe?
-
- Posts: 3019
- Joined: Wed Mar 08, 2006 9:57 pm
- Location: Rio de Janeiro, Brazil
Re: What does LC0 use?
cma6 wrote: "Even with an old i5-2500K and GTX1060 I get about 2250 NPS in the benchmark I described." Doesn't LC0 use either the GPU or CPU, but not both? In which case, should one test both versions of LC0, or is that a waste of time, since for comparably priced processing units the GPU will be faster than the CPU?

Can't speak for everyone, since different setups may vary, but on the desktop I described it is about 60% CPU usage and 45-50% GPU, as per the task manager.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
-
- Posts: 219
- Joined: Thu May 29, 2014 5:58 pm
Re: What does LC0 use?
Albert:
I wasn't asking about processor usage, but about whether, given comparably priced CPUs and GPUs, one should even bother testing the CPU version of lc0.
-
- Posts: 3019
- Joined: Wed Mar 08, 2006 9:57 pm
- Location: Rio de Janeiro, Brazil
Re: What does LC0 use?
cma6 wrote: Albert: I wasn't asking about processor usage, but about whether one should even bother testing the CPU version of lc0 based on prices of comparably priced CPUs vs. GPUs.

You wrote, "Doesn't LC0 use either the GPU or CPU, but not both?" and I answered no, it uses both.