nvidia tesla

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: nvidia tesla

Post by Daniel Shawul »

diep wrote:
Daniel Shawul wrote:If you can find a suitable algorithm for the hardware, it is indeed possible.
In fact I believe a good engine that plays checkers is very feasible. I am working on that right now. However, the algorithm I have is not suitable for chess.

Don't believe the rant some people make here... You can tell if you ask "what have you done?" :) Possible answer "Pay me"
8x8 checkers has been kind of solved.
Sure it is, but the goal is not to solve a game; otherwise all those checkers programmers should stop. You know, when I said here that UCT gives a good checkers engine, many people were shocked... even though your "beloved" academicians had already written about it.
With the first 10x10 international checkers program I made in the 90s, when I showed up at a tournament I outsearched everyone by a factor of 2 in plies on average, over the entire game.

That was full-width.

No pruning, nothing.

Some move-ordering tricks though, and a high nps.

They didn't know how to generate moves fast, and they still don't know how to do it really quickly.
I beg to differ. There are many smart programmers in checkers, go, and other games as well.
From what you posted, you have the same problem now for gpgpu chess. My tip: fix that.

As for the 10x10 checkers program, it took me 3 weeks of full-time hard work to get the first version going. After that, at most 1 day of work a year or so.

Some of these guys were busy every evening with their program.

I don't write this to spit on them, on the contrary: in fields where there has not been very big competition, it's possible to really outdo others by factors.
You are so wrong, probably because you don't understand, since you never programmed the game. I always laugh when you claim go could be alpha-beta searched if only a good chess programmer who knows how to do LMR or something were there... do you think Rémi or anyone else would miss that?
My first try at go was in fact alpha-beta, and with heavy pruning I could get the same search depth as in chess. But the main problem is the evaluation, which is full of tactics such as ladders. The alpha-beta searcher was at 1500 Elo after heavy pruning and a lot of static tactics detection...
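The alpha-beta search being debated here can be sketched in its bare negamax form. This toy version searches a complete binary tree of precomputed leaf scores instead of generating moves, so the tree encoding and all names are illustrative:

```c
/* Minimal negamax with alpha-beta over a complete binary tree whose
 * leaves hold static scores from the side to move's point of view;
 * a toy stand-in for a real move generator and evaluator. */
static int negamax(const int *leaves, int node, int depth,
                   int alpha, int beta) {
    if (depth == 0)
        return leaves[node];
    for (int child = 0; child < 2; child++) {
        int score = -negamax(leaves, node * 2 + child, depth - 1,
                             -beta, -alpha);
        if (score > alpha)
            alpha = score;
        if (alpha >= beta)
            break;  /* beta cutoff: remaining siblings are pruned */
    }
    return alpha;
}
```

A real engine replaces the leaf array with move generation and evaluation, and adds the move ordering and pruning (LMR and the like) discussed above.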
Right now what's there in gpgpu for chess is not very well optimized for vector processing.
Yes, it is meant for vector processing like CFD, which I do btw. And improvements like DPU and ECC, and the small 64 KB cache, are meant to improve performance in that regard.
In the 90s they already knew how to solve this, you know; this is not rocket science, but it shows how much creativity someone possesses, as there is no downloadable example of how to do it.
In the 90s...
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: nvidia tesla

Post by Daniel Shawul »

diep wrote:Daniel, quiet down and don't say things i didn't write.

All postings i did do i speak of a 3 layer approach.

So now please write me a PROOF that this requires all cores to slow down by a factor of 100-1000 or so, by reading from device RAM and/or global shared RAM.

Your proof is absent. As long as you're not able to prove things on paper, and laugh at me for proving my SMP search first on paper, you might never make it to the Einstein league.
You got it backwards. I don't need to prove anything, but YOU do, after making so many ridiculous claims. I replied quoting what you said, so address those points. The 3-layered approach, as you call it, revolves around the hashtables. Why would you want to base your approach on the weakest link of GPUs to begin with? I showed you it is not easy to do hashtables, despite some recent improvements. If you say your approach is superb, that is great, since it could have applications elsewhere.
But on that improvement you finally posted only 5 lines... and only after getting me to write a lot.
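For context on why hashtables come up at all: on CPUs, engines typically share a transposition table without locks using the Hyatt-Mann XOR trick, which any GPU port would have to reproduce on top of much costlier global-memory traffic. A minimal single-entry sketch, with illustrative names:

```c
#include <stdint.h>

/* Hyatt-Mann style lockless transposition-table entry: the key is
 * stored XORed with the data, so a torn (half-updated) entry fails
 * verification on probe instead of returning corrupt data. */
typedef struct { uint64_t xkey; uint64_t data; } TTEntry;

static void tt_store(TTEntry *e, uint64_t key, uint64_t data) {
    e->xkey = key ^ data;
    e->data = data;
}

/* Returns 1 and fills *data on a verified hit, 0 otherwise. */
static int tt_probe(const TTEntry *e, uint64_t key, uint64_t *data) {
    if ((e->xkey ^ e->data) != key)
        return 0;  /* wrong position, or a torn concurrent write */
    *data = e->data;
    return 1;
}
```

The point of the trick is that no lock is needed: a racing writer can at worst make the entry look like a miss, never return wrong data for the probed key.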
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: nvidia tesla

Post by Daniel Shawul »

I will reply only to your gpgpu comment.

The big problem in gpgpu programming is the parallel search; it's easy to prove that the best way to solve this requires at least a 3-point solution:

a) SMP search between the GPUs, using the DDR3 RAM of the CPUs
Multi-GPU using CPU RAM will be dead slow; the latency between device and CPU is very high. It can only be successful with "domain decomposition" methods that do the job independently.
b) SMP search between the compute units (= SIMD, that's around 32 cores)
This is better, but still difficult for alpha-beta and similar algorithms: it needs synchronization, which gpgpus don't really provide.
c) SMP search within 1 compute unit
That is exactly what Srdja did, but moving to (b) he still has difficulties.
For comparison, normal SMP searches in software have just 1 layer of SMP search, which already isn't easy to build. So there are 3 hurdles here, with an efficiency loss at each layer. Minimizing that loss will determine how well its speedup compares to 1 CPU core doing the same, yet using a very efficient shared hashtable (namely everywhere).

The software search of Diep on a shared-memory machine: just the 2010 addition is 40-50 pages of A4 full of proof.

Not surprisingly, you need to design such a search on paper, or it won't work.

I get the impression that the big paper work required for this gets underestimated and laughed away by those who simply have no idea what you need to do to get the maximum out of the hardware.
While it is nice and all to work it out on paper, you should realize there are lots of myths surrounding GPU performance, even for very suitable algorithms. Unless you actually program it, you can't know for sure.
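The efficiency-loss argument above can be made concrete with a back-of-the-envelope model. The unit counts and per-layer efficiency figures below are made-up placeholders, not measurements of any hardware:

```c
/* Toy model of a multi-layer parallel search: ideal speedup is the
 * product of the unit counts per layer; the realized speedup also
 * multiplies in an efficiency factor per layer, so losses compound. */
static double layered_speedup(const int units[], const double eff[],
                              int layers) {
    double s = 1.0;
    for (int i = 0; i < layers; i++)
        s *= units[i] * eff[i];
    return s;
}
```

For example, 4 GPUs x 14 compute units x 32 lanes gives an ideal speedup of 1792, but with per-layer efficiencies of 0.8, 0.7, and 0.5 the realized speedup drops to about 502; this compounding across layers is what both posts are arguing about.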
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: nvidia tesla

Post by diep »

Daniel Shawul wrote:
diep wrote:Daniel, quiet down and don't say things i didn't write.

All postings i did do i speak of a 3 layer approach.

So now please write me a PROOF that this requires all cores to slow down by a factor of 100-1000 or so, by reading from device RAM and/or global shared RAM.

Your proof is absent. As long as you're not able to prove things on paper, and laugh at me for proving my SMP search first on paper, you might never make it to the Einstein league.
You got it backwards. I don't need to prove anything, but YOU do, after making so many ridiculous claims. I replied quoting what you said, so address those points. The 3-layered approach, as you call it, revolves around the hashtables. Why would you want to base your approach on the weakest link of GPUs to begin with? I showed you it is not easy to do hashtables, despite some recent improvements. If you say your approach is superb, that is great, since it could have applications elsewhere.
But on that improvement you finally posted only 5 lines... and only after getting me to write a lot.
Try removing the hashtables from today's chess programs and see what Elo rating they play at then.

Then we can talk again.

Vincent
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: nvidia tesla

Post by Daniel Shawul »

diep wrote:
Daniel Shawul wrote:
diep wrote:Daniel, quiet down and don't say things i didn't write.

All postings i did do i speak of a 3 layer approach.

So now please write me a PROOF that this requires all cores to slow down by a factor of 100-1000 or so, by reading from device RAM and/or global shared RAM.

Your proof is absent. As long as you're not able to prove things on paper, and laugh at me for proving my SMP search first on paper, you might never make it to the Einstein league.
You got it backwards. I don't need to prove anything, but YOU do, after making so many ridiculous claims. I replied quoting what you said, so address those points. The 3-layered approach, as you call it, revolves around the hashtables. Why would you want to base your approach on the weakest link of GPUs to begin with? I showed you it is not easy to do hashtables, despite some recent improvements. If you say your approach is superb, that is great, since it could have applications elsewhere.
But on that improvement you finally posted only 5 lines... and only after getting me to write a lot.
Try removing the hashtables from today's chess programs and see what Elo rating they play at then.

Then we can talk again.

Vincent
They will still play very well indeed. Your point being?
You said you use hashtables in your methodology, so that doesn't help you.