Perft(14) revisited

zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: About that test position

Post by zullil »

sje wrote:
stegemma wrote: I think that it's not a good position to test perft, because both kings are already castled. In the opening, castling has a big impact on move generation because of its "complicated" rules. Without castling, this leads to inaccuracy in estimating the performance of a perft routine.
It doesn't test promotion, either. But that's okay because no one position can test everything in a reasonable amount of time. What the above position does do is to provide a known set of complex middlegame positions which can be used to generate timing benchmark data.
The so-called "kiwipete" position seems to offer almost everything that might be found in the late opening/early middlegame:
FEN: r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -

Code:

louis@LZsT5610:~/Documents/Chess/Kirby$ ./perft 
FEN string = r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -
Depth = 5
Leaf nodes = 193690690
Time taken = 2233 ms
louis@LZsT5610:~/Documents/Chess/Kirby$ ./perft 
FEN string = r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -
Depth = 6
Leaf nodes = 8031647685
Time taken = 96721 ms
louis@LZsT5610:~/Documents/Chess/Kirby$ ./perft 
FEN string = r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -
Depth = 7
Leaf nodes = 374190009323
Time taken = 4148898 ms
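
To compare the three runs, the logs above can be reduced to leaf nodes per second and to a growth factor per extra ply. This is only a minimal sketch that uses the depth, leaf-node, and time figures printed above; the `RUNS` table and the helper names are hypothetical and are not part of the `perft` program itself.

```python
# Figures copied from the perft output above: depth -> (leaf nodes, time in ms).
RUNS = {
    5: (193_690_690, 2_233),
    6: (8_031_647_685, 96_721),
    7: (374_190_009_323, 4_148_898),
}


def nodes_per_second(leaf_nodes, time_ms):
    """Leaf nodes counted per second of wall-clock time."""
    return leaf_nodes / (time_ms / 1000.0)


def growth_factor(deeper_nodes, shallower_nodes):
    """Ratio of leaf counts between two consecutive depths."""
    return deeper_nodes / shallower_nodes


for depth in sorted(RUNS):
    nodes, ms = RUNS[depth]
    rate = nodes_per_second(nodes, ms) / 1e6
    print(f"depth {depth}: {rate:.1f} M leaf nodes/s")

for depth in (6, 7):
    ratio = growth_factor(RUNS[depth][0], RUNS[depth - 1][0])
    print(f"depth {depth - 1} -> {depth}: x{ratio:.1f}")
```

On these figures the rate stays between roughly 83 and 90 million leaf nodes per second, while each extra ply multiplies the leaf count by roughly 41 to 47.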
jsgroby
Posts: 83
Joined: Mon Mar 24, 2014 12:26 am
Location: Glen Carbon, IL USA

Re: Trivia: The very first perft(7)

Post by jsgroby »

I could be missing something, but I am planning to build a 32- to 64-node mini-supercomputer from Raspberry Pi units, harnessing the 24 GFLOPS of the GPU on each board and using MPI to break big problems into smaller units that each node can chew on. I have not crunched the numbers yet, but I think even this $5K machine would be pretty powerful because it is massively parallel, whereas existing i7 processors are comparatively serial in their computational abilities.

Meh, who knows, maybe I am barking up the wrong tree with regards to using it for chess perft analysis, but I think it could speed up the time to solve some of the deeper levels.

There are already at least a handful of universities that have built these devices, and they handle problems that would take a very long time on an i7-type processor and that would not be cost-effective to tackle with a machine built on i7 architecture.

Jeff
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Raspberry Pi cluster = "bramble"

Post by sje »

The documentation for the VideoCore IV is here: http://www.broadcom.com/docs/support/vi ... G100-R.pdf

I don't see how one of these GPUs with a maximum of only two hardware threads and a clock of 700 MHz is going to help a whole lot.

Also, compared to a BB Black, a Raspberry Pi must have an SD card, an extra expense.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Status

Post by sje »

I've got about 150 K perft(7) calculations completed, about 1/6th of one percent of the total work needed. At present, I'm working on a mostly automated restart facility so that the usual power outages won't cause excessive loss or delay.
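
A restart facility of this kind can be sketched as a checkpoint file that is rewritten after every completed work unit, so that an outage costs at most the unit in progress. The sketch below is an assumption about how such a facility might look, not a description of the actual program: the file layout, `run_units`, `compute`, and the `stop_after` parameter (used only to simulate an interruption) are all hypothetical.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the results saved so far, or an empty dict on a fresh start."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_checkpoint(path, results):
    """Write results to a temp file, then rename it over the old checkpoint.

    The rename is atomic, so a power cut leaves either the old or the new
    checkpoint on disk, never a half-written one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(results, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def run_units(units, compute, path, stop_after=None):
    """Compute every unit not yet in the checkpoint, saving after each one."""
    results = load_checkpoint(path)
    done_now = 0
    for unit in units:
        if unit in results:
            continue  # finished before the last interruption
        if stop_after is not None and done_now >= stop_after:
            break  # simulated outage, for demonstration only
        results[unit] = compute(unit)
        save_checkpoint(path, results)
        done_now += 1
    return results
```

Calling `run_units` again with the same checkpoint path after an interruption skips the finished units and continues with the rest. Here each unit is identified by a string key (for example a position in FEN), since JSON object keys must be strings.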
jsgroby
Posts: 83
Joined: Mon Mar 24, 2014 12:26 am
Location: Glen Carbon, IL USA

Re: Raspberry Pi cluster = "bramble"

Post by jsgroby »

sje wrote:The documentation for the VideoCore IV is here: http://www.broadcom.com/docs/support/vi ... G100-R.pdf

I don't see how one of these GPUs with a maximum of only two hardware threads and a clock of 700 MHz is going to help a whole lot.

Also, compared to a BB Black, a Raspberry Pi must have an SD card, an extra expense.

It's gonna be fun to build it and use it to tackle some cool problems. I will have a look at how we can divide perft analysis up across the 64 nodes and see what benefit will be gained by having all those nodes analyze simultaneously. Will report back here once we have some results.
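
As a rough picture of how perft could be divided across 64 nodes: expand the tree a few plies, deal the resulting positions out to the nodes, let each node count its own subtrees, and add up the partial counts. The sketch below is only an illustration under stated assumptions. It uses a toy take-away game in place of a chess move generator, it runs the "nodes" one after another in a single process instead of over MPI, and every name in it (`moves`, `split_work`, `distributed_perft`) is hypothetical.

```python
def moves(pos):
    """Toy move generator: remove 1, 2, or 3 stones from any one pile."""
    children = []
    for i, pile in enumerate(pos):
        for take in (1, 2, 3):
            if pile >= take:
                children.append(pos[:i] + (pile - take,) + pos[i + 1:])
    return children


def perft(pos, depth):
    """Count the leaf nodes exactly `depth` plies below `pos`."""
    if depth == 0:
        return 1
    return sum(perft(child, depth - 1) for child in moves(pos))


def split_work(pos, split_depth):
    """List every position `split_depth` plies below `pos`, duplicates kept."""
    frontier = [pos]
    for _ in range(split_depth):
        frontier = [child for p in frontier for child in moves(p)]
    return frontier


def distributed_perft(pos, depth, split_depth, n_nodes):
    """Deal the frontier out round-robin and sum the per-node counts."""
    work = split_work(pos, split_depth)
    buckets = [work[i::n_nodes] for i in range(n_nodes)]
    # On a real cluster each bucket would be sent to a different node.
    partials = [
        sum(perft(p, depth - split_depth) for p in bucket)
        for bucket in buckets
    ]
    return sum(partials)


start = (5, 5, 5)
print(perft(start, 6), distributed_perft(start, 6, 2, 64))
```

Both numbers printed are the same, because every leaf lies below exactly one entry of the frontier. The buckets are independent of each other, which is what makes the problem easy to spread over many machines; the hard part is making each node fast.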

Jeff
Ajedrecista
Posts: 1968
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Perft(14) revisited.

Post by Ajedrecista »

Hello Steven:

Just for the record, I have just remembered that the late Don Dailey initiated a distributed perft project to calculate Perft(14). It started with Perft(12) to verify the validity of the approach, then moved on to Perft(14). Links:

Perft helpers

Distributed Perft(12) Calculation
Distributed Perft(14) Calculation

Please note the completion date estimated at that time for Perft(14): June 2038!

I guess that you want something similar, which is why you made chunks of 100,000 positions available, as you explained before. I can only wish you good luck, because I do not have powerful hardware to help.

Regards from Spain.

Ajedrecista.
ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: Raspberry Pi cluster = "bramble"

Post by ZirconiumX »

jsgroby wrote:
sje wrote:The documentation for the VideoCore IV is here: http://www.broadcom.com/docs/support/vi ... G100-R.pdf

I don't see how one of these GPUs with a maximum of only two hardware threads and a clock of 700 MHz is going to help a whole lot.

Also, compared to a BB Black, a Raspberry Pi must have an SD card, an extra expense.

It's gonna be fun to build it and use it to tackle some cool problems. I will have a look at how we can divide perft analysis up across the 64 nodes and see what benefit will be gained by having all those nodes analyze simultaneously. Will report back here once we have some results.

Jeff
Yeah, the VC4 is really not much use for one of these. Also, the VC4 by default runs at 250 MHz, so it's pretty slow. It's designed for video tasks etc, not for general purpose computing. You'd need an Adreno or something for that.

(There is a reason the Broadcom proprietary libraries do not come with OpenCL support)

Matthew:out
Some believe in the almighty dollar.

I believe in the almighty printf statement.
jsgroby
Posts: 83
Joined: Mon Mar 24, 2014 12:26 am
Location: Glen Carbon, IL USA

Re: Raspberry Pi cluster = "bramble"

Post by jsgroby »

ZirconiumX wrote:
jsgroby wrote:
sje wrote:The documentation for the VideoCore IV is here: http://www.broadcom.com/docs/support/vi ... G100-R.pdf

I don't see how one of these GPUs with a maximum of only two hardware threads and a clock of 700 MHz is going to help a whole lot.

Also, compared to a BB Black, a Raspberry Pi must have an SD card, an extra expense.

It's gonna be fun to build it and use it to tackle some cool problems. I will have a look at how we can divide perft analysis up across the 64 nodes and see what benefit will be gained by having all those nodes analyze simultaneously. Will report back here once we have some results.

Jeff
Yeah, the VC4 is really not much use for one of these. Also, the VC4 by default runs at 250 MHz, so it's pretty slow. It's designed for video tasks etc, not for general purpose computing. You'd need an Adreno or something for that.

(There is a reason the Broadcom proprietary libraries do not come with OpenCL support)

Matthew:out
http://hackaday.com/2014/01/31/fft-on-the-raspis-gpu/

Some cool libraries are starting to emerge for the VC4 in the Pi. This is the kind of thing our group will be looking at, along with how to break a much larger project into smaller chunks and do the work on a parallel system.
jsgroby
Posts: 83
Joined: Mon Mar 24, 2014 12:26 am
Location: Glen Carbon, IL USA

Re: Raspberry Pi cluster = "bramble"

Post by jsgroby »

More information:

GPU_FFT is an FFT library for the Raspberry Pi which exploits the BCM2835 SoC V3D hardware to deliver ten times the performance that is possible on the 700 MHz ARM. Kernels are provided for all power-of-2 FFT lengths from 256 to 131,072 points inclusive.


Maybe not as much of a slouch as some people think...

Jeff
ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: Raspberry Pi cluster = "bramble"

Post by ZirconiumX »

FFT is straightforward (in the sense that there is no recursion etc) math. Many things can do math, and a GPU can do math in parallel.

Tree search is far more complex; you would need a lot of time just to work out how to lay out a chess program for the GPU.
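
To give an idea of the kind of reformulation involved: a recursive perft has to be turned into a loop with an explicit stack before it can even be considered for hardware without recursion. The sketch below shows only that one step, on a toy take-away game standing in for chess. It is not a GPU kernel, and the names (`moves`, `perft_recursive`, `perft_iterative`) are hypothetical.

```python
def moves(pos):
    """Toy move generator: remove 1, 2, or 3 stones from any one pile."""
    children = []
    for i, pile in enumerate(pos):
        for take in (1, 2, 3):
            if pile >= take:
                children.append(pos[:i] + (pile - take,) + pos[i + 1:])
    return children


def perft_recursive(pos, depth):
    """Usual recursive leaf count."""
    if depth == 0:
        return 1
    return sum(perft_recursive(child, depth - 1) for child in moves(pos))


def perft_iterative(pos, depth):
    """Same count, using an explicit stack of (position, remaining depth)."""
    leaves = 0
    stack = [(pos, depth)]
    while stack:
        current, remaining = stack.pop()
        if remaining == 0:
            leaves += 1
            continue
        for child in moves(current):
            stack.append((child, remaining - 1))
    return leaves


start = (4, 4, 4)
print(perft_recursive(start, 5), perft_iterative(start, 5))
```

Even in this form the stack grows and shrinks unpredictably and every position yields a different number of moves, which is the irregularity that makes tree search a poor fit compared with FFT.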

You'd also have to do a lot of very awkward things. For maximum performance, for example, you'd use DMA, which requires being root, plus a good deal of fiddling about.

It's a lot of effort. If you can manage it, good job, but do not expect it to be easy.

Matthew:out