Perft(14)

sje · Post by **sje** » Wed Oct 10, 2012 7:36 pm

Regardless of many issues, the primary chess part of the enterprise is CookieCat (or a program like CookieCat) which takes a position and a depth, calculates the count, and returns that number. This part I can do, although it may not be the fastest implementation, it will be open and portable to the extent that Free Pascal is portable.

The secondary chess part is the program which generates and organizes all of the unique FEN records for a given depth and assigns a use count for each that's needed for the final multiplication. I've done this before and it's pretty easy, but I can't find the code at this time so I'll have to do it again. CookieCat is getting chubby at 15K source lines, but I suppose I can add this position generator code to her without much difficulty.

The details of how many positions are sent, how they are sent, how the results are returned, and other issues should be determined through testing of perft(n) with n < 14.

With a hundred or so fast machines, we could do perft(15) as well. Beyond that we would have to use BOINC to get more end users. However, the BOINC community likes to see real time eye candy, and that would mean modifying (and slowing) the client.

Don · Post by **Don** » Wed Oct 10, 2012 8:09 pm

sje wrote:Regardless of many issues, the primary chess part of the enterprise is CookieCat (or a program like CookieCat) which takes a position and a depth, calculates the count, and returns that number. This part I can do, although it may not be the fastest implementation, it will be open and portable to the extent that Free Pascal is portable.

The secondary chess part is the program which generates and organizes all of the unique FEN records for a given depth and assigns a use count for each that's needed for the final multiplication. I've done this before and it's pretty easy, but I can't find the code at this time so I'll have to do it again. CookieCat is getting chubby at 15K source lines, but I suppose I can add this position generator code to her without much difficulty.

The details of how many positions are sent, how they are sent, how the results are returned, and other issues should be determined through testing of perft(n) with n < 14.

With a hundred or so fast machines, we could do perft(15) as well. Beyond that we would have to use BOINC to get more end users. However, the BOINC community likes to see real time eye candy, and that would mean modifying (and slowing) the client.

I actually have a client that is almost ready to go - It is graphical, displays the actual position being searched and the progress of the local perft binary move by move. That part is working perfectly. A simply mouse click will get global progress (the client will ask the server) and it will show the partial move counts for all 20 first moves and estimate the progress. It's going to be pretty cool.

Now, for the perft binaries .....

The protocol for the binaries is dirt simple and looks like this (which can be wrapped):

You sent 2 arguments to the binary, the fen position and the depth as Steven suggests.

The first thing the binary sends is a 1 line string announcing itself - we need to track which perft binary client is used for verification purposes and in case there are bugs.

Then the binary sends each move as it is searched with a node count. This is simply to have nice progress to display in the graphical client.

Here is by example what would happen when we call up the perft program by hand:
./perft "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -" 5
perft version 1.3 written by Ozzie Puggermiller
a2a3 181046
b2b3 215255
c2c3 222861
d2d3 328511
e2e3 402988
f2f3 178889
g2g3 217210
h2h3 181044
b1a3 198572
b1c3 234656
g1f3 233491
g1h3 198502
a2a4 217832
b2b4 216145
c2c4 240082
d2d4 361790
e2e4 405385
f2f4 198473
g2g4 214048
h2h4 218829
TOTAL 4865609

Note that we end with a "TOTAL" line and the binary should automatically exit from that point.

The moves do NOT have to reported in long algebraic, although the GUI does. It will accept either SAN notation or long algebraic.

I can imagine all kinds of enhancements, but I want this to be dirt simple and I don't have time to make this look like a microsoft application with hundreds of menus and such and icon plastered all over the place ....

Don

ibid · Post by **ibid** » Wed Oct 10, 2012 9:36 pm

Don wrote:I did some analysis and perft(8) based on the komodo implementation takes about 90 seconds in the opening, but could take a couple of hours in a complex position.

Out of curiousity I tested with this position:
[d] r1b1kbnr/pppp1ppp/2n1p3/6q1/6Q1/2N1P3/PPPP1PPP/R1B1KBNR w KQkq - 4 4
(It is the 6 ply position with the highest perft(4).) JetChess produces the answer (8709649140824) on my machine (phenom 1090T @ 3.6 GHz) using 1 GB of hash and 1 core in 2508 seconds. My program, using 5 cores on the same machine, took 571 seconds. This may not be the hardest of the 9 million positions, but is probably close. Most should be noticably faster. So it does seem like a reasonable amount of work on any modern machine.

Don wrote:The mechanism the server will use is to assign one of the N positions at random when a client makes a request for help.

While I see the logic behind the random assignments, assigning a single client consecutive positions (assuming some sane ordering) will increase the chances for hash hits between work units...

[EDIT: oops... unless of course the client is called for each position seperately... :)]

-paul

Don · Post by **Don** » Wed Oct 10, 2012 10:05 pm

ibid wrote:
Don wrote:I did some analysis and perft(8) based on the komodo implementation takes about 90 seconds in the opening, but could take a couple of hours in a complex position.
Out of curiousity I tested with this position:
[d] r1b1kbnr/pppp1ppp/2n1p3/6q1/6Q1/2N1P3/PPPP1PPP/R1B1KBNR w KQkq - 4 4
(It is the 6 ply position with the highest perft(4).) JetChess produces the answer (8709649140824) on my machine (phenom 1090T @ 3.6 GHz) using 1 GB of hash and 1 core in 2508 seconds. My program, using 5 cores on the same machine, took 571 seconds. This may not be the hardest of the 9 million positions, but is probably close. Most should be noticably faster. So it does seem like a reasonable amount of work on any modern machine.

Don wrote:The mechanism the server will use is to assign one of the N positions at random when a client makes a request for help.
While I see the logic behind the random assignments, assigning a single client consecutive positions (assuming some sane ordering) will increase the chances for hash hits between work units...

[EDIT: oops... unless of course the client is called for each position seperately... ]

-paul

I planned to reinvoke perft between tests - even without randomness these positions are essentially random anyway even if I traverse them in some set order - they are the unique positions at the end of perft(6) runs. With transpositions and such there is no fixed ordering of these that can be considered canonical anyway.

That does not mean there will be some benefit from not restarting the client - that is something I will determine later before I commit to one way. It also means at least slightly more complication which I want to avoid - it means the client has to do more than parse command line arguments, I would now have to set up a little protocol for sending commands to perft. Every little bit adds more complexity ....

SuneF · Post by **SuneF** » Wed Oct 10, 2012 10:13 pm

Don wrote: I did some analysis and perft(8) based on the komodo implementation takes about 90 seconds in the opening, but could take a couple of hours in a complex position. I have only sampled a few positions. That seems like a reasonable chunk of work to parcel out to the clients. Also, 9 mlliion positions is easier to work with than 90 million and most hosting companies are stingy about bandwidth and disk space. If I put these positions in a sql database it's going to use a lot of disk space, just a raw text file with the fen's for the 9 million positions is about half a gig - not a lot but 90 million would already be several gig.

So all things considered I believe the 6 ply positions are the easiest to work with and the least resource intensive for the server.

It's a cute little exercise in distributed programming, even if you have no interest in perft it still has some appeal... but how you manage to find the time I don't know!

I wonder what the difference is between 1x perft(9) and 20(+) x perft(8) in terms of efficiency?
No doubt perft 9's would be more efficient but not sure if it's a factor of 2 or more?
Then if it takes too long and the client is shutdown, all work is lost. I guess perft(8) creates more "restore points" so maybe you are right, if you can handle the additional trafic.

It might be possible to assign all the depth 5 sibling child positions to the same client and make more efficient use of the hash somehow. I'm not convinced about that though...

I think you should track as much as possible, like what client is used, how much memory was used, how many cores etc...
This way if something is creating bad results you can detect if its the client, the machine or the user causing it and undo all that has been contributed by the guilty party.

I suggest you don't send out a result to be confirmed until you have the first result, because if the same user receives both units at the same time, he could potentially corrupt them both and avoid detection. If he only gets one at the time he can't be sure that one of his clients (even if he creates multiple user accounts) will also get the second package and thus he might get detected if he corrupts the first one.

Don wrote: This is going to require huge CPU resources - I don't know if we can keep people interested for a year or two - however long it takes. If we get enough volunteers it should not take more than a few months - but it remains to be seen ....

If there is a nice website with live stats it certainly would be more interesting. And please make a parallel version of the perft algo!

Don · Post by **Don** » Wed Oct 10, 2012 11:48 pm

ibid wrote:
Don wrote:I did some analysis and perft(8) based on the komodo implementation takes about 90 seconds in the opening, but could take a couple of hours in a complex position.
Out of curiousity I tested with this position:
[d] r1b1kbnr/pppp1ppp/2n1p3/6q1/6Q1/2N1P3/PPPP1PPP/R1B1KBNR w KQkq - 4 4
(It is the 6 ply position with the highest perft(4).)

This is probably a good test position. I'm running it right now - my perft is probably one of the slower ones - so if it's too slow it won't be appropriate for this project. I don't want any particular assignment to take more than a couple of hours - and under 1 hour would be better.

JetChess produces the answer (8709649140824) on my machine (phenom 1090T @ 3.6 GHz) using 1 GB of hash and 1 core in 2508 seconds. My program, using 5 cores on the same machine, took 571 seconds. This may not be the hardest of the 9 million positions, but is probably close. Most should be noticably faster. So it does seem like a reasonable amount of work on any modern machine.

Don wrote:The mechanism the server will use is to assign one of the N positions at random when a client makes a request for help.
While I see the logic behind the random assignments, assigning a single client consecutive positions (assuming some sane ordering) will increase the chances for hash hits between work units...

[EDIT: oops... unless of course the client is called for each position seperately... ]

-paul

sje · Post by **sje** » Wed Oct 10, 2012 11:53 pm

sje wrote:Some changes I could make to CookieCat:

2) Change the transposition table to use an additional 64 bits of hash code for safety and a slight slowdown.

I'm working on this now. Other than my own programs, I know of no chess software which uses more than 64 bits of hash. Using insufficiently long hash codes is a sure recipe for false positives as we've seen in several perft(11) calculations.

Don · Post by **Don** » Thu Oct 11, 2012 12:02 am

It's not looking good - I project a time of about 4.8 hours based on the time it has taken to return the first 2 moves.

Don wrote:
ibid wrote:
Don wrote:I did some analysis and perft(8) based on the komodo implementation takes about 90 seconds in the opening, but could take a couple of hours in a complex position.
Out of curiousity I tested with this position:
[d] r1b1kbnr/pppp1ppp/2n1p3/6q1/6Q1/2N1P3/PPPP1PPP/R1B1KBNR w KQkq - 4 4
(It is the 6 ply position with the highest perft(4).)

This is probably a good test position. I'm running it right now - my perft is probably one of the slower ones - so if it's too slow it won't be appropriate for this project. I don't want any particular assignment to take more than a couple of hours - and under 1 hour would be better.

JetChess produces the answer (8709649140824) on my machine (phenom 1090T @ 3.6 GHz) using 1 GB of hash and 1 core in 2508 seconds. My program, using 5 cores on the same machine, took 571 seconds. This may not be the hardest of the 9 million positions, but is probably close. Most should be noticably faster. So it does seem like a reasonable amount of work on any modern machine.

Don wrote:The mechanism the server will use is to assign one of the N positions at random when a client makes a request for help.
While I see the logic behind the random assignments, assigning a single client consecutive positions (assuming some sane ordering) will increase the chances for hash hits between work units...

[EDIT: oops... unless of course the client is called for each position seperately... ]

-paul

sje · Post by **sje** » Thu Oct 11, 2012 12:09 am

ibid wrote: [d] r1b1kbnr/pppp1ppp/2n1p3/6q1/6Q1/2N1P3/PPPP1PPP/R1B1KBNR w KQkq - 4 4
(It is the 6 ply position with the highest perft(4).) JetChess produces the answer (8709649140824) on my machine (phenom 1090T @ 3.6 GHz) using 1 GB of hash and 1 core in 2508 seconds. My program, using 5 cores on the same machine, took 571 seconds. This may not be the hardest of the 9 million positions, but is probably close. Most should be noticably faster. So it does seem like a reasonable amount of work on any modern machine.

Symbolic (2.67 GHz Xeon [quad]):

Code: Select all

Depth: 8   Count: 8,709,649,140,824   Elapsed: 1608.27

Symbolic (3.4 GHz Core i7 [quad]):

Code: Select all

Depth: 8   Count: 8,709,649,140,824   Elapsed: 889.483

Don · Post by **Don** » Thu Oct 11, 2012 12:11 am

sje wrote:
sje wrote:Some changes I could make to CookieCat:

2) Change the transposition table to use an additional 64 bits of hash code for safety and a slight slowdown.
I'm working on this now. Other than my own programs, I know of no chess software which uses more than 64 bits of hash. Using insufficiently long hash codes is a sure recipe for false positives as we've seen in several perft(11) calculations.

I'm using 128 bits, but I think what I should be doing is using 64 bits to verify the position and use some of those other bits for the address. So that would effectively give me much more than 64 bits but not as much as 128 and would probably enable me to double the hash table size as the hash records would be 64 bits smaller.

Perft(14)

Re: CookieCat reloaded

Re: CookieCat reloaded

Re: CookieCat reloaded

Re: CookieCat reloaded

Re: CookieCat reloaded

Re: CookieCat reloaded

Re: CookieCat reloaded

Re: CookieCat reloaded

Re: CookieCat reloaded

Re: CookieCat reloaded