Coincidental perft(7) results

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Time to wait

Post by sje »

The exact distribution of replicated perft(7) results won't be known until the project completes. All I can say at this point is that the example I gave is that its count is the highest seen so far.

We don't even know for certain the perft(7) mean value, although it looks like it's fairly close to 21.5 billion. Highest so far: 282,695,441,146

The mean product should be close to 633 billion. Highest so far: 116,866,280,182,600

I'm looking forward to seeing some histograms of the final results. Also, some spreadsheets. Both cases will need software that can handle 66 bit values.
Dann Corbit
Posts: 12838
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Time to wait

Post by Dann Corbit »

sje wrote: {SNIP}
Both cases will need software that can handle 66 bit values.
Have you examined MPIR:
http://mpir.org/
There are C++ interfaces for this C library.
It has very efficient software operations on arbitrary precision integers.

Here is a benchmark of some alternatives:
http://www.mpfr.org/mpfr-3.1.2/timings.html
Some interesting links here:
http://hgpu.org/?tag=extended-precision

At some point, GPU will be the way to go for perft. Have you thought about collaboration with Anil and Serdja?
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Time to wait

Post by sje »

Symbolic already has the 128 bit support it needs; what I could use is a spreadsheet and a histogram maker which have 128 bit support.

----

Oscar can almost run in GPU mode; it has no recursion and no dependence on any C run time support; it doesn't even need a stack. What it does not have are malloc()/free() routines because I haven't gotten them fully coded yet. Instead, it uses this bit of fakery:

Code: Select all

// **** Heap routines

static void *HeapNew(const size_t request)
{
  void *userptr;

  userptr = malloc(request); // TBD
  return userptr;
}

static void HeapDelete(void * const userptr)
{
  free(userptr); // TBD
}
The implementation delay is mostly due to not having a spiffy video card until just a month ago.

I suppose I could release the beta-ish version of Oscar so that anther person or two could try their hand with writing a good allocator and then getting the program running in GPU mode. The difficulty is that I don't have much time to answer a lot of questions about the 6K+ line source.

Incidentally, on 32 bit hardware, no-bitboards Oscar runs faster than bitboard Symbolic.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Some Oscar stuff

Post by sje »

For those with an interest:

Oscar's single 26,177 byte header file: https://dl.dropboxusercontent.com/u/316 ... ar/Oscar.h

The operft 13,559 byte driver source (a bit messy): https://dl.dropboxusercontent.com/u/316 ... r/operft.c

A guide for using operft: https://dl.dropboxusercontent.com/u/316 ... OscarGuide

A Mac OS/X 10.7+ executable of operft: https://dl.dropboxusercontent.com/u/316 ... /operftMac

A Linux (64 bit Intel) executable of operft: https://dl.dropboxusercontent.com/u/316 ... perftLinux

Oscar himself: https://dl.dropboxusercontent.com/u/316 ... carCat.png

I will try and spend some more time with the 139,077 byte Oscar.c source before releasing it.
User avatar
vittyvirus
Posts: 646
Joined: Wed Jun 18, 2014 2:30 pm
Full name: Fahad Syed

Re: Time to wait

Post by vittyvirus »

Dann Corbit wrote:
sje wrote: {SNIP}
Both cases will need software that can handle 66 bit values.
Have you examined MPIR:
http://mpir.org/
There are C++ interfaces for this C library.
It has very efficient software operations on arbitrary precision integers.

Here is a benchmark of some alternatives:
http://www.mpfr.org/mpfr-3.1.2/timings.html
Some interesting links here:
http://hgpu.org/?tag=extended-precision

At some point, GPU will be the way to go for perft. Have you thought about collaboration with Anil and Serdja?
GNU GMP library is very good for the job. So are Boost libraries.
mar
Posts: 2675
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Time to wait

Post by mar »

Dann Corbit wrote:
sje wrote: {SNIP}
Both cases will need software that can handle 66 bit values.
Have you examined MPIR:
http://mpir.org/
There are C++ interfaces for this C library.
It has very efficient software operations on arbitrary precision integers.
MPIR is a fork of GMP. But yes, radix conversion is anything but trivial, especially going from dense bit representation to decimal.
Naive approach doing divmod by 10 is so slow that I wouldn't even bother trying for very large numbers.
Another possibility is to use a temporary bigint represented as array of uint32s modulo (base) 1 billion, while this is much better, it's still O(n^2).
Fortunately GMP does divide and conquer for such large numbers which runs much faster (~n*log n IIRC) but still nothing trivial.
In this particular case however, I think depending on a large general purpose high precision library would be overkill and I think it's much better to do what Steven did.
User avatar
vittyvirus
Posts: 646
Joined: Wed Jun 18, 2014 2:30 pm
Full name: Fahad Syed

Re: Time to wait

Post by vittyvirus »

mar wrote:
Dann Corbit wrote:
sje wrote: {SNIP}
Both cases will need software that can handle 66 bit values.
Have you examined MPIR:
http://mpir.org/
There are C++ interfaces for this C library.
It has very efficient software operations on arbitrary precision integers.
MPIR is a fork of GMP. But yes, radix conversion is anything but trivial, especially going from dense bit representation to decimal.
Naive approach doing divmod by 10 is so slow that I wouldn't even bother trying for very large numbers.
Another possibility is to use a temporary bigint represented as array of uint32s modulo (base) 1 billion, while this is much better, it's still O(n^2).
Fortunately GMP does divide and conquer for such large numbers which runs much faster (~n*log n IIRC) but still nothing trivial.
In this particular case however, I think depending on a large general purpose high precision library would be overkill and I think it's much better to do what Steven did.
Fractional part Radix conversion made me gave up on programming for arbitrary precision (I'll anyway get back to it sooner or later),
The problem is, it's not easy to convert from one base to other in reasonable time and memory. This is well demonstrated in my project Is It Normal - A multi-base Normality Analyser. Esp. fractional part Divide and conquer based algorithms are largely unknown. I know only of two implementations, and none of major libraries use it. See this for further information.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Some Oscar stuff

Post by sje »

The operft guide posted earlier was woefully out of date. The contents have been replaced: https://dl.dropboxusercontent.com/u/316 ... OscarGuide
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Sample calls

Post by sje »

Sample calls:

Code: Select all

sje@diane:~/tmp$ ./operft
20   0.00 MHz

sje@diane:~/tmp$ ./operft -d3 -z
Na3 400
Nc3 440
Nf3 440
Nh3 400
a3 380
a4 420
b3 420
b4 421
c3 420
c4 441
d3 539
d4 560
e3 599
e4 600
f3 380
f4 401
g3 420
g4 421
h3 380
h4 420
8902   0.28 MHz

sje@diane:~/tmp$ ./operft -d5
4865609   3.61 MHz

sje@diane:~/tmp$ ./operft -b -d5
4865609   30.03 MHz

sje@diane:~/tmp$ ./operft -b -d6
119060324   47.39 MHz

sje@diane:~/tmp$ ./operft -bt -d6
119060324   84.43 MHz

sje@diane:~/tmp$ ./operft -bt -d7
3195901860   348.44 MHz

sje@diane:~/tmp$ ./operft -p "r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10" -d5
164075551   5.84 MHz

sje@diane:~/tmp$ ./operft -p "r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10" -b -d5
164075551   64.14 MHz

sje@diane:~/tmp$ ./operft -p "r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10" -bt -d5
164075551   87.40 MHz