Coincidental perft(7) results

sje · Post by **sje** » Wed Jul 08, 2015 3:10 am

The exact distribution of replicated perft(7) results won't be known until the project completes. All I can say at this point is that the example I gave is that its count is the highest seen so far.

We don't even know for certain the perft(7) mean value, although it looks like it's fairly close to 21.5 billion. Highest so far: 282,695,441,146

The mean product should be close to 633 billion. Highest so far: 116,866,280,182,600

I'm looking forward to seeing some histograms of the final results. Also, some spreadsheets. Both cases will need software that can handle 66 bit values.

Dann Corbit · Post by **Dann Corbit** » Wed Jul 08, 2015 8:50 am

sje wrote: {SNIP}
Both cases will need software that can handle 66 bit values.

Have you examined MPIR:
http://mpir.org/
There are C++ interfaces for this C library.
It has very efficient software operations on arbitrary precision integers.

Here is a benchmark of some alternatives:
http://www.mpfr.org/mpfr-3.1.2/timings.html
Some interesting links here:
http://hgpu.org/?tag=extended-precision

At some point, GPU will be the way to go for perft. Have you thought about collaboration with Anil and Serdja?

sje · Post by **sje** » Wed Jul 08, 2015 10:50 am

Symbolic already has the 128 bit support it needs; what I could use is a spreadsheet and a histogram maker which have 128 bit support.

----

Oscar can almost run in GPU mode; it has no recursion and no dependence on any C run time support; it doesn't even need a stack. What it does not have are malloc()/free() routines because I haven't gotten them fully coded yet. Instead, it uses this bit of fakery:

Code: Select all

// **** Heap routines

static void *HeapNew(const size_t request)
{
  void *userptr;

  userptr = malloc(request); // TBD
  return userptr;
}

static void HeapDelete(void * const userptr)
{
  free(userptr); // TBD
}

The implementation delay is mostly due to not having a spiffy video card until just a month ago.

I suppose I could release the beta-ish version of Oscar so that anther person or two could try their hand with writing a good allocator and then getting the program running in GPU mode. The difficulty is that I don't have much time to answer a lot of questions about the 6K+ line source.

Incidentally, on 32 bit hardware, no-bitboards Oscar runs faster than bitboard Symbolic.

sje · Post by **sje** » Wed Jul 08, 2015 11:09 am

For those with an interest:

Oscar's single 26,177 byte header file: https://dl.dropboxusercontent.com/u/316 ... ar/Oscar.h

The operft 13,559 byte driver source (a bit messy): https://dl.dropboxusercontent.com/u/316 ... r/operft.c

A guide for using operft: https://dl.dropboxusercontent.com/u/316 ... OscarGuide

A Mac OS/X 10.7+ executable of operft: https://dl.dropboxusercontent.com/u/316 ... /operftMac

A Linux (64 bit Intel) executable of operft: https://dl.dropboxusercontent.com/u/316 ... perftLinux

Oscar himself: https://dl.dropboxusercontent.com/u/316 ... carCat.png

I will try and spend some more time with the 139,077 byte Oscar.c source before releasing it.

vittyvirus · Post by **vittyvirus** » Wed Jul 08, 2015 12:39 pm

Dann Corbit wrote:
sje wrote: {SNIP}
Both cases will need software that can handle 66 bit values.
Have you examined MPIR:
http://mpir.org/
There are C++ interfaces for this C library.
It has very efficient software operations on arbitrary precision integers.

Here is a benchmark of some alternatives:
http://www.mpfr.org/mpfr-3.1.2/timings.html
Some interesting links here:
http://hgpu.org/?tag=extended-precision

At some point, GPU will be the way to go for perft. Have you thought about collaboration with Anil and Serdja?

GNU GMP library is very good for the job. So are Boost libraries.

mar · Post by **mar** » Wed Jul 08, 2015 12:59 pm

Dann Corbit wrote:
sje wrote: {SNIP}
Both cases will need software that can handle 66 bit values.
Have you examined MPIR:
http://mpir.org/
There are C++ interfaces for this C library.
It has very efficient software operations on arbitrary precision integers.

MPIR is a fork of GMP. But yes, radix conversion is anything but trivial, especially going from dense bit representation to decimal.
Naive approach doing divmod by 10 is so slow that I wouldn't even bother trying for very large numbers.
Another possibility is to use a temporary bigint represented as array of uint32s modulo (base) 1 billion, while this is much better, it's still O(n^2).
Fortunately GMP does divide and conquer for such large numbers which runs much faster (~n*log n IIRC) but still nothing trivial.
In this particular case however, I think depending on a large general purpose high precision library would be overkill and I think it's much better to do what Steven did.

vittyvirus · Post by **vittyvirus** » Wed Jul 08, 2015 1:43 pm

mar wrote:
Dann Corbit wrote:
sje wrote: {SNIP}
Both cases will need software that can handle 66 bit values.
Have you examined MPIR:
http://mpir.org/
There are C++ interfaces for this C library.
It has very efficient software operations on arbitrary precision integers.
MPIR is a fork of GMP. But yes, radix conversion is anything but trivial, especially going from dense bit representation to decimal.
Naive approach doing divmod by 10 is so slow that I wouldn't even bother trying for very large numbers.
Another possibility is to use a temporary bigint represented as array of uint32s modulo (base) 1 billion, while this is much better, it's still O(n^2).
Fortunately GMP does divide and conquer for such large numbers which runs much faster (~n*log n IIRC) but still nothing trivial.
In this particular case however, I think depending on a large general purpose high precision library would be overkill and I think it's much better to do what Steven did.

Fractional part Radix conversion made me gave up on programming for arbitrary precision (I'll anyway get back to it sooner or later),
The problem is, it's not easy to convert from one base to other in reasonable time and memory. This is well demonstrated in my project Is It Normal - A multi-base Normality Analyser. Esp. fractional part Divide and conquer based algorithms are largely unknown. I know only of two implementations, and none of major libraries use it. See this for further information.

sje · Post by **sje** » Wed Jul 08, 2015 2:07 pm

The operft guide posted earlier was woefully out of date. The contents have been replaced: https://dl.dropboxusercontent.com/u/316 ... OscarGuide

sje · Post by **sje** » Wed Jul 08, 2015 2:20 pm

Sample calls:

Code: Select all

sje@diane:~/tmp$ ./operft
20   0.00 MHz

sje@diane:~/tmp$ ./operft -d3 -z
Na3 400
Nc3 440
Nf3 440
Nh3 400
a3 380
a4 420
b3 420
b4 421
c3 420
c4 441
d3 539
d4 560
e3 599
e4 600
f3 380
f4 401
g3 420
g4 421
h3 380
h4 420
8902   0.28 MHz

sje@diane:~/tmp$ ./operft -d5
4865609   3.61 MHz

sje@diane:~/tmp$ ./operft -b -d5
4865609   30.03 MHz

sje@diane:~/tmp$ ./operft -b -d6
119060324   47.39 MHz

sje@diane:~/tmp$ ./operft -bt -d6
119060324   84.43 MHz

sje@diane:~/tmp$ ./operft -bt -d7
3195901860   348.44 MHz

sje@diane:~/tmp$ ./operft -p "r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10" -d5
164075551   5.84 MHz

sje@diane:~/tmp$ ./operft -p "r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10" -b -d5
164075551   64.14 MHz

sje@diane:~/tmp$ ./operft -p "r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10" -bt -d5
164075551   87.40 MHz

Coincidental perft(7) results

Time to wait

Re: Time to wait

Re: Time to wait

Some Oscar stuff

Re: Time to wait

Re: Time to wait

Re: Time to wait

Re: Some Oscar stuff

Sample calls