The exact distribution of replicated perft(7) results won't be known until the project completes. All I can say at this point is that the example I gave is that its count is the highest seen so far.
We don't even know for certain the perft(7) mean value, although it looks like it's fairly close to 21.5 billion. Highest so far: 282,695,441,146
The mean product should be close to 633 billion. Highest so far: 116,866,280,182,600
I'm looking forward to seeing some histograms of the final results. Also, some spreadsheets. Both cases will need software that can handle 66 bit values.
Coincidental perft(7) results
Moderator: Ras
-
Dann Corbit
- Posts: 12838
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Time to wait
Have you examined MPIR:sje wrote: {SNIP}
Both cases will need software that can handle 66 bit values.
http://mpir.org/
There are C++ interfaces for this C library.
It has very efficient software operations on arbitrary precision integers.
Here is a benchmark of some alternatives:
http://www.mpfr.org/mpfr-3.1.2/timings.html
Some interesting links here:
http://hgpu.org/?tag=extended-precision
At some point, GPU will be the way to go for perft. Have you thought about collaboration with Anil and Serdja?
-
sje
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Time to wait
Symbolic already has the 128 bit support it needs; what I could use is a spreadsheet and a histogram maker which have 128 bit support.
----
Oscar can almost run in GPU mode; it has no recursion and no dependence on any C run time support; it doesn't even need a stack. What it does not have are malloc()/free() routines because I haven't gotten them fully coded yet. Instead, it uses this bit of fakery:
The implementation delay is mostly due to not having a spiffy video card until just a month ago.
I suppose I could release the beta-ish version of Oscar so that anther person or two could try their hand with writing a good allocator and then getting the program running in GPU mode. The difficulty is that I don't have much time to answer a lot of questions about the 6K+ line source.
Incidentally, on 32 bit hardware, no-bitboards Oscar runs faster than bitboard Symbolic.
----
Oscar can almost run in GPU mode; it has no recursion and no dependence on any C run time support; it doesn't even need a stack. What it does not have are malloc()/free() routines because I haven't gotten them fully coded yet. Instead, it uses this bit of fakery:
Code: Select all
// **** Heap routines
static void *HeapNew(const size_t request)
{
void *userptr;
userptr = malloc(request); // TBD
return userptr;
}
static void HeapDelete(void * const userptr)
{
free(userptr); // TBD
}I suppose I could release the beta-ish version of Oscar so that anther person or two could try their hand with writing a good allocator and then getting the program running in GPU mode. The difficulty is that I don't have much time to answer a lot of questions about the 6K+ line source.
Incidentally, on 32 bit hardware, no-bitboards Oscar runs faster than bitboard Symbolic.
-
sje
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Some Oscar stuff
For those with an interest:
Oscar's single 26,177 byte header file: https://dl.dropboxusercontent.com/u/316 ... ar/Oscar.h
The operft 13,559 byte driver source (a bit messy): https://dl.dropboxusercontent.com/u/316 ... r/operft.c
A guide for using operft: https://dl.dropboxusercontent.com/u/316 ... OscarGuide
A Mac OS/X 10.7+ executable of operft: https://dl.dropboxusercontent.com/u/316 ... /operftMac
A Linux (64 bit Intel) executable of operft: https://dl.dropboxusercontent.com/u/316 ... perftLinux
Oscar himself: https://dl.dropboxusercontent.com/u/316 ... carCat.png
I will try and spend some more time with the 139,077 byte Oscar.c source before releasing it.
Oscar's single 26,177 byte header file: https://dl.dropboxusercontent.com/u/316 ... ar/Oscar.h
The operft 13,559 byte driver source (a bit messy): https://dl.dropboxusercontent.com/u/316 ... r/operft.c
A guide for using operft: https://dl.dropboxusercontent.com/u/316 ... OscarGuide
A Mac OS/X 10.7+ executable of operft: https://dl.dropboxusercontent.com/u/316 ... /operftMac
A Linux (64 bit Intel) executable of operft: https://dl.dropboxusercontent.com/u/316 ... perftLinux
Oscar himself: https://dl.dropboxusercontent.com/u/316 ... carCat.png
I will try and spend some more time with the 139,077 byte Oscar.c source before releasing it.
-
vittyvirus
- Posts: 646
- Joined: Wed Jun 18, 2014 2:30 pm
- Full name: Fahad Syed
Re: Time to wait
GNU GMP library is very good for the job. So are Boost libraries.Dann Corbit wrote:Have you examined MPIR:sje wrote: {SNIP}
Both cases will need software that can handle 66 bit values.
http://mpir.org/
There are C++ interfaces for this C library.
It has very efficient software operations on arbitrary precision integers.
Here is a benchmark of some alternatives:
http://www.mpfr.org/mpfr-3.1.2/timings.html
Some interesting links here:
http://hgpu.org/?tag=extended-precision
At some point, GPU will be the way to go for perft. Have you thought about collaboration with Anil and Serdja?
-
mar
- Posts: 2675
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Time to wait
MPIR is a fork of GMP. But yes, radix conversion is anything but trivial, especially going from dense bit representation to decimal.Dann Corbit wrote:Have you examined MPIR:sje wrote: {SNIP}
Both cases will need software that can handle 66 bit values.
http://mpir.org/
There are C++ interfaces for this C library.
It has very efficient software operations on arbitrary precision integers.
Naive approach doing divmod by 10 is so slow that I wouldn't even bother trying for very large numbers.
Another possibility is to use a temporary bigint represented as array of uint32s modulo (base) 1 billion, while this is much better, it's still O(n^2).
Fortunately GMP does divide and conquer for such large numbers which runs much faster (~n*log n IIRC) but still nothing trivial.
In this particular case however, I think depending on a large general purpose high precision library would be overkill and I think it's much better to do what Steven did.
-
vittyvirus
- Posts: 646
- Joined: Wed Jun 18, 2014 2:30 pm
- Full name: Fahad Syed
Re: Time to wait
Fractional part Radix conversion made me gave up on programming for arbitrary precision (I'll anyway get back to it sooner or later),mar wrote:MPIR is a fork of GMP. But yes, radix conversion is anything but trivial, especially going from dense bit representation to decimal.Dann Corbit wrote:Have you examined MPIR:sje wrote: {SNIP}
Both cases will need software that can handle 66 bit values.
http://mpir.org/
There are C++ interfaces for this C library.
It has very efficient software operations on arbitrary precision integers.
Naive approach doing divmod by 10 is so slow that I wouldn't even bother trying for very large numbers.
Another possibility is to use a temporary bigint represented as array of uint32s modulo (base) 1 billion, while this is much better, it's still O(n^2).
Fortunately GMP does divide and conquer for such large numbers which runs much faster (~n*log n IIRC) but still nothing trivial.
In this particular case however, I think depending on a large general purpose high precision library would be overkill and I think it's much better to do what Steven did.
The problem is, it's not easy to convert from one base to other in reasonable time and memory. This is well demonstrated in my project Is It Normal - A multi-base Normality Analyser. Esp. fractional part Divide and conquer based algorithms are largely unknown. I know only of two implementations, and none of major libraries use it. See this for further information.
-
sje
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Some Oscar stuff
The operft guide posted earlier was woefully out of date. The contents have been replaced: https://dl.dropboxusercontent.com/u/316 ... OscarGuide
-
sje
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Sample calls
Sample calls:
Code: Select all
sje@diane:~/tmp$ ./operft
20 0.00 MHz
sje@diane:~/tmp$ ./operft -d3 -z
Na3 400
Nc3 440
Nf3 440
Nh3 400
a3 380
a4 420
b3 420
b4 421
c3 420
c4 441
d3 539
d4 560
e3 599
e4 600
f3 380
f4 401
g3 420
g4 421
h3 380
h4 420
8902 0.28 MHz
sje@diane:~/tmp$ ./operft -d5
4865609 3.61 MHz
sje@diane:~/tmp$ ./operft -b -d5
4865609 30.03 MHz
sje@diane:~/tmp$ ./operft -b -d6
119060324 47.39 MHz
sje@diane:~/tmp$ ./operft -bt -d6
119060324 84.43 MHz
sje@diane:~/tmp$ ./operft -bt -d7
3195901860 348.44 MHz
sje@diane:~/tmp$ ./operft -p "r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10" -d5
164075551 5.84 MHz
sje@diane:~/tmp$ ./operft -p "r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10" -b -d5
164075551 64.14 MHz
sje@diane:~/tmp$ ./operft -p "r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10" -bt -d5
164075551 87.40 MHz