Benchmark app for your CPU

Discussion of chess software programming and technical issues.

Moderator: Ras

JohnWoe
Posts: 529
Joined: Sat Mar 02, 2013 11:31 pm

Benchmark app for your CPU

Post by JohnWoe »

I created a simple benchmark program for the lulz. :lol:

It sums the Collatz step counts of the first 100M integers: https://github.com/SamuraiDangyo/collatzzz

Grab the binary and run it ...

The results look something like this:

Code:

Benchmarking ... 
====================
Result:   17923493476
Time(ms): 7999
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Benchmark app for your CPU

Post by dangi12012 »

JohnWoe wrote: Sat Nov 13, 2021 9:38 pm Time(ms): 7999
Make it multithreaded? It's only running on a single thread.
Here are 5950X results. It would be interesting to see a 12900K here, but I guess it's too early anyway because DDR5 is much slower now than it will be in a year.

Code:

Benchmarking ...
====================
Result:   17923493476
Time(ms): 6264
Also, please add a reciprocal (rate) result here, something like MFlops: xxx.xx.
Times don't compare linearly: being 1 s faster means something very different going from 6 s to 5 s than going from 4 s to 3 s.
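A rate makes that visible: going from 6 s to 5 s raises throughput by 20%, while going from 4 s to 3 s raises it by about 33%. A minimal sketch of such a reciprocal metric, assuming the number of processed integers is known; `mega_numbers_per_second` is a hypothetical name, not part of collatzzz:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: turns a work count and an elapsed wall-clock time into
// millions of numbers processed per second (higher is better).
double mega_numbers_per_second(std::uint64_t count, std::chrono::milliseconds elapsed) {
    assert(elapsed.count() > 0 && "a zero-length run has no meaningful rate");
    // duration<double> holds fractional seconds, so short runs are not truncated.
    const double seconds = std::chrono::duration<double>(elapsed).count();
    return static_cast<double>(count) / seconds / 1e6;
}
```

Assuming both runs covered the 99,999,998 integers from 2 to 99,999,999, the 7999 ms and 6264 ms results come out at roughly 12.5 and 16.0 M numbers/s.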
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
JohnWoe
Posts: 529
Joined: Sat Mar 02, 2013 11:31 pm

Re: Benchmark app for your CPU

Post by JohnWoe »

dangi12012 wrote: Sat Nov 13, 2021 9:42 pm
Make it multithreaded? its only running on a single thread.
Thanks for testing!
I was thinking about a multithreaded version but wanted to keep it simple. The next version is going to be multithreaded, though!
I'm also using a small "hashtable", which boosts the results and benchmarks memory as well.
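How the collatzzz "hashtable" is built isn't described here. One common form is a small direct-mapped cache: a power-of-two array indexed by the low bits of n, storing the key next to its step count so collisions are detected. A sketch under that assumption; `StepCache` and the slot size are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical small direct-mapped cache of step counts. Each slot holds one
// (n, steps) pair; storing a new n simply overwrites whatever was there.
class StepCache {
public:
    // 2^log2_slots slots; e.g. 16 gives 65,536 slots (1 MiB of 16-byte entries).
    explicit StepCache(unsigned log2_slots)
        : mask_((std::uint64_t{1} << log2_slots) - 1), slots_(mask_ + 1) {}

    // Total stopping time of n (each 3n+1 and each n/2 counts as one step).
    std::uint64_t steps(std::uint64_t n) {
        assert(n >= 1);
        const std::uint64_t start = n;
        std::uint64_t count = 0;
        while (n != 1) {
            const Slot& hit = slots_[n & mask_];
            if (hit.key == n) {  // seen before: reuse its remaining steps
                count += hit.steps;
                break;
            }
            n = (n & 1) ? 3 * n + 1 : n / 2;
            ++count;
        }
        Slot& slot = slots_[start & mask_];  // remember the starting value only
        slot.key = start;
        slot.steps = count;
        return count;
    }

private:
    struct Slot {
        std::uint64_t key = 0;  // 0 means empty, since n is always >= 1
        std::uint64_t steps = 0;
    };
    std::uint64_t mask_;
    std::vector<Slot> slots_;
};
```

When the starting values are visited in ascending order, most trajectories soon drop below the current start and hit an earlier entry, and the random slot accesses are what turn this into a memory test as well.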
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Benchmark app for your CPU

Post by dangi12012 »

JohnWoe wrote: Sat Nov 13, 2021 10:20 pm
I'm using a small "hashtable" too. Which gives a boost to results and benches memory as well.
https://github.com/martinus/robin-hood-hashing
This is the fastest STL-like hashtable, and it should be a drop-in replacement. It tripled the speed of some of my apps.

Yes, please add multithreading!
Does your code sum the steps needed to reach 1 for all integers from 0 to 1E8 (100M)?
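A single-threaded recursive "dictionary" version (listed as a Todo in the code later in this thread) can be sketched with std::unordered_map. The alias is meant as the one line to change for robin_hood::unordered_map, assuming it is the drop-in replacement described above; `memo_steps` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Swap for robin_hood::unordered_map<std::uint64_t, std::uint64_t> (robin_hood.h)
// to compare tables; only find() and emplace() are used below.
using StepMap = std::unordered_map<std::uint64_t, std::uint64_t>;

// Hypothetical recursive, memoized step counter. Not thread-safe: the map is shared.
std::uint64_t memo_steps(std::uint64_t n, StepMap& memo) {
    assert(n >= 1);
    if (n == 1) return 0;
    if (auto it = memo.find(n); it != memo.end()) return it->second;
    const std::uint64_t next = (n & 1) ? 3 * n + 1 : n / 2;
    const std::uint64_t result = 1 + memo_steps(next, memo);
    memo.emplace(n, result);  // every intermediate value gets cached
    return result;
}
```

Because every intermediate value is stored and trajectories climb far above the starting range, the map grows quickly; for a 100M run a size cap or a bounded table is probably needed.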
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Benchmark app for your CPU

Post by dangi12012 »

I can confirm:

Code:

uint64_t sum = 0;
for (int i = 2; i < 100000000; i++)
{
    sum += collatz(i);
}
// EQUALS: 17923493476
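To reproduce the check without the benchmark binary, here is a self-contained single-threaded sketch of the loop above. `collatz()` is assumed to return the total stopping time (each 3n+1 and each n/2 counts as one step); `collatz_sum` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>

// Total stopping time: number of 3n+1 / n/2 steps needed to reach 1.
std::uint64_t collatz(std::uint64_t n) {
    assert(n >= 1);
    std::uint64_t steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        ++steps;
    }
    return steps;
}

// Sum of collatz(i) for i in [first, last); last is exclusive, as in the loop above.
std::uint64_t collatz_sum(std::uint64_t first, std::uint64_t last) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = first; i < last; ++i) sum += collatz(i);
    return sum;
}
```

`collatz_sum(2, 100000000)` is the quantity posted as 17923493476; single-threaded it takes several seconds.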
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Benchmark app for your CPU

Post by dangi12012 »

I got interested in this problem and cobbled together this solution:

Code:

#include <iostream>
#include <stdint.h>
#include <bit>      // std::countr_zero (C++20)
//#include "robin_hood.h"
#include <chrono>

//Reference impl
/*
uint64_t steps(uint64_t n) {
    uint64_t steps = 0;
    while (n != 1) {
        if (n % 2 == 0) {
            n = n / 2;
            steps++;
        }
        else {
            // 3n+1 is always even here, so the next halving is folded in (2 steps)
            n = (3 * n + 1) / 2;
            steps += 2;
        }
    }
    return steps;
}
*/

//robin_hood::unordered_map<uint64_t, uint64_t> steplookup;
//Todo: single threaded recursive dictionary impl

//Thread-safe impl: no shared state, so any thread can call it. Requires n >= 1.
uint64_t steps(uint64_t n) {
    //Make input odd: strip all trailing zero bits in one shift
    uint64_t steps = std::countr_zero(n);
    n >>= steps;

    while (n != 1) {
        //Repeated expand until even; each (3n+1)/2 counts as 2 steps
        while (n & 1) {
            n = (3 * n + 1) / 2;
            steps += 2;
        }

        //Repeated division by 2 until odd
        const int tz = std::countr_zero(n);
        steps += tz;
        n >>= tz;
    }

    return steps;
}

int main()
{
    auto t1 = std::chrono::steady_clock::now();
    uint64_t sum = 0;

    // Each thread adds into a private copy of sum and OpenMP combines them at the
    // end, which avoids contention on a shared std::atomic in the hot loop.
    // Without OpenMP the pragma is ignored (compilers may warn) and the loop runs on one thread.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 2; i < 100000000; i++)
    {
        sum += steps(i);
    }
    auto t2 = std::chrono::steady_clock::now();
    auto ms_int = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
    std::cout << sum << "\n";
    std::cout << ms_int / 1000.0 << "s\n";
}
Output:
17923493476
1.362s

Compile as C++20 (for <bit>) with OpenMP enabled: -fopenmp on GCC/Clang, /openmp on MSVC.
What do you get?
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Benchmark app for your CPU

Post by dangi12012 »

I continued in a separate thread, since the Collatz problem is a different topic from a benchmark app:
http://www.talkchess.com/forum3/viewtop ... 48#p911748
JohnWoe
Posts: 529
Joined: Sat Mar 02, 2013 11:31 pm

Re: Benchmark app for your CPU

Post by JohnWoe »

dangi12012 wrote: Sun Nov 14, 2021 1:49 pm I got interested in this problem and cobbled together this solution:
Nice program!!!

Anyway,
collatzzz v0.2 is out: https://github.com/SamuraiDangyo/collatzzz
Multithreading is supported. I disabled the hashtable, since this is a CPU benchmark app :lol:

1 CPU:

Code:

Benchmarking ... 
============================
Collatz:    0 -> 100,000,000
Sum(steps): 17,923,493,583
CPU(s):     1
NPS:        2,004,000
Time(ms):   49,887
3 CPUs (I only have 3 cores at the moment):

Code:

> Benchmarking ...
============================
Collatz:    0 -> 100,000,000
Sum(steps): 17,923,493,583
CPU(s):     3
NPS:        4,948,000
Time(ms):   20,207
It can also be run with command-line arguments:

Code:

./collatzzz 1234567
> Benchmarking ...
============================
Collatz:    0 -> 1,234,567
Sum(steps): 164,873,302
CPU(s):     3
NPS:        5,536,000
Time(ms):   223
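The v0.2 total for 0 -> 100,000,000 (17,923,493,583) is 107 higher than the 17,923,493,476 posted earlier for the loop over 2..99,999,999. 100,000,000 itself needs exactly 107 steps (8 halvings down to 390,625, then 99 more), and 0 and 1 add nothing, so the difference is consistent with v0.2 including the upper bound. A sketch for checking inclusive vs exclusive ranges; `total_steps` and `sum_steps` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Total stopping time of n; 0 and 1 are treated as needing no steps.
std::uint64_t total_steps(std::uint64_t n) {
    std::uint64_t steps = 0;
    while (n > 1) {
        n = (n & 1) ? 3 * n + 1 : n >> 1;
        ++steps;
    }
    return steps;
}

// Sum over [first, last] when inclusive is true, otherwise over [first, last).
std::uint64_t sum_steps(std::uint64_t first, std::uint64_t last, bool inclusive) {
    assert(first <= last);
    const std::uint64_t end = inclusive ? last + 1 : last;
    std::uint64_t sum = 0;
    for (std::uint64_t n = first; n < end; ++n) sum += total_steps(n);
    return sum;
}
```

The two conventions always differ by exactly `total_steps(last)`, so it is worth stating which one a benchmark uses when comparing sums.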
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Benchmark app for your CPU

Post by dangi12012 »

JohnWoe wrote: Sun Nov 14, 2021 9:03 pm
collatzzz v0.2 is out: https://github.com/SamuraiDangyo/collatzzz
Multithreading supported. Disabled hashtable. Since this is a CPU bench app :lol:
First of all: please release the source code. Your command-line arguments are not documented, and I would like to compile it myself. How do you run it with multiple threads? Only you know, because it defaults to 1.
Most important: no one likes running random exe files off GitHub.
JohnWoe
Posts: 529
Joined: Sat Mar 02, 2013 11:31 pm

Re: Benchmark app for your CPU

Post by JohnWoe »

^ Are you running Windows? This works on Linux. I use this in C++ to set the thread count:

Code:

  // hardware_concurrency() may return 0; keep clamp's upper bound >= 1 (lo > hi is UB)
  g_cpus = std::clamp(nth, 1, std::max(1, static_cast<int>(std::thread::hardware_concurrency())));
collatzzz tries to use all available cores; use "-cores 3" to limit it to 3 cores. As for open-sourcing the code, we'll see. That's just stress...
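For readers who want to reproduce the multithreaded run without the binary: a minimal std::thread sketch that clamps the thread count the same way (guarding against hardware_concurrency() returning 0) and splits the range across workers. This is not collatzzz's code, which hasn't been published; `parallel_step_sum` and the interleaved split are assumptions:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Total stopping time of n (0 and 1 need no steps).
std::uint64_t step_count(std::uint64_t n) {
    std::uint64_t steps = 0;
    while (n > 1) {
        n = (n & 1) ? 3 * n + 1 : n >> 1;
        ++steps;
    }
    return steps;
}

// Hypothetical: sums step_count over [first, last) using up to `requested` threads.
std::uint64_t parallel_step_sum(std::uint64_t first, std::uint64_t last, int requested) {
    assert(first <= last);
    // hardware_concurrency() may return 0 if unknown; keep the clamp bounds valid.
    const int hw = std::max(1, static_cast<int>(std::thread::hardware_concurrency()));
    const int threads = std::clamp(requested, 1, hw);

    // One result slot per thread, so the hot loop needs no atomics or locks.
    std::vector<std::uint64_t> partial(static_cast<std::size_t>(threads));
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        // Interleaved split: thread t takes first + t, first + t + threads, ...
        // so cheap and expensive numbers are spread evenly over the threads.
        workers.emplace_back([&partial, t, threads, first, last] {
            std::uint64_t local = 0;
            for (std::uint64_t n = first + t; n < last; n += threads) local += step_count(n);
            partial[t] = local;
        });
    }
    for (auto& w : workers) w.join();

    std::uint64_t sum = 0;
    for (std::uint64_t p : partial) sum += p;
    return sum;
}
```

A contiguous block per thread would also work, but step counts grow slowly with n, so the last block tends to finish later; interleaving avoids that imbalance without a work queue.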

I added a pyplot chart of the scaling on my 4800U (16 logical CPUs): https://github.com/SamuraiDangyo/collatzzz

I generated the Collatz step counts for 1 million numbers here: https://github.com/SamuraiDangyo/collat ... -steps.txt
Can you find any errors?

Collatz sum 0 -> 1B in 22.9 seconds :D

Code:

> ./collatzzz -sum 0 1000000000

... Sum ...
==============================
Collatz:    0 -> 1,000,000,000
Sum(steps): 203,234,783,374
CPU(s):     16
NPS:        8,867,910,000
Time(ms):   22,918
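The NPS figures posted in this thread all match (count / milliseconds) * 1000 with integer division, but the count changes between runs. In the v0.2 outputs it is the number of integers (100,000,000 / 49,887 ms gives 2,004,000), while in this -sum run it matches the step total (203,234,783,374 / 22,918 ms gives 8,867,910,000). Counting integers, this run would show 43,633,000. This is inferred only from the printed numbers, since the source isn't available; `display_rate` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// Rate as it appears to be printed: whole units per millisecond, scaled to
// per second. The integer division is why every NPS value ends in ",000".
std::uint64_t display_rate(std::uint64_t count, std::uint64_t elapsed_ms) {
    assert(elapsed_ms > 0);
    return count / elapsed_ms * 1000;
}
```

Because the two runs use different counts, their NPS values are not comparable with each other.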