Benchmark app for your CPU

JohnWoe · Post by **JohnWoe** » Sat Nov 13, 2021 9:38 pm

Created a simple bench program for lulz.

Sum of the first 100M collatz values: https://github.com/SamuraiDangyo/collatzzz

Take binary and run ...

Results smt like this:

Code: Select all

Benchmarking ... 
====================
Result:   17923493476
Time(ms): 7999

dangi12012 · Post by **dangi12012** » Sat Nov 13, 2021 9:42 pm

JohnWoe wrote: ↑Sat Nov 13, 2021 9:38 pm Time(ms): 7999

Make it multithreaded? its only running on a single thread.
5950X results. Would be interesting to see a 12900k here. But I guess its too early anyway because ddr5 is much slower now than it will be in 1 year.

Code: Select all

Benchmarking ...
====================
Result:   17923493476
Time(ms): 6264

Also please add a reciprocal time result here. Something like MFlops: xxx.xx
Because times are non linear in comparsion and 1s faster is completely different between 5-6 and 3-4.

JohnWoe · Post by **JohnWoe** » Sat Nov 13, 2021 10:20 pm

dangi12012 wrote: ↑Sat Nov 13, 2021 9:42 pm
JohnWoe wrote: ↑Sat Nov 13, 2021 9:38 pm Time(ms): 7999
Make it multithreaded? its only running on a single thread.
5950X results. Would be interesting to see a 12900k here. But I guess its too early anyway because ddr5 is much slower now than it will be in 1 year.
Code: Select all
Benchmarking ...
====================
Result:   17923493476
Time(ms): 6264
Also please add a reciprocal time result here. Something like MFlops: xxx.xx
Because times are non linear in comparsion and 1s faster is completely different between 5-6 and 3-4.

Thanks for testing!
I was thinking about multithreaded version. Wanted to keep it simple. But next version is gonna be multithreaded!
I'm using a small "hashtable" too. Which gives a boost to results and benches memory as well.

dangi12012 · Post by **dangi12012** » Sat Nov 13, 2021 10:34 pm

JohnWoe wrote: ↑Sat Nov 13, 2021 10:20 pm
dangi12012 wrote: ↑Sat Nov 13, 2021 9:42 pm
JohnWoe wrote: ↑Sat Nov 13, 2021 9:38 pm Time(ms): 7999
Make it multithreaded? its only running on a single thread.
5950X results. Would be interesting to see a 12900k here. But I guess its too early anyway because ddr5 is much slower now than it will be in 1 year.
Code: Select all
Benchmarking ...
====================
Result:   17923493476
Time(ms): 6264
Also please add a reciprocal time result here. Something like MFlops: xxx.xx
Because times are non linear in comparsion and 1s faster is completely different between 5-6 and 3-4.
Thanks for testing!
I was thinking about multithreaded version. Wanted to keep it simple. But next version is gonna be multithreaded!
I'm using a small "hashtable" too. Which gives a boost to results and benches memory as well.

https://github.com/martinus/robin-hood-hashing
This is the fastest STL like hashtable. It should be a drop in replacement. I helped 3x the speed for some of my apps.

Yes please add multithreading!
Your code does the sum of steps needed to return to 1 for all integers from 0 to 1E7?

dangi12012 · Post by **dangi12012** » Sat Nov 13, 2021 10:54 pm

I can confirm:

Code: Select all

uint64_t sum = 0;
    for (int i = 2; i < 100000000; i++)
    {
        sum += collatz(i);
    }
EQUALS: 17923493476

dangi12012 · Post by **dangi12012** » Sun Nov 14, 2021 1:49 pm

I got interested in this problem and cobbled together this solution:

Code: Select all

#include <iostream>
#include <stdint.h>
#include <bit>
//#include "robin_hood.h"
#include <chrono>
#include <atomic>

//Reference impl
/*
uint64_t steps(uint64_t n) {
    uint64_t steps = 0;
    while (n != 1) {
        if (n % 2 == 0) {
            n = n / 2;
            steps++;
        }
        else {
            n = (3 * n + 1) / 2;
            steps += 2;
        }
    }
    return steps;
}
*/

//robin_hood::unordered_map<uint64_t, uint64_t> steplookup;
//Todo: single threaded recursive dictionary impl

//Threadsave impl
uint64_t steps(uint64_t n) {
    uint64_t N = n;

    //Make input odd
    uint64_t steps = std::countr_zero(n);
    if (steps != 0) n >>= steps;

    while (n != 1) {
        //Repeated expand until even
        while (n & 1) {
            n = (3 * n + 1) / 2;
            steps += 2;
        }

        //Repeated division by 2 until odd
        steps += std::countr_zero(n);
        n >>= std::countr_zero(n);
    }
    
    return steps;
}

int main()
{
    auto t1 = std::chrono::high_resolution_clock::now();
    std::atomic<uint64_t> sum = 0;

    #pragma omp parallel
    #pragma omp for
    for (int i = 2; i < 100000000; i++)
    {
        sum += steps(i);
    }
    auto t2 = std::chrono::high_resolution_clock::now(); 
    auto ms_int = duration_cast<std::chrono::milliseconds>(t2 - t1).count();
    std::cout << sum << "\n";
    std::cout << ms_int / 1000.0 << "s";
}

Output:
17923493476
1.362s

Compile with -OpenMP
What do you get?

dangi12012 · Post by **dangi12012** » Sun Nov 14, 2021 3:37 pm

I continued here - because the collatz problem is different from just a benchmark app:
http://www.talkchess.com/forum3/viewtop ... 48#p911748

JohnWoe · Post by **JohnWoe** » Sun Nov 14, 2021 9:03 pm

dangi12012 wrote: ↑Sun Nov 14, 2021 1:49 pm I got interested in this problem and cobbled together this solution:

Code: Select all

#include <iostream>
#include <stdint.h>
#include <bit>
//#include "robin_hood.h"
#include <chrono>
#include <atomic>

//Reference impl
/*
uint64_t steps(uint64_t n) {
    uint64_t steps = 0;
    while (n != 1) {
        if (n % 2 == 0) {
            n = n / 2;
            steps++;
        }
        else {
            n = (3 * n + 1) / 2;
            steps += 2;
        }
    }
    return steps;
}
*/

//robin_hood::unordered_map<uint64_t, uint64_t> steplookup;
//Todo: single threaded recursive dictionary impl

//Threadsave impl
uint64_t steps(uint64_t n) {
    uint64_t N = n;

    //Make input odd
    uint64_t steps = std::countr_zero(n);
    if (steps != 0) n >>= steps;

    while (n != 1) {
        //Repeated expand until even
        while (n & 1) {
            n = (3 * n + 1) / 2;
            steps += 2;
        }

        //Repeated division by 2 until odd
        steps += std::countr_zero(n);
        n >>= std::countr_zero(n);
    }
    
    return steps;
}

int main()
{
    auto t1 = std::chrono::high_resolution_clock::now();
    std::atomic<uint64_t> sum = 0;

    #pragma omp parallel
    #pragma omp for
    for (int i = 2; i < 100000000; i++)
    {
        sum += steps(i);
    }
    auto t2 = std::chrono::high_resolution_clock::now(); 
    auto ms_int = duration_cast<std::chrono::milliseconds>(t2 - t1).count();
    std::cout << sum << "\n";
    std::cout << ms_int / 1000.0 << "s";
}

Output:
17923493476
1.362s

Compile with -OpenMP
What do you get?

Nice program !!!

Anyway
collatzzz v0.2 is out: https://github.com/SamuraiDangyo/collatzzz
Multithreading supported. Disabled hashtable. Since this is a CPU bench app

1 CPU:

Code: Select all

Benchmarking ... 
============================
Collatz:    0 -> 100,000,000
Sum(steps): 17,923,493,583
CPU(s):     1
NPS:        2,004,000
Time(ms):   49,887

3 CPUs. I have only 3 cores atm.

Code: Select all

> Benchmarking ...
============================
Collatz:    0 -> 100,000,000
Sum(steps): 17,923,493,583
CPU(s):     3
NPS:        4,948,000
Time(ms):   20,207

Also can be run with comman line args:

Code: Select all

./collatzzz 1234567
> Benchmarking ...
============================
Collatz:    0 -> 1,234,567
Sum(steps): 164,873,302
CPU(s):     3
NPS:        5,536,000
Time(ms):   223

dangi12012 · Post by **dangi12012** » Sun Nov 14, 2021 9:08 pm

JohnWoe wrote: ↑Sun Nov 14, 2021 9:03 pm

dangi12012 wrote: ↑Sun Nov 14, 2021 1:49 pm I got interested in this problem and cobbled together this solution:

Code: Select all

#include <iostream>
#include <stdint.h>
#include <bit>
//#include "robin_hood.h"
#include <chrono>
#include <atomic>

//Reference impl
/*
uint64_t steps(uint64_t n) {
    uint64_t steps = 0;
    while (n != 1) {
        if (n % 2 == 0) {
            n = n / 2;
            steps++;
        }
        else {
            n = (3 * n + 1) / 2;
            steps += 2;
        }
    }
    return steps;
}
*/

//robin_hood::unordered_map<uint64_t, uint64_t> steplookup;
//Todo: single threaded recursive dictionary impl

//Threadsave impl
uint64_t steps(uint64_t n) {
    uint64_t N = n;

    //Make input odd
    uint64_t steps = std::countr_zero(n);
    if (steps != 0) n >>= steps;

    while (n != 1) {
        //Repeated expand until even
        while (n & 1) {
            n = (3 * n + 1) / 2;
            steps += 2;
        }

        //Repeated division by 2 until odd
        steps += std::countr_zero(n);
        n >>= std::countr_zero(n);
    }
    
    return steps;
}

int main()
{
    auto t1 = std::chrono::high_resolution_clock::now();
    std::atomic<uint64_t> sum = 0;

    #pragma omp parallel
    #pragma omp for
    for (int i = 2; i < 100000000; i++)
    {
        sum += steps(i);
    }
    auto t2 = std::chrono::high_resolution_clock::now(); 
    auto ms_int = duration_cast<std::chrono::milliseconds>(t2 - t1).count();
    std::cout << sum << "\n";
    std::cout << ms_int / 1000.0 << "s";
}

Output:
17923493476
1.362s

Compile with -OpenMP
What do you get?

Nice program !!!

Anyway
collatzzz v0.2 is out: https://github.com/SamuraiDangyo/collatzzz
Multithreading supported. Disabled hashtable. Since this is a CPU bench app

1 CPU:

Code: Select all

Benchmarking ... 
============================
Collatz:    0 -> 100,000,000
Sum(steps): 17,923,493,583
CPU(s):     1
NPS:        2,004,000
Time(ms):   49,887

3 CPUs. I have only 3 cores atm.

Code: Select all

> Benchmarking ...
============================
Collatz:    0 -> 100,000,000
Sum(steps): 17,923,493,583
CPU(s):     3
NPS:        4,948,000
Time(ms):   20,207

Also can be run with comman line args:

Code: Select all

./collatzzz 1234567
> Benchmarking ...
============================
Collatz:    0 -> 1,234,567
Sum(steps): 164,873,302
CPU(s):     3
NPS:        5,536,000
Time(ms):   223

First of all: Please release the source code. Your command line arguments are not documented - and I would like to compile it myself. How to run with multiple threads? Only you know because it defaults to 1.
Most important: No one likes running random exe files off github

JohnWoe · Post by **JohnWoe** » Sat Nov 20, 2021 5:07 pm

^ Are you running Windows? In Linux this works. I use this in C++ for threading:

Code: Select all

  g_cpus = std::clamp(nth, 1, static_cast<int>(std::thread::hardware_concurrency()));

collatzzz tries to grab all cores possible. Use "-cores 3" to limit cores. As for open source code. Will see. That's just stress...

I added some scaling pyplot on my 16 CPU / 4800U: https://github.com/SamuraiDangyo/collatzzz

I generator 1 million Collatz conjecture steps here: https://github.com/SamuraiDangyo/collat ... -steps.txt
Are you finding errors?

Collatz sum 0 -> 1B in 22.9 seconds

Code: Select all

> ./collatzzz -sum 0 1000000000

... Sum ...
==============================
Collatz:    0 -> 1,000,000,000
Sum(steps): 203,234,783,374
CPU(s):     16
NPS:        8,867,910,000
Time(ms):   22,918

Benchmark app for your CPU

Benchmark app for your CPU

Re: Benchmark app for your CPU

Re: Benchmark app for your CPU

Re: Benchmark app for your CPU

Re: Benchmark app for your CPU

Re: Benchmark app for your CPU

Re: Benchmark app for your CPU

Re: Benchmark app for your CPU

Re: Benchmark app for your CPU

Re: Benchmark app for your CPU