 
 Sum of the first 100M collatz values: https://github.com/SamuraiDangyo/collatzzz
Take binary and run ...
Results smt like this:
Code: Select all
Benchmarking ... 
====================
Result:   17923493476
Time(ms): 7999Moderator: Ras
 
 Code: Select all
Benchmarking ... 
====================
Result:   17923493476
Time(ms): 7999Make it multithreaded? its only running on a single thread.
Code: Select all
Benchmarking ...
====================
Result:   17923493476
Time(ms): 6264Thanks for testing!dangi12012 wrote: ↑Sat Nov 13, 2021 9:42 pmMake it multithreaded? its only running on a single thread.
5950X results. Would be interesting to see a 12900k here. But I guess its too early anyway because ddr5 is much slower now than it will be in 1 year.
Also please add a reciprocal time result here. Something like MFlops: xxx.xxCode: Select all
Benchmarking ... ==================== Result: 17923493476 Time(ms): 6264
Because times are non linear in comparsion and 1s faster is completely different between 5-6 and 3-4.
https://github.com/martinus/robin-hood-hashingJohnWoe wrote: ↑Sat Nov 13, 2021 10:20 pmThanks for testing!dangi12012 wrote: ↑Sat Nov 13, 2021 9:42 pmMake it multithreaded? its only running on a single thread.
5950X results. Would be interesting to see a 12900k here. But I guess its too early anyway because ddr5 is much slower now than it will be in 1 year.
Also please add a reciprocal time result here. Something like MFlops: xxx.xxCode: Select all
Benchmarking ... ==================== Result: 17923493476 Time(ms): 6264
Because times are non linear in comparsion and 1s faster is completely different between 5-6 and 3-4.
I was thinking about multithreaded version. Wanted to keep it simple. But next version is gonna be multithreaded!
I'm using a small "hashtable" too. Which gives a boost to results and benches memory as well.
Code: Select all
uint64_t sum = 0;
    for (int i = 2; i < 100000000; i++)
    {
        sum += collatz(i);
    }
EQUALS: 17923493476Code: Select all
#include <iostream>
#include <stdint.h>
#include <bit>
//#include "robin_hood.h"
#include <chrono>
#include <atomic>
//Reference impl
/*
uint64_t steps(uint64_t n) {
    uint64_t steps = 0;
    while (n != 1) {
        if (n % 2 == 0) {
            n = n / 2;
            steps++;
        }
        else {
            n = (3 * n + 1) / 2;
            steps += 2;
        }
    }
    return steps;
}
*/
//robin_hood::unordered_map<uint64_t, uint64_t> steplookup;
//Todo: single threaded recursive dictionary impl
//Threadsave impl
uint64_t steps(uint64_t n) {
    uint64_t N = n;
    //Make input odd
    uint64_t steps = std::countr_zero(n);
    if (steps != 0) n >>= steps;
    while (n != 1) {
        //Repeated expand until even
        while (n & 1) {
            n = (3 * n + 1) / 2;
            steps += 2;
        }
        //Repeated division by 2 until odd
        steps += std::countr_zero(n);
        n >>= std::countr_zero(n);
    }
    
    return steps;
}
int main()
{
    auto t1 = std::chrono::high_resolution_clock::now();
    std::atomic<uint64_t> sum = 0;
    #pragma omp parallel
    #pragma omp for
    for (int i = 2; i < 100000000; i++)
    {
        sum += steps(i);
    }
    auto t2 = std::chrono::high_resolution_clock::now(); 
    auto ms_int = duration_cast<std::chrono::milliseconds>(t2 - t1).count();
    std::cout << sum << "\n";
    std::cout << ms_int / 1000.0 << "s";
}Nice program !!!dangi12012 wrote: ↑Sun Nov 14, 2021 1:49 pm I got interested in this problem and cobbled together this solution:Output:Code: Select all
#include <iostream> #include <stdint.h> #include <bit> //#include "robin_hood.h" #include <chrono> #include <atomic> //Reference impl /* uint64_t steps(uint64_t n) { uint64_t steps = 0; while (n != 1) { if (n % 2 == 0) { n = n / 2; steps++; } else { n = (3 * n + 1) / 2; steps += 2; } } return steps; } */ //robin_hood::unordered_map<uint64_t, uint64_t> steplookup; //Todo: single threaded recursive dictionary impl //Threadsave impl uint64_t steps(uint64_t n) { uint64_t N = n; //Make input odd uint64_t steps = std::countr_zero(n); if (steps != 0) n >>= steps; while (n != 1) { //Repeated expand until even while (n & 1) { n = (3 * n + 1) / 2; steps += 2; } //Repeated division by 2 until odd steps += std::countr_zero(n); n >>= std::countr_zero(n); } return steps; } int main() { auto t1 = std::chrono::high_resolution_clock::now(); std::atomic<uint64_t> sum = 0; #pragma omp parallel #pragma omp for for (int i = 2; i < 100000000; i++) { sum += steps(i); } auto t2 = std::chrono::high_resolution_clock::now(); auto ms_int = duration_cast<std::chrono::milliseconds>(t2 - t1).count(); std::cout << sum << "\n"; std::cout << ms_int / 1000.0 << "s"; }
17923493476
1.362s
Compile with -OpenMP
What do you get?
 
 Code: Select all
Benchmarking ... 
============================
Collatz:    0 -> 100,000,000
Sum(steps): 17,923,493,583
CPU(s):     1
NPS:        2,004,000
Time(ms):   49,887Code: Select all
> Benchmarking ...
============================
Collatz:    0 -> 100,000,000
Sum(steps): 17,923,493,583
CPU(s):     3
NPS:        4,948,000
Time(ms):   20,207Code: Select all
./collatzzz 1234567
> Benchmarking ...
============================
Collatz:    0 -> 1,234,567
Sum(steps): 164,873,302
CPU(s):     3
NPS:        5,536,000
Time(ms):   223First of all: Please release the source code. Your command line arguments are not documented - and I would like to compile it myself. How to run with multiple threads? Only you know because it defaults to 1.JohnWoe wrote: ↑Sun Nov 14, 2021 9:03 pmNice program !!!dangi12012 wrote: ↑Sun Nov 14, 2021 1:49 pm I got interested in this problem and cobbled together this solution:Output:Code: Select all
#include <iostream> #include <stdint.h> #include <bit> //#include "robin_hood.h" #include <chrono> #include <atomic> //Reference impl /* uint64_t steps(uint64_t n) { uint64_t steps = 0; while (n != 1) { if (n % 2 == 0) { n = n / 2; steps++; } else { n = (3 * n + 1) / 2; steps += 2; } } return steps; } */ //robin_hood::unordered_map<uint64_t, uint64_t> steplookup; //Todo: single threaded recursive dictionary impl //Threadsave impl uint64_t steps(uint64_t n) { uint64_t N = n; //Make input odd uint64_t steps = std::countr_zero(n); if (steps != 0) n >>= steps; while (n != 1) { //Repeated expand until even while (n & 1) { n = (3 * n + 1) / 2; steps += 2; } //Repeated division by 2 until odd steps += std::countr_zero(n); n >>= std::countr_zero(n); } return steps; } int main() { auto t1 = std::chrono::high_resolution_clock::now(); std::atomic<uint64_t> sum = 0; #pragma omp parallel #pragma omp for for (int i = 2; i < 100000000; i++) { sum += steps(i); } auto t2 = std::chrono::high_resolution_clock::now(); auto ms_int = duration_cast<std::chrono::milliseconds>(t2 - t1).count(); std::cout << sum << "\n"; std::cout << ms_int / 1000.0 << "s"; }
17923493476
1.362s
Compile with -OpenMP
What do you get?
Anyway
collatzzz v0.2 is out: https://github.com/SamuraiDangyo/collatzzz
Multithreading supported. Disabled hashtable. Since this is a CPU bench app
1 CPU:3 CPUs. I have only 3 cores atm.Code: Select all
Benchmarking ... ============================ Collatz: 0 -> 100,000,000 Sum(steps): 17,923,493,583 CPU(s): 1 NPS: 2,004,000 Time(ms): 49,887Also can be run with comman line args:Code: Select all
> Benchmarking ... ============================ Collatz: 0 -> 100,000,000 Sum(steps): 17,923,493,583 CPU(s): 3 NPS: 4,948,000 Time(ms): 20,207Code: Select all
./collatzzz 1234567 > Benchmarking ... ============================ Collatz: 0 -> 1,234,567 Sum(steps): 164,873,302 CPU(s): 3 NPS: 5,536,000 Time(ms): 223
Code: Select all
  g_cpus = std::clamp(nth, 1, static_cast<int>(std::thread::hardware_concurrency()));
 
 
Code: Select all
> ./collatzzz -sum 0 1000000000
... Sum ...
==============================
Collatz:    0 -> 1,000,000,000
Sum(steps): 203,234,783,374
CPU(s):     16
NPS:        8,867,910,000
Time(ms):   22,918