
Sum of the first 100M collatz values: https://github.com/SamuraiDangyo/collatzzz
Take binary and run ...
Results smt like this:
Code: Select all
Benchmarking ...
====================
Result: 17923493476
Time(ms): 7999
Moderator: Ras
Code: Select all
Benchmarking ...
====================
Result: 17923493476
Time(ms): 7999
Make it multithreaded? its only running on a single thread.
Code: Select all
Benchmarking ...
====================
Result: 17923493476
Time(ms): 6264
Thanks for testing!dangi12012 wrote: ↑Sat Nov 13, 2021 9:42 pmMake it multithreaded? its only running on a single thread.
5950X results. Would be interesting to see a 12900k here. But I guess its too early anyway because ddr5 is much slower now than it will be in 1 year.
Also please add a reciprocal time result here. Something like MFlops: xxx.xxCode: Select all
Benchmarking ... ==================== Result: 17923493476 Time(ms): 6264
Because times are non linear in comparsion and 1s faster is completely different between 5-6 and 3-4.
https://github.com/martinus/robin-hood-hashingJohnWoe wrote: ↑Sat Nov 13, 2021 10:20 pmThanks for testing!dangi12012 wrote: ↑Sat Nov 13, 2021 9:42 pmMake it multithreaded? its only running on a single thread.
5950X results. Would be interesting to see a 12900k here. But I guess its too early anyway because ddr5 is much slower now than it will be in 1 year.
Also please add a reciprocal time result here. Something like MFlops: xxx.xxCode: Select all
Benchmarking ... ==================== Result: 17923493476 Time(ms): 6264
Because times are non linear in comparsion and 1s faster is completely different between 5-6 and 3-4.
I was thinking about multithreaded version. Wanted to keep it simple. But next version is gonna be multithreaded!
I'm using a small "hashtable" too. Which gives a boost to results and benches memory as well.
Code: Select all
uint64_t sum = 0;
for (int i = 2; i < 100000000; i++)
{
sum += collatz(i);
}
EQUALS: 17923493476
Code: Select all
#include <iostream>
#include <stdint.h>
#include <bit>
//#include "robin_hood.h"
#include <chrono>
#include <atomic>
//Reference impl
/*
uint64_t steps(uint64_t n) {
uint64_t steps = 0;
while (n != 1) {
if (n % 2 == 0) {
n = n / 2;
steps++;
}
else {
n = (3 * n + 1) / 2;
steps += 2;
}
}
return steps;
}
*/
//robin_hood::unordered_map<uint64_t, uint64_t> steplookup;
//Todo: single threaded recursive dictionary impl
//Threadsave impl
uint64_t steps(uint64_t n) {
uint64_t N = n;
//Make input odd
uint64_t steps = std::countr_zero(n);
if (steps != 0) n >>= steps;
while (n != 1) {
//Repeated expand until even
while (n & 1) {
n = (3 * n + 1) / 2;
steps += 2;
}
//Repeated division by 2 until odd
steps += std::countr_zero(n);
n >>= std::countr_zero(n);
}
return steps;
}
int main()
{
auto t1 = std::chrono::high_resolution_clock::now();
std::atomic<uint64_t> sum = 0;
#pragma omp parallel
#pragma omp for
for (int i = 2; i < 100000000; i++)
{
sum += steps(i);
}
auto t2 = std::chrono::high_resolution_clock::now();
auto ms_int = duration_cast<std::chrono::milliseconds>(t2 - t1).count();
std::cout << sum << "\n";
std::cout << ms_int / 1000.0 << "s";
}
Nice program !!!dangi12012 wrote: ↑Sun Nov 14, 2021 1:49 pm I got interested in this problem and cobbled together this solution:Output:Code: Select all
#include <iostream> #include <stdint.h> #include <bit> //#include "robin_hood.h" #include <chrono> #include <atomic> //Reference impl /* uint64_t steps(uint64_t n) { uint64_t steps = 0; while (n != 1) { if (n % 2 == 0) { n = n / 2; steps++; } else { n = (3 * n + 1) / 2; steps += 2; } } return steps; } */ //robin_hood::unordered_map<uint64_t, uint64_t> steplookup; //Todo: single threaded recursive dictionary impl //Threadsave impl uint64_t steps(uint64_t n) { uint64_t N = n; //Make input odd uint64_t steps = std::countr_zero(n); if (steps != 0) n >>= steps; while (n != 1) { //Repeated expand until even while (n & 1) { n = (3 * n + 1) / 2; steps += 2; } //Repeated division by 2 until odd steps += std::countr_zero(n); n >>= std::countr_zero(n); } return steps; } int main() { auto t1 = std::chrono::high_resolution_clock::now(); std::atomic<uint64_t> sum = 0; #pragma omp parallel #pragma omp for for (int i = 2; i < 100000000; i++) { sum += steps(i); } auto t2 = std::chrono::high_resolution_clock::now(); auto ms_int = duration_cast<std::chrono::milliseconds>(t2 - t1).count(); std::cout << sum << "\n"; std::cout << ms_int / 1000.0 << "s"; }
17923493476
1.362s
Compile with -OpenMP
What do you get?
Code: Select all
Benchmarking ...
============================
Collatz: 0 -> 100,000,000
Sum(steps): 17,923,493,583
CPU(s): 1
NPS: 2,004,000
Time(ms): 49,887
Code: Select all
> Benchmarking ...
============================
Collatz: 0 -> 100,000,000
Sum(steps): 17,923,493,583
CPU(s): 3
NPS: 4,948,000
Time(ms): 20,207
Code: Select all
./collatzzz 1234567
> Benchmarking ...
============================
Collatz: 0 -> 1,234,567
Sum(steps): 164,873,302
CPU(s): 3
NPS: 5,536,000
Time(ms): 223
First of all: Please release the source code. Your command line arguments are not documented - and I would like to compile it myself. How to run with multiple threads? Only you know because it defaults to 1.JohnWoe wrote: ↑Sun Nov 14, 2021 9:03 pmNice program !!!dangi12012 wrote: ↑Sun Nov 14, 2021 1:49 pm I got interested in this problem and cobbled together this solution:Output:Code: Select all
#include <iostream> #include <stdint.h> #include <bit> //#include "robin_hood.h" #include <chrono> #include <atomic> //Reference impl /* uint64_t steps(uint64_t n) { uint64_t steps = 0; while (n != 1) { if (n % 2 == 0) { n = n / 2; steps++; } else { n = (3 * n + 1) / 2; steps += 2; } } return steps; } */ //robin_hood::unordered_map<uint64_t, uint64_t> steplookup; //Todo: single threaded recursive dictionary impl //Threadsave impl uint64_t steps(uint64_t n) { uint64_t N = n; //Make input odd uint64_t steps = std::countr_zero(n); if (steps != 0) n >>= steps; while (n != 1) { //Repeated expand until even while (n & 1) { n = (3 * n + 1) / 2; steps += 2; } //Repeated division by 2 until odd steps += std::countr_zero(n); n >>= std::countr_zero(n); } return steps; } int main() { auto t1 = std::chrono::high_resolution_clock::now(); std::atomic<uint64_t> sum = 0; #pragma omp parallel #pragma omp for for (int i = 2; i < 100000000; i++) { sum += steps(i); } auto t2 = std::chrono::high_resolution_clock::now(); auto ms_int = duration_cast<std::chrono::milliseconds>(t2 - t1).count(); std::cout << sum << "\n"; std::cout << ms_int / 1000.0 << "s"; }
17923493476
1.362s
Compile with -OpenMP
What do you get?
Anyway
collatzzz v0.2 is out: https://github.com/SamuraiDangyo/collatzzz
Multithreading supported. Disabled hashtable. Since this is a CPU bench app![]()
1 CPU:3 CPUs. I have only 3 cores atm.Code: Select all
Benchmarking ... ============================ Collatz: 0 -> 100,000,000 Sum(steps): 17,923,493,583 CPU(s): 1 NPS: 2,004,000 Time(ms): 49,887
Also can be run with comman line args:Code: Select all
> Benchmarking ... ============================ Collatz: 0 -> 100,000,000 Sum(steps): 17,923,493,583 CPU(s): 3 NPS: 4,948,000 Time(ms): 20,207
Code: Select all
./collatzzz 1234567 > Benchmarking ... ============================ Collatz: 0 -> 1,234,567 Sum(steps): 164,873,302 CPU(s): 3 NPS: 5,536,000 Time(ms): 223
Code: Select all
g_cpus = std::clamp(nth, 1, static_cast<int>(std::thread::hardware_concurrency()));
Code: Select all
> ./collatzzz -sum 0 1000000000
... Sum ...
==============================
Collatz: 0 -> 1,000,000,000
Sum(steps): 203,234,783,374
CPU(s): 16
NPS: 8,867,910,000
Time(ms): 22,918