If just the hardware popcount improves your program by 5% in speed, you're doing something wrong. Doing something incremental instead will probably speed you up 20% or so.
abulmo wrote:
diep wrote: It is the same CPU core, seen from a game tree search viewpoint. So your code should be equally fast. The only difference is a built-in memory controller. That should be worth a few percent to you, not a factor of 2 in speed.
According to the technical articles, there are more differences, some of them coming from the intermediate CPU architecture.
* µop cache (higher instruction bandwidth)
* better branch prediction unit
* New instructions (popcount makes my program 5% faster).
* faster memory (DDR3 vs DDR2).
* built-in system agent
* hyperthreading
* etc.
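To make the popcount item above concrete, here is a minimal sketch in C, not taken from either engine. It shows a portable bit count, the GCC/Clang builtin that can compile to the POPCNT instruction (for example with -mpopcnt), and one possible reading of "doing something incremental": keeping a counter up to date on every change instead of recounting the bitboard. The `Side` struct and the `add_pawn`/`remove_pawn` helpers are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Portable SWAR popcount: needs no special instruction. */
static int popcount_swar(uint64_t b) {
    b = b - ((b >> 1) & 0x5555555555555555ULL);
    b = (b & 0x3333333333333333ULL) + ((b >> 2) & 0x3333333333333333ULL);
    b = (b + (b >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((b * 0x0101010101010101ULL) >> 56);
}

/* Hardware path: GCC/Clang builtin; falls back to SWAR elsewhere. */
static int popcount_hw(uint64_t b) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(b);
#else
    return popcount_swar(b);
#endif
}

/* Hypothetical per-side state: the count is maintained incrementally,
   so evaluation never has to call popcount on this bitboard. */
typedef struct {
    uint64_t pawns;      /* bitboard of pawns */
    int      pawn_count; /* always equal to the number of set bits */
} Side;

/* Assumption: the caller guarantees sq (0..63) is currently empty. */
static void add_pawn(Side *s, int sq) {
    s->pawns |= 1ULL << sq;
    s->pawn_count++;
}

/* Assumption: the caller guarantees sq (0..63) currently holds a pawn. */
static void remove_pawn(Side *s, int sq) {
    s->pawns &= ~(1ULL << sq);
    s->pawn_count--;
}
```

With this layout the counter is a plain field read in the evaluation, so the speed of the popcount instruction matters only for bitboards that are not tracked this way.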
diep wrote: Of course running on 1 core, as you are now comparing an 8-thread CPU with a 4-thread one.
Of course not. A fair comparison does not disable half of the capabilities of the Sandy Bridge. Both CPUs have 4 cores. One can support 8 threads, the other cannot. The fair comparison is 8 threads against 4 threads. That said, the HT speedup is only 20% (vs 75% if using 8 real cores).
There are many small improvements that add up to make the CPU run my program 50% faster.
The RAM shouldn't be a big issue; it should only give a few % or so. If it gives more than that, you're obviously doing something wrong.
Saying a processor is stronger because of hyperthreading is IMHO a wrong argument. First of all very few engines benefit from hyperthreading.
So far only a few guys who work at Intel or Microsoft have claimed serious advantages from hyperthreading. I have never seen your name before. Where do you work?
Even some of those who brag publicly that it works for them don't use it on their tournament machine.
The 'bigger bandwidth' for instructions does not hold for chess of course. It brings you 0 extra instructions per cycle. Citing better branch prediction is not serious if you're using bitboards. That will also bring you 0 benefit.
If the i7 really were a better processor for integer code, then obviously Diep would also be a lot faster on it, yet it isn't. I would notice even the tiniest improvement in branch prediction, yet I don't.
Same for other codes.
The only 2 big differences are hyperthreading and the built-in memory controller (DDR3 versus DDR2). DDR3 can abort a read early, after reading just 32 bytes. So we already realize here that your hashtables must be pretty weakly implemented, as a single read must stay under 32 bytes, otherwise the DDR3 wouldn't show its maximum potential. Even then you should just optimize your program for the CPU, not vice versa.
I discussed hyperthreading above.
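As a rough illustration of keeping a hashtable read inside 32 bytes, here is a minimal sketch in C. It is an assumption-laden example and not the layout of Diep or any other engine: the field names, the 16-byte entry, the 2-slot bucket and the helpers `tt_alloc` and `tt_probe` are all hypothetical. Whether 32 bytes is the right granularity on a given machine is the claim made above; the sketch only shows how a probe can be confined to one aligned 32-byte block.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 16-byte entry: 8 + 2 + 2 + 1 + 1 + 2 bytes. */
typedef struct {
    uint64_t key;   /* full hash key of the position */
    int16_t  score;
    uint16_t move;
    uint8_t  depth;
    uint8_t  flags; /* bound type etc. */
    uint16_t age;
} TTEntry;

/* Two entries per bucket, so one probe touches exactly 32 bytes. */
typedef struct {
    TTEntry slot[2];
} TTBucket;

_Static_assert(sizeof(TTEntry) == 16, "entry must stay 16 bytes");
_Static_assert(sizeof(TTBucket) == 32, "bucket must stay 32 bytes");

/* Allocate a zeroed table on a 32-byte boundary (C11 aligned_alloc),
   so no bucket straddles two 32-byte blocks.
   Assumption: n_buckets is a power of two. */
static TTBucket *tt_alloc(size_t n_buckets) {
    size_t bytes = n_buckets * sizeof(TTBucket);
    TTBucket *t = aligned_alloc(32, bytes);
    if (t != NULL)
        memset(t, 0, bytes);
    return t;
}

/* Look only inside the one bucket selected by the low key bits.
   bucket_mask is n_buckets - 1. Returns NULL when the key is absent. */
static TTEntry *tt_probe(TTBucket *table, size_t bucket_mask, uint64_t key) {
    TTBucket *b = &table[key & bucket_mask];
    for (int i = 0; i < 2; i++) {
        if (b->slot[i].key == key)
            return &b->slot[i];
    }
    return NULL;
}
```

The static assertions are the main point: if someone later adds a field and the entry grows past 16 bytes, the build fails instead of silently turning every probe into a larger memory read.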
Probably you have a weak form of SMP.
In the first place you should optimize your engine, not look for hardware that can run your crapcode faster.