What kind of performance relative to one another should one expect with the following compile options:
POPCNT
PREFETCH
POPCNT/PREFETCH
DEFAULT
I am running a test with the same engine compiled with these different options; 300 1+0 games, and will post result when done. Anyone care to guess which will come out on top? The System is an AMD Phenom II 940.
PREFETCH vs POPCNT vs ...
Moderator: Ras
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: PREFETCH vs POPCNT vs ...
They combined can give like 10-15% speed improvement, so 10-12 ELO max.
On 300 games you error bars will around 30 ELO. So not conclusive which ever result you get...
On 300 games you error bars will around 30 ELO. So not conclusive which ever result you get...
Re: PREFETCH vs POPCNT vs ...
yea, I agree this result will not be conclusive. It may give a little hint though. Maybe some day when I have more time, I will do an extensive test. On paper, the combined should be better. I think the head-to-head match up will be really interesting...Milos wrote:They combined can give like 10-15% speed improvement, so 10-12 ELO max.
On 300 games you error bars will around 30 ELO. So not conclusive which ever result you get...
-
- Posts: 373
- Joined: Thu Aug 14, 2008 3:21 am
- Location: Albuquerque, NM
Re: PREFETCH vs POPCNT vs ...
Can't you determine performance benefit without running a tournament?kingliveson wrote:What kind of performance relative to one another should one expect with the following compile options:
POPCNT
PREFETCH
POPCNT/PREFETCH
DEFAULT
I tested prefetch and alignment with 2 engines on two platforms. I found prefetch to be worthless. I could at least measure alignment benefit, but it was still very small. I tested on MIPS32 and core2.
Re: PREFETCH vs POPCNT vs ...
One can always do bench marks...raw data like a tournament is another way. I have always wondered whether prefetching would actually have a negative effect or any effect at all for chess engines. Perhaps a smart developer can manipulate the behavior of the instruction set call--how often and how it is used in a chess program to take full advantage.bnemias wrote:Can't you determine performance benefit without running a tournament?kingliveson wrote:What kind of performance relative to one another should one expect with the following compile options:
POPCNT
PREFETCH
POPCNT/PREFETCH
DEFAULT
I tested prefetch and alignment with 2 engines on two platforms. I found prefetch to be worthless. I could at least measure alignment benefit, but it was still very small. I tested on MIPS32 and core2.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: PREFETCH vs POPCNT vs ...
You have to be careful with this "little hint" idea, which suggests that if A is better than B then that means something, whereas with a large error bar, it really means nothing at all, not even a "little hint".kingliveson wrote:yea, I agree this result will not be conclusive. It may give a little hint though. Maybe some day when I have more time, I will do an extensive test. On paper, the combined should be better. I think the head-to-head match up will be really interesting...Milos wrote:They combined can give like 10-15% speed improvement, so 10-12 ELO max.
On 300 games you error bars will around 30 ELO. So not conclusive which ever result you get...
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: PREFETCH vs POPCNT vs ...
I am not sure how a compiler is going to use POPCNT, or what it is supposed to do, since there is no C semantic for doing that. Which means those of us doing a popcnt operation use some sort of table lookup idea, or else inline asm based on the A & A-1 trick. You'd have to either use inline asm and use a popcnt instruction, or else use the MSCV popcnt intrinsic which is not compatible with other compilers.kingliveson wrote:What kind of performance relative to one another should one expect with the following compile options:
POPCNT
PREFETCH
POPCNT/PREFETCH
DEFAULT
I am running a test with the same engine compiled with these different options; 300 1+0 games, and will post result when done. Anyone care to guess which will come out on top? The System is an AMD Phenom II 940.
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: PREFETCH vs POPCNT vs ...
The story about popcnt is quite simple.bob wrote:I am not sure how a compiler is going to use POPCNT, or what it is supposed to do, since there is no C semantic for doing that. Which means those of us doing a popcnt operation use some sort of table lookup idea, or else inline asm based on the A & A-1 trick. You'd have to either use inline asm and use a popcnt instruction, or else use the MSCV popcnt intrinsic which is not compatible with other compilers.
If your processor supports SSE4a use hardware one. If it doesn't unless your bitboards are really sparse (when you use assemble optimized B&(B-1)) use the famous one from:
http://www.amd.com/us-en/assets/content ... /25112.PDF.
Everything else (especially compiler ones) is slower in practice for chess implementations...
Re: PREFETCH vs POPCNT vs ...
This is just a bit manipulation that is specific to hardware and not compilers or language standards. The code will compile, but if your hardware does not understand the machine language instructions, the program crashes. The algorithm can also be accomplished for machines without popcnt instruction set using inline asm that may not be as efficient. I am interested in the benefit it provides for chess programs.Milos wrote:The story about popcnt is quite simple.bob wrote:I am not sure how a compiler is going to use POPCNT, or what it is supposed to do, since there is no C semantic for doing that. Which means those of us doing a popcnt operation use some sort of table lookup idea, or else inline asm based on the A & A-1 trick. You'd have to either use inline asm and use a popcnt instruction, or else use the MSCV popcnt intrinsic which is not compatible with other compilers.
If your processor supports SSE4a use hardware one. If it doesn't unless your bitboards are really sparse (when you use assemble optimized B&(B-1)) use the famous one from:
http://www.amd.com/us-en/assets/content ... /25112.PDF.
Everything else (especially compiler ones) is slower in practice for chess implementations...
I stand corrected. Since I promised to post the outcome, I will do just that. I would like suggestion though on the minimum games that one would need to start seeing a pattern. This question has been asked repeatedly as to how these optimizations affect chess engine perfomance. I and many others are interested in the final out come.bob wrote:You have to be careful with this "little hint" idea, which suggests that if A is better than B then that means something, whereas with a large error bar, it really means nothing at all, not even a "little hint".kingliveson wrote:yea, I agree this result will not be conclusive. It may give a little hint though. Maybe some day when I have more time, I will do an extensive test. On paper, the combined should be better. I think the head-to-head match up will be really interesting...Milos wrote:They combined can give like 10-15% speed improvement, so 10-12 ELO max.
On 300 games you error bars will around 30 ELO. So not conclusive which ever result you get...
There was a crash along the way somewhere, but here is the results:
Code: Select all
Individual statistics:
1 DEFAULT : 8 124 (+ 24,= 80,- 20), 51.6 %
PREFETCH/POPCNT : 42 (+ 5,= 31,- 6), 48.8 %
POPCNT : 41 (+ 9,= 27,- 5), 54.9 %
PREFETCH : 41 (+ 10,= 22,- 9), 51.2 %
2 PREFETCH/POPCNT : 6 124 (+ 18,= 91,- 15), 51.2 %
DEFAULT : 42 (+ 6,= 31,- 5), 51.2 %
POPCNT : 41 (+ 4,= 31,- 6), 47.6 %
PREFETCH : 41 (+ 8,= 29,- 4), 54.9 %
3 PREFETCH : -2 123 (+ 20,= 82,- 21), 49.6 %
DEFAULT : 41 (+ 9,= 22,- 10), 48.8 %
PREFETCH/POPCNT : 41 (+ 4,= 29,- 8), 45.1 %
POPCNT : 41 (+ 7,= 31,- 3), 54.9 %
4 POPCNT : -13 123 (+ 14,= 89,- 20), 47.6 %
DEFAULT : 41 (+ 5,= 27,- 9), 45.1 %
PREFETCH/POPCNT : 41 (+ 6,= 31,- 4), 52.4 %
PREFETCH : 41 (+ 3,= 31,- 7), 45.1 %
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: PREFETCH vs POPCNT vs ...
Look at the "interesting test results" thread I started. Things can change significantly even after thousands of games. The data in that post is pretty easy to follow.kingliveson wrote:This is just a bit manipulation that is specific to hardware and not compilers or language standards. The code will compile, but if your hardware does not understand the machine language instructions, the program crashes. The algorithm can also be accomplished for machines without popcnt instruction set using inline asm that may not be as efficient. I am interested in the benefit it provides for chess programs.Milos wrote:The story about popcnt is quite simple.bob wrote:I am not sure how a compiler is going to use POPCNT, or what it is supposed to do, since there is no C semantic for doing that. Which means those of us doing a popcnt operation use some sort of table lookup idea, or else inline asm based on the A & A-1 trick. You'd have to either use inline asm and use a popcnt instruction, or else use the MSCV popcnt intrinsic which is not compatible with other compilers.
If your processor supports SSE4a use hardware one. If it doesn't unless your bitboards are really sparse (when you use assemble optimized B&(B-1)) use the famous one from:
http://www.amd.com/us-en/assets/content ... /25112.PDF.
Everything else (especially compiler ones) is slower in practice for chess implementations...
I stand corrected. Since I promised to post the outcome, I will do just that. I would like suggestion though on the minimum games that one would need to start seeing a pattern. This question has been asked repeatedly as to how these optimizations affect chess engine perfomance. I and many others are interested in the final out come.bob wrote:You have to be careful with this "little hint" idea, which suggests that if A is better than B then that means something, whereas with a large error bar, it really means nothing at all, not even a "little hint".kingliveson wrote:yea, I agree this result will not be conclusive. It may give a little hint though. Maybe some day when I have more time, I will do an extensive test. On paper, the combined should be better. I think the head-to-head match up will be really interesting...Milos wrote:They combined can give like 10-15% speed improvement, so 10-12 ELO max.
On 300 games you error bars will around 30 ELO. So not conclusive which ever result you get...
There was a crash along the way somewhere, but here is the results:
The test was done using 25 positions with color switch.Code: Select all
Individual statistics: 1 DEFAULT : 8 124 (+ 24,= 80,- 20), 51.6 % PREFETCH/POPCNT : 42 (+ 5,= 31,- 6), 48.8 % POPCNT : 41 (+ 9,= 27,- 5), 54.9 % PREFETCH : 41 (+ 10,= 22,- 9), 51.2 % 2 PREFETCH/POPCNT : 6 124 (+ 18,= 91,- 15), 51.2 % DEFAULT : 42 (+ 6,= 31,- 5), 51.2 % POPCNT : 41 (+ 4,= 31,- 6), 47.6 % PREFETCH : 41 (+ 8,= 29,- 4), 54.9 % 3 PREFETCH : -2 123 (+ 20,= 82,- 21), 49.6 % DEFAULT : 41 (+ 9,= 22,- 10), 48.8 % PREFETCH/POPCNT : 41 (+ 4,= 29,- 8), 45.1 % POPCNT : 41 (+ 7,= 31,- 3), 54.9 % 4 POPCNT : -13 123 (+ 14,= 89,- 20), 47.6 % DEFAULT : 41 (+ 5,= 27,- 9), 45.1 % PREFETCH/POPCNT : 41 (+ 6,= 31,- 4), 52.4 % PREFETCH : 41 (+ 3,= 31,- 7), 45.1 %