I don't follow that. Don't see how it is more compact to access an array of integers vs an array of shorts or bytes, in terms of instructions...Aleks Peshkov wrote:Using int variables compiler generate shorter code, because it can combine calculation instruction with load from memory assembler instruction in one command.Robert Pope wrote:How about board variable types? Right now I am using int board[192]. Is that fastest because it is the register size, or is it better to use char board[192] because it is more compact?
Speed difference may be or may be not noticeable.
0x88 engines
Moderators: hgm, Rebel, chrisw
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: 0x88 engines
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: 0x88 engines
You should test it. It is probably faster to access the bytes since they will take 3 cache blocks. If you use the values as subscripts, however, then there is a bit of overhead in that you need to convert a byte into something that can be added in address computations. My board array is chars because it benchmarks faster.Robert Pope wrote:How about board variable types? Right now I am using int board[192]. Is that fastest because it is the register size, or is it better to use char board[192] because it is more compact?
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: 0x88 engines
There is also an electronic paper on my web page that talks about 0x88, among others, that you could also look at as it might explain things a bit differently than Bruce did, for an alternative viewpoint.Bill Rogers wrote:Well Robert the gentlemen who are helping you are both full of expert information. I just did not know how far along you were in the developement of your program.
Bill
It is something like "board representations" or something similar.
-
- Posts: 718
- Joined: Fri Mar 20, 2009 8:59 pm
Re: 0x88 engines
Hmm, a more specific implementation question will probably get more specific answers
Is 100k nps for perft or for actual searching, with eval and whatnot? Because a simplistic 0x88 scheme without piecelists or iterative check detection, etc. should get far more than 100k nps in perft...
CPW has source there so you could look at one possible implementation for everything basic there.
Is 100k nps for perft or for actual searching, with eval and whatnot? Because a simplistic 0x88 scheme without piecelists or iterative check detection, etc. should get far more than 100k nps in perft...
CPW has source there so you could look at one possible implementation for everything basic there.
-
- Posts: 892
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
Re: 0x88 engines
All arithmetic is performed in natural word size by C restriction (and it is faster anyway for any modern processor). Small data need to be loaded into named register before use. Natural int variable can use arithmetic instructions with indirect memory arguments reducing register pressure and number of plain load instructions.bob wrote:I don't follow that. Don't see how it is more compact to access an array of integers vs an array of shorts or bytes, in terms of instructions...Aleks Peshkov wrote:Using int variables compiler generate shorter code, because it can combine calculation instruction with load from memory assembler instruction in one command.Robert Pope wrote:How about board variable types? Right now I am using int board[192]. Is that fastest because it is the register size, or is it better to use char board[192] because it is more compact?
Speed difference may be or may be not noticeable.
0x88 engines use board array as main board representation, bitboard engines do not. Speed vs memory trade of is different.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: 0x88 engines
Not sure I follow. Last time I looked, which was just now, one can do this:Aleks Peshkov wrote:All arithmetic is performed in natural word size by C restriction (and it is faster anyway for any modern processor). Small data need to be loaded into named register before use. Natural int variable can use arithmetic instructions with indirect memory arguments reducing register pressure and number of plain load instructions.bob wrote:I don't follow that. Don't see how it is more compact to access an array of integers vs an array of shorts or bytes, in terms of instructions...Aleks Peshkov wrote:Using int variables compiler generate shorter code, because it can combine calculation instruction with load from memory assembler instruction in one command.Robert Pope wrote:How about board variable types? Right now I am using int board[192]. Is that fastest because it is the register size, or is it better to use char board[192] because it is more compact?
Speed difference may be or may be not noticeable.
0x88 engines use board array as main board representation, bitboard engines do not. Speed vs memory trade of is different.
inc xyz
where xyz is a byte, a word, a doubleword, or a quadword. Ditto for indirect references using something like
inc [ebx]
In microsoft-land you have to tell the compiler what kind of value it will find at [ebx] whereas in linux (gas - ATT format) you just use incb [ebx]. So I _still_ do not follow what you are saying. My board array in Crafty is an array of bytes. Not because I like bytes but because it runs faster. I have other byte arrays that are used for the same reason.
-
- Posts: 670
- Joined: Mon Dec 03, 2007 3:01 pm
- Location: Barcelona, Spain
Re: 0x88 engines
According to Pete Isensee's paper (http://www.tantalon.com/pete/cppopt/app ... ativeCosts) about optimization tricks, doing various operations with int variables is a lot faster than with bytes.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: 0x88 engines
I mentioned that. But, cache is also an issue. An array of bytes is 1/4 the size of an array of ints. And 1/8 the size of an array of longs on a 64 bit processor. Simplest test is to try both and measure the speed. I did and found the array of bytes was better. I'll usually re-check on each new processor that comes out although I did not do this on our evaluation Nehalem box as I was busy trying to figure out a BIOS issue that was messy.Codeman wrote:According to Pete Isensee's paper (http://www.tantalon.com/pete/cppopt/app ... ativeCosts) about optimization tricks, doing various operations with int variables is a lot faster than with bytes.
Also note that "lot" is an ambiguous term. This is not a "huge" deal at all. Whereas cache can be a big one.
-
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: 0x88 engines
Homogeneous char arrays in conjunction with zero- or sign-extension (movzx, movsx) are fine with respect to the smaller size and possible L1 savings. A 64-byte versus 256-byte board array is one 64-byte cacheline versus four (if cachline aligned). Since the board array is heavily used, all the four cachelines of an int-array are most often fetched from L1, while three others need to been thrown out, which may otherwise reside inside the cache with the byte-array.Codeman wrote:According to Pete Isensee's paper (http://www.tantalon.com/pete/cppopt/app ... ativeCosts) about optimization tricks, doing various operations with int variables is a lot faster than with bytes.
So the marginally slower zero- or sign-extension to load chars to a native 32-bit register likely pays off in less cache misses elsewhere. If and by how much depends on your memory and cache footprint and may vary from program to program.
For the stack, that is locals and parameter (most are passed via registers on x64 anyway) and very often used scalar globals, better stay with int and native register width, also due to alignment, load store forewarding issues, and partial register stalls.
-
- Posts: 228
- Joined: Sun Mar 12, 2006 3:11 pm
Re: 0x88 engines
Does this apply to table look-ups. So it is faster to have int array[] than int8 array[]?Codeman wrote:According to Pete Isensee's paper (http://www.tantalon.com/pete/cppopt/app ... ativeCosts) about optimization tricks, doing various operations with int variables is a lot faster than with bytes.