0x88 engines

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: 0x88 engines

Post by bob »

Aleks Peshkov wrote:
Robert Pope wrote:How about board variable types? Right now I am using int board[192]. Is that fastest because it is the register size, or is it better to use char board[192] because it is more compact?
Using int variables compiler generate shorter code, because it can combine calculation instruction with load from memory assembler instruction in one command.

Speed difference may be or may be not noticeable.
I don't follow that. Don't see how it is more compact to access an array of integers vs an array of shorts or bytes, in terms of instructions...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: 0x88 engines

Post by bob »

Robert Pope wrote:How about board variable types? Right now I am using int board[192]. Is that fastest because it is the register size, or is it better to use char board[192] because it is more compact?
You should test it. It is probably faster to access the bytes since they will take 3 cache blocks. If you use the values as subscripts, however, then there is a bit of overhead in that you need to convert a byte into something that can be added in address computations. My board array is chars because it benchmarks faster.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: 0x88 engines

Post by bob »

Bill Rogers wrote:Well Robert the gentlemen who are helping you are both full of expert information. I just did not know how far along you were in the developement of your program.
Bill
There is also an electronic paper on my web page that talks about 0x88, among others, that you could also look at as it might explain things a bit differently than Bruce did, for an alternative viewpoint.

It is something like "board representations" or something similar.
MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Re: 0x88 engines

Post by MattieShoes »

Hmm, a more specific implementation question will probably get more specific answers :-)

Is 100k nps for perft or for actual searching, with eval and whatnot? Because a simplistic 0x88 scheme without piecelists or iterative check detection, etc. should get far more than 100k nps in perft...

CPW has source there so you could look at one possible implementation for everything basic there.
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: 0x88 engines

Post by Aleks Peshkov »

bob wrote:
Aleks Peshkov wrote:
Robert Pope wrote:How about board variable types? Right now I am using int board[192]. Is that fastest because it is the register size, or is it better to use char board[192] because it is more compact?
Using int variables compiler generate shorter code, because it can combine calculation instruction with load from memory assembler instruction in one command.

Speed difference may be or may be not noticeable.
I don't follow that. Don't see how it is more compact to access an array of integers vs an array of shorts or bytes, in terms of instructions...
All arithmetic is performed in natural word size by C restriction (and it is faster anyway for any modern processor). Small data need to be loaded into named register before use. Natural int variable can use arithmetic instructions with indirect memory arguments reducing register pressure and number of plain load instructions.

0x88 engines use board array as main board representation, bitboard engines do not. Speed vs memory trade of is different.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: 0x88 engines

Post by bob »

Aleks Peshkov wrote:
bob wrote:
Aleks Peshkov wrote:
Robert Pope wrote:How about board variable types? Right now I am using int board[192]. Is that fastest because it is the register size, or is it better to use char board[192] because it is more compact?
Using int variables compiler generate shorter code, because it can combine calculation instruction with load from memory assembler instruction in one command.

Speed difference may be or may be not noticeable.
I don't follow that. Don't see how it is more compact to access an array of integers vs an array of shorts or bytes, in terms of instructions...
All arithmetic is performed in natural word size by C restriction (and it is faster anyway for any modern processor). Small data need to be loaded into named register before use. Natural int variable can use arithmetic instructions with indirect memory arguments reducing register pressure and number of plain load instructions.

0x88 engines use board array as main board representation, bitboard engines do not. Speed vs memory trade of is different.
Not sure I follow. Last time I looked, which was just now, one can do this:

inc xyz

where xyz is a byte, a word, a doubleword, or a quadword. Ditto for indirect references using something like

inc [ebx]

In microsoft-land you have to tell the compiler what kind of value it will find at [ebx] whereas in linux (gas - ATT format) you just use incb [ebx]. So I _still_ do not follow what you are saying. My board array in Crafty is an array of bytes. Not because I like bytes but because it runs faster. I have other byte arrays that are used for the same reason.
Edmund
Posts: 670
Joined: Mon Dec 03, 2007 3:01 pm
Location: Barcelona, Spain

Re: 0x88 engines

Post by Edmund »

According to Pete Isensee's paper (http://www.tantalon.com/pete/cppopt/app ... ativeCosts) about optimization tricks, doing various operations with int variables is a lot faster than with bytes.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: 0x88 engines

Post by bob »

Codeman wrote:According to Pete Isensee's paper (http://www.tantalon.com/pete/cppopt/app ... ativeCosts) about optimization tricks, doing various operations with int variables is a lot faster than with bytes.
I mentioned that. But, cache is also an issue. An array of bytes is 1/4 the size of an array of ints. And 1/8 the size of an array of longs on a 64 bit processor. Simplest test is to try both and measure the speed. I did and found the array of bytes was better. I'll usually re-check on each new processor that comes out although I did not do this on our evaluation Nehalem box as I was busy trying to figure out a BIOS issue that was messy.

Also note that "lot" is an ambiguous term. This is not a "huge" deal at all. Whereas cache can be a big one.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: 0x88 engines

Post by Gerd Isenberg »

Codeman wrote:According to Pete Isensee's paper (http://www.tantalon.com/pete/cppopt/app ... ativeCosts) about optimization tricks, doing various operations with int variables is a lot faster than with bytes.
Homogeneous char arrays in conjunction with zero- or sign-extension (movzx, movsx) are fine with respect to the smaller size and possible L1 savings. A 64-byte versus 256-byte board array is one 64-byte cacheline versus four (if cachline aligned). Since the board array is heavily used, all the four cachelines of an int-array are most often fetched from L1, while three others need to been thrown out, which may otherwise reside inside the cache with the byte-array.

So the marginally slower zero- or sign-extension to load chars to a native 32-bit register likely pays off in less cache misses elsewhere. If and by how much depends on your memory and cache footprint and may vary from program to program.

For the stack, that is locals and parameter (most are passed via registers on x64 anyway) and very often used scalar globals, better stay with int and native register width, also due to alignment, load store forewarding issues, and partial register stalls.
frankp
Posts: 228
Joined: Sun Mar 12, 2006 3:11 pm

Re: 0x88 engines

Post by frankp »

Codeman wrote:According to Pete Isensee's paper (http://www.tantalon.com/pete/cppopt/app ... ativeCosts) about optimization tricks, doing various operations with int variables is a lot faster than with bytes.
Does this apply to table look-ups. So it is faster to have int array[] than int8 array[]?