Some x64 assembler for the curious

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Some x64 assembler for the curious

Post by Michael Sherwin »

Code: Select all

; ATTACKED BY BLACK - rax contains the square in question
Atkbyblk proc
  mov r8d,  wp[rax*4]             ; wp indexed by square gives first entry in huge move table
abb1:
  mov r9d,  tosq[r8*4]            ; from tosq[] we get the first destination square
  mov r10d, [rcx].t.board[r9*4]   ; the index of piece on board - empty squares are 0
  mov r10d, [rcx].t.typ[r10*4]    ; with the index we can get the piece type
  cmp r10,  11                    ; compare piece type to black pawn (I need to do some equates so I can use BP)
  je  abbt                        ; attacked by black true
  mov r8d,  nxsq[r8*4]            ; if next square r8 is loaded with the next cell
  cmp r8,   0 ;no next square
  je  abbb                        ; attacked by black bishop?
  jmp abb1                        ; loop back to get next destination square
abbb:
  mov r8d,  b[rax*4]
abb2:
  mov r9d,  tosq[r8*4]
  mov r10d, [rcx].t.board[r9*4]
  mov r10d, [rcx].t.typ[r10*4]
  cmp r10,  13                    ; black bishop
  je  abbt
  cmp r10,  15                    ; black queen
  je  abbt
  cmp r10,  16                    ; black king
  je  abbt
  mov r8d,  nxsq[r8*4]
  cmp r8,   0
  je  abbr
  jmp abb2 
abbr:
  mov r8d,  r[rax*4]
abb3:
  mov r9d,  tosq[r8*4]
  mov r10d, [rcx].t.board[r9*4]
  mov r10d, [rcx].t.typ[r10*4]
  cmp r10,  14                    ; black rook
  je  abbt
  cmp r10,  15
  je  abbt
  cmp r10,  16
  je  abbt
  mov r8d,  nxsq[r8*4]
  cmp r8,   0
  je  abbn
  jmp abb3 
abbn:
  mov r8d,  n[rax*4]
abb4:
  mov r9d,  tosq[r8*4]
  mov r10d, [rcx].t.board[r9*4]
  mov r10d, [rcx].t.typ[r10*4]
  cmp r10,  12                    ; black knight
  je  abbt
  mov r8d,  nxsq[r8*4]
  cmp r8,   0
  je  abbf
  jmp abb4
abbf: ; attacked by black false
  xor rax,  rax
  ret
abbt: ; attacked by black true
  mov rax,  1
  ret
Atkbyblk endp
My very first computer was a RadioShack 2k handheld with a very simple basic. It was fun but not that useful and very very slow. Still it was good because I began to learn how to program. My next computer was an Atari 800 48k. Wow was it fast! :lol: That is when I started to dream about writing a chess program. Though writing it in Basic would not do. So I learned 6502 assembler. I programmed a chess board, pieces some function icons using redefined character graphics. It worked great for moving the pieces around. I even wrote some move generation code. I had it almost playing chess but at that time I didn't even know about alpha-beta. By that time though the 8-bit computers were quite dead. And I got a really good deal on an 8086 with 512 MB of ram. And I thought I'd be able to just pick up where I left off. However, due to my learning disability that I have posted about before 8086 assembler with its segmented memory access was beyond me. I just could not keep it in my head long enough to do anything. Then the 80386 with its flat memory model came out and I was back in business! I wrote a very fast assembler version of the GNUChess 4 move generator utilizing jump tables. Then I got distracted with RomiChess and bitboards. This is the first time I have successfully come back to assembler. 8-)

The method I decided on is similar to GNUChess 4 except the move tables are perfectly compressed while GNUChess used 64 bytes for every single piece type for every single square. The code above demonstrates how wonderful x64 is with all those volatile registers that need not be preserved and parameters are not passed on the stack unless there are not enough registers. For those that might be interested but never took the leap into assembler I just wanted to demonstrate how easy it can be. :D
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Some x64 assembler for the curious

Post by Michael Sherwin »

There was something I forgot to do.

abr3:
mov r9d, tosq[r8*4]
mov r10d, [rcx].t.board[r9*4]
cmp r10, 0 ; I forgot to check for an empty square and jump to next square
je abbr1
mov r10d, [rcx].t.typ[r10*4]
cmp r10, 14
je abbt
cmp r10, 15
je abbt
cmp r10, 16
je abbt
mov r8d, nxdr[r8*4]
cmp r8, 0
je abbr
jmp abb3
abbr1:
mov r8d, nxsq[r8*4]
cmp r8, 0
je abbn
jmp abb3

If the destination square is empty the next destination square is gotten by the next square entry of the move table. If the destination square has a piece then the next destination square is gotten by the next direction entry of the move table. I forgot to mention that this code is designed for multithreading. Rcx is the thread pointer that was passed to the Move generator. The thread pointer always has to be passed to every function called from C/C++ so it is always available to be used by the Attack functions.

Also it has dawned on me that maybe for near jumps I might should be using branch instructions instead of jumps for efficiency. I'll have to look into that.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Some x64 assembler for the curious

Post by mar »

Yes, first n args are passed in registers, but I like having twice the number of registers more.
I haven't done any serious assembly in years though (except for a stub that passes control to generated machine code for my scripting language)
It's worth noting that addresses use 64-bit registers by default and most 32-bit reg targets clear upper 32 bits, which is handy.

Hmm, why not use test reg,reg + jz/je instead of cmp reg,0? The former encodes one byte shorter :)
Btw I started with Atari 800 (but it was an XE, with tape recorder :)
Also it has dawned on me that maybe for near jumps I might should be using branch instructions instead of jumps for efficiency. I'll have to look into that.
I think the assembler will optimize near jumps to short jumps if close enough (unless you meant to invert the condition to save extra jump?)
Martin Sedlak
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Some x64 assembler for the curious

Post by Michael Sherwin »

mar wrote: Sat Mar 23, 2019 1:39 am Yes, first n args are passed in registers, but I like having twice the number of registers more.
I haven't done any serious assembly in years though (except for a stub that passes control to generated machine code for my scripting language)
It's worth noting that addresses use 64-bit registers by default and most 32-bit reg targets clear upper 32 bits, which is handy.

Hmm, why not use test reg,reg + jz/je instead of cmp reg,0? The former encodes one byte shorter :)
Btw I started with Atari 800 (but it was an XE, with tape recorder :)
Also it has dawned on me that maybe for near jumps I might should be using branch instructions instead of jumps for efficiency. I'll have to look into that.
I think the assembler will optimize near jumps to short jumps if close enough (unless you meant to invert the condition to save extra jump?)
Ok I'll look into test. I think I used to know about it. Though like most things it has left my memory. One of the reasons I like assembler so much is because I can do everything I need to do with very few different instructions that I can keep in my memory. I also had a tape drive for my Atari 800. But, I also bought a wonderful 90k floppy drive. It took me about a month to learn how to make a boot sector/record whatever it's called so my chess program would boot from the floppy. Really high tech!
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Some x64 assembler for the curious

Post by Joost Buijs »

The good old days!

The first chess program I wrote during 1977/1978 was in assembler too. I remember the move generator being quite fast 15 kns, but the program overall did 700 ns on the Heatkit H8 (2MHz. Intel 8080 with 16k static memory ). In the beginning I used 2 cassette drives, one to load the editor and the assembler and a second one for the assembled output. Since I was working for Zenith/Heatkit at that time I pretty soon got my hands on a box with two 80k floppy disks which made life a lot easier.

In 1981 I switched to "C" but still used assembler for some subs that really had to be fast. Nowadays compilers are so efficient that they are hard to beat with handwritten assembler. Writing in assembler becomes more and more a curiosity of the past, even for very low level stuff like kernel drivers and such.

Somehow I lost all sources of the chess programs I wrote at that time, but I still have some old ASM routines I wrote for my Reversi program. For instance the move generator and do move looked like this:

Code: Select all

; move generator/execute reversi endgame 29/9/89

gendir      macro       dir
            local       l1,l2,l3,exit
            cmp         _board[si+dir],ch       ; if (board[squre+dir]==opp)
            jnz         exit
            mov         ax,dir                  ; ax=dir
            lea         bx,[si+dir]             ; bx=square+dir
l1:         add         bx,ax                   ; add direction
            cmp         _board[bx],ch           ; if (board[test] == opp)
            jz          l1                      ; then try next square
            cmp         _board[bx],cl           ; if (board[test] == own)
            jnz         exit
l2:         sub         bx,ax                   ; swap found, scan backward
            mov         _board[bx],cl           ; swap opp with own
            mov         [di],bl                 ; save swapsquare
            inc         di
l3:         sub         bx,ax
            cmp         bx,si                   ; last done yet ?
            jz          exit
            mov         _board[bx],cl
            mov         [di],bl
            inc         di
            jmp short   l3
exit:
            endm

gendirm1    macro
            local       l1,l2,l3,exit
            cmp         _board[si-1],ch         ; if (board[squre+dir]==opp)
            jnz         exit
	    lea         bx,[si-1]               ; bx=square+dir
l1:         dec		bx                      ; add direction
            cmp         _board[bx],ch           ; if (board[test] == opp)
            jz          l1                      ; then try next square
            cmp         _board[bx],cl           ; if (board[test] == own)
            jnz         exit
l2:         inc         bx                      ; swap found, scan backward
            mov         _board[bx],cl           ; swap opp with own
            mov         [di],bl                 ; save swapsquare
            inc         di
l3:         inc         bx
            cmp         bx,si                   ; last done yet ?
            jz          exit
            mov         _board[bx],cl
            mov         [di],bl
            inc         di
            jmp short   l3
exit:
            endm

gendirp1    macro
            local       l1,l2,l3,exit
            cmp         _board[si+1],ch         ; if (board[squre+dir]==opp)
            jnz         exit
	    lea         bx,[si+1]               ; bx=square+dir
l1:         inc         bx                      ; add direction
            cmp         _board[bx],ch           ; if (board[test] == opp)
            jz          l1                      ; then try next square
            cmp         _board[bx],cl           ; if (board[test] == own)
            jnz         exit
l2:         dec         bx                      ; swap found, scan backward
            mov         _board[bx],cl           ; swap opp with own
            mov         [di],bl                 ; save swapsquare
            inc         di
l3:         dec         bx
            cmp         bx,si                   ; last done yet ?
            jz          exit
            mov         _board[bx],cl
            mov         [di],bl
            inc         di
            jmp short   l3
exit:
            endm
        .model          small

        .data
        extrn           _board:byte

        .code

        public  _genexe
_genexe proc    near
        push    si                  ; save register vars
        push    di
        .386p
        mov     cl,[esp+6]          ; player
        mov     ch,[esp+8]          ; enemy
	mov     si,[esp+10]         ; move square
        mov     di,[esp+12]         ; pointer to swaplist
        .286p
        gendir  -11
        gendir  -10
        gendir   -9
	gendirm1
	gendirp1
        gendir   +9
        gendir  +10
        gendir  +11
	mov     ax,di               ; calculate
	.386p
	sub     ax,[esp+12]         ; swapcount
	.286p
        jz      exit                ; if no move then exit
        mov     _board[si],cl       ; else make move
exit:   pop     di
        pop     si
        ret                         ; return swap count
_genexe endp
        end
This is the evaluator:

Code: Select all

;       Evaluator reversi ver 7.50M

index   macro   s1,s2,s3,s4,s5,s6,s7,s8
        mov     dl,_board[s1]
        mov     bx,dx
        add     bx,dx
	add     bx,dx
        mov     dl,_board[s2]
	add     bx,dx
        mov     cx,bx
        add     bx,cx
	add     bx,cx
        mov     dl,_board[s3]
	add     bx,dx
	mov     cx,bx
        add     bx,cx
	add     bx,cx
        mov     dl,_board[s4]
	add     bx,dx
	mov     cx,bx
        add     bx,cx
	add     bx,cx
        mov     dl,_board[s5]
	add     bx,dx
	mov     cx,bx
        add     bx,cx
	add     bx,cx
        mov     dl,_board[s6]
	add     bx,dx
	mov     cx,bx
        add     bx,cx
	add     bx,cx
        mov     dl,_board[s7]
	add     bx,dx
	mov     cx,bx
        add     bx,cx
	add     bx,cx
        mov     dl,_board[s8]
	add     bx,dx
        add     bx,bx           ; scale BX
	endm

	.model  small

	.data
	extrn   _board:byte
        extrn   _edgep:dword

	.code
	.386P

	public  _eval
_eval   proc    near
        push    si
        movzx   ebx,word ptr[esp+4]     ; get player
        les     si,_edgep[ebx*4]        ; get pointer to structures
        xor     dx,dx                   ; clear DX
	index   11,12,13,14,15,16,17,18
	mov     ax,es:[si+bx]
	index   81,82,83,84,85,86,87,88
	add     ax,es:[si+bx]
	index   11,21,31,41,51,61,71,81
	add     ax,es:[si+bx]
	index   18,28,38,48,58,68,78,88
	add     ax,es:[si+bx]
        pop     si
	ret
_eval   endp
	end
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Some x64 assembler for the curious

Post by matthewlai »

Michael Sherwin wrote: Fri Mar 22, 2019 11:38 pm

Code: Select all

; ATTACKED BY BLACK - rax contains the square in question
Atkbyblk proc
  mov r8d,  wp[rax*4]             ; wp indexed by square gives first entry in huge move table
abb1:
  mov r9d,  tosq[r8*4]            ; from tosq[] we get the first destination square
  mov r10d, [rcx].t.board[r9*4]   ; the index of piece on board - empty squares are 0
  mov r10d, [rcx].t.typ[r10*4]    ; with the index we can get the piece type
  cmp r10,  11                    ; compare piece type to black pawn (I need to do some equates so I can use BP)
  je  abbt                        ; attacked by black true
  mov r8d,  nxsq[r8*4]            ; if next square r8 is loaded with the next cell
  cmp r8,   0 ;no next square
  je  abbb                        ; attacked by black bishop?
  jmp abb1                        ; loop back to get next destination square
abbb:
  mov r8d,  b[rax*4]
abb2:
  mov r9d,  tosq[r8*4]
  mov r10d, [rcx].t.board[r9*4]
  mov r10d, [rcx].t.typ[r10*4]
  cmp r10,  13                    ; black bishop
  je  abbt
  cmp r10,  15                    ; black queen
  je  abbt
  cmp r10,  16                    ; black king
  je  abbt
  mov r8d,  nxsq[r8*4]
  cmp r8,   0
  je  abbr
  jmp abb2 
abbr:
  mov r8d,  r[rax*4]
abb3:
  mov r9d,  tosq[r8*4]
  mov r10d, [rcx].t.board[r9*4]
  mov r10d, [rcx].t.typ[r10*4]
  cmp r10,  14                    ; black rook
  je  abbt
  cmp r10,  15
  je  abbt
  cmp r10,  16
  je  abbt
  mov r8d,  nxsq[r8*4]
  cmp r8,   0
  je  abbn
  jmp abb3 
abbn:
  mov r8d,  n[rax*4]
abb4:
  mov r9d,  tosq[r8*4]
  mov r10d, [rcx].t.board[r9*4]
  mov r10d, [rcx].t.typ[r10*4]
  cmp r10,  12                    ; black knight
  je  abbt
  mov r8d,  nxsq[r8*4]
  cmp r8,   0
  je  abbf
  jmp abb4
abbf: ; attacked by black false
  xor rax,  rax
  ret
abbt: ; attacked by black true
  mov rax,  1
  ret
Atkbyblk endp
My very first computer was a RadioShack 2k handheld with a very simple basic. It was fun but not that useful and very very slow. Still it was good because I began to learn how to program. My next computer was an Atari 800 48k. Wow was it fast! :lol: That is when I started to dream about writing a chess program. Though writing it in Basic would not do. So I learned 6502 assembler. I programmed a chess board, pieces some function icons using redefined character graphics. It worked great for moving the pieces around. I even wrote some move generation code. I had it almost playing chess but at that time I didn't even know about alpha-beta. By that time though the 8-bit computers were quite dead. And I got a really good deal on an 8086 with 512 MB of ram. And I thought I'd be able to just pick up where I left off. However, due to my learning disability that I have posted about before 8086 assembler with its segmented memory access was beyond me. I just could not keep it in my head long enough to do anything. Then the 80386 with its flat memory model came out and I was back in business! I wrote a very fast assembler version of the GNUChess 4 move generator utilizing jump tables. Then I got distracted with RomiChess and bitboards. This is the first time I have successfully come back to assembler. 8-)

The method I decided on is similar to GNUChess 4 except the move tables are perfectly compressed while GNUChess used 64 bytes for every single piece type for every single square. The code above demonstrates how wonderful x64 is with all those volatile registers that need not be preserved and parameters are not passed on the stack unless there are not enough registers. For those that might be interested but never took the leap into assembler I just wanted to demonstrate how easy it can be. :D
Would be interesting to write the same function in C and see how a compiler would compile it, and how much faster it would be (it's extremely difficult to beat a compiler on performance these days, so I'm assuming the hand-written assembly will be slower), and where the speed differences come from.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Some x64 assembler for the curious

Post by Michael Sherwin »

I have already done the proof that handwritten assembler is faster. I have programmed two perft examples. One in pure C and one with handwritten assembler for the move generator, make move and take back.

Both versions make and unmake all moves generated and do no cache counting.
GNUChess 4 style in pure C, bench 7.
42,682,123 nodes per second. (new optimised compile with MSVS 2017)

GNUChess 4 style with move generator, make and unmake in assembler, bench 7.
74,754,475 nodes per second. (old compile with Turbo C and Turbo Assembler)

I guess now the argument could be that my C code must be really poorly done. Oh well there is always something someone can throw back and the burden of proof will always be on me. However, even if my C programming skills are that bad then I'm better off writing in assembler anyway! :lol:
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
odomobo
Posts: 96
Joined: Fri Jul 06, 2018 1:09 am
Location: Chicago, IL
Full name: Josh Odom

Re: Some x64 assembler for the curious

Post by odomobo »

The nature of your learning disability eludes me a little bit. Your code seems pretty sophisticated, which is not what I would have expected with a learning disability. It's a personal issue, so I understand if you don't want to talk about it, but I am curious to understand it further.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Some x64 assembler for the curious

Post by Michael Sherwin »

odomobo wrote: Wed Mar 27, 2019 9:34 pm The nature of your learning disability eludes me a little bit. Your code seems pretty sophisticated, which is not what I would have expected with a learning disability. It's a personal issue, so I understand if you don't want to talk about it, but I am curious to understand it further.
Just because someone has a learning disability does not mean that they are not intelligent. My disability is because I was born 6 weeks premature and was a "blue baby". They did not even expect me to live. So I must have suffered some minor brain damage at birth. Anyway because of that whatever that was exactly my memory is very selective. Some things I remember very well. Some things I cannot remember at all. Things that are infused with a lot of meaning I remember very well because I remember, for lack of a better word, motifs. I remember the "gist" of things. For example I can remember lyrics to songs that make sense. That tell a clear and meaningful story. I cannot even remember two words to a song that is just made up of meaningless but good sounding words. And that is no matter how many times I try. I once tried to learn Dutch. I carried a sentence around with me on a piece of paper for at least a month. I was hoping that if I read it enough times that I'd remember the words and the meaning. I lost the paper and I had no recollection of any of the words. The problem was that looking at the words in Dutch had no meaning to me. I am a good chess player. My record against various SF versions at long time limits is 4 losses and 4 draws. At knight odds my record is two draws and a win. But I can't remember opening lines no matter how many times I go over them. So with programming it is like learning a foreign language. As long as I have the "piece of paper" in front of me (the C primer or some sample code) I can make it happen but it takes me 10 times longer or more to accomplish anything. I have to go over my code numerous times just to get into my awareness what it is I'm trying to do long enough to get the next thing done. And even with my learning disability I was able to author RomiChess but it took me over 20 years of on again off again effort. Hopefully I have retained enough understanding so that this time it won't take that long! :D
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Some x64 assembler for the curious

Post by Michael Sherwin »

Something just dawned on me. I never quite thought of it like this before. Isn't it ironic that someone like me created one of the most successful learning chess engines so far! RomiChess has had reinforcement learning since 2006, twelve years before Alpha Zero. :?
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through