Advice on optimizing my move generation

algerbrex · Post by **algerbrex** » Fri Jun 11, 2021 5:06 am

Following the advice of more experienced engine creators on here and in other places, I've made sure in my chess engine that my move generator was working correctly before I tried to do any optimizing. And I am happy to say that after running my move generator through a perft suite of 30-40 different positions at depth=5, it seems to be working correctly.

My issue now is the speed. I'd like to test up to at least depth=6, but that just isn't feasible right now given how slow my move generator is. For example, my engine takes almost two minutes just to go up to depth=4 in the kiwipete position:

Code: Select all

Total nodes: 4085603

real    1m53.013s
user    1m18.742s
sys     0m15.138s

What's even more annoying is that from what I've read, you don't need to implement things like transposition tables, threads, or bulk counting to get a perft time well below 20 seconds up to depth=6 or even depth=7. Any advice on what I can do to get my engine to this point?

Here's my code for perft as of right now:

Code: Select all

func DividePerft(board *Board, depth, divideAt int) uint64 {
	pseduoMoves := ComputePseduolegalMoves(board)
	var nodes uint64
	if depth == 0 {
		return 1
	}
	for _, move := range pseduoMoves {
		board.DoMove(&move)
		if KingIsSafe(board) {
			moveNodes := DividePerft(board, depth-1, divideAt)
			nodes += moveNodes
			if depth == divideAt {
				fmt.Printf("%v: %v\n", move, moveNodes)
			}
		}
		board.UndoMove(&move)
	}
	return nodes
}

My code currently uses a 10x12 board as its internal representation, and as you can see from above, pseduo legal moves are pruned from perft by checking if they leave the king in check.

JVMerlino · Post by **JVMerlino** » Fri Jun 11, 2021 6:31 am

First of all, congratulations on getting to this point. Having a perfect move generator is a definite achievement.

For reference, my totally mediocre engine, Myrddin, which has a fully legal move generator that also flags moves that give check, without bulk counting, needs 1/4 seconds to reach perft 4 in the kiwipete position - and this is considered slow.

The only thing that seems strange is that you seem to compute the legal moves for a position BEFORE checking if depth=0, which I'm pretty sure results in a lot of wasted time. What it should be in pseudocode is this:

if not using bulk counting and depth = 0, return 1
generate moves
if using bulk counting and depth = 1, return number of legal moves
loop through moves and recursively call perft

But your move generator is pseudolegal, so you do have to check for legality somewhere, and it seems like you're doing that properly. So I think if you just move the depth=0 check to the top, you might see some significant speedup.

jm

algerbrex · Post by **algerbrex** » Fri Jun 11, 2021 7:11 am

JVMerlino wrote: ↑Fri Jun 11, 2021 6:31 am First of all, congratulations on getting to this point. Having a perfect move generator is a definite achievement.

For reference, my totally mediocre engine, Myrddin, which has a fully legal move generator that also flags moves that give check, without bulk counting, needs 1/4 seconds to reach perft 4 in the kiwipete position - and this is considered slow.

The only thing that seems strange is that you seem to compute the legal moves for a position BEFORE checking if depth=0, which I'm pretty sure results in a lot of wasted time. What it should be in pseudocode is this:

if not using bulk counting and depth = 0, return 1
generate moves
if using bulk counting and depth = 1, return number of legal moves
loop through moves and recursively call perft

But your move generator is pseudolegal, so you do have to check for legality somewhere, and it seems like you're doing that properly. So I think if you just move the depth=0 check to the top, you might see some significant speedup.

jm

Hi and thank you!

And wow! You were absolutely right and I can't believe I didn't notice that! Just by fixing what you said, I shaved off ~100 seconds! Now instead of taking two minutes to calculate kiwipete at depth=4, my generator takes only ~4 seconds! Thanks so much for noticing that!

Of course, it's still incredibly slow compared to good chess engines, but I'll gladly take 4 seconds over two minutes! Now I just need to work on getting that number down for perft(6) and perft(7).

niel5946 · Post by **niel5946** » Fri Jun 11, 2021 9:48 am

I don't really think a - relatively - slow move generator is a big problem for chess engines under 2500-3000 elo. My engine, Loki, has the same perft time at depth 4 (kiwipete) as JVMerlino's, using a pseudo-legal move generator and no bulk counting. I have already gotten it to play at 2400-2500 on CCRL without having changed the move generator iirc.

Having said that, a lightning fast move generator is nice

Although your board representation has a tendency to be on the slow side (note tendency: it can be made really fast) as compared to something like bitboard representations, another really big factor in perft speed is the make move and undo move functions. How are these written? And what about the move generator itself? Additionally, what language are you using? I'm not familiar with the syntax in your code example.
If you are satisfied with their current implementations and are only looking for low-level speedups, I recommend reading through this CPW page on optimization

mvanthoor · Post by **mvanthoor** » Fri Jun 11, 2021 10:37 am

algerbrex wrote: ↑Fri Jun 11, 2021 7:11 am Hi and thank you!

And wow! You were absolutely right and I can't believe I didn't notice that! Just by fixing what you said, I shaved off ~100 seconds! Now instead of taking two minutes to calculate kiwipete at depth=4, my generator takes only ~4 seconds! Thanks so much for noticing that!

Of course, it's still incredibly slow compared to good chess engines, but I'll gladly take 4 seconds over two minutes! Now I just need to work on getting that number down for perft(6) and perft(7).

Good going

4 seconds is still not fast though... my engine runs KiwiPete perft 4 in about 100ms (0.1 second), without bulk-counting or TT, which is, compared to the real perft monsters, _still_ slow:

Code: Select all

Perft 1: 48 (0 ms, inf leaves/sec)
Perft 2: 2039 (0 ms, inf leaves/sec)
Perft 3: 97862 (2 ms, 48931000 leaves/sec)
Perft 4: 4085603 (110 ms, 37141845 leaves/sec)

Is your code available somewhere?

Perft checks if MakeMove, UnMakeMove, the board representation, and the piece movement functions are bug-free. (It would be good to also have a function that in debug-mode also checks that any incremental updates you're making, such as the zobrist hash, etc.) You have achieved this point, so now you can start to optimize. It would be best if you can avoid _all_ memory allocations while the engine is calculating, because memory allocations are slow. Don't create and destroy objects or arrays. Create everything before you start the calculation and re-use it.

maksimKorzh · Post by **maksimKorzh** » Fri Jun 11, 2021 11:06 am

How you deal with attacks (king in check detection)?
On the fly generation or lookup?
In my experience attacks detection is the biggest time eater,
some other potential slowdowns might occur because of data structures being used during move generations
like putting moves onto stack or similar.

flok · Post by **flok** » Fri Jun 11, 2021 4:11 pm

[quote=algerbrex post_id=895640 time=1623380803 user_id=13114]

Code: Select all

Total nodes: 4085603

real    1m53.013s
user    1m18.742s
sys     0m15.138s

Any idea why real is so much longer (about 20 seconds) than the cpu-time for user and system combined?
Also: 15 seconds for system? Is your program causing swapping? Or is it emitting a lot of text to the terminal?

algerbrex · Post by **algerbrex** » Fri Jun 11, 2021 5:18 pm

niel5946 wrote: ↑Fri Jun 11, 2021 9:48 am I don't really think a - relatively - slow move generator is a big problem for chess engines under 2500-3000 elo. My engine, Loki, has the same perft time at depth 4 (kiwipete) as JVMerlino's, using a pseudo-legal move generator and no bulk counting. I have already gotten it to play at 2400-2500 on CCRL without having changed the move generator iirc.

Having said that, a lightning fast move generator is nice
Although your board representation has a tendency to be on the slow side (note tendency: it can be made really fast) as compared to something like bitboard representations, another really big factor in perft speed is the make move and undo move functions. How are these written? And what about the move generator itself? Additionally, what language are you using? I'm not familiar with the syntax in your code example.
If you are satisfied with their current implementations and are only looking for low-level speedups, I recommend reading through this CPW page on optimization

Well my engine is written in Golang actually, which I realize isn't the best language for writing a fast chess engine in, but I'm not looking for the fastest engine, I just want something that has reasonable move generation speed.

And the way DoMove and UndoMove are written is by having a chain of if statements dealing with the different move types that are generated. Here are some snippets of DoMove:

Code: Select all

if move.MoveType == CastleWKS {
		doCastling(board, 4, 6, 7, 5)
		board.CanCastleWKS = false
		board.CanCastleWQS = false
	} else if move.MoveType == CastleWQS {
		doCastling(board, 4, 2, 0, 3)
		board.CanCastleWQS = false
		board.CanCastleWKS = false
	} else if move.MoveType == CastleBKS {
		doCastling(board, 60, 62, 63, 61)
		board.CanCastleBKS = false
		board.CanCastleBQS = false
	} else if move.MoveType == CastleBQS {
		doCastling(board, 60, 58, 56, 59)
		board.CanCastleBQS = false
		board.CanCastleBKS = false
... 

pieceType := board.Pieces[move.From]
		pieceColor := board.Colors[move.From]
		captureType := board.Pieces[move.To]
		captureColor := board.Colors[move.To]
		board.Pieces[move.From] = NoPiece
		board.Colors[move.From] = NoColor
		board.Pieces[move.To] = pieceType
		board.Colors[move.To] = pieceColor

		undoMove := UndoMove{PieceColor: pieceColor, PieceType: pieceType, CaptureColor: captureColor, CaptureType: captureType}
		undoMove.CanCastleWKS = board.CanCastleWKS
		undoMove.CanCastleWQS = board.CanCastleWQS
		undoMove.CanCastleBKS = board.CanCastleBKS
		undoMove.CanCastleBQS = board.CanCastleBQS

		// Check if there is now a en passant target square
		if pieceType == Pawn && moveIsDoublePawnPush(move) {
			board.EPSquare = move.To + 8
			if pieceColor == White {
				board.EPSquare = move.To - 8
			}
		}

		// Update king positions
		if pieceType == King && pieceColor == White {
			board.WhiteKingPos = move.To
		} else if pieceType == King && pieceColor == Black {
			board.BlackKingPos = move.To
		}

		// Update the castling rights. Only do this AFTER we've already saved them.
		if pieceType == King && pieceColor == White {
			board.CanCastleWKS = false
			board.CanCastleWQS = false
		} else if pieceType == King && pieceColor == Black {
			board.CanCastleBKS = false
			board.CanCastleBQS = false
		} else if pieceType == Rook && pieceColor == White {
			if move.From == 7 {
				board.CanCastleWKS = false
			} else if move.From == 0 {
				board.CanCastleWQS = false
			}
		} else if pieceType == Rook && pieceColor == Black {
			if move.From == 63 {
				board.CanCastleBKS = false
			} else if move.From == 56 {
				board.CanCastleBQS = false
			}
		}

		// If a rook is taken, make sure to update the castling rights.
		if captureType == Rook && captureColor == White {
			if move.To == 7 {
				board.CanCastleWKS = false
			} else if move.To == 0 {
				board.CanCastleWQS = false
			}
		} else if captureType == Rook && captureColor == Black {
			if move.To == 63 {
				board.CanCastleBKS = false
			} else if move.To == 56 {
				board.CanCastleBQS = false
			}
		}

		board.UndoMoves = append(board.UndoMoves, undoMove)
	}

The full function I feel is much to do big to post here (which might be an indication of what the issue is lol), but if it's fine to do so, then I
l'll gladly post DoMove and UndoMove.

As for the move generator, I'm using the 10x12 board representation as you already know, and deltas to go through and calculate the pseudo legal moves. For example, here is my function to calculate pseudo legal slider moves:

Code: Select all

func computeMovesForSlider(board *Board, from int, enemyColor Color, deltas []int) (moves []Move) {
	for _, delta := range deltas {
		to := Mailbox120[Mailbox64[from]+delta]
		for to != -1 && board.Colors[to] == NoColor {
			moves = append(moves, Move{From: from, To: to, MoveType: Quiet})
			to = Mailbox120[Mailbox64[to]+delta]
		}
		if to != -1 && board.Colors[to] == enemyColor {
			moves = append(moves, Move{From: from, To: to, MoveType: Attack})
		}
	}
	return moves
}

Where I'm running into an issue is move generation for pawns. All the other pieces move generation was relatively simple, but because of pawns' weirdness (en passant, double pushing, capturing diagonally but move straight), my function for calculating pawn moves is rough ~100 lines of code, which again, I feel like is indicative of the problem, but I'm not sure how to shorten the logic. I've already gone back and tried to get rid of repeated logic and messy, inefficient code.

algerbrex · Post by **algerbrex** » Fri Jun 11, 2021 5:23 pm

mvanthoor wrote: ↑Fri Jun 11, 2021 10:37 am
algerbrex wrote: ↑Fri Jun 11, 2021 7:11 am Hi and thank you!

And wow! You were absolutely right and I can't believe I didn't notice that! Just by fixing what you said, I shaved off ~100 seconds! Now instead of taking two minutes to calculate kiwipete at depth=4, my generator takes only ~4 seconds! Thanks so much for noticing that!

Of course, it's still incredibly slow compared to good chess engines, but I'll gladly take 4 seconds over two minutes! Now I just need to work on getting that number down for perft(6) and perft(7).
Good going 4 seconds is still not fast though... my engine runs KiwiPete perft 4 in about 100ms (0.1 second), without bulk-counting or TT, which is, compared to the real perft monsters, _still_ slow:
Code: Select all
Perft 1: 48 (0 ms, inf leaves/sec)
Perft 2: 2039 (0 ms, inf leaves/sec)
Perft 3: 97862 (2 ms, 48931000 leaves/sec)
Perft 4: 4085603 (110 ms, 37141845 leaves/sec)
Is your code available somewhere?

Perft checks if MakeMove, UnMakeMove, the board representation, and the piece movement functions are bug-free. (It would be good to also have a function that in debug-mode also checks that any incremental updates you're making, such as the zobrist hash, etc.) You have achieved this point, so now you can start to optimize. It would be best if you can avoid _all_ memory allocations while the engine is calculating, because memory allocations are slow. Don't create and destroy objects or arrays. Create everything before you start the calculation and re-use it.

Haha yes, as opposed to four seconds I'm pretty happy about that

, but of course, I realize that that's still incredibly slow, and my engine still doesn't do much better at higher depths, still taking about 2 minutes to calculate perft(6) from the starting position now.

My code isn't posted anywhere online, but I'd be happy to do so so that the helpful people here can take a look!

Also, I'll take that advice about memory management and see if I'm doing that anywhere in my code. I already know that one mistake I made in the first version of my engine (which I had to scrap due to unsolveable bugs), that I slowed down my speed by not using pass-by-reference.

algerbrex · Post by **algerbrex** » Fri Jun 11, 2021 5:27 pm

maksimKorzh wrote: ↑Fri Jun 11, 2021 11:06 am How you deal with attacks (king in check detection)?
On the fly generation or lookup?
In my experience attacks detection is the biggest time eater,
some other potential slowdowns might occur because of data structures being used during move generations
like putting moves onto stack or similar.

Hi and yes, that's also where I suspect I'm being very inefficient. Right now, I'm detecting if the king is left in check by making a movie using a isSquareAttacked method which does the usual approach of sitting a super-piece on the king square, radiating out rays, and seeing if they hit any enemy pieces. Here is the code:

Code: Select all

func squareIsAttacked(board *Board, square int, whiteToMove bool) bool {
	enemyColor := White
	pawnAttackLeftDelta := SouthWestDelta
	pawnAtttackRightDelta := SouthEastDelta
	if whiteToMove {
		enemyColor = Black
		pawnAttackLeftDelta = NorthWestDelta
		pawnAtttackRightDelta = NorthEastDelta
	}

	// Send out rays from the square position. If any of the rays intersect
	// with an enemy rook, bishop, or queen, then the square is attacked.
	for _, delta := range QueenDeltas {
		to := Mailbox120[Mailbox64[square]+delta]
		for to != -1 && board.Pieces[to] == NoPiece {
			to = Mailbox120[Mailbox64[to]+delta]
		}
		if delta == NorthDelta || delta == SouthDelta || delta == EastDelta || delta == WestDelta {
			if to != -1 && board.Colors[to] == enemyColor && (board.Pieces[to] == Rook || board.Pieces[to] == Queen) {
				return true
			}
		} else if delta == NorthEastDelta || delta == SouthEastDelta || delta == NorthWestDelta || delta == SouthWestDelta {
			if to != -1 && board.Colors[to] == enemyColor && (board.Pieces[to] == Bishop || board.Pieces[to] == Queen) {
				return true
			}
		}
	}
	// Send out knight rays from the square position, if they intersect with an enemy knight,
	// the square is attacked.
	for _, delta := range KnightDeltas {
		to := Mailbox120[Mailbox64[square]+delta]
		if to != -1 && board.Colors[to] == enemyColor && board.Pieces[to] == Knight {
			return true
		}
	}
	// Lastly, check if the square is attacked by a pawn or a king.
	for _, delta := range KingDeltas {
		to := Mailbox120[Mailbox64[square]+delta]
		if to != -1 && board.Colors[to] == enemyColor && board.Pieces[to] == King {
			return true
		} else if delta == pawnAttackLeftDelta || delta == pawnAtttackRightDelta {
			if to != -1 && board.Colors[to] == enemyColor && board.Pieces[to] == Pawn {
				return true
			}
		}
	}
	// If we've made it to here without returing early, the square is not attacked and safe.
	return false
}

This is working excellent in terms of being bug free, but I do feel like there are faster ways of check detection. But I couldn't find any good resources for this related to mailbox representations of boards. Everything is related to bitboards, which I've already seen, since I attempted to write my first three engines using those.

Advice on optimizing my move generation

Advice on optimizing my move generation

Re: Advice on optimizing my move generation

Re: Advice on optimizing my move generation

Re: Advice on optimizing my move generation

Re: Advice on optimizing my move generation

Re: Advice on optimizing my move generation

Re: Advice on optimizing my move generation

Re: Advice on optimizing my move generation

Re: Advice on optimizing my move generation

Re: Advice on optimizing my move generation