Class wrapper and performance degradation?

Discussion of chess software programming and technical issues.


MahmoudUthman
Posts: 237
Joined: Sat Jan 17, 2015 11:54 pm

Class wrapper and performance degradation?

Post by MahmoudUthman »

I used to keep the position representation inside a namespace without using classes at all. After wrapping the exact same code inside a class, performance degraded noticeably: for example, perft(6) from the starting position now has an overhead of about 800 ms (base = 4100 ms). Is this normal, or could I be doing something wrong?
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Class wrapper and performance degradation?

Post by Sven »

MahmoudUthman wrote: I used to keep the position representation inside a namespace without using classes at all. After wrapping the exact same code inside a class, performance degraded noticeably: for example, perft(6) from the starting position now has an overhead of about 800 ms (base = 4100 ms). Is this normal, or could I be doing something wrong?
If I understand you correctly, perft(6) is now slower by about 20% (4900 ms instead of 4100 ms). That does not sound normal to me. The typical overhead comes mostly from the additional use of the "this" pointer, but that should not cause a huge slowdown; I would expect only a few percent.
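To make the "this" pointer point concrete, here is a minimal sketch of the same accessor written both ways. The names `PositionNS`, `colorBB` and `gPosition` are hypothetical, not taken from the poster's code. The only thing the class version adds is a hidden pointer argument; whether that costs anything in practice depends mostly on whether the call gets inlined.

```cpp
#include <cstdint>

using U64 = std::uint64_t;

// Namespace version: the state is a plain global, so its address is fixed
// once the program is linked and can be encoded directly in the instructions.
namespace PositionNS {
    U64 colorBB[2] = {0x000000000000FFFFull, 0xFFFF000000000000ull};
    U64 Occupied() { return colorBB[0] | colorBB[1]; }
}

// Class version: the same data as a member. A non-static member function
// receives a hidden `this` pointer, and colorBB is read as this->colorBB.
class Position {
public:
    U64 Occupied() const { return colorBB[0] | colorBB[1]; }
private:
    U64 colorBB[2] = {0x000000000000FFFFull, 0xFFFF000000000000ull};
};

// A single global instance: when Occupied() is inlined into a caller that
// names gPosition directly, `this` can also become a known constant address.
Position gPosition;
```

Both calls, `PositionNS::Occupied()` and `gPosition.Occupied()`, return the same bitboard; only the way the data is addressed differs.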
kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 4:48 am

Re: Class wrapper and performance degradation?

Post by kbhearn »

A few questions:

1) Is this C++? If not, many other languages have heavier object overhead.
2) Which optimisation switches? If your constructor isn't being inlined, that means extra function calls per object.
3) Are the objects created on the stack as local variables, allocated once and then reused, or dynamically created and destroyed with new and delete? Dynamically creating objects with new in your inner loop may cause a slowdown, since each one entails a memory allocation.

Beyond this, perhaps it would help if you shared your class prototype so we can get an idea of how you're using it.
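Point 3 can be illustrated with a small sketch that compares a per-node heap allocation with a stack-local object. `State`, `child_of`, `walk_heap` and `walk_stack` are hypothetical, and a fixed branching factor of 2 stands in for real move generation. This only shows the allocation pattern; it is not a benchmark.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for per-node state; not the poster's Position class.
struct State {
    std::uint64_t key = 0;
    int ply = 0;
};

// Derives a child state; "move" 0 or 1 stands in for a real move.
State child_of(const State& s, int move) {
    State c = s;
    c.key ^= static_cast<std::uint64_t>(move + 1) << (s.ply % 64);
    ++c.ply;
    return c;
}

// Pattern to avoid in an inner loop: one new/delete pair per node.
std::uint64_t walk_heap(const State& s, int depth) {
    assert(depth >= 0);
    if (depth == 0) return 1;
    std::uint64_t nodes = 0;
    for (int m = 0; m < 2; ++m) {
        auto child = std::make_unique<State>(child_of(s, m));  // heap allocation
        nodes += walk_heap(*child, depth - 1);
    }                                                          // freed each iteration
    return nodes;
}

// Same traversal with the child as a local variable: it lives in the
// stack frame, so no allocator call is made.
std::uint64_t walk_stack(const State& s, int depth) {
    assert(depth >= 0);
    if (depth == 0) return 1;
    std::uint64_t nodes = 0;
    for (int m = 0; m < 2; ++m) {
        const State child = child_of(s, m);
        nodes += walk_stack(child, depth - 1);
    }
    return nodes;
}
```

Both functions visit the same 2^depth leaves; the difference is that `walk_heap` makes one allocator round trip per child.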
MahmoudUthman

Re: Class wrapper and performance degradation?

Post by MahmoudUthman »

1. Yes, C++.
2. Yes, I tested with both the Intel and Microsoft compilers, using exactly the same settings for both versions (the highest optimization setting available).
3. Stack.

Code:

class Position
{
public:
	Position() = default;

	inline U64 GetCatsleZobrist() const;
	inline U64 GetEnpassantZobrist() const;

	const inline U64 AttackSearch(unsigned int PieceSQ, Piece PT) const;
	template<Color C> inline U64 PinnedPieces() const;
	template<Color C> inline U64 AttackedSquares() const;

	void MakeMove(const Move& __restrict M);
	void UnMakeMove(const Move& __restrict M);

	inline void FromFen(const std::string& Fen);

	template<Color C> inline Move* GenerateEvasions(Move* List, U64 AttackedSqs) const;
	template<Color C> inline Move* GenerateMoves(Move* List, U64 InCheck, U64 AttackedSqs) const;

	U64 Perft(U64 depth);
	void PrintBoard();
private:
	Bitboard Backup_BitBoards[MaxDepth][16];
	Score BackUp_PSQTScore[MaxDepth];
	U64 BackUp_HalfMoveClock[MaxDepth];
	U64 BackUp_PositionKey[MaxDepth];
	U64 Backup_CR[MaxDepth][2];
	U32 Backup_LastMoveDPushSQ[MaxDepth];

	Bitboard* BitBoards;
	unsigned int DoublePushSQ[2];
	U64 CastleRights[2];
	Score PSQTScore;
	U64 Positionkey;
	U64 GameRecord[512];
	Piece Board[64 + MaxDepth];

	int ply;
	U64 HalfMoveClock;
	U64 GameRecordCounter;

	Color STM;
	Color NTM;
	size_t USTM, UNTM;
};
Every single function is left exactly as it was; the only modification is that the functions and variables have been moved from a namespace into a class.
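The Backup_* members sized by MaxDepth suggest that MakeMove saves the irreversible state at index ply and UnMakeMove restores it. Assuming that is how they are used, a minimal sketch of the pattern could look like this. `MiniPosition`, its method signatures, `MaxDepth = 64` and the XOR key update are all hypothetical simplifications, not the poster's code.

```cpp
#include <cassert>
#include <cstdint>

using U64 = std::uint64_t;
constexpr int MaxDepth = 64;  // assumed value; the real one is not shown in the post

// Hypothetical miniature of a ply-indexed backup scheme.
class MiniPosition {
public:
    // Saves the irreversible state for the current ply, then applies the move.
    void MakeMove(U64 keyDelta) {
        assert(ply < MaxDepth);
        BackUp_PositionKey[ply] = Positionkey;
        BackUp_HalfMoveClock[ply] = HalfMoveClock;
        ++ply;
        Positionkey ^= keyDelta;  // stand-in for the real Zobrist update
        ++HalfMoveClock;
    }

    // Restores the state saved by the matching MakeMove.
    void UnMakeMove() {
        assert(ply > 0);
        --ply;
        Positionkey = BackUp_PositionKey[ply];
        HalfMoveClock = BackUp_HalfMoveClock[ply];
    }

    U64 Key() const { return Positionkey; }
    U64 Clock() const { return HalfMoveClock; }
    int Ply() const { return ply; }

private:
    U64 BackUp_PositionKey[MaxDepth] = {};
    U64 BackUp_HalfMoveClock[MaxDepth] = {};
    U64 Positionkey = 0x9D39247E33776D41ull;  // arbitrary starting key
    U64 HalfMoveClock = 0;
    int ply = 0;
};
```

After any sequence of MakeMove calls followed by the same number of UnMakeMove calls, the key, clock and ply return to their starting values.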
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Class wrapper and performance degradation?

Post by matthewlai »

MahmoudUthman wrote:
Every single function is left exactly as it was; the only modification is that the functions and variables have been moved from a namespace into a class.
Are you sure no copies are made? Try deleting the default copy ctor and assignment operator.
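Deleting the copy operations turns any accidental copy into a compile error. A minimal sketch on a stripped-down stand-in for the class (only `GameRecord` is kept from the posted layout; `Current()` is a hypothetical accessor):

```cpp
#include <type_traits>

// Stripped-down stand-in for the Position class from this thread.
class Position {
public:
    Position() = default;
    Position(const Position&) = delete;             // no accidental copies
    Position& operator=(const Position&) = delete;  // no accidental assignment
private:
    unsigned long long GameRecord[512] = {};        // 4 KB here; the real class is larger
};

Position gPosition;

// Fine: returns a reference, nothing is copied.
const Position& Current() { return gPosition; }

// Each of these would now fail to compile instead of silently copying
// the whole object:
//   Position snapshot = gPosition;
//   void Inspect(Position p);   // pass by value
```

Because a copy constructor is user-declared (even as deleted), no move constructor is generated implicitly either, so returning or passing the object by value is rejected as well.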

If you only have a global instance of this class, I wouldn't expect there to be any overhead at all. The compiler should be able to figure out all the addresses at compile time. If you have a single non-global instance, I would expect slightly lower performance due to pointer indirection, but it shouldn't be anywhere near 20%.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
MahmoudUthman

Re: Class wrapper and performance degradation?

Post by MahmoudUthman »

Are you sure no copies are made? Try deleting the default copy ctor and assignment operator.
Yes, I already delete both (ironically, I accidentally deleted those declarations while stripping the comments from the code I posted above).
If you only have a global instance of this class, I wouldn't expect there to be any overhead at all. The compiler should be able to figure out all the addresses at compile time. If you have a single non-global instance, I would expect slightly lower performance due to pointer indirection, but it shouldn't be anywhere near 20%.
Yes, only a global instance.
matthewlai

Re: Class wrapper and performance degradation?

Post by matthewlai »

MahmoudUthman wrote:
Yes, only a global instance.
Profile it and see exactly which function has become slower?
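A sampling profiler (for example the one built into Visual Studio, or Intel VTune, matching the compilers used here) gives a per-function breakdown without touching the code. As a crude complement, whole calls can be timed with std::chrono; `TimeMs` below is a hypothetical helper, not part of any library.

```cpp
#include <chrono>
#include <cstdint>

// Calls fn() `reps` times and returns the elapsed wall-clock time in ms.
// steady_clock is used because it never jumps backwards.
template <class Fn>
double TimeMs(Fn&& fn, std::uint64_t reps = 1) {
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < reps; ++i) fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, `U64 nodes = 0; double ms = TimeMs([&] { nodes = gPosition.Perft(6); });`, where `gPosition` stands for the global instance. Printing `nodes` afterwards keeps the result observable, so the compiler cannot drop the call. Run both builds several times, since single timings are noisy.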
MahmoudUthman

Re: Class wrapper and performance degradation?

Post by MahmoudUthman »

matthewlai wrote:
Profile it and see exactly which function has become slower?
MakeMove is always slower, by the largest margin of them all; the others keep fluctuating, but most of the time they are slower too. Any idea what could be the cause?
Sven

Re: Class wrapper and performance degradation?

Post by Sven »

MahmoudUthman wrote:
MakeMove is always slower, by the largest margin of them all; the others keep fluctuating, but most of the time they are slower too. Any idea what could be the cause?
I would go through all places where you access your global board instance, step by step. Perhaps start in "make move" and its callers. I would expect that there is at least one piece of code that now causes some frequent copying of the board.

Are you sure that you have changed all occurrences of "POSITION_NAMESPACE::POSITION_MEMBER" into "POSITION_GLOBAL_INSTANCE.POSITION_MEMBER"? (I know, the answer can only be "yes" since otherwise it would not compile, but maybe you will find a hint by checking your changes again...)
MahmoudUthman

Re: Class wrapper and performance degradation?

Post by MahmoudUthman »

Sven Schüle wrote:
I would go through all places where you access your global board instance, step by step. Perhaps start in "make move" and its callers. I would expect that there is at least one piece of code that now causes some frequent copying of the board.

Are you sure that you have changed all occurrences of "POSITION_NAMESPACE::POSITION_MEMBER" into "POSITION_GLOBAL_INSTANCE.POSITION_MEMBER"? (I know, the answer can only be "yes" since otherwise it would not compile, but maybe you will find a hint by checking your changes again...)
I double-checked everything and found no errors, so I stripped the functions down to the bare minimum (removed the incremental updates of Zobrist keys, PST score, etc.; basically a perft program rather than a chess engine). These are the perft results to depth 7 from the starting position:

Code:

Classless version:
Perft 1:         20   time:     0 ms
Perft 2:        400   time:     0 ms
Perft 3:       8902   time:     0 ms
Perft 4:     197281   time:     7 ms
Perft 5:    4865609   time:   129 ms
Perft 6:  119060324   time:  2787 ms
Perft 7: 3195901860   time: 73937 ms

Code:

Class version:
Perft 1:         20   time:     0 ms
Perft 2:        400   time:     0 ms
Perft 3:       8902   time:     0 ms
Perft 4:     197281   time:     5 ms
Perft 5:    4865609   time:   129 ms
Perft 6:  119060324   time:  2929 ms
Perft 7: 3195901860   time: 77954 ms
Before stripping everything down, I think reaching depth 7 took around 100 seconds for the classless version and 110 or 120 seconds (I don't remember exactly) for the class version. Does the above result fall in line with what's to be expected?
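Putting the posted timings side by side: the stripped-down class build is about 5% slower at depths 6 and 7, compared with the roughly 20% reported for the full engine in the first post. The arithmetic, as a small sketch (the helper names are hypothetical; these are single runs, so expect noise, and note that depth 4 is actually faster in the class build):

```cpp
#include <cstdint>

// Percentage by which class_ms exceeds base_ms.
constexpr double SlowdownPercent(double base_ms, double class_ms) {
    return (class_ms - base_ms) / base_ms * 100.0;
}

// Perft speed in nodes per second.
constexpr double NodesPerSecond(std::uint64_t nodes, double ms) {
    return static_cast<double>(nodes) / (ms / 1000.0);
}

// Timings posted in this thread.
constexpr double kFullEngineD6 = SlowdownPercent(4100.0, 4900.0);           // ~19.5 %
constexpr double kStrippedD6   = SlowdownPercent(2787.0, 2929.0);           // ~5.1 %
constexpr double kStrippedD7   = SlowdownPercent(73937.0, 77954.0);         // ~5.4 %
constexpr double kNpsClassless = NodesPerSecond(3195901860ull, 73937.0);    // ~43.2 M
constexpr double kNpsClass     = NodesPerSecond(3195901860ull, 77954.0);    // ~41.0 M
```

The full-engine figure uses 4900 ms, i.e. the 4100 ms base plus the 800 ms overhead from the first post.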