Class wrapper and performance degradation ?
Moderator: Ras
-
MahmoudUthman
- Posts: 237
- Joined: Sat Jan 17, 2015 11:54 pm
Class wrapper and performance degradation ?
I used to keep the position representation inside a namespace without using classes at all. After wrapping the exact same code inside a class, performance degraded noticeably: for example, perft of the starting position at depth 6 now shows an overhead of about 800 ms (base = 4100 ms). Is this normal, or could I be doing something wrong?
-
Sven
- Posts: 4052
- Joined: Thu May 15, 2008 9:57 pm
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: Class wrapper and performance degradation ?
If I understand you correctly, perft(6) is now about 20% slower (4900 ms instead of 4100 ms). That does not sound normal to me. The typical overhead is mostly the additional use of the "this" pointer, but that should not cause a huge slowdown; I would expect only a few percent.
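To make the "this" pointer remark concrete, here is a minimal sketch of the same one-line function written both ways. The names `PositionNS`, `PositionClass`, `ToggleKey` and `GlobalPosition` are made up for illustration and are not taken from the engine discussed here.

```cpp
#include <cstdint>

using U64 = std::uint64_t;

// Namespace version: the state is a plain global with a fixed address.
namespace PositionNS {
    U64 Positionkey = 0;
    inline void ToggleKey(U64 k) { Positionkey ^= k; }
}

// Class version: identical logic, but the member is reached through the
// implicit "this" pointer (the body is really this->Positionkey ^= k).
class PositionClass {
public:
    void ToggleKey(U64 k) { Positionkey ^= k; }
    U64 Key() const { return Positionkey; }
private:
    U64 Positionkey = 0;
};

// Hypothetical single global instance of the class version.
PositionClass GlobalPosition;
```

Both variants compute the same thing; whether the class version costs anything extra depends on how well the compiler can inline the call and resolve the address of the instance.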
-
kbhearn
- Posts: 411
- Joined: Thu Dec 30, 2010 4:48 am
Re: Class wrapper and performance degradation ?
A few questions:
1) Is this C++? If not, many other languages have heavier object overhead.
2) Optimisation switches? If your constructor isn't being inlined, it would mean extra function calls per object.
3) Are the objects created on the stack as local variables, allocated once and then reused, or dynamically created and destroyed with new and delete? Dynamically creating objects in your inner loop with new may cause a slowdown, as it entails a memory allocation.
Beyond this, it would be helpful if you shared your class prototype so we can get an idea of how you're using it.
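As a rough illustration of question 3, the sketch below walks the same toy tree twice: once with a per-node object on the stack and once with a per-node heap allocation. `MoveList`, `CountOnStack` and `CountOnHeap` are hypothetical names, and the fixed branching factor of 3 is just a stand-in for real move generation. `std::make_unique` is used here in place of raw new and delete; it still performs one allocation and one deallocation per node.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a per-node object such as a move list.
struct MoveList {
    int moves[256];
    std::size_t count = 0;
};

// Stack version: creating `list` is only a stack-pointer adjustment.
std::size_t CountOnStack(int depth) {
    if (depth == 0) return 1;
    MoveList list;
    list.count = 3;  // pretend three moves were generated
    std::size_t nodes = 0;
    for (std::size_t i = 0; i < list.count; ++i)
        nodes += CountOnStack(depth - 1);
    return nodes;
}

// Heap version: same logic, but every node pays for an allocation
// and a deallocation inside the recursion.
std::size_t CountOnHeap(int depth) {
    if (depth == 0) return 1;
    auto list = std::make_unique<MoveList>();
    list->count = 3;
    std::size_t nodes = 0;
    for (std::size_t i = 0; i < list->count; ++i)
        nodes += CountOnHeap(depth - 1);
    return nodes;
}
```

Both functions return the same node count; only the cost per node differs, which is the kind of difference that would show up in a perft inner loop.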
-
MahmoudUthman
- Posts: 237
- Joined: Sat Jan 17, 2015 11:54 pm
Re: Class wrapper and performance degradation ?
1- Yes, C++.
2- Yes, I tested with both the Intel and Microsoft compilers, using exactly the same settings for both versions (highest optimization setting available).
3- Stack.
Every single function is left exactly as it was; the only modification is that the functions and variables have been moved from a namespace into a class.
Code:
class Position
{
public:
    Position() = default;
    inline U64 GetCatsleZobrist() const;
    inline U64 GetEnpassantZobrist() const;
    const inline U64 AttackSearch(unsigned int PieceSQ, Piece PT) const;
    template<Color C> inline U64 PinnedPieces() const;
    template<Color C> inline U64 AttackedSquares() const;
    void MakeMove(const Move& __restrict M);
    void UnMakeMove(const Move& __restrict M);
    inline void FromFen(const std::string& Fen);
    template<Color C> inline Move* GenerateEvasions(Move* List, U64 AttackedSqs) const;
    template<Color C> inline Move* GenerateMoves(Move* List, U64 InCheck, U64 AttackedSqs) const;
    U64 Perft(U64 depth);
    void PrintBoard();
private:
    Bitboard Backup_BitBoards[MaxDepth][16];
    Score BackUp_PSQTScore[MaxDepth];
    U64 BackUp_HalfMoveClock[MaxDepth];
    U64 BackUp_PositionKey[MaxDepth];
    U64 Backup_CR[MaxDepth][2];
    U32 Backup_LastMoveDPushSQ[MaxDepth];
    Bitboard* BitBoards;
    unsigned int DoublePushSQ[2];
    U64 CastleRights[2];
    Score PSQTScore;
    U64 Positionkey;
    U64 GameRecord[512];
    Piece Board[64+MaxDepth];
    int ply;
    U64 HalfMoveClock;
    U64 GameRecordCounter;
    Color STM;
    Color NTM;
    size_t USTM, UNTM;
};
-
matthewlai
- Posts: 793
- Joined: Sun Aug 03, 2014 4:48 am
- Location: London, UK
Re: Class wrapper and performance degradation ?
Are you sure no copies are made? Try deleting the default copy ctor and assignment operator.
If you only have a global instance of this class I wouldn't expect there to be any overhead at all. The compiler should be able to figure out all the addresses at compile-time. If you have a single non-global instance, I would expect slightly lower performance due to pointer redirection, but it shouldn't be anywhere near 20%.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
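For reference, deleting the copy constructor and copy assignment operator could look like the sketch below. This `Position` is a stripped-down, hypothetical stand-in with a single public member, not the class posted above, and `PlyOf` is an invented helper.

```cpp
#include <type_traits>

// Hypothetical, stripped-down stand-in for the Position class.
class Position {
public:
    Position() = default;
    Position(const Position&) = delete;             // no copy construction
    Position& operator=(const Position&) = delete;  // no copy assignment
    int ply = 0;
};

// With both operations deleted, any accidental copy is a compile error.
static_assert(!std::is_copy_constructible<Position>::value,
              "Position must not be copy-constructible");
static_assert(!std::is_copy_assignable<Position>::value,
              "Position must not be copy-assignable");

// Passing by reference still compiles; a by-value parameter
// (int PlyOf(Position pos)) could not be called with an existing object.
inline int PlyOf(const Position& pos) { return pos.ply; }
```

If the code still builds after this change, no copies of the object are being made anywhere, which rules out one common cause of a slowdown.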
-
MahmoudUthman
- Posts: 237
- Joined: Sat Jan 17, 2015 11:54 pm
Re: Class wrapper and performance degradation ?
matthewlai wrote: Are you sure no copies are made? Try deleting the default copy ctor and assignment operator.
Yes, I had already deleted both; ironically, I removed them by mistake while stripping the comments from the code I posted above.
matthewlai wrote: If you only have a global instance of this class I wouldn't expect there to be any overhead at all. The compiler should be able to figure out all the addresses at compile-time. If you have a single non-global instance, I would expect slightly lower performance due to pointer redirection, but it shouldn't be anywhere near 20%.
Yes, only a global instance.
-
matthewlai
- Posts: 793
- Joined: Sun Aug 03, 2014 4:48 am
- Location: London, UK
Re: Class wrapper and performance degradation ?
Profile it and see exactly what function has become slower?
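A real profiler (for example the one in Visual Studio, or Intel VTune) is the right tool for this. As a coarse first step, though, a small wall-clock timer around individual calls can already show which function changed. The sketch below assumes nothing about the engine; `TimeMs` and `DummyWork` are hypothetical names, and `DummyWork` only stands in for something like a perft call.

```cpp
#include <chrono>
#include <cstdint>

// Runs fn() once and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double TimeMs(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical workload standing in for the function being measured.
std::uint64_t DummyWork(std::uint64_t n) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < n; ++i)
        acc += i * i;
    return acc;
}
```

Usage would be along the lines of `double ms = TimeMs([&] { result = DummyWork(1000); });`, run several times for both the namespace build and the class build so that run-to-run noise can be told apart from a real difference.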
-
MahmoudUthman
- Posts: 237
- Joined: Sat Jan 17, 2015 11:54 pm
Re: Class wrapper and performance degradation ?
Make move is always slower by the largest margin of them all; the others keep fluctuating, but most of the time they are slower too. Any idea what could be the cause?
-
Sven
- Posts: 4052
- Joined: Thu May 15, 2008 9:57 pm
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: Class wrapper and performance degradation ?
I would go through all places where you access your global board instance, step by step. Perhaps start in "make move" and its callers. I would expect that there is at least one piece of code that now causes some frequent copying of the board.
Are you sure that you have changed all occurrences of "POSITION_NAMESPACE::POSITION_MEMBER" into "POSITION_GLOBAL_INSTANCE.POSITION_MEMBER"? (I know, the answer can only be "yes" since otherwise it would not compile, but maybe you get some hint by checking again your changes ...)
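One way to hunt for hidden copies, sketched here under the assumption that the class can temporarily be given an instrumented copy constructor, is to count them. `CountedPosition`, `PlyByValue` and `PlyByRef` are invented for this illustration.

```cpp
#include <cstddef>

// Hypothetical instrumented stand-in for the board class: every copy
// construction or copy assignment bumps a global counter.
struct CountedPosition {
    static std::size_t copies;

    CountedPosition() = default;
    CountedPosition(const CountedPosition& other) : ply(other.ply) { ++copies; }
    CountedPosition& operator=(const CountedPosition& other) {
        ply = other.ply;
        ++copies;
        return *this;
    }

    int ply = 0;
};

std::size_t CountedPosition::copies = 0;

// Taking the object by value copies it on every call ...
int PlyByValue(CountedPosition pos) { return pos.ply; }

// ... while taking it by const reference does not.
int PlyByRef(const CountedPosition& pos) { return pos.ply; }
```

Printing `CountedPosition::copies` after a perft run would show whether any code path copies the board; a count of zero points the search elsewhere.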
-
MahmoudUthman
- Posts: 237
- Joined: Sat Jan 17, 2015 11:54 pm
Re: Class wrapper and performance degradation ?
I double-checked everything and found no errors, so I stripped the functions down to the bare minimum (removed the incremental updates of the Zobrist keys, PST score, etc.; basically a perft program rather than a chess engine). This is the result of perft to depth 7 from the starting position:
Code:
Classless version:
Perft 1 :         20  time :     0 ms
Perft 2 :        400  time :     0 ms
Perft 3 :       8902  time :     0 ms
Perft 4 :     197281  time :     7 ms
Perft 5 :    4865609  time :   129 ms
Perft 6 :  119060324  time :  2787 ms
Perft 7 : 3195901860  time : 73937 ms
Code:
Class version:
Perft 1 :         20  time :     0 ms
Perft 2 :        400  time :     0 ms
Perft 3 :       8902  time :     0 ms
Perft 4 :     197281  time :     5 ms
Perft 5 :    4865609  time :   129 ms
Perft 6 :  119060324  time :  2929 ms
Perft 7 : 3195901860  time : 77954 ms
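To put these numbers in perspective, the relative slowdown and the node rate can be computed directly from the two tables. The helper names below are made up; the inputs are the timings posted above.

```cpp
#include <cstdint>

// Relative slowdown of slowMs versus baseMs, in percent.
double SlowdownPercent(double baseMs, double slowMs) {
    return (slowMs - baseMs) / baseMs * 100.0;
}

// Node rate in nodes per second for a perft run that took `ms` milliseconds.
double NodesPerSecond(std::uint64_t nodes, double ms) {
    return static_cast<double>(nodes) / (ms / 1000.0);
}
```

With the posted timings, `SlowdownPercent(2787, 2929)` and `SlowdownPercent(73937, 77954)` both come out at a little over 5%, so in the stripped-down program the class version is roughly 5% slower, compared with the roughly 20% (800 ms on top of 4100 ms) reported for the full engine at the start of the thread.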