Templates duplicate too much as well. For small pieces of code, duplication doesn't hurt size much, and by eliminating a branch (one that is probably hard to predict, too) you can get a bit of a speed increase. But you also get increased debugging effort when you forget to change one half of the block.

kbhearn wrote: I know this is going to make you cringe, but you could always convert to 'just a little' C++ and use templates to get the advantage of specific white/black code, if it really is a significant gain, without resorting to duplication.

bob wrote: You can still structure things reasonably cleanly. But when you duplicate code, you invite bugs. My evaluation could be a bit faster with separate cases for white and black. But back to that having-to-change-two-different-pieces-of-code problem. Having done it both ways, I prefer zero duplication when practical.

Joost Buijs wrote: By merging everything in the search, the source code will shrink by a factor of 2 or 3, but the readability will be less; now every function is very well defined and readable.

bob wrote: That is THE reason I discontinued this bad practice several years ago.
I have a single Search() procedure, and a simple Quiesce() procedure. I have one special case, QuiesceEvasions() that I will get rid of pretty soon...
I'm sure that if I spent a few days on it I could merge everything into, let's say, 3 functions, but I feel very reluctant to spend time on a cosmetic change.
Somehow I'm a little bit chess-tired; I didn't touch my source code for the last 1.5 years besides fixing one or two bugs.
A couple of weeks ago I decided to revive my old Draughts program. I made a new bitboard move generator for it using magics, and behold, it is now 4 times faster than the old mailbox one.
This gives me a lot more fun because it is new; after 40 years of programming chess (with a break of 10 years) I feel like I have seen it all.
Search readability
Moderators: hgm, Rebel, chrisw
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Search readability
-
- Posts: 1563
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: Search readability
I don't know what is worse: the increased cache footprint or the added branches (which will also increase the cache footprint somewhat).

bob wrote: Don't forget templates also increase the cache footprint, which is another performance loss.
The PV and non-PV code is different enough that it is not really a duplicate, but the SMP and non-SMP code is almost a 100% duplicate apart from some locking and atomic stuff.
If I want to change this I'll have to use templates (like in my evaluation function); otherwise performance will be hurt by the additional if/then/else constructs.
On the other hand, I don't see a reason why I would change this, because the old code is more or less closed.
One way or the other, in practice it won't matter much; we are talking about speed differences of at most a few percent (a few Elo in strength).
-
- Posts: 27808
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Search readability
In my experience the most important way to achieve speed is good cache usage. Branch prediction seems to work amazingly well on these modern machines. I once tried to measure the delay caused by mispredicted branches by reading the time-stamp counter (TSC) before and after the branch and making a histogram. To cause the mispredictions I had it branch on the output of a simple PRNG. I was amazed how difficult it was to get a large fraction of mispredictions, and I had to increase the complexity of the PRNG several times to do so. The prediction logic was simply outguessing the results of simple PRNGs!
-
- Posts: 2559
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Search readability
That's interesting. I did a similar test some time ago, using a large array of pseudo-random numbers and processing them based on some trivial logic (like counting bytes with the msbit set).
Then I measured it again with sorted data.
I don't remember the exact numbers, but the former ran significantly (measurably) slower due to misprediction.
-
- Posts: 2559
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Search readability
I did this again, and here are the results:
256MB array of bytes, 20 runs each (taking best result):
unsorted (pseudo-random): 1250636 usec
sorted: 160017 usec
-
- Posts: 178
- Joined: Sat Jan 08, 2011 12:51 am
- Location: USA
- Full name: Alcides Schulz
Re: Search readability
In this video, the speaker talks about programming performance as it relates to cache, cores, etc.
https://vimeo.com/97337258
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Search readability
You should look at what templates do in Eugene's "egtb.cpp" C++ code. It takes longer to compile that code than it does to compile the rest of the Crafty source. The Crafty source is 40K lines of code with comments; egtb.cpp is about 6,600 lines of code. Yet egtb compiles into 2x the code that is in Crafty.

Joost Buijs wrote: I don't know what is worse: the increased cache footprint or the added branches (which will also increase the cache footprint somewhat).
The issue is that a few percent here, a few percent there, and you have programs that vary in NPS by a factor of 4.
-
- Posts: 2559
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Search readability
Templates are very useful. C++ compiles slower; that's normal, as it offers more.

bob wrote: You should look at what templates do in Eugene's "egtb.cpp" C++ code. It takes longer to compile that code than it does to compile the rest of the Crafty source.
How do you do generic containers in C?
Usually a bunch of ugly void * with casting, or manual copy-pasting / multiple includes with macro-redef magic. Yuck.
The same problem goes for the stdlib's qsort: it uses a callback function for comparisons. std::sort uses templates and can run much faster (especially on elementary types).
So one can use or abuse templates; that doesn't make them any less useful. They can save a lot of time and sweat.
Also, recent compilers (especially with LTO) automatically merge functions that compile to identical machine code (unless inlined, of course).
-
- Posts: 1563
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: Search readability
Templates are not the Holy Grail, and you have to use them wisely, but it won't hurt to use them on occasion to get more maintainable code.

bob wrote: You should look at what templates do in Eugene's "egtb.cpp" C++ code. It takes longer to compile that code than it does to compile the rest of the Crafty source.
What's wrong with the Nalimov EGTB code I don't know; I never studied it in detail. I only made some changes to it to make it compatible with modern C++ compilers.
It is true that it compiles horribly slowly and that it generates huge binaries: without Nalimov my engine is 200 KB in size, and with Nalimov it is 1.4 MB.
To be honest it is very messy code, and it is a miracle that it works without many bugs.
The initialization is also very slow; even with 6-men tables on an SSD it still takes a minute or so to initialize.
Unfortunately there are not many alternatives. Syzygy is probably faster, but I like to have distance-to-mate.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Search readability
Didn't say they were bad. Said they increase the cache footprint, which is harmful for performance. I go low-level for speed, in general, even though there might be many better languages for pure expression of concepts. Code bloat is not something I want in my chess engine, however.

mar wrote: Templates are very useful. C++ compiles slower; that's normal, as it offers more.