Search readability

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Search readability

Post by bob »

kbhearn wrote:
bob wrote:
Joost Buijs wrote:
bob wrote: That is THE reason I discontinued this bad practice several years ago.

I have a single Search() procedure, and a simple Quiesce() procedure. I have one special case, QuiesceEvasions(), that I will get rid of pretty soon...
By merging everything in the search the source code will shrink by a factor of 2 or 3, but readability will suffer; right now every function is well defined and readable.

I'm sure that if I spend a few days on it I can merge everything into, let's say, 3 functions, but I feel very reluctant to spend time on a cosmetic change.
Somehow I'm a little bit tired of chess; I didn't touch my source code for the last 1.5 years apart from fixing one or two bugs.

A couple of weeks ago I decided to revive my old Draughts program. I made a new bitboard move generator for it using magics and, behold, it is now 4 times faster than the old mailbox one.
This is a lot more fun because it is new; after 40 years of programming chess (with a break of 10 years) I feel like I have seen it all.
You can still structure things reasonably cleanly. But when you duplicate code, you invite bugs. My evaluation could be a bit faster with separate cases for white and black, but then I'm back to the problem of having to change two different pieces of code. Having done it both ways, I prefer zero duplication when practical.
I know this is going to make you cringe, but you could always convert to 'just a little' C++ and use templates to get the advantage of specific white/black code, if it really is a significant gain, without resorting to duplication.
Templates duplicate too much as well. For small pieces of code, duplication doesn't hurt size so much, and by eliminating a branch (one that is probably hard to predict, too) you can get a bit of a speed increase. But you also get increased debugging effort when you forget to change one of the two copies of the block.
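For readers unfamiliar with the technique kbhearn suggests, a minimal sketch of compile-time color specialization might look like the following. The names and the pawn logic are invented for illustration, this is not code from any engine discussed here, and it assumes the GCC/Clang popcount builtin:

Code: Select all

#include <cstdint>

// Compile-time color parameter: the compiler emits two specialized
// copies, each with the branch on 'White' folded away at compile time.
template<bool White>
int PawnScore(uint64_t pawns) {
    // Pawns advance in opposite directions, so the shift direction
    // is selected at compile time instead of by a runtime branch.
    uint64_t pushed = White ? (pawns << 8) : (pawns >> 8);
    return __builtin_popcountll(pushed);  // dummy score, illustration only
}

int Evaluate(uint64_t whitePawns, uint64_t blackPawns) {
    // One source block, two instantiations, no runtime color test.
    return PawnScore<true>(whitePawns) - PawnScore<false>(blackPawns);
}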
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Search readability

Post by Joost Buijs »

bob wrote:
Joost Buijs wrote:
bob wrote:
Joost Buijs wrote:
bob wrote: That is THE reason I discontinued this bad practice several years ago.

I have a single Search() procedure, and a simple Quiesce() procedure. I have one special case, QuiesceEvasions(), that I will get rid of pretty soon...
By merging everything in the search the source code will shrink by a factor of 2 or 3, but readability will suffer; right now every function is well defined and readable.

I'm sure that if I spend a few days on it I can merge everything into, let's say, 3 functions, but I feel very reluctant to spend time on a cosmetic change.
Somehow I'm a little bit tired of chess; I didn't touch my source code for the last 1.5 years apart from fixing one or two bugs.

A couple of weeks ago I decided to revive my old Draughts program. I made a new bitboard move generator for it using magics and, behold, it is now 4 times faster than the old mailbox one.
This is a lot more fun because it is new; after 40 years of programming chess (with a break of 10 years) I feel like I have seen it all.
You can still structure things reasonably cleanly. But when you duplicate code, you invite bugs. My evaluation could be a bit faster with separate cases for white and black, but then I'm back to the problem of having to change two different pieces of code. Having done it both ways, I prefer zero duplication when practical.
Duplicating things is bad practice because it is difficult to maintain, I agree.
The PV and non-PV code is different enough that it is not really a duplicate, but the SMP and non-SMP code is almost a 100% duplicate apart from some locking and atomics.
If I want to change this I'll have to use templates (as in my evaluation function); otherwise performance will be hurt by the additional if/then/else constructs.
On the other hand, I don't see a reason to change this, because the old code is more or less finished.
Don't forget templates also increase the cache footprint, which is another performance loss.
I don't know which is worse, the increased cache footprint or the added branches (which will also increase the cache footprint somewhat).
One way or the other, in practice it won't matter much; we are talking about speed differences of at most a few percent (a few Elo in strength).
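As a rough illustration of the kind of compile-time flags Joost mentions (a sketch with invented names, not his actual code), the PV/non-PV and SMP/non-SMP variants can be generated from one source block:

Code: Select all

#include <mutex>

static std::mutex ttLock;  // hypothetical lock guarding a shared table

// PV and SMP are compile-time flags, so each 'if (SMP)' below is folded
// away and four specialized copies of Search() are generated.
template<bool PV, bool SMP>
int Search(int alpha, int beta, int depth) {
    if (depth <= 0)
        return 0;                   // stand-in for Quiesce()

    if (SMP) ttLock.lock();         // locking exists only in the SMP variants
    // ... probe the shared transposition table here ...
    if (SMP) ttLock.unlock();

    // Stand-in for a real move loop; non-PV children of a PV node would
    // normally be searched with Search<false, SMP>.
    int score = -Search<PV, SMP>(-beta, -alpha, depth - 1);
    return score > alpha ? score : alpha;
}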
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Search readability

Post by hgm »

In my experience the most important way to achieve speed is good cache usage. Branch prediction seems to work amazingly well on these modern machines. I once tried to measure the delay caused by mispredicted branches by reading the nanosecond clock (TSC) before and after the branch and making a histogram. To cause the mispredictions I had it branch on the output of a simple PRNG. I was amazed at how difficult it was to get a large fraction of mispredictions, and had to increase the complexity of the PRNG several times to achieve that. The prediction logic was simply outguessing the results of simple PRNGs!
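A reconstruction of that kind of measurement, along the lines hgm describes but not his actual code, could look like this (it assumes x86 and the GCC/Clang __rdtsc intrinsic; if hgm's observation holds, the predictor may still partially learn a simple xorshift like the one used here):

Code: Select all

#include <cstdio>
#include <cstdint>
#include <x86intrin.h>  // __rdtsc() with GCC/Clang on x86

int main() {
    uint32_t state = 12345;       // simple xorshift PRNG
    uint64_t hist[64] = {0};      // histogram of elapsed cycle counts
    volatile int sink = 0;        // keep the branch from being optimized out

    for (int i = 0; i < 1000000; ++i) {
        state ^= state << 13; state ^= state >> 17; state ^= state << 5;
        uint64_t t0 = __rdtsc();
        if (state & 1)            // branch on the PRNG output
            sink = sink + 1;
        else
            sink = sink - 1;
        uint64_t t1 = __rdtsc();
        uint64_t d = t1 - t0;
        hist[d < 63 ? d : 63]++;  // mispredictions show up as a slow tail
    }
    for (int i = 0; i < 64; ++i)
        if (hist[i])
            printf("%2d cycles: %llu\n", i, (unsigned long long)hist[i]);
}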
mar
Posts: 2559
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Search readability

Post by mar »

hgm wrote:In my experience the most important way to achieve speed is good cache usage. Branch prediction seems to work amazingly well on these modern machines. I once tried to measure the delay caused by mispredicted branches by reading the nanosecond clock (TSC) before and after the branch and making a histogram. To cause the mispredictions I had it branch on the output of a simple PRNG. I was amazed at how difficult it was to get a large fraction of mispredictions, and had to increase the complexity of the PRNG several times to achieve that. The prediction logic was simply outguessing the results of simple PRNGs!
That's interesting. I did a similar test some time ago using a large array of pseudo-random numbers and processing them with some trivial logic (like counting bytes with the most significant bit set).
Then I measured it again with sorted data.
I don't remember the exact numbers, but the former ran significantly (measurably) slower due to misprediction.
mar
Posts: 2559
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Search readability

Post by mar »

mar wrote:I did a similar test some time ago using a large array of pseudo-random numbers and processing them with some trivial logic (like counting bytes with the most significant bit set).
Then I measured it again with sorted data.
I did this again, and here are the results:
256 MB array of bytes, 20 runs each (taking the best result):

Code: Select all

unsorted (pseudo-random): 1250636 usec
sorted: 160017 usec
In other words, well-predicted vs. random is 7.8x faster on my machine (some i7).
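For reference, a minimal version of such a test might look like the sketch below. This is an assumption of what the original looked like, scaled down to 64 MB; note that modern compilers may turn the branch into a conditional move or vectorize the loop, which would hide the effect:

Code: Select all

#include <cstdio>
#include <cstdint>
#include <vector>
#include <algorithm>
#include <chrono>
#include <random>

// Count bytes with the most significant bit set: on random data the
// branch is unpredictable, on sorted data it is almost perfectly predicted.
static long CountMsb(const std::vector<uint8_t>& a) {
    long count = 0;
    for (uint8_t b : a)
        if (b & 0x80)
            ++count;
    return count;
}

int main() {
    std::vector<uint8_t> data(1 << 26);  // 64 MB, scaled down from 256 MB
    std::mt19937 rng(42);
    for (auto& b : data) b = (uint8_t)rng();

    for (int pass = 0; pass < 2; ++pass) {
        auto t0 = std::chrono::steady_clock::now();
        long c = CountMsb(data);
        auto t1 = std::chrono::steady_clock::now();
        long long us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
        printf("%s: count=%ld in %lld usec\n", pass == 0 ? "unsorted" : "sorted", c, us);
        std::sort(data.begin(), data.end());  // sort before the second pass
    }
}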
sedicla
Posts: 178
Joined: Sat Jan 08, 2011 12:51 am
Location: USA
Full name: Alcides Schulz

Re: Search readability

Post by sedicla »

In this video the speaker talks about programming performance as it relates to caches, cores, etc.

https://vimeo.com/97337258
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Search readability

Post by bob »

Joost Buijs wrote:
bob wrote:
Joost Buijs wrote:
bob wrote:
Joost Buijs wrote:
bob wrote: That is THE reason I discontinued this bad practice several years ago.

I have a single Search() procedure, and a simple Quiesce() procedure. I have one special case, QuiesceEvasions(), that I will get rid of pretty soon...
By merging everything in the search the source code will shrink by a factor of 2 or 3, but readability will suffer; right now every function is well defined and readable.

I'm sure that if I spend a few days on it I can merge everything into, let's say, 3 functions, but I feel very reluctant to spend time on a cosmetic change.
Somehow I'm a little bit tired of chess; I didn't touch my source code for the last 1.5 years apart from fixing one or two bugs.

A couple of weeks ago I decided to revive my old Draughts program. I made a new bitboard move generator for it using magics and, behold, it is now 4 times faster than the old mailbox one.
This is a lot more fun because it is new; after 40 years of programming chess (with a break of 10 years) I feel like I have seen it all.
You can still structure things reasonably cleanly. But when you duplicate code, you invite bugs. My evaluation could be a bit faster with separate cases for white and black, but then I'm back to the problem of having to change two different pieces of code. Having done it both ways, I prefer zero duplication when practical.
Duplicating things is bad practice because it is difficult to maintain, I agree.
The PV and non-PV code is different enough that it is not really a duplicate, but the SMP and non-SMP code is almost a 100% duplicate apart from some locking and atomics.
If I want to change this I'll have to use templates (as in my evaluation function); otherwise performance will be hurt by the additional if/then/else constructs.
On the other hand, I don't see a reason to change this, because the old code is more or less finished.
Don't forget templates also increase the cache footprint, which is another performance loss.
I don't know which is worse, the increased cache footprint or the added branches (which will also increase the cache footprint somewhat).
One way or the other, in practice it won't matter much; we are talking about speed differences of at most a few percent (a few Elo in strength).
You should look at what templates do in Eugene's "egtb.cpp" C++ code. It takes longer to compile that one file than it does to compile the rest of the Crafty source. The Crafty source is 40K lines of code including comments; egtb.cpp is about 6600 lines. Yet egtb.cpp compiles into twice as much code as all of Crafty.

The issue is that a few percent here and a few percent there add up, and you end up with programs that vary in NPS by a factor of 4.
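The bloat comes from instantiation: every distinct set of template arguments gets its own copy of the generated machine code. A toy illustration (invented for this post; it has nothing to do with egtb.cpp itself):

Code: Select all

// Each distinct Piece value instantiates a separate copy of this function.
template<int Piece>
int IndexPosition(int square) {
    return Piece * 64 + square;  // stand-in for a real indexing scheme
}

int Dispatch(int piece, int square) {
    // Six specialized copies are emitted even though the source has one body.
    switch (piece) {
        case 0: return IndexPosition<0>(square);
        case 1: return IndexPosition<1>(square);
        case 2: return IndexPosition<2>(square);
        case 3: return IndexPosition<3>(square);
        case 4: return IndexPosition<4>(square);
        default: return IndexPosition<5>(square);
    }
}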
mar
Posts: 2559
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Search readability

Post by mar »

bob wrote:You should look at what templates do in Eugene's "egtb.cpp" C++ code. It takes longer to compile that one file than it does to compile the rest of the Crafty source. The Crafty source is 40K lines of code including comments; egtb.cpp is about 6600 lines. Yet egtb.cpp compiles into twice as much code as all of Crafty.
Templates are very useful. C++ compiles slower; that's normal, as it offers more.
How do you do generic containers in C?
Usually a bunch of ugly void * with casts, or manual copy-pasting/multiple includes with macro-redefinition magic. Yuck.
The same problem goes for the stdlib qsort: it uses a callback function for comparisons. std::sort uses templates and can run much faster (especially on elementary types).

So one can use or abuse templates; that doesn't make them any less useful. They can save a lot of time, blood, and sweat.

Also, recent compilers (especially with LTO) automatically merge functions that compile to identical machine code (unless inlined, of course).
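The qsort/std::sort point is easy to see in code. In a sketch like the one below (a generic illustration, not a benchmark), qsort goes through a function pointer on every comparison, while std::sort can inline the comparison:

Code: Select all

#include <cstdlib>
#include <algorithm>
#include <vector>

// qsort: every comparison is an indirect call through this function pointer.
static int CmpInt(const void* a, const void* b) {
    int x = *(const int*)a, y = *(const int*)b;
    return (x > y) - (x < y);
}

void SortBoth(std::vector<int>& a, std::vector<int>& b) {
    // C-style: the comparator cannot be inlined through the pointer.
    qsort(a.data(), a.size(), sizeof(int), CmpInt);
    // C++-style: operator< is known at compile time and can be inlined,
    // which is where most of the speed difference comes from.
    std::sort(b.begin(), b.end());
}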
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Search readability

Post by Joost Buijs »

bob wrote:
Joost Buijs wrote:
bob wrote:
Joost Buijs wrote:
bob wrote:
Joost Buijs wrote:
bob wrote: That is THE reason I discontinued this bad practice several years ago.

I have a single Search() procedure, and a simple Quiesce() procedure. I have one special case, QuiesceEvasions(), that I will get rid of pretty soon...
By merging everything in the search the source code will shrink by a factor of 2 or 3, but readability will suffer; right now every function is well defined and readable.

I'm sure that if I spend a few days on it I can merge everything into, let's say, 3 functions, but I feel very reluctant to spend time on a cosmetic change.
Somehow I'm a little bit tired of chess; I didn't touch my source code for the last 1.5 years apart from fixing one or two bugs.

A couple of weeks ago I decided to revive my old Draughts program. I made a new bitboard move generator for it using magics and, behold, it is now 4 times faster than the old mailbox one.
This is a lot more fun because it is new; after 40 years of programming chess (with a break of 10 years) I feel like I have seen it all.
You can still structure things reasonably cleanly. But when you duplicate code, you invite bugs. My evaluation could be a bit faster with separate cases for white and black, but then I'm back to the problem of having to change two different pieces of code. Having done it both ways, I prefer zero duplication when practical.
Duplicating things is bad practice because it is difficult to maintain, I agree.
The PV and non-PV code is different enough that it is not really a duplicate, but the SMP and non-SMP code is almost a 100% duplicate apart from some locking and atomics.
If I want to change this I'll have to use templates (as in my evaluation function); otherwise performance will be hurt by the additional if/then/else constructs.
On the other hand, I don't see a reason to change this, because the old code is more or less finished.
Don't forget templates also increase the cache footprint, which is another performance loss.
I don't know which is worse, the increased cache footprint or the added branches (which will also increase the cache footprint somewhat).
One way or the other, in practice it won't matter much; we are talking about speed differences of at most a few percent (a few Elo in strength).
You should look at what templates do in Eugene's "egtb.cpp" C++ code. It takes longer to compile that one file than it does to compile the rest of the Crafty source. The Crafty source is 40K lines of code including comments; egtb.cpp is about 6600 lines. Yet egtb.cpp compiles into twice as much code as all of Crafty.

The issue is that a few percent here and a few percent there add up, and you end up with programs that vary in NPS by a factor of 4.
Templates are not the Holy Grail and you have to use them wisely, but it won't hurt to use them on occasion to get more maintainable code.

What exactly is wrong with the Nalimov EGTB code I don't know; I never studied it in detail, I only made some changes to make it compatible with modern C++ compilers.
It is true that it compiles horribly slowly and generates huge binaries; without Nalimov my engine is 200 KB in size, and with Nalimov it is 1.4 MB.
To be honest, it is very messy code and it is a miracle that it works without many bugs.
Initialization is also very slow: with 6-men tables on an SSD it still takes a minute or so.
Unfortunately there are not many alternatives; Syzygy is probably faster, but I like having distance to mate.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Search readability

Post by bob »

mar wrote:
bob wrote:You should look at what templates do in Eugene's "egtb.cpp" C++ code. It takes longer to compile that one file than it does to compile the rest of the Crafty source. The Crafty source is 40K lines of code including comments; egtb.cpp is about 6600 lines. Yet egtb.cpp compiles into twice as much code as all of Crafty.
Templates are very useful. C++ compiles slower; that's normal, as it offers more.
How do you do generic containers in C?
Usually a bunch of ugly void * with casts, or manual copy-pasting/multiple includes with macro-redefinition magic. Yuck.
The same problem goes for the stdlib qsort: it uses a callback function for comparisons. std::sort uses templates and can run much faster (especially on elementary types).

So one can use or abuse templates; that doesn't make them any less useful. They can save a lot of time, blood, and sweat.

Also, recent compilers (especially with LTO) automatically merge functions that compile to identical machine code (unless inlined, of course).
Didn't say they were bad. Said they increase the cache footprint, which is harmful to performance. In general I go low-level for speed, even though there might be many better languages for the pure expression of concepts. Code bloat, however, is not something I want in my chess engine.