Search readability

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Search readability

Post by bob »

kbhearn wrote:
bob wrote:
Joost Buijs wrote:
bob wrote: That is THE reason I discontinued this bad practice several years ago.

I have a single Search() procedure, and a simple Quiesce() procedure. I have one special case, QuiesceEvasions(), that I will get rid of pretty soon...
By merging everything in the search the source code will shrink by a factor of 2 or 3, but readability will suffer; right now every function is well defined and readable.

I'm sure that if I spend a few days on it I can merge everything into, let's say, 3 functions, but I feel very reluctant to spend time on a cosmetic change.
Somehow I'm a little bit tired of chess; I didn't touch my source code for the last 1.5 years apart from fixing one or two bugs.

A couple of weeks ago I decided to revive my old Draughts program. I made a new bitboard move generator for it using magics and, behold, it is now 4 times faster than the old mailbox one.
This is a lot more fun because it is new; after 40 years of programming chess (with a break of 10 years) I feel like I have seen it all.
You can still structure things reasonably cleanly. But when you duplicate code, you invite bugs. My evaluation could be a bit faster with separate cases for white and black, but then I'm back to the problem of having to change two different pieces of code. Having done it both ways, I prefer zero duplication when practical.
I know this is going to make you cringe, but you could always convert to 'just a little' C++ and use templates to get the advantage of specific white/black code, if it really is a significant gain, without resorting to duplication.
Templates duplicate too much as well. For small pieces of code, duplication doesn't hurt size so much, and by eliminating a branch (one that is probably hard to predict, too) you can get a bit of a speed increase. But you also get increased debugging effort when you forget to change one of the two copies of the block.
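For readers unfamiliar with the technique kbhearn suggests, a minimal sketch of compile-time color specialization might look like the following. The names and the pawn logic are invented for illustration, this is not code from any engine discussed here, and it assumes the GCC/Clang popcount builtin:

Code: Select all

#include <cstdint>

// Compile-time color parameter: the compiler emits two specialized
// copies, each with the branch on 'White' folded away at compile time.
template<bool White>
int PawnScore(uint64_t pawns) {
    // Pawns advance in opposite directions, so the shift direction
    // is selected at compile time instead of by a runtime branch.
    uint64_t pushed = White ? (pawns << 8) : (pawns >> 8);
    return __builtin_popcountll(pushed);  // dummy score, illustration only
}

int Evaluate(uint64_t whitePawns, uint64_t blackPawns) {
    // One source block, two instantiations, no runtime color test.
    return PawnScore<true>(whitePawns) - PawnScore<false>(blackPawns);
}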
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Search readability

Post by Joost Buijs »

bob wrote:
Joost Buijs wrote:
bob wrote:
Joost Buijs wrote:
bob wrote: That is THE reason I discontinued this bad practice several years ago.

I have a single Search() procedure, and a simple Quiesce() procedure. I have one special case, QuiesceEvasions(), that I will get rid of pretty soon...
By merging everything in the search the source code will shrink by a factor of 2 or 3, but readability will suffer; right now every function is well defined and readable.

I'm sure that if I spend a few days on it I can merge everything into, let's say, 3 functions, but I feel very reluctant to spend time on a cosmetic change.
Somehow I'm a little bit tired of chess; I didn't touch my source code for the last 1.5 years apart from fixing one or two bugs.

A couple of weeks ago I decided to revive my old Draughts program. I made a new bitboard move generator for it using magics and, behold, it is now 4 times faster than the old mailbox one.
This is a lot more fun because it is new; after 40 years of programming chess (with a break of 10 years) I feel like I have seen it all.
You can still structure things reasonably cleanly. But when you duplicate code, you invite bugs. My evaluation could be a bit faster with separate cases for white and black, but then I'm back to the problem of having to change two different pieces of code. Having done it both ways, I prefer zero duplication when practical.
Duplicating things is bad practice because it is difficult to maintain, I agree.
The PV and non-PV code is different enough that it is not really a duplicate, but the SMP and non-SMP code is almost a 100% duplicate apart from some locking and atomics.
If I want to change this I'll have to use templates (as in my evaluation function); otherwise performance will be hurt by the additional if/then/else constructs.
On the other hand, I don't see a reason to change this, because the old code is more or less finished.
Don't forget templates also increase the cache footprint, which is another performance loss.
I don't know which is worse, the increased cache footprint or the added branches (which will also increase the cache footprint somewhat).
One way or the other, in practice it won't matter much; we are talking about speed differences of at most a few percent (a few Elo in strength).
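As a rough illustration of the kind of compile-time flags Joost mentions (a sketch with invented names, not his actual code), the PV/non-PV and SMP/non-SMP variants can be generated from one source block:

Code: Select all

#include <mutex>

static std::mutex ttLock;  // hypothetical lock guarding a shared table

// PV and SMP are compile-time flags, so each 'if (SMP)' below is folded
// away and four specialized copies of Search() are generated.
template<bool PV, bool SMP>
int Search(int alpha, int beta, int depth) {
    if (depth <= 0)
        return 0;                   // stand-in for Quiesce()

    if (SMP) ttLock.lock();         // locking exists only in the SMP variants
    // ... probe the shared transposition table here ...
    if (SMP) ttLock.unlock();

    // Stand-in for a real move loop; non-PV children of a PV node would
    // normally be searched with Search<false, SMP>.
    int score = -Search<PV, SMP>(-beta, -alpha, depth - 1);
    return score > alpha ? score : alpha;
}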
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Search readability

Post by hgm »

In my experience the most important way to achieve speed is good cache usage. Branch prediction seems to work amazingly well on these modern machines. I once tried to measure the delay caused by mispredicted branches by reading the nanosecond clock (TSC) before and after the branch and making a histogram. To cause the mispredictions I had it branch on the output of a simple PRNG. I was amazed at how difficult it was to get a large fraction of mispredictions, and had to increase the complexity of the PRNG several times to achieve that. The prediction logic was simply outguessing the results of simple PRNGs!
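A reconstruction of that kind of measurement, along the lines hgm describes but not his actual code, could look like this (it assumes x86 and the GCC/Clang __rdtsc intrinsic; if hgm's observation holds, the predictor may still partially learn a simple xorshift like the one used here):

Code: Select all

#include <cstdio>
#include <cstdint>
#include <x86intrin.h>  // __rdtsc() with GCC/Clang on x86

int main() {
    uint32_t state = 12345;       // simple xorshift PRNG
    uint64_t hist[64] = {0};      // histogram of elapsed cycle counts
    volatile int sink = 0;        // keep the branch from being optimized out

    for (int i = 0; i < 1000000; ++i) {
        state ^= state << 13; state ^= state >> 17; state ^= state << 5;
        uint64_t t0 = __rdtsc();
        if (state & 1)            // branch on the PRNG output
            sink = sink + 1;
        else
            sink = sink - 1;
        uint64_t t1 = __rdtsc();
        uint64_t d = t1 - t0;
        hist[d < 63 ? d : 63]++;  // mispredictions show up as a slow tail
    }
    for (int i = 0; i < 64; ++i)
        if (hist[i])
            printf("%2d cycles: %llu\n", i, (unsigned long long)hist[i]);
}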
mar
Posts: 2559
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Search readability

Post by mar »

hgm wrote:In my experience the most important way to achieve speed is good cache usage. Branch prediction seems to work amazingly well on these modern machines. I once tried to measure the delay caused by mispredicted branches by reading the nanosecond clock (TSC) before and after the branch and making a histogram. To cause the mispredictions I had it branch on the output of a simple PRNG. I was amazed at how difficult it was to get a large fraction of mispredictions, and had to increase the complexity of the PRNG several times to achieve that. The prediction logic was simply outguessing the results of simple PRNGs!
That's interesting. I did a similar test some time ago using a large array of pseudo-random numbers and processing them with some trivial logic (like counting bytes with the most significant bit set).
Then I measured it again with sorted data.
I don't remember the exact numbers, but the former ran significantly (measurably) slower due to misprediction.
mar
Posts: 2559
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Search readability

Post by mar »

mar wrote:I did a similar test some time ago using a large array of pseudo-random numbers and processing them with some trivial logic (like counting bytes with the most significant bit set).
Then I measured it again with sorted data.
I did this again, and here are the results:
256 MB array of bytes, 20 runs each (taking the best result):

Code: Select all

unsorted (pseudo-random): 1250636 usec
sorted: 160017 usec
In other words, well-predicted vs. random is 7.8x faster on my machine (some i7).
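For reference, a minimal version of such a test might look like the sketch below. This is an assumption of what the original looked like, scaled down to 64 MB; note that modern compilers may turn the branch into a conditional move or vectorize the loop, which would hide the effect:

Code: Select all

#include <cstdio>
#include <cstdint>
#include <vector>
#include <algorithm>
#include <chrono>
#include <random>

// Count bytes with the most significant bit set: on random data the
// branch is unpredictable, on sorted data it is almost perfectly predicted.
static long CountMsb(const std::vector<uint8_t>& a) {
    long count = 0;
    for (uint8_t b : a)
        if (b & 0x80)
            ++count;
    return count;
}

int main() {
    std::vector<uint8_t> data(1 << 26);  // 64 MB, scaled down from 256 MB
    std::mt19937 rng(42);
    for (auto& b : data) b = (uint8_t)rng();

    for (int pass = 0; pass < 2; ++pass) {
        auto t0 = std::chrono::steady_clock::now();
        long c = CountMsb(data);
        auto t1 = std::chrono::steady_clock::now();
        long long us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
        printf("%s: count=%ld in %lld usec\n", pass == 0 ? "unsorted" : "sorted", c, us);
        std::sort(data.begin(), data.end());  // sort before the second pass
    }
}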
sedicla
Posts: 178
Joined: Sat Jan 08, 2011 12:51 am
Location: USA
Full name: Alcides Schulz

Re: Search readability

Post by sedicla »

In this video the speaker talks about programming performance as it relates to caches, cores, etc.

https://vimeo.com/97337258
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Search readability

Post by bob »

Joost Buijs wrote:
bob wrote:
Joost Buijs wrote:
bob wrote:
Joost Buijs wrote:
bob wrote: That is THE reason I discontinued this bad practice several years ago.

I have a single Search() procedure, and a simple Quiesce() procedure. I have one special case, QuiesceEvasions(), that I will get rid of pretty soon...
By merging everything in the search the source code will shrink by a factor of 2 or 3, but readability will suffer; right now every function is well defined and readable.

I'm sure that if I spend a few days on it I can merge everything into, let's say, 3 functions, but I feel very reluctant to spend time on a cosmetic change.
Somehow I'm a little bit tired of chess; I didn't touch my source code for the last 1.5 years apart from fixing one or two bugs.

A couple of weeks ago I decided to revive my old Draughts program. I made a new bitboard move generator for it using magics and, behold, it is now 4 times faster than the old mailbox one.
This is a lot more fun because it is new; after 40 years of programming chess (with a break of 10 years) I feel like I have seen it all.
You can still structure things reasonably cleanly. But when you duplicate code, you invite bugs. My evaluation could be a bit faster with separate cases for white and black, but then I'm back to the problem of having to change two different pieces of code. Having done it both ways, I prefer zero duplication when practical.
Duplicating things is bad practice because it is difficult to maintain, I agree.
The PV and non-PV code is different enough that it is not really a duplicate, but the SMP and non-SMP code is almost a 100% duplicate apart from some locking and atomics.
If I want to change this I'll have to use templates (as in my evaluation function); otherwise performance will be hurt by the additional if/then/else constructs.
On the other hand, I don't see a reason to change this, because the old code is more or less finished.
Don't forget templates also increase the cache footprint, which is another performance loss.
I don't know which is worse, the increased cache footprint or the added branches (which will also increase the cache footprint somewhat).
One way or the other, in practice it won't matter much; we are talking about speed differences of at most a few percent (a few Elo in strength).
You should look at what templates do in Eugene's "egtb.cpp" C++ code. It takes longer to compile that one file than it does to compile the rest of the Crafty source. The Crafty source is 40K lines of code including comments; egtb.cpp is about 6600 lines. Yet egtb.cpp compiles into twice as much code as all of Crafty.

The issue is that a few percent here and a few percent there add up, and you end up with programs that vary in NPS by a factor of 4.
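The bloat comes from instantiation: every distinct set of template arguments gets its own copy of the generated machine code. A toy illustration (invented for this post; it has nothing to do with egtb.cpp itself):

Code: Select all

// Each distinct Piece value instantiates a separate copy of this function.
template<int Piece>
int IndexPosition(int square) {
    return Piece * 64 + square;  // stand-in for a real indexing scheme
}

int Dispatch(int piece, int square) {
    // Six specialized copies are emitted even though the source has one body.
    switch (piece) {
        case 0: return IndexPosition<0>(square);
        case 1: return IndexPosition<1>(square);
        case 2: return IndexPosition<2>(square);
        case 3: return IndexPosition<3>(square);
        case 4: return IndexPosition<4>(square);
        default: return IndexPosition<5>(square);
    }
}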
mar
Posts: 2559
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Search readability

Post by mar »

bob wrote:You should look at what templates do in Eugene's "egtb.cpp" C++ code. It takes longer to compile that one file than it does to compile the rest of the Crafty source. The Crafty source is 40K lines of code including comments; egtb.cpp is about 6600 lines. Yet egtb.cpp compiles into twice as much code as all of Crafty.
Templates are very useful. C++ compiles slower; that's normal, as it offers more.
How do you do generic containers in C?
Usually a bunch of ugly void * with casts, or manual copy-pasting/multiple includes with macro-redefinition magic. Yuck.
The same problem goes for the stdlib qsort: it uses a callback function for comparisons. std::sort uses templates and can run much faster (especially on elementary types).

So one can use or abuse templates; that doesn't make them any less useful. They can save a lot of time, blood, and sweat.

Also, recent compilers (especially with LTO) automatically merge functions that compile to identical machine code (unless inlined, of course).
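The qsort/std::sort point is easy to see in code. In a sketch like the one below (a generic illustration, not a benchmark), qsort goes through a function pointer on every comparison, while std::sort can inline the comparison:

Code: Select all

#include <cstdlib>
#include <algorithm>
#include <vector>

// qsort: every comparison is an indirect call through this function pointer.
static int CmpInt(const void* a, const void* b) {
    int x = *(const int*)a, y = *(const int*)b;
    return (x > y) - (x < y);
}

void SortBoth(std::vector<int>& a, std::vector<int>& b) {
    // C-style: the comparator cannot be inlined through the pointer.
    qsort(a.data(), a.size(), sizeof(int), CmpInt);
    // C++-style: operator< is known at compile time and can be inlined,
    // which is where most of the speed difference comes from.
    std::sort(b.begin(), b.end());
}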
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Search readability

Post by Joost Buijs »

bob wrote:
Joost Buijs wrote:
bob wrote:
Joost Buijs wrote:
bob wrote:
Joost Buijs wrote:
bob wrote: That is THE reason I discontinued this bad practice several years ago.

I have a single Search() procedure, and a simple Quiesce() procedure. I have one special case, QuiesceEvasions(), that I will get rid of pretty soon...
By merging everything in the search the source code will shrink by a factor of 2 or 3, but readability will suffer; right now every function is well defined and readable.

I'm sure that if I spend a few days on it I can merge everything into, let's say, 3 functions, but I feel very reluctant to spend time on a cosmetic change.
Somehow I'm a little bit tired of chess; I didn't touch my source code for the last 1.5 years apart from fixing one or two bugs.

A couple of weeks ago I decided to revive my old Draughts program. I made a new bitboard move generator for it using magics and, behold, it is now 4 times faster than the old mailbox one.
This is a lot more fun because it is new; after 40 years of programming chess (with a break of 10 years) I feel like I have seen it all.
You can still structure things reasonably cleanly. But when you duplicate code, you invite bugs. My evaluation could be a bit faster with separate cases for white and black, but then I'm back to the problem of having to change two different pieces of code. Having done it both ways, I prefer zero duplication when practical.
Duplicating things is bad practice because it is difficult to maintain, I agree.
The PV and non-PV code is different enough that it is not really a duplicate, but the SMP and non-SMP code is almost a 100% duplicate apart from some locking and atomics.
If I want to change this I'll have to use templates (as in my evaluation function); otherwise performance will be hurt by the additional if/then/else constructs.
On the other hand, I don't see a reason to change this, because the old code is more or less finished.
Don't forget templates also increase the cache footprint, which is another performance loss.
I don't know which is worse, the increased cache footprint or the added branches (which will also increase the cache footprint somewhat).
One way or the other, in practice it won't matter much; we are talking about speed differences of at most a few percent (a few Elo in strength).
You should look at what templates do in Eugene's "egtb.cpp" C++ code. It takes longer to compile that one file than it does to compile the rest of the Crafty source. The Crafty source is 40K lines of code including comments; egtb.cpp is about 6600 lines. Yet egtb.cpp compiles into twice as much code as all of Crafty.

The issue is that a few percent here and a few percent there add up, and you end up with programs that vary in NPS by a factor of 4.
Templates are not the Holy Grail and you have to use them wisely, but it won't hurt to use them on occasion to get more maintainable code.

What exactly is wrong with the Nalimov EGTB code I don't know; I never studied it in detail, I only made some changes to make it compatible with modern C++ compilers.
It is true that it compiles horribly slowly and generates huge binaries; without Nalimov my engine is 200 KB in size, and with Nalimov it is 1.4 MB.
To be honest, it is very messy code and it is a miracle that it works without many bugs.
Initialization is also very slow: with 6-men tables on an SSD it still takes a minute or so.
Unfortunately there are not many alternatives; Syzygy is probably faster, but I like having distance to mate.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Search readability

Post by bob »

mar wrote:
bob wrote:You should look at what templates do in Eugene's "egtb.cpp" C++ code. It takes longer to compile that one file than it does to compile the rest of the Crafty source. The Crafty source is 40K lines of code including comments; egtb.cpp is about 6600 lines. Yet egtb.cpp compiles into twice as much code as all of Crafty.
Templates are very useful. C++ compiles slower; that's normal, as it offers more.
How do you do generic containers in C?
Usually a bunch of ugly void * with casts, or manual copy-pasting/multiple includes with macro-redefinition magic. Yuck.
The same problem goes for the stdlib qsort: it uses a callback function for comparisons. std::sort uses templates and can run much faster (especially on elementary types).

So one can use or abuse templates; that doesn't make them any less useful. They can save a lot of time, blood, and sweat.

Also, recent compilers (especially with LTO) automatically merge functions that compile to identical machine code (unless inlined, of course).
Didn't say they were bad. Said they increase the cache footprint, which is harmful to performance. In general I go low-level for speed, even though there might be many better languages for the pure expression of concepts. Code bloat, however, is not something I want in my chess engine.