What would you do differently

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: What would you do differently

Post by lucasart »

Bloodbane wrote:I recently finished one rewrite of my engine, and since I am still lacking in software development experience it seems I will need to do another since there are still some parts which are just horribly done/designed. My advice is based for C++, though 3/4 is for all languages.

+Make it as readable as possible. Heavy usage of enums, namespaces and comments. Before I rewrote my engine I stopped developing for a month, and for a while afterwards I had trouble remembering what some obscure thing was doing. After the rewrite things are better, but not nearly as good as I'd like them to be.

+Simplicity is important. I've noticed that simple data structures are better for optimization down the line; complex data structures might be good now, but after changes, not so much. Also, see the first point.

+Helper functions should be used. I didn't really appreciate code like the snippet below before, since I thought it was just a waste of space, but helpers are very good for debugging (if you slap an assert inside one, that is) and they improve readability, which brings us back to point 1.

Code: Select all

int matedIn(int ply)
{
    return -mateScore + ply;
}
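With the assert Bloodbane describes slapped inside, the helper might look like this sketch (the mateScore value and the maxPly limit are illustrative assumptions, not taken from his engine):

```cpp
#include <cassert>

const int mateScore = 32000; // illustrative mate value
const int maxPly = 128;      // illustrative search-depth limit

int matedIn(int ply)
{
    assert(ply >= 0 && ply < maxPly); // catch a corrupted ply close to the cause
    return -mateScore + ply;
}
```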


+Use the features of C++11 as much as possible. C++11 has changed C++ completely, and many things which were hard to use or had to be written yourself before now have library solutions. For example, <random> gets rid of the need for your own RNG, <regex> makes parsing UCI input easy, etc.

Of course, using C++ while saying readability is important can seem like a contradiction. Everyone knows that some C++ code is pretty unreadable. Take for example this thing I wrote today. Can you figure out what it does before reading below?

Code: Select all

auto seed = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now().time_since_epoch()).count();
That thing generates a random seed by taking the number of nanoseconds since the epoch. Nothing more, but there is so much stuff in the way. Still, even considering the ugly cases (check out "variadic templates are funadic"), I think C++ has enough good features to compensate for that. Of course, this opinion is subject to change, and maybe ten years down the line I'll hate C++, but for now I love it.
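For reference, such a seed would typically be fed to a standard engine like std::mt19937_64; a minimal sketch (the function name and the 0..63 square range are illustrative):

```cpp
#include <chrono>
#include <cstdint>
#include <random>

// Seed a Mersenne Twister from the nanosecond clock and draw a square index.
int random_square()
{
    auto seed = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::high_resolution_clock::now().time_since_epoch()).count();
    std::mt19937_64 rng(static_cast<std::uint64_t>(seed));
    std::uniform_int_distribution<int> dist(0, 63); // board squares 0..63
    return dist(rng);
}
```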
Thanks, but I am interested in *design* choices, not programming per se (I know how to program, how to write robust and readable code etc.)

PS: why do you need regex for UCI parsing? It's completely trivial with string and stringstream, all from C++98.
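A minimal sketch of that stringstream approach (the function name is illustrative; "go" and "depth" are standard UCI tokens):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a UCI command line into whitespace-delimited tokens.
std::vector<std::string> tokenize(const std::string& line)
{
    std::istringstream iss(line);
    std::vector<std::string> tokens;
    std::string t;
    while (iss >> t)
        tokens.push_back(t);
    return tokens;
}
```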
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Ron Murawski
Posts: 397
Joined: Sun Oct 29, 2006 4:38 am
Location: Schenectady, NY

Re: What would you do differently

Post by Ron Murawski »

lucasart wrote: * In fact it will be written in pure C, which has many advantages:...
...the code must be as context-free as possible. C++ is the exact opposite of that... Writing certain things in C is tedious, and I really wish there were an alternative to C, but quite frankly I don't consider C++ an acceptable alternative.
Hi Lucas,

I've looked at some C alternatives over the past several years. The best I've found is Vala, which transpiles to C code. Although the quality of the transpiled code is quite good, the development tools are not mature. There are problems finding the correspondence between the code you write and the generated C code, i.e. 'error in line 1257' refers to the auto-generated C code rather than your own Vala code. Windows support for Vala is weak; it is best to develop on Linux. If you avoid OO coding, you will lose about 10% speed compared to optimized C code. Vala's syntax resembles C# and it is pleasant to work with.
https://wiki.gnome.org/Projects/Vala

Another possibility is the Nimrod language. It too transpiles into C code (or C++, or Objective-C). I haven't worked with Nimrod, but I know it is quite fast.
http://nimrod-lang.org/

Rust is fast too.
http://www.rust-lang.org/

Ron
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: What would you do differently

Post by bob »

lucasart wrote:
jdart wrote:
if you were restarting from scratch, what would you do differently?
Test, test and test some more. It is easy to introduce bugs and bugs can have a significant effect on playing strength. So verifying correctness is very important. I have unit tests now for some important functions (disabled in production builds) as well as a lot of asserts, support for perft, etc.

If you are careful there is no reason to avoid incremental movegen. But you have to verify it is working as you expect.

I agree with Pawel about lazy eval. I used to have it but it is quite problematic.

--Jon
Unit tests: completely agree. I am progressing very slowly and validating everything one step at a time. I have unit tests for perft, bench and SEE. The perft and bench tests validate correctness (that, plus git bisect, has worked wonders in the past). They can also be used as a speed measurement to validate non-functional patches that are supposed to speed up the engine.

Asserts: I have them almost everywhere possible. Every time a function is entered, its parameters are checked. The idea is not just to spot bugs but to spot them early in the call tree, so the consequences can easily be linked to the cause...

Phased movegen: I am ready to believe that there is a speed-up, but I am pretty sure it can only be very small.
1. Most nodes are qsearch nodes, where a phased movegen would generate all pseudo-legal captures anyway. No speed-up there (quite the contrary, due to the legality checking that has to be performed).
2. I think the real gain of phasing is when you have a TT move and can skip movegen completely. This can also be done with a legal movegen, so I may do it later, if and when I care that much about raw speed.
3. In practice movegen is only a small part of my profile in DiscoCheck, so doing pseudo-legal and phased movegen is way too much complexity and bug risk for a tiny gain, IMO.

Lazy eval is one of those stupid ideas that looks good at first. Been there, done that, got the T-shirt... No lazy eval for me, thanks. Again, making design decisions based on a dubious compromise between speed and accuracy is already the first sign that it's plainly wrong. I do not compromise accuracy and correctness; speed is important, but it comes after those.
A note on phased move generation. Here are the places where you save, ranked from highest to lowest.

1. No generation until the hash move is searched. Many times you bypass move generation completely. But this doesn't really require a "phased" generation, just avoiding any sort of generation until after searching the hash move; then you need to be able to exclude searching that move again after the move generation is done.

2. Generating captures only saves a LOT of time. In the q-search, generating just the captures (bitboard engines only, of course) avoids a lot of unnecessary effort, both in ordering the captures and in simply generating them.

3. I do killers next, which are guaranteed to be non-captures. I do this before generating non-captures for the same reason as the hash move above.

4. Next I generate the non-captures and add them to the captures I didn't rank highly enough to search during the non-losing-captures-only part of the search. Again you need to eliminate that pesky hash move if it was not a capture, and you need to eliminate the killers that you have already searched.

I think the biggest gain is generating captures only, since that eliminates looping down rays for sliding pieces and looping over empty destinations for all pieces.
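The ordering described in the four steps above, with the hash move and killers deduplicated, can be sketched as follows. Note this shows only the ordering and deduplication logic; the real saving comes from generating each class of moves lazily, which is stubbed out here (all names are illustrative):

```cpp
#include <algorithm>
#include <vector>

using Move = int; // placeholder move type; 0 means "no move"

// Yield moves in staged order: hash move, captures, killers, then quiets,
// skipping any move already emitted in an earlier stage.
std::vector<Move> staged_order(Move hash_move,
                               const std::vector<Move>& captures,
                               const std::vector<Move>& killers,
                               const std::vector<Move>& quiets)
{
    std::vector<Move> out;
    auto push = [&out](Move m) {
        if (m != 0 && std::find(out.begin(), out.end(), m) == out.end())
            out.push_back(m);
    };
    push(hash_move);                 // searched before any generation
    for (Move m : captures) push(m); // skips the hash move if it was a capture
    for (Move m : killers)  push(m); // killers are quiets, searched early
    for (Move m : quiets)   push(m); // skips hash move and killers
    return out;
}
```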

Lazy eval works when done right. It can be problematic otherwise. But that is true of almost any serious optimization one might do in an engine. Anything can be done poorly.
User avatar
Bloodbane
Posts: 154
Joined: Thu Oct 03, 2013 4:17 pm

Re: What would you do differently

Post by Bloodbane »

Ah sorry, I missed the point a bit because of those things about unit tests and such. How about these; they should be more on topic:

+The move architecture should be made simpler. Currently I pack all the information on a move into 16 bits, but I realized I don't actually have to do this. The only reason I need to is that the entry doesn't otherwise fit into the TT easily. So instead of the unnecessary packing I can just use separate ints for from, to, etc., and pack everything into 16 bits only when I actually need to.

+Getting the PV from the hash table should be avoided, or another method which deals with its flaws should be used. When I implemented it I thought it was brilliant, but it has some big downsides, and not being able to display the correct PV is one of the smallest. For example, unless you guard your PV entries you can accidentally overwrite your root move's entry, leaving you with no move, which crashes the engine. I had fun figuring that one out.

On regexes: you don't _need_ them, but if you're willing to sweat a bit they have the attractive property of making sure the input is precisely in the correct form. The UCI protocol wasn't the best example; it is pretty easy, as you said.
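For reference, the PV-from-hash-table walk criticized above can be sketched like this. Key, Move, the table layout and toy_make are placeholders, not a real engine's API; in a real engine the successor function would make the move and rehash the position:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using Key  = std::uint64_t;
using Move = int; // 0 means "no move stored"

// Toy successor function standing in for "make the move and rehash".
inline Key toy_make(Key pos, Move m) { return pos + static_cast<Key>(m); }

std::vector<Move> pv_from_tt(const std::unordered_map<Key, Move>& tt,
                             Key root, int max_len)
{
    std::vector<Move> pv;
    Key pos = root;
    // Cap the length so repetitions cannot cycle forever. The walk also stops
    // as soon as an entry has been overwritten, truncating the PV (the flaw
    // discussed above).
    while (static_cast<int>(pv.size()) < max_len) {
        auto it = tt.find(pos);
        if (it == tt.end() || it->second == 0)
            break;
        pv.push_back(it->second);
        pos = toy_make(pos, it->second);
    }
    return pv;
}
```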
Last edited by Bloodbane on Wed Jun 18, 2014 9:24 pm, edited 1 time in total.
Functional programming combines the flexibility and power of abstract mathematics with the intuitive clarity of abstract mathematics.
https://github.com/mAarnos
PK
Posts: 893
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: What would you do differently

Post by PK »

right now lazy eval vs no lazy eval is 54%-46% for me. However, the version with lazy eval gets checkmated more often (at least at ultra-short time controls) and gets slaughtered by standard minor-piece positional sacrifices (Nd5 or Nxe5 in the Sicilian) from time to time, failing to notice the problem until a couple of moves later. I think it should be stopped somehow before Rodent reaches 2800.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: What would you do differently

Post by bob »

Bloodbane wrote:Ah sorry, I missed the point a bit because of those things about unit tests and such. How about these; they should be more on topic:

+The move architecture should be made simpler. Currently I pack all the information on a move into 16 bits, but I realized I don't actually have to do this. The only reason I need to is that the entry doesn't otherwise fit into the TT easily. So instead of the unnecessary packing I can just use separate ints for from, to, etc., and pack everything into 16 bits only when I actually need to.

+Getting the PV from the hash table should be avoided, or another method which deals with its flaws should be used. When I implemented it I thought it was brilliant, but it has some big downsides, and not being able to display the correct PV is one of the smallest. For example, unless you guard your PV entries you can accidentally overwrite your root move's entry, leaving you with no move, which crashes the engine. I had fun figuring that one out.
You are on the edge of a precipice. Simple is NOT always better. Take your move generation idea. In a position where you have a move list of 60 moves, do you want to pass over 60 ints, or N*60 ints, where N is the number of different arrays you use (from, to, piece, captured, promote, etc.)? N=6 places 6x the pressure on cache since you have 6x as much memory per move, and it places 6x the pressure on the memory channel for the same reason. The PC is not exactly brimming over with excess memory bandwidth or cache, and then there is the latency issue, since now a single move access will almost certainly touch N different cache lines, with N latency delays on a cache miss.

If you like the "look" of separate arrays, you can always do as I do in Crafty and define macros like From(move), To(move), Piece(move) and such to extract the individual fields "invisibly", so that you get better readability but not slower accesses.
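Such accessors (written here as constexpr functions rather than macros) might look like the following sketch. The bit layout, field widths and names are illustrative, not Crafty's actual encoding:

```cpp
#include <cstdint>

// Illustrative 16-bit move layout: bits 0-5 from square, bits 6-11 to square,
// bits 12-14 promotion piece.
using Move = std::uint16_t;

constexpr Move make_move(int from, int to, int promo = 0)
{
    return static_cast<Move>(from | (to << 6) | (promo << 12));
}

// Readable field extraction; compiles to a shift and a mask.
constexpr int From(Move m)  { return m & 0x3F; }
constexpr int To(Move m)    { return (m >> 6) & 0x3F; }
constexpr int Promo(Move m) { return (m >> 12) & 0x7; }
```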
User avatar
Bloodbane
Posts: 154
Joined: Thu Oct 03, 2013 4:17 pm

Re: What would you do differently

Post by Bloodbane »

I've got a few macros of that kind, but I don't like how there is bit-shifting going on under the hood all the time. The question is whether removing that compensates for the increased memory usage. I haven't tested the idea yet so I don't know how it will go, but I don't think it's doomed to fail. I imagine you've tested the idea at some point?
Functional programming combines the flexibility and power of abstract mathematics with the intuitive clarity of abstract mathematics.
https://github.com/mAarnos
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: What would you do differently

Post by bob »

Bloodbane wrote:I've got a few macros of that kind, but I don't like how there is bit-shifting going on under the hood all the time. The question is whether removing that compensates for the increased memory usage. I haven't tested the idea yet so I don't know how it will go, but I don't think it's doomed to fail. I imagine you've tested the idea at some point?
zillions of times.

1. A shift instruction is typically a 1/2-clock-cycle operation. Ditto for an AND (to extract some of the bits you will want to SHIFT them over, then AND to scrub the rest).

2. When you have a cache miss, you fetch one block rather than N. A BIG win each time a miss occurs, since a miss requires fetching 64 bytes for one block. And here (with separate from, to, etc.) you end up fetching non-consecutive blocks of memory, further adding to your misery.

I suspect you will find the same thing I did, because I originally separated from, to and such myself. I would always suggest testing yourself, but in this case, I would be beyond surprised if packing is not significantly faster.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: What would you do differently

Post by bob »

PK wrote:right now lazy eval vs no lazy eval is 54%-46% for me. However, the version with lazy eval gets checkmated more often (at least at ultra-short time controls) and gets slaughtered by standard minor-piece positional sacrifices (Nd5 or Nxe5 in the Sicilian) from time to time, failing to notice the problem until a couple of moves later. I think it should be stopped somehow before Rodent reaches 2800.

remember that there are many implementations. Ferret had the best I have seen. Bruce kept an error term for each piece, which gave the largest/smallest positional score that piece alone could produce. If you sum these over the pieces left on the board, you can have a lazy eval that produces absolutely no errors at all, although it is quite pessimistic in nature since pieces rarely produce their largest scores.

The speed gain as an eval matures is significant, however, because adding terms is quite common, and each term adds computational costs that lazy eval can sometimes avoid.
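In outline, the resulting cutoff test might look like this sketch. The names and the way the bound is passed in are illustrative; in the Ferret-style scheme described above, error_bound would be the sum of per-piece maxima for the material left on the board:

```cpp
// If even the largest possible positional swing cannot bring the score inside
// the (alpha, beta) window, the full evaluation can be skipped without error.
bool can_skip_full_eval(int rough_score, int error_bound, int alpha, int beta)
{
    return rough_score - error_bound >= beta ||
           rough_score + error_bound <= alpha;
}
```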

The thing I hate to see anyone do is throw out an idea just because they couldn't get it to work easily. The same goes for storing mate scores in the hash table or not, allowing exact hash matches on PV nodes or not, etc. I hate to give up anything just because it seems hard or tricky to solve.
User avatar
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: What would you do differently

Post by cdani »

Hi.
With my limited experience, and in no particular order:
• King safety is always more important than I think; it's endless... In particular I have also done a test with DiscoCheck, and I'm pretty sure it lacks at minimum some penalty for an open king.
• Find a way to do more dynamic LMR. I'm sure there is a lot of Elo to be gained from this.
• Many times qsearch isn't necessary; many times pruning is a lot better.
• More evaluation concepts, some for specific types of positions, even if it's difficult to keep them in balance.
• Test with long time controls from the start. Even if a change gives very bad results at short time controls, one must try long ones. Sometimes there is a surprise waiting there.

I don’t add other stuff that current Discocheck already has, like order :-).

I will stick with generating all legal moves (unless there is a hash move, or in qsearch) and full attack generation (always). It's simply a lot easier, and the attack information can be used for many things, such as evaluation.