What would you do differently

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: What would you do differently

Post by lucasart »

bob wrote:
lucasart wrote:
jdart wrote:
if you were restarting from scratch, what would you do differently?
Test, test and test some more. It is easy to introduce bugs and bugs can have a significant effect on playing strength. So verifying correctness is very important. I have unit tests now for some important functions (disabled in production builds) as well as a lot of asserts, support for perft, etc.

If you are careful there is no reason to avoid incremental movegen. But you have to verify it is working as you expect.

I agree with Pawel about lazy eval. I used to have it but it is quite problematic.

--Jon
Unit tests: completely agree. I am progressing very slowly and validating everything one step at a time. I have unit tests for perft, bench, and SEE. The perft and bench tests validate correctness (that plus git bisect has worked wonders in the past). They can also be used as a speed measurement to validate non-functional patches that are supposed to speed up the engine.

Asserts: I have them almost everywhere possible. Every time a function is entered, its parameters are checked. The idea is not just to spot bugs but to spot them early in the call tree, so the consequences can easily be linked to the cause...
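The fail-early idea can be sketched like this (a minimal illustration with made-up bounds and a dummy function body, not code from any of the engines discussed):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bounds: the usual 64 squares and 6 piece types.
enum { PIECE_NB = 6, SQUARE_NB = 64 };

// Checking arguments on entry makes a bad value fail close to its cause,
// instead of deep in the call tree where the symptom finally shows up.
uint64_t attacks_from(int piece, int square) {
    assert(0 <= piece && piece < PIECE_NB);    // fail fast on a bad piece
    assert(0 <= square && square < SQUARE_NB); // fail fast on a bad square
    // ... a real attack lookup would go here; return a dummy one-bit board
    return uint64_t(1) << square;
}
```

Compiled with -DNDEBUG for production builds, the checks disappear entirely, which is why they cost nothing in release mode.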

Phased movegen: I am ready to believe that there is a speed-up, but I am pretty sure it can only be very small.
1. Most nodes are QS nodes, where a phased movegen would generate all pseudo-legal captures anyway. No speed-up there (quite the contrary, due to the legality checking that has to be performed).
2. I think the real gain of phasing is when you have a TT move and can skip movegen completely. This can also be done with a legal movegen, so I may do it later, if and when I care that much about raw speed.
3. In practice, movegen is only a small part of my profile in DiscoCheck, so doing pseudo-legal and phased movegen is way too much complexity and risk of bugs for a tiny gain, IMO.

Lazy eval is one of those stupid ideas that looks like a good idea at first. Been there, done that, got the T-shirt... No lazy eval for me, thanks. Again, making design decisions based on a dubious compromise between speed and accuracy is already the first sign that it's plainly wrong. I do not compromise accuracy and correctness; speed is important, but it comes after those.
A note on phased move generation. Here are the places where you save, ranked from highest to lowest.

1. No generation until the hash move is searched. Many times you bypass move generation completely. But this really doesn't require a "phased" generation, just avoiding any sort of generation until after searching the hash move; then you need to be able to exclude searching that move again after the move generation is done.

2. Generating captures only saves a LOT of time. In the q-search, just generating captures (bitboard engines only, of course) avoids a lot of unnecessary effort, both in ordering the captures and in simply generating them.

3. I do killers next, which are guaranteed to be non-captures. I do this before generating non-captures for the same reason as the hash move above.

4. Next I generate the non-captures and add them to the captures that didn't rank high enough to be searched during the non-losing-captures-only part of the search. Again you need to eliminate that pesky hash move if it was not a capture, and you need to eliminate the killers that you have already searched.

I think the biggest gain is generating captures only, since that eliminates looping down rays for sliding pieces, and looping over empty destinations for all pieces.
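The staging described above can be sketched as follows (a toy picker over plain ints standing in for moves; the phase numbering and the pre-filled capture/quiet lists are illustrative assumptions — in a real engine each list would be generated lazily when its phase begins):

```cpp
#include <algorithm>
#include <vector>

// Order of phases: hash move (no generation at all), captures, killers,
// then quiets — filtering out moves already returned by an earlier phase.
struct StagedPicker {
    int hash_move = 0;                 // 0 = no hash move
    std::vector<int> captures, killers, quiets;
    std::vector<int> tried;            // moves already returned
    int phase = 0;

    bool already_tried(int m) const {
        return std::find(tried.begin(), tried.end(), m) != tried.end();
    }
    // Returns the next move to search, or 0 when exhausted.
    int next() {
        for (;;) {
            switch (phase) {
            case 0: // hash move: bypasses move generation completely
                phase = 1;
                if (hash_move) { tried.push_back(hash_move); return hash_move; }
                break;
            case 1: // captures (a real engine generates them only now)
                if (!captures.empty()) {
                    int m = captures.back(); captures.pop_back();
                    if (already_tried(m)) break;      // e.g. hash move was a capture
                    tried.push_back(m); return m;
                }
                phase = 2; break;
            case 2: // killers: quiet moves tried before generating all quiets
                if (!killers.empty()) {
                    int m = killers.back(); killers.pop_back();
                    if (already_tried(m)) break;
                    tried.push_back(m); return m;
                }
                phase = 3; break;
            case 3: // remaining quiet moves
                if (!quiets.empty()) {
                    int m = quiets.back(); quiets.pop_back();
                    if (already_tried(m)) break;      // skip hash move / killers
                    tried.push_back(m); return m;
                }
                phase = 4; break;
            default:
                return 0;
            }
        }
    }
};
```

The `tried` list is linear here for clarity; engines usually just compare against the hash move and the two killer slots directly.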

Lazy eval works when done right. It can be problematic otherwise. But that is true of almost any serious optimization one might do in an engine. Anything can be done poorly.
Thank you. That confirms my intuition. I already do 2 in QS. I will probably also do 1, but only if the gain is really measurable. My move ordering is perhaps different from yours, though. What worked best in DiscoCheck is good captures by descending SEE, before quiet moves (in search), then bad captures by descending SEE. For QS I order captures by MVV/LVA, which is slightly better than SEE.
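A minimal MVV/LVA scoring sketch (piece indices and the scaling factor are arbitrary choices for illustration; any encoding where the victim dominates and the attacker breaks ties orders captures the same way):

```cpp
// Most Valuable Victim / Least Valuable Attacker: sort captures so the
// biggest victim comes first, and among equal victims the cheapest
// attacker comes first. Piece values here: 0 = pawn .. 4 = queen.
inline int mvv_lva(int attacker, int victim) {
    return victim * 8 - attacker;  // victim scaled up, attacker subtracted
}
```

Sorting captures by this score descending gives e.g. PxQ before QxQ before PxP, which is exactly the QS ordering being described.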
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: What would you do differently

Post by lucasart »

Bloodbane wrote:I've got a few macros of that kind there, but I don't like how there is bit-shifting going on under the hood all the time. The question is whether removing that compensates for the increased memory usage. I haven't tested the idea yet, so I don't know how it will go, but I don't think it's doomed to fail. I imagine you've tested the idea at some point?
I solve this problem as follows:
1. The move type has the strict minimum of info, which is fsq, tsq, and the promotion piece (no flags).
2. I do the compression/decompression all at once. When a function is given a compressed move and needs to use its components, I simply unpack it into 3 ints and work with those throughout the function. And vice versa (the function calculates 3 ints and packs them at the end to return a move).

Despite the fact that I am writing C code (no C++), I never use macros or even inline. I use extern functions implemented in source files (never in header files, to keep those clean). I let gcc's link-time optimization do the job of deciding what to inline at the code generation step, which is done at link time (i.e. across all compilation units, which are bytecode at this stage). The advantages of functions over macros are obvious (debugging, type safety, avoiding subtle syntactic traps when you forget to parenthesize everything in your macros or forget the famous "do {...} while (0)" hack...).
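The pack/unpack-at-once scheme might look like this (the 6+6+3 bit field widths are an assumption for illustration; DiscoCheck's actual encoding may differ):

```cpp
#include <assert.h>
#include <stdint.h>

// from-square and to-square need 6 bits each (0..63); the promotion piece
// fits in 3 bits (0 = none), so the whole move fits in 16 bits.
typedef uint16_t move_t;

static inline move_t move_pack(int fsq, int tsq, int prom) {
    assert(0 <= fsq && fsq < 64 && 0 <= tsq && tsq < 64 && 0 <= prom && prom < 8);
    return (move_t)(fsq | (tsq << 6) | (prom << 12));
}

// Unpack once at function entry, then work with the 3 ints throughout.
static inline void move_unpack(move_t m, int *fsq, int *tsq, int *prom) {
    *fsq  = m & 63;
    *tsq  = (m >> 6) & 63;
    *prom = (m >> 12) & 7;
}
```

Written as plain C so it matches the "C, no C++" style of the post; the entry assert doubles as documentation of the valid ranges.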

I can't believe the C guy is telling the C++ guy not to use macros. Shouldn't it be the other way around? ;)
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: What would you do differently

Post by Desperado »

Hi.

I have rewritten Nemo a vast number of times, and currently I am doing so again. The impressions burned most strongly into my brain have been:

* Hiding implementations
* encapsulation of modules / classes ( as few interfaces as possible )

Arguing that these concepts will slow down the engine in some way is simply nonsense. They help a lot when making major changes anywhere. You can reassemble your engine as you like.
Rewriting your engine ( and not only once in 4 years ) may become a major way to improve it.

You will see after a while that there is code you have not touched for a long time, for good reasons, which shows that it is stable and hard to improve in any way.

You may realize what your real preferences and philosophy are, and you will come closer to them step by step.

Nemo was written in C; now I am using more or less a C wrapper :-).
I mean using basic C++ features. Encapsulation works better in OO than in procedural programming, and that, for example, is very important to my philosophy.
So now I am using some kind of pseudo C++. In more detail, that means a lot of function collections and data collections encapsulated in classes, where the interfaces are absolutely clear and there is no doubt about where data and functionality belong. ( All possible with modules, but not as strict. )

These are just my 2 cents.

Btw, having rewritten my engine countless times, I am already aware that there will always be a point where you wonder: what am I doing differently than before, and why is the engine worse at some point than it was before?

I guess you will begin to really "work" again once you get within 100-150 Elo points of the level you already reached with your current engine.

No way out, trust me !

Good luck !
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: What would you do differently

Post by lucasart »

cdani wrote:Tests with long time controls from the start. Even if the change gives very bad results at short time controls, one must try long ones. Sometimes there is a surprise waiting there...
This is always the bottleneck. The truth is that, even using optimal early stopping (a sequential Wald test), detecting small Elo changes with good probability while rejecting regressions with good probability is extremely costly.

In DiscoCheck I was using:
* 4 Elo resolution (i.e. a sequential Wald test with elo1 = elo0 + 4 and alpha = beta = 5%).
* STC pre-filtering (with bounds shifted down by 1 Elo to account for the LTC surprise).
* STC = 1+0.01 and LTC = 2+0.02. I know it's very fast, but it's the only acceptable compromise with the time and CPU resources I have.
* Occasionally I did an LTC of 3+0.03 instead, for king safety or passed-pawn related stuff, or search patches which may behave differently at higher depth. But I really cannot push it further without compromising the Elo resolution, which would be far worse.
* Every now and then I ran a regression test of 15k games against the previous released version at 6+0.05 (50x faster than 5'+3" standard blitz). That way I have safe points to revert to when I end up with a regression.
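For reference, the sequential Wald test's log-likelihood ratio in the simple wins/draws/losses approximation can be sketched like this (a Gaussian approximation of the trinomial model, in the style of early SPRT testing frameworks; not lucasart's actual testing code, and not the modern pentanomial version):

```cpp
#include <cmath>

// Convert an Elo difference into an expected score in [0, 1].
double elo_to_score(double elo) {
    return 1.0 / (1.0 + std::pow(10.0, -elo / 400.0));
}

// Log-likelihood ratio of H1 (elo = elo1) vs H0 (elo = elo0) given a
// wins/draws/losses record. The test stops and accepts H1 when
// llr >= log((1-beta)/alpha), and accepts H0 when llr <= log(beta/(1-alpha)).
double sprt_llr(int wins, int draws, int losses, double elo0, double elo1) {
    int n = wins + draws + losses;
    if (n == 0) return 0.0;
    double w = double(wins) / n, d = double(draws) / n;
    double s   = w + d / 2.0;          // observed mean score per game
    double var = w + d / 4.0 - s * s;  // per-game score variance
    if (var <= 0.0) return 0.0;
    double s0 = elo_to_score(elo0), s1 = elo_to_score(elo1);
    // Gaussian approximation of the likelihood ratio of the two hypotheses.
    return (s1 - s0) * (2.0 * s - s0 - s1) / (2.0 * var / n);
}
```

With alpha = beta = 5% the stopping bounds are about ±2.94; the "4 Elo resolution" in the post corresponds to elo1 = elo0 + 4.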
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
tpetzke
Posts: 686
Joined: Thu Mar 03, 2011 4:57 pm
Location: Germany

Re: What would you do differently

Post by tpetzke »

Desperado wrote:
* Hiding implementations
* encapsulation of modules / classes ( as few interfaces as possible )
I do exactly the same, and so far it has saved me from rewriting the engine. I was always able to just replace the module, or the part of the module, I did not like. And sometimes the changes affected hundreds of lines of code (e.g. when I completely changed my move generation to use templates instead of if (stm == WHITE)).
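The template trick can be sketched on pawn pushes (a generic illustration, not iCE's actual code): the side to move becomes a compile-time parameter, so the if (stm == WHITE) branch is resolved at instantiation time and each specialization contains only its own direction:

```cpp
#include <cstdint>

enum Color { WHITE, BLACK };

// Stm is known at compile time, so the compiler folds the branch away and
// emits one specialized shift per instantiation — no runtime test of stm.
template<Color Stm>
uint64_t pawn_pushes(uint64_t pawns, uint64_t empty) {
    if (Stm == WHITE)
        return (pawns << 8) & empty;  // white pawns move up the board
    else
        return (pawns >> 8) & empty;  // black pawns move down
}
```

Call sites then dispatch once, e.g. `stm == WHITE ? pawn_pushes<WHITE>(p, e) : pawn_pushes<BLACK>(p, e)`, instead of branching inside every generator loop.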
Thomas...

=======
http://macechess.blogspot.com - iCE Chess Engine
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: What would you do differently

Post by Rein Halbersma »

lucasart wrote:I'm in the process of rewriting DiscoCheck. Of course, a lot of quasi copy paste makes it much easier than actually writing an engine from scratch. But I'm resisting the temptation of doing too much copy paste, because the goal of the exercise is to correct many of the wrong design choices I came to regret in DiscoCheck. Often the right design is not obvious at first, and comes with experience and how it interacts with stuff done much later in the engine.

Generally speaking the philosophy is to choose robustness, simplicity and flexibility over raw speed. For instance I will not use pseudo legal and phased move generator. That is just asking for bugs and problems down the road, for a speed gain that is not sufficient to justify it IMO.

My question, to all those who've written chess engines is: if you were restarting from scratch, what would you do differently?
The two hardest problems to me are:
1) Optimal (i.e. cache/memory friendly) data layout. Not just the Position class/struct, but also the data for move generation, the search stack, etc. OO encapsulation gives clean interfaces very easily but sub-optimal performance (even without virtual and new, because of bad cache layout). Data-oriented design seems better, but it is harder to design and it introduces globals.
2) Managing the unmaintainable complexity of 800+ line search functions. I have been experimenting with parameterized search enhancements, where you have a 20-line search skeleton that calls around 7 groups of search enhancements from configurable hook functions (e.g. between TT lookup and move generation, between move generation and move sorting, etc.). The search enhancements are encapsulated in a class, are independent of each other, and propagate a struct of a boolean and a score between the various steps in the skeleton (the boolean indicates a cutoff, of course).
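A toy version of such a hook-driven skeleton might look like this (all names are invented for illustration, and the node state is reduced to a few ints so the control flow is visible without a real engine):

```cpp
#include <functional>
#include <vector>

struct Node { int depth, alpha, beta; };
struct Step { bool cutoff; int score; };  // a hook may end the node early

using Hook = std::function<Step(const Node&)>;

// The skeleton stays tiny: it runs the configured enhancements (where e.g.
// TT probe, stand-pat or null move would sit) and returns as soon as one of
// them reports a cutoff with a score.
int search(const Node& node, const std::vector<Hook>& before_moves) {
    for (const Hook& h : before_moves) {
        Step s = h(node);
        if (s.cutoff) return s.score;
    }
    // ... the move loop would go here; return a placeholder fail-low score
    return node.alpha;
}
```

Each enhancement lives in its own hook, so enabling, disabling, or reordering them never touches the skeleton itself.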

Language choice seems largely a matter of personal expertise. However, I am a firm believer in using library code wherever possible (which for C++ means exploiting STL/Boost containers and algorithms, instead of trying to reinvent those wheels).
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: What would you do differently

Post by mar »

Rein Halbersma wrote:However, I am a firm believer in using library code wherever possible (which for C++ means exploiting STL/Boost containers and algorithms, instead of trying to reinvent those wheels).
Sorry, but I disagree :) For example, my simple HashMap beats unordered_map easily (using the same hash function);
the implementation is much simpler (= much less code) than in the STL,
not to mention that the STL suffers from some design issues, and the various STL implementations also vary in quality:
I have seen one implementation of a particular container be 2x slower than another on a different platform!
(And no, it had nothing to do with compiler/optimizations; in fact the slower implementation used a better compiler.)
Seriously, "no need to reinvent the wheel" already sounds like a religious mantra and a not-so-convincing excuse to me :)
So no, thanks - my wheel fits my design and style, with no hidden surprises; I have full control and full understanding of it.
I simply add functionality as needed (in fact, I will most likely drop red-black trees, as they are useless to me as a container),
instead of depending on 90 MB worth of Boost code only to use a tiny fraction of it ;)
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: What would you do differently

Post by Rein Halbersma »

mar wrote:
Rein Halbersma wrote:However, I am a firm believer in using library code wherever possible (which for C++ means exploiting STL/Boost containers and algorithms, instead of trying to reinvent those wheels).
Sorry, but I disagree :) For example, my simple HashMap beats unordered_map easily (using the same hash function);
the implementation is much simpler (= much less code) than in the STL,
not to mention that the STL suffers from some design issues, and the various STL implementations also vary in quality:
I have seen one implementation of a particular container be 2x slower than another on a different platform!
(And no, it had nothing to do with compiler/optimizations; in fact the slower implementation used a better compiler.)
Seriously, "no need to reinvent the wheel" already sounds like a religious mantra and a not-so-convincing excuse to me :)
So no, thanks - my wheel fits my design and style, with no hidden surprises; I have full control and full understanding of it.
I simply add functionality as needed (in fact, I will most likely drop red-black trees, as they are useless to me as a container),
instead of depending on 90 MB worth of Boost code only to use a tiny fraction of it ;)
Ah, but what every chess programmer calls a hash table is not the same thing as std::unordered_map or boost::map. The latter do dynamic allocation of new nodes and expand automatically. What chess programs use is more appropriately known as a bounded cache (N-way associative for N-element bucket replacement)!

Of course any reasonable handcrafted solution will beat STL/Boost (which also provide node-based iterators that come with extra space overhead that is rarely needed in a game engine). I also wrote my own, but I use a std::vector as the storage container rather than allocating my own memory.

So, more accurately: I would prefer library solutions over equivalent handwritten code (std::unordered_map not being equivalent to what is required for a TT). I wouldn't dare to claim a 2x improvement for such building blocks ;-) And that doesn't even scratch the surface of what modern high-quality libraries offer in terms of exception safety, move semantics, concurrency, etc. Very hard to get right yourself.
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: lazy eval

Post by jdart »

If you only used the static eval for generating a cutoff in the q-search, then a correct implementation of it would be non-problematic, I think. But the eval is also commonly used for pruning decisions at higher depths, as well as for setting various thresholds (e.g. I vary null-move depth based on the eval). Once you have an inexact score for the eval, it is very hard to make these other places work properly. (You could possibly do a more exact eval only when you need it, but that is a more complex implementation.)
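For concreteness, the scheme under discussion might be sketched like this (a generic margin-based lazy eval, not Arasan's former implementation; the margin must bound the maximum swing the skipped terms can ever produce, which is exactly what makes the returned score inexact):

```cpp
#include <functional>

// Compute the cheap terms first (say, material + PST); if even the largest
// possible contribution of the skipped terms cannot bring the score back
// inside the (alpha, beta) window, return a bounded estimate without paying
// for the expensive part of the evaluation.
int lazy_eval(int cheap, std::function<int()> expensive,
              int alpha, int beta, int margin) {
    if (cheap - margin >= beta)  return cheap - margin;  // fails high anyway
    if (cheap + margin <= alpha) return cheap + margin;  // fails low anyway
    return cheap + expensive();  // inside the window: full evaluation
}
```

The early-out branches return a score that is off by up to `margin`, which is harmless for a q-search cutoff but, as the post argues, poisons any later decision (null-move thresholds, pruning margins) that consumes the eval as if it were exact.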

--Jon
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: What would you do differently

Post by mar »

Rein Halbersma wrote:Ah, but what every chess programmer calls a hash table is not the same thing as std::unordered_map or boost::map. The latter do dynamic allocation of new nodes and expand automatically. What chess programs use is more appropriately known as a bounded cache (N-way associative for N-element bucket replacement)!
Really? :lol: C'mon Rein, you completely twisted what I wrote! We both know what we were talking about; it was not about the transposition table.
So, more accurately: I would prefer library solutions over equivalent handwritten code (std::unordered_map not being equivalent to what is required for a TT). I wouldn't dare to claim a 2x improvement for such building blocks ;-) And that doesn't even scratch the surface of what modern high-quality libraries offer in terms of exception safety, move semantics, concurrency, etc. Very hard to get right yourself.
Obviously we have a different measure of quality; of course, if you are happy with STL/Boost, good for you.

Actually, on the latest Ubuntu (g++) on my i7, the numbers are as follows:
std::unordered_map 6.57s
handwritten 3.00s

That's 2x enough for me ;)
But performance wasn't even the point, just a bonus.

If you don't believe me, see for yourself (my code [simplified - omitting Erase and using std::vector instead of Array] is myhashmap.h, i.e. the third output; the second is STL unordered_map, and the first is the hashmap of the programmer who originally wrote the benchmark - it beats mine on msc/64-bit): www.crabaware.com/hashmap/hashmap_bench2.zip
msc (Dinkumware) is 4.0s vs 3.6s on the same machine.

Anyway, I have already polluted Lucas' thread more than I intended to... so I'll stop here :)