What would you do differently

lucasart · Post by **lucasart** » Wed Jun 18, 2014 7:18 am

I'm in the process of rewriting DiscoCheck. Of course, a lot of quasi copy paste makes it much easier than actually writing an engine from scratch. But I'm resisting the temptation of doing too much copy paste, because the goal of the exercise is to correct many of the wrong design choices I came to regret in DiscoCheck. Often the right design is not obvious at first, and comes with experience and how it interacts with stuff done much later in the engine.

Generally speaking the philosophy is to choose robustness, simplicity and flexibility over raw speed. For instance I will not use pseudo legal and phased move generator. That is just asking for bugs and problems down the road, for a speed gain that is not sufficient to justify it IMO.

My question, to all those who've written chess engines is: if you were restarting from scratch, what would you do differently?

mar · Post by **mar** » Wed Jun 18, 2014 10:32 am

I'd say that in my case the rewrite paid off.

There is a couple of things that I'd consider doing differently:

- too complicated do move (or make/unmake if you wish), castling and FRC-related hacks, OTOH I incrementally update material key and psq tables
- complicated isLegal test on hashtable/killers (after a couple of bugs in this routine I simply wrote brute-force validation, generating all possibile move combinations)
- maybe too many flags/move that have to be verified (complicates legality testing)
- full legal move generator instead of phased? but phased has one nice merit, you can write if ( phase >= quiet ) instead of if ( move != hashmove && move != killer1 && move != killer2 && move != good_capture ) etc.
OTOH I can't detect single-reply that way
- use less templated functions with constant parameters (inflates binary, color-blind code seems the way to go - IIRC this is what Steve does in Maverick).
also the constant-argument template approach is problematic when I want to use specific versions based variables where I had to write wrappers
- using simple insertion sort instead of std::sort

What I'm happy with:
- that I used pre-11 C++ (avoiding buggy std libs and clever constructs)

What I did to improve robustness:
- asserts (obviously)
- compile across different compilers (the more the better)
- use valgrind
- use static analysis tools that can detect trivial problems like typos etc. (only drawback is there can be false positives)

OT (random rant): I'm moving completely away from using stl stuff and I've to say I'm happy so far. more freedom, fits my needs (that's THE reason for reinventing a wheel), seems more logical to me (obviously):
consider stupid std::priority_queue for example which by default pulls max instead of min value! I wonder who came up with that

(those who ever wrote Huffman encoding or Dijkstra/A* know what I'm talking about

lucasart · Post by **lucasart** » Wed Jun 18, 2014 11:50 am

Thanks Martin. I played it safe and made my life a lot easier by
* minimizing redundancy in the data model, move is only fsq tsq and promotion (no flag), board does not maintain redundant info like king pos (that can be deduced with a simple lsb). no redundancy means no risk of screwing up and having corrupted objects with parts inconsistent with eacb other.
* legal movegen. not much harder to write than pseudo legal, and makes life SO much easier when you do the movesort stuff afterwards. allows single reply extension which was a clear gain in discocheck and is not possible with a phased mogeven.
* no undo. I use a copy of the board struct to play a move. this makes a whole kind of bug not possible, you know, when your perft is screwed in an apparently non deterministic way due to path dependant bugs that occurs after certain play/undo sequences...
* no path dependant state. all is stored in the board struct and eah has a pointer to the parent board, required to traverse the list of boards backward for repetition detection. very nice and smp compliant design I found in Stockfish.
* chess960 is handled cleanly, and chess is really viewed as a specific case of chess960, rather than post fitting hacks to do chess960 clumsily. my new engine could play chess960 before it could play chess.
* STL: no thanks. Im glad you too have realized that STL is best avoided. In fact it will be written in pure C, which has many advantages:
1. i dont waste my time thinking about how many ways I could solve a trivial problem by introducing some "nice" OO stuff and STL, only to regret it after and rewrite in plain C later on...
2. much better portability and smaller executable (because with c++ you always have to statically link hundreds of thousands of lines of stdlib with your code if you want to port)
3. everything must be KISS compliant and the code must be as context free as possible. c++ is the exact opposite of that... Writing certain things in C is tedious, and I really wish there was an alternative to C, but quite frankly I dont consider C++ as an acceptable alternative. Just want to say Im not a die hard C fan, but I see it as a the least of two evil (other languages are not an option for their lack of performance portability and flexibility). Avoiding C++ is avoiding the temptation to use STL and rely on mystical syntactic construct that will backfire eventually. KISS is my motto, complexity my enemy, so the choice of C is the only acceptable one, however boring and old fashion it may be.

zullil · Post by **zullil** » Wed Jun 18, 2014 12:28 pm

lucasart wrote:Writing certain things in C is tedious, and I really wish there was an alternative to C, but quite frankly I dont consider C++ as an acceptable alternative.

Have you looked at http://golang.org/ ? Don Dailey expressed high regard for this new language, though he felt that it might not be best for world-class chess engines, due to possible speed issues.

http://talkchess.com/forum/viewtopic.ph ... 67&t=49679

mar · Post by **mar** » Wed Jun 18, 2014 1:55 pm

You're welcome. What you do is perfectly reasonable and logical.
What I wrote about phased movegen (comparing phase) is not exactly true because you can do the same with full legal movegen if you store sort-weight along with the move itself, so you can do stuff like if ( move < SORT_KILLERn ) ...
As fot the rest, yes - probably storing king in separate bitboard is perfectly reasonable as well, eliminating redundancy.
I use separate king position (was tempting because I wanted to save space and avoid scanning [actually MSB-scanning should be fast on both x86 and ARM, it has CLZ but no CTZ]).
C vs C++, what I like about C is that it forces you to think low level (so you actually have to think about what you write in terms of performance and memory usage) and that it compiles super-fast compared to C++,
it has well defined name mangling so you can even link libraries compiled using different compilers (assuming you're on the same platform and the library format is the same

OTOH I find it kind of hard to build larger projects in C, so I prefer C++, but I try to avoid as much cleverness as possible.
(for example I have seen way too much obfuscation via type lists recently - imagine symbols with 1k+ characters, imagine how the compiler chokes parsing that mess!).
The KISS approach is IMHO absolutely necessary if one wants to write good, maintainable programs.
It's not easy (though good programs usually seem as if it was really easy to write them), but it pays off in the long run, iterating/compressing as necessary.
So in general, it really doesn't matter what language you choose, but the quality of the code you write does (I know as I've written (and seen) a lot of bad code in the past
and learned the hard way - but as long as I'm able to tell the difference, I can do better, one never stops learning...)

PK · Post by PK » Wed Jun 18, 2014 2:21 pm

* I would definately never introduce lazy eval again. Unfortunately, despite several attempts, I cannot get rid of it (Elo loss) and the best thing I can do is slowly increasing the margin, consumming most of speed gains that way
* I'd start with hashing material scores (or using material table) from day one. It might be an overkill for the first year or so, but at some stage more refined material evaluation is a must.
* Safety-first search. I have added so many rules not to prune / not to reduce certain moves, and most of them gained some Elo, that I might as well make it a habitual decision rather than exception
* Engine fully parametrized from an external file. At some stage most of testing consists of number tweaking, and it is safer to do it without changing actual code.
* Self-contained board+game history class with all the board manipulation routines would go a long way towards SMP search

jdart · Post by **jdart** » Wed Jun 18, 2014 2:59 pm

if you were restarting from scratch, what would you do differently?

Test, test and test some more. It is easy to introduce bugs and bugs can have a significant effect on playing strength. So verifying correctness is very important. I have unit tests now for some important functions (disabled in production builds) as well as a lot of asserts, support for perft, etc.

If you are careful there is no reason to avoid incremental movegen. But you have to verify it is working as you expect.

I agree with Pawel about lazy eval. I used to have it but it is quite problematic.

--Jon

lucasart · Post by **lucasart** » Wed Jun 18, 2014 4:08 pm

jdart wrote:
if you were restarting from scratch, what would you do differently?
Test, test and test some more. It is easy to introduce bugs and bugs can have a significant effect on playing strength. So verifying correctness is very important. I have unit tests now for some important functions (disabled in production builds) as well as a lot of asserts, support for perft, etc.

If you are careful there is no reason to avoid incremental movegen. But you have to verify it is working as you expect.

I agree with Pawel about lazy eval. I used to have it but it is quite problematic.

--Jon

Unit tests: completely agree. I am progressing very slowly and validating everything one step at a time. I have unit test for perft, bench, see. perft and bench tests validate correctness (that plus git bissect has made wonders in the past). They can also be used as a speed measurement to validate non functional patches supposed to speed up the engine.

Assert: I have them almost wherever possible. Every time a function is entered parameters are checked. Idea is not just to spot bugs but spot them early in the call tree so the consequences can easily be linked to the cause...

Phased movegen: I am ready to believe that there is a speed up, but I am pretty sure it can only be very small.
1. most nodes are qs nodes where a phased moven would generate all pseudo legal captures anyway. no speed up there (quite the contrary due to legal move checking having to be performed).
2. I think the real gain of phasing is when you have a TT move, and can skip movegen completely. This can also be done with a legal movegen, so I may do it later, if and when I care so much about raw speed.
3. In practice movegen is only a small part of my profile in DiscoCheck, so doing psejdo legal and phased movegen is way too much complexity and risk of bugs for a tiny gain IMO.

Lazy eval is one of those stupid ideas that looks like a good idea at first. Been there, done that, got the T-shirt... No lazy eval for me, thanks. Again, making design decisions based on dubious compromise between speed and accuracy is already the first sign that's it's plainly wrong. I do not compromise accuracy and correctness, speed is important but it comes after those.

michiguel · Post by **michiguel** » Wed Jun 18, 2014 7:42 pm

PK wrote:* I would definately never introduce lazy eval again. Unfortunately, despite several attempts, I cannot get rid of it (Elo loss) and the best thing I can do is slowly increasing the margin, consumming most of speed gains that way
* I'd start with hashing material scores (or using material table) from day one. It might be an overkill for the first year or so, but at some stage more refined material evaluation is a must.
* Safety-first search. I have added so many rules not to prune / not to reduce certain moves, and most of them gained some Elo, that I might as well make it a habitual decision rather than exception
* Engine fully parametrized from an external file. At some stage most of testing consists of number tweaking, and it is safer to do it without changing actual code.
* Self-contained board+game history class with all the board manipulation routines would go a long way towards SMP search

I have done all those changes over the years incrementally, but in C (no classes). If I have to do a rewrite, I am not sure I will do many major philosophical changes, but I am sure I will do many small ones. What I would certainly do is to integrate the terminal commands, the ini file, and the command line switches into one defined interface. They grew separately and I realize now that should be integrated. I would have a separate files for bookbuilding, tablebase generation, and things like that.

One thing I would consider is to implement my own protocol an drop both UCI and Winboard. I would write an adapter to communicate with the external world. I am tired of the limitations and ambiguity.

Miguel

Bloodbane · Post by **Bloodbane** » Wed Jun 18, 2014 7:57 pm

I recently finished one rewrite of my engine, and since I am still lacking in software development experience it seems I will need to do another since there are still some parts which are just horribly done/designed. My advice is based for C++, though 3/4 is for all languages.

+Make it as readable as possible. Heavy usage of enum, namespace and comments. Before I rewrote my engine I stopped developing for a month and for a while after I had trouble remembering what some obscure thing was doing. After the rewrite things are better but nearly as good as I'd like it to be.

+Simplicity is important. I've noticed that simple data structures are better for optimization down the line, i.e. complex data structures might be good now but after changes not so much. Also, see first point.

+Helper functions should be used. Before I didn't really appreciate something like the code below since I thought they were just waste of space, but they are very good for debugging(if you slap an assert inside one that is) and they improve readability, which brings us back to point 1.

Code: Select all

int matedIn&#40;int ply&#41;
    return -mateScore + ply;

+Use the features of C++11 as much as possible. C++11 has changed C++ completely and many many things which were hard to use or had to be done yourself before now have library solutions. For example, <random> gets rid of the need for your own RNG, <regex> makes parsing UCI-input easy etc.

Of course, using C++ while saying readability is important can seem like a contradiction. Everyone knows that some of the code is pretty unreadable. Take for example this thing I wrote today. Can you figure out what it does before reading below?

Code: Select all

auto seed = std&#58;&#58;chrono&#58;&#58;duration_cast<std&#58;&#58;chrono&#58;&#58;nanoseconds>&#40;std&#58;&#58;chrono&#58;&#58;high_resolution_clock&#58;&#58;now&#40;).time_since_epoch&#40;)).count&#40;);

That thing generates a random seed by taking the amount of nanoseconds since epoch. Nothing more, but there is so much stuff in the way. Still, even when considering the ugly cases(check out "variadic templates are funadic"), I think that C++ has enough good features to compensate for that. Of course, this opinion is subject to change and maybe ten years down the line I'll be hating C++, but for now I love it.

What would you do differently

What would you do differently

Re: What would you do differently

Re: What would you do differently

Re: What would you do differently

Re: What would you do differently

Re: What would you do differently

Re: What would you do differently

Re: What would you do differently

Re: What would you do differently

Re: What would you do differently