UCI Hash Usage Rules

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: UCI Hash Usage Rules

Post by bob »

Interesting issue.

Since I doubt I will ever do a UCI version of Crafty, I really don't have a dog in this hunt. But I do tend to agree that there are three parts to a chess program that are of interest.

(1) basic program + basic blocks of memory required to make the program functional. IE bit board stuff (magic, etc), book buffers, etc. The program can play legitimate chess using just this amount of memory.

(2) Optional things that are NOT in the category of transposition tables and such. IE egtb buffering, small caches for eval, or pawn scoring or whatever. Things that sometimes bigger is better, but HUGE is not really needed (except for EGTB stuff sometimes).

(3) Optional things that can be arbitrarily big. Normal transposition/refutation table. Other types of hash tables (pawns, king safety, you-name-it). And perhaps the egtb cache here (as opposed to a static-sized egtb index table above).

Would seem to me that if UCI only provides a single "control", it really should be constrained to include (2) and (3) above. Else you have programs that are poorly behaved and which can cause unexpected paging activity that a naive user might not notice at all. I implemented a "smart hash" option in Crafty years ago, which was given a total memory constraint, and it used its own "judgement" as to what to set the usual hash, phash and such to, in order to stay within that limit. I even made the adjustment and size take into consideration the speed of the program and the time control being used, so that it would use less memory for fast games where large hash tables don't help much. Only down-side to this kind of stuff is that the "automatic" part of the code will come up with different answers in different scenarios, which can produce unexpected behavior.
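That kind of budgeting can be sketched in a few lines of C. This is purely illustrative: the split ratios and the names (`plan_memory`, `MemBudget`) are my assumptions, not Crafty's actual code, whose real logic also weighs search speed and the time control.

```c
#include <stddef.h>

/* Hypothetical sketch of a "smart hash" budget: given one total memory
 * limit, carve out the main transposition table, the pawn hash, and a
 * reserve for other caches. Ratios are made up for illustration. */
typedef struct {
    size_t tt_bytes;    /* main transposition table */
    size_t pawn_bytes;  /* pawn hash table */
    size_t other_bytes; /* eval cache, EGTB cache, ... */
} MemBudget;

/* Largest power of two <= n (assumes n >= 1); hash tables are usually
 * rounded to a power of two for fast index masking. */
static size_t round_down_pow2(size_t n) {
    size_t p = 1;
    while (p * 2 <= n) p *= 2;
    return p;
}

MemBudget plan_memory(size_t total_limit) {
    MemBudget b;
    /* Illustrative split: ~75% TT, ~12.5% pawn hash, remainder for
     * everything else. By construction the three never exceed the limit. */
    b.tt_bytes    = round_down_pow2(total_limit / 4 * 3);
    b.pawn_bytes  = round_down_pow2(total_limit / 8);
    b.other_bytes = total_limit - b.tt_bytes - b.pawn_bytes;
    return b;
}
```

A real implementation would hand `tt_bytes` and friends to the individual allocators, and could scale the ratios down for fast time controls as described above.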

If I had my preference, I would likely implement this as an "absolute max RAM limit" which includes ALL of the above. But without specifics in the so-called "UCI standard" that really isn't a standard, that would likely hurt me more than it helps anything. But it would be nice to tell each program exactly how big a sandbox it has to play in. And it would be even better if that were enforced, which just adds more complexity to everything.

There are already enough ways a program can be poorly behaved, without any attempt at correcting those, so worrying about the memory issues is probably also just as likely to not help much. Would be nice to see a real standard about program behavior, including memory usage, and then see everyone follow said standard. But something tells me that horse has already escaped from the barn...
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: UCI Hash Usage Rules

Post by noobpwnftw »

hgm wrote: Sun Jul 15, 2018 4:51 am
syzygy wrote: Sat Jul 14, 2018 8:26 pm Do you believe that yourself? I have been very careful to stick to a single issue here.
Ah, sure. Disputing the meaning of the term 'insisting' was no doubt part of that issue...
So far you have not disputed my point that all UCI engines treat the UCI Hash setting as the (maximum) size of the transposition table. (There may be exceptions, but I don't know of any.)
But when these engines do not use any other tables of significant size (as none of them that I am aware of do), that is not evidence for either interpretation, right? Then they also use the UCI Hash setting as the maximum size of the total, as the TT is the total.

This is why I asked whether you knew any UCI engine that would use significant memory in addition to the Hash (which then would not work for most users without altering the centralized Hash setting that is perfect for almost every other engine). If you could name one, you would have a point.

But of course you failed to provide an answer, which leaves you entirely empty handed.
I can provide an example of a "UCI engine that would use significant memory in addition to the Hash":
If I run Stockfish and set Threads to 512, it will consume 38.2GB more RAM. AFAIK this is unrelated to Hash, and it is the user's responsibility to understand that threads come with an extra memory requirement, or to avoid that setting if they do not want such behavior; I only set Hash to 16MB and its total RAM usage went way beyond that limit.
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: UCI Hash Usage Rules

Post by hgm »

This is normal: Every program needs some space for its code and local variables. On a machine where you can usefully employ 512 threads, 38MB is a mere trifle. It amounts to 75KB per thread. How large do you typically set the Hash on that machine? I'll bet it is a lot more than 256MB. On 1GB the 38MB is just 4% extra, which doesn't qualify as 'significant'. Every user should know you cannot set the Hash at the size of your total memory, as the OS needs space to run too. In practice that often means you can set it at most to half your memory (e.g. 2GB on a 4GB machine). That it then uses 2.038GB wouldn't hurt the performance at all.

The point is not whether you can make the use of memory for 'unusual tables' a very large fraction of the Hash size by setting the latter to a ridiculously low value. Typical users would only set 16MB hash when they are running on hardware that has 32MB in total, so that this is the largest they then can afford. And indeed, 512 threads would completely crash such a machine.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: UCI Hash Usage Rules

Post by syzygy »

bob wrote: Sun Jul 15, 2018 5:44 am Would seem to me that if UCI only provides a single "control" that it really should be constrained to include (2) and (3) above.
The UCI spec doesn't take care of anything one would like, but it does list "NalimovCache" as an option for setting the size of the Nalimov cache.

The question is what the UCI "Hash" option is supposed to mean.

I think regular chess engine users tend to say "hash" when they are referring to the transposition table. Programmers are more likely to be aware that "hash" and "hash table" are more general concepts than transposition table, but even they often say "hash" for "transposition table" when there is little risk of confusion.

The UCI spec was written for programmers, but since the names of UCI options will often be visible to end users, I can see why the author of the UCI spec may have preferred to call the option "Hash" and not "TranspositionTable". This is probably also why the UCI spec uses "hashfull" instead of something like "ttfull". (Or do other people think "hashfull" should report the average fill rate of all hash-table data structures?)

In any event, the name "Hash" would be a very bad name if it were meant to refer to total memory consumption (or even total "variable" memory consumption). There are engines like Brainfish that load gigabytes of book.bin into RAM. Engines using Nalimov use a lot of RAM for storing index tables. A book is not a hash table (well, it can be implemented as a hash table, but I don't think we want to let such technical details be decisive here). Index tables are not hash tables. And most caches are not hash tables.

In my view, the UCI spec just tries to keep things simple for programmers. Practically all engines implement a hash table (i.e. transposition table), and it is certainly useful if the size of this thing can be set in a uniform way. So that's why there is the "Hash" option. And I'm still not aware of any UCI engine that treats the option differently. So any new UCI engine author need not fret about this and can simply use the option to set the size of the transposition table.
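For a new engine author, that conventional treatment is easy to follow: advertise the option, and when it changes, resize only the transposition table. A minimal C sketch (the names `handle_setoption` and `tt_resize` are illustrative placeholders, not any particular engine's API):

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of the conventional meaning of the UCI "Hash" option: it sets
 * the transposition-table size (in MiB) and nothing else. */
static size_t tt_bytes = 0;

static void tt_resize(size_t mib) {
    tt_bytes = mib * 1024 * 1024;  /* real code would (re)allocate here */
}

/* Handle one "setoption" line from the GUI, e.g.
 * "setoption name Hash value 256". Returns 1 if the option was handled. */
int handle_setoption(const char *line) {
    const char *p = strstr(line, "name Hash value ");
    if (!p) return 0;
    long mib = strtol(p + strlen("name Hash value "), NULL, 10);
    if (mib < 1) mib = 1;          /* clamp to the advertised minimum */
    tt_resize((size_t)mib);
    return 1;
}
```

The engine would advertise the option at startup with something like `option name Hash type spin default 16 min 1 max 131072`, and any GUI can then set it uniformly across engines.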
Else you have programs that are poorly behaved and which can cause unexpected paging activity that a naive user might not notice at all. I implemented a "smart hash" option in Crafty years ago, which was given a total memory constraint, and it used its own "judgement" as to what to set the usual hash, phash and such to, in order to stay within that limit. I even made the adjustment and size take into consideration the speed of the program and the time control being used, so that it would use less memory for fast games where large hash tables don't help much. Only down-side to this kind of stuff is that the "automatic" part of the code will come up with different answers in different scenarios, which can produce unexpected behavior.
The xboard protocol seems to have a "memory" option for that. I doubt that many serious programs implement it.
If I had my preference, I would likely implement this as an "absolute max RAM limit" which includes ALL of the above. But without specifics in the so-called "UCI standard" that really isn't a standard, that would likely hurt me more than it helps anything. But it would be nice to tell each program exactly how big a sandbox it has to play in. And it would be even better if that were enforced, which just adds more complexity to everything.
But would we really want that complexity?
Total fairness in engine-engine matches can be achieved by running each engine on a separate computer.
If that is not an option, we'll just have to rely on the programs behaving well.
There are already enough ways a program can be poorly behaved, without any attempt at correcting those, so worrying about the memory issues is probably also just as likely to not help much.
Exactly.
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: UCI Hash Usage Rules

Post by noobpwnftw »

hgm wrote: Sun Jul 15, 2018 11:33 am This is normal: Every program needs some space for its code and local variables. On a machine where you can usefully employ 512 threads, 38MB is a mere trifle. It amounts to 75KB per thread. How large do you typically set the Hash on that machine? I'll bet it is a lot more than 256MB. On 1GB the 38MB is just 4% extra, which doesn't qualify as 'significant'. Every user should know you cannot set the Hash at the size of your total memory, as the OS needs space to run too. In practice that often means you can set it at most to half your memory (e.g. 2GB on a 4GB machine). That it then uses 2.038GB wouldn't hurt the performance at all.

The point is not whether you can make the use of memory for 'unusual tables' a very large fraction of the Hash size by setting the latter to a ridiculously low value. Typical users would only set 16MB hash when they are running on hardware that has 32MB in total, so that this is the largest they then can afford. And indeed, 512 threads would completely crash such a machine.
I said 38.2 Gigabytes. I usually set Hash to 16 Gigabytes on that machine; in turn it uses more than 200% of the RAM I would've expected.
I think it is a matter of user knowledge how engines behave under certain settings. We'd only say there is a problem when an engine intentionally goes too far, for example if it ignores whatever settings you have and runs many extra busy worker threads, or uses significantly more extra memory by your definition; cases like mine are probably not intended, just a matter of consequences.

Engines can have many more settings than Threads and Hash, and they usually come with some manual on what they do. At the end of the day we aren't dictating what engine authors should or should not do; it is up to the user to decide those settings and compensate if necessary.
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: UCI Hash Usage Rules

Post by hgm »

Oh, sorry, I misread.

That seems pretty absurd, even for 512 threads (which definitely would not count as main-stream use). It is 75MB per thread. What the heck would it need all that memory for? Are you sure this is not an artifact of the task manager, counting separately for each thread what is in fact shared?
noobpwnftw wrote: Sun Jul 15, 2018 12:42 pm Engines can have many more settings than Threads and Hash, they usually come with some manuals on what they do. By the end of the day we aren't dictating what engine authors should or should not do, it is up to the user to decide those settings and compensate if necessary.
But users cannot do that if they do not know what the settings mean. No one ever reads manuals, and most engines come entirely without manuals.

Anyway, it appears that the only reason Stockfish isn't generally considered a broken engine is that virtually no one in the world would run it with 512 threads. If the typical user had 512 cores and 32GB of memory, and every other engine would run in that with 512 threads and 16GB of Hash, they would surely complain. You require them:
1) to read a manual
2) calculate by hand how many threads and/or how much hash they can afford on Stockfish
3) somehow figure out how to optimally solve the conflict between leaving cores unused or having a smaller hash
4) adapt the GUI's Hash setting every time they switch between Stockfish and another engine
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: UCI Hash Usage Rules

Post by noobpwnftw »

I am quite sure that the memory usage per thread is real, but it probably comes from the OS; threads are more resource-heavy under Linux.

This might be one of those cases where even the author may not know the engine is behaving absurdly under certain circumstances.

If a user does care much about what exactly is going on with the engines, he really should read the manuals and do measurements of the actual resource usage and decide what he should put in the settings.

How about inventing new options like Max Memory, Max Cores, Max Disk IOPS, etc, really, if one would push for such standardization?
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: UCI Hash Usage Rules

Post by hgm »

Unfortunately UCI seems to be set in stone, and in any case it is not up to us to change or extend its specification. People would simply ignore it. There even isn't a uniform name for the Threads option, many years after SMP became common.

The specs do define a set of 'standard' options, however, and it is important that engines attach the same meaning to them, because GUIs might not allow you to set the standard options differently for each engine to compensate for differences in interpretation. 'Hash' is one such standard option.

My point is that virtually no user cares what is going on in the engine at all; they just want it to work (optimally), and they want it to be interchangeable with other engines they are using as far as the uniformly applied standard settings are concerned. This was the whole idea of having standard protocols in the first place.

To come back to the case of Stockfish: if the 38GB you mention is 16GB Hash + 22GB other stuff, and the other stuff is allocated per thread, it would mean that even a single-threaded version would use 44MB extra memory. As typical PCs have 4 cores, that would make 176MB for 4 threads. Such PCs typically have 4 or 8GB memory, so likely hash settings are 2 or 4GB for analysis, and 1 or 2GB for engine-engine games. The 176MB is still only a minor fraction of that, probably not enough to make things come to a grinding halt.

But it is getting dangerously close to cheating in engine-engine games. I forgot what exactly the CCRL test conditions are, but as they try to emulate conditions on obsolete computers, to keep in line with old test results, they might test at a Hash setting of only 64MB. Then using 176MB extra would give the engine a quite unfair advantage over its opponents, inflating its rating. If this extra memory use could not somehow be curtailed by options, it would IMO be reason to disqualify the engine altogether. OTOH, the engine might scale this extra memory use to be always small (per thread) compared to the hash size, and the 44MB might be only for a Hash setting of 16GB, compared to which it is quite negligible (the extreme value of 38GB only occurring because of the extraordinarily large number of threads).
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: UCI Hash Usage Rules

Post by bob »

noobpwnftw wrote: Sun Jul 15, 2018 4:17 pm I am quite sure that the memory usage per thread is real, but probably it came with the OS, threads are more resource heavy under Linux.

This might be one of those cases that sometimes even the author may not know the engine is behaving absurd under certain circumstances.

If a user does care much about what exactly is going on with the engines, he really should read the manuals and do measurements of the actual resource usage and decide what he should put in the settings.

How about inventing new options like Max Memory, Max Cores, Max Disk IOPS, etc, really, if one would push for such standardization?
I don't quite follow that "threads are more resource heavy under Linux." In fact, it is the exact opposite. Threads intentionally share everything, whereas processes (via fork()) share very little (at least they share executable code and anything that is not modified, an artifact of the "copy on write" approach fork() uses). Given the choice, threads are the least resource-intensive way to do multiprocessing.
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: UCI Hash Usage Rules

Post by noobpwnftw »

bob wrote: Mon Jul 16, 2018 5:41 am I don't quite follow that "threads are more resource heavy under Linux." In fact, it is the exact opposite. Threads intentionally share everything, whereas processes (via fork()) share very little (at least they share executable code and anything that is not modified, an artifact of the "copy on write" approach fork() uses). Given the choice, threads are the least resource-intensive way to do multiprocessing.
This is wrong. What you think is almost true under Windows; however, threads under Linux are essentially fully-loaded "processes". They share nothing more than what you would get from fork(); actually, in the recent past they were implemented exactly via fork().