bob wrote: First, I thought we were discussing this in the context of a chess engine, not in terms of general-purpose applications?
I am not sure what you want to say with this. We are discussing the best way to do paging. For Chess programs you don't want to page at all. So in the context of a Chess program it is immaterial how you page.
Secondly, operating systems definitely use a fault-in-on-demand strategy at program startup and while it is running. I do not want to page my program, and therefore I limit the size of everything to be certain it will fit in physical memory, at least if nothing else needs to run along with me. But I know that on a hash probe I do not want to read 1/4 megabyte of data when I am only going to use 16 bytes. On the large magic number tables, I do not want to read in 1/4 megabyte to get to a 64-bit or 32-bit value. The list goes on. And if I do have to page, I do not want to have to page out 256K chunks when only 1, 2 or 16 bytes have been changed.
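(To make the above concrete, here is a minimal sketch of what a single probe actually touches; this is not Crafty's code, and the struct layout, table sizes and names are invented for illustration. A hash probe dereferences one 16-byte entry, and a magic lookup reads one 64-bit word, out of tables that are megabytes in size.)

    #include <stdint.h>

    typedef struct {            /* one hash entry: 16 bytes */
        uint64_t key;
        uint64_t data;          /* move, score, depth, bound packed together */
    } HashEntry;

    #define HASH_ENTRIES (1u << 20)         /* illustrative: a 16MB table */
    static HashEntry hash_table[HASH_ENTRIES];

    /* A probe touches exactly one entry -- 16 bytes out of 16MB. */
    static HashEntry *hash_probe(uint64_t zobrist) {
        return &hash_table[zobrist & (HASH_ENTRIES - 1)];
    }

    /* A magic lookup likewise reads a single 64-bit word from a big table. */
    static uint64_t rook_attack_table[64][4096];    /* illustrative layout */
    static uint64_t rook_mask[64], rook_magic[64];

    static uint64_t rook_attacks(int sq, uint64_t occupied) {
        unsigned idx = (unsigned)(((occupied & rook_mask[sq]) * rook_magic[sq]) >> 52);
        return rook_attack_table[sq][idx];
    }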
There seems to be no basis for this. If part of your hash table gets paged out, you want it paged back in as quickly as possible. You can't say that you only need 16 bytes, because the hash table is constantly bombarded with probes, and the magic number tables even more so. Any part of them that is missing will bring you to a grinding halt very soon; they should always be loaded. The more you bring in in advance, the better. The larger the pages, the more likely they are to have been recently used, and thus not swapped out. And if a page does get swapped out, shutting down your chess program's operation, you want to put the OS under maximum pressure to revoke that disastrous decision. If swapping 256KB is hardly any more expensive than swapping 4KB, it is just as well that the 256KB is swapped in at once, rather than having to wait for 64 page faults (which will certainly come quite soon, but by then you will have missed the opportunity to load the data on this rotation of the disk, forcing you to wait for the next).
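(A quick back-of-the-envelope check, with assumed and merely illustrative disk figures of about 8 ms average seek plus rotational latency and about 100 MB/s sustained transfer, shows why the positioning cost dominates and the transfer size hardly matters:)

    #include <stdio.h>

    int main(void) {
        const double position_ms = 8.0;    /* assumed seek + rotational latency */
        const double mb_per_ms   = 0.1;    /* assumed 100 MB/s transfer rate */

        double fault_4k   = position_ms + (4.0   / 1024.0) / mb_per_ms;
        double fault_256k = position_ms + (256.0 / 1024.0) / mb_per_ms;

        printf("64 separate 4KB faults: %6.1f ms\n", 64.0 * fault_4k);
        printf("one 256KB fault:        %6.1f ms\n", fault_256k);
        return 0;
    }

Roughly half a second versus about 10 ms: the 64 small faults pay the positioning cost 64 times over, while the extra 252KB of transfer in the single large fault costs only a couple of milliseconds.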
Large page sizes have their place, but then so do small ones. The O/S typically buffers file I/O in page-sized buffers, and 256K is too big for that unless your program spends all its time reading/writing.
In general, if we have to use a "one-size-fits-all" approach, smaller is better, because _most_ running applications are very small.
Well, as I reported before, applications smaller than 2MB (memory footprint) seem virtually non-existent. So "very small" means something different than I think, or this is just not true at all.
How would you do that? One large function? Poor software engineering. Lots of small functions? No way to put 'em into one contiguous block then, since we don't have any semantics to cause that to happen.
Linkers usually pack code coming from the same source file into one contiguous block. The order in which you specify the object files on the link command line could then be used to control the overall layout.
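(For example, and this is only a sketch: the section attribute below is GCC/Clang-specific and the function names are made up. Putting the hot routines in one translation unit, or tagging them with a common section, keeps them in one contiguous block of code:)

    /* hot.c -- routines we want laid out next to each other.
       The bodies are trivial placeholders for illustration. */
    __attribute__((section(".text.hot"))) static int evaluate(int score) {
        return score + 1;
    }

    __attribute__((section(".text.hot"))) static int quiesce(int score) {
        return evaluate(score) * 2;
    }

    int main(void) {
        return quiesce(0);
    }

And with a plain link such as cc -o engine search.o evaluate.o movegen.o, each object file's code stays contiguous, in the order the files are given.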
Again, the general feeling among computer architecture and O/S designers is that if the choice is large or small, small is better. Were it dynamically variable, then yes, you could tune page size to a specific application's behavior and help things.
Well, large and small are relative notions. I maintain that there is also something like 'too small': the optimum will not lie at an asymptotically small page size (say 4 bytes). So the question is what the optimum is. I agree that for general applications 4MB is definitely too large. But 4KB seems way too small.
But right now it is more about memory wastage than paging costs, since most systems are configured to minimize actual paging I/O. And that's the bad mark against the 4M page size: it greatly reduces TLB misses on very large programs, but if you use it on small programs the memory wastage is very high.
Agreed, 4MB is (still) too big.
I will guarantee you that you will see paging in any program. It is unavoidable. We just work to minimize it. But we do have to start a game from scratch, and we have to page there. But as I mentioned, I worry far less about that than I do about just the idea of wasting memory in the _general_ case. For Crafty, I certainly prefer 4MB pages. I have reported on that here in the past and pointed out that performance was _higher_ because of the limited TLB we have. And TLB misses are _bad_. But I would not want to run with 4MB pages exclusively, because of the 100+ system daemons/tasks that run, and I don't want every one of them to have a 4MB page for code, a 4MB page for read-only data, and a 4MB page for read/write data, when 128K will do for the whole thing. That's where large page sizes break down and hurt performance, if they _must_ be used everywhere.
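(For what it is worth: on current Linux, one way to get large pages for just the hash table, without forcing them on everything else, is to ask for them explicitly. This is a minimal sketch with an invented function name; MAP_HUGETLB needs huge pages reserved by the administrator, and the fallback keeps things working where they are not:)

    #define _GNU_SOURCE
    #include <stddef.h>
    #include <sys/mman.h>

    /* Back the hash table with huge pages to cut TLB misses, falling back
       to normal 4KB pages if no huge pages are available. */
    void *alloc_hash(size_t bytes) {
        void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED)                   /* huge pages not reserved */
            p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }

That way the engine's big, uniformly hammered tables get the TLB benefit, while the 100+ small daemons keep their small pages.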
Yes, 4MB is too big. But with 256KB pages and 4 different flavors of access rights, your 100+ daemons will still only occupy 100MB, or 2.5% of a 4GB memory (10% if you have 1GB, which seems the bare minimum now). That sounds acceptable, and there is no reason to shy away from it if the alternative has its own, possibly worse problems (like inefficient use of I/O bandwidth during swapping). As I said, even with 4KB pages only 3 of the 32 Windows Vista daemons were smaller than 2MB. 256B is simply asymptotically small, nowadays.
None of that matters much in the grand scheme of an operating system. The issue is how much of main memory is actually _used_ versus how much is wasted because of the large page sizes. A large number of small jobs, such as those spawned by unix or windows, increases waste significantly. And since a single page must have a single access control type, the typical program will require at least two different access controls, sometimes 3. Bringing in such large chunks also invalidates a lot of cache, and requires overhead to force dirty lines back to memory before the I/O is started. Not much is free... So yes, for a chess engine, which is a large application, large page sizes are good. I've already measured that a year or two back. But if you force them on everyone by supporting only a 256K (or whatever) page size, it is not so good.
This assumes that there exist programs smaller than a few times 256KB, and I just don't see those in my task manager...