bob wrote:
I basically do one I/O operation to get all book moves for a position. I carefully group them by a hash signature trick. Used to be pretty significant. With today's SSD devices probably not so much....
Well, first I would note that you're right about SSDs. They're much faster than traditional hard drives because they have negligible seek times and no rotational latency, which completely changes what's required to make a data-retrieval system fast enough to meet requirements.
In 1995 an average hard drive for a PC had a transfer rate of about one to two megabytes a second, a rotational latency of 6-8ms, and a seek time of 15-20ms. Modern drives have a transfer rate of 50-100 MB/sec, rotational latency of 2-4ms, and seek times of 4-12ms. Even these figures are somewhat misleading, since all modern drives come with relatively large caches that hide some or much of this latency, depending on what you are doing.
SSDs of course have no rotational latency, access times of 0.05-0.1ms, and transfer rates around 500 MB/sec. They support on the order of 50,000 to 100,000 4K random reads or writes per second. I can't imagine a book structure that would run so slowly as to be unusable on an SSD.
A second alternative is a RAM drive. I use one and it works great. I also have an SSD, which I use for files that are too large or too numerous to fit on the RAM drive. I don't know anyone who's serious about chess that doesn't have one.
In short, the technology has changed while the file system hasn't. The current system was designed for hardware that doesn't exist anymore.
bob wrote: ...The problem with your scheme is where do you add the moves? To add them close to the other moves, you have to reserve space scattered throughout the file which hurts efficiency somewhat, and once you exhaust that, you are stuck once again...
If you're using a RAM drive or an SSD, locality doesn't make much of a difference. No rotational latency and a fraction of a millisecond to reach any location on the disk means low access time to all data. It would of course still be better if all moves were grouped together, so I wouldn't take the sort code out just yet.
The way I would do it is to use two files: an index file and a data file.
The index file would consist of 4K records, each containing multiple sub-records. Each sub-record points to a position in the data file, and consists of all (or part) of the hash key plus a pointer into the data file. Index records are addressed by some fraction of the hash key, the size of which depends on the maximum number of records we intend to store in the system.
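To make that concrete, here's a rough sketch in C of what a 4K index record and its sub-records might look like. All of the names, field sizes, and the sub-record count are my own illustrative guesses, not anything from an actual book implementation:

```c
#include <assert.h>
#include <stdint.h>

#define SUBS_PER_RECORD 255  /* 255 x 16-byte sub-records + 8-byte header <= 4K */

/* One sub-record: part of the hash key plus a pointer into the data file. */
typedef struct {
    uint64_t hash_hi;      /* upper portion of the position's hash key */
    uint64_t data_offset;  /* byte offset of the position in the data file */
} SubRecord;

/* One 4K index record. */
typedef struct {
    uint32_t  count;       /* number of sub-records in use */
    uint32_t  overflow;    /* record number of an overflow record, 0 if none */
    SubRecord subs[SUBS_PER_RECORD];
} IndexRecord;

/* Which index record holds a given position: some fraction of the
   hash key, here just a modulo over however many index records the
   book was sized for. */
static uint64_t index_record_for(uint64_t hash, uint64_t num_records)
{
    return hash % num_records;
}
```

The whole record fits in one 4K block, so finding a position's index record is one computed read, not a search.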
You could actually do all of this in one file if you wanted to, but to keep it simple we'll use two. The data file could be almost identical to what you use now, except that you would only need to store the upper portion of the hash key with the position data (if you wanted to use less space). This file could be sorted and built just as it is now. To add a record, you simply append it to the end of the data file, read the appropriate index record, and add a new sub-record containing a pointer to the location of the added record.
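The append-then-index step might look something like this. Again, the entry layout and names are hypothetical placeholders; a real book entry would carry moves, counts, learning data, and so on, and the index record would be read from and written back to the index file in one 4K I/O rather than held in memory:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t hash;  /* full hash key, kept whole here for simplicity */
    uint32_t move;  /* placeholder for the real book-move data */
} BookEntry;

typedef struct {
    uint64_t hash_hi;
    uint64_t data_offset;
} SubRecord;

#define SUBS_PER_RECORD 16  /* tiny for the demo; real 4K records hold far more */

typedef struct {
    uint32_t  count;
    SubRecord subs[SUBS_PER_RECORD];
} IndexRecord;

/* Append the entry to the end of the data file, then note its offset
   in the index record.  Returns 0 on success, -1 if the record is
   full (that's where an overflow record would take over). */
static int add_entry(FILE *data, IndexRecord *rec, const BookEntry *e)
{
    if (rec->count >= SUBS_PER_RECORD)
        return -1;
    fseek(data, 0, SEEK_END);
    long off = ftell(data);
    if (fwrite(e, sizeof *e, 1, data) != 1)
        return -1;
    rec->subs[rec->count].hash_hi = e->hash;
    rec->subs[rec->count].data_offset = (uint64_t)off;
    rec->count++;
    return 0;
}
```

Adding a move is two small I/Os: one append to the data file and one read-modify-write of a single index record.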
I have used this type of structure on very large files with good success and excellent speed. There are a couple of additional complications. The first is that you need to know approximately the maximum number of records you intend the system to hold. Most of the systems I designed like this spanned multiple drives, long before RAID systems were available. The index is built to contain 120% of the maximum number of records envisioned. This is the loose space you referred to; it's not that much extra space, and it gives you a lot of flexibility.
The data file stays packed, just like it is now. If you actually need to store more records than you originally intended, you run the risk of running out of index space. I always left one sub-record at the end of each index record so that it could point to an overflow index record. Overflow index records get appended to the end of the index file as needed, and a pointer to the overflow record is put in the last sub-record of the main index record. This saves a lot of grief: it stops the system from breaking as long as you still have disk space available.
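A lookup then just walks the overflow chain. Here's a minimal sketch of that idea; for clarity the index records live in an in-memory array, where a real implementation would read each 4K record from the index file, and the struct sizes are shrunk for the demo:

```c
#include <assert.h>
#include <stdint.h>

#define SUBS_PER_RECORD 4  /* tiny for the demo */

typedef struct {
    uint64_t hash_hi;
    uint64_t data_offset;
} SubRecord;

typedef struct {
    uint32_t  count;
    int32_t   overflow;  /* array index of the overflow record, -1 if none */
    SubRecord subs[SUBS_PER_RECORD];
} IndexRecord;

/* Search the chain starting at record r for the given hash.
   Returns the data-file offset, or -1 if the position isn't in the book. */
static int64_t find_offset(const IndexRecord *index, int32_t r, uint64_t hash)
{
    while (r >= 0) {
        const IndexRecord *rec = &index[r];
        for (uint32_t i = 0; i < rec->count; i++)
            if (rec->subs[i].hash_hi == hash)
                return (int64_t)rec->subs[i].data_offset;
        r = rec->overflow;  /* follow the chain into the overflow record */
    }
    return -1;
}
```

Most lookups never touch an overflow record, so the common case stays a single index read plus a single data read.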
It sounds more complicated than it is. When I was doing this for a living, I could write and debug the code in a few days.
Anyway, just an idea I've been thinking about.
Regards,
Forrest
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.