Think "broader". I am talking about both the number of processors (cores) and their speed. On slow processors I can split nearer to the tips, while on faster processors I back off a bit to improve efficiency. So it is about both, and I have yet to find a "one size fits all" scheme for tuning this.

diep wrote: Basically you make 2 statements here that are relevant:

bob wrote: I don't have this problem at all, which is a problem in itself, as I can't really tell much about the search overhead for the first 10 plies or so, and once I get beyond that point, it is difficult to adjust things quickly enough. I have something (now) that works if we start at the beginning of the game, because as things slow down in endings, it adjusts upward just fine. But if I start off in an endgame, it is more problematic, because the search controls might need to go up or down, and once I get into "real searches" I don't have a lot of time to "hunt around" without killing things while I do so...

diep wrote: Hi Bob,
This is actually not a bad idea from a generic viewpoint, but you'll need a lot more statistics first. One important statistic is a graph of how many splits you get per second against the total search time.
The real problem in Diep on a supercomputer, which is probably comparable to your problem on a few cores, is that during the first few seconds of search a HUGE number of splits occur.
Already done that, and the answer is "no". That is a problem, in that I want this to work on any platform without requiring platform-specific tuning information.
So back in 2002/2003 I worried about how to solve this problem.
Get a machine where you feel you have done a good job tuning the splitting depthleft.
Then make a graph of splits per second against search time.
Then see whether automatically applying that graph on other hardware benefits Crafty.
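The measurement Vincent describes could be collected with something as simple as one-second buckets. A rough sketch in C; every name here (record_split, splits_per_second, and so on) is invented for illustration, not Diep's or Crafty's actual code:

```c
/* Sketch: sample how many splits the search performs per second as a
 * function of elapsed search time, using one-second buckets.  All names
 * are hypothetical. */
#include <assert.h>

#define MAX_SAMPLES 64

typedef struct {
    int splits;                 /* splits observed in this one-second bucket */
} SplitSample;

static SplitSample split_samples[MAX_SAMPLES];
static int n_samples = 0;

/* Record one split event at elapsed search time t (seconds). */
void record_split(double t) {
    int bucket = (int) t;       /* one-second buckets */
    if (bucket >= MAX_SAMPLES)
        bucket = MAX_SAMPLES - 1;
    if (bucket >= n_samples) {  /* zero any buckets we skipped over */
        int i;
        for (i = n_samples; i <= bucket; i++)
            split_samples[i].splits = 0;
        n_samples = bucket + 1;
    }
    split_samples[bucket].splits++;
}

/* Splits per second in the bucket containing time t, or 0 if unsampled. */
int splits_per_second(double t) {
    int bucket = (int) t;
    return (bucket < n_samples) ? split_samples[bucket].splits : 0;
}
```

Dumping those buckets at the end of a search gives exactly the splits-per-second-versus-time curve described above, which can then be compared across machines.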
Mine vary from 4 to even 20+ in positions like fine70...
Note that Diep also has different minimum split depths, ranging from 2 (default) to 10 or so.
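That kind of guard might look like the following sketch. The function names and the clamping helper are my own invention; only the 2-to-10 range comes from the text above:

```c
/* Hypothetical sketch of a minimum-split-depth guard: never split closer
 * to the tips than min_split_depth, which a tuner may move anywhere
 * between 2 (the default) and 10. */
#include <assert.h>

#define MIN_SPLIT_DEPTH_LO 2
#define MIN_SPLIT_DEPTH_HI 10

static int min_split_depth = MIN_SPLIT_DEPTH_LO;   /* default */

/* Clamp a tuned value into the legal range. */
void set_min_split_depth(int d) {
    if (d < MIN_SPLIT_DEPTH_LO) d = MIN_SPLIT_DEPTH_LO;
    if (d > MIN_SPLIT_DEPTH_HI) d = MIN_SPLIT_DEPTH_HI;
    min_split_depth = d;
}

/* Called from the search: is splitting legal at this remaining depth? */
int ok_to_split(int depth_left) {
    return depth_left >= min_split_depth;
}
```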
But I'm doing it totally differently than you.
Vincent
a) I assume we both want to look beyond ICGA limits and scale to more than a handful of cores. That benefits the Skulltrail machine, a mainboard designed to hold a lot of video cards for GPGPU number crunching; that is the funny thing. You mention clearly that you want it to work on different hardware, and I read that as: more cores.
I had started working on a new feature, somewhat like "bench" but which would be used to tune the SMP search to a specific platform, and then save the tuning parameters in the .craftyrc file so that the search would always be optimized. But I discovered it was not that easy, because the parameters ideally would vary as the characteristics of the tree change from bushy/shallow to narrow/deep. And doing that automatically is a bit of a problem... I want the search to get better as it tunes, not get much worse before it figures out it is doing the wrong adjustment and backs up...
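The "save the tuning parameters to .craftyrc" part, at least, is straightforward. Here is a hedged sketch of persisting tuned SMP parameters to a simple key=value file; the option names and the SmpParams fields are invented for illustration, not Crafty's actual options:

```c
/* Sketch: persist auto-tuned SMP parameters so they can be reloaded at
 * startup, in the spirit of saving them to .craftyrc.  The key names
 * ("smp_split_depth", ...) are hypothetical. */
#include <assert.h>
#include <stdio.h>

typedef struct {
    int split_depth;            /* minimum remaining depth to split at   */
    int max_threads_per_split;  /* how many helpers may join one split   */
} SmpParams;

/* Write the tuned values as key=value lines a startup file could replay. */
int save_params(const char *path, const SmpParams *p) {
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    fprintf(f, "smp_split_depth=%d\n", p->split_depth);
    fprintf(f, "smp_max_threads_per_split=%d\n", p->max_threads_per_split);
    return fclose(f);           /* 0 on success */
}

/* Read them back at startup; returns 0 on success. */
int load_params(const char *path, SmpParams *p) {
    FILE *f = fopen(path, "r");
    char line[128];
    int v;
    if (!f) return -1;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "smp_split_depth=%d", &v) == 1)
            p->split_depth = v;
        else if (sscanf(line, "smp_max_threads_per_split=%d", &v) == 1)
            p->max_threads_per_split = v;
    }
    fclose(f);
    return 0;
}
```

The hard part, as the text says, is not the persistence but deciding when the stored values are stale because the tree shape has changed.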
I claim I don't suffer from a split problem at very shallow searches, and I don't. You can easily verify that for yourself if you want. My problem happens as the search goes deeper. For any tree there is a perfect "spot": split above it, but never below it, for optimal performance. However, the "spot" is a shifty little SOB, and I am trying to determine where it is, on the fly, for any hardware speed, number of cores, and time-per-move limit. It seems somewhat akin to wanting to know both an electron's momentum and its exact position, if you are familiar with the uncertainty principle...
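One way to "hunt" for that spot without killing the search is a crude feedback rule: raise the minimum split depth when parallel overhead is high, lower it when processors sit idle. The thresholds and both measurements below are assumptions of mine, not Crafty's actual tuning logic:

```c
/* Hedged sketch of hunting for the split "spot" on the fly.
 * overhead: fraction of nodes judged wasted by speculative splits.
 * idle:     fraction of time processors spent waiting for work.
 * Both measurements, and the 0.30/0.15 thresholds, are assumptions. */
#include <assert.h>

static int adjust_split_depth(int current, double overhead, double idle) {
    if (overhead > 0.30 && current < 10)
        return current + 1;     /* too much wasted work: split farther from tips */
    if (idle > 0.15 && current > 2)
        return current - 1;     /* starving for split points: split nearer tips */
    return current;             /* in the sweet spot; leave it alone */
}
```

Called once per iteration (or per time slice), a rule like this nudges the split depth toward the spot instead of jumping, which matches the goal of not getting much worse before getting better.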
b) You claim you don't suffer from a split problem.
I have run extensively on 32 cores, with good success. I ran some on 64 cores, but it was an Itanium box, and it didn't do so well (a first-generation Itanium at that). But even there the scaling was > 50% of optimal, so it wasn't horrible; the architecture was just bad (and this was prior to the lockless-hashing days, so the locks on the hash table back then were probably murdering it).

diep wrote: Diep is different from Crafty; it scales better. I have put a lot of effort into that. The first problem to solve is to scale well. When searching on a big multiprocessor (NUMA) machine, say 64 cores (a quad-socket Beckton box, for example), Diep scales with the hardware. So unlike for Crafty, limiting the total number of splits per second is less relevant, I'd assume. Because Diep scales that well, it is enough to have a model that takes only local core considerations into account.
Vincent
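The "limiting the number of total splits a second" idea Vincent mentions could be sketched as a simple per-second budget. The budget value and all names here are illustrative, not taken from either program:

```c
/* Sketch: a global per-second split budget.  Once this second's
 * allowance is spent, further splits are refused and the search
 * proceeds serially at that node.  All names are hypothetical. */
#include <assert.h>

static int  split_budget_per_sec = 500;  /* assumed cap, tuned per machine */
static int  splits_this_second   = 0;
static long current_second       = -1;

/* now_sec: integer wall-clock second from the caller's timer.
 * Returns 1 if a split is allowed, 0 if over budget. */
int try_split(long now_sec) {
    if (now_sec != current_second) {     /* new second: reset the budget */
        current_second = now_sec;
        splits_this_second = 0;
    }
    if (splits_this_second >= split_budget_per_sec)
        return 0;                        /* over budget: search serially */
    splits_this_second++;
    return 1;                            /* split allowed */
}
```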
But I am not nearly as concerned about the "big end" as I am about simply running optimally on realistic hardware... and that's what I am currently trying to solve so that it works for all platforms, not just the ones I have tested and tuned on...
Once or twice a year I get into a long discussion with someone who tries to run Crafty on something more exotic than usual. The last case was an interesting multi-CPU, multi-core Opteron box. It took some testing and tweaking to get the numbers up to where they belonged, and the final settings would not run worth a flip on my dual Xeon, and were worse than the default when running on the 8-core Xeon cluster here.
I want to make that problem go away if possible.