rbarreira wrote: I'm wondering about some of the technical details:
1- Did you implement the idea mentioned in the other thread, using different hash keys for each run in order to detect computational errors better? And if yes, is each run still reproducible (i.e. do you record the hash keys) in order to be able to detect cheaters?
That is a perft-binary issue; my client uses 128-bit keys, so I don't consider it a problem. But I know that Paul's binary changes the Zobrist table for every run, so if one run gets a collision the next will probably be correct.
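The per-run key idea can be sketched as follows. This is a hypothetical illustration, not Paul's actual implementation (the table dimensions and seeding scheme are my assumptions): each run draws a fresh Zobrist table from a recorded seed, so different runs use different hash keys, yet each run remains exactly reproducible for later verification.

```python
import random

def make_zobrist_table(seed, n_pieces=12, n_squares=64, bits=64):
    # Build a Zobrist table deterministically from a recorded seed.
    # A fresh seed per run gives different hash keys (a collision in
    # one run is unlikely to repeat in the next), while recording the
    # seed keeps every run reproducible for auditing.
    rng = random.Random(seed)
    return [[rng.getrandbits(bits) for _ in range(n_squares)]
            for _ in range(n_pieces)]

# Same seed -> identical table, so a logged run can be replayed exactly.
assert make_zobrist_table(12345) == make_zobrist_table(12345)

# Different seed -> different keys, so repeated runs cross-check each other.
assert make_zobrist_table(12345) != make_zobrist_table(67890)
```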
I take a few measures to protect against cheating, but nothing is foolproof, including the best cryptography. I have to strike a balance between practicality and paranoia. Here is what I have done:
Your client cannot post results without a valid password. The passwords are private; only the server and the user have them. Yes, the network could be "sniffed" and keys revealed, since the password is sent in the clear. However, if some key is stolen and used to corrupt the results, we will notice that one user is producing incorrect results and limit the damage. I can always throw out those results, since I track results by user.
I also track results by perft binary. In other words, the server knows which user and which binary produced each result. The client actually checksums itself and sends the checksum, and the server rejects any result that does not come from an approved client. This can be defeated fairly easily by a good engineer, but it would take a purposeful effort, and even if it is defeated I will know which user the results are coming from.
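A minimal sketch of that server-side gate, assuming SHA-256 as the checksum algorithm (the project's actual algorithm isn't stated, and the binary names and contents below are invented for illustration):

```python
import hashlib

def checksum(data: bytes) -> str:
    # Digest the client would compute over its own binary and report.
    # SHA-256 is an assumption; the project's real algorithm may differ.
    return hashlib.sha256(data).hexdigest()

# Hypothetical registry of approved client binaries (contents invented).
approved = {checksum(b"example-perft-client-v1"): "example-client-linux"}

def accept_result(reported: str) -> bool:
    # Server-side gate: keep only results reported by approved binaries.
    return reported in approved

assert accept_result(checksum(b"example-perft-client-v1"))
assert not accept_result(checksum(b"tampered-binary"))
```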
In theory nobody would want to invest a huge amount of time to sabotage the results - but stranger things have happened.
The plan is to run the calculation twice, each time using different binaries for the perft calculation. Right now I am not doing that for perft(12), and I have not decided whether to do these runs concurrently or not. There is a serious load balancing issue involved too: if everyone wants to run the fastest and best perft client, there will be no one running the slower verification client or clients. So I will either have to throw out results or force people to use a specific client. Imagine having to schedule the same work over and over again until someone happens to run it with a different client, and you will see that this is a load balancing issue.
A simple solution is to do the entire calculation once, then partition the results by which perft binary was used. It would be simple to set up multiple servers, and the users would be instructed about which client they should NOT use and directed to the appropriate server. For example, if there are 6 million gperftd results, I would run those 6 million over again and inform the users that they cannot use the gperftd binaries. The server would simply reject results that were sent from those binaries. In fact that is automatic anyway; it's just a configuration issue.
A more clever solution is to let the server try to detect which binary each user is using. It knows this, so it could track it and be smart enough to send appropriate assignments. But if some users are setting up instances using different binaries, this would be defeated. And even if they were not, we would still have the load balancing issue: if one binary is far more popular than another (perhaps slower) one, then at some point we will run out of assignments for the popular binary and be left with a minority of users, running the slower clients, having to do the majority of the calculations. Not a good situation.
Since I don't plan to be too obsessive about this or spend a huge amount of time, I will probably do the run once, then partition the 9,000,000+ positions into separate cases and set up multiple servers for verifying with different clients. Each server would reject one particular client but not the others. That is simple and effective.
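The partition-and-re-verify scheme described above could look roughly like this sketch (the binary names and record layout are invented for illustration): group first-pass results by the binary family that produced them, then have each verification server reject exactly that family, so the second pass always comes from an independent client.

```python
from collections import defaultdict

# Invented first-pass records: (position id, binary family that solved it).
first_pass = [
    ("pos-001", "gperftd"),
    ("pos-002", "qperft"),
    ("pos-003", "gperftd"),
]

# Partition the finished work by the binary that produced it.
by_binary = defaultdict(list)
for pos, binary in first_pass:
    by_binary[binary].append(pos)

def server_accepts(rejected_family: str, submitted_family: str) -> bool:
    # Each verification server rejects the one family whose first-pass
    # results it is checking, forcing the second pass onto a different client.
    return submitted_family != rejected_family

# The server re-verifying gperftd work refuses gperftd submissions...
assert not server_accepts("gperftd", "gperftd")
# ...but accepts results from any other client family.
assert server_accepts("gperftd", "qperft")
```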
During this verification pass we would be getting statistics on how often any particular perft "went wrong" and if there were any problem users - we would probably find that out very quickly.
There is little motivation for someone to cheat, since they will get flagged anyway; they would not be slowing down legitimate users at all, just wasting their own time and their own CPU time. The worst case is that for every bogus record they insert, we would have to do 2 additional searches. I cannot imagine anyone investing a lot of time, CPU resources, and their own electric bill (money) just to slow down our calculation.
The most malicious thing that could happen is that someone uses a binary that is returning wrong results, then uses the same binary for the verification calculations while pretending it's a different binary. Then a real error could slip in. But even in this case we would almost surely see that the errors were coming from a single user. I would flag that user and throw out all his results. It's not likely that this would be a very big percentage of the results.
By the way, for each binary there are 2 platforms, Linux and Windows. I have identified the specific binary and also a "family", because if there is a bug in any given author's binary, it's more than likely in both platforms - and even if it isn't, we have the means to analyze it. (It could turn out that only the Windows, or only the Linux, version has the bug too.)
2- Do you make sure that people can't send work units that weren't assigned to them?
No. The model I use is that the server is just organizing the work. If someone wants to send me a million results, that is OK with me. They will still be checked and verified, and they will be ignored if they are not for legitimate positions. The server is robust with respect to duplicate records and such. In fact, after a server stop and restart it's likely that some positions were assigned twice; the server doesn't care, and it does not affect the calculation. The server (or rather the off-line process) already checks these records to make sure the node counts match.
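The duplicate-tolerant node-count check might work roughly as below. The position names are invented, and the node counts are just sample values (they happen to be perft(5) and perft(6) of the standard start position): collect every submitted count per position and flag any position whose records disagree.

```python
from collections import defaultdict

# Illustrative submissions: (position id, reported node count).
# Duplicate records are harmless as long as their counts agree.
records = [
    ("pos-A", 4865609),
    ("pos-A", 4865609),    # same position assigned twice after a restart
    ("pos-B", 119060324),
    ("pos-B", 119060300),  # disagreement: one of these results is wrong
]

counts = defaultdict(set)
for pos, nodes in records:
    counts[pos].add(nodes)

# Positions with more than one distinct count must be re-searched.
flagged = [pos for pos, distinct in counts.items() if len(distinct) > 1]
assert flagged == ["pos-B"]
```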
3- Is it an optimized perft client, or something like what Steve was using which spent lots of time calculating non-perft stuff?
I don't know much about Steve's perft binary.
If the technical fundamentals of how the project works are sound I'll probably contribute some CPU time.
The real test is when someone ELSE, independent of us, does a perft(14) and either comes up with the same numbers or different ones. If they are different then it doesn't mean ours are wrong. It just means one of us is wrong!
Right now I am praying that the perft(12) comes out right the first time. We will probably know within 2 or 3 days.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.