STS (v1.0) - Undermining:
79/100, Grade: A
STS (v2.0) - Open Files and Diagonals:
80/100, Grade: A+
STS (v3.0) - Knight Outposts/Centralization/Repositioning:
78/100, Grade: A
STS (v4.0) - Square Vacancy:
69/100, Grade: B+
STS (v5.0) - Bishop vs Knight:
70/100, Grade: A-
STS (v6.0) - Re-Capturing:
75/100, Grade: A
STS (v7.0) - Offer of Simplification:
66/100, Grade: B+
STS (v8.0) - Advancement of f/g/h Pawns:
59/100, Grade: C+
Overall Performance:
Total Score: 576/800
Overall Average: 72.00%
Grade: A-
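As a quick arithmetic check of the totals, here is a minimal sketch (Python; the per-suite scores are the ones listed above):

```python
# Per-suite STS scores from the list above (each suite is out of 100).
scores = {
    "STS 1.0 - Undermining": 79,
    "STS 2.0 - Open Files and Diagonals": 80,
    "STS 3.0 - Knight Outposts/Centralization/Repositioning": 78,
    "STS 4.0 - Square Vacancy": 69,
    "STS 5.0 - Bishop vs Knight": 70,
    "STS 6.0 - Re-Capturing": 75,
    "STS 7.0 - Offer of Simplification": 66,
    "STS 8.0 - Advancement of f/g/h Pawns": 59,
}

total = sum(scores.values())      # 576
out_of = 100 * len(scores)        # 800
print(f"Total Score: {total}/{out_of}")
print(f"Overall Average: {100 * total / out_of:.2f}%")  # 72.00%
```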
Regards,
Swami
PS: Unless there's something wrong with the executable and the way it was compiled, I don't think this is going to be 40+ ELO. Perhaps the default settings don't have a correctly tuned evaluation.
It tested at something like 56 ELO at 1 minute + 1 second time controls over a few thousand games. I deducted about 15 ELO from that estimate due to the small sample size - it is extremely unlikely to be weaker than +40 at the 1 minute + 1 second Fischer time control.
However, at longer time controls anything is possible. It could be weaker, it could be stronger.
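For context on where a deduction like that comes from, here is a minimal sketch of the standard logistic ELO model with a normal-approximation error bar. The 3,000-game count is an assumption standing in for "a few thousand games", and treating every game as a win/loss trial slightly overstates the error (draws shrink it):

```python
import math

def elo_from_score(p):
    """Score fraction (0..1) to ELO difference, standard logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)

# Assumed inputs: +56 ELO observed; "a few thousand games" taken as 3,000.
elo_observed = 56.0
n_games = 3000

# Score fraction implied by +56 ELO.
p = 1.0 / (1.0 + 10.0 ** (-elo_observed / 400.0))
assert abs(elo_from_score(p) - elo_observed) < 1e-9

# Standard error of the score fraction (Bernoulli approximation),
# propagated through the derivative of the logistic model.
se_p = math.sqrt(p * (1.0 - p) / n_games)
se_elo = 400.0 / (math.log(10.0) * p * (1.0 - p)) * se_p

print(f"score fraction: {p:.3f}")                     # ~0.580
print(f"95% interval:   +/-{1.96 * se_elo:.0f} ELO")  # ~+/-13
```

On those assumptions the 95% interval is roughly +/-13 ELO, so knocking about 15 ELO off the point estimate is a sensible lower bound; by the same arithmetic, resolving a difference of only a few ELO takes tens of thousands of games.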
I have never seen a problem set that works well for consistently getting close to a program's ELO or for measuring small improvements, and although I think this set is very good, I don't think it can replace playing long matches.
It could be that this set is revealing an issue at longer time controls, or the trickiness of getting it to compile correctly on Windows. We shall see!
You're right on all counts, Don. The problem is that there are not enough test sets. By the end of the year, I believe we could have a total of 20 test sets, which I hope would give a larger data sample and lend some credibility to more accurate predictions.
Meanwhile, I'm waiting for the updated compile from Jim!
Hi,
I am testing Doch 1.3 x64 with sample.per for CEGT 40/20.
The first time I installed the engine inside Arena 2.01, it crashed here too.
After that I could edit the engine options, and now the match is running fine.
I have had no problems installing the engine inside Fritz 11 GUI.
sockmonkey wrote: Any chance of some OSX support? If Linux is building, Darwin is probably just a make away.
Jeremy
Let me rephrase that. I would gladly volunteer to get this compiling on MacOS, should the developers care to entrust me with the task.
Jeremy
I would gladly try to compile this on a Mac if you give me an account and ssh access to your machine. MacOS is Unix, right? Presumably gcc is installed and working?
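For what it's worth, the "just a make away" experiment is small enough to script. A minimal sketch (Python), assuming the Linux Makefile is meant to build unmodified on Darwin; the compiler fallback and messages here are illustrative, not part of Doch's actual build:

```python
import shutil
import subprocess
import sys

# "Presumably gcc is installed and working?" - check the toolchain
# first; on a Mac, clang (from the Xcode command line tools) is a
# reasonable stand-in for gcc.
compiler = shutil.which("gcc") or shutil.which("clang")
if compiler is None or shutil.which("make") is None:
    sys.exit("No C compiler or make found - install the Xcode command line tools.")
print(f"Compiler found: {compiler}")

# "Darwin is probably just a make away" - run the existing Makefile
# as-is and see what, if anything, breaks.
result = subprocess.run(["make"], capture_output=True, text=True)
print(result.stdout)
if result.returncode != 0:
    print(result.stderr, file=sys.stderr)
    sys.exit("make failed - the Makefile likely needs Darwin-specific tweaks.")
print("Build succeeded.")
```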