Nunemaker Chess Rating List (NCRL)

can00336 · Post by **can00336** » Sat Dec 12, 2015 9:23 pm

I have started my own rating list that focuses on the time scaling of engines. I don't have much CPU power, but I do what I can with what I've got.

http://chess.nunemaker.net/

Let me know what you think!

Graham Banks · Post by **Graham Banks** » Sat Dec 12, 2015 9:47 pm

can00336 wrote:I have started my own rating list that focuses on the time scaling of engines. I don't have much CPU power, but I do what I can with what I've got.

http://chess.nunemaker.net/

Let me know what you think!

Nice little project.

cetormenter · Post by **cetormenter** » Sun Dec 13, 2015 12:18 am

Very interesting. Unfortunately you will need a lot of games to be able to come to any conclusions. Something that I would add is an additional graph with a sort of "scaling" factor where the number of points at each time control is normalized to the number of points at 15+0.125. Right now it is too hard to compare one engine to another without doing that math yourself.
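
For anyone who wants to do that math by hand in the meantime, here is a minimal sketch of the normalization, assuming you have one engine's points per time control in a dict. The function name `scaling_factors` and the sample numbers are hypothetical and are not taken from the list itself.

```python
def scaling_factors(points_by_tc, baseline="15+0.125"):
    """Divide each time control's points by the points at the baseline control."""
    base = points_by_tc[baseline]
    if base == 0:
        raise ValueError("baseline points must be non-zero")
    return {tc: pts / base for tc, pts in points_by_tc.items()}


# Hypothetical points for a single engine, for illustration only.
example_points = {"15+0.125": 50.0, "30+0.25": 55.0, "60+0.5": 60.0}

for tc, factor in scaling_factors(example_points).items():
    print(f"{tc:>9}: {factor:.2f}")
```

A factor above 1.00 would mean the engine scores better at that time control than at 15+0.125, and a factor below 1.00 would mean it scores worse.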

lkaufman · Post by **lkaufman** » Sun Dec 13, 2015 4:47 am

can00336 wrote:I have started my own rating list that focuses on the time scaling of engines. I don't have much CPU power, but I do what I can with what I've got.

http://chess.nunemaker.net/

Let me know what you think!

Very informative! I think you made many wise decisions in setting up your tests. The 120 to 1 ratio of time to increment is very reasonable, the use of Ordo for the rating calculations is wise since it ensures that higher scores mean higher ratings, and your engine selection seems to avoid including two engines by the same author in the same run, also wise. It's interesting to see how the draw percentage and White advantage climb with each doubling of time. It's also interesting that the spread of ratings from bottom to top doesn't vary so much with time; logically the longer times with more draws should have narrower ranges, but it seems that the stronger engines scale better than the weaker ones to compensate for this.

My only suggestion (other than playing more games!) would be regarding hash size. 64 MB is adequate for the fastest two or three levels, but not enough for the longer ones. Maybe 128 would be a better compromise if you don't want to bother with different hash sizes for different levels. But it's not a big deal.

cdani · Post by **cdani** » Sun Dec 13, 2015 7:45 am

can00336 wrote:I have started my own rating list that focuses on the time scaling of engines. I don't have much CPU power, but I do what I can with what I've got.

http://chess.nunemaker.net/

Let me know what you think!

Thanks!! It is really very interesting.

The behavior of Andscacs is not a surprise. I have always tried to make sure that it is better at long time controls than at short ones. Anyway, it is very nice to see it reflected; one never knows.

Frank Quisinsky · Post by **Frank Quisinsky** » Sun Dec 13, 2015 4:02 pm

Hello Chad,

nice to have one more list.
I added the link in my "Comparable" selection.
http://www.amateurschach.de/main/_comparable.htm

I will look from time to time.

THANKS!

Best
Frank

can00336 · Post by **can00336** » Tue Dec 15, 2015 4:57 pm

Graham Banks wrote:Nice little project.

Thanks! CCRL was one of my inspirations for doing this project.

can00336 · Post by **can00336** » Tue Dec 15, 2015 5:02 pm

cetormenter wrote:Very interesting. Unfortunately you will need a lot of games to be able to come to any conclusions. Something that I would add is an additional graph with a sort of "scaling" factor where the number of points at each time control is normalized to the number of points at 15+0.125. Right now it is too hard to compare one engine to another without doing that math yourself.

I have been slowly trying to complete another rating list that uses the top 100 4-ply chess openings instead of the top 10 2-ply. This should help with the "not enough games" problem.
It just takes forever on a single quad-core. I'm waiting for the 10-core Broadwell-E CPUs to come out in the spring so I can upgrade.

I agree with the "scaling" factor idea, so I have added a table and graph to make this more readable. Thanks!

can00336 · Post by **can00336** » Tue Dec 15, 2015 6:12 pm

lkaufman wrote:Very informative! I think you made many wise decisions in setting up your tests. The 120 to 1 ratio of time to increment is very reasonable, the use of Ordo for the rating calculations is wise since it ensures that higher scores mean higher ratings, and your engine selection seems to avoid including two engines by the same author in the same run, also wise. It's interesting to see how the draw percentage and White advantage climb with each doubling of time. It's also interesting that the spread of ratings from bottom to top doesn't vary so much with time; logically the longer times with more draws should have narrower ranges, but it seems that the stronger engines scale better than the weaker ones to compensate for this.

My only suggestion (other than playing more games!) would be regarding hash size. 64 MB is adequate for the fastest two or three levels, but not enough for the longer ones. Maybe 128 would be a better compromise if you don't want to bother with different hash sizes for different levels. But it's not a big deal.

In setting up my tests, I drew inspiration from many of the currently successful rating lists. I've added them to my "Thanks" page now.

I chose the 120 to 1 ratio since my rating list started as just 60+0.5, then I thought, "Hey, why not do some shorter/longer games too?".
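
As a rough sketch of how that ladder falls out of the 120 to 1 ratio: start at 15 seconds, double the base time at each step, and divide by 120 to get the increment. The function name `time_control_ladder` is hypothetical, and the choice of five steps (which puts a 120+1 level between 60+0.5 and 240+2) is an assumption based on the doubling pattern, not something stated above.

```python
from fractions import Fraction


def time_control_ladder(base_seconds=15, ratio=120, steps=5):
    """Build 'base+increment' strings, doubling the base time at each step."""
    ladder = []
    for i in range(steps):
        base = base_seconds * 2**i
        # Exact arithmetic avoids rounding noise in the increment.
        increment = Fraction(base, ratio)
        ladder.append(f"{base}+{float(increment):g}")
    return ladder


print(time_control_ladder())
```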

Ever since Ordo went 1.0, I haven't used anything else. It's so powerful and easy to use!

In selecting engines, I tried to avoid engines that did not provide a benefit to correspondence chess players who want a plethora of unique, strong moves suggested by their engines. That's why I'm using Don's similarity tester, not as a clone detector, but as a true move similarity tester. Finding strong, but off-beat, moves using a unique engine helps keep chess exciting!

I expected the draw rate to rise with the time control, but the white advantage increasing seems odd, though I doubt that it is statistically significant. I haven't looked at the spread of ratings, but your explanation seems reasonable.

Since my list started at 60+0.5, 64MB hash seemed sufficient in my testing. Then my list added 15+0.125 and 30+0.25, still fine. It was only later that I added the longer time controls. I wanted all the time controls to use the same hash size to minimize the factors at play when comparing time controls, but in retrospect I would have chosen a larger one. Whenever I next upgrade my hardware, I will remake my list with a larger hash size, probably 256MB.

can00336 · Post by **can00336** » Tue Dec 15, 2015 6:20 pm

cdani wrote:Thanks!! It is really very interesting.

The behavior of Andscacs is not a surprise. I have always tried to make sure that it is better at long time controls than at short ones. Anyway, it is very nice to see it reflected; one never knows.

I've been testing Andscacs since version 0.80, but I don't remember if earlier versions exhibited similar behavior or not.

240+2 All Games
-----------------------
PLAYER         : RATING
Andscacs 0.84  :   2990
Andscacs 0.82  :   2922
Andscacs 0.81  :   2904
Andscacs 0.80  :   2874

I somehow missed testing Andscacs 0.83, but you have been making amazing progress in my rating list! Keep up the awesome work.
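
To put numbers on that progress, here is a small sketch that computes the rating gain between consecutive tested versions from the 240+2 table above. The helper name `version_gains` is hypothetical, and since 0.83 was not tested, the last step covers 0.82 to 0.84.

```python
# Ratings copied from the 240+2 "All Games" table above.
ratings = {"0.80": 2874, "0.81": 2904, "0.82": 2922, "0.84": 2990}


def version_gains(ratings):
    """Return (older, newer, rating gain) for each pair of consecutive versions."""
    versions = sorted(ratings, key=lambda v: tuple(int(p) for p in v.split(".")))
    return [
        (older, newer, ratings[newer] - ratings[older])
        for older, newer in zip(versions, versions[1:])
    ]


for older, newer, gain in version_gains(ratings):
    print(f"Andscacs {older} -> {newer}: {gain:+d}")

total = ratings["0.84"] - ratings["0.80"]
print(f"Total 0.80 -> 0.84: {total:+d}")
```

These are plain differences of the listed ratings; they ignore the error bars, which matter a lot with this few games.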
