lkaufman wrote:Regarding the renormalizing of the list to DS 12 (1 core) = 2800, I think it is a good change, although I think it is also clear that DS 12 on 1 core would crush Carlsen, Aronian, Kasparov, or Anand in a match. The reason I think it is a good change despite this is that engine vs engine ratings show larger differences than would be obtained if they only played top human players, so the result is that while 2800 engines will be underrated now in human terms, the ones at the very top will have ratings that are fairly close to what they would get in human tournaments, in my opinion.
I agree that S12 1core is rated too low compared to humans. Everything else is guessing as we do not have any reliable number ...
Anyhow, if CEGT changes to Bayes as well AND they want to raise the rating to something else, I am open to suggestions and willing to go with them to a reasonable point ...
Bye
Ingo
PS: Hello CEGT: Put together S12 x64 and S12 w32. The Engines ARE identical and you have a much better base then!
I have overwhelming evidence based on a study of the Swedish computer rating list over a 20 year span that these lists "expand" rating differences from human competition by about a 4/3 ratio. Therefore it is certain that if DS 12 were rated "correctly" (let's say 3000 or so), the top engines would be rated too high in human terms. So I think you are both quite reasonable to choose 2800 for DS12 to make the top ratings reasonable. Maybe I would have chosen 2850, but for sure it should not be higher than 2900, even though it would actually rate near 3000 against humans I believe.
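To make the 4/3 argument concrete, here is a minimal sketch of the arithmetic. It assumes the expansion is linear around the anchor engine, takes 3000 (the "let's say 3000 or so" guess above) as the anchor's human-scale rating, and uses a hypothetical function name; none of this is a measured result.

```python
def human_equivalent(list_rating, anchor_list=2800, anchor_human=3000, ratio=(4, 3)):
    """Map an engine-list rating to a rough human-scale rating.

    Assumption: differences on the engine list are ratio[0]/ratio[1] times
    the differences that would be seen in human competition, measured from
    the anchor engine (here DS 12 on 1 core).
    """
    num, den = ratio
    # Shrink the list difference by 3/4, then re-attach it to the human anchor.
    return anchor_human + (list_rating - anchor_list) * den / num


# Illustrative list ratings only, not taken from any published list.
for list_rating in (2800, 3000, 3200, 3600):
    print(list_rating, human_equivalent(list_rating))
```

Under these assumptions an engine listed at the anchor is underrated by 200 points in human terms, while the gap shrinks as the list rating climbs and closes at 3600, which is the effect described above.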
Regarding s12 x64 and x32, if they are really identical this would seem to be fraud to sell x64 as somehow better than x32. My impression is that x64 is supposed to be at least a tiny bit better for some reason, though I'm sure you are right about combining them for rating purposes. Does anyone know what (if any) is the difference between s12 x64 and x32? There must be something, as I don't believe the author would intentionally deceive his customers; his reputation is too good for this.
lkaufman wrote:...Does anyone know what (if any) is the difference between s12 x64 and x32? There must be something as I don't believe the author would intentionally deceive his customers, his reputation is too good for this.
What "deceive" is there? You buy the Deep version and you get the 64 bit version, which can allocate more hash, for free. Why does it have to be better, or where is it advertised as such?
You answered my question. The 64 bit version allows you to use more hash, so for serious analysis it would be stronger, without being any stronger at the time limits most testers use. I knew there had to be some difference, I just didn't know what it was. Thanks.
IWB wrote:
Ahhh, I was thinking about changing this and you changed first - very good.
Now if you change to Bayes (I can provide a batch file if needed) and clean up the 40/20 a bit (remove engines with a low number of games), things would be perfect.
How could things be perfect ?
We're testing with ponder=off.
I do not like reducing the start Elo of all programs.
I see no reason to try to adjust the ratings of top programs to human ratings when humans do not compete under CEGT conditions (games there do not start from the opening position), so we have no data on how humans would be rated under these conditions.
It would be interesting to have tournaments between humans under CEGT conditions regarding the opening position, and ratings for humans under the same conditions (I would not be surprised if some humans turned out 200 Elo stronger or 200 Elo weaker relative to normal chess), but unfortunately I guess that is not going to happen.
Our current rating lists are online and can be found under the attached links. We have adjusted our lists. The new reference engine is now Deep Shredder 12 x64 1CPU with 2800 points. The difference in startelo was (-)181 points here in our 40/20 list.
40 / 20:
New games: 1862 ; 52 different engines
Total: 573.179
NEW Engines
785 DanaSah 4.88: 2379 - 6000 games (-1 to version 4.66 here - and +26 at the moment in our blitz-tests)
UPDATES
2 Houdini 2.0c x64 4CPU: 3097 - 2642 games (+2)
88 Deep Junior 13 x64 4CPU: 2867 - 252 games (+10 and +5 to version 12.5)
191 Deep Junior 13 x64 1CPU: 2767 - 1048 games (-14 and +5 to version 12.5)
40 / 4:
New games: 8600
All games now: 978.710
New startelo here is 2588 (-204). New reference engine with 2800 points is Deep Shredder 12 x64 1CPU!
NEW Engines
202 Deep Junior 13 x64 4CPU: 2744 - 1200 games (+9 to version 12.5)
263 Deep Junior 13 w32 1CPU: 2693 - 800 games (+-0 to version 12.5)
542 Cheng 3 v1.07 x64: 2496 - 1000 games (+23 to v. 1.06)
766 DanaSah 4.88: 2388 - 1000 games (+26 to v. 4.66)
807 GreKo 9.0 x64: 2362 - 1000 games (+5 to v. 8.2 here)
1036 EveAnn 1.67: 2154 - 800 games (+45 to v. 1.66)
1074 Waxmann 2011: 2086 - 800 games (-13 to v. 2010)
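As a rough illustration of what re-anchoring a list does, the sketch below shifts every rating by one constant so that the reference engine lands on 2800. The pre-shift numbers are an assumption, back-computed by adding 181 to the 40/20 values quoted above, and the function names are hypothetical. It uses the standard Elo expected-score formula only to show that a constant shift leaves predictions between engines unchanged.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def renormalize(ratings, reference, target=2800):
    """Shift all ratings by one constant so `reference` ends up at `target`."""
    offset = target - ratings[reference]
    return {name: r + offset for name, r in ratings.items()}, offset


# Assumed pre-shift values (post-shift list values plus 181), for illustration.
old = {
    "Houdini 2.0c x64 4CPU": 3278,
    "Deep Shredder 12 x64 1CPU": 2981,
    "DanaSah 4.88": 2560,
}

new, offset = renormalize(old, "Deep Shredder 12 x64 1CPU")
print(offset, new)

# Only rating differences enter the formula, so the prediction is unchanged.
before = expected_score(old["Houdini 2.0c x64 4CPU"], old["Deep Shredder 12 x64 1CPU"])
after = expected_score(new["Houdini 2.0c x64 4CPU"], new["Deep Shredder 12 x64 1CPU"])
print(before == after)
```

The shift changes how the numbers compare to human ratings but not the order of the engines or the distances between them.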
I have been studying your 40/4 lists to help me with my 40/3 engine match choices. Very comprehensive testing, results, and ratings for just about any engine version in existence. The listing and games for Junior 13 are already there. Your list has been invaluable to me. I am very impressed with the work.
To add one point: I had run 2 different 40/3 repeating matches and I was a bit shocked at the results. I was sitting there wondering what I did wrong, and I happened to notice the threads with your blitz list. I checked the ratings of the engines in question, and my results were right in the center of the margin of error in your list. They reflected your ratings perfectly. So I was the problem, not the engines. I know not all results will fall in line that perfectly, but this time it certainly answered every question I had. Thank you.