Re: TCEC Division 1 results simulator

chrisw
Posts: 1744
Joined: Tue Apr 03, 2012 2:28 pm

Re: TCEC Division 1 results simulator

Post by chrisw » Thu Aug 30, 2018 12:04 pm

Milos wrote:
Thu Aug 30, 2018 12:18 am
chrisw wrote:
Wed Aug 29, 2018 11:01 am
Laskos wrote:
Tue Aug 28, 2018 9:59 am
chrisw wrote:
Tue Aug 28, 2018 9:23 am
40 rounds so far, Division 1, sim predictions for being promoted to Premier Division:
Ethereal 90%
Chiron 83%
Fizbo 19%
Laser 3%


Column 'First' is chance of winning, Column 'First two' is chance of being in first two
Engine,   Tournament Elo, Initial Elo, First, First two
Ethereal       3374         3334       0.562  0.909
Chiron         3351         3288       0.392  0.835
Fizbo          3301         3287       0.040  0.193
Laser          3247         3226       0.004  0.030
Fritz          3231         3226       0.001  0.014
ChessBrainVB   3247         3279       0.001  0.016
Jonny          3139         3140       0.000  0.000
Booot          3218         3272       0.000  0.002
Based on some not very well researched estimate elos, chances of winning the Premier Division would then be:


Stockfish      3441         3441       0.572  
Komodo         3405         3405       0.186
Houdini        3400         3400       0.159  
Ethereal       3374         3374       0.055 
Chiron         3351         3351       0.019  
Fire           3326         3326       0.005 
Ginkgo         3322         3322       0.004  
Andscacs       3245         3245       0.000
About the second table. First, you seem to put Ethereal and Chiron too high. Second, generally, that in 56 games each, the winner is not one of the big three, seems to be a very slim possibility, much smaller than 8%. Third, Stockfish to win seems in above 70% to me. It has now "contempt" and is not underperforming in these round-robins with weaker engines. Gut feeling: SF say 72%, Kom 15%, Hou (if not updated) 11%, rest 2%.

And why not going further: SF above 90% to win the TCEC 13 (if Hou is not updated).
Ethereal and Chiron elos are being initialised to their given elo from the previous round plus 50% of the tournament elo gain, where:
tournament gain = sum of [+/-400] / game count including draws
elo = initial elo + 0.5 * tournament gain

The elos of the other six engines are more likely to be problematic (too low, because progress).
Progress is tricky to guess at. At the moment I'd go for:
Stockfish (++) because worked on continuously and reporting progress
Komodo (+) because worked on, but the side project MCTS may be distracting
Houdini (++) or (+) or (??) because I don't know
Fire (??) no idea at the moment
Ginkgo (??)
Andscacs (+) or (++) because worked on

The Monte Carlo rollouts for the Premier are adjusted for the doubled game count (I just checked that I remembered to do that), and that will have effect of increasing the top winning chances for the sim.
Just a small illustration how ridiculous are your assumptions (regarding Premier Division):
http://talkchess.com/forum3/viewtopic.p ... 31#p772531
Haha! It didn't need to be personalised with "your" assumptions; "the" assumptions would be enough. Or "ridiculous", when "Ethereal is likely 70 pts below Fire, as these blitz test results show ..." would be enough to get the meaning across. But I suppose it wouldn't be Milos without the added chilli pepper. What is unseasoned Milos like, actually? One of the smarter ones, in my opinion. But, anyway, if you think it needs the added seasoning, up to you.

Okay, so back to boring factuality-land ...

Assumptions are assumptions. Best thing to do with them is make them clear. Well, I try to. Stated already: I don't really know where Fire should be, in relation to the others. Maybe 100pts or so behind the top three, according to the lists. Is that an unfair assumption? Is it being worked on? Gimme data.

As to the relative position of Ethereal. Well, the programmer suggested when I asked, without being committal, IIRC, that it was around 3300 or so, using the CCRL 40 scaling of elos. When Ethereal played Round 4, it got a tournament elo increase of about 70pts, which I factored into its actual elo at 50% and set it at 3335 for Round 3. It ended Round 3 with no change to its tournament given elo. So I put it in this Round at 3334. TCEC chose 3341, not much different.
Ethereal is doing well enough this round, TCEC is giving it currently an elo improvement of +50, and my algorithm says something similar. If promoted and all else stays equal, my elo-giving algorithm will probably start Ethereal off at 3360 in Premier Division.
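
The seeding rule quoted earlier in this post (tournament gain as the sum of +/-400 over the game count, then initial elo plus half of that gain) can be sketched as below. This is only one reading of the rule: it assumes +400 per win, -400 per loss and 0 per draw, and it ignores opponent ratings. The function names and the example record are hypothetical, not taken from the sim.

```python
def tournament_gain(wins: int, draws: int, losses: int) -> float:
    """Average of +400 per win, -400 per loss and 0 per draw over all games.

    Assumed interpretation of "sum of [+/-400] / game count including draws".
    """
    games = wins + draws + losses
    if games == 0:
        return 0.0
    return 400.0 * (wins - losses) / games


def seeded_elo(initial_elo: float, wins: int, draws: int, losses: int,
               weight: float = 0.5) -> float:
    """Initial elo plus a fraction (50% by default) of the tournament gain."""
    return initial_elo + weight * tournament_gain(wins, draws, losses)


if __name__ == "__main__":
    # Hypothetical record: 10 wins, 30 draws, 0 losses from a 3300 start.
    print(seeded_elo(3300, wins=10, draws=30, losses=0))
```

Under this reading, the 50% weight acts as a damping factor: a hot streak in a short event moves the seed only half as far as the raw tournament figure would.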

So, like I said in the previous post, which it looks like you didn't actually carefully read:

"The elos of the other six engines are more likely to be problematic (too low, because progress)."

If you want to suggest a rough figure for Fire, including a "worked on" estimate, feel free, I am all ears.

Uri Blass
Posts: 8507
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: TCEC Division 1 results simulator

Post by Uri Blass » Thu Aug 30, 2018 3:36 pm

Milos wrote:
Thu Aug 30, 2018 11:43 am
Uri Blass wrote:
Thu Aug 30, 2018 6:53 am
Milos wrote:
Thu Aug 30, 2018 12:18 am
chrisw wrote:
Wed Aug 29, 2018 11:01 am
Laskos wrote:
Tue Aug 28, 2018 9:59 am
chrisw wrote:
Tue Aug 28, 2018 9:23 am
40 rounds so far, Division 1, sim predictions for being promoted to Premier Division:
Ethereal 90%
Chiron 83%
Fizbo 19%
Laser 3%


Column 'First' is chance of winning, Column 'First two' is chance of being in first two
Engine,   Tournament Elo, Initial Elo, First, First two
Ethereal       3374         3334       0.562  0.909
Chiron         3351         3288       0.392  0.835
Fizbo          3301         3287       0.040  0.193
Laser          3247         3226       0.004  0.030
Fritz          3231         3226       0.001  0.014
ChessBrainVB   3247         3279       0.001  0.016
Jonny          3139         3140       0.000  0.000
Booot          3218         3272       0.000  0.002
Based on some not very well researched estimate elos, chances of winning the Premier Division would then be:


Stockfish      3441         3441       0.572  
Komodo         3405         3405       0.186
Houdini        3400         3400       0.159  
Ethereal       3374         3374       0.055 
Chiron         3351         3351       0.019  
Fire           3326         3326       0.005 
Ginkgo         3322         3322       0.004  
Andscacs       3245         3245       0.000
About the second table. First, you seem to put Ethereal and Chiron too high. Second, generally, that in 56 games each, the winner is not one of the big three, seems to be a very slim possibility, much smaller than 8%. Third, Stockfish to win seems in above 70% to me. It has now "contempt" and is not underperforming in these round-robins with weaker engines. Gut feeling: SF say 72%, Kom 15%, Hou (if not updated) 11%, rest 2%.

And why not going further: SF above 90% to win the TCEC 13 (if Hou is not updated).
Ethereal and Chiron elos are being initialised to their given elo from the previous round plus 50% of the tournament elo gain, where:
tournament gain = sum of [+/-400] / game count including draws
elo = initial elo + 0.5 * tournament gain

The elos of the other six engines are more likely to be problematic (too low, because progress).
Progress is tricky to guess at. At the moment I'd go for:
Stockfish (++) because worked on continuously and reporting progress
Komodo (+) because worked on, but the side project MCTS may be distracting
Houdini (++) or (+) or (??) because I don't know
Fire (??) no idea at the moment
Ginkgo (??)
Andscacs (+) or (++) because worked on

The Monte Carlo rollouts for the Premier are adjusted for the doubled game count (I just checked that I remembered to do that), and that will have effect of increasing the top winning chances for the sim.
Just a small illustration how ridiculous are your assumptions (regarding Premier Division):
http://talkchess.com/forum3/viewtopic.p ... 31#p772531
Note that TCEC is a different type of competition than 3+1 blitz.
I do not claim that Ethereal is better than Fire at TCEC conditions but only that I do not know.
Here is CCRL 40/40
http://www.computerchess.org.uk/ccrl/40 ... t_all.html

I see
Fire 7.1 64-bit 4CPU 3326
Deep Shredder 13 64-bit 4CPU 3287
Ethereal 10.55 64-bit 4CPU 3283

I guess the new Ethereal is better than both 10.55 and Shredder 13, so the gap at 40/40 seems to be smaller than 40 Elo.
Oh give me a break and try to say something useful for a change. Both Fire and Ethereal have identical LazySMP as SF, and scale pretty much identically. These claims that we don't know how something behaves at 10 or 40x longer TC is just BS. We do know. There is just compression of rating differences. There is never an inversion. No one ever showed one with any reasonably strong A/B engine.
CCRL 40/40 is irrelevant. Ethereal only played 400 games there. Probably real difference vs Fire is 50-60 Elo instead of 70-80. And that's perfectly fine for 10x longer TC. And it's a simple fact that you compress rating differences with longer TC but you also compress error margins. So probability of marginal events happening remains the same.
It played 50 games in TCEC and ppl make even more ridiculous conclusions. It's a hype and nothing else. Chance for Ethereal to finish 4th in primary division is less than 20%. To finish in top 3 is less than 2%.
It is not correct that there is never an inversion.

Here is an example for an inversion with smaller elo difference

From the ccrl 40/4 list

http://www.computerchess.org.uk/ccrl/404/

Colossus 2008b 2642 +7 −7 48.3% +13.6 30.7% 8497
Movei 00.8.438 (10 10 10) 2624 +7 −7 46.8% +23.9 30.7% 8182

from the ccrl 40/40 list

Movei 00.8.438 (10 10 10) 2667 +11 −11 47.3% +15.3 37.4% 2693
Colossus 2008b 2643 +13 −13 49.3% +2.4 38.3% 2098

I know from my own tests, when I developed Movei and tested it with a time handicap, that the better engine usually scaled better. In most cases its main advantage was not a programmer who knew how to do the same thing significantly faster, but an algorithm that scales better.
In spite of that, it is possible in theory that one engine has an algorithm that scales better while the other has a programmer who is better at tricks for doing the same thing faster; in that case I would expect to see an inversion at some time control.

AndrewGrant
Posts: 467
Joined: Tue Apr 19, 2016 4:08 am
Location: U.S.A
Full name: Andrew Grant

Re: TCEC Division 1 results simulator

Post by AndrewGrant » Thu Aug 30, 2018 8:58 pm

Milos wrote:
Thu Aug 30, 2018 11:43 am
Both Fire and Ethereal have identical LazySMP as SF, and scale pretty much identically. These claims that we don't know how something behaves at 10 or 40x longer TC is just BS. We do know. There is just compression of rating differences. There is never an inversion. No one ever showed one with any reasonably strong A/B engine.
Spot on. There are two possible cases for scaling here: the number of threads and the time control of the games. On the topic of time control, I have not once seen a real elo difference generated from differing time controls that cannot be considered compression. The exception, however, is trivially fast games: I can beat Stockfish at 1+.01s, and then immediately lose by hundreds of elo at 10+.1s. On the topic of threads, if we assume that these top engines all use LazySMP, and don't do anything absolutely stupid in terms of memory / thread management, then I believe there is virtually no scaling difference. The only component I think matters is the TT replacement scheme when running under many threads, but I'm guessing we all do what Stockfish does because 1) Stockfish does it, 2) it works, 3) it makes sense even if you've never seen the SF source.
Milos wrote:
Thu Aug 30, 2018 11:43 am
CCRL 40/40 is irrelevant. Ethereal only played 400 games there. Probably real difference vs Fire is 50-60 Elo instead of 70-80. And that's perfectly fine for 10x longer TC. And it's a simple fact that you compress rating differences with longer TC but you also compress error margins. So probability of marginal events happening remains the same.
I agree about the compression and error margins, but I think the real elo difference is actually greater than 70-80, based on some internal testing.
Milos wrote:
Thu Aug 30, 2018 11:43 am
It played 50 games in TCEC and ppl make even more ridiculous conclusions. It's a hype and nothing else. Chance for Ethereal to finish 4th in primary division is less than 20%. To finish in top 3 is less than 2%.
One of your earlier posts said 30% for 4th, which I think I would agree with, given the low # of games played. Certainly no prospects at a top 3 finish of course.

Too many users in this thread throwing words around without a whole lot of experience to back them up.

... and I generally try not to agree with Milos for my own sake ...

Milos
Posts: 3379
Joined: Wed Nov 25, 2009 12:47 am

Re: TCEC Division 1 results simulator

Post by Milos » Thu Aug 30, 2018 9:54 pm

Uri Blass wrote:
Thu Aug 30, 2018 3:36 pm
Milos wrote:
Thu Aug 30, 2018 11:43 am
Oh give me a break and try to say something useful for a change. Both Fire and Ethereal have identical LazySMP as SF, and scale pretty much identically. These claims that we don't know how something behaves at 10 or 40x longer TC is just BS. We do know. There is just compression of rating differences. There is never an inversion. No one ever showed one with any reasonably strong A/B engine.
CCRL 40/40 is irrelevant. Ethereal only played 400 games there. Probably real difference vs Fire is 50-60 Elo instead of 70-80. And that's perfectly fine for 10x longer TC. And it's a simple fact that you compress rating differences with longer TC but you also compress error margins. So probability of marginal events happening remains the same.
It played 50 games in TCEC and ppl make even more ridiculous conclusions. It's a hype and nothing else. Chance for Ethereal to finish 4th in primary division is less than 20%. To finish in top 3 is less than 2%.
It is not correct that there is never an inversion.

Here is an example for an inversion with smaller elo difference

From the ccrl 40/4 list

http://www.computerchess.org.uk/ccrl/404/

Colossus 2008b 2642 +7 −7 48.3% +13.6 30.7% 8497
Movei 00.8.438 (10 10 10) 2624 +7 −7 46.8% +23.9 30.7% 8182

from the ccrl 40/40 list

Movei 00.8.438 (10 10 10) 2667 +11 −11 47.3% +15.3 37.4% 2693
Colossus 2008b 2643 +13 −13 49.3% +2.4 38.3% 2098

I know from my own tests when i developed movei and tested it with time handicap that the better engine usually scaled better because in most cases the main advantage of the better engine was not a programmer who knew to do the same thing on the computer significantly faster but an algorithm that scale better.
Inspite of it in theory it is possible that one engine has an algorithm that scale better when the second engine has a programmer who is better in tricks how to do the same thing faster and in this case I expect to see an inversion at some time control.
That's why I put reasonably strong A/B engines. With all due respect Movei (or specially Colossus in this particular example) is not something that I would classify as a reasonably strong engine.
The problem with engines at 2800 Elo or below, and even more so with relatively old engines, is that there are often bugs and problems in the search (like improper reductions or extensions) that can cause scaling issues, so that the engine plays much worse at longer TCs than at bullet.
There was at some point, a few years back, a story that Komodo scales much better at longer TC, and some ppl even claimed there would be an inversion relative to SF, since SF was at that time 20-30 Elo stronger at bullet. And then came the TCEC super final (don't remember which season exactly and I'm too lazy to look for it) and, due to sharp openings, the difference after 100 games was even larger than at bullet.

Milos
Posts: 3379
Joined: Wed Nov 25, 2009 12:47 am

Re: TCEC Division 1 results simulator

Post by Milos » Thu Aug 30, 2018 10:20 pm

AndrewGrant wrote:
Thu Aug 30, 2018 8:58 pm
One of your earlier posts said 30% for 4th, which I think I would agree with, given the low # of games played. Certainly no prospects at a top 3 finish of course.

Too many users in this thread throwing words around without a whole lot of experience to back them up.

... and I generally try not to agree with Milos for my own sake ...
I often write posts that can be classified as full of vitriol, but I always try to make them as factual as possible. Ofc, there can be a bad judgement or miscalculation here and there, but I never intentionally distort data.
Quite a few ppl at this forum often react based on a gut feeling from few games only or simply as fans.

Regarding the chances of Ethereal finishing 4th in Premier Division, I guess it depends on the actual Elo difference vs Fire at TCEC TC: with 50 Elo it would be around 30%, with 60 Elo around 20%, etc. But that's a minor thing.
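
For reference, the per-game effect of such an Elo gap can be sketched with the standard logistic Elo expectation. This is only the per-game expected score, not the placement chance quoted above, which also depends on the number of games and on the rest of the field. The function name is hypothetical.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score per game for the side that is elo_diff points stronger.

    Uses the standard logistic Elo curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


if __name__ == "__main__":
    # Gaps discussed in the thread: 50 to 80 Elo between Fire and Ethereal.
    for gap in (50, 60, 70, 80):
        print(gap, round(expected_score(gap), 3))
```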

Milos
Posts: 3379
Joined: Wed Nov 25, 2009 12:47 am

Re: TCEC Division 1 results simulator

Post by Milos » Thu Aug 30, 2018 10:31 pm

chrisw wrote:
Thu Aug 30, 2018 12:04 pm
Milos wrote:
Thu Aug 30, 2018 12:18 am
Just a small illustration how ridiculous are your assumptions (regarding Premier Division):
http://talkchess.com/forum3/viewtopic.p ... 31#p772531
Haha! It didn't need to be personalised with "your" assumptions; "the" assumptions would be enough. Or "ridiculous", when "Ethereal is likely 70 pts below Fire, as these blitz test results show ..." would be enough to get the meaning across. But I suppose it wouldn't be Milos without the added chilli pepper. What is unseasoned Milos like, actually? One of the smarter ones, in my opinion. But, anyway, if you think it needs the added seasoning, up to you.

Okay, so back to boring factuality-land ...

Assumptions are assumptions. Best thing to do with them is make them clear. Well, I try to. Stated already: I don't really know where Fire should be, in relation to the others. Maybe 100pts or so behind the top three, according to the lists. Is that an unfair assumption? Is it being worked on? Gimme data.

As to the relative position of Ethereal. Well, the programmer suggested when I asked, without being committal, IIRC, that it was around 3300 or so, using the CCRL 40 scaling of elos. When Ethereal played Round 4, it got a tournament elo increase of about 70pts, which I factored into its actual elo at 50% and set it at 3335 for Round 3. It ended Round 3 with no change to its tournament given elo. So I put it in this Round at 3334. TCEC chose 3341, not much different.
Ethereal is doing well enough this round, TCEC is giving it currently an elo improvement of +50, and my algorithm says something similar. If promoted and all else stays equal, my elo-giving algorithm will probably start Ethereal off at 3360 in Premier Division.

So, like I said in the previous post, which it looks like you didn't actually carefully read:

"The elos of the other six engines are more likely to be problematic (too low, because progress)."

If you want to suggest a rough figure for Fire, including a "worked on" estimate, feel free, I am all ears.
I apologize for the language used.
Regarding the topic, I have a problem in general with TCEC "ratings". These are not real data but some kind of mishmash of a priori CCRL ratings and live rating changes based on the actual TCEC games of that season. The problem is that the person who made the calculation in TCEC assumed (completely wrongly), due to TCEC conditions, that the CCRL prior plays a very small role, while in reality the CCRL prior should have a more than dominant effect on the rating calculation and the TCEC games a very small effect, if any.
So +50, +100 Elo rating gains based on a few TCEC games are nothing but BS. Also, assuming that because CCRL tests the last official version, which is some 3 months old, the new version playing in TCEC could have gained 50 or more Elo is simply dreaming, or having no clue how painstakingly slow the improvement of strong engines is.
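
One simple way to picture a prior that dominates is a game-count-weighted average of the prior rating and the tournament performance. This is an illustrative assumption, not the method TCEC or CCRL actually uses, and the function name is hypothetical. The example plugs in figures mentioned in the thread (about 400 CCRL 40/40 games for Ethereal at 3283, about 50 TCEC games, tournament elo 3391), ignoring that they refer to different versions.

```python
def blended_rating(prior_elo: float, prior_games: int,
                   perf_elo: float, new_games: int) -> float:
    """Weight the prior rating and the new performance by their game counts."""
    total = prior_games + new_games
    if total == 0:
        raise ValueError("need at least one game to form a rating")
    return (prior_elo * prior_games + perf_elo * new_games) / total


if __name__ == "__main__":
    # 400 prior games at 3283 against 50 tournament games at 3391.
    print(blended_rating(3283, 400, 3391, 50))
```

With these weights the 108-point tournament jump shrinks to a 12-point move, which is the kind of damping the post argues for.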

Uri Blass
Posts: 8507
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: TCEC Division 1 results simulator

Post by Uri Blass » Fri Aug 31, 2018 5:06 pm

Milos wrote:
Thu Aug 30, 2018 9:54 pm
Uri Blass wrote:
Thu Aug 30, 2018 3:36 pm
Milos wrote:
Thu Aug 30, 2018 11:43 am
Oh give me a break and try to say something useful for a change. Both Fire and Ethereal have identical LazySMP as SF, and scale pretty much identically. These claims that we don't know how something behaves at 10 or 40x longer TC is just BS. We do know. There is just compression of rating differences. There is never an inversion. No one ever showed one with any reasonably strong A/B engine.
CCRL 40/40 is irrelevant. Ethereal only played 400 games there. Probably real difference vs Fire is 50-60 Elo instead of 70-80. And that's perfectly fine for 10x longer TC. And it's a simple fact that you compress rating differences with longer TC but you also compress error margins. So probability of marginal events happening remains the same.
It played 50 games in TCEC and ppl make even more ridiculous conclusions. It's a hype and nothing else. Chance for Ethereal to finish 4th in primary division is less than 20%. To finish in top 3 is less than 2%.
It is not correct that there is never an inversion.

Here is an example for an inversion with smaller elo difference

From the ccrl 40/4 list

http://www.computerchess.org.uk/ccrl/404/

Colossus 2008b 2642 +7 −7 48.3% +13.6 30.7% 8497
Movei 00.8.438 (10 10 10) 2624 +7 −7 46.8% +23.9 30.7% 8182

from the ccrl 40/40 list

Movei 00.8.438 (10 10 10) 2667 +11 −11 47.3% +15.3 37.4% 2693
Colossus 2008b 2643 +13 −13 49.3% +2.4 38.3% 2098

I know from my own tests when i developed movei and tested it with time handicap that the better engine usually scaled better because in most cases the main advantage of the better engine was not a programmer who knew to do the same thing on the computer significantly faster but an algorithm that scale better.
Inspite of it in theory it is possible that one engine has an algorithm that scale better when the second engine has a programmer who is better in tricks how to do the same thing faster and in this case I expect to see an inversion at some time control.
That's why I put reasonably strong A/B engines. With all due respect Movei (or specially Colossus in this particular example) is not something that I would classify as a reasonably strong engine.
Problem with engines below 2800Elo or less and even more with relatively old engines is that often there are some bugs and problems with search (like improper reductions, extensions) that can cause scaling issues and that engine plays much worse at longer TC's than at bullet.
There was at some point a few years back, a story that Komodo scales much better at longer TC and even some ppl claimed there would be inversion related to SF, since SF was at that time 20-30 Elo stronger at bullet. And then came TCEC super final (don't remember which season exactly and I'm too lazy to look for it) and due to sharp openings difference after 100 games was even larger than what was at bullet.
I decided to try to find cases of inversion between the CCRL 40/40 and 40/4 lists for highly rated programs.

So far I have found some inversions in 4 CPU vs 1 CPU comparisons, where for some reason the 4 CPU entry seems to perform better at the long time control. I suspect the problem is that there are not enough games of 4 CPU against 1 CPU to get a reliable rating.

I did it simply by downloading the lists and manually deleting programs that appear in only one of the lists, and, as I went, programs that are rated above all remaining programs in both lists: if A is number 1 in both the 40/40 and the 40/4 list, A obviously cannot be part of an inversion.

The highest case that I could find is the following:
40/40
Komodo 10.1 64-bit 4CPU 3366 +19 −19 74.8% −163.6 43.7% 938
SugaR XPrO 1.5.3 64-bit 3321 +18 −18 71.2% −139.0 50.0% 1050

40/4

SugaR XPrO 1.5.3 64-bit 3463 +22 −22 64.8% −95.2 57.4% 599
Komodo 10.1 64-bit 4CPU 3429 +18 −18 72.2% −158.5 39.5% 1164

next pair is
40/40
Komodo 9.42 64-bit 4CPU 3345 +21 −20 74.7% −158.9 47.2% 777
SugaR XPrO 1.2 64-bit 3314 +16 −16 68.1% −117.5 56.0% 1172


40/4

SugaR XPrO 1.2 64-bit 3444 +15 −15 61.0% −80.5 59.5% 1300
Komodo 9.42 64-bit 4CPU 3406 +17 −17 74.9% −170.7 39.1% 1244

next pair is more significant difference

40/40

Deep Shredder 13 64-bit 4CPU 3287 +13 −13 51.2% −7.6 61.9% 1687
Stockfish 7 64-bit 3246 +13 −13 67.9% −112.8 54.2% 1809

40/4
Stockfish 7 64-bit 3355 +14 −14 76.0% −188.2 37.6% 2066
Deep Shredder 13 64-bit 4CPU 3328 +11 −11 58.9% −64.1 42.7% 2818
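
The manual procedure described in this post (keep only programs present in both lists, then look for pairs whose order flips) can be sketched as a pairwise comparison. The function name and the dictionary layout are assumptions; the ratings are the ones listed above. The sketch ignores error margins and head-to-head results, so it only flags candidate inversions.

```python
from itertools import combinations


def find_inversions(list_a: dict, list_b: dict) -> list:
    """Return (x, y, diff_a, diff_b) for pairs ranked in opposite order.

    Only programs present in both rating lists are compared.
    """
    common = sorted(set(list_a) & set(list_b))
    inversions = []
    for x, y in combinations(common, 2):
        diff_a = list_a[x] - list_a[y]
        diff_b = list_b[x] - list_b[y]
        if diff_a * diff_b < 0:  # opposite signs: the order flipped
            inversions.append((x, y, diff_a, diff_b))
    return inversions


# Ratings copied from the pairs listed in this post.
CCRL_40_40 = {
    "Komodo 10.1 64-bit 4CPU": 3366,
    "SugaR XPrO 1.5.3 64-bit": 3321,
    "Deep Shredder 13 64-bit 4CPU": 3287,
    "Stockfish 7 64-bit": 3246,
}
CCRL_40_4 = {
    "SugaR XPrO 1.5.3 64-bit": 3463,
    "Komodo 10.1 64-bit 4CPU": 3429,
    "Stockfish 7 64-bit": 3355,
    "Deep Shredder 13 64-bit 4CPU": 3328,
}

if __name__ == "__main__":
    for x, y, d_long, d_short in find_inversions(CCRL_40_40, CCRL_40_4):
        print(f"{x} vs {y}: {d_long:+d} at 40/40, {d_short:+d} at 40/4")
```

Comparing every pair directly makes the pruning of programs that top both lists unnecessary, since such a program never produces a sign flip.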

Uri Blass
Posts: 8507
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: TCEC Division 1 results simulator

Post by Uri Blass » Fri Aug 31, 2018 5:43 pm

The next pair is the first inversion that does not involve a different number of cores.

40/40
Komodo 9 64-bit 3236 +20 −20 71.8% −138.0 46.0% 848
Fire 6.1 64-bit 3204 +12 −12 52.9% −17.2 55.5% 2084

40/4
Fire 6.1 64-bit 3291 +11 −11 50.0% −14.2 44.8% 2727
Komodo 9 64-bit 3259 +20 −20 73.9% −174.0 33.9% 949

AndrewGrant
Posts: 467
Joined: Tue Apr 19, 2016 4:08 am
Location: U.S.A
Full name: Andrew Grant

Re: TCEC Division 1 results simulator

Post by AndrewGrant » Fri Aug 31, 2018 5:46 pm

Uri Blass wrote:
Fri Aug 31, 2018 5:43 pm
next pair is the first pair of inversion that is not different number of cores.

40/40
Komodo 9 64-bit 3236 +20 −20 71.8% −138.0 46.0% 848
Fire 6.1 64-bit 3204 +12 −12 52.9% −17.2 55.5% 2084

40/4
Fire 6.1 64-bit 3291 +11 −11 50.0% −14.2 44.8% 2727
Komodo 9 64-bit 3259 +20 −20 73.9% −174.0 33.9% 949
Only a head-to-head case is of interest.

chrisw
Posts: 1744
Joined: Tue Apr 03, 2012 2:28 pm

Re: TCEC Division 1 results simulator

Post by chrisw » Fri Aug 31, 2018 6:27 pm

79 rounds so far
Changed the output slightly, sim now shows the chances of being 1st, 2nd and so on.
Using 10% as the bar:

Division 1, Chiron predicted between 2nd and 5th
Booot between 3rd and 7th
Fizbo between 2nd and 7th

Then, assuming Ethereal and Chiron go through ...
Ethereal predicted final place somewhere between 3rd and 6th
Houdini predicted final place somewhere between 1st and 3rd
Andscacs 6th to 8th
Fire 3rd to 6th and so on


Division 1
Engine     Tournament Initial  First   Second  Third   Fourth  Fifth   Sixth   Seventh Eighth
Ethereal       3391   3334     0.825   0.104   0.040   0.018   0.008   0.004   0.001   0.000
Chiron         3347   3288     0.058   0.303   0.296   0.194   0.104   0.038   0.007   0.000
Fizbo          3285   3287     0.029   0.125   0.123   0.143   0.153   0.179   0.187   0.060
ChessBrainVB   3239   3279     0.029   0.131   0.117   0.098   0.088   0.097   0.146   0.294
Laser          3211   3226     0.019   0.102   0.110   0.112   0.100   0.109   0.162   0.287
Fritz          3236   3226     0.019   0.111   0.108   0.140   0.161   0.185   0.195   0.082
Booot          3199   3272     0.008   0.056   0.106   0.184   0.268   0.249   0.114   0.014
Jonny          3184   3140     0.013   0.068   0.099   0.111   0.118   0.140   0.190   0.262
using starting elos = CCRL 40 + guess at progress + reading some posts


	
	"Stockfish", 0, 3441+70,
	"Komodo", 0, 3405+35,
	"Houdini", 0, 3400+70,
	"Fire", 0, 3326+60,
	"Andscacs", 0, 3245+70,
	"Ginkgo", 0, 3322+0,
	"P1", 0, 1200,
	"P2", 0, 1200,


Division Premier
Engine     Tournament Initial  First   Second  Third   Fourth  Fifth   Sixth   Seventh Eighth
Stockfish      3511   3511     0.696   0.226   0.063   0.012   0.002   0.000   0.000   0.000
Houdini        3470   3470     0.217   0.453   0.232   0.071   0.022   0.004   0.001   0.000
Komodo         3440   3440     0.070   0.240   0.403   0.182   0.074   0.024   0.006   0.001
Fire           3386   3386     0.006   0.034   0.119   0.276   0.288   0.167   0.080   0.030
Chiron         3347   3347     0.001   0.004   0.027   0.092   0.182   0.300   0.240   0.153
Ginkgo         3322   3322     0.001   0.001   0.007   0.034   0.091   0.198   0.314   0.354
Andscacs       3315   3315     0.001   0.001   0.005   0.025   0.070   0.162   0.297   0.440
Ethereal       3391   3391     0.008   0.041   0.143   0.308   0.269   0.145   0.063   0.022
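
For readers who want to reproduce this kind of table, a minimal Monte Carlo round-robin could look like the sketch below. It is not the sim used for the numbers above. It assumes the standard logistic Elo expectation, a fixed 60% draw rate, random tie-breaks, and 8 games per pairing (consistent with the "56 games each" mentioned earlier in the thread). All function names are hypothetical; the seeds are the Initial column of the Premier table.

```python
import random


def expected_score(elo_a: float, elo_b: float) -> float:
    """Standard logistic Elo expectation for A against B."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))


def simulate_once(elos: dict, games_per_pair: int, draw_rate: float,
                  rng: random.Random) -> list:
    """Play one round-robin and return names ordered from first to last."""
    names = list(elos)
    points = {name: 0.0 for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            e = expected_score(elos[a], elos[b])
            # Cap the draw share so win and loss probabilities stay >= 0.
            p_draw = min(draw_rate, 2.0 * e, 2.0 * (1.0 - e))
            p_win = e - p_draw / 2.0
            for _ in range(games_per_pair):
                r = rng.random()
                if r < p_win:
                    points[a] += 1.0
                elif r < p_win + p_draw:
                    points[a] += 0.5
                    points[b] += 0.5
                else:
                    points[b] += 1.0
    # Ties on points are broken at random (an assumption).
    return sorted(names, key=lambda n: (points[n], rng.random()), reverse=True)


def placement_probabilities(elos: dict, games_per_pair: int = 8,
                            draw_rate: float = 0.6, trials: int = 2000,
                            seed: int = 0) -> dict:
    """Estimate each engine's chance of finishing 1st, 2nd, ... last."""
    rng = random.Random(seed)
    counts = {name: [0] * len(elos) for name in elos}
    for _ in range(trials):
        order = simulate_once(elos, games_per_pair, draw_rate, rng)
        for place, name in enumerate(order):
            counts[name][place] += 1
    return {name: [c / trials for c in row] for name, row in counts.items()}


# Seeds taken from the Initial column of the Premier table above.
PREMIER_SEEDS = {
    "Stockfish": 3511, "Houdini": 3470, "Komodo": 3440, "Fire": 3386,
    "Ethereal": 3391, "Chiron": 3347, "Ginkgo": 3322, "Andscacs": 3315,
}

if __name__ == "__main__":
    for name, row in placement_probabilities(PREMIER_SEEDS).items():
        print(f"{name:<10}", "  ".join(f"{p:.3f}" for p in row))
```

The exact percentages will differ from the table because the draw model and tie-break here are guesses; the shape of the distribution is what the sketch is meant to show.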
