A dangerous combination

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I still cant say much (best performance ever against Dan

Post by Michael Sherwin »

Tony Thomas wrote: Romi got 75% against Danasah; I hope that Michael isn't tuning against Dana.

Code:

4 RomiChessDK6              : 2473  180 (+ 84,= 39,- 57), 57.5 %

Danasah 2.85                  :  30 (+ 20,=  5,-  5), 75.0 %
Francesca MAD 0.13            :  30 (+  7,= 10,- 13), 40.0 %
Lime 6.3                      :  30 (+ 25,=  3,-  2), 88.3 %
Djinn 0.925x                  :  30 (+ 11,=  8,- 11), 50.0 %
Zappa 1.1                     :  30 (+  9,=  4,- 17), 36.7 %
Arasan 9.5                    :  30 (+ 12,=  9,-  9), 55.0 %
Krazzy, isn't it! :D No, I am not currently using DanaSah to tune against.
Romi is back to her old stomping grounds - she is on the trail of some really mean and vicious hamsters! :lol:

The variance that you are experiencing in your test is due, IMHO, to the randomness of the opening books.

Even so, 75% vs DanaSah is huger than huge! :D
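
As a rough illustration of how far a 30-game result can swing by chance alone, here is a minimal sketch that estimates the standard error of a match score from its win/draw/loss counts. It assumes the games are independent and identically distributed, which an opening book only approximates; the function name `match_score_stats` is made up for this example.

```python
import math

def match_score_stats(wins, draws, losses):
    """Return (score fraction, standard error) for one match."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0 - score) ** 2) / n
    # Standard error of the mean over n games (assumes independent games).
    return score, math.sqrt(var / n)

score, se = match_score_stats(20, 5, 5)  # the Danasah 2.85 row above
print(f"score = {score:.1%}, 95% interval roughly +/- {1.96 * se:.1%}")
```

Under these assumptions, the 75% result carries an uncertainty on the order of 14 percentage points either way, so a good share of the match-to-match variation could be noise regardless of the book.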
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Tony Thomas

Re: I still cant say much (best performance ever against Dan

Post by Tony Thomas »

Those poor animals, how much blood are you spilling nowadays? I am pretty sure that it's not as close to 100% as it was before.
Michael Sherwin

Re: I still cant say much (best performance ever against Dan

Post by Michael Sherwin »

The latest beta was at 75% at game 50 in a Nooman.pgn match before falling all the way down to 63% at game 100. Still, 63% is better than the previous 58%!

Also, Romi does not like the new test set (Sherwin50.pgn) that I made for her, as she only scored 55% against Hamsters 0.2. She was quite upset with me until I told her that when she masters this test set she will be stomping hamsters like never before. Not sure that she believed me, but she is playing along for now! :D
Tony Thomas

Re: I still cant say much (best performance ever against Dan

Post by Tony Thomas »

I played one more match, this time against Phalanx. Romi has never been able to score more than 60% against Phalanx, despite the engine being clearly weaker than Arasan and at the same level as Dana. Some Romi versions have even lost their match to Phalanx. Romi's results dropped slightly due to the not-so-great performance against the old warrior.

Code:

5 RomiChessDK6              : 2466  210 (+ 98,= 42,- 70), 56.7 %

Danasah 2.85                  :  30 (+ 20,=  5,-  5), 75.0 %
Phalanx Reborn                :  30 (+ 14,=  3,- 13), 51.7 %
Francesca MAD 0.13            :  30 (+  7,= 10,- 13), 40.0 %
Lime 6.3                      :  30 (+ 25,=  3,-  2), 88.3 %
Djinn 0.925x                  :  30 (+ 11,=  8,- 11), 50.0 %
Zappa 1.1                     :  30 (+  9,=  4,- 17), 36.7 %
Arasan 9.5                    :  30 (+ 12,=  9,-  9), 55.0 %
Tony Thomas

Still look very friggin close

Post by Tony Thomas »

Another not-so-bad match for the current version of Romi, this time versus Delphil. It is still behind the previous version, but only by 5 points or so.

Code:

5 RomiChessDK6              : 2465  240 (+111,= 51,- 78), 56.9 %

Danasah 2.85                  :  30 (+ 20,=  5,-  5), 75.0 %
Phalanx Reborn                :  30 (+ 14,=  3,- 13), 51.7 %
Delphil 1.6c                  :  30 (+ 13,=  9,-  8), 58.3 %
Francesca MAD 0.13            :  30 (+  7,= 10,- 13), 40.0 %
Lime 6.3                      :  30 (+ 25,=  3,-  2), 88.3 %
Djinn 0.925x                  :  30 (+ 11,=  8,- 11), 50.0 %
Zappa 1.1                     :  30 (+  9,=  4,- 17), 36.7 %
Arasan 9.5                    :  30 (+ 12,=  9,-  9), 55.0 %
Tony Thomas

Test is finished (better than previous version or is it?)

Post by Tony Thomas »

Romi did not do so badly in the last few matches; she came out ahead of the previous version by 4 points. So I can't really say which one is stronger; either version could be better than the other.

Code:

4 RomiChessDK6              : 2472  360 (+186,= 69,-105), 61.2 %

Danasah 2.85                  :  30 (+ 20,=  5,-  5), 75.0 %
Phalanx Reborn                :  30 (+ 14,=  3,- 13), 51.7 %
Delphil 1.6c                  :  30 (+ 13,=  9,-  8), 58.3 %
Francesca MAD 0.13            :  30 (+  7,= 10,- 13), 40.0 %
Lime 6.3                      :  30 (+ 25,=  3,-  2), 88.3 %
Djinn 0.925x                  :  30 (+ 11,=  8,- 11), 50.0 %
Zeus 1.28                     :  30 (+ 18,=  4,-  8), 66.7 %
Zappa 1.1                     :  30 (+  9,=  4,- 17), 36.7 %
GreKo 5.2                     :  30 (+ 25,=  4,-  1), 90.0 %
NanoSzachy 2.7                :  30 (+ 19,=  4,-  7), 70.0 %
Arasan 9.5                    :  30 (+ 12,=  9,-  9), 55.0 %
Horizon_4_3_173               :  30 (+ 13,=  6,- 11), 53.3 %
Michael Sherwin

Re: Test is finished (better than previous version or is it?

Post by Michael Sherwin »

Hi Tony,

Thanks for running this test! What an incredibly bumpy ride this was. Two very interesting results stick out more than all the others: 75% vs DanaSah and 90% vs GreKo. This is by far the best performance by Romi against these two engines ever, in any test anywhere. In two betas Romi has climbed from below 50% against DanaSah to 75%, and 90% against GreKo is just unbelievable! And yet we have no really good idea whether this latest beta is any better than the last one. What to do now?

Mike
Michael Sherwin

Re: Test is finished (better than previous version or is it?

Post by Michael Sherwin »

An exercise in logic tells me that it is safe to ignore any swing in score of 3 points (10%) or less, either way, in a 30-game match. In the test above, that leaves only three results to consider:

+6.0 vs GreKo, which gave +217
-6.0 vs Djinn, which gave -149
+3.5 vs DanaSah, which gave +98

for a total of +166,

divided by 3 = +55 points.

Using this possibly flawed logic gives a good indication that this beta may be stronger.

Is there any merit to this calculation or not?
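
For anyone who wants to rerun the arithmetic, here is a minimal sketch that converts the three score swings into rating differences with the standard logistic Elo formula. The previous beta's points are inferred as current points minus the quoted change, and the logistic curve is an assumption: the figures in the post appear to come from a slightly different conversion (possibly a normal-distribution table), so the numbers printed here will not match +217 / -149 / +98 exactly.

```python
import math

def elo_from_score(p):
    """Logistic Elo difference implied by a score fraction p (0 < p < 1)."""
    return 400 * math.log10(p / (1 - p))

GAMES = 30
# (opponent, current points out of 30, change vs. previous beta), from the post.
results = [("GreKo", 27.0, +6.0), ("Djinn", 15.0, -6.0), ("DanaSah", 22.5, +3.5)]

deltas = []
for name, now, change in results:
    before = now - change  # previous beta's points, inferred from the change
    delta = elo_from_score(now / GAMES) - elo_from_score(before / GAMES)
    deltas.append(delta)
    print(f"{name:8s} {change:+.1f} points -> {delta:+.0f} Elo")

mean_delta = sum(deltas) / len(deltas)
print(f"mean of the three: {mean_delta:+.0f} Elo")
```

Note that this only reproduces the calculation; it says nothing about whether averaging the three largest swings and discarding the rest is statistically sound.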
Tony Thomas

Re: Test is finished (better than previous version or is it?

Post by Tony Thomas »

I have no idea; I am pretty sure that statistics experts won't agree with you. There are many reasons why we are testing Romi against 12 different opponents. I am thinking about bumping it up to 16, or maybe replacing some of the engines, because they are too close in rating even after 600 games or so. I played the first match against Naum, and that engine will be integrated into the rating system pretty soon. I know that the results will make you cry, but since Naum is currently rated about 350 points higher than Romi, it did not affect her rating. Note that I am not using the commercial version of Naum, because my results did not show it to be better or worse than the public version, namely 2.0.

Code:

Romi's sparring partners

Rank Engine Score Na 
1 Naum 2.0  27.0/30 · ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· ·  
2 RomiChessDK6 3.0/30 01000000000000000000=0=0=0000=  


30 games played / Tournament finished
Tournament start: 2007.02.05, 16:11:55
Latest update: 2007.04.11, 01:26:56
Site/ Country: YOUR-E358B65523, United States
Level: Blitz 1/1
Hardware: Intel(R) Celeron(R) CPU 2.80GHz with 239 MB Memory
Operating system: Microsoft Windows XP Home Edition Service Pack 2 (Build 2600)
PGN-File: Romi_s sparring partners.pgn
Table created with: Arena 1.1
Michael Sherwin

Re: Test is finished (better than previous version or is it?

Post by Michael Sherwin »

I would have thought that avoiding extremes would give more solid results. A +/- 1 point change is worth 44 Elo at 90% and only 7 Elo at 50%. However, it could be interesting.
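
To show why lopsided matches are so much more sensitive, here is a minimal sketch of the Elo change per percentage point of score under the standard logistic model. The logistic curve and the function name `elo_per_percent` are assumptions for this example; the exact figures depend on the conversion table used and on whether "point" means a percentage point or a game point, so they need not match the numbers quoted above.

```python
import math

def elo_per_percent(p):
    """Elo change per one percentage point of score, at score fraction p."""
    # Derivative of 400*log10(p/(1-p)) with respect to p, scaled to a 1% step.
    return 400 / (math.log(10) * p * (1 - p)) / 100

for p in (0.5, 0.75, 0.9):
    print(f"at {p:.0%}: about {elo_per_percent(p):.1f} Elo per percentage point")
```

The slope grows as the score moves away from 50%, so the same small swing in games costs or gains more rating points against a much weaker or much stronger opponent.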