Krazzy isn't it! No I am not currently using DanaSah to tune against.
Romi is back to her old stomping grounds - she is on the trail of some really mean and vicious hamsters!
The variance that you are experiencing in your test is due, IMHO, to the randomness of the opening books.
Even so, 75% vs DanaSah is huger than huge!
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Those poor animals, how much blood are you spilling nowadays? I am pretty sure that it's not as close to 100% as it was before.
The latest beta was at 75% at game 50 in a Nooman.pgn match before falling all the way down to 63% at game 100. Still 63% is better than the previous 58%!
Also, Romi does not like the new test set (Sherwin50.pgn) that I made for her, as she only scored 55% against Hamsters 0.2. She was quite upset with me until I told her that when she masters this test set she will be stomping hamsters like never before. Not sure that she believed me, but she is playing along for now!
I played one more game, this time against Phalanx. Romi has never been able to score more than 60% against Phalanx, despite the engine being clearly weaker than Arasan and at the same level as Dana. Some Romi versions have even lost their match to Phalanx. Romi's results dropped slightly due to the not so great performance against the old warrior.
Romi did not do so badly in the last few matches; she came out ahead of the previous version by 4 points. So I can't really say which one is stronger; either version could be better than the other.
Thanks for running this test! What an incredibly bumpy ride this was. Two results stick out more than all the others and are very interesting: 75% vs DanaSah and 90% vs Grecko. This is by far the best performance by Romi against these two engines ever, in any test anywhere. Romi in two betas has climbed from below 50% against DanaSah to 75%, and 90% against Grecko is just unbelievable! And yet we have no really good idea whether this latest beta is any better than the last one. What to do now?
Mike
An exercise in logic tells me that it is safe to ignore any score variance of up to 3 points (10%) either way in a 30-game match. In the above match that leaves only three results to consider.
+6.0 vs Grecko which gave +217
-6.0 vs Djinn which gave -149
+3.5 vs DanaSah which gave +98
for a total of +166
divided by 3 = +55 points
Using this possibly flawed logic gives a good indication that this beta may be stronger.
Is there any merit to this calculation or not?
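For anyone who wants to check the arithmetic above, here is a minimal Python sketch. The three rating changes are the ones quoted in the post; the helper `score_std_points` is a hypothetical name, and it assumes every game is an independent win/loss coin flip with no draws, which is a simplification and not how any rating tool actually models a match.

```python
import math

# Rating changes quoted in the post for the three results outside the 3-point band.
rating_changes = {"Grecko": 217, "Djinn": -149, "DanaSah": 98}

total = sum(rating_changes.values())
average = total / len(rating_changes)

def score_std_points(games, p=0.5):
    """Standard deviation of the match score in points.

    Assumption: each game is an independent win/loss with win
    probability p, and draws are ignored.
    """
    return math.sqrt(games * p * (1.0 - p))

print("total:", total, "average:", round(average))
print("std dev of a 30-game score, in points:", round(score_std_points(30), 2))
```

Under that coin-flip assumption, one standard deviation of a 30-game score between equal engines is a little under 3 points, so the 3-point cutoff sits at roughly one standard deviation and a 6-point swing at roughly two. Draws would shrink these figures somewhat.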
Michael Sherwin wrote:An exercise in logic tells me that it is safe to not count any variance in score in a 30 game match of 3 points (10%) either way. In the above match that leaves only three results to consider.
+6.0 vs Grecko which gave +217
-6.0 vs Djinn which gave -149
+3.5 vs DanaSah which gave +98
for a total of +166
divided by 3 = +55 points
using this possibly flawed logic gives a good indication that this beta may be stronger.
Is there any merit to this calculation or not?
I have no idea, but I am pretty sure that statistics experts won't agree with you. There are many reasons why we are testing Romi against 12 different opponents. I am thinking about bumping it up to 16, or maybe replacing some of the engines because they are too close in rating even after 600 games or so. I played the first game of Naum, and that engine will be integrated into the rating system pretty soon. I know that the results will make you cry, but since Naum is currently rated about 350 points higher than Romi, it did not affect her rating. Note that I am not using the commercial version of Naum, because my results did not show it to be better or worse than the public version, namely 2.0.
I would have thought that avoiding extremes would give more solid results. A +/- 1 point is worth 44 Elo when at 90% and only worth 7 Elo at 50%. However, it could be interesting.
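The point about extremes can be illustrated with a small Python sketch. It assumes the standard logistic Elo model (a 400-point difference corresponds to 10:1 expected odds); the function names and the step of one percentage point are illustrative choices, so the printed figures need not match the ones quoted above, which may come from a different table or step size.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction (0 < score < 1).

    Assumption: the usual logistic Elo model.
    """
    return 400.0 * math.log10(score / (1.0 - score))

def elo_per_step(score, step=0.01):
    """Elo gained when the score fraction rises from score to score + step."""
    return elo_diff(score + step) - elo_diff(score)

for s in (0.50, 0.75, 0.90):
    print(f"{s:.0%}: {elo_diff(s):+.0f} Elo, next 1% worth {elo_per_step(s):.1f} Elo")
```

The curve is flat near 50% and steep near the extremes, so the same swing in score moves the rating much more in a lopsided match than in an even one.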