Something seems to be wrong. But if things are as they seem, then you have unwittingly conducted a very interesting experiment: it shows the wide variance in results that can be seen in a gauntlet of this size. That is why Bob says that 2560 games per opponent are needed to make a determination.
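To put a rough number on that variance, here is a minimal sketch. It assumes every game is independent and uses hypothetical win/draw rates (35% wins, 30% draws) that are not taken from the gauntlet; the function name `score_margin` is made up for illustration. An engine that learns between games would break the independence assumption, so treat this only as a ballpark.

```python
import math

def score_margin(games, win_rate, draw_rate, z=1.96):
    """Approximate 95% margin of error on a match score (0..1 scale).

    Assumes independent games; win_rate and draw_rate are hypothetical inputs.
    """
    mean = win_rate + 0.5 * draw_rate
    # Per-game variance of a result worth 1, 0.5 or 0 points.
    variance = win_rate * 1.0 + draw_rate * 0.25 - mean ** 2
    return z * math.sqrt(variance / games)

for games in (100, 2560):
    margin = score_margin(games, win_rate=0.35, draw_rate=0.30)
    print(f"{games:5d} games: +/- {100 * margin:.1f} percentage points")
```

Under these assumptions the margin is around 8 percentage points at 100 games and under 2 points at 2560 games, which is the kind of gap that makes small gauntlets look so erratic.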
I believe that you were supposed to run NG4 in this latest gauntlet!
But, this is also very interesting, so please let it finish.
Thanks!
Mike
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin wrote:Something seems to be wrong. But if things are as they seem, then you have unwittingly conducted a very interesting experiment: it shows the wide variance in results that can be seen in a gauntlet of this size. That is why Bob says that 2560 games per opponent are needed to make a determination.
I don't think the result will be the same if you run 2 different tests of 2500 games with the same version. There is some sort of variance; maybe it's due to Romi's learning features?
Michael Sherwin wrote:I believe that you were supposed to run NG4 in this latest gauntlet!
Ouch, sorry. Yes, this was an error. Anyway, from the next edition on, if you haven't got a newer beta, I can run the tests with RomiChess NG4.
But, this is also very interesting, so please let it finish.
Thanks!
Mike
Yeah, let's see what the difference in the scores is.
It looks like Graham's trophy edition of RomiChess will be playing in your next edition. So far, it's kicking ass. Not only have I got it tuned up really well, I also added a little something from Glaurung that seems to help. Like Tord, I now "shave" the last two bits off of the eval to decrease the granularity of the scores returned. This allows for more efficient null-move pruning and also quicker, more frequent beta cutoffs. Or at least I think that is what it does.
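For anyone wondering what "shaving" the last two bits looks like, here is a minimal sketch in Python. It is only an illustration of the idea, not the actual RomiChess or Glaurung code; the names `GRAIN_BITS` and `coarsen` are hypothetical, and the score is assumed to be a plain integer.

```python
GRAIN_BITS = 2  # hypothetical: number of low bits to discard

def coarsen(score: int, bits: int = GRAIN_BITS) -> int:
    """Clear the lowest `bits` bits so scores land on multiples of 2**bits."""
    return score & ~((1 << bits) - 1)

# Count how many distinct values survive over a small score range.
raw = range(-100, 101)
distinct_before = len(set(raw))
distinct_after = len({coarsen(s) for s in raw})
print(distinct_before, distinct_after)
```

With fewer distinct scores, more positions compare as equal, which is the effect the coarser granularity is after. Note that masking rounds toward negative infinity (so -5 becomes -8), which treats positive and negative scores slightly differently; a real engine might prefer a symmetric rounding.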
Michael Sherwin wrote:It looks like Graham's trophy edition of RomiChess will be playing in your next edition.
That's a cool prize to award the winner. Thanks!
I heard from Alessandro yesterday that his new Hamsters beta has surpassed Kiwi's strength and has shown progress in 300 games he ran. Romi has to catch up with Hamsters!
Michael Sherwin wrote:It looks like Graham's trophy edition of RomiChess will be playing in your next edition.
That's a cool prize to award the winner. Thanks!
I heard from Alessandro yesterday that his new Hamsters beta has surpassed Kiwi's strength and has shown progress in 300 games he ran. Romi has to catch up with Hamsters!
Once I reach my goal of 70% vs Hamsters 0.2, I will tackle the latest available version of Hamsters. So far, after 28 games of 100 in a 4'4 match, it is Romi +16 -3 =9 for 73%!
When I first saw Alessandro's Hamsters, I knew that it had the potential to be one of the great programs. That is why I am so glad that Alessandro is working on it again.
+17 -3 =9
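For reference, the percentages quoted with these tallies follow the usual convention of one point per win and half a point per draw. A minimal sketch (the helper name `match_score` is made up for illustration):

```python
def match_score(wins: int, losses: int, draws: int) -> float:
    """Fraction of available points scored: a win is 1, a draw is 0.5."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games

# The two running tallies reported so far.
for wins, losses, draws in ((16, 3, 9), (17, 3, 9)):
    pct = 100 * match_score(wins, losses, draws)
    print(f"+{wins} -{losses} ={draws}: {pct:.1f}%")
```

The first tally works out to 20.5 points from 28 games, which rounds to the 73% quoted above.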
Testing it against the antique version of Hamsters would be pointless, Michael. It's like testing the latest beta of Hamsters against the old public RomiChess proto.
Get the prize and see if you can report similar results.
swami wrote:Testing it against the antique version of Hamsters would be pointless, Michael. It's like testing the latest beta of Hamsters against the old public RomiChess proto.
Get the prize and see if you can report similar results.
Everyone thinks that it is pointless!
It is a known quantity that I can use to judge an improvement. A better score is a better score. That is why I still test against Crafty 19.19 instead of moving up to 21.5. I have a 'yardstick' that I use to measure with. If I change the 'yardstick' then I have to go back and test previous versions of RomiChess against the new 'yardstick' to tell if my new results are an improvement and that kind of time, I do not have. I change the 'yardstick' as infrequently as possible for this reason.
+18 -4 =9
Yeah, except for the few members who have every single version of the engines.
I only use the updated or best version of an engine, and that's why it seems pointless to me, but I can see that it would be useful to you.
Michael Sherwin wrote:It is a known quantity that I can use to judge an improvement. A better score is a better score. That is why I still test against Crafty 19.19 instead of moving up to 21.5. I have a 'yardstick' that I use to measure with. If I change the 'yardstick' then I have to go back and test previous versions of RomiChess against the new 'yardstick' to tell if my new results are an improvement and that kind of time, I do not have. I change the 'yardstick' as infrequently as possible for this reason.
I see, you do have a good way of testing engines, but I don't have much interest in testing old versions, except in the case where an old version is better than the current one. Crafty 19 and Hamsters 0.2 are ancient.