

Gabor Szots wrote: ↑Tue Apr 13, 2021 8:35 am
Megalodon 0.4.3 with hash tables!
https://github.com/megalodon-chess/mega ... tag/v0.4.3
Cosette 5.0
https://github.com/Tearth/Cosette/releases/tag/v5.0

I compiled some binaries for Megalodon 0.4.3. I also ran some tests yesterday to see whether it is more stable than the earlier versions I tested in between.
Score of Megalodon_043-64 vs KillerQueen_20b3: 9 - 33 - 18 [0.300]
... Megalodon_043-64 playing White: 5 - 16 - 9 [0.317] 30
... Megalodon_043-64 playing Black: 4 - 17 - 9 [0.283] 30
... White vs Black: 22 - 20 - 18 [0.517] 60
Elo difference: -147.2 +/- 78.8, LOS: 0.0 %, DrawRatio: 30.0 %
60 of 200 games finished.
Player: Megalodon_043-64
"Draw by 3-fold repetition": 15
"Draw by adjudication: user decision": 1
"Draw by stalemate": 2
"Loss: Black loses on time": 1
"Loss: Black mates": 15
"Loss: White loses on time": 1
"Loss: White mates": 16
"Win: Black loses on time": 1
"Win: Black makes an illegal move: e8d8": 1
"Win: Black mates": 3
"Win: White makes an illegal move: b4b5": 1
"Win: White mates": 3
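
For reference, the Elo difference in the output above follows from the score fraction alone if one assumes the usual logistic rating model. A minimal Python sketch under that assumption; the function names are illustrative and not taken from any testing tool, and the error margin and LOS are not reproduced here:

```python
import math


def score_fraction(wins: int, losses: int, draws: int) -> float:
    """Fraction of points scored, counting a draw as half a point."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Elo difference implied by a score fraction (logistic model, assumed)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Megalodon_043-64 vs KillerQueen_20b3: 9 wins, 33 losses, 18 draws
score = score_fraction(9, 33, 18)
print(f"score {score:.3f}, Elo difference {elo_difference(score):+.1f}")
# prints: score 0.300, Elo difference -147.2
```

With only 60 games played, the reported margin of +/- 78.8 is wide, so the point estimate should be read loosely.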
niel5946 wrote: ↑Tue Apr 13, 2021 5:19 pm
After a lot of work and testing, Loki 3.0 is now released. Thanks to Marcel (who pointed out how underrated Loki was, its feature set taken into account), I have been working tirelessly to rework the search and evaluation function. This has paid off greatly!
I have tested the new version to have a strength of around 2510 (>600 elo points stronger than v1.2.0). This was done in three separate gauntlets against MadChess 2.2 and Raven 1.10.
The release: https://github.com/BimmerBass/Loki/releases/tag/v3.0.0

Splendid news, Niels.
niel5946 wrote: ↑Tue Apr 13, 2021 5:19 pm
After a lot of work and testing, Loki 3.0 is now released. Thanks to Marcel (who pointed out how underrated Loki was, its feature set taken into account), I have been working tirelessly to rework the search and evaluation function. This has paid off greatly!
I have tested the new version to have a strength of around 2510 (>600 elo points stronger than v1.2.0). This was done in three separate gauntlets against MadChess 2.2 and Raven 1.10.
The release: https://github.com/BimmerBass/Loki/releases/tag/v3.0.0

Sounds like great progress, Niels. However, I saw in your recent commit you tested Loki versus MadChess 2.2 with a time control of 5s + 0.1s. I don't test MadChess at lightning time control. The fastest games I use are 2m+1s, so MadChess' rating at 5s + 0.1s is unreliable. My guess is MadChess performs much worse at 5s + 0.1s than 2m + 1s. I don't know why. I've never investigated it. Perhaps due to the disadvantage of a managed memory runtime (C#) versus a native runtime (C++). Perhaps due to poor implementation on my part that manifests itself only in very fast games.
emadsen wrote: ↑Tue Apr 13, 2021 7:30 pm
Sounds like great progress, Niels. However, I saw in your recent commit you tested Loki versus MadChess 2.2 with a time control of 5s + 0.1s. I don't test MadChess at lightning time control. The fastest games I use are 2m+1s, so MadChess' rating at 5s + 0.1s is unreliable. My guess is MadChess performs much worse at 5s + 0.1s than 2m + 1s. I don't know why. I've never investigated it. Perhaps due to the disadvantage of a managed memory runtime (C#) versus a native runtime (C++). Perhaps due to poor implementation on my part that manifests itself only in very fast games.
Anyhow, if you're basing your Elo estimate of Loki off a gauntlet versus MadChess at 5s + 0.1s, you're likely inflating the rating of Loki. You can answer the question definitively by running games at 2m+1s, which is what CCRL uses for blitz ratings.

Thank you for your response! I don't know if this is worth anything, but when I ran the gauntlets, I anchored the ratings to Raven's 2min+1sec CCRL rating, and the rating of MadChess 2.2 (in all three tests) came out nearly identical to its CCRL Blitz rating, so I assumed the results would be at least partly valid. I also tested Loki 3.0 against some weaker engines (because my self-play against v2.0 and v1.2.0 showed lower Elo changes), and those tests put Loki's Elo in the same range.
OK. I wanted to warn you that you were wandering into uncharted territory by using a very fast time control with my engine. I understand testing is difficult with limited computing resources. We'll see what CCRL finds.