weaker winboard engines that support setboard?
Moderator: Ras
-
MattieShoes
- Posts: 718
- Joined: Fri Mar 20, 2009 8:59 pm
Re: weaker winboard engines that support setboard?
Ah hah! I found a bug in my eval code. I tested the open-file code when I implemented it, found a bug, and fixed it, but then somehow went back to the buggy code at a later date.
As a result, rooks on open files were not being properly scored.
It now gets 47/100 correct. Still not great, but better than 38, anyway.
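As background for the bug above: engines typically give a rook a bonus on an open file (no pawns at all) and a smaller one on a half-open file (no friendly pawns). A minimal bitboard sketch in Python, assuming square 0 = a1 and hypothetical bonus values; this is not Moneypenny's actual eval code.

```python
# Hypothetical bonus values in centipawns, chosen only for illustration.
OPEN_FILE_BONUS = 20
HALF_OPEN_FILE_BONUS = 10

# One bit per rank on the a-file, assuming square 0 = a1, 7 = h1, 63 = h8.
FILE_A = 0x0101010101010101


def file_mask(file_index: int) -> int:
    """Bitboard of all 8 squares on the given file (0 = a-file, 7 = h-file)."""
    return FILE_A << file_index


def rook_file_bonus(rook_sq: int, own_pawns: int, enemy_pawns: int) -> int:
    """Bonus for one rook: open file = no pawns at all, half-open = no own pawns."""
    mask = file_mask(rook_sq % 8)
    if not ((own_pawns | enemy_pawns) & mask):
        return OPEN_FILE_BONUS
    if not (own_pawns & mask):
        return HALF_OPEN_FILE_BONUS
    return 0
```

Checking the half-open case separately from the open case matters: if the "no own pawns" test runs first, every open file is scored as merely half-open, which is one easy way for this kind of bonus to go silently wrong.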
-
MattieShoes
- Posts: 718
- Joined: Fri Mar 20, 2009 8:59 pm
Re: weaker winboard engines that support setboard?
If you want proof that epd performance correlates poorly with strength...
My engine now gets about 65/100 on undermining and 45/100 on open files and diagonals at 10 sec/position, but it scores only about 25% against my old eval, which scored 52 and 38 on those tests, and it also did worse against other engines. Not that many games, but when the difference is that large...
Oi! I think there must be a bug for it to be that bad.
And I think the reason the number correct from gradualtest differs from Arena's is that a few positions have two best moves (both scored 10), but only one is listed in the original file. Arena only recognizes the move listed after bm, whereas the gradualtest scored format counts either.
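To illustrate why the two tools can report different totals: a bm-only count accepts just the listed move, while a scored count accepts any move worth the top score. A minimal Python sketch, assuming each position's scored moves have already been parsed into a dict (parsing the gradualtest/STS file format is not shown, and the moves in the example are made up).

```python
from dataclasses import dataclass


@dataclass
class Position:
    bm: str                # the single move listed after "bm" in the original EPD
    scores: dict[str, int]  # hypothetical pre-parsed move -> points (10 = best)


def count_bm_only(positions, engine_moves):
    """Arena-style count: only the move listed after bm is accepted."""
    return sum(1 for p, m in zip(positions, engine_moves) if m == p.bm)


def count_any_top_move(positions, engine_moves, top=10):
    """Scored-format count: any move worth the top score is accepted."""
    return sum(1 for p, m in zip(positions, engine_moves) if p.scores.get(m, 0) == top)


positions = [
    Position("Rd1", {"Rd1": 10, "Re1": 10, "Qc2": 4}),  # two moves share the top score
    Position("Nf5", {"Nf5": 10, "g4": 6}),
]
engine_moves = ["Re1", "Nf5"]
```

Here the engine plays Re1 in the first position, which the scored count accepts and the bm-only count rejects, so the two totals come out as 2 and 1.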
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: weaker winboard engines that support setboard?
Question is, what are you concluding from the above? You certainly can't conclude that 1.7 is 50 Elo stronger than 1.6 with any reasonable level of confidence. Those error bars are huge. And you should remove Beaches, since it doesn't seem to win a single game. I don't see why 1.7 has a higher Elo than 1.6 yet a 1% lower score, unless you are not playing the same number of games against each opponent.
MattieShoes wrote:
If I get to the point where I'm trying to detect small changes in rating like Mr. Hyatt does, I'll certainly have to change to some more complex system, probably with longer time controls. For now, though, I'm making large changes to the code; each version seems to be at least 50 points different from the last.
In 1.7, I implemented futility pruning and extended futility pruning (7.5-12.5 vs Gaviota).
Rank Name                   Elo    +    -  games  score  oppo.  draws
   1 OliThink 5.1.8alpha    328   73   65     80    80%     66     8%
   2 Gaviota 0.33           198   69   68     60    56%    150     8%
   3 Moneypenny 1.7          89   65   67     60    38%    186    22%
   4 Moneypenny 1.6          34   58   60     80    39%    139    18%
   5 Moneypenny 1.5         -58   82   86     60    48%    -76    13%
   6 Beaches 2.26          -590  172  387     20     0%    -58     0%
I do have an external check too: it plays on FICS. This gives it access to humans and to other engines on a variety of hardware, uses a range of time controls, and leaves me free to turn pondering on. In this case, it's pretty easy to tell where it switched to 1.7...
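For reference, the usual shape of the two pruning tests mentioned for 1.7: at a frontier node (depth 1), quiet moves are skipped when the static eval plus a margin still cannot reach alpha, and extended futility pruning applies the same test at pre-frontier nodes (depth 2) with a larger margin. A minimal Python sketch; the margins (roughly a minor piece and a rook, in centipawns) and the function name are hypothetical, and refinements such as mate-score guards are omitted.

```python
# Hypothetical margins in centipawns: about a minor piece at frontier nodes
# (depth 1) and about a rook at pre-frontier nodes (depth 2). Real engines tune these.
FUTILITY_MARGIN = {1: 300, 2: 500}


def can_prune_move(depth, static_eval, alpha, in_check, is_capture, gives_check):
    """Return True if a quiet move at a (pre-)frontier node may be skipped.

    depth 1 -> ordinary futility pruning (frontier node)
    depth 2 -> extended futility pruning (pre-frontier node)
    """
    if depth not in FUTILITY_MARGIN:
        return False  # only frontier and pre-frontier nodes are considered
    if in_check or is_capture or gives_check:
        return False  # check evasions and tactical moves are always searched
    # Even an optimistic estimate of this move's value cannot raise alpha.
    return static_eval + FUTILITY_MARGIN[depth] <= alpha
```

Excluding captures entirely keeps the sketch simple; a common refinement is to add the captured piece's value to the static eval instead of never pruning captures.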
-
MattieShoes
- Posts: 718
- Joined: Fri Mar 20, 2009 8:59 pm
Re: weaker winboard engines that support setboard?
It had only played 60 games when I pasted that. It's now more like 180 games and still shows a 52-point increase. The results aren't conclusive or anything. It has also played several hundred games online now against humans and computers I haven't explicitly tested against, mostly at longer time controls. Its rating stays around 2150 (the old version hovered around 2075). That probably STILL isn't statistically solid, but I'm pretty confident it's better.
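To put numbers on the error-bar question: a score fraction p corresponds to an Elo difference of -400·log10(1/p - 1) under the logistic model. A rough Python sketch, assuming a normal approximation and treating every game as a win or a loss. Because draws only reduce the per-game variance, this makes the interval wider rather than narrower. bayeselo uses a different model, so its numbers will not match exactly.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction 0 < score < 1 (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(points: float, games: int, z: float = 1.96):
    """Point estimate and approximate 95% interval for the Elo difference.

    Uses the win/loss (binomial) variance p*(1-p), which is an upper bound
    when some games are draws, so the interval errs on the wide side.
    """
    p = points / games
    se = math.sqrt(p * (1.0 - p) / games)
    eps = 1e-6  # keep the bounds strictly inside (0, 1) so log10 stays defined
    lo = min(max(p - z * se, eps), 1.0 - eps)
    hi = min(max(p + z * se, eps), 1.0 - eps)
    return elo_diff(p), elo_diff(lo), elo_diff(hi)
```

For the 7.5-12.5 result against Gaviota, `elo_interval(7.5, 20)` gives about -89 Elo, with a 95% interval of roughly -284 to +61. That spread is why a few dozen games cannot resolve a 50-point difference.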
It is +2 -15 =2 against a 2-CPU version of Crafty 23.0, by the way.
The two wins were at 1 0, though, and I suspect Crafty's CPUs were in power-saving mode.
Since then I've been sidetracked trying to implement the atomic variant, which turns out to be a royal pain. You can lose without being in check first, which makes quiescence searches fun, and you can win AFTER being checkmated.
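For anyone unfamiliar with the variant: in atomic chess, a capture removes the capturing piece, the captured piece, and every non-pawn piece on the eight surrounding squares, and a side whose king is blown up loses. A minimal Python sketch of that explosion, assuming a hypothetical dict board (square 0 = a1, uppercase = white). It ignores en passant and does not check move legality.

```python
def explode(board: dict[int, str], from_sq: int, to_sq: int) -> dict[int, str]:
    """Apply an atomic-chess capture from from_sq to to_sq; return a new board."""
    if to_sq not in board:
        raise ValueError("explode() only handles captures")
    new = dict(board)
    del new[from_sq]  # the capturing piece is destroyed
    del new[to_sq]    # the captured piece is destroyed
    rank, file = divmod(to_sq, 8)
    for dr in (-1, 0, 1):
        for df in (-1, 0, 1):
            r, f = rank + dr, file + df
            if (dr or df) and 0 <= r < 8 and 0 <= f < 8:
                sq = r * 8 + f
                piece = new.get(sq)
                # Non-pawn neighbours explode (kings included); pawns survive.
                if piece is not None and piece.lower() != "p":
                    del new[sq]
    return new


def king_exploded(board: dict[int, str], color: str) -> bool:
    """True if the given side ("white" or "black") no longer has a king."""
    king = "K" if color == "white" else "k"
    return king not in board.values()


# White Qd1 takes a pawn on d7; the black king on e8 sits next to the blast.
board = {4: "K", 3: "Q", 60: "k", 51: "p", 52: "p"}
after = explode(board, 3, 51)
```

In the example, black's king on e8 is not in check, yet Qxd7 explodes it. A quiescence search has to score that capture as a win for white even though no check preceded it.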
-
PK-4
Re: weaker winboard engines that support setboard?
Hi,
MattieShoes wrote:
Here's the undermine.epd for the gradualtest scoring stuff.
That was a great help. Would it be possible to post the converted EPD file for Swaminathan's STS (v2.0): Open Files and Diagonals?
Regards
-
MattieShoes
- Posts: 718
- Joined: Fri Mar 20, 2009 8:59 pm
Re: weaker winboard engines that support setboard?
It's further down the thread.
