School of Bullet Fish

Discussion of computer chess matches and engine tournaments.

Gusev
Posts: 1476
Joined: Mon Jan 28, 2013 2:51 pm

Re: School of Bullet Fish

Post by Gusev »

Adam Hair wrote:
Gusev wrote:There is nothing wrong with boundary testing. We take an edge case (100% CPU load), we test it. Some engines handle it more gracefully than others. There were no crashes in this tournament, mind you.
jhellis3 wrote:.... You don't even understand why what you did is wrong. I guess there is no hope for a productive conversation. Ah well, carry on with your "tests"....
I don't understand the purpose of the test. What was learnt here?
Let me elaborate. Here are the time-forfeit statistics (out of the 42 games that each engine played).

1. Houdini 5 - 4
2. McBrain 2.6 - 4
3. McBrain 2.5 - 5
4. Cfish 2017-07-20 - 6
5. SugaR XPro 1.0e - 9
6. Raubfisch ME262_GTZ14c - 6
7. BrainFish 170709 - 10
8. Symphysodon 220517 - 9
9. Stockfish 170715 - 9
10. AsmFish 2017-07-10 - 12
11. Nayeem 10.1 - 7
12. Corchess 1.8 - 12
13. Deep Shredder 13 - 7
14. Booot 6.2 - 0
15. Seagull - 3
16. Gull 3.1 - 2
17. Fizbo 1.9 - 0
18. Andscacs 0.91 - 7
19. Fire 5 - 0
20. Ivanhoe 1945a - 0
21. Komodo 11.2 - 14
22. Sting SF 8.5 - 19
Some engines handle the heat in my kitchen better than others, and not necessarily the ones that are better at LTC. (In addition to its time forfeits, Houdini 5 lost 2 games on illegal moves.) Overall, 145 of the 462 games ended in a time forfeit. That's 31.4%, almost a third. Some engines did not play as well as others, yet did not forfeit a single game on time under these strenuous conditions (Booot 6.2, Fizbo 1.9, Fire 5, and Ivanhoe 1945a). Finally, considering that Houdini 5 had a comparable number of forfeits (plus the 2 illegal moves) to McBrain 2.6, McBrain 2.5, and Cfish, which finished next, we can agree that H5 is the king of extreme bullet.
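
For anyone who wants to check the arithmetic, here is a minimal Python sketch that tallies the per-engine counts from the list above. It rests on two assumptions that are not spelled out in the post: each forfeited game is charged to exactly one engine (the one that lost on time), and the total number of games is the number of engines times 42, divided by 2 because every game involves two engines. The variable names are made up for illustration.

```python
# Time forfeits per engine, copied from the list above.
forfeits = {
    "Houdini 5": 4,
    "McBrain 2.6": 4,
    "McBrain 2.5": 5,
    "Cfish 2017-07-20": 6,
    "SugaR XPro 1.0e": 9,
    "Raubfisch ME262_GTZ14c": 6,
    "BrainFish 170709": 10,
    "Symphysodon 220517": 9,
    "Stockfish 170715": 9,
    "AsmFish 2017-07-10": 12,
    "Nayeem 10.1": 7,
    "Corchess 1.8": 12,
    "Deep Shredder 13": 7,
    "Booot 6.2": 0,
    "Seagull": 3,
    "Gull 3.1": 2,
    "Fizbo 1.9": 0,
    "Andscacs 0.91": 7,
    "Fire 5": 0,
    "Ivanhoe 1945a": 0,
    "Komodo 11.2": 14,
    "Sting SF 8.5": 19,
}

GAMES_PER_ENGINE = 42  # stated in the post

# Assumption: every game involves two engines, so each game is counted twice
# when multiplying engines by games per engine.
total_games = len(forfeits) * GAMES_PER_ENGINE // 2

# Assumption: a forfeited game is charged only to the engine that lost on time.
total_forfeits = sum(forfeits.values())
forfeit_share = total_forfeits / total_games

# Engines that never lost on time.
clean = sorted(name for name, count in forfeits.items() if count == 0)

print(f"{total_forfeits} of {total_games} games ended on time ({forfeit_share:.1%})")
print("No time forfeits:", ", ".join(clean))
```

Under those assumptions the tally reproduces the 145 of 462 (31.4%) figure and the same four engines with zero forfeits.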

Re: School of Bullet Fish

Post by Gusev »

These results cannot be meaningless. Indeed, each engine played 42 games in this tournament. And the number 42, as you may know, is the "Answer to the Ultimate Question of Life, the Universe, and Everything". :D
jhellis3 wrote:
Gusev wrote: There is nothing wrong with boundary testing. We take an edge case (100% CPU load), we test it. Some engines handle it more gracefully than others. There were no crashes in this tournament, mind you.
Nope, you missed the point yet again... attempt number 3 on why the results are meaningless?