asmFish

Stephen Ham
Posts: 2488
Joined: Wed Mar 08, 2006 9:40 pm
Location: Eden Prairie, Minnesota
Full name: Stephen Ham

asmFish

Post by Stephen Ham »

Hello all,

I'm continuing to search for the tactically strongest engine to assist my ICCF performance. So, I'm now running a match between the latest iterations of Stockfish (SF) versus asmFish, both using my "Tactical" opening book.

However, the result is perplexing. After 20 games with a time-control of 40/40, 15/15, game/25, SF leads 5:1, yet asmFish is outsearching SF by 1,000-2,000 kN/s per move!!! How can an engine that is constantly outsearched by such a large measure, move after move, still defeat the faster engine in these tactical battles?

I can see this result in a technical battle, where superior technique and evaluation can overcome search depth. But these are complex tactical games.

Off the top of my head, the only thing I can think of is that asmFish has not been updated since Dec 7, 2018. So, in that update of nearly five months ago, did it bring asmFish current to that date, or was the Dec 7 update merely a Stockfish update of a much earlier period?

Where I'm going with this is that perhaps so much "quality" has been added to SF, subsequent to asmFish updates, to more than offset the incredible speed differential.

Your thoughts gents?

-Steve-
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: asmFish

Post by MikeB »

Stephen Ham wrote: Mon Apr 29, 2019 12:16 am Hello all,

I'm continuing to search for the tactically strongest engine to assist my ICCF performance. So, I'm now running a match between the latest iterations of Stockfish (SF) versus asmFish, both using my "Tactical" opening book.

However, the result is perplexing. After 20 games with a time-control of 40/40, 15/15, game/25, SF leads 5:1, yet asmFish is outsearching SF by 1,000-2,000 kN/s per move!!! How can an engine that is constantly outsearched by such a large measure, move after move, still defeat the faster engine in these tactical battles?

I can see this result in a technical battle, where superior technique and evaluation can overcome search depth. But these are complex tactical games.

Off the top of my head, the only thing I can think of is that asmFish has not been updated since Dec 7, 2018. So, in that update of nearly five months ago, did it bring asmFish current to that date, or was the Dec 7 update merely a Stockfish update of a much earlier period?

Where I'm going with this is that perhaps so much "quality" has been added to SF, subsequent to asmFish updates, to more than offset the incredible speed differential.

Your thoughts gents?

-Steve-
You can check which Stockfish version asmFish corresponds to by searching the closed PRs for its bench node count.

https://github.com/official-stockfish/S ... ed+4503866

In this case, it is the patch "Rank threats on pinned pawns", made on 7/25/2018, so asmFish is about nine months behind.

In my opinion, the asmFish model is not sustainable. Asking volunteers to donate scores of laborious hours for free to speed SF up by 1M or 2M nodes per second simply cannot last. That is why we have compilers to do that work today. It's a nice effort, but eventually life moves on and they find more rewarding things to do with their free time.
tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: asmFish

Post by tpoppins »

Stephen Ham wrote: Mon Apr 29, 2019 12:16 am However, the result is perplexing. After 20 games
The results are not perplexing, merely meaningless. What is perplexing is that you persist in wasting time on these tests despite being told repeatedly about the small sample sizes, too-wide error margins and statistical insignificance of such tests. Your excuses for not doing them properly may be perfectly valid but that doesn't make the results valid.
Tirsa Poppins
CCRL
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: asmFish

Post by corres »

tpoppins wrote: Mon Apr 29, 2019 9:01 am
Stephen Ham wrote: Mon Apr 29, 2019 12:16 am However, the result is perplexing. After 20 games
The results are not perplexing, merely meaningless. What is perplexing is that you persist in wasting time on these tests despite being told repeatedly about the small sample sizes, too-wide error margins and statistical insignificance of such tests. Your excuses for not doing them properly may be perfectly valid but that doesn't make the results valid.
If you do not need the exact Elo and just want to know which engine is stronger in a particular kind of situation, you do not need to play many thousands of games.
I think the observed difference is caused by the fact that asmFish is not exactly the standard Stockfish.
The difference arises partly from the programming technique and partly from the aim of producing a faster binary.
Stephen Ham
Posts: 2488
Joined: Wed Mar 08, 2006 9:40 pm
Location: Eden Prairie, Minnesota
Full name: Stephen Ham

Re: asmFish

Post by Stephen Ham »

tpoppins wrote: Mon Apr 29, 2019 9:01 am
Stephen Ham wrote: Mon Apr 29, 2019 12:16 am However, the result is perplexing. After 20 games
The results are not perplexing, merely meaningless. What is perplexing is that you persist in wasting time on these tests despite being told repeatedly about the small sample sizes, too-wide error margins and statistical insignificance of such tests. Your excuses for not doing them properly may be perfectly valid but that doesn't make the results valid.
Mr. or Ms. Poppins,

Your rant was erroneous on all fronts.

First, the only one declaring small sample size was me. Read my previous posts, this time with comprehension.

Second, nobody "repeatedly" wrote of too-wide error margins, etc. Why claim somebody did?

Third, the claim of statistical insignificance is redundant, as it's instead a reference to small sample size. Why are you redundant?

Fourth, I've made no excuses. Why claim I did?

And fifth, who are you to claim I'm wasting my time? It's my time and I'll determine how to spend it.

I merely reported the status after 20 games. The test match games continue, yet the performance disparity remains. Significantly greater speed is losing tactical games to better coding. Given the longer time-control, the results to date are meaningful even with few games played, and are being confirmed with each subsequent game.

-Steve-
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: asmFish

Post by Laskos »

Stephen Ham wrote: Mon Apr 29, 2019 11:31 am
tpoppins wrote: Mon Apr 29, 2019 9:01 am
Stephen Ham wrote: Mon Apr 29, 2019 12:16 am However, the result is perplexing. After 20 games
The results are not perplexing, merely meaningless. What is perplexing is that you persist in wasting time on these tests despite being told repeatedly about the small sample sizes, too-wide error margins and statistical insignificance of such tests. Your excuses for not doing them properly may be perfectly valid but that doesn't make the results valid.
Mr. or Ms. Poppins,

Your rant was erroneous on all fronts.

First, the only one declaring small sample size was me. Read my previous posts, this time with comprehension.

Second, nobody "repeatedly" wrote of too-wide error margins, etc. Why claim somebody did?

Third, the claim of statistical insignificance is redundant, as it's instead a reference to small sample size. Why are you redundant?

Fourth, I've made no excuses. Why claim I did?

And fifth, who are you to claim I'm wasting my time? It's my time and I'll determine how to spend it.

I merely reported the status after 20 games. The test match games continue, yet the performance disparity remains. Significantly greater speed is losing tactical games to better coding. Given the longer time-control, the results to date are meaningful even with few games played, and are being confirmed with each subsequent game.

-Steve-
It might well not be a waste of your time, but it's a waste of time for this forum to have a discussion on your topic. Some folks are confused when a bit of statistics is involved, and your post will only reinforce that confusion, as one poster is already showing.
You are comparing a five-month-old asmFish with a current Stockfish, you get some 5:1 W/L result in slow games, and you want to open a thread about that on CCC.
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: asmFish

Post by Dann Corbit »

It is not necessary for anyone to be combative about the thread contents or questions.
I think the root question here is not about Elo difference, but about LOS (likelihood of superiority), which is a rather different thing.
I think it is also surprising to many people who are not mathematicians how lopsided results can be in early testing, especially with engines that are nearly equal in strength.
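
To make the LOS idea concrete, here is a minimal sketch using the normal-approximation formula commonly quoted in engine testing. The function name is my own, and ignoring draws is an assumption of that formula (draws say nothing about which side is stronger), not something established in this thread.

```python
import math

def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Normal-approximation LOS for the side credited with `wins`.

    Draws are left out on purpose: under this approximation they carry
    no information about which engine is stronger.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))

# The two win/loss counts reported in this thread, plus a larger sample
# with the same 5:1 ratio for comparison.
for wins, losses in [(5, 1), (6, 2), (50, 10)]:
    los = likelihood_of_superiority(wins, losses)
    print(f"+{wins} -{losses}: LOS = {los:.3f}")
```

With these inputs the 5:1 score comes out at roughly 0.95 and the 6:2 score at roughly 0.92. LOS only says which engine is probably stronger; it says nothing about by how much, which is the Elo question.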

For instance, if I flip a coin ten times, I could possibly get the sequence:
h-h-h-h-h-t-h-h-h-h
And it may seem that 'h' is much more likely than 't'.
But every exact sequence of ten h or t is equally likely (all 1024 of them), assuming a fair coin and no bias in the toss.
It is only when large numbers are involved that we converge to the norm.
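
The coin-flip arithmetic can be checked in a few lines. This is only an illustrative sketch under the fair-coin assumption; the variable names are my own.

```python
from math import comb

FLIPS = 10
total_sequences = 2 ** FLIPS              # every distinct h/t sequence of length 10
p_exact_sequence = 1 / total_sequences    # identical for each specific sequence

# Chance of nine or more heads in ten flips of a fair coin, in any order.
p_nine_or_more_heads = sum(comb(FLIPS, k) for k in (9, 10)) / total_sequences

print(f"sequences: {total_sequences}")
print(f"P(one exact sequence): {p_exact_sequence:.6f}")
print(f"P(9+ heads in 10):     {p_nine_or_more_heads:.6f}")
```

A lopsided run of nine or more heads turns up about once in every hundred attempts, so it is uncommon but far from impossible in a short series.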

Suppose that this very discussion had been explained twice already. Would it be bad to explain it a third time?
Especially since it is somewhat counter intuitive and because new readers come all the time, I suggest it is helpful to go ahead and explain it again.
And if it is beneath your dignity to explain again (or the more likely event that you have tired of explaining it), then simply ignore the post.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
supersharp77
Posts: 1242
Joined: Sat Jul 05, 2014 7:54 am
Location: Southwest USA

Re: asmFish

Post by supersharp77 »

tpoppins wrote: Mon Apr 29, 2019 9:01 am
Stephen Ham wrote: Mon Apr 29, 2019 12:16 am However, the result is perplexing. After 20 games
The results are not perplexing, merely meaningless. What is perplexing is that you persist in wasting time on these tests despite being told repeatedly about the small sample sizes, too-wide error margins and statistical insignificance of such tests. Your excuses for not doing them properly may be perfectly valid but that doesn't make the results valid.
Hahahaha....Tpoppins!! :) :wink:

AsmFish 120k vs Stockfish Development...1.2mb .....Is faster better? Methinks Deep Learning Stockfish is the future of Engine chess along with Scorpio (MC) and Komodo 12 (MC) ShashChess and the like....No Doubts... :) :wink:
Stephen Ham
Posts: 2488
Joined: Wed Mar 08, 2006 9:40 pm
Location: Eden Prairie, Minnesota
Full name: Stephen Ham

Re: asmFish

Post by Stephen Ham »

Hi Dann and Kai, and fellow members,

Thank you gents for your thoughtful posts. As a result, I wonder if I should have worded my initial post better, since it's apparent that you've misunderstood me, Kai.

I know that 20 games from an on-going match are far too few to draw conclusions from. But I made no conclusions, merely an observation about data that developed. That observation is that all my match games are dynamic, unbalanced, and highly complex tactical battles. Even the endgames remain tactically sharp. In such circumstances, evaluation and technique play lesser roles; the single greatest factor is search speed. Here, asmFish is outsearching Stockfish by 1,000-2,000 kN/s on each move, move after move, game after game, yet is getting clobbered. Now after 33 games, the victory count is 6:2 in favor of Stockfish and Stockfish is claiming a "winning position" in game 34.

Since you gents are engine and computer experts, and I'm an admitted dummy, I turned to you for an explanation of how a speedy engine, given a search speed environment virtually made for it, can get clobbered by a much slower engine.

Back to the issue of engine testing. It's generally agreed that to test similar engines, one needs a minimum of 1,000 games. But such testing is of overall game strength - not specifically tactical acuity. And to achieve so many games in a reasonable time, the testing is done with speed chess. The result, however, is quantitative rather than qualitative.
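
To show why the number of games matters for an Elo estimate, here is a rough sketch that turns a win/draw/loss record into an Elo difference with a 95% interval. It assumes the usual logistic Elo model and a normal approximation on the score, which is crude for small samples. The function names are my own, and the 25 draws are inferred from the "33 games, 6:2" figures above, not stated in the thread.

```python
import math

def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an average score per game."""
    # Clamp so the logarithm stays defined for 0% or 100% scores.
    score = min(max(score, 1e-6), 1.0 - 1e-6)
    return 400.0 * math.log10(score / (1.0 - score))

def elo_with_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) from a normal approximation on the score."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("at least one game is required")
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    margin = z * math.sqrt(variance / games)
    return (elo_from_score(score),
            elo_from_score(score - margin),
            elo_from_score(score + margin))

# Assumed record after 33 games, then the same ratios over 100x the games.
for record in [(6, 25, 2), (600, 2500, 200)]:
    elo, low, high = elo_with_interval(*record)
    print(f"+{record[0]} ={record[1]} -{record[2]}: "
          f"Elo {elo:+.1f} (95% interval {low:+.1f} to {high:+.1f})")
```

Under these assumptions the 33-game interval still includes zero, while the same ratios over 3,300 games give an interval that sits clearly above zero.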

Instead, as an ICCF competitor who uses an engine for long periods during move selection, I want a more qualitative test. So, my time control in the asmFish match is largely game/80'. Yes, given the much longer time control, fewer games can be played per unit of time. However, after the match as in previous test matches, I audit each completed game with Stockfish's assistance to see what the engines saw/missed. Rather than game strength, I'm trying to just measure tactical strength and engine tendencies. So, this post-mortem analysis also provides qualitative data.

Hopefully this clarifies what my initial post was about. More match games are being played, yet Stockfish continues to be dominant.

Corres and MikeB, thank you for your helpful replies that apparently explain this mystery. MikeB, you've confirmed my fear that although the asmFish release is nearly five months old, the Stockfish code it is based on really dates from about nine months ago. And Corres, I agree with you that apparently Stockfish's quality updates are superior to asmFish's huge speed advantage, even though this test favors search speed. That says a lot for Stockfish's developers.

All the best,
-Steve-
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: asmFish

Post by corres »

Reading the posts above (Laskos and Corbit), one might think a match result depends only on the number of games and on no other factor. Laskos mentioned that this asmFish is an older version, but he did not consider that this fact alone may be enough to produce a 5:1 result for the newer Stockfish dev in a 20-game match.
Moreover, Stephen mentioned that he used only tactical starting positions. Has anybody investigated the relative strength of Stockfish and asmFish in tactical positions? I do not know of any such investigation.
It would not be a superfluous test, because asmFish is not an asm copy of Stockfish, and the difference between them is more than speed. There was a true copy named pedantFish, but that version has been discontinued. pedantFish was about 10% faster than Stockfish, but this 10% mattered only in very short games.
Generally, it is a mistake to suppose that a few percent of extra speed alone gives a better result in an LTC match.
It may be true in fast games, but not in long games.
I think that if Stephen reran the test, there is only a very, very small chance that asmFish would win it 5:1.
So when evaluating a test result, it is not enough to consider the number of games alone.
If somebody wants to determine exact Elo numbers, that is another matter.