My engine simply does not crash, but only because I debugged its stability issues. Among those, my favourite was illegal addressing.
NG-Play still had a hash table bug where an out-of-bounds write could occur. I had converted the dynamic allocation to a static one, and in that scenario the out-of-bounds write could overwrite the variable located after the table, which was another pointer. That one was really hard to debug, though the fix was easy.
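The post does not show how NG-Play computed its hash index. Here is a minimal sketch of one common way to rule out such writes, assuming a power-of-two table that is indexed by masking the hash key. The names `HASH_BITS`, `HashEntry`, `hash_store` and `hash_probe` are hypothetical and are not NG-Play's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size: must be a power of two for the mask to work. */
#define HASH_BITS 16
#define HASH_SIZE ((size_t)1 << HASH_BITS)

typedef struct {
    uint64_t key;
    int16_t  score;
    uint8_t  depth;
} HashEntry;

/* Statically allocated: an index >= HASH_SIZE would silently write into
   whatever object happens to follow the array in memory. */
static HashEntry hash_table[HASH_SIZE];

/* Masking the key guarantees 0 <= index < HASH_SIZE. */
static size_t hash_index(uint64_t key)
{
    return (size_t)(key & (HASH_SIZE - 1));
}

void hash_store(uint64_t key, int score, int depth)
{
    size_t idx = hash_index(key);
    assert(idx < HASH_SIZE); /* debug-build guard against regressions */
    hash_table[idx].key = key;
    hash_table[idx].score = (int16_t)score;
    hash_table[idx].depth = (uint8_t)depth;
}

/* Returns 1 and fills *score if the slot holds this exact key. */
int hash_probe(uint64_t key, int *score)
{
    const HashEntry *e = &hash_table[hash_index(key)];
    if (e->key != key)
        return 0;
    *score = e->score;
    return 1;
}
```

Because the mask makes every index valid by construction, the `assert` only fires if someone later changes the indexing scheme. This is the kind of regression that a static table otherwise turns into silent corruption of a neighbouring variable.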
The second one (fixed as of NG-Play 9.87, btw) was algorithmic. I noticed that the engine was throwing away a queen for nothing. Debugging showed that this position was so bad that all of the moves were cut off by futility pruning, and since zero moves remained, that was misinterpreted as stalemate. The fix was to also count legal but pruned moves for stalemate detection.
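Here is a minimal sketch of the fix described above: legal moves are counted before the futility decision, so "nothing was searched" is no longer confused with "no legal move exists". `ToyMove` and its precomputed `legal`, `futile` and `score` fields are hypothetical stand-ins for real move generation and recursive search. When every legal move is pruned, the sketch fails low and returns `alpha`, which is an assumption and not necessarily what NG-Play does.

```c
#include <stdbool.h>

#define MATE 30000 /* hypothetical mate score */
#define DRAW 0

typedef struct {
    bool legal;  /* does not leave own king in check (precomputed here) */
    bool futile; /* futility pruning would skip this move */
    int  score;  /* score the move would return if searched */
} ToyMove;

int search_moves(const ToyMove *moves, int n, bool in_check, int ply,
                 int alpha, int beta)
{
    int legal = 0;    /* legal moves, whether pruned or not */
    int searched = 0; /* moves actually searched */
    int best = -MATE;

    for (int i = 0; i < n; i++) {
        if (!moves[i].legal)
            continue;
        legal++;          /* count BEFORE the pruning decision */
        if (moves[i].futile)
            continue;     /* skipped, but it still proves "not stalemate" */
        searched++;
        if (moves[i].score > best)
            best = moves[i].score;
        if (best >= beta)
            break;        /* beta cutoff */
    }

    if (legal == 0)       /* genuinely no legal move: mate or stalemate */
        return in_check ? -MATE + ply : DRAW;
    if (searched == 0)    /* legal moves exist, all were futile: fail low */
        return alpha;
    return best;
}
```

With the buggy version, which only counted searched moves, a hopeless position whose moves were all futile would have returned `DRAW`. Here it returns `alpha` instead.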
Your favorite crash
-
- Posts: 2488
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
-
- Posts: 199
- Joined: Sun Nov 03, 2013 9:32 am
Re: Your favorite crash
Rebel wrote: Over the years I noticed that when my engine crashes during development, it's because of a division by zero; apparently that's my favorite sloppiness.
Yours is?
I program in Pascal, therefore it never crashes
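Rebel doesn't say where his divisions by zero came from. One spot where engines commonly divide by a value that can be zero (an assumption, not taken from the thread) is the nodes-per-second figure reported in UCI `info nps` output, because early in a search the elapsed time can still read 0 ms. In C, integer division by zero is undefined behaviour, and on x86 it traps. A minimal guard, with a hypothetical function name:

```c
#include <stdint.h>

/* Nodes per second from a node count and elapsed milliseconds.
   The elapsed time can legitimately be 0 right after the search starts,
   so clamp the divisor instead of dividing by zero. */
uint64_t nodes_per_second(uint64_t nodes, uint64_t elapsed_ms)
{
    if (elapsed_ms == 0)
        elapsed_ms = 1; /* briefly overestimates nps, but never traps */
    return nodes * 1000 / elapsed_ms;
}
```

Clamping to 1 ms was chosen over returning 0 so that the very first report still shows a plausible order of magnitude.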
-
- Posts: 931
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: Your favorite crash
After 15 years of writing and maintaining software for a trading system, I don't seem to write software that crashes much.
Sometimes I hack something together, I get sloppy, and then the flavor of crash I get is generally something like a buffer overrun.
-
- Posts: 27809
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Your favorite crash
Ras wrote: Debugging showed that this position was so bad that all of the moves were cut off by futility pruning, and since zero moves remained, that was misinterpreted as stalemate. The fix was to also count legal but pruned moves for stalemate detection.
Indeed, I had a similar problem in accounting for the validity depth of a node: when all moves were futile, that depth remained at infinity. Logically, one should treat a futility-pruned move as if it had score = currentEval + value[victim] + MARGIN and depth = 1 (because at larger depths you would not do futility pruning). I had forgotten to do that.
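Here is a minimal sketch of the bookkeeping rule above. It assumes that a node's score is the maximum over its moves and its validity depth is the minimum over its moves, starting from "infinity". `NodeResult`, `INF_DEPTH`, `NO_SCORE` and the function names are hypothetical. `value[victim]` and `MARGIN` map to the `victim_value` and `margin` parameters.

```c
#include <limits.h>

#define INF_DEPTH INT_MAX  /* hypothetical "not limited by any move yet" */
#define NO_SCORE  (-32000) /* hypothetical score before any move is seen */

typedef struct {
    int best_score;  /* best score (or optimistic estimate) over moves */
    int valid_depth; /* result is only valid to the smallest move depth */
} NodeResult;

void node_init(NodeResult *r)
{
    r->best_score = NO_SCORE;
    r->valid_depth = INF_DEPTH;
}

/* A searched move contributes its score and the depth its result carries. */
void account_move(NodeResult *r, int score, int depth)
{
    if (score > r->best_score)
        r->best_score = score;
    if (depth < r->valid_depth)
        r->valid_depth = depth;
}

/* A futility-pruned move is treated as if searched to depth 1 with an
   optimistic score, so the node's depth can no longer stay at INF_DEPTH. */
void account_futile_move(NodeResult *r, int current_eval, int victim_value,
                         int margin)
{
    account_move(r, current_eval + victim_value + margin, 1);
}
```

If every move is pruned, the node ends with `valid_depth == 1` and an optimistic score, not an infinitely valid result.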
-
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Your favorite crash
Not computer chess related: while parsing a null-terminated C string via a char* ptr, I read *(ptr+1) before checking *ptr, for some wild speculative optimizations. That crashed once in a while when ptr pointed to the last byte of a 4K page and the following page was not mapped.
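The parser itself isn't shown, so here is a minimal sketch of the safe ordering, using a hypothetical two-character scan (counting `"\r\n"` pairs) as a stand-in. The key point is that `ptr[1]` is only read after `ptr[0]` is known to be non-zero. Then `ptr[1]` is at worst the terminator and always lies inside the string.

```c
#include <stddef.h>

/* Counts "\r\n" pairs in a null-terminated string.
   The crashing version peeked at ptr[1] before testing ptr[0]; if the
   terminator was the last byte of a mapped page and the next page was
   unmapped, that peek faulted. Here ptr[1] is read only when ptr[0] != 0. */
size_t count_crlf(const char *ptr)
{
    size_t n = 0;
    while (*ptr) {                           /* ptr[0] is a real character */
        if (ptr[0] == '\r' && ptr[1] == '\n') /* ptr[1] is at most the '\0' */
            n++;
        ptr++;
    }
    return n;
}
```

The `&&` short-circuit adds a second layer of safety: `ptr[1]` is only examined when `ptr[0]` is `'\r'`.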
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Your favorite crash
Crashes are the tip of the iceberg. Most of the iceberg is under water.
It's amazing how many hidden bugs one finds when testing carefully. Some drain Elo without crashing or producing any visible effect. Others don't affect Elo but are still worth fixing to strengthen the codebase.
The worst kind of bugs are, by far, SMP bugs, such as races and deadlocks, and compiler optimizations (things the C standard allows that break SMP code that was not carefully written).
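A classic example of the compiler-optimization trap is a plain `bool` stop flag that one thread sets and a search thread polls. Under the C11 memory model that is a data race, and the compiler may legally load the flag once and hoist the check out of the loop. Here is a minimal sketch using C11 `<stdatomic.h>`. The flag and function names are hypothetical, and the toy loop stands in for a real search.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared between the UCI/GUI thread (writer) and search threads (readers). */
static atomic_bool stop_search = false;

/* Release pairs with the acquire in should_stop(): anything written before
   the stop request (e.g. a best move) is visible to the searcher after it
   sees the flag. */
void request_stop(void)
{
    atomic_store_explicit(&stop_search, true, memory_order_release);
}

/* Called before search threads are started; relaxed is enough because
   thread creation itself synchronizes. */
void clear_stop(void)
{
    atomic_store_explicit(&stop_search, false, memory_order_relaxed);
}

bool should_stop(void)
{
    return atomic_load_explicit(&stop_search, memory_order_acquire);
}

/* Toy search loop: the atomic load is re-done on every iteration and
   cannot be hoisted out by the optimizer. */
unsigned long search_until_stopped(unsigned long max_nodes)
{
    unsigned long nodes = 0;
    while (nodes < max_nodes && !should_stop())
        nodes++;
    return nodes;
}
```

`volatile` is sometimes used for the same purpose, but it neither makes the access race-free under the C standard nor orders other memory accesses, which is why the atomic version is shown here.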
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Your favorite crash
The strangest crash I've had happened today. It seems to have been caused by a punctual RAM incoherence or something similar, on a Ryzen 7 1800X. I have this much detail because I generate crash dumps.
000000013F2A6F36 call marcar_captures_bones_q (013F2BAFA0h)
estat_actual->actual = &estat_actual->moviments[0];
000000013F2A6F3B mov qword ptr [rdi+0B08h],rdi
000000013F2A6F42 mov eax,0FFFFh
000000013F2A6F47 mov ebx,0C000h
while ((m = seguent_moviment(estat_actual))) {
000000013F2A6F4C mov rdx,qword ptr [rdi+0B08h]
000000013F2A6F53 cmp dword ptr [rdx],eax
000000013F2A6F55 je $no_fer_hash+108Fh (013F2A7381h)
000000013F2A6F5B cmp dword ptr [rdx],0
000000013F2A6F5E je $no_fer_hash+452h (013F2A6744h)
000000013F2A6F64 mov r8d,dword ptr [rdx+8]
000000013F2A6F68 ?? ??
... (every byte from 000000013F2A6F68 through 000000013F2A6FCF, 104 in total, shows ?? ??)
000000013F2A6FCF ?? ??
continue;
nummovimentlegal++;
if (provades > 0
&& !EsPromocio(m)
&& abs(beta) < MATE
&& ss->tau.c[Desti(m)] == 0
&& see(ss, m, -50, (e_colors)estat_actual->mou) < 0
)
000000013F2A6FD0 xor byte ptr [rbp+198F0FC9h],al
000000013F2A6FD6 add dword ptr [rax],eax
000000013F2A6FD8 add bh,bh
continue;
Daniel José - http://www.andscacs.com
-
- Posts: 234
- Joined: Sat Jan 17, 2015 11:54 pm
Re: Your favorite crash
What is a "punctual RAM incoherence"?
-
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Your favorite crash
MahmoudUthman wrote: What is a "punctual RAM incoherence"?
I mean a RAM error that happened just once, but of course I cannot be sure. Maybe it is (CPU) cache related. I don't know if this is common.
-
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Your favorite crash
This is the disassembly of the place where the crash happened, at 000000013F2A6FD0, just after the ?? ?? invalid instructions. The ?? ?? region should contain part of the engine's machine code; instead, there is a long run of invalid instructions. So the RAM was corrupted, and this is not a bug in the engine, since as I understand it the process does not have permission to overwrite its own executable memory.