Oh, I think there can be a lot of bugs, even if all the values you mentioned are matching...
1.
I once had the same (or maybe a similar) problem: everything was matching.
Then I watched the Task Manager, where memory usage got higher and higher.
I didn't notice the failure earlier because I always used short time controls for testing my stuff.
In the end, I wasn't freeing dynamically allocated memory, so swapping became the problem. In this case all the numbers, for example, were still matching...
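A minimal sketch of how such a leak could be caught, assuming you can route allocations through small wrappers. The names tracked_alloc, tracked_free, live_blocks and count_nodes are made up for illustration and are not from any engine:

```c
#include <stdlib.h>

/* Number of blocks allocated but not yet freed (hypothetical counter). */
static long live_blocks = 0;

static void *tracked_alloc(size_t n) {
    void *p = malloc(n);
    if (p) live_blocks++;
    return p;
}

static void tracked_free(void *p) {
    if (p) {
        live_blocks--;
        free(p);
    }
}

/* Stand-in for a search routine that needs scratch memory at every
   inner node. With leak != 0 it "forgets" to free the scratch buffer. */
static int count_nodes(int depth, int leak) {
    if (depth == 0) return 1;
    int *scratch = tracked_alloc(64 * sizeof *scratch);
    if (!scratch) return 0;
    int nodes = 1;
    for (int i = 0; i < 2; i++)
        nodes += count_nodes(depth - 1, leak);
    if (!leak) tracked_free(scratch);
    return nodes;
}
```

The node count is the same with and without the leak, which is exactly why the "search numbers" tell you nothing here. Only live_blocks differs: it returns to 0 after a clean run and stays above 0 after a leaking one.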
2.
Imagine a loop over, let us say, an attack bitboard where 40/56 bits are set,
with a control structure like:
while (tmp)
{
    sq = bsf64(tmp);   /* index of the lowest set bit */
    ...
    clb64(tmp);        /* clear that bit so the loop moves on */
}
If something like this does not work properly, all the numbers may be equal,
but maybe there shouldn't be "40" but perhaps 12 bits set...
There can be thousands of bugs that don't change the "search numbers", for example if this loop is not integrated in a "search-sensitive" function.
(Sorry for my bad English... I hope you understand what I mean.)
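To make the loop idea concrete, here is a rough sketch of a self-check, not a definitive implementation. I am assuming bsf64 returns the index of the lowest set bit and that clb64 returns the board with that bit cleared (in your code it may be a macro that changes tmp in place). popcount_slow, collect_squares and loop_is_consistent are hypothetical helper names:

```c
#include <stdint.h>

/* Index of the lowest set bit; b must be non-zero.
   Slow portable version, assumed to behave like the engine's bsf64. */
static int bsf64(uint64_t b) {
    int sq = 0;
    while ((b & 1u) == 0) {
        b >>= 1;
        sq++;
    }
    return sq;
}

/* Return b with its lowest set bit cleared (assumed meaning of clb64). */
static uint64_t clb64(uint64_t b) {
    return b & (b - 1);
}

/* Independent bit count that does not share code with the loop below. */
static int popcount_slow(uint64_t b) {
    int n = 0;
    for (int i = 0; i < 64; i++)
        n += (int)((b >> i) & 1u);
    return n;
}

/* The loop from the post: visit every set bit, store its square,
   and return how many squares were visited. */
static int collect_squares(uint64_t bb, int out[64]) {
    int n = 0;
    uint64_t tmp = bb;
    while (tmp) {
        out[n++] = bsf64(tmp);
        tmp = clb64(tmp);
    }
    return n;
}

/* Returns 1 if the loop visited exactly the set bits of bb, else 0. */
static int loop_is_consistent(uint64_t bb) {
    int sq[64];
    int n = collect_squares(bb, sq);
    if (n != popcount_slow(bb)) return 0;
    uint64_t rebuilt = 0;
    for (int i = 0; i < n; i++)
        rebuilt |= (uint64_t)1 << sq[i];
    return rebuilt == bb;
}
```

Running loop_is_consistent on a few attack bitboards in a debug build would show whether the loop really sees 40 bits or only 12, independent of what the search prints.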
In my opinion, the highest probability is that you will find a bug. Or are there "significant indicators" of a hardware problem, which would also occur with other software (chess engines)?
So did you try a simple perft? (It takes 5 minutes to write such a function and you will know more than now, won't you? Nothing to lose!) If this works well, why should the hardware have a problem?
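In case the shape of a perft is not clear, here is a small sketch. A real chess move generator is too long to show here, so a toy 9-cell board stands in for the chess position. Position, generate_moves, make_move and unmake_move are placeholders for whatever your engine already has, and the toy deliberately ignores wins:

```c
#include <stdint.h>

/* Toy position: 9 cells, 0 = empty, 1 and 2 = the two players. */
typedef struct {
    int cell[9];
    int side;   /* player to move: 1 or 2 */
} Position;

static Position start_position(void) {
    Position p = { {0}, 1 };
    return p;
}

/* Fill moves[] with the empty cells and return how many there are. */
static int generate_moves(const Position *p, int moves[9]) {
    int n = 0;
    for (int i = 0; i < 9; i++)
        if (p->cell[i] == 0) moves[n++] = i;
    return n;
}

static void make_move(Position *p, int m) {
    p->cell[m] = p->side;
    p->side = 3 - p->side;
}

static void unmake_move(Position *p, int m) {
    p->side = 3 - p->side;
    p->cell[m] = 0;
}

/* Count the leaf nodes reachable in exactly `depth` plies. */
static uint64_t perft(Position *p, int depth) {
    if (depth == 0) return 1;
    int moves[9];
    int n = generate_moves(p, moves);
    uint64_t nodes = 0;
    for (int i = 0; i < n; i++) {
        make_move(p, moves[i]);
        nodes += perft(p, depth - 1);
        unmake_move(p, moves[i]);
    }
    return nodes;
}
```

On this toy board the counts from the empty position are 9, 72 and 504 for depths 1 to 3 (9, 9*8, 9*8*7). For chess you would compare your counts against published perft numbers for the start position in the same way, and a mismatch points at the move generator or make/unmake rather than at the hardware.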