Komodo 2.03 SSE42 available

Dann Corbit · Post by **Dann Corbit** » Wed Jun 22, 2011 8:49 pm

rbarreira wrote:
Dann Corbit wrote:
rbarreira wrote:
Eelco de Groot wrote:
Dann Corbit wrote:
gaard wrote:Should the analysis between these versions, even with 'deterministic' turned on, differ?
It would not surprise me if there were differences. Despite optimization obeying the 'as-if' rule, re-ordering of instructions or which branch is taken first could easily affect (for instance) the shape of a tree that is being searched.

Also, Komodo uses floating point for evaluation and so (for instance) an interrupt can shave off the high order bits used for intermediate calculations so that even the same version can produce slightly different results at times.
Excuse the interrupt, Dann, but that does sound very strange. An interrupt can cause the result of a floating point calculation to be rounded off or terminated early? Would that not invalidate our computers for any serious mathematical, statistical or otherwise scientific application? Or are these interrupts designed into Komodo only? Could the intermediate results not be stored and resumed correctly after the interrupt? How long has this been going on, and can we blame Bill Gates?

Nondeterministic behaviour with Stockfish (different behaviour compared to a stock compile, I mean) was, as I recall, the result of a faulty optimization during compilation, but I would not know if Komodo is less deterministic. Is that not problematic for bug hunting, for instance? I suppose that is why there is a switch.

Eelco
I'm with you, unless someone comes up with a reasonable explanation why interrupts would change the behavior of a program. During any context switch the OS is supposed to store the exact content of all CPU registers, and I've never heard of any rounding taking place for floating point registers during this process...

However, it is true that different compilers may give different results for floating point formats, particularly when unsafe optimizations are enabled (which they sometimes are even without an explicit command line parameter to enable them).
It is possible, for instance, for 4 byte floats to have calculations performed using 8 byte intermediate results. Or for 8 byte floats to have intermediate calculations performed using 10 byte floats. If an IRQ happens, then the results do get saved in registers: 4 byte registers and 8 byte registers, respectively, trimming off a few bits of intermediate precision.

These things can be controlled to some degree by compiler options.
Where do you propose that these 10-byte floats with extra precision are stored, if not in registers? How does the CPU calculate on them?

In memory. And for some settings (e.g. with the Intel compiler) x86 and x64 architecture chips can perform 80 bit intermediate calculations.

Are you saying that there's a "magic area" on the CPU where extra precision is stored and which doesn't get saved on context switches?

Data objects are stored in memory and calculated in registers. If, during a calculation that is allowed to use the 80-bit registers for intermediate results, an IRQ or some other context switch happens, it is possible that an 80-bit intermediate result is stored into a 64-bit register.

I was unable to find anything backing this up and I highly doubt that it's the case.

I suggest that you start with the Intel, GCC and Microsoft compiler documentation. Anyone who has studied numerical analysis knows full well about this and does not find it surprising at all.

Everything I've read about context switching indicates that all register content, including floating point, is saved and restored by the OS.

It is, and that is the problem.

On x86 there is a hardware context switch which doesn't save floating point stuff, but it seems that common operating systems don't use it precisely for that reason.

Look up the GCC flag -ffast-math and see, for instance:
http://forums.anandtech.com/showthread.php?t=1796118
Look up the Microsoft flag /fp:fast and see what effect it has on calculations and why.
See the description of the Intel compiler option -fp-model fast[=1|2] and see what effect it has on results and why.

Of course, it could be all in my imagination.

rbarreira · Post by **rbarreira** » Wed Jun 22, 2011 9:44 pm

Dann Corbit wrote: Data objects are stored in memory and calculated in registers. If, during a calculation that is allowed to use the 80-bit registers for intermediate results, an IRQ or some other context switch happens, it is possible that an 80-bit intermediate result is stored into a 64-bit register.

That doesn't make any sense. IRQs don't cause data to be backed up from memory to registers. They cause exactly the opposite (registers get saved in memory to be later restored to their exact original value when the CPU / OS switches back to the thread in question).
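A small Python sketch of that point, offered as an analogy rather than a model of any real OS: saving a value at its full width and restoring it is bit-exact, and precision would only be lost if the save slot were narrower than the register. The function names are hypothetical, and an 8-byte double stands in for the 80-bit register purely for illustration.

```python
import struct


def save_full_width(value: float) -> bytes:
    """Save a 'register' holding a double into 8 bytes of memory."""
    return struct.pack("<d", value)


def restore_full_width(saved: bytes) -> float:
    """Reload the 'register' from the 8 bytes written by save_full_width."""
    return struct.unpack("<d", saved)[0]


def save_and_restore_narrow(value: float) -> float:
    """Hypothetical lossy save: squeeze the double through a 4-byte slot."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


intermediate = 1.0 / 3.0
restored = restore_full_width(save_full_width(intermediate))
narrowed = save_and_restore_narrow(intermediate)

print(restored == intermediate)  # full-width round trip keeps every bit
print(narrowed == intermediate)  # the narrow slot drops low-order bits
```

Only the second, narrowing path changes the value; a save and restore that covers the whole register does not.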

Dann Corbit wrote:Look up the GCC flag -ffast-math and see, for instance:
http://forums.anandtech.com/showthread.php?t=1796118

I know about these flags, I have mentioned them earlier in this thread. But they have nothing to do with interrupts or context switches in general, what they do is enable code optimizations that cause operations to deviate from the standard (whether the code gets interrupted or not).
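To make that concrete, here is a minimal Python sketch of one kind of transformation such flags permit: regrouping a sum. Python itself never regroups, so the two groupings are written out by hand; the function names and the sample values are assumptions chosen for illustration. The results differ on every run, with or without interrupts.

```python
def sum_as_written(a: float, b: float, c: float) -> float:
    """Evaluate the sum left to right, as the source code spells it."""
    return (a + b) + c


def sum_reassociated(a: float, b: float, c: float) -> float:
    """Evaluate the same sum with the grouping changed."""
    return a + (b + c)


# Near 1e16 adjacent doubles are 2 apart, so adding 0.5 to -1e16 is lost.
a, b, c = 1e16, -1e16, 0.5

print(sum_as_written(a, b, c))    # 0.5
print(sum_reassociated(a, b, c))  # 0.0
```

In a chess engine, a difference this small can be enough to flip a comparison against a cutoff value and so change which branches get searched.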

If the problem you are talking about actually existed (I don't think it does), it would be out of the control of the compiler. The OS is responsible for context switches, not the compiler or any code generated by it.

Dann Corbit wrote:Of course, it could be all in my imagination.

Regarding the interrupts issue, I think it is, yes.

F. Bluemers · Post by **F. Bluemers** » Wed Jun 22, 2011 9:55 pm

rbarreira wrote:
Dann Corbit wrote: Data objects are stored in memory and calculated in registers. If, during a calculation that is allowed to use the 80-bit registers for intermediate results, an IRQ or some other context switch happens, it is possible that an 80-bit intermediate result is stored into a 64-bit register.
That doesn't make any sense. IRQs don't cause data to be backed up from memory to registers. They cause exactly the opposite (registers get saved in memory to be later restored to their exact original value when the CPU / OS switches back to the thread in question).

Dann Corbit wrote:Look up the GCC flag -ffast-math and see, for instance:
http://forums.anandtech.com/showthread.php?t=1796118
I know about these flags, I have mentioned them earlier in this thread. But they have nothing to do with interrupts or context switches in general, what they do is enable code optimizations that cause operations to deviate from the standard (whether the code gets interrupted or not).

If the problem you are talking about actually existed (I don't think it does), it would be out of the control of the compiler. The OS is responsible for context switches, not the compiler or any code generated by it.

Dann Corbit wrote:Of course, it could be all in my imagination.
Regarding the interrupts issue, I think it is, yes.

maybe not with minix
http://wiki.minix3.org/en/DevelopersGui ... mmingMinix

rbarreira · Post by **rbarreira** » Wed Jun 22, 2011 9:59 pm

F. Bluemers wrote: maybe not with minix
http://wiki.minix3.org/en/DevelopersGui ... mmingMinix

What that page says is that Minix 3 does not know about floating point registers at all, which causes much bigger problems than loss of numerical precision. What it means is that floating point math cannot be used reliably on that OS.

It's not the case with any serious OS like Windows or Linux.

Dann Corbit · Post by **Dann Corbit** » Wed Jun 22, 2011 10:07 pm

rbarreira wrote:
Dann Corbit wrote: Data objects are stored in memory and calculated in registers. If, during a calculation that is allowed to use the 80-bit registers for intermediate results, an IRQ or some other context switch happens, it is possible that an 80-bit intermediate result is stored into a 64-bit register.
That doesn't make any sense. IRQs don't cause data to be backed up from memory to registers. They cause exactly the opposite (registers get saved in memory to be later restored to their exact original value when the CPU / OS switches back to the thread in question).

And if an intermediate calculation for the product of two 8-byte floats is stored in an 80-bit register and an IRQ fires, the data is stored where?

Why not read this:
http://msdn.microsoft.com/en-us/library/e7s85ffb.aspx
And specifically where they talk about 80 bit operations and loss of precision

Dann Corbit wrote:Look up the GCC flag -ffast-math and see, for instance:
http://forums.anandtech.com/showthread.php?t=1796118
I know about these flags, I have mentioned them earlier in this thread. But they have nothing to do with interrupts or context switches in general, what they do is enable code optimizations that cause operations to deviate from the standard (whether the code gets interrupted or not).

If the problem you are talking about actually existed (I don't think it does), it would be out of the control of the compiler. The OS is responsible for context switches, not the compiler or any code generated by it.

Dann Corbit wrote:Of course, it could be all in my imagination.
Regarding the interrupts issue, I think it is, yes.

rbarreira · Post by **rbarreira** » Wed Jun 22, 2011 10:12 pm

Dann Corbit wrote: And if an intermediate calculation for the product of two 8-byte floats is stored in an 80-bit register and an IRQ fires, the data is stored where?

In memory, probably using these instructions or similar, which as mentioned in the link save and restore "the entire floating-point unit state".

Dann Corbit wrote:Why not read this:
http://msdn.microsoft.com/en-us/library/e7s85ffb.aspx
And specifically where they talk about 80 bit operations and loss of precision

I read it. It doesn't talk about IRQs or context switches at all, only about code generation, just as every compiler documentation I've seen about floating point optimization options.

Dann Corbit · Post by **Dann Corbit** » Wed Jun 22, 2011 10:15 pm

rbarreira wrote:
Dann Corbit wrote: And if an intermediate calculation for the product of two 8-byte floats is stored in an 80-bit register and an IRQ fires, the data is stored where?
In memory, probably using these instructions or similar, which as mentioned in the link save and restore "the entire floating-point unit state".

Dann Corbit wrote:Why not read this:
http://msdn.microsoft.com/en-us/library/e7s85ffb.aspx
And specifically where they talk about 80 bit operations and loss of precision
I read it. It doesn't talk about IRQs or context switches at all, only about code generation, just as every compiler documentation I've seen about floating point optimization options.

"Improves the consistency of floating-point tests for equality and inequality by disabling optimizations that could change the precision of floating-point calculations, which is required for strict ANSI conformance. By default, the compiler uses the coprocessor's 80-bit registers to hold the intermediate results of floating-point calculations. This increases program speed and decreases program size. Because the calculation involves floating-point data types that are represented in memory by less than 80 bits, however, carrying the extra bits of precision (80 bits minus the number of bits in a smaller floating-point type) through a lengthy calculation can produce inconsistent results."

Dann Corbit · Post by **Dann Corbit** » Wed Jun 22, 2011 10:18 pm

Dann Corbit wrote:
rbarreira wrote:
Dann Corbit wrote: And if an intermediate calculation for the product of two 8-byte floats is stored in an 80-bit register and an IRQ fires, the data is stored where?
In memory, probably using these instructions or similar, which as mentioned in the link save and restore "the entire floating-point unit state".

Dann Corbit wrote:Why not read this:
http://msdn.microsoft.com/en-us/library/e7s85ffb.aspx
And specifically where they talk about 80 bit operations and loss of precision
I read it. It doesn't talk about IRQs or context switches at all, only about code generation, just as every compiler documentation I've seen about floating point optimization options.
"Improves the consistency of floating-point tests for equality and inequality by disabling optimizations that could change the precision of floating-point calculations, which is required for strict ANSI conformance. By default, the compiler uses the coprocessor's 80-bit registers to hold the intermediate results of floating-point calculations. This increases program speed and decreases program size. Because the calculation involves floating-point data types that are represented in memory by less than 80 bits, however, carrying the extra bits of precision (80 bits minus the number of bits in a smaller floating-point type) through a lengthy calculation can produce inconsistent results."

Imagine this bit in all caps:
Because the calculation involves floating-point data types that are represented in memory by less than 80 bits, however, carrying the extra bits of precision (80 bits minus the number of bits in a smaller floating-point type) through a lengthy calculation can produce inconsistent results.

rbarreira · Post by **rbarreira** » Wed Jun 22, 2011 10:22 pm

Dann Corbit wrote:
Dann Corbit wrote:
rbarreira wrote:
Dann Corbit wrote: And if an intermediate calculation for the product of two 8-byte floats is stored in an 80-bit register and an IRQ fires, the data is stored where?
In memory, probably using these instructions or similar, which as mentioned in the link save and restore "the entire floating-point unit state".

Dann Corbit wrote:Why not read this:
http://msdn.microsoft.com/en-us/library/e7s85ffb.aspx
And specifically where they talk about 80 bit operations and loss of precision
I read it. It doesn't talk about IRQs or context switches at all, only about code generation, just as every compiler documentation I've seen about floating point optimization options.
"Improves the consistency of floating-point tests for equality and inequality by disabling optimizations that could change the precision of floating-point calculations, which is required for strict ANSI conformance. By default, the compiler uses the coprocessor's 80-bit registers to hold the intermediate results of floating-point calculations. This increases program speed and decreases program size. Because the calculation involves floating-point data types that are represented in memory by less than 80 bits, however, carrying the extra bits of precision (80 bits minus the number of bits in a smaller floating-point type) through a lengthy calculation can produce inconsistent results."
Imagine this bit in all caps:
Because the calculation involves floating-point data types that are represented in memory by less than 80 bits, however, carrying the extra bits of precision (80 bits minus the number of bits in a smaller floating-point type) through a lengthy calculation can produce inconsistent results.

Again, that doesn't refer to context switching. It refers to keeping full-precision when not requested by the source code, which may give different results from stricter code generation which rounds and writes to memory (to a data type with less precision) at every assignment operator in the source code.
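The difference between the two code generation strategies can be sketched in Python. This is an illustration under stated assumptions, not the behaviour of any particular compiler: 4-byte floats stand in for the declared type, Python's 8-byte floats stand in for the wider intermediate registers, and the helper names are hypothetical.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest 4-byte float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def add_strict(start: float, increments: list) -> float:
    """Round to 4-byte precision after every addition (every assignment)."""
    total = to_f32(start)
    for inc in increments:
        total = to_f32(total + inc)
    return total


def add_wide(start: float, increments: list) -> float:
    """Keep the running total in 8-byte precision, round once at the end."""
    total = to_f32(start)
    for inc in increments:
        total = total + inc
    return to_f32(total)


start = 16777216.0  # 2**24, where adjacent 4-byte floats are 2 apart

print(add_strict(start, [1.0, 1.0]))  # 16777216.0
print(add_wide(start, [1.0, 1.0]))    # 16777218.0
```

Both results are fully deterministic for a given strategy; they differ only because the rounding happens at different points, which is a property of the generated code and not of anything that occurs at run time.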

Until you find even a single authoritative source which talks about loss of fp precision on context switching in, say, Windows or Linux, I'm going to ignore further posts in this thread...

Dann Corbit · Post by **Dann Corbit** » Wed Jun 22, 2011 10:34 pm

rbarreira wrote:
Dann Corbit wrote:
Dann Corbit wrote:
rbarreira wrote:
Dann Corbit wrote: And if an intermediate calculation for the product of two 8-byte floats is stored in an 80-bit register and an IRQ fires, the data is stored where?
In memory, probably using these instructions or similar, which as mentioned in the link save and restore "the entire floating-point unit state".

Dann Corbit wrote:Why not read this:
http://msdn.microsoft.com/en-us/library/e7s85ffb.aspx
And specifically where they talk about 80 bit operations and loss of precision
I read it. It doesn't talk about IRQs or context switches at all, only about code generation, just as every compiler documentation I've seen about floating point optimization options.
"Improves the consistency of floating-point tests for equality and inequality by disabling optimizations that could change the precision of floating-point calculations, which is required for strict ANSI conformance. By default, the compiler uses the coprocessor's 80-bit registers to hold the intermediate results of floating-point calculations. This increases program speed and decreases program size. Because the calculation involves floating-point data types that are represented in memory by less than 80 bits, however, carrying the extra bits of precision (80 bits minus the number of bits in a smaller floating-point type) through a lengthy calculation can produce inconsistent results."
Imagine this bit in all caps:
Because the calculation involves floating-point data types that are represented in memory by less than 80 bits, however, carrying the extra bits of precision (80 bits minus the number of bits in a smaller floating-point type) through a lengthy calculation can produce inconsistent results.
Again, that doesn't refer to context switching. It refers to keeping full-precision when not requested by the source code, which may give different results from stricter code generation which rounds and writes to memory (to a data type with less precision) at every assignment operator in the source code.

Until you find even a single authoritative source which talks about loss of fp precision on context switching in, say, Windows or Linux, I'm going to ignore further posts in this thread...

Apparently, my information was dated. Here is the latest from Agner Fog's assembly reference:

6.1 Can floating point registers be used in 64-bit Windows?
There has been widespread confusion about whether 64-bit Windows allows the use of the floating point registers ST(0)-ST(7) and the MM0 - MM7 registers that are aliased upon these. One early technical document found at Microsoft’s website says "x87/MMX registers are unavailable to Native Windows64 applications" (Rich Brunner: Technical Details Of Microsoft® Windows® For The AMD64 Platform, Dec. 2003). An AMD document says: "64-bit Microsoft Windows does not strongly support MMX and 3Dnow! instruction sets in the 64-bit native mode" (Porting and Optimizing Multimedia Codecs for AMD64 architecture on Microsoft® Windows®, July 21, 2004). A document in Microsoft’s MSDN says: "A caller must also handle the following issues when calling a callee: [...] Legacy Floating-Point Support: The MMX and floating-point stack registers (MM0-MM7/ST0-ST7) are volatile. That is, these legacy floating-point stack registers do not have their state preserved across context switches" (MSDN: Kernel-Mode Driver Architecture: Windows DDK: Other Calling Convention Process Issues. Preliminary, June 14, 2004; February 18, 2005). This description is nonsense because it confuses saving registers across function calls and saving registers across context switches. Some versions of the Microsoft assembler ml64 (e.g. v. 8.00.40310) gives the following message when attempts are made to use floating point registers in 64 bit mode: "error A2222: x87 and MMX instructions disallowed; legacy FP state not saved in Win64".
However, a public discussion forum quotes the following answers from Microsoft engineers regarding this issue: "From: Program Manager in Visual C++ Group, Sent: Thursday, May 26, 2005 10:38 AM. It does preserve the state. It’s the DDK page that has stale information, which I’ve requested it to be changed. Let them know that the OS does preserve state of x87 and MMX registers on context switches." and "From: Software Engineer in Windows Kernel Group, Sent: Thursday, May 26, 2005 11:06 AM. For user threads the state of legacy floating point is preserved at context switch. But it is not true for kernel threads. Kernel mode drivers can not use legacy floating point instructions." (www.planetamd64.com/index.php?showtopic=3458&st=100).
The issue has finally been resolved with the long overdue publication of a more detailed ABI for x64 Windows in the form of a document entitled "x64 Software Conventions", well hidden in the bin directory (not the help directory) of some compiler packages. This document says: "The MMX and floating-point stack registers (MM0-MM7/ST0-ST7) are preserved across context switches. There is no explicit calling convention for these registers. The use of these registers is strictly prohibited in kernel mode code." The same text has later appeared at the Microsoft website (msdn2.microsoft.com/en-us/library/a32tsf7t(VS.80).aspx).
My tests indicate that these registers are saved correctly during task switches and thread switches in 64-bit mode, even in an early beta version of x64 Windows.
The Microsoft C++ compiler version 14.0 never uses these registers in 64-bit mode, and doesn’t support long double precision. The Intel C++ compiler for x64 Windows supports long double precision and __m64 in version 9.0 and later, while earlier versions do not.
The conclusion is that it is safe to use floating point registers and MMX registers in 64-bit Windows, except in kernel mode drivers.
