The upcoming Y2038 catastrophe

syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Everything's positive

Post by syzygy »

sje wrote:
mvk wrote:The ntpd program forces backwards and forward jumps when the error exceeds 100-something milliseconds.
I doubt that ntpd is forcing any time jumps other than at boot time or maybe at wake-up time. Once the system is up, adjtime() takes over and adjtime() will never cause the clock to be set back and will never allow any big jumps, only small adjustments to the rate.
Just do some research. Or just scroll up to where it was already spelled out. There is no need for doubt on this point.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Everything's positive

Post by AlvaroBegue »

We haven't even started discussing what happens if you call gettimeofday twice in quick succession and your process is assigned to a different CPU in between the two calls. As far as I know, there is no guarantee that the results are in the order you expect.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Why clock_gettime(CLOCK_MONOTONIC) is bad

Post by sje »

Why clock_gettime(CLOCK_MONOTONIC) is bad

The clock_gettime(CLOCK_MONOTONIC) call is bad, or at least not as good as using gettimeofday(), for measuring intervals. Why? Because on a properly configured system with decent hardware and an occasional net connection, gettimeofday() also produces monotonic output AND is not totally reliant on a single timebase as is clock_gettime(CLOCK_MONOTONIC).

The single and probably unconditioned time base used for clock_gettime(CLOCK_MONOTONIC) is just the local crystal driving the CPU by some timer interrupt. The interval measurements are no more accurate than that crystal, and that crystal could be off by a couple of seconds per day but still be within specification. Compare this to ntpd / adjtime() / gettimeofday() which use a networked ensemble of time servers plus conditioning of the local oscillator output based on historical performance.
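
To put the "couple of seconds per day" figure in perspective, the arithmetic can be sketched as below. The helper names drift_ppm and interval_error_ms are hypothetical, and 2 s/day is only the figure quoted above, not a measurement of any particular crystal.

```c
#include <assert.h>

/* Hypothetical helpers, for illustration only. */

/* Convert a drift given in seconds per day to parts per million. */
double drift_ppm(double seconds_per_day)
{
    const double seconds_in_day = 86400.0;
    return seconds_per_day / seconds_in_day * 1e6;
}

/* Error, in milliseconds, that a given drift (in ppm) introduces
   into a measured interval of interval_s seconds. */
double interval_error_ms(double ppm, double interval_s)
{
    assert(interval_s >= 0.0);
    return interval_s * ppm * 1e-6 * 1e3;
}
```

With these helpers, a drift of 2 s/day works out to about 23 ppm, which is roughly 1.4 ms of error over a 60-second search.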
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Why clock_gettime(CLOCK_MONOTONIC) is bad

Post by sje »

Further, from the adjtime() man page:
Adjtime() makes small adjustments to the system time, as returned by
gettimeofday(2), advancing or retarding it by the time specified by the
timeval delta. If delta is negative, the clock is slowed down by incre-
menting it more slowly than normal until the correction is complete. If
delta is positive, a larger increment than normal is used. The skew used
to perform the correction is generally a fraction of one percent. Thus,
the time is always a monotonically increasing function.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Why clock_gettime(CLOCK_MONOTONIC) is bad

Post by syzygy »

sje wrote:Further, from the adjtime() man page:
Adjtime() makes small adjustments to the system time, as returned by
gettimeofday(2), advancing or retarding it by the time specified by the
timeval delta. If delta is negative, the clock is slowed down by incre-
menting it more slowly than normal until the correction is complete. If
delta is positive, a larger increment than normal is used. The skew used
to perform the correction is generally a fraction of one percent. Thus,
the time is always a monotonically increasing function.
For the second time, just scroll up to the beginning of this thread. Everything has been spelled out already.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The upcoming Y2038 catastrophe

Post by bob »

AlvaroBegue wrote:
bob wrote:
AlvaroBegue wrote:Imagine I close my laptop's lid for a couple of days, then open it up and launch a chess engine. I am happily playing against it and then an ntpd update happens, which makes it think it has been searching for some negative amount of time. I would say that's a bug in the engine because it is using the wrong clock.

Users with root access can mess up your clock_gettime readings if they want, and people can close the lid of their laptops. The former never happens and the latter happens all the time. But if you are incapable of admitting that you simply didn't think of people closing their laptops or disconnecting their computers from the Internet... well, that's between you and your therapist.
How would that happen? When you open it up, ntpd should start the update process instantly, and set the correct time BEFORE you get to run anything.
No, getting a WiFi connection can take time. I could even be in some new environment where I have to configure the WiFi connection manually. At some point after I am connected to the WiFi the next ntpd update will happen (I don't know how long that may take, but it is irrelevant for this conversation).

If you mean that you start a search, close the lid, and then re-open it, and you REALLY expect it to work correctly, then as the movie "Grumpier Old Men" pointed out: "You can wish in one hand, crap in the other, and see which one fills up first."
... or you can actually get your act together and make it work. For example, you can keep a "game clock" that you use for all your time-control logic. You have a global node counter and a variable called next_clock_check that is set initially to 10000. Once the global node counter is at least next_clock_check, we call clock_gettime(CLOCK_MONOTONIC) to find out how much time has elapsed. If the answer is more than some threshold (say, 1 second), you ignore the reading from the clock and substitute it with some average of recent updates, or something of that sort. You then update your game clock by adding the time increment.

This sort of thing is completely standard in video games. If you close the lid of your laptop at a time when you were running forward in some video game, you can open the laptop and keep playing and nothing bad will happen (unless it's a really crappy video game).
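
The "game clock" idea quoted above can be sketched roughly as follows. This is a minimal sketch, not code from any engine: the GameClock type, the function names, and the 0.25 averaging weight are all assumptions made for illustration. Only the 10000-node check interval and the 1-second threshold come from the post. Treating a negative increment as implausible is an extra assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define CHECK_INTERVAL_NODES 10000 /* initial next_clock_check, as in the post */
#define MAX_PLAUSIBLE_DELTA  1.0   /* seconds; the "say, 1 second" threshold */

/* Hypothetical type: all names here are illustrative. */
typedef struct {
    uint64_t next_clock_check; /* node count at which to read the clock next */
    double   last_reading;     /* previous raw clock reading, in seconds */
    double   avg_delta;        /* running average of accepted increments */
    double   elapsed;          /* the game clock used for time control */
} GameClock;

static double monotonic_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

void game_clock_start(GameClock *gc, double now)
{
    gc->next_clock_check = CHECK_INTERVAL_NODES;
    gc->last_reading = now;
    gc->avg_delta = 0.0;
    gc->elapsed = 0.0;
}

/* Feed one raw clock reading (in seconds) into the game clock. */
void game_clock_update(GameClock *gc, double now)
{
    double delta = now - gc->last_reading;
    gc->last_reading = now;
    if (delta < 0.0 || delta > MAX_PLAUSIBLE_DELTA) {
        /* Implausible jump (e.g. suspend/resume): use the average instead. */
        delta = gc->avg_delta;
    } else {
        /* Exponential moving average; the 0.25 weight is an arbitrary choice. */
        gc->avg_delta += 0.25 * (delta - gc->avg_delta);
    }
    gc->elapsed += delta;
}

/* Call from the search loop; the clock is read only once the node
   counter has reached next_clock_check. */
void game_clock_poll(GameClock *gc, uint64_t nodes)
{
    assert(gc != NULL);
    if (nodes < gc->next_clock_check)
        return;
    game_clock_update(gc, monotonic_seconds());
    gc->next_clock_check = nodes + CHECK_INTERVAL_NODES;
}
```

A search loop would call game_clock_start(&gc, monotonic_seconds()) when the move starts, call game_clock_poll(&gc, nodes) as nodes are counted, and base its time-control decisions on gc.elapsed.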

[...] As far as opening/closing the lid however, I do it dozens of times per day, and I NEVER see time glitches. I would hardly expect a chess engine to work correctly, however, if I start a search, close the lid, and re-open it tomorrow. That is never going to work, nor should it.
See above.
Here is the order things are done.

ntpd runs and keeps the clock spot on. It measures drift and thereby "learns" how your local hardware clock performs, and it then "tweaks" the clock using adjtime() to make it run much more accurately.

When you close the lid and then reopen it, the time is taken from your hardware clock, which should already be in a very accurate state. When ntpd runs, your clock is NOT going to be off by minutes. It might be off by milliseconds. ntpd will then adjust the clock by either speeding it up to catch up (more clocks run slow than fast) or slowing it down, which keeps the time monotonic.

This getting the time off by minutes is just not going to happen, unless, perhaps, you leave the thing suspended for years so that the internal clock either slowly loses time, or the battery dies and the clock totally stops.

Otherwise, it works just fine. I teach two classes on MWF. I hit my office about 8am or so, open up, close lid, go to class, open up, do lecture, close lid, back to office, open lid until time for next class, close lid, go to class, open lid, lecture, close lid, back to office, open lid, close lid to go home, open lid when I get home, and I might open/close the thing at least a half-dozen times.

I've run a test for the last 30 minutes or so, grabbing the system time 10 times a second. I have suspended and resumed repeatedly. I see gaps in the time when my MacBook is sleeping/suspended, but when the lid is opened and things resume, time marches on. In a second test, I grabbed and saved the time every 0.1 seconds. If sample N+1 was more than 1.0 seconds greater than sample N, I then grabbed the time as fast as possible over the next couple of seconds and stuffed the times into an enormous array. I then scanned the array looking for any funny business. The only "funny" thing is that long "pause" in the times. But it has NEVER violated the monotonic property, not one time. I even suspended for 1.5 hours while we went to eat. Looking at the values, not one "correction" was needed, at least nothing visible at millisecond resolution.
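
The scanning step of a test like the one described above might look roughly like this. It is a sketch, not the code that was actually run; the function names wall_seconds, sample_wall_clock, count_backward_steps and largest_gap are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Read the wall clock via gettimeofday(), in seconds since the epoch. */
double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Fill out[0..n-1] with back-to-back wall-clock readings. */
void sample_wall_clock(double *out, size_t n)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = wall_seconds();
}

/* Count positions where a reading is earlier than the previous one. */
size_t count_backward_steps(const double *samples, size_t n)
{
    size_t backward = 0;
    for (size_t i = 1; i < n; i++)
        if (samples[i] < samples[i - 1])
            backward++;
    return backward;
}

/* Largest forward gap between consecutive readings, e.g. a suspend period. */
double largest_gap(const double *samples, size_t n)
{
    double gap = 0.0;
    for (size_t i = 1; i < n; i++)
        if (samples[i] - samples[i - 1] > gap)
            gap = samples[i] - samples[i - 1];
    return gap;
}
```

A result of zero backward steps describes only the interval that was sampled.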

So again, explain to me EXACTLY what this imagined problem is that you are encountering. This is my 2012 Apple MacBook, a dual-core i7 machine. Nothing special or exotic. NTP works.

You are all assuming that from lid close to lid open sees your TOD clock drift wildly away from the actual time. Which would represent a hardware problem in its own right, and one you could not solve anyway. Also, from the time you open your lid until your application starts to run again is more than enough for your clock to be set to a correct value from the hardware clock. NTP will only need a TINY tweak if anything.

This is REALLY a problem of imagination, not a problem of reality.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The upcoming Y2038 catastrophe

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:(2) you KNOWINGLY are inviting time sync problems by disconnecting and re-connecting. And if you recall, I specifically pointed out that an unsophisticated user, given root access or the ability to set the system time, can easily screw up the time. This falls directly under that umbrella.
No, it was NOT YOU who pointed out that gettimeofday() can easily be screwed up. It was Richard:
http://talkchess.com/forum/viewtopic.ph ... 684#589684
abulmo wrote:gettimeofday should not be used to measure elapsed time. You should use clock_gettime(CLOCK_MONOTONIC) or an equivalent function of your OS instead. If the time of our system is adjusted (by a user or automatically), gettimeofday can return a wrong value.
bob wrote:If time is important, then a logical person would make every effort to provide the services needed to make it correct. Namely an internet connection. And a logical person would realize that if they disconnect from the net, it is a near-certainty that they will see some "time slip" where their time slowly falls behind the network time standard.
That is utter nonsense if you are talking about a Crafty user.
Please explain to me how a Crafty user is going to have time issues if he is running ntpd and has it correctly configured? THIS Crafty user has not had a problem in 20+ years. Running on laptops, on desktops, on clusters, you name it. Can you break anything? Probably. But it can work correctly if you stay out of the way. Will any chess program work correctly across a lid closing? Of course not. Nor should it be expected to do so. If that is desirable, just use CPU time rather than elapsed time. Then all is well, except for the issues of process scheduling, where you use far less CPU time than wall-clock time and lose on time in a game that actually uses wall-clock time.
The truth is that nobody will lose any sleep if an instance of Crafty at some point in time loses a game due to this problem. It's really unimportant.

The question is why you can't admit that using a guaranteed monotonic clock is the superior solution, even if you might not find it worth your time to change Crafty to using one.

But we have been seeing this behaviour now for 20 years or so, so nothing new here.
The unix clock + ntpd is guaranteed to be monotonic, except for situations where NO clock algorithm will work. But a program won't see any non-monotonic behavior. I spent several hours testing this every way I could think of. I NEVER saw anything but monotonic time, always positive, as expected. Yes, there are jumps. If you save the time before you close the lid and open the lid an hour later, yes, you will see an instantaneous +1 hour, as you should. I mean, what did you expect it to do during that hour of "sleep"? You CAN use pure CPU time, which is ALSO guaranteed to be monotonic, but it is not a very good way of timing a chess game if anything else is running on that computer, even if infrequently.
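
The gap between CPU time and elapsed time mentioned above can be illustrated with a short sketch. It assumes a POSIX system that provides both CLOCK_MONOTONIC and CLOCK_PROCESS_CPUTIME_ID; the names clock_seconds and measure_sleep are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Read the given POSIX clock and return its value in seconds. */
static double clock_seconds(clockid_t id)
{
    struct timespec ts;
    int rc = clock_gettime(id, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Sleep for ms milliseconds and report, via the out parameters, how much
   elapsed time and how much CPU time the process used meanwhile. */
void measure_sleep(long ms, double *wall_s, double *cpu_s)
{
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    double wall0 = clock_seconds(CLOCK_MONOTONIC);
    double cpu0  = clock_seconds(CLOCK_PROCESS_CPUTIME_ID);
    nanosleep(&req, NULL); /* the process is not scheduled while it sleeps */
    *wall_s = clock_seconds(CLOCK_MONOTONIC) - wall0;
    *cpu_s  = clock_seconds(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
}
```

While the process sleeps, elapsed time advances but its CPU time barely moves. The same thing happens whenever an engine is descheduled in favour of another process, which is why a CPU-time clock can let a game be lost on wall-clock time.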

I have already pointed out your "solution" is not available on all machines. OS X being the first case in point. What good is a "solution" that excludes a large number of machines, my laptop and office box included? Answer: not useful at all, particularly when gettimeofday() does exactly what it is supposed to do, namely providing a monotonic clock except for circumstances caused by a human not doing things that are rational. No way to fix that with anything.
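
One way to handle the portability concern raised above is a compile-time fallback, sketched below. It assumes that platforms providing the monotonic clock expose CLOCK_MONOTONIC as a preprocessor macro, as glibc does; the function name interval_seconds is hypothetical. OS X releases of that era lacked clock_gettime(); it was added in macOS 10.12.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Returns seconds from an arbitrary starting point. The value is meant
   only for computing the difference between two calls. */
double interval_seconds(void)
{
#if defined(CLOCK_MONOTONIC)
    /* Preferred: not affected by steps of the system time. */
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#else
    /* Fallback: wall clock, which can jump if the system time is stepped. */
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
#endif
}
```

Callers take two readings and subtract them. On the fallback path the result inherits whatever behavior gettimeofday() has on that system.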
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Everything's positive

Post by bob »

AlvaroBegue wrote:We haven't even started discussing what happens if you call gettimeofday twice in quick succession and your process is assigned to a different CPU in between the two calls. As far as I know, there is no guarantee that the results are in the order you expect.
There certainly is a guarantee. The time of day is a global memory address in unix, not something carried inside the CPU.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Why clock_gettime(CLOCK_MONOTONIC) is bad

Post by bob »

syzygy wrote:
sje wrote:Further, from the adjtime() man page:
Adjtime() makes small adjustments to the system time, as returned by
gettimeofday(2), advancing or retarding it by the time specified by the
timeval delta. If delta is negative, the clock is slowed down by incre-
menting it more slowly than normal until the correction is complete. If
delta is positive, a larger increment than normal is used. The skew used
to perform the correction is generally a fraction of one percent. Thus,
the time is always a monotonically increasing function.
For the second time, just scroll up to the beginning of this thread. Everything has been spelled out already.
With INVALID basic assumptions. Once ntpd is up and running, the clock will NEVER get off by 0.1 seconds; preventing that is ntpd's function. I would also predict that if you move your laptop into a region of space where time runs both backward and forward, gettimeofday() is going to become non-monotonic. But that doesn't seem worth worrying about. In the real world, this stuff works correctly.
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Why clock_gettime(CLOCK_MONOTONIC) is bad

Post by abulmo »

sje wrote:Why clock_gettime(CLOCK_MONOTONIC) is bad

The clock_gettime(CLOCK_MONOTONIC) call is bad, or at least not as good as using gettimeofday(), for measuring intervals. Why? Because on a properly configured system with decent hardware and an occasional net connection, gettimeofday() also produces monotonic output AND is not totally reliant on a single timebase as is clock_gettime(CLOCK_MONOTONIC).

The single and probably unconditioned time base used for clock_gettime(CLOCK_MONOTONIC) is just the local crystal driving the CPU by some timer interrupt. The interval measurements are no more accurate than that crystal, and that crystal could be off by a couple of seconds per day but still be within specification. Compare this to ntpd / adjtime() / gettimeofday() which use a networked ensemble of time servers plus conditioning of the local oscillator output based on historical performance.
You are completely wrong. adjtime() does affect clock_gettime(CLOCK_MONOTONIC). Setting the clock with clock_settime(), on the other hand, does not. settimeofday() does change the time reported by gettimeofday(). Computers with intermittent internet connections do exist. I am writing from one of them right now.
Just read the gettimeofday OFFICIAL manual page:
http://pubs.opengroup.org/onlinepubs/96 ... ofday.html
Applications should use the clock_gettime() function instead of the obsolescent gettimeofday() function.
Richard
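
For reference, measuring elapsed time with clock_gettime(CLOCK_MONOTONIC), as recommended above, can be sketched in a few lines. This is a minimal sketch; the names now_ms and elapsed_ms are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Milliseconds on the monotonic clock; the origin is unspecified. */
int64_t now_ms(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Milliseconds elapsed since a value previously returned by now_ms(). */
int64_t elapsed_ms(int64_t start)
{
    return now_ms() - start;
}
```

An engine would store now_ms() when a search starts and compare elapsed_ms() against its time budget.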