The upcoming Y2038 catastrophe

sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Everything's positive

Post by sje »

mvk wrote:No true Scotsman
Wikipedia wrote:No true Scotsman is an informal fallacy, an ad hoc attempt to retain an unreasoned assertion. When faced with a counterexample to a universal claim ("no Scotsman would do such a thing"), rather than denying the counterexample or rejecting the original universal claim, this fallacy modifies the subject of the assertion to exclude the specific case or others like it by rhetoric, without reference to any specific objective rule ("no true Scotsman would do such a thing").
Show me the counter-example.

Your screenshot is not a counter-example; it just shows a cumulative drift from network time. That's to be expected, and it does not imply any backward adjustment to the clock, only an adjustment to the rate of the clock.
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Everything's positive

Post by syzygy »

sje wrote:On any properly configured system with decent hardware, the ntpd / adjtime() / gettimeofday() trio, and an occasional connection to the net, there will be no backward adjustments to the clock that a user program will see.
How long will this ignorance / lack of any form of memory continue?

It has been pointed out near the beginning of this thread that ntpd will typically step the time if the difference exceeds 125ms.
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: Everything's positive

Post by mvk »

The ntpd program forces backwards and forward jumps when the error exceeds 100-something milliseconds. That's probably why smaller adjustments don't appear in the screenshot.
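
The step-versus-slew distinction argued over in these posts can be sketched in a few lines of C. This is only a simplified model, not ntpd's actual code: `choose_fix`, `wall_elapsed_s` and the `ClockFix` enum are hypothetical names, and the threshold is left as a parameter (the thread quotes 125 ms; ntpd's documentation, as far as I know, gives 0.128 s as the default step threshold and also waits out a "stepout" interval before stepping, which this sketch ignores).

```c
/* Simplified model (an assumption, not ntpd source) of how a time daemon
 * might choose between slewing and stepping the clock. */
typedef enum { FIX_NONE, FIX_SLEW, FIX_STEP } ClockFix;

/* offset_s: reference time minus local time, in seconds.
 * step_threshold_s: offsets larger than this are stepped (hypothetical
 * parameter; the thread mentions roughly 125 ms). */
ClockFix choose_fix(double offset_s, double step_threshold_s) {
    double magnitude = offset_s < 0 ? -offset_s : offset_s;
    if (magnitude == 0.0)
        return FIX_NONE;
    return magnitude > step_threshold_s ? FIX_STEP : FIX_SLEW;
}

/* Elapsed time as a program would compute it from two wall-clock readings
 * when a step of step_s seconds lands between them. A negative step that
 * is larger than the real elapsed time yields a negative result. */
double wall_elapsed_s(double real_elapsed_s, double step_s) {
    return real_elapsed_s + step_s;
}
```

In this model a local clock that is 0.3 s ahead gets stepped back, and a search that has really run for 0.1 s across that step appears to have run for a negative amount of time.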
[Account deleted]
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The upcoming Y2038 catastrophe

Post by bob »

AlvaroBegue wrote:Imagine I close my laptop's lid for a couple of days, then open it up and launch a chess engine. I am happily playing against it and then an ntpd update happens, which makes it think it has been searching for some negative amount of time. I would say that's a bug in the engine because it is using the wrong clock.

Users with root access can mess up your clock_gettime readings if they want, and people can close the lid of their laptops. The former never happens and the latter happens all the time. But if you are incapable of admitting that you simply didn't think of people closing their laptops or disconnecting their computers from the Internet... well, that's between you and your therapist.
How would that happen? When you open it up, ntpd should start the update process instantly, and set the correct time BEFORE you get to run anything. If you mean that you start a search, close the lid, and then re-open it, and you REALLY expect it to work correctly, then as the movie "Grumpier Old Men" pointed out, "you can wish in one hand, crap in the other, and see which one fills up first."

An ntp update should not happen AFTER you start the game. Most laptops (at least the decent ones running Linux with ntp configured) would catch that resume/unsuspend event and get the date synced before you can run anything. I can't guarantee OS X does this right, since Apple has their fair share of glitches and then some. But my Linux laptops do this correctly. I suspend and resume all the time with nary a clock jump. In fact, since the hardware TOD clock is managed by the OS as well, the time should be pretty accurate within milliseconds of opening the lid; then ntpd will wake up and make any further corrections as needed, without large jumps back and forth.

Remember, not ALL of the operating systems around have the specific timing mechanism that is being discussed here. OS X is one example. I don't want to have to write timing code for N different operating systems; I want to just do one. One with ntpd has always worked flawlessly for anything we have done here, including the granddaddy of them all, trying to keep 128 compute nodes on a cluster synchronized to within 1ms for proper timestamping.

As far as opening/closing the lid however, I do it dozens of times per day, and I NEVER see time glitches. I would hardly expect a chess engine to work correctly, however, if I start a search, close the lid, and re-open it tomorrow. That is never going to work, nor should it.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The upcoming Y2038 catastrophe

Post by bob »

syzygy wrote:
bob wrote:(2) you KNOWINGLY are inviting time sync problems by disconnecting and re-connecting. And if you recall, I specifically pointed out that an unsophisticated user, given root access or the ability to set the system time, can easily screw up the time. This falls directly under that umbrella.
No, it was NOT YOU who pointed out that gettimeofday() can easily be screwed up. It was Richard:
http://talkchess.com/forum/viewtopic.ph ... 684#589684
abulmo wrote:gettimeofday should not be used to measure elapsed time. You should use clock_gettime(CLOCK_MONOTONIC) or an equivalent function of your OS instead. If the time of our system is adjusted (by a user or automatically), gettimeofday can return a wrong value.
bob wrote:If time is important, then a logical person would make every effort to provide the services needed to make it correct. Namely an internet connection. And a logical person would realize that if they disconnect from the net, it is a near-certainty that they will see some "time slip" where their time slowly falls behind the network time standard.
That is utter nonsense if you are talking about a Crafty user.
Please explain to me how a Crafty user is going to have time issues if he is running ntpd and has it correctly configured? THIS Crafty user has not had a problem in 20+ years. Running on laptops, on desktops, on clusters, you-name-it. Can you break anything? Probably. But it can work correctly if you stay out of the way. Will any chess program work correctly across a lid closing? Of course not. Nor should it be expected to do so. If that is desirable, just use CPU time rather than elapsed time. Then all is well, except for the issues of process scheduling, where you use far less CPU time than wall-clock time and lose on time in a game that actually uses wall-clock time.
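
For reference, the `clock_gettime(CLOCK_MONOTONIC)` approach quoted from abulmo above might look roughly like this in C. It is a minimal sketch, assuming a POSIX system that provides `CLOCK_MONOTONIC`; the helper names `monotonic_ms` and `elapsed_ms` are made up for illustration and are not from Crafty or any other engine.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Milliseconds on the monotonic clock. The starting point is unspecified,
 * so only differences between two readings are meaningful.
 * Returns -1 if the clock cannot be read. */
int64_t monotonic_ms(void) {
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Elapsed milliseconds since start_ms, which must come from an earlier
 * successful monotonic_ms() call. */
int64_t elapsed_ms(int64_t start_ms) {
    assert(start_ms >= 0);
    return monotonic_ms() - start_ms;
}
```

A search would record `monotonic_ms()` when it starts and compare `elapsed_ms(start)` against its time budget. Setting the system date does not move this clock, although whether it keeps counting while the machine is suspended differs between platforms.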
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The upcoming Y2038 catastrophe

Post by bob »

mvk wrote:
AlvaroBegue wrote:
abulmo wrote: OK, an example from my real life of a computer with intermittent internet connection.
When I travel, I bring my laptop to places with neither a wifi connection nor ethernet plugs, but I can connect it from time to time to the internet through my 4G mobile phone. Usually I leave my computer running all day, doing some timed computations, and I only connect it to the internet when I am present. If my computer stays a few days, or even weeks, without an internet connection, its internal clock can be off by a few seconds. Once connected to the internet, my computer's real-time clock can then be abruptly set to the network time, going forward or backward.
I wonder what is wrong with using clock_gettime. It's a POSIX standard. It solves a potentially existing problem.
This is going to be good. I can't wait to see what Bob has to say about a perfectly reasonable real-world scenario where his "GUARANTEED" method actually fails miserably. :)

Perhaps one should heed the advice of the Linux man page.
I have negative steps all the time on my laptop at home, just as explained above. See screenshot. And that is without travelling, and not all of these are human interference. Geographically distributed systems are not reliable. Maybe in the context of wired mainframes stuffed in a server center it works better. But who wants to go back to that? I also don't see why I would have to stop a computation when potentially losing signal. If I want to use a monotonic clock, then I should use one that guarantees me that, no strings attached. I don't want to depend on what I know (or in this case, what I think I know) of an implementation, when that property is neither mandated nor implied by the requirements. This is the strcpy story all over again. Piss-poor engineering to depend on something which is not there (or blaming the malfunction on the user, for that matter), while there is a standardised solution available.

Image
Which "standard solution" is available to OS X users again?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Everything's positive

Post by bob »

syzygy wrote:
sje wrote:On any properly configured system with decent hardware, the ntpd / adjtime() / gettimeofday() trio, and an occasional connection to the net, there will be no backward adjustments to the clock that a user program will see.
How long will this ignorance / lack of any form of memory continue?

It has been pointed out near the beginning of this thread that ntpd will typically step the time if the difference exceeds 125ms.
So what? How does the time get that far off? On a normal system, it won't happen. A good quartz-oscillator watch has 2 secs/month accuracy, or about 66ms per day. With ntpd running, it is NOT going to be sampling and setting once per day. And it also learns what the clock is doing (how fast it moves relative to a clock standard) and factors that in so that even without a network, once ntpd has "learned your clock" it will be within a couple of ms per day off, at most.
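
The arithmetic behind those figures can be checked with a short C sketch. The function names are hypothetical, a 30-day month is assumed, and the second helper deliberately models the uncorrected case only (no frequency compensation by ntpd), so it is a bound for a free-running clock rather than a prediction for a synchronized one.

```c
/* Daily drift in milliseconds for a clock that gains or loses
 * sec_per_month seconds over days_per_month days. */
double drift_ms_per_day(double sec_per_month, double days_per_month) {
    return sec_per_month * 1000.0 / days_per_month;
}

/* Days of UNCORRECTED drift needed to accumulate threshold_ms of error.
 * Assumes a constant drift rate and no correction in between. */
double days_to_reach(double threshold_ms, double drift_ms_day) {
    return threshold_ms / drift_ms_day;
}
```

With the post's numbers, `drift_ms_per_day(2.0, 30.0)` comes to about 66.7 ms per day, and `days_to_reach(125.0, 66.7)` to a little under two days for a clock left entirely to itself.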

I have never seen so much crying and complaining about something that works, is known to work, and has been demonstrated to work to the satisfaction of most of the electronic world.

Yes you can break anything. No it won't break if everything is working normally. Yes there are time discontinuities if you suspend/hibernate a laptop. Yes that will cause problems if you time something across the discontinuity. But if you do such, you deserve exactly what you get, anyway. Unix time is, and has been, monotonic for as long as I can recall, UNLESS you let a human loose with excess privileges. At that point, anything can happen.

So keep dreaming up stupid ways of breaking the time. Those of us with a scintilla of common sense will keep right on using a clock that works perfectly, no backward jumps, no large forward jumps (except at those "lid opening" cases where it MUST jump forward, but NOT backward).

This entire argument looks pretty silly. "If the clock gets way off, then the clock will be way off." If you run ntp, you WON'T get 125ms off, period. So if you do it right, it will work right, and all of this nonsensical discussion would die the death it so deserves. I suspect you don't know how ntpd actually works. Hint: It does NOT spend most of its time whacking around the network comparing the time to the upper strata servers. Computer clocks are consistent, even if they are consistently wrong. ntpd addresses this and makes them consistently right, even when the network is down.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: The upcoming Y2038 catastrophe

Post by AlvaroBegue »

bob wrote:
AlvaroBegue wrote:Imagine I close my laptop's lid for a couple of days, then open it up and launch a chess engine. I am happily playing against it and then an ntpd update happens, which makes it think it has been searching for some negative amount of time. I would say that's a bug in the engine because it is using the wrong clock.

Users with root access can mess up your clock_gettime readings if they want, and people can close the lid of their laptops. The former never happens and the latter happens all the time. But if you are incapable of admitting that you simply didn't think of people closing their laptops or disconnecting their computers from the Internet... well, that's between you and your therapist.
How would that happen? When you open it up, ntpd should start the update process instantly, and set the correct time BEFORE you get to run anything.
No, getting a WiFi connection can take time. I could even be in some new environment where I have to configure the WiFi connection manually. At some point after I am connected to the WiFi the next ntpd update will happen (I don't know how long that may take, but it is irrelevant for this conversation).

If you mean that you start a search, close the lid, and then re-open it, and you REALLY expect it to work correctly, then as the movie "Grumpier Old Men" pointed out, "you can wish in one hand, crap in the other, and see which one fills up first."
... or you can actually get your act together and make it work. For example, you can keep a "game clock" that you use for all your time-control logic. You have a global node counter and a variable called next_clock_check that is initially set to 10000. Once the global node counter reaches next_clock_check, you call clock_gettime(CLOCK_MONOTONIC) to find out how much time has elapsed since the previous check. If the answer is more than some threshold (say, 1 second), you ignore the reading from the clock and replace it with some average of recent updates, or something of that sort. You then update your game clock by adding the time increment and set next_clock_check for the next check.
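
A rough C sketch of the game-clock idea described above follows. Only the 10000-node interval, the 1-second threshold and the name `next_clock_check` come from the post; the `GameClock` struct, the function names, the 8-entry history and the plain mean used as "some average of recent updates" are assumptions made for illustration. The raw readings are expected to be `clock_gettime(CLOCK_MONOTONIC)` values converted to milliseconds by the caller.

```c
#include <stdint.h>

#define CHECK_INTERVAL_NODES 10000  /* from the post */
#define MAX_STEP_MS 1000            /* "say, 1 second" */
#define HISTORY 8                   /* assumption: increments to average */

typedef struct {
    int64_t game_ms;            /* accumulated game-clock time */
    int64_t last_raw_ms;        /* previous raw clock reading */
    int64_t recent[HISTORY];    /* ring buffer of accepted increments */
    int count;                  /* valid entries in recent[] */
    int next;                   /* next ring-buffer slot to overwrite */
    uint64_t next_clock_check;  /* node count at which to read the clock */
} GameClock;

void game_clock_init(GameClock *gc, int64_t now_ms) {
    gc->game_ms = 0;
    gc->last_raw_ms = now_ms;
    gc->count = 0;
    gc->next = 0;
    gc->next_clock_check = CHECK_INTERVAL_NODES;
}

/* Returns 1 when the node counter has reached the next check point and
 * schedules the following one; returns 0 otherwise. */
int game_clock_due(GameClock *gc, uint64_t nodes) {
    if (nodes < gc->next_clock_check)
        return 0;
    gc->next_clock_check = nodes + CHECK_INTERVAL_NODES;
    return 1;
}

static int64_t average_recent(const GameClock *gc) {
    int64_t sum = 0;
    if (gc->count == 0)
        return 0;
    for (int i = 0; i < gc->count; i++)
        sum += gc->recent[i];
    return sum / gc->count;
}

/* Feed one raw clock reading. Implausible increments (negative, or larger
 * than MAX_STEP_MS) are replaced by the mean of recent accepted ones. */
void game_clock_update(GameClock *gc, int64_t now_ms) {
    int64_t delta = now_ms - gc->last_raw_ms;
    gc->last_raw_ms = now_ms;
    if (delta < 0 || delta > MAX_STEP_MS) {
        delta = average_recent(gc);
    } else {
        gc->recent[gc->next] = delta;
        gc->next = (gc->next + 1) % HISTORY;
        if (gc->count < HISTORY)
            gc->count++;
    }
    gc->game_ms += delta;
}
```

In a search loop this would be called as `if (game_clock_due(&gc, nodes)) game_clock_update(&gc, now_ms);`, with all time-control decisions made against `gc.game_ms`. The negative-delta guard should never trigger with a monotonic source; it is there in case the readings come from some other clock.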

This sort of thing is completely standard in video games. If you close the lid of your laptop at a time when you were running forward in some video game, you can open the laptop and keep playing and nothing bad will happen (unless it's a really crappy video game).

[...] As far as opening/closing the lid however, I do it dozens of times per day, and I NEVER see time glitches. I would hardly expect a chess engine to work correctly, however, if I start a search, close the lid, and re-open it tomorrow. That is never going to work, nor should it.
See above.
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: The upcoming Y2038 catastrophe

Post by syzygy »

bob wrote:
syzygy wrote:
bob wrote:(2) you KNOWINGLY are inviting time sync problems by disconnecting and re-connecting. And if you recall, I specifically pointed out that an unsophisticated user, given root access or the ability to set the system time, can easily screw up the time. This falls directly under that umbrella.
No, it was NOT YOU who pointed out that gettimeofday() can easily be screwed up. It was Richard:
http://talkchess.com/forum/viewtopic.ph ... 684#589684
abulmo wrote:gettimeofday should not be used to measure elapsed time. You should use clock_gettime(CLOCK_MONOTONIC) or an equivalent function of your OS instead. If the time of our system is adjusted (by a user or automatically), gettimeofday can return a wrong value.
bob wrote:If time is important, then a logical person would make every effort to provide the services needed to make it correct. Namely an internet connection. And a logical person would realize that if they disconnect from the net, it is a near-certainty that they will see some "time slip" where their time slowly falls behind the network time standard.
That is utter nonsense if you are talking about a Crafty user.
Please explain to me how a Crafty user is going to have time issues if he is running ntpd and has it correctly configured? THIS Crafty user has not had a problem in 20+ years. Running on laptops, on desktops, on clusters, you-name-it. Can you break anything? Probably. But it can work correctly if you stay out of the way. Will any chess program work correctly across a lid closing? Of course not. Nor should it be expected to do so. If that is desirable, just use CPU time rather than elapsed time. Then all is well, except for the issues of process scheduling, where you use far less CPU time than wall-clock time and lose on time in a game that actually uses wall-clock time.
The truth is that nobody will lose any sleep if an instance of Crafty at some point in time loses a game due to this problem. It's really unimportant.

The question is why you can't admit that using a guaranteed monotonic clock is the superior solution, even if you might not find it worth your time to change Crafty to using one.

But we have been seeing this behaviour now for 20 years or so, so nothing new here.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Everything's positive

Post by sje »

mvk wrote:The ntpd program forces backwards and forward jumps when the error exceeds 100-something milliseconds.
I doubt that ntpd is forcing any time jumps other than at boot time or maybe at wake-up time. Once the system is up, adjtime() takes over and adjtime() will never cause the clock to be set back and will never allow any big jumps, only small adjustments to the rate.
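
To illustrate what "small adjustments to the rate" means in practice, here is a toy C model of slewing. It is a sketch under stated assumptions, not how adjtime() is implemented on any particular system: the function names are invented, the offset is defined as local time minus reference time, and the slew rate is a parameter (500 ppm is the maximum slew rate ntpd documents, to my knowledge; the rate adjtime() itself uses varies by OS).

```c
/* Seconds of real time needed to remove offset_s by slewing at no more
 * than max_slew_ppm parts per million. */
double slew_duration_s(double offset_s, double max_slew_ppm) {
    double magnitude = offset_s < 0 ? -offset_s : offset_s;
    return magnitude / (max_slew_ppm * 1e-6);
}

/* How far the local clock advances during real_dt_s seconds of real time
 * while an offset is being slewed away. offset_s > 0 means the local clock
 * is ahead and must run slow; offset_s < 0 means it must run fast.
 * Simplification: assumes the correction does not finish within real_dt_s. */
double slewed_advance_s(double real_dt_s, double offset_s, double max_slew_ppm) {
    double rate = max_slew_ppm * 1e-6;
    if (offset_s > 0)
        return real_dt_s * (1.0 - rate);
    if (offset_s < 0)
        return real_dt_s * (1.0 + rate);
    return real_dt_s;
}
```

Under these assumptions a clock that is 100 ms ahead takes about 200 seconds to correct at 500 ppm, and throughout that period it still moves forward, just slightly slower than real time. That is the sense in which slewing alone never produces a backward reading, and also why large offsets are impractical to fix this way.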