Ticket #588 (new Defect)

Opened 5 months ago

Last modified 5 months ago

indefinite suspension of computing when changing system clock

Reported by: Richard Haselgrove
Assigned to: davea
Priority: Minor
Milestone: Undetermined
Component: Client - Daemon
Version: 5.10.45
Keywords: benchmark clock
Cc:

Description

If you make a user error with the system clock in Windows XP, you can cause BOINC to stop processing indefinitely (or for a longer period than I have patience to wait).

To verify:

Set system clock 1 month forward. Note that BOINC immediately runs a benchmark.

Set system clock 1 month back (i.e. to correct time). Wait until next checkpoint for the current app. BOINC suspends computation for a benchmark, but according to <benchmark_debug> doesn't actually start running the benchmark code.

The full message log is posted in the message-board thread "Benchmarking bug - indefinite suspension of computing".

Change History

03/30/08 10:55:05 changed by Nicolas

  • keywords changed from benchmark to benchmark clock.
  • summary changed from Benchmarking - indefinite suspension of computing to indefinite suspension of computing when changing system clock.

Wow, I really thought there was a ticket for this already.

Many problems appear when the system clock is changed. Most are impossible to solve, or so hard that it's not worth it.

For example, if your clock is set 1 month ahead of the correct date when you contact a scheduler, the deferral is stored as an absolute timestamp: the time at which to contact the server again. If you then set your clock back 1 month (i.e. to the correct time), communication with that project will be deferred for a month and a bit.
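
The arithmetic behind "a month and a bit" can be sketched as follows. This is only an illustration: the function names and the 30-day month are assumptions made for the example, not BOINC's scheduler code.

```cpp
#include <cassert>

// Hypothetical names throughout; this is not BOINC's scheduler code.
constexpr double kSecondsPerDay = 86400.0;

// The deferral is stored as an absolute wall-clock timestamp:
// "contact the server again at this time".
double next_contact_time(double now, double delay) {
    assert(delay >= 0);
    return now + delay;
}

// How long the client will still wait, judged by whatever the clock says now.
double remaining_wait(double next_contact, double now) {
    double wait = next_contact - now;
    return wait > 0 ? wait : 0;
}
```

With the clock 30 days ahead and a requested delay of one hour, remaining_wait gives one hour while the clock stays skewed, but 30 days plus one hour once the clock is set back to the correct time.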

03/30/08 11:25:19 changed by Didactylos

  • milestone changed from 5.10 to Undetermined.

I think there are three ways to mitigate this:

  • Check for time conflicts during every server interaction. This would at least log a relevant message.
  • Check every stored timestamp against the current time, looking for time-travel errors.
  • Subscribe to time-change events from the operating system.

Sadly, there is no quick fix. None of these methods (and really we need all of them, not just one) are particularly simple to implement.
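
As a rough sketch of the second idea only, a check of this kind might look like the code below. Both helpers and their parameters (max_future, max_step) are hypothetical and do not exist in the BOINC source; choosing sensible limits for each stored timestamp is part of what makes this hard.

```cpp
#include <cassert>

// Hypothetical helper, not part of BOINC: a stored timestamp that lies
// further in the future than any legitimate value could is treated as the
// result of a clock change and replaced by the current time.
double sanitize_timestamp(double stored, double now, double max_future) {
    assert(max_future >= 0);
    if (stored > now + max_future) {
        return now;  // implausible value; fall back to the current time
    }
    return stored;
}

// Hypothetical helper: reports a suspicious clock change between two polls,
// either a step backwards or a forward step larger than max_step seconds.
bool clock_jumped(double prev_now, double now, double max_step) {
    return now < prev_now || now - prev_now > max_step;
}
```

A caller would run clock_jumped once per polling cycle, log a message when it returns true, and then pass its stored timestamps through sanitize_timestamp.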

03/30/08 12:03:47 changed by Richard Haselgrove

Replying to Nicolas:

Wow, I really thought there was a ticket for this already.

Well, I searched both trac and the message boards before posting, and I couldn't find it.

Many problems appear when the system clock is changed. Most are impossible to solve, or so hard that it's not worth it. For example, if your clock is set 1 month ahead of the correct date when you contact a scheduler, the deferral is stored as an absolute timestamp: the time at which to contact the server again. If you then set your clock back 1 month (i.e. to the correct time), communication with that project will be deferred for a month and a bit.

I agree there are lots of problems, but this particular one seems to cause significant loss of scientific work (by halting computing) at one specific and clearly-defined point: the two or three seconds between

Running CPU benchmarks

and

[benchmark_debug] Starting floating-point benchmark

That would seem to be worth solving on its own, and it shouldn't be too difficult to track down what it's waiting for.

04/05/08 10:04:50 changed by Richard Haselgrove

I think I've got it:

File: cs_benchmark.C Routine: cpu_benchmarks_poll Line 309:

static double last_time = 0;

If benchmarks have been run while the clock was set in the future (the case envisioned by changeset [12128], lines 247-248), this static variable still holds a time in the future once the clock has been corrected. The test at line 312 is then always satisfied, and the application hangs in an indefinite loop.

Solution: discard variable last_time (or set it to zero) at all possible valid exit points from the benchmarking process.
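
The failure mode and the proposed fix can be illustrated with a small stand-alone model. The test at line 312 is not quoted in this ticket, so the sketch assumes it is a throttle of the form "return early if less than one poll period has passed since last_time". BenchmarkPoller, kPollPeriod and reset are hypothetical names, and the member variable stands in for the static.

```cpp
#include <cassert>

// Hypothetical stand-in for the throttle; the real cpu_benchmarks_poll differs.
constexpr double kPollPeriod = 1.0;  // assumed value, in seconds

struct BenchmarkPoller {
    double last_time = 0;  // plays the role of the static variable

    // Returns true if the poll body would run at time `now`.
    bool poll(double now) {
        if (now < last_time + kPollPeriod) return false;  // throttled
        last_time = now;
        return true;
    }

    // Proposed fix: call this at every exit point from benchmarking.
    void reset() { last_time = 0; }
};
```

In this model, one poll made while the clock is a month ahead leaves last_time in the future, so every later poll at the corrected time returns early until reset is called.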

