Spurious Suspends

Message boards : BOINC client : Spurious Suspends

cjreyn

Joined: 17 Aug 09
Posts: 19
United Kingdom
Message 31515 - Posted: 10 Mar 2010, 17:27:17 UTC

Hi guys,
I have an application deployed across a large private desktop grid (1600 nodes) running an old version of the client, 5.10.45.

The main worker thread of my application must always run, as it's launching other processes (much as the wrapper does) and needs to control them accordingly. Hence I've disabled the worker thread suspend function, and instead poll for suspend flags via a call to boinc_get_status() from my worker thread, and handle suspend requests accordingly.
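A minimal sketch of the polling approach described above, not the poster's actual code. `StatusSnapshot` is a hypothetical stand-in for the `suspended` and `quit_request` fields of BOINC's `BOINC_STATUS`; in a real application it would be filled from `boinc_get_status()`. `SuspendPoller` and `Action` are names invented for illustration. The poller reacts only to changes in the flag, so each transition is a natural place to log a timestamp when chasing unexpected suspends.

```cpp
// Hypothetical stand-in for the BOINC_STATUS fields used here.
// A real app would include "boinc_api.h" and copy these from boinc_get_status().
struct StatusSnapshot {
    bool suspended;     // client asks the app to pause
    bool quit_request;  // client asks the app to exit
};

// What the worker thread should do with its child processes after one poll.
enum class Action { None, SuspendChildren, ResumeChildren, Quit };

// Edge-triggered poller: reports an action only when the requested state
// differs from the state the children are currently in.
class SuspendPoller {
public:
    Action poll(const StatusSnapshot& status) {
        if (status.quit_request) {
            return Action::Quit;
        }
        if (status.suspended && !children_suspended_) {
            children_suspended_ = true;
            return Action::SuspendChildren;
        }
        if (!status.suspended && children_suspended_) {
            children_suspended_ = false;
            return Action::ResumeChildren;
        }
        return Action::None;
    }

    bool children_suspended() const { return children_suspended_; }

private:
    bool children_suspended_ = false;
};
```

The worker loop would call `poll()` once per iteration and suspend or resume the child processes itself according to the returned `Action`.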

The problem is that on most nodes, including my own test node (running the same core client version), I'm seeing the suspend flags set at random, as signalled by the core client. This happens even without explicit GUI-based suspend requests.

I heard somewhere that BOINC uses suspend to "nice" applications, and was wondering whether this is the case on all client versions?

Cheers

Chris
ID: 31515
Rom Walton
Project developer
Joined: 26 Aug 05
Posts: 164
Message 31520 - Posted: 10 Mar 2010, 18:22:08 UTC - in response to Message 31515.  

BOINC can suspend an app for a number of reasons. Benchmarks and CPU throttling (what you are referring to as 'nice', I think) are the two most common causes.

However, I've seen your problem crop up in a few applications due to memory corruption. The last time, it was due to a static array declared on the heap: the app wrote past the end of the array. Oddly enough, the value being written to the array was a timestamp, so it was 'randomly' switching the suspend/resume state of the app based on the time of day.
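To illustrate the failure mode, here is a small model under stated assumptions, not a reconstruction of the application in question. The layout (four 32-bit timestamp slots followed directly by a 32-bit suspend flag) and the names `FakeMemory`, `store_unchecked`, `store_checked` and `suspended` are all hypothetical. The memory is modelled as a plain byte buffer so that the out-of-range store stays well-defined in the example; in a real program such a write is undefined behaviour and the neighbouring variable is whatever the compiler happened to place there.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout: kSlots timestamps, then the suspend flag right behind them.
constexpr std::size_t kSlots = 4;
constexpr std::size_t kFlagOffset = kSlots * sizeof(std::uint32_t);

struct FakeMemory {
    // Zero-initialised: all slots empty, flag clear.
    std::array<unsigned char, kFlagOffset + sizeof(std::uint32_t)> bytes{};
};

// Store with no bounds check. In this model slot may be at most kSlots;
// slot == kSlots is the off-by-one case and lands exactly on the flag.
inline void store_unchecked(FakeMemory& m, std::size_t slot, std::uint32_t value) {
    std::memcpy(m.bytes.data() + slot * sizeof(value), &value, sizeof(value));
}

// Store that refuses any index outside the timestamp array.
inline bool store_checked(FakeMemory& m, std::size_t slot, std::uint32_t value) {
    if (slot >= kSlots) {
        return false;
    }
    store_unchecked(m, slot, value);
    return true;
}

// Read the flag the way the polling code would see it.
inline bool suspended(const FakeMemory& m) {
    std::uint32_t flag = 0;
    std::memcpy(&flag, m.bytes.data() + kFlagOffset, sizeof(flag));
    return flag != 0;
}
```

Calling `store_unchecked(mem, kSlots, some_timestamp)` makes `suspended(mem)` report whatever the stored value implies, with no request from the client involved, while `store_checked` rejects the same index and leaves the flag alone.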

Is there any chance you can run your app under a debugger and break on memory access?

----- Rom
BOINC Development Team, U.C. Berkeley
ID: 31520
cjreyn

Joined: 17 Aug 09
Posts: 19
United Kingdom
Message 31540 - Posted: 11 Mar 2010, 14:42:50 UTC - in response to Message 31520.  

Well, I never explicitly allocate heap memory using malloc, but that's not to say the compiler won't allocate it. The app is developed using VC++ 2008 Express, so sure, I could get it to dump as soon as a suspend flag is found.

However, the CPU throttling is more likely to be the problem, as other processes could well be running on the node (Novell-based updates, virus scans, etc.). This could well be causing the seemingly random suspend (and also resume) messages. This would be strange, however, as it's the child processes launched by my app (via Win32 CreateProcess() calls) that are consuming CPU. Will the core client recognise these as BOINC-related, and call suspend/resume to throttle back the CPU?

Also, why is suspend/resume used, and not Windows or Unix thread/process priority nicing instead?
ID: 31540

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.