Message boards : BOINC client : bug in idle detection
Author | Message |
---|---|
Joined: 7 Dec 08 Posts: 5 |
Hello, A few days ago I installed the BOINC client (Linux) to use spare capacity on one of our number-crunching machines. However, it seems that the BOINC client doesn't properly detect whether a machine is idle -- it starts processes even when the load level is high. The number-crunching machine is used as follows: one or more users log on, start a job (on one or all cores) and then log out. The jobs may last from one hour to one week. BOINC seems to think that since no one is logged on, it can run. However, in reality all cores on the machine are being used by user jobs. As a temporary workaround I've reniced BOINC to 20, so that processes with higher priority get more CPU time. I've also written a script that periodically checks the load level and freezes (kill -STOP) all BOINC processes until the load level drops below a pre-defined limit. However, these solutions aren't entirely satisfactory -- they're hacks rather than fixes for the cause of the problem. Is there a way to properly detect the load level within the BOINC client itself? |
Joined: 29 Aug 05 Posts: 304 |
KILLing BOINC processes will most likely cause the task to error out. A better way would be to use "boinccmd --set_run_mode never" to pause and "boinccmd --set_run_mode auto" to resume in your script. Coming in 6.4.x, you will be able to define exclusive apps that, while running, cause BOINC to suspend processing. BOINC WIKI BOINCing since 2002/12/8 |
Joined: 7 Dec 08 Posts: 5 |
Why not just detect the load level? Surely this is a more general solution than detecting particular programs. |
Joined: 29 Aug 05 Posts: 15552 |
a) The science applications (not BOINC itself) are set to run at the lowest possible priority (thus lowest load level). They should yield if other processes need the processor. b) BOINC uses an idle checker called boinctray, which may not work right on Linux. The last I heard was that the developers were still having problems with correct detection under Linux. c) Any platform-specific additions to BOINC are a bit difficult to implement, because BOINC is a cross-platform application. This means that you should be able to download a single set of source code and compile it directly for Linux, Windows or the Mac. When parts of BOINC depend on a particular platform, this becomes difficult to maintain. |
Joined: 30 Oct 05 Posts: 1239 |
There is no boinctray on Linux. I believe it's a Windows-only thing. Mouse/keyboard detection may or may not be broken on Linux. I think it depends on the hardware (I think I saw someone post that it worked for PS/2 hardware but not USB hardware) and/or the kernel version. But it sounds like keyboard/mouse detection isn't the problem for the OP... I'll bow out now because that's where my Linux knowledge ends. Kathryn :o) |
Joined: 7 Dec 08 Posts: 5 |
I understand the cross-platform maintenance issues (I have to deal with them myself as well). However, "load level detection" functionality brings a major benefit that outweighs the minor maintenance issues. Furthermore, it is not that difficult to implement under Linux and is likely to work under MacOS X as well. Method 1: getloadavg(), present in glibc, Solaris and the BSDs (hence likely in MacOS X). Method 2: /proc/loadavg (Linux-specific). For cross-platform compilation, a wrapped version of getloadavg() can be used to abstract the functionality -- e.g. boinc_getloadavg(), with platform-specific versions #ifdef'ed. Under Windows, boinc_getloadavg() can either do nothing or call a suitable Windows function (there must be something similar). |
Joined: 29 Aug 05 Posts: 304 |
I think not using load level detection was a design decision more than anything else. BOINC is designed to make the load level 100% in normal situations, so that could be a conflict. Also, I don't think Windows differentiates between normal processes and "nice" processes like *nix does, so any kind of load level detection would cause problems on Windows. BOINC WIKI BOINCing since 2002/12/8 |
Joined: 7 Dec 08 Posts: 5 |
Quote: "BOINC is designed to make the load level 100% in normal situations so that could be a conflict. Also I don't think windows differentiates between normal processes and 'nice' processes like *nix does so any kind of load level detection would cause problems in windows." Based on my experience, the load level includes both normal and "nice" processes under Linux, so there is no conflict in terms of different functionality under Windows. It's also not a problem that BOINC puts the load level at 100% per core. We are more interested in whether the load level considerably exceeds 100% (e.g. >150%), indicating that BOINC is running while a heavy-duty user program is also running. This is precisely when BOINC should suspend all computation. As an example, let's say there has been no user activity for 30 minutes, but all eight cores on our machine are running user jobs. The load level is 800% (100% per core). The current version of BOINC thinks that nobody is using the machine, and hence starts its computation on all eight cores, raising the load level to 1600% (whether it's niced or not). BOINC can already detect the number of cores, so normalising the load level by the number of cores is easy, and hence detecting when computation should be suspended is also easy. |
Joined: 30 Oct 05 Posts: 1239 |
Idle detection is based on keyboard/mouse activity (in an ideal world). I've not had trouble with BOINC (Windows and Linux) not getting out of the way. I've never done any tinkering with priority or nice level. That's just personal experience, though, and may or may not accurately represent the real world. Kathryn :o) |
Joined: 20 Dec 07 Posts: 1069 |
... How can the load level exceed 100%? Quote: "As an example, let's say there has been no user activity for 30 minutes, but all eight cores on our machine are running user jobs. The load level is 800% (100% per core). The current version of Boinc thinks that nobody is using the machine, and hence starts its computation on all eight cores, raising the load level to 1600% (whether its niced or not)..." How can the load level exceed 800% on an 8-core machine? Could you explain the maths to an ignorant physicist? Regards, Gundolf. Computers aren't everything in life. (Just kidding) |
Joined: 30 Oct 05 Posts: 1239 |
I'll take a stab since I just finished talking with Eric M about this. The higher the load average (which can be seen with `top` as well as `uptime`, given as 1-, 5- and 15-minute averages), the more the machine is trying to do. It includes processes waiting for the CPU as well as processes waiting for I/O. Load average is calculated similarly to RAC in BOINC: it's a weighted moving average. Load is not equivalent to what you'd see in Task Manager in Windows. It can go over 100%. Right now, the load average on my machine (running BOINC, Firefox, Skype, xchat, Pidgin, mplayer and a shell in addition to the regular background processes) is as follows: `[kathryn@Galaxy ~]$ uptime` It's a dual-core computer. So on average, I have 2 processes running and about 1 waiting to run. If it were a single-core computer, it would have 1 process running and about 2 waiting to run. Awww, heck. The Wikipedia page explains it better than I can. Kathryn :o) |
Joined: 20 Dec 07 Posts: 1069 |
Okay, I should have said "windozer" instead of "physicist" (though both are true :-) Thanks for the explanation (and the forwarded link). Regards, Gundolf. Computers aren't everything in life. (Just kidding) |
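
The load-based suspension discussed in this thread could be sketched roughly as follows. This is only an illustration for an external watchdog script, not code from the BOINC client. It reads the 1-minute load average (via `os.getloadavg()`, falling back to the Linux-specific /proc/loadavg), normalises it by the number of cores, and picks between the two run modes mentioned above ("never" and "auto"). All function names are hypothetical. The 1.5 suspend threshold mirrors the ">150%" figure from the thread, and the 0.5 resume threshold is an assumption made for this example.

```python
import os

# Per-core load thresholds. 1.5 mirrors the ">150%" figure from the thread;
# 0.5 is an assumed value chosen only for this sketch.
SUSPEND_ABOVE = 1.5
RESUME_BELOW = 0.5


def parse_proc_loadavg(text):
    """Return the 1-minute load average from the contents of /proc/loadavg."""
    return float(text.split()[0])


def read_load_1min():
    """Read the 1-minute load average, preferring getloadavg()."""
    try:
        return os.getloadavg()[0]
    except (AttributeError, OSError):
        # Linux-specific fallback; this raises on platforms without /proc.
        with open("/proc/loadavg") as f:
            return parse_proc_loadavg(f.read())


def per_core_load(load, cores):
    """Normalise a load average by the number of cores."""
    return load / max(cores, 1)


def next_run_mode(load, cores, current_mode):
    """Pick the run mode ("never" or "auto"), with hysteresis between thresholds."""
    normalised = per_core_load(load, cores)
    if current_mode == "auto" and normalised > SUSPEND_ABOVE:
        return "never"
    if current_mode == "never" and normalised < RESUME_BELOW:
        return "auto"
    return current_mode


def boinccmd_args(mode):
    """Build the boinccmd command line for the chosen run mode."""
    return ["boinccmd", "--set_run_mode", mode]
```

A cron job or polling loop could call `next_run_mode(read_load_1min(), os.cpu_count() or 1, mode)` and pass `boinccmd_args(mode)` to `subprocess.run` whenever the mode changes. The resume threshold sits well below 1.0 on purpose: once BOINC is suspended, the load reflects only the user jobs, so a fully busy machine settles at about 1.0 per core and resuming there would immediately push the load back over the suspend threshold.
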
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.