bug in idle detection

Message boards : BOINC client : bug in idle detection

Mark Tall

Joined: 7 Dec 08
Posts: 5
Australia
Message 21664 - Posted: 7 Dec 2008, 10:24:37 UTC

Hello,

A few days ago I installed the BOINC client (Linux) to use spare capacity on one of our number crunching machines. However, it seems that the BOINC client doesn't properly detect the idleness of a machine -- it starts processes even when the load level is high.

The number crunching machine is used as follows: one or more users log on, start a job (on one or all cores) and then log out. The jobs may last from one hour to one week.

Boinc seems to think that since no one is logged on, it can run. However, the reality is that all cores on the machine are being used by user jobs.

As a temporary workaround I've reniced boinc to 20, so that processes with higher priority take more CPU time. I've also written a script that periodically checks the load level and freezes (kill -STOP) all boinc processes until the load level drops below a pre-defined limit.

However, the above solutions aren't entirely satisfactory -- i.e. they're hacks that don't address the cause of the problem. Is there a way to properly detect the load level within the boinc client itself?

ID: 21664
Keck_Komputers
Joined: 29 Aug 05
Posts: 304
United States
Message 21677 - Posted: 8 Dec 2008, 8:36:50 UTC

KILLing BOINC processes will most likely cause the task to error out. A better way would be to use "boinccmd --set_run_mode never" to pause and "boinccmd --set_run_mode auto" to resume in your script.
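As a rough illustration of the script suggested above, here is a minimal Python sketch. It assumes `boinccmd` is on the PATH and can talk to the running client. The function names, the two thresholds and the polling interval are hypothetical values chosen for illustration, not anything BOINC defines.

```python
import os
import subprocess
import time

# Illustrative values only: pick limits that suit the machine.
SUSPEND_ABOVE = 12.0   # suspend BOINC when the 1-minute load exceeds this
RESUME_BELOW = 4.0     # resume BOINC once the load has dropped below this
POLL_SECONDS = 60      # how often to check the load


def next_run_mode(load_1min, current_mode, suspend_above, resume_below):
    """Decide which run mode to request, keeping the current one in between."""
    if load_1min > suspend_above:
        return "never"
    if load_1min < resume_below:
        return "auto"
    return current_mode


def set_run_mode(mode):
    """Ask the running client to change run mode through boinccmd."""
    subprocess.run(["boinccmd", "--set_run_mode", mode], check=True)


def watch():
    """Poll the load average and switch run mode only when it changes."""
    current = None  # unknown until the first threshold is crossed
    while True:
        load_1min = os.getloadavg()[0]
        mode = next_run_mode(load_1min, current, SUSPEND_ABOVE, RESUME_BELOW)
        if mode is not None and mode != current:
            set_run_mode(mode)
            current = mode
        time.sleep(POLL_SECONDS)
```

Two thresholds are used because BOINC's own tasks count towards the load average: with a single limit, suspending BOINC lowers the load, which resumes BOINC, which raises the load again. Calling `watch()` starts the loop.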

Coming in 6.4.x you will be able to define exclusive apps that when running will cause BOINC to suspend processing.
BOINC WIKI

BOINCing since 2002/12/8
ID: 21677
Mark Tall

Joined: 7 Dec 08
Posts: 5
Australia
Message 21679 - Posted: 8 Dec 2008, 13:23:12 UTC - in response to Message 21677.  


Coming in 6.4.x you will be able to define exclusive apps that when running will cause BOINC to suspend processing.


Why not just detect the load level? Surely this is a more general solution than detecting particular programs.
ID: 21679
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 21680 - Posted: 8 Dec 2008, 13:42:06 UTC - in response to Message 21679.  

a) The science applications (not BOINC) are set to run at the lowest possible priority (thus lowest load level). They should yield if other processes need the processor.

b) BOINC uses an idle checker called boinctray, which may not work right on Linux. The last I heard about it was that the developers were still having problems with the correct detection under Linux.

c) Any platform specific additions to BOINC are a bit difficult to implement, because BOINC is a cross-platform application. This means that you should be able to download one source code and then directly compile it for Linux, or Windows or the Mac. When parts of BOINC are then dependent on a certain platform, this becomes difficult to maintain.
ID: 21680
KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 21686 - Posted: 8 Dec 2008, 23:45:21 UTC

There is no boinctray on Linux. I believe it's a Windows only thing.

Mouse/keyboard detection may or may not be broken for Linux. I think it depends on the hardware (I think I saw someone post that it worked for PS/2 hardware but not USB hardware) and/or the kernel version.

But it sounds like keyboard/mouse isn't the problem for the OP...

I'll bow out now because that's where my Linux knowledge ends.
Kathryn :o)
ID: 21686
Mark Tall

Joined: 7 Dec 08
Posts: 5
Australia
Message 21697 - Posted: 10 Dec 2008, 2:42:27 UTC - in response to Message 21680.  


c) Any platform specific additions to BOINC are a bit difficult to implement, because BOINC is a cross-platform application. This means that you should be able to download one source code and then directly compile it for Linux, or Windows or the Mac. When parts of BOINC are then dependent on a certain platform, this becomes difficult to maintain.


I understand the cross-platform maintenance issues (I have to deal with them myself as well). However, the "load level detection" functionality brings a major benefit that outweighs the minor maintenance issues. Furthermore, it is not that difficult to implement under Linux and is likely to work under Mac OS X as well.

method 1: getloadavg(). Present in glibc, Solaris and the BSDs (hence likely in Mac OS X).
method 2: /proc/loadavg (Linux-specific)

For cross-platform compilation, a wrapped version of getloadavg() can be used to abstract the functionality -- e.g. boinc_getloadavg(), with platform-specific versions #ifdef'ed. Under Windows boinc_getloadavg() can either do nothing or call a suitable Windows function (there must be something similar).
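The proposal above is for a C wrapper with `#ifdef` branches. The same fallback order can be sketched in Python, where `os.getloadavg()` wraps the system call on Unix-like platforms. The names `portable_getloadavg` and `parse_proc_loadavg` are hypothetical, and returning `None` stands in for the "do nothing" case on platforms without a load average.

```python
import os


def parse_proc_loadavg(text):
    """Parse the contents of /proc/loadavg into three floats."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def portable_getloadavg():
    """Return the 1, 5 and 15 minute load averages, or None if unavailable."""
    # Method 1: the getloadavg() call, where the platform provides it.
    if hasattr(os, "getloadavg"):
        try:
            return os.getloadavg()
        except OSError:
            pass  # load average could not be obtained; try the next method
    # Method 2: the Linux-specific /proc file.
    try:
        with open("/proc/loadavg") as handle:
            return parse_proc_loadavg(handle.read())
    except OSError:
        # Neither method works here (for example on Windows): report nothing.
        return None
```

A caller would treat `None` as "load level unknown" and fall back to whatever idle detection it already has.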
ID: 21697
Keck_Komputers
Joined: 29 Aug 05
Posts: 304
United States
Message 21703 - Posted: 10 Dec 2008, 8:03:52 UTC

I think not using load level detection was a design decision more than anything else. BOINC is designed to make the load level 100% in normal situations, so that could be a conflict. Also, I don't think Windows differentiates between normal processes and "nice" processes like *nix does, so any kind of load level detection would cause problems in Windows.
BOINC WIKI

BOINCing since 2002/12/8
ID: 21703
Mark Tall

Joined: 7 Dec 08
Posts: 5
Australia
Message 21706 - Posted: 10 Dec 2008, 12:25:10 UTC - in response to Message 21703.  

BOINC is designed to make the load level 100% in normal situations, so that could be a conflict. Also, I don't think Windows differentiates between normal processes and "nice" processes like *nix does, so any kind of load level detection would cause problems in Windows.


Based on my experience, the load level includes both normal and "nice" processes under Linux, so there is no conflict in terms of different functionality under Windows.

It's also not a problem that Boinc puts the load level to 100% per core. We are more interested in whether the load level considerably exceeds 100% (e.g. >150%), indicating that Boinc is running while a heavy-duty user program is also running. This is precisely when Boinc should suspend all computation.

As an example, let's say there has been no user activity for 30 minutes, but all eight cores on our machine are running user jobs. The load level is 800% (100% per core). The current version of Boinc thinks that nobody is using the machine, and hence starts its computation on all eight cores, raising the load level to 1600% (whether it's niced or not). Boinc can already detect the number of cores, so normalising the load level by the number of cores is easy, and hence detecting when computation should be suspended is also easy.
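The normalisation described here can be sketched in a few lines of Python. The function names and the default threshold of 1.5 (the ">150%" figure from the post) are illustrative assumptions, and the load value is taken to be a Unix load average, where 8.0 corresponds to "800%".

```python
def load_per_core(load_average, n_cores):
    """Normalise a load average by the number of cores."""
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return load_average / n_cores


def should_suspend(load_average, n_cores, threshold=1.5):
    """Return True when the per-core load considerably exceeds 100%."""
    return load_per_core(load_average, n_cores) > threshold
```

With the eight-core example, `should_suspend(16.0, 8)` is true (200% per core), while `should_suspend(8.0, 8)` is false (100% per core, which is what user jobs alone or BOINC alone would produce).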


ID: 21706
KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 21708 - Posted: 10 Dec 2008, 13:04:35 UTC

Idle detection is based on keyboard/mouse activity (in an ideal world).

I've not had trouble with BOINC (Windows and Linux) not getting out of the way. I've never done any tinkering with priority or nice level. That's just personal experience though and may or may not accurately represent the real world.
Kathryn :o)
ID: 21708
Gundolf Jahn
Joined: 20 Dec 07
Posts: 1069
Germany
Message 21709 - Posted: 10 Dec 2008, 13:17:19 UTC - in response to Message 21706.  

...
It's also not a problem that Boinc puts the load level to 100% per core. We are more interested in whether the load level considerably exceeds 100% (e.g. >150%), indicating that Boinc is running while a heavy-duty user program is also running. This is precisely when Boinc should suspend all computation.

How can the load level exceed 100%?

As an example, let's say there has been no user activity for 30 minutes, but all eight cores on our machine are running user jobs. The load level is 800% (100% per core). The current version of Boinc thinks that nobody is using the machine, and hence starts its computation on all eight cores, raising the load level to 1600% (whether it's niced or not)...

How can the load level exceed 800% on an 8-core machine?

Could you explain those maths to an ignorant physicist?

Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
ID: 21709
KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 21710 - Posted: 10 Dec 2008, 13:36:54 UTC

I'll take a stab since I just finished talking with Eric M about this.

The higher the load average (which can be seen with `top` as well as `uptime`, and is given as 1-, 5- and 15-minute averages), the more the machine is trying to do. It includes processes waiting for the CPU as well as processes waiting for I/O. Load average is calculated similarly to RAC in BOINC. It's a weighted moving average. Load is not equivalent to what you'd see in Task Manager in Windows. It can go over 100%. Right now, the load average on my machine (running BOINC, Firefox, Skype, xchat, Pidgin, mplayer and a shell in addition to the regular background processes) is as follows:

[kathryn@Galaxy ~]$ uptime
22:28:06 up 10 days, 7:07, 6 users, load average: 2.80, 3.06, 2.98


It's a dual core computer. So on average, I have 2 processes running and about 1 waiting to run.

If it was a single core computer it would have 1 process running and about 2 waiting to run.
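The reading above can be reproduced with a short Python sketch. The helper names are hypothetical, and the running/waiting split is only a rough interpretation, since on Linux the load average also counts processes blocked on I/O.

```python
import re


def parse_uptime_load(line):
    """Extract the 1, 5 and 15 minute load averages from an uptime line."""
    match = re.search(r"load average: ([\d.]+), ([\d.]+), ([\d.]+)", line)
    if match is None:
        raise ValueError("no load average found in line")
    return tuple(float(group) for group in match.groups())


def split_load(load, n_cores):
    """Roughly split a load average into running and waiting processes."""
    running = min(load, n_cores)        # at most one running process per core
    waiting = max(load - n_cores, 0.0)  # the remainder is queued
    return running, waiting


line = "22:28:06 up 10 days, 7:07, 6 users, load average: 2.80, 3.06, 2.98"
one_minute = parse_uptime_load(line)[0]
print(split_load(one_minute, 2))  # dual core
print(split_load(one_minute, 1))  # single core
```

For the 1-minute figure of 2.80, the dual-core case gives 2 running and roughly 0.8 waiting ("about 1"), and the single-core case gives 1 running and roughly 1.8 waiting ("about 2").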

Awww, heck. The Wikipedia page explains it better than I can.
Kathryn :o)
ID: 21710
Gundolf Jahn
Joined: 20 Dec 07
Posts: 1069
Germany
Message 21712 - Posted: 10 Dec 2008, 14:29:42 UTC - in response to Message 21710.  

Okay, I should have said "windozer" instead of "physicist" (though both are true :-)

Thanks for the explanation (and forwarding link).

Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
ID: 21712


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.