Posts by Art Brown

1) Message boards : Questions and problems : Task status changes from 'Running' to 'Waiting to run' automatically (Message 42188)
Posted 23 Jan 2012 by Art Brown
Post:
Problem solved!

The BOINC client software that runs continuously in the background periodically checks to see if more or fewer tasks can be run (among many other things).

In my case of a 4-core single-threaded CPU, BOINC was adding up the CPU Usage values for all of the tasks that could be run. When that total CPU Usage exceeded the number of cores (4), BOINC suspended a task until the task could run and not exceed the core count.
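A rough sketch of that check, for illustration only. This is a simplified greedy model, not BOINC's actual scheduler code: the function name `schedule`, the task names, and the task ordering are hypothetical, and the CPU Usage figures are the ratings quoted elsewhere in this thread (0.37 per GPUGRID task, 2 for the T4T task, 1 per WCG task).

```python
def schedule(tasks, ncpus):
    """Greedy sketch: keep starting tasks while the summed CPU usage fits in ncpus."""
    running, waiting, used = [], [], 0.0
    for name, cpu_usage in tasks:
        if used + cpu_usage <= ncpus:
            running.append(name)
            used += cpu_usage
        else:
            # Starting this task would exceed the core count, so it has to wait.
            waiting.append(name)
    return running, waiting, used


# Hypothetical task list using the CPU Usage ratings mentioned in this thread.
tasks = [
    ("GPUGRID-1", 0.37),
    ("GPUGRID-2", 0.37),
    ("T4T", 2.0),
    ("WCG-1", 1.0),
    ("WCG-2", 1.0),
]

for ncpus in (4, 5):
    running, waiting, used = schedule(tasks, ncpus)
    print(ncpus, "cores:", round(used, 2), "used, waiting:", waiting)
```

In this model, a core count of 4 leaves the second WCG task waiting, while a core count of 5 lets all five tasks run.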

You can see the calculation in the Messages tab of BoincTasks if you add the following line to the <log_flags> section of the cc_config.xml file (see below):

<cpu_sched_debug>1</cpu_sched_debug>

Change the argument from 1 to 0 to turn off the logging function. A BOINC restart is required for <log_flags> changes to take effect.

The solution was to simply add one line in the cc_config.xml file that changes the number of CPU cores that BOINC uses to schedule tasks. In my case I set the core count to 5:

<ncpus>5</ncpus>

This entry goes in the <options> section of the cc_config.xml file (below).

Once the line is entered, run the "Read configuration file" command in BOINC Manager or BoincTasks for changes to the <options> section to take effect.
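For anyone who prefers to script the change rather than edit the file by hand, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name `set_ncpus` and the sample XML string are made up for illustration; the sketch works on a string and leaves locating and writing the real cc_config.xml (whose path depends on the installation) to the reader.

```python
import xml.etree.ElementTree as ET


def set_ncpus(config_xml: str, ncpus: int) -> str:
    """Return a copy of the cc_config XML with <ncpus> set inside <options>."""
    root = ET.fromstring(config_xml)
    options = root.find("options")
    if options is None:
        # No <options> section yet, so create one.
        options = ET.SubElement(root, "options")
    node = options.find("ncpus")
    if node is None:
        node = ET.SubElement(options, "ncpus")
    node.text = str(ncpus)
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal config used only to demonstrate the helper.
example = "<cc_config><log_flags></log_flags><options></options></cc_config>"
print(set_ncpus(example, 5))
```

The "Read configuration file" step is still needed afterwards for BOINC to pick up the new value.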

From then on, BOINC will use the new core count value to determine how many tasks it will schedule. The higher value prevented the problem I was seeing, which was that a task would automatically change from "Running" to "Waiting to run".

There is a Message thread on this issue on the Test4Theory (T4T) web site: http://lhcathome2.cern.ch/test4theory/forum_thread.php?id=742

With <ncpus> set at 5, I noticed that the total CPU load was about 97%, so I changed the <ncpus> argument from 5 to 6, and now the CPU load stays at 100%. The extra task that is run is simply "time shared" among the available cores by Windows' normal scheduling algorithms. Sometimes there is an extra World Community Grid (WCG) task in the "Ready to start" or "Waiting to run" state, but that doesn't cause any problems. That task just waits its turn to run.

I run the WCG tasks at Low CPU priority, VirtualBox.exe for the Test4Theory task at Below Normal (between Low and Normal priorities), and the GPUGRID tasks at Normal priority. These settings keep the PC fully responsive to human input, yet use all of the available CPU capacity. I use Process Lasso, a free utility from Bitsum Technologies, to set the CPU priorities in my Windows 7 system.

Now the 4-core PC runs 3 WCG 1-core tasks, one 2-core T4T task, and 2 GPUGRID 0.37-core tasks concurrently, which fully utilize the CPU and the two GPUs.

Here is the cc_config.xml file I am currently using:

<cc_config>
<log_flags>
<cpu_sched_debug>0</cpu_sched_debug>
</log_flags>
<options>
<client_version_check_url>http://www.worldcommunitygrid.org/download.php?xml=1</client_version_check_url>
<client_download_url>http://www.worldcommunitygrid.org/download.php</client_download_url>
<network_test_url>http://www.ibm.com/</network_test_url>
<use_all_gpus>1</use_all_gpus>
<ncpus>6</ncpus>
<report_results_immediately>1</report_results_immediately>
<start_delay>60</start_delay>
</options>
</cc_config>

Art Brown
2) Message boards : Questions and problems : Task status changes from 'Running' to 'Waiting to run' automatically (Message 42149)
Posted 19 Jan 2012 by Art Brown
Post:
Hi Jord,

The scheduler debug Message log captured the events that you mentioned regarding limiting CPU usage (lines 112 & 113):

112 World Community Grid 2012-01-19 06:12 [cpu_sched] avoiding overcommit with multithread job, skipping c4cw_target05_058607454_0
113 2012-01-19 06:12 [cpu_sched] using 3.75 out of 4 CPUs

Thanks for picking up on this issue for me.

The result of this behavior is to change the status of a task from "Running" to "Waiting to run", effectively shutting the task down until another task completes, which could be several hours.

I think you said that this behavior comes from the 2-core Test4Theory (T4T) application.

If that is right, I'd like to see T4T (or whoever controls it) let the user choose whether they want this automatic CPU utilization control, or if they would prefer no T4T CPU utilization control.

In my case, the T4T automatic status control:
1) shuts down a non-T4T task, which results in one core idling with no BOINC task to run. This has two negative side effects from my perspective:
   a) T4T takes CPU time away from a non-T4T project, which seems unfair.
   b) it also reduces the overall BOINC CPU efficiency of my PC; my 4-core CPU becomes a 3-core CPU.
2) the T4T 2-core application only consumes 70% of the CPU time of the two cores it runs on. This low T4T efficiency effectively reduces the "3-core CPU" to a "1 + 0.7 + 0.7-core CPU" or "2.4-core CPU".

Overall, that is a 1.6-core (40%) reduction in BOINC CPU time on a 4-core system as the direct result of running the 2-core T4T task and its CPU scheduling process. Losing 1.6 cores is a very steep price to pay to increase the T4T CPU time from 100% of 1 core to 70% of 2 cores, a gain of just 0.4 cores.
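The arithmetic behind those figures can be checked with a few lines of Python. The variable names are illustrative only; the inputs are the numbers given above (4 cores, one idle core, one WCG task using a full core, and the T4T task using 70% of each of its two cores).

```python
cores = 4
t4t_efficiency = 0.70  # fraction of each of its two cores the T4T task actually uses

# One WCG task at a full core plus two T4T cores at 70% each; the fourth core idles.
effective = 1 + 2 * t4t_efficiency
lost = cores - effective
lost_fraction = lost / cores

# T4T CPU time gained by moving from 100% of 1 core to 70% of 2 cores.
t4t_gain = 2 * t4t_efficiency - 1.0

print("effective cores:", round(effective, 2))
print("cores lost:", round(lost, 2))
print("fraction lost:", round(lost_fraction, 2))
print("T4T gain in cores:", round(t4t_gain, 2))
```

This reproduces the 2.4 effective cores, the 1.6-core (40%) loss, and the 0.4-core gain for T4T.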

I think a contributing factor to the scheduling problem is the CPU "Use" estimate that the GPUGRID tasks have. In my case, each GPUGRID task was rated at 0.37 CPU + 1 Nvidia GPU. However the long-term CPU load for the GPU tasks is under 4%, not the 37% the rating indicates. It almost looks like the CPU rating has the decimal point in the wrong place. At these low measured CPU loadings, the GPU tasks could be ignored when calculating CPU loads. That would have made line 113 above calculate to needing 3.01 cores (not 3.75) for 4 tasks (2 GPUGRID, 1 WCG and one 2-core T4T), which should allow 4 cores to run 5 tasks. No tasks would have to be suspended, and the PC would still be responsive, at least the way I run my BOINC tasks.

I maintain good PC responsiveness running 2 GPUGRID tasks and 4 BOINC CPU tasks by setting World Community Grid tasks to Below Normal, so in my case I don't need the T4T control, and would not use it if I had the choice.

I think I'll go back to the 1-core T4T tasks until a solution for this problem is available. I paid for a 4-core CPU, and this is currently the only way to get all of the benefits from it. I hope this is temporary, however.

In any case, thanks again for solving the mystery of automatic task status changes.

If there is anything that I can do to help resolve this issue, please let me know.

Art
3) Message boards : Questions and problems : Task status changes from 'Running' to 'Waiting to run' automatically (Message 42143)
Posted 18 Jan 2012 by Art Brown
Post:
Thanks for the detailed explanations. They are very useful and appreciated.

1) I added <cpu_sched_debug>1</cpu_sched_debug> to cc_config.xml, so we'll see what shows up the next time the WCG status is changed.

Currently, all 5 tasks are running happily on the 4-core CPU: 2 GPUGRID, 2 WCG and one 2-core T4T.

2) I changed the new T4T option to assign 2-core tasks so I can continue to monitor the automatic status change issue.

3) I noticed the 2-core T4T task uses about 35% of the total CPU capacity, or only about 70% of the 2 cores it uses. There is a lot of idle time available in those two cores. The WCG tasks use virtually all of each core they are assigned to.

Do you have any insight on why the 2-core T4T task doesn't use more of the CPU resource it has available to it?

Regarding the usability issue on a heavily loaded machine, I set the WCG task priorities to Below Normal (using Process Lasso), so they only run when nothing more important is taking place. I leave the BOINC tasks in memory so they can be run without going to disk. I have assigned the GPU tasks to specific cores in the past, but I am not seeing degradation in CPU or GPU performance if I don't, so right now the CPU affinity for the GPUGRID tasks is not restricted by me. So far, my PC is responsive to human input, yet can run as many BOINC tasks as possible.

Before the 2-core T4T tasks were available, I ran all 4 cores at virtually 100% load, and the PC was still responsive, using the Below Normal WCG priority strategy above.

Thanks again. Art Brown
4) Message boards : Questions and problems : Task status changes from 'Running' to 'Waiting to run' automatically (Message 42134)
Posted 18 Jan 2012 by Art Brown
Post:
Thanks for the background info.

1) Since I can suspend and resume World Community Grid tasks following an automatic status change such that both WCG tasks run, it is curious that the thread count would be different after both WCG tasks resume running than before the status change, but perhaps that is what happens.

On a dual core machine I have, I run one GPUGRID task, one T4T single core task, and one WCG CPU task, and I have not seen the automatic status change for the WCG task as yet. All three tasks run at the same time continuously and without any apparent conflicts.

2) The WCG status change is not coincident with a T4T task starting. The status change happens at some later, apparently random time that I have not as yet been able to correlate anything to. Additionally on the 4-core machine, once the two WCG tasks are running, along with the two GPUGRID tasks and the 2-core T4T task, all five BOINC tasks continue to run for hours until an automatic status change happens.

3) I had not considered the thread count issue, so I checked the thread count for the BOINC tasks I run. Each GPUGRID task shows 6 threads, each WCG task shows 2 or 3 threads, and virtualbox.exe for the T4T task shows 24 threads. I don't know if these threads are the same kind of threads you referred to, however.

Thanks again. Art Brown
5) Message boards : Questions and problems : Task status changes from 'Running' to 'Waiting to run' automatically (Message 42133)
Posted 18 Jan 2012 by Art Brown
Post:
Thanks for the suggestion.

I use ProcessLasso to set the GPUGRID CPU priorities to Above Normal, and affinities to specific cores if I see any CPU usage conflicts between GPU and CPU tasks. A GPUGRID task only consumes up to about 4% of a CPU, so this way I can use all CPUs for BOINC CPU tasks and can run GPUGRID tasks at the same time.

I think that BOINC has a task management issue that causes the task statuses to change unexpectedly.

Regards, Art Brown
6) Message boards : Questions and problems : Task status changes from 'Running' to 'Waiting to run' automatically (Message 42125)
Posted 17 Jan 2012 by Art Brown
Post:
Greetings.

The status of a running BOINC task is being automatically changed from "Running" to "Waiting to run", and I don't know why.

I am running BOINC 6.12.34 for Windows x86-64 on an Intel Core i7 920 running 4 single-threaded cores.

The OS is Windows 7 Home Premium x64 SP 1 with all updates applied through 15 Jan 2012.

I use BoincTasks 1.30 to view my BOINC projects.

I run two World Community Grid (WCG) CPU tasks, one Test4Theory (T4T) version 7.05 2-core CPU task, and two GPUGRID tasks on two GPUs.

Occasionally one of the WCG tasks has its status changed from "Running" to "Waiting to run", yet there is an open core available for it to use (the core it was just Running on); another task is not started in its place. The number of Running tasks just drops from 5 (2 GPUGRID, 2 WCG, and one T4T) to 4 (2 GPUGRID, 1 WCG, and 1 T4T) along with a corresponding drop in CPU utilization.

The automatic status changes are not coincident with any BOINC task completions or transfers.

When one WCG task is automatically changed to "Waiting to run", if I Suspend the other running WCG task, the "Waiting to run" task runs. Then when I Resume the WCG task I just suspended, it runs, too.

The only entries in the Messages log are the manual status changes I initiate in response to an automatic status change.

I can induce this behavior when one WCG task is "Running", and the other WCG task is "Running High Priority". I Suspend the "Running" task, then Resume the same task. Its status changes to "Waiting to run" for an indefinite period (many minutes, perhaps until the other WCG task completes). Then I Suspend the "Running High Priority" task, and then Resume it. Now both WCG tasks will run until the next automatic status change. Suspending the "Running High Priority" task and Resuming it works as expected: the task resumes its previous "Running High Priority" status.

I noticed that if the WCG tasks have a status of "Running" when no T4T task is running, the WCG statuses change to "Running High Priority" as soon as a T4T task runs. This is independent from and possibly unrelated to the automatic status changes to "Waiting to run", but I don't know why this behavior is occurring.

I could have a configuration issue with BOINC or my PC that I don't know how to change to prevent these automatic status changes, or there might be another explanation.

Any comments or suggestions would be appreciated.

Thanks for your help, and let me know if you need more info.

Regards, Art Brown

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.