Boinc client wastes resources

BobmALCS

Joined: 22 Feb 12
Posts: 19
United Kingdom
Message 71152 - Posted: 2 Aug 2016, 19:01:34 UTC

BOINC Client 7.6.22 on Windows 7 frequently does not utilise all available resources for many hours at a time.

4-core CPU with 1 core allocated to BOINC.
2 GPUs. GpuA is about 4 to 5 times 'faster' than GpuB.
ProjectA - CPU only. Resource share 400.
ProjectB - GPU only. Resource share 50. Can run on either or both GPUs. No restrictions. Tasks take about 2 hours or 12 hours.
ProjectC - GPU only. Resource share 10. Can run on either GPU. Restricted to one task at a time. Each task takes about 10 minutes on GpuA.

PrjA runs without a problem all the time.

Often (I don't keep a close watch) I see that only PrjC is running on one of the GPUs, usually GpuA. The other GPU is not utilised at all. The last time I checked, it had been in this state for at least 4 hours.

At the beginning of this I get the message
"Not requesting tasks: don't need (CPU: job cache full; NVIDIA GPU: not highest priority project)"
for PrjB.

All the projects constantly have work available for them.

Why does BOINC waste resources like this, and can it be avoided?

BobM
ID: 71152
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 71163 - Posted: 3 Aug 2016, 10:27:59 UTC - in response to Message 71152.  

I assume that the GPU resource is idle because no work is downloaded and "Ready to start"?

The question then becomes "Why doesn't BOINC request, or possibly receive, GPU work from PrjB or PrjC?"

My suspicion is that it's because your resource shares for those two projects are so low, and it may be aggravated if either of those two projects applies limits to the number of tasks you can download or run at any one time.

The best way of diagnosing a case like this is to briefly enable the "work_fetch_debug" Event Log option. That's very easy in BOINC v7.6.22: just press Ctrl+Shift+F from BOINC Manager and tick the box. Then open the Event Log window itself (Ctrl+Shift+E), click the line '--- start work fetch state ---', shift-click the line '--- end work fetch state ---' (that should select all the lines in between), then copy the selected lines and paste them here for us to analyse. Depending on how well we know the behaviour of A, B and C, we may also need to ask you to post a full 'scheduler request' and reply from the Event Log for each of them.
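
If the tick box is not convenient, the same flag can also be set by hand in cc_config.xml in the BOINC data directory. A minimal sketch, assuming you merge it into any existing file rather than overwriting options that are already there; 1 turns the flag on and 0 turns it off again:

```xml
<cc_config>
  <!-- Sketch only: keep any options and log flags the file already contains. -->
  <log_flags>
    <!-- 1 = write the work fetch state to the Event Log; set back to 0 when done -->
    <work_fetch_debug>1</work_fetch_debug>
  </log_flags>
</cc_config>
```

After saving, have the client re-read its config files. The Event Log should then show a 'Re-reading cc_config.xml' line and list work_fetch_debug among the log flags.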
ID: 71163
BobmALCS

Joined: 22 Feb 12
Posts: 19
United Kingdom
Message 71168 - Posted: 3 Aug 2016, 16:34:25 UTC - in response to Message 71163.  

As requested. Let me know if you need more.

03/08/2016 17:29:22 | | Re-reading cc_config.xml
03/08/2016 17:29:22 | | Config: event log limit 20000 lines
03/08/2016 17:29:22 | | Config: use all coprocessors
03/08/2016 17:29:22 | | log flags: file_xfer, sched_ops, task, work_fetch_debug
03/08/2016 17:29:22 | Collatz Conjecture | Found app_config.xml
03/08/2016 17:29:22 | Einstein@Home | Found app_config.xml
03/08/2016 17:29:22 | Milkyway@Home | Found app_config.xml
03/08/2016 17:29:22 | SETI@home | Found app_config.xml
03/08/2016 17:29:22 | | [work_fetch] Request work fetch: Core client configuration
03/08/2016 17:29:24 | | [work_fetch] ------- start work fetch state -------
03/08/2016 17:29:24 | | [work_fetch] target work buffer: 180.00 + 43200.00 sec
03/08/2016 17:29:24 | | [work_fetch] --- project states ---
03/08/2016 17:29:24 | Collatz Conjecture | [work_fetch] REC 15071.357 prio -9.171 can request work
03/08/2016 17:29:24 | Einstein@Home | [work_fetch] REC 59773.957 prio -7.419 can request work
03/08/2016 17:29:24 | GPUGRID | [work_fetch] REC 0.000 prio -1000.000 can't request work: "no new tasks" requested via Manager
03/08/2016 17:29:24 | Milkyway@Home | [work_fetch] REC 0.000 prio -0.000 can't request work: "no new tasks" requested via Manager
03/08/2016 17:29:24 | PrimeGrid | [work_fetch] REC 746.517 prio -0.016 can request work
03/08/2016 17:29:24 | SETI@home | [work_fetch] REC 3.917 prio -1000.000 can request work
03/08/2016 17:29:24 | | [work_fetch] --- state for CPU ---
03/08/2016 17:29:24 | | [work_fetch] shortfall 13752.53 nidle 0.00 saturated 29627.47 busy 0.00
03/08/2016 17:29:24 | Collatz Conjecture | [work_fetch] share 0.000 blocked by project preferences
03/08/2016 17:29:24 | Einstein@Home | [work_fetch] share 0.000 blocked by project preferences
03/08/2016 17:29:24 | GPUGRID | [work_fetch] share 0.000 blocked by project preferences
03/08/2016 17:29:24 | Milkyway@Home | [work_fetch] share 0.000
03/08/2016 17:29:24 | PrimeGrid | [work_fetch] share 1.000
03/08/2016 17:29:24 | SETI@home | [work_fetch] share 0.000 blocked by project preferences
03/08/2016 17:29:24 | | [work_fetch] --- state for NVIDIA GPU ---
03/08/2016 17:29:24 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 45161.49 busy 0.00
03/08/2016 17:29:24 | Collatz Conjecture | [work_fetch] share 0.167
03/08/2016 17:29:24 | Einstein@Home | [work_fetch] share 0.833
03/08/2016 17:29:24 | GPUGRID | [work_fetch] share 0.000 zero resource share
03/08/2016 17:29:24 | Milkyway@Home | [work_fetch] share 0.000
03/08/2016 17:29:24 | PrimeGrid | [work_fetch] share 0.000 project is backed off (resource backoff: 79001.55, inc 86400.00)
03/08/2016 17:29:24 | SETI@home | [work_fetch] share 0.000 zero resource share
03/08/2016 17:29:24 | | [work_fetch] ------- end work fetch state -------
03/08/2016 17:29:24 | | [work_fetch] No project chosen for work fetch
03/08/2016 17:29:46 | | Re-reading cc_config.xml

BobM
ID: 71168
BobmALCS

Joined: 22 Feb 12
Posts: 19
United Kingdom
Message 71230 - Posted: 5 Aug 2016, 23:27:22 UTC - in response to Message 71168.  

I did the same thing when only 1 GPU was being used.

06/08/2016 00:20:11 | | Re-reading cc_config.xml
06/08/2016 00:20:11 | | Config: event log limit 20000 lines
06/08/2016 00:20:11 | | Config: use all coprocessors
06/08/2016 00:20:11 | | log flags: file_xfer, sched_ops, task, work_fetch_debug
06/08/2016 00:20:11 | Collatz Conjecture | Found app_config.xml
06/08/2016 00:20:11 | Einstein@Home | Found app_config.xml
06/08/2016 00:20:11 | Milkyway@Home | Found app_config.xml
06/08/2016 00:20:11 | SETI@home | Found app_config.xml
06/08/2016 00:20:11 | | [work_fetch] Request work fetch: Core client configuration
06/08/2016 00:20:13 | | [work_fetch] ------- start work fetch state -------
06/08/2016 00:20:13 | | [work_fetch] target work buffer: 180.00 + 43200.00 sec
06/08/2016 00:20:13 | | [work_fetch] --- project states ---
06/08/2016 00:20:13 | Collatz Conjecture | [work_fetch] REC 14821.802 prio -304.127 can request work
06/08/2016 00:20:13 | Einstein@Home | [work_fetch] REC 63776.540 prio -0.250 can't request work: scheduler RPC backoff (7250.03 sec)
06/08/2016 00:20:13 | GPUGRID | [work_fetch] REC 0.000 prio -1000.000 can't request work: "no new tasks" requested via Manager
06/08/2016 00:20:13 | Milkyway@Home | [work_fetch] REC 0.000 prio -0.000 can't request work: "no new tasks" requested via Manager
06/08/2016 00:20:13 | PrimeGrid | [work_fetch] REC 768.647 prio -0.057 can request work
06/08/2016 00:20:13 | SETI@home | [work_fetch] REC 2.636 prio -1000.000 can request work
06/08/2016 00:20:13 | | [work_fetch] --- state for CPU ---
06/08/2016 00:20:13 | | [work_fetch] shortfall 1579.10 nidle 0.00 saturated 41800.90 busy 0.00
06/08/2016 00:20:13 | Collatz Conjecture | [work_fetch] share 0.000 blocked by project preferences
06/08/2016 00:20:13 | Einstein@Home | [work_fetch] share 0.000 blocked by project preferences
06/08/2016 00:20:13 | GPUGRID | [work_fetch] share 0.000 blocked by project preferences
06/08/2016 00:20:13 | Milkyway@Home | [work_fetch] share 0.000
06/08/2016 00:20:13 | PrimeGrid | [work_fetch] share 1.000
06/08/2016 00:20:13 | SETI@home | [work_fetch] share 0.000 blocked by project preferences
06/08/2016 00:20:13 | | [work_fetch] --- state for NVIDIA GPU ---
06/08/2016 00:20:13 | | [work_fetch] shortfall 4904.56 nidle 0.00 saturated 40679.32 busy 0.00
06/08/2016 00:20:13 | Collatz Conjecture | [work_fetch] share 1.000
06/08/2016 00:20:13 | Einstein@Home | [work_fetch] share 0.000
06/08/2016 00:20:13 | GPUGRID | [work_fetch] share 0.000 zero resource share
06/08/2016 00:20:13 | Milkyway@Home | [work_fetch] share 0.000
06/08/2016 00:20:13 | PrimeGrid | [work_fetch] share 0.000 project is backed off (resource backoff: 48238.04, inc 86400.00)
06/08/2016 00:20:13 | SETI@home | [work_fetch] share 0.000 zero resource share
06/08/2016 00:20:13 | | [work_fetch] ------- end work fetch state -------
06/08/2016 00:20:13 | | [work_fetch] No project chosen for work fetch
06/08/2016 00:20:25 | | Re-reading cc_config.xml

It is not fetching work from Einstein@Home even though there is a 'spare' GPU it could run on.

BobM
ID: 71230
Juha
Volunteer developer
Volunteer tester
Help desk expert

Joined: 20 Nov 12
Posts: 801
Finland
Message 71315 - Posted: 8 Aug 2016, 18:20:19 UTC - in response to Message 71230.  

"It is not fetching work from Einstein@Home even though there is a 'spare' GPU it could run on."


In this case there was scheduler RPC backoff still active:

06/08/2016 00:20:13 | Einstein@Home | [work_fetch] REC 63776.540 prio -0.250 can't request work: scheduler RPC backoff (7250.03 sec)


And PrimeGrid had resource backoff:

06/08/2016 00:20:13 | PrimeGrid | [work_fetch] share 0.000 project is backed off (resource backoff: 48238.04, inc 86400.00)


I don't know why it didn't try to get any work from Collatz:

06/08/2016 00:20:13 | Collatz Conjecture | [work_fetch] share 1.000


Anyway, in the first post you describe the situation with three projects but I'm counting four in the log. Could you describe the situation with real project names?

You say Project C is limited to only one task at a time. Is that done with max_concurrent? Last I heard, max_concurrent and work fetch don't play well together. I don't think that has changed.
ID: 71315
BobmALCS

Joined: 22 Feb 12
Posts: 19
United Kingdom
Message 71318 - Posted: 8 Aug 2016, 22:10:44 UTC - in response to Message 71315.  

Collatz - Share 10. GPU only. Only one task allowed at a time. Set by <max_concurrent>1</max_concurrent>

Einstein@Home - Share 50. GPU only. No restrictions.

PrimeGrid - Share 400. CPU only. No restrictions.

SETI@home - Share 0. GPU only. No restrictions.

Milkyway@Home - Share 5. CPU and/or GPU. Set to not get new tasks.

GPUGRID - Share 0. GPU only. Set to not get new tasks.

CPU - Intel Core i5 Quad 2500K (Sandy Bridge) 3.30GHz [D2].
Overclocked to 4.3GHz.
Only 1 CPU core out of 4 is available to BOINC.

GPUs - GTX760 and GT640
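
For reference, the Collatz limit described above would look roughly like this in an app_config.xml placed in the Collatz project directory. This is only a sketch: the app name is a placeholder, not the real Collatz app name, which has to be taken from the <app> entries in client_state.xml. The comments are just annotations and can be deleted.

```xml
<app_config>
  <app>
    <!-- Placeholder (hypothetical): replace with the project's real short app name -->
    <name>collatz_app_name_placeholder</name>
    <!-- Run at most one task of this app at a time -->
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>
```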

BobM
ID: 71318
Juha
Volunteer developer
Volunteer tester
Help desk expert

Joined: 20 Nov 12
Posts: 801
Finland
Message 71363 - Posted: 9 Aug 2016, 18:30:49 UTC - in response to Message 71318.  

Ok. I still think the root cause of the problem is that max_concurrent is not considered in work fetch, but at this point I have to say "back to you, Richard".
ID: 71363
BobmALCS

Joined: 22 Feb 12
Posts: 19
United Kingdom
Message 71429 - Posted: 10 Aug 2016, 12:40:53 UTC - in response to Message 71363.  

Whilst looking at the above scheduler problem I noticed another peculiarity.

Collatz tasks have a stated CPU requirement of >0.5 of a CPU.
If I run 2 Collatz (GPU only) tasks then the PrimeGrid (CPU only) task is stopped.
So I forced the Collatz task to require 0.1 of a CPU using app_config and <cpu_usage>0.1</cpu_usage>.
Now, when 2 Collatz tasks run, PrimeGrid continues quite happily.

I had always understood, possibly wrongly, that you had to leave 'spare' CPU cores for the GPU task to utilise. In my case there are 3 spare CPUs. So, what does the scheduler do about the CPU time that GPU tasks need?

This raised another, minor, problem.
In app_config for Collatz I had

<gpu_versions>
    <gpu_usage>1.0</gpu_usage>
    <cpu_usage>0.1</cpu_usage>
</gpu_versions>

If the <gpu_usage> tag is omitted then the <cpu_usage> tag is silently ignored; no error message is issued.
The syntax shows that both tags are required, so either an error message should be issued or each tag should be allowed to default to a value of 1.
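
The fragment above only takes effect inside the full app_config.xml structure. A sketch of the whole file, assuming a placeholder app name (the real one is listed in client_state.xml) and with both tags present so that neither is ignored:

```xml
<app_config>
  <app>
    <!-- Placeholder (hypothetical): replace with the project's real short app name -->
    <name>collatz_app_name_placeholder</name>
    <gpu_versions>
      <!-- Fraction of a GPU each task is assumed to use: 1.0 = one task per GPU -->
      <gpu_usage>1.0</gpu_usage>
      <!-- Fraction of a CPU core budgeted for each GPU task -->
      <cpu_usage>0.1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

Note that cpu_usage is only the figure the client budgets with when deciding what to run; it does not limit how much CPU time the app actually consumes.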

BobM
ID: 71429
Juha
Volunteer developer
Volunteer tester
Help desk expert

Joined: 20 Nov 12
Posts: 801
Finland
Message 71443 - Posted: 10 Aug 2016, 17:18:54 UTC - in response to Message 71429.  

"I had always understood, possibly wrongly, that you had to leave 'spare' CPU cores for the GPU task to utilise. In my case there are 3 spare CPUs. So, what does the scheduler do about the CPU time that GPU tasks need?"


I believe it depends on the science app. One benefits from a free CPU core, another requires a free core, and a third couldn't care less. Some people tell BOINC to use only x cores, others like to set cpu_usage with app_config.xml. It seems you had both with Collatz.

The server computes cpu_usage using some kind of simple formula and often gets it wrong. The >0.5 figure feels quite high. Maybe Collatz has tweaked the server code. You'd have to check their forums.
ID: 71443


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.