understanding overworked projects and LTD with CUDA


Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 22446 - Posted: 14 Jan 2009, 22:28:46 UTC
Last modified: 14 Jan 2009, 22:37:50 UTC

I have a number of overworked projects on a quad Opteron with a 9800GTX CUDA device, and I am trying to understand what is going on. A project is defined as overworked here if LTD < -sched_period. I take it that the default 1 hour time slice represents 3600 seconds and is the scheduling period referred to above, i.e. the threshold is -3600 seconds.
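As a minimal sketch of that definition (the function and constant names are made up for illustration, and this is not BOINC's own code), the test amounts to a single comparison against the negated scheduling period:

```python
# Assumption: the scheduling period equals the default 1 hour task switch
# interval, expressed in seconds, as guessed in the post above.
SCHED_PERIOD_SECONDS = 3600


def is_overworked(long_term_debt, sched_period=SCHED_PERIOD_SECONDS):
    """Return True when LTD < -sched_period (the definition quoted above)."""
    return long_term_debt < -sched_period


if __name__ == "__main__":
    # Hypothetical LTD values, in seconds, chosen only to show the threshold.
    for ltd in (-5000, -3600, -1000, 0):
        print(ltd, is_overworked(ltd))
```

Passing a different sched_period shows how the threshold moves with the time slice setting.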

Here is what BV shows for this system:


Not obvious above is that seti@home beta has a whole lot of really small jobs that run for about 20 minutes of wall-clock time, or about 5 minutes of CPU time, on the 9800GTX+. However, the completion time is shown as 13h 45m, not 20 minutes, as shown below.


I assume this causes the apparent problem with the overworked projects?

It seems to me that if a project is overworked it is being given too many time slices. This system has a lot of memory that is not being used. If I drop the scheduling time from 1 hour down to, say, 5 minutes, and set the memory to not swap out, then the long duration tasks (this would be all except seti beta) will get serviced more often and finish earlier. But if I do that, then the threshold of -sched_period becomes -300 and just about all those projects will stay overworked. Maybe someone can explain what is happening. BTW, the 6.07 beta is not causing any display problems, nor have I seen any 0.01 vlar crap recently.
ID: 22446
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 22450 - Posted: 15 Jan 2009, 11:02:58 UTC - in response to Message 22446.  

First, let's get rid of a couple of red herrings.

1) BoincView time 'To completion' of 13h45m
I'm pretty sure this is a minor display problem in BoincView, given that it's a very old program and hasn't been updated to match all the latest developments in BOINC (but it still does a great job, and is the most-used program on my system). Since, as you have noticed, it only affects tasks 'Ready to start' and doesn't affect tasks 'Running', I *think* the display problem is an RPC bug in the latest CC - see my 'Time to completion' in BoincView for CUDA thread in API.

2) Messing with the task switch time slice.
It will be a minor workaround at best. Don't waste time on it.

No, the real problem is with the management of debt on multi-resource BOINC platforms. The BOINC developers forgot to think about this when rushing out the new version to catch the holiday graphics-card sales opportunity, and are now paying the price with some hurried reconsideration.

I have exactly the same problem with a quad-core+CUDA.

Mine is currently running three projects, all with exactly equal resource share. So I have two cores running Astropulse, two cores running Einstein, and one core running SETI Beta CUDA.

So in terms of seconds (the only currency debt is measured in round here), Astropulse and Einstein are each getting twice the share of CUDA. They process 2 seconds (one on each core), for every one second that CUDA gets. It doesn't matter that my CUDA card is doing vastly more FLOPS or cobblestones than my CPU, the only thing BOINC understands is seconds, and all seconds are created equal. For the time being.
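The arithmetic behind that can be sketched with a deliberately simplified model. It is not BOINC's actual debt code: it assumes the four CPU cores and the GPU are five interchangeable "slots", that each project is owed seconds in proportion to its resource share, and that debt moves by (seconds owed minus seconds received). The function name and project labels are made up for illustration.

```python
def debt_change(slots_used, resource_share, wall_seconds):
    """Change in debt per project over wall_seconds, in seconds.

    slots_used: processing slots (CPU cores or GPU) each project occupies.
    resource_share: resource share of each project (any consistent unit).
    Every slot-second counts the same, whether it ran on a CPU or the GPU.
    """
    total_seconds = sum(slots_used.values()) * wall_seconds
    total_share = sum(resource_share.values())
    changes = {}
    for project, slots in slots_used.items():
        owed = total_seconds * resource_share[project] / total_share
        received = slots * wall_seconds
        changes[project] = owed - received
    return changes


if __name__ == "__main__":
    # Quad-core plus one CUDA card, three projects with equal resource share.
    slots = {"Astropulse": 2, "Einstein": 2, "SETI Beta CUDA": 1}
    share = {"Astropulse": 100, "Einstein": 100, "SETI Beta CUDA": 100}
    for project, change in debt_change(slots, share, 3600).items():
        print(project, round(change))
```

Under these assumptions, one hour of wall-clock time moves Astropulse and Einstein by -1200 seconds each and SETI Beta by +2400 seconds, so the CPU projects drift negative and the CUDA project drifts positive, with the changes summing to zero.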

Normally, BOINC would solve this problem by suspending a project with negative STD on one core, and starting a second task on the project with positive STD on the spare core thus freed up. But here it can't - neither of us has got a spare CUDA core to give extra SETI Beta tasks to. And if BOINC is processing SETI Beta in CUDA mode, it won't assign a CPU core to it, even though the SETI Beta application is perfectly capable of dropping back to CPU-mode operation (well, in a rudimentary sort of way - they stripped out all the CPU optimisation code).

Your display is trivial - I'm showing SETI Beta with +ve STD of 50,467: Astropulse -ve STD 21,826: and Einstein -ve STD 28,640. Astropulse is also showing a LTD of -3,958,122 - but it still fetches work if I massage it hard enough.

For the time being, the only way to allow BOINC to manage debts automatically would be to set the sum of your CUDA-enabled projects (SETI and GPUgrid, if you wanted to add that back into the mix) to be exactly 20% of your overall all-project resource share. If you go any lower than 20%, except for a short-term re-balancing period, you'll go into -ve STD for CUDA. I don't know whether BOINC would be stupid enough to idle your GPU under those circumstances, and I probably won't bother to find out.
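The 20% figure follows from counting the GPU as one of five equally weighted slots (four CPU cores plus one CUDA device). A minimal sketch of that arithmetic, with a made-up function name and the same "all seconds are equal" assumption:

```python
def balanced_gpu_share_percent(cpu_cores, gpu_slots=1):
    """Resource share (in percent) at which GPU projects are owed exactly
    the seconds the GPU can deliver, when every slot-second counts equally."""
    return 100.0 * gpu_slots / (cpu_cores + gpu_slots)


if __name__ == "__main__":
    # Quad-core CPU with a single CUDA card.
    print(balanced_gpu_share_percent(4))
```

For a quad-core with one card this gives 20.0; a different core count would shift the balance point accordingly.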

For the moment, resource share is the key to your question, and remember that it's only measured in seconds - until at least BOINC v6.8, I believe.
ID: 22450


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.