boinc sometimes will only request GPU work.

PaperDragon
Joined: 13 Sep 05
Posts: 10
Message 27468 - Posted: 18 Sep 2009, 0:03:03 UTC

Every so often my BOINC installation will request only GPU work from projects, not any CPU work, even from projects that do not have GPU applications. So, of course, I do not get any work units.

The only solution I have found is to detach all the projects and re-add them as they run out of CPU work.

Before the detach I get a message saying that GPU work is being requested. After the detach/reattach I get the message requesting CPU and GPU work.

Anyone have any ideas as to why this happens?
Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1654
Australia
Message 27469 - Posted: 18 Sep 2009, 0:39:41 UTC

If you have a graphics card that is suitable for GPU work, and the drivers that are needed for GPU work, then BOINC is programmed to ask each of your projects for GPU work, so that if one of those projects suddenly releases a GPU application, your computer will get some.
Apparently, after a while BOINC will learn not to ask, provided that you stop detaching all of the time, which forces BOINC to restart its learning.

If you don't want any GPU work, there's an option among the preferences to say so.

As for not getting any CPU work, it may be that BOINC has decided that your computer has all of the work that it can handle at present.
Constantly detaching may be interfering with this. Just let BOINC get on with it.

BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27547 - Posted: 23 Sep 2009, 16:31:32 UTC - in response to Message 27469.

I find this curious and frustrating in mixed-mode scenarios.

I run multiple CPU-only and multiple GPU-only projects. I also run some projects which have both CPU and GPU support, and have configured those at the project level for GPU only or CPU only.

Unfortunately, the functionality which would allow the client to pick up the project-configured settings regarding GPU/CPU handling has not been completed, and the client instead imposes its own 'logic' regarding GPU or CPU work fetch. It seems to me that rather than pinging the project servers repeatedly for work the project cannot send out, the more 'user friendly' and 'project friendly' approach would be to read the configured preferences and USE them.

I have heard tales about I/O constraints at various projects (including SETI), and would think that even if the user-level frustration had no bearing on the client design, generating additional useless project I/O hits might motivate the developers to fix this bug (which has been there starting with the 6.6.36 client and appears to have gotten significantly worse with the 6.10.x series).




BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27568 - Posted: 24 Sep 2009, 18:04:08 UTC

I still find it curious that the current iteration of the BOINC client insists on fetching GPU work from projects that don't support GPU work, and then insists on fetching CPU work from projects that don't support CPU work.

The project/user preference is available to the client; that the client ignores it strikes me as, shall we say, a 'suboptimal' design choice.

That this draws seemingly little or no comment suggests something of a case of 'dropping the ball' regarding design considerations or attitude.
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27569 - Posted: 24 Sep 2009, 18:32:59 UTC - in response to Message 27568.

One workaround for the current state of the 6.6.36 and more recent client fetch routine:

1) Shut down the client.
2) Uninstall the client.
3) Reinstall the 6.4.5 client, which has better fetch handling.
4) Load up the queues for those projects which are being starved by the current fetch design (I find this particularly useful for POEM and Spinhenge on the CPU side and GPUGrid on the GPU side; it won't work for Collatz, since that project requires the 6.10.x client).
5) After building an overlarge queue for the projects being starved by the 6.6.36 and more recent work fetch routine, you can typically install the new client over the existing client (though an uninstall might be cleaner).

This procedure works in 32 bit XP, and I suspect 32 bit Vista (haven't tried it there yet).

The ONLY reason at the moment for using the later client is to work with those projects which support ATI GPUs (which pretty much compels the 6.10.x client) or require CUDA 2.3 (again, I think that needs the 6.10.x client).

At the moment, if you are running only CPU tasks and not running Vista/Win7 on the Windows side (say Windows XP or 2000), then the 5.4.5 client is the last 'good' one. If you are running a mix with GPU and CPU tasks, but NOT running Collatz or an ATI GPU supporting project, then the 6.4.5 client seems the best choice.

The only time I need to employ this work around is when I am running Collatz in a workstation project mix.

Eventually, I would sincerely hope that the work fetch routines utilize configuration information available to the client and not stress the projects with needless I/O pings nor frustrate users trying to keep a balance of multiproject GPU and CPU tasks on hand.

That the 6.10.x client supports ATI GPUs is an EXCELLENT thing. That it supports CUDA 2.3 is good as well. I just wish the work fetch routine hadn't been skewered in the process.



Gundolf Jahn
Joined: 20 Dec 07
Posts: 1069
Germany
Message 27570 - Posted: 24 Sep 2009, 19:20:24 UTC - in response to Message 27569.
Last modified: 24 Sep 2009, 19:21:18 UTC

At the moment, if you are running only CPU tasks and not running Vista/Win7 on the Windows side (say Windows XP or 2000), then the 5.4.5 client is the last 'good' one.

Wouldn't that be 5.10.45? (Edit: or 5.10.20 for Einstein.)

Regards,
Gundolf
Computers are not everything in life. (Just a little joke)
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27572 - Posted: 24 Sep 2009, 20:00:54 UTC - in response to Message 27570.

My error -- you are quite correct: 5.10.45. I haven't seen problems with it and Einstein, so I'm not sure about your 5.10.20 reference.


Gundolf Jahn
Joined: 20 Dec 07
Posts: 1069
Germany
Message 27574 - Posted: 24 Sep 2009, 21:05:50 UTC - in response to Message 27572.

My error -- you are quite correct - 5.10.45. I haven't seen problems with it and Einstein so I'm not sure of your 5.10.20 reference.

Oh, those problems only occur when the upload server crashes. Then a client bug introduced with 5.10.21 causes a permanent upload error instead of a temporary one.

Regards,
Gundolf
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 27575 - Posted: 24 Sep 2009, 21:48:10 UTC - in response to Message 27547.

.... (which has been there starting with the 6.6.36 client and appears to have gotten significantly worse with the 6.10.x series).

No, no, and again no.

You're referring to the logged request for CPU/GPU work on projects which supply (for the time being) only GPU/CPU work.

Whether you feel that the request "Have you, perchance, installed a new application since the last time I asked?" is a bug or a feature - and that's a fair question, which is worthy of debate - the fact remains that BOINC v6.6.xx clients have always made these requests. The only thing new "starting with the 6.6.36 client" was that details of the request were logged by default to the messages tab: previously, you had to enable debug logging to find out whether the work requested was for CPU, GPU or (notoriously) neither.

As I've said before, the only thing you gain by degrading to v6.6.33 or earlier is ignorance (lack of information in the logs). And the only thing you gain by degrading to v6.4.5 is the inability to run both CPU and GPU tasks for the same project/application.
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27578 - Posted: 25 Sep 2009, 5:05:47 UTC - in response to Message 27575.

OK -- so it is just anecdotal that with the 6.4.5 client, when I see a work request, I get work, and when I see work requests with the newer client, I don't get work. Fair enough. Glad it works the way you expect it to work.

Notwithstanding the way it works for you, I find the GPU-only requests I see on POEM severe enough that they can empty its queue. When I downgrade to 6.4.5, the queue fills back up nicely. That approach, while cumbersome (actually a PITA), works.

I understand that from your vantage point it is WAD (working as designed) and won't be changed, because you see it as the 'correct' approach. To me it seems that instead of pinging the server for GPU work (when the project doesn't support GPU work at all), or for CPU work (when the project doesn't support CPU work at all), or for ATI GPU work when the workstation hardware is CUDA, the client should either ping for all work types (and I HAVE seen that request in my messages) or look first at the workstation/client configuration and ask for the matching work (i.e. CUDA, ATI, or CPU).

I will admit to being a simpleton on this, and realize that my report here is getting consigned to a 'Dumb User Wastebin' -- so be it.

Rather than inundate you with the various actual reports I see in the messages side and from other (obviously dumb) users, I'll not waste my (and your) time on reporting this observation further.



BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27579 - Posted: 25 Sep 2009, 5:20:46 UTC - in response to Message 27575.

That might be what you want to do with 'default users' -- those who install, attach, and don't do anything else. There used to be a LOT of those folks back in the old SETI@home days. To the extent that many of them are running BOINC today, I suspect they are using legacy clients and haven't changed much since they set up BOINC on their computers.

But these days, those remaining active in the BOINC population include a fair number of folks who have hardware configurations matched to the projects they attach to. These folks (including myself) support multiple projects -- some which support 'everything' (MW, for example, though its CUDA and ATI support is double precision only, or Collatz, which supports the broadest combination out there as well as CPU), projects which are specifically GPU only (like GPUGrid), and projects which are CPU only (like Climate, POEM, Spinhenge, Malaria and a number of others). The thing is, where a project supports multiple hardware configurations, the user can configure the account for GPU, CPU, or GPU and CPU. And those with a broad range of hardware can configure the Home, Work and School groups for different combinations.

I suppose on the project side, CUDA or ATI might be an additional choice that could help things there as well -- but ONLY if the client has the capability to do the account/local hardware check BEFORE it goes out to the project and asks. It just seems that, with all the extra 'stuff' going into the client, there ought to be some means to let the installed client glean that information and use it, thus reducing user frustration and project I/O traffic.

But as I said, I'm probably just simply too stupid to see things the right way here.



Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 27580 - Posted: 25 Sep 2009, 11:31:56 UTC

BarryAZ,

Sorry if I came over a bit heavy last night - it was my last post before going to bed, and I was probably in too much of a rush and more tired than I realised.

One of the problems with BOINC development recently (as some of us see it) has been the "dumbing down" of the default message log. It used to display the exact number of seconds of work requested: then for a while it just said "to request work": finally, after much pressure from mo.v of CPDN and others, we got the present indicator for which resource (CPU, CUDA, ATI) is being 'topped up'. I have been trying to explain that the behaviour is much older, and it was only the information that was (re-)introduced with v6.6.36 (message 26309, 25520): I even used that line "the only thing you have gained by downgrading is ignorance" back in June! So to see it described as a 'bug ... starting with the 6.6.36 client' touched a raw nerve. Again, I'm sorry if it came over too heavy, but there's a danger that the 'bug' could become an urban myth and divert attention from the more fundamental re-working required in the client.

For what it's worth, I've also expressed concern about meaningless 'pings' to project servers: message 24379, trac ticket #896. But we really need feedback from project administrators who are prepared to examine their server logs, to see if the extra traffic from inappropriate work requests makes a significant difference to their server workloads. If it doesn't concern the projects, then - however messy it looks - it needn't concern the users either.

The theory is that work requests for inappropriate resources rapidly back off to an average of two per day from current clients, and with recent code changes (not yet widely deployed) that can be backed down to one request every 28 days by pro-active configuration changes to project servers. Given that I've recently come across several project admins who don't yet understand or know about the existing configuration tools (see the discussion about 'resend lost results' at GPUGrid!), I think that adding yet more new tools is a triumph for hope over expectation, but we'll have to see how it develops.
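
(A rough sketch of the per-resource backoff idea described above - the doubling rule and the caps here are assumptions chosen to give roughly two requests per day, or one every 28 days with a larger server-advertised cap; they are not the actual BOINC constants or code.)

DAY = 24 * 3600.0

def next_backoff(current, cap=DAY / 2):
    """Double the request backoff after another fruitless ask, up to a cap."""
    if current <= 0:
        return 60.0                   # start small after the first refusal (assumed)
    return min(current * 2, cap)      # a ~12 h cap gives roughly two asks per day

# With a hypothetical server-advertised cap of 28 days, the idle 'ping' for a
# resource the project will never serve settles at one request every 28 days:
backoff = 0.0
for _ in range(20):
    backoff = next_backoff(backoff, cap=28 * DAY)
print(backoff / DAY)                  # -> 28.0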

The 'work fetch backoffs' for active resources are reset to zero when a task finishes, but only for that resource, so (for instance) a CUDA task finishing at GPUGrid won't trigger a CPU request. But manually updating the project resets the backoff for all resources, which may be counter-productive. You made an interesting point over at GPUGrid about projects (POEM, Spinhenge) which impose a long communications deferral after any scheduler contact. That's another way that clients can starve themselves by asking for the wrong resource first, to add to 'the client which cries wolf' I described in 24379.

I don't know how long POEM makes you wait between requests, or why they've chosen to introduce the delay: are the delays following the twice-daily GPU ping (actually, presumably now four times a day, twice for CUDA and twice for ATI) sufficient to explain your work drought with v6.10.7 on their own? Or could there be other mechanisms at play, like the changing definitions of long term debt and 'overworked'? You'll probably need to get deep down and dirty with the "work fetch debug" logging flags before you can explain exactly where the current mechanism is breaking down, and that's a necessary first step before fixing it.

Overall, I agree with your general thrust that use/don't use resource switches should operate at the client level, project by project and under user control (I'm a great believer in giving users choice over how the resources they're donating are used), but it will take a concerted and well-documented effort to persuade David Anderson that this is the way forward.
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27583 - Posted: 25 Sep 2009, 18:21:50 UTC - in response to Message 27580.

OK -- and sorry about the testiness of my replies as well -- it seems there is that classic 80% agreement.

With POEM, it isn't so much the backoff cycle (theirs is quite short - a couple of minutes, and not progressive), but rather that subsequent requests by the client are STILL GPU only. POEM does not support GPU and, like a number of projects, does not appear to have the inclination or resources to develop a GPU application. With Spinhenge, the backoff cycle is a non-progressive 15 minutes. Curiously enough, the 6.10.x client doesn't appear to be repetitive about GPU requests there; it asks just once and then reverts to CPU. Spinhenge similarly does not support, and is not likely to support, GPU.

With GPUGrid, I found it curious that the request was for CPU -- again, the project is a 'single mode' project -- GPU and for that matter CUDA GPU only. I have seen queries for ATI GPU and CPU work in the client messages on the workstation -- that's just wrong to my way of looking at things.

With the implementation of ATI GPU support (YES -- a GOOD thing), the matter gets a bit more complicated.

In my view, ideally, the account preferences (the resource share and graphics settings page) should give one the capability (for the default and for each of the three available groups) to control use of all the options: instead of the current 'Use GPU if available (yes/no)' and 'Use CPU (yes/no)', there should be 'Use ATI GPU (yes/no)', 'Use CUDA GPU (yes/no)' and 'Use CPU (yes/no)'.

Different settings could be configured by the user for each group should they so wish (for example, on Collatz, which has the broadest support, I'd have a 'use ATI GPU' group and a 'use CUDA GPU' group and so on, and set my workstations to be part of the specific group which matches their hardware).

These preferences should then get downloaded to the specific workstation in the account_project file and be read by the client as a control for the type of work fetch it should use. When first adding a computer to a project, it would pull the settings from the default configuration, which could then be changed by the user by switching the computer to the appropriate group. (It might be a nice 'advanced' feature, when joining a new computer to a project where you are an existing user, to specify which group the computer belongs in at the outset.)
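
(A hypothetical sketch of the client-side check being proposed here - these preference keys and function names are illustrative assumptions, not anything that exists in the client being discussed.)

def resources_to_request(project_prefs, detected_hardware):
    """project_prefs: per-project preferences, e.g. {"cpu": True, "cuda": False, "ati": False}
    detected_hardware: resources this host actually has, e.g. {"cpu", "cuda"}"""
    wanted = {res for res, allowed in project_prefs.items() if allowed}
    return wanted & detected_hardware          # only ask for the overlap

# A CUDA (9800GT-class) box attached to a CPU-only project and a CUDA-only project:
hardware = {"cpu", "cuda"}
print(resources_to_request({"cpu": True, "cuda": False, "ati": False}, hardware))  # {'cpu'}
print(resources_to_request({"cpu": False, "cuda": True, "ati": True}, hardware))   # {'cuda'}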

The idea behind this is to have the client work fetch be targeted to getting work which matches the workstation configuration and not to waste time and cycles pinging the project servers -- many of which are currently stressed out with I/O traffic (SETI is not the only one running at (or over) the edge regarding I/O traffic). It would also calm the noise level of troublemakers like me (and Paul for that matter) <smile>.



BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27595 - Posted: 26 Sep 2009, 6:01:51 UTC

Richard -- got this earlier this evening -- workstation is Windows XP with a 9800GT. Collatz is configured for GPU only at the project level. GPU Grid is, of course, GPU (and CUDA) only. The code behind this sort of work fetch scenario is NOT RIGHT.

9/25/2009 9:34:21 PM Collatz Conjecture Sending scheduler request: To fetch work.
9/25/2009 9:34:21 PM Collatz Conjecture Requesting new tasks for CPU < !!!
9/25/2009 9:34:26 PM Collatz Conjecture Scheduler request completed: got 0 new tasks
9/25/2009 9:34:26 PM Collatz Conjecture Message from server: No work sent
9/25/2009 9:34:26 PM Collatz Conjecture Message from server: Your computer has no ATI GPU < !!!
9/25/2009 9:34:31 PM GPUGRID Sending scheduler request: To fetch work.
9/25/2009 9:34:31 PM GPUGRID Requesting new tasks for CPU <!!!
9/25/2009 9:34:36 PM GPUGRID Scheduler request completed: got 0 new tasks
9/25/2009 9:34:36 PM GPUGRID Message from server: No work sent
9/25/2009 9:34:36 PM GPUGRID Message from server: ACEMD beta version is not available for your type of computer.
9/25/2009 9:34:36 PM GPUGRID Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 27599 - Posted: 26 Sep 2009, 9:18:57 UTC

Have you ever tried using "Work Fetch Debug"? You get something like this - it's an old one, but still instructive, and you'll see how much information you can get.

07-Jun-2009 17:56:08 [---] [wfd] ------- start work fetch state -------
07-Jun-2009 17:56:08 [---] [wfd] target work buffer: 8640.00 + 21600.00 sec
07-Jun-2009 17:56:08 [---] [wfd] CPU: shortfall 0.00 nidle 0.00 est. delay 0.00 RS fetchable 200.00 runnable 200.00
07-Jun-2009 17:56:08 [climateprediction.net] [wfd] CPU: fetch share 0.00 debt 0.00 backoff dt 0.00 int 0.00 (no new tasks)
07-Jun-2009 17:56:08 [CPDN Beta] [wfd] CPU: fetch share 0.00 debt 0.00 backoff dt 0.00 int 86400.00 (no new tasks)
07-Jun-2009 17:56:08 [Einstein@Home] [wfd] CPU: fetch share 0.50 debt -51614.93 backoff dt 0.00 int 120.00
07-Jun-2009 17:56:08 [lhcathome] [wfd] CPU: fetch share 0.00 debt 0.00 backoff dt 79785.40 int 86400.00
07-Jun-2009 17:56:08 [SETI@home Beta Test] [wfd] CPU: fetch share 0.00 debt -244776.66 backoff dt 0.00 int 7680.00 (no new tasks) (overworked)
07-Jun-2009 17:56:08 [SETI@home] [wfd] CPU: fetch share 0.50 debt 0.00 backoff dt 0.00 int 0.00
07-Jun-2009 17:56:08 [---] [wfd] CUDA: shortfall 30240.00 nidle 1.00 est. delay 0.00 RS fetchable 100.00 runnable 0.00
07-Jun-2009 17:56:08 [climateprediction.net] [wfd] CUDA: fetch share 0.00 debt 0.00 backoff dt 0.00 int 86400.00 (no new tasks)
07-Jun-2009 17:56:08 [CPDN Beta] [wfd] CUDA: fetch share 0.00 debt 0.00 backoff dt 0.00 int 61440.00 (no new tasks)
07-Jun-2009 17:56:08 [Einstein@Home] [wfd] CUDA: fetch share 0.00 debt 0.00 backoff dt 2679.27 int 3840.00
07-Jun-2009 17:56:08 [lhcathome] [wfd] CUDA: fetch share 0.00 debt 0.00 backoff dt 37745.51 int 86400.00
07-Jun-2009 17:56:08 [SETI@home Beta Test] [wfd] CUDA: fetch share 0.00 debt 0.00 backoff dt 0.00 int 15360.00 (no new tasks)
07-Jun-2009 17:56:08 [SETI@home] [wfd] CUDA: fetch share 1.00 debt 0.00 backoff dt 0.00 int 0.00
07-Jun-2009 17:56:08 [climateprediction.net] [wfd] overall_debt 0
07-Jun-2009 17:56:08 [CPDN Beta] [wfd] overall_debt 0
07-Jun-2009 17:56:08 [Einstein@Home] [wfd] overall_debt -51615
07-Jun-2009 17:56:08 [lhcathome] [wfd] overall_debt 0
07-Jun-2009 17:56:08 [SETI@home Beta Test] [wfd] overall_debt -244777
07-Jun-2009 17:56:08 [SETI@home] [wfd] overall_debt 0
07-Jun-2009 17:56:08 [---] [wfd] ------- end work fetch state -------

In your case, you'll have had a CPU shortfall and no CUDA shortfall - the other way round - but the principle is the same.

What it's doing is saying 'my CPU is hungry - where can I get some nibbles?', and then going through *every* project to consider whether it's allowed even to ask.

It *won't* ask if comms are deferred, NNT (no new tasks) is set, or backoff dt > 0.00 (maybe others too). Anything else is 'fetchable'.

So, it tried Collatz. Any nibbles? No? *shrugs and moves on*.
It tried GPUGrid. Any nibbles? No? *shrugs and moves on*.

And so on, in debt order. It'll keep asking until it finds one with the right sort of nibbles - though there is the concept of "overworked" ('oh no, not fish *again* - I'm fed up with fish - I won't ask there until I'm really starving')

Think of it as a shambolic bear of very little brain. It'll keep turning over the trashcans, even the ones that didn't have any food scraps yesterday, and won't have any tomorrow.

Meanwhile, back with BOINC, if there is no CUDA shortfall (cache full), you won't see any CUDA requests.
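
(A rough sketch of the decision loop described above - names, thresholds and the debt ordering are simplified assumptions, not the real BOINC client code.)

from dataclasses import dataclass

OVERWORKED_DEBT = -100_000.0   # assumed cut-off for the 'overworked' state

@dataclass
class Project:
    name: str
    debt: dict        # per-resource long-term debt, e.g. {"cpu": 0.0, "cuda": -5000.0}
    backoff: dict     # per-resource backoff remaining, in seconds
    comms_deferred: bool = False
    no_new_tasks: bool = False

def pick_project_to_ask(projects, resource, shortfall, starving=False):
    """Return the project to ask for `resource` work, or None."""
    if shortfall <= 0:
        return None                   # cache full for this resource: no request at all
    for p in sorted(projects, key=lambda p: p.debt[resource], reverse=True):
        if p.comms_deferred or p.no_new_tasks or p.backoff[resource] > 0:
            continue                  # not 'fetchable' right now
        if p.debt[resource] < OVERWORKED_DEBT and not starving:
            continue                  # 'overworked' - only asked as a last resort
        return p                      # ask this one, whether or not it has such an app
    return None

collatz = Project("Collatz", {"cpu": 0.0, "cuda": 0.0}, {"cpu": 0.0, "cuda": 0.0})
gpugrid = Project("GPUGRID", {"cpu": -10.0, "cuda": 0.0}, {"cpu": 0.0, "cuda": 0.0})
print(pick_project_to_ask([collatz, gpugrid], "cpu", shortfall=3600.0).name)   # -> Collatz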

The question is, does any of this matter? Rom's famous "The evils of returning results immediately" blog implied that the database overheads involved in processing a scheduler RPC were high, and to be avoided wherever possible. But that was almost three years ago, and a lot of code has flowed under the bridge since then. Maybe the simple 'Got CUDA? No.' interaction can happen without touching the user / host / workunit / result tables in the database (though I doubt it - at least the RPC count and 'last contact' fields for the host get updated). As I said yesterday, it's really a question for the projects: can their servers handle four extra 'pings' per host, per day? If that question has been asked, and answered in the negative, then it really shouldn't be a problem for us. But I suspect it hasn't been asked.
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27607 - Posted: 26 Sep 2009, 17:48:30 UTC - in response to Message 27599.

Interesting explanation -- so (I think this is what you are saying), when a particular workstation 'senses' it needs more CPU work, it broadcasts its request to each and every attached project until it completes its mission. The thing is, I had more than those two projects on that workstation, including a couple of CPU-only projects which were running a 'short queue', and they were not pinged. Instead, two GPU-only projects were pinged.

Further to that, in the middle of that 'get CPU tasks' process, I got a 'Get ATI GPU' ping from this 9800GT workstation.

It seems to me that if the client detects 'CPU hunger' it should be capable, using locally stored information, of going specifically after CPU project feeders. Likewise, if it detects 'GPU hunger', it should be capable, again using locally stored information, of going specifically after GPU project feeders. Lastly, if it detects 'GPU hunger' it should be capable, again using locally stored information which is re-established each time the client starts up, of knowing whether it is looking for ATI GPU or CUDA GPU work.

As to pinging the server extra times -- some day you might tramp through the swamp of traffic over at SETI regarding server I/O issues and what that does to that project -- one which has far more resources than most other projects.
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 27616 - Posted: 26 Sep 2009, 21:24:28 UTC - in response to Message 27607.

Interesting explanation -- so (I think this is what you are saying), when a particular workstation 'senses' it needs more CPU work, it broadcasts its request to each and every attached project until it completes its mission. The thing is, I had more than those two projects on that workstation, including a couple of CPU-only projects which were running a 'short queue', and they were not pinged. Instead, two GPU-only projects were pinged.

Further to that, in the middle of that 'get CPU tasks' process, I got a 'Get ATI GPU' ping from this 9800GT workstation.

It seems to me that if the client detects 'CPU hunger' it should be capable, using locally stored information, of going specifically after CPU project feeders. Likewise, if it detects 'GPU hunger', it should be capable, again using locally stored information, of going specifically after GPU project feeders. Lastly, if it detects 'GPU hunger' it should be capable, again using locally stored information which is re-established each time the client starts up, of knowing whether it is looking for ATI GPU or CUDA GPU work.

Did you pick up on the reference to "overworked" (analogy: too much fish)? If any of your CPU projects are in this state, they won't get "pinged" for work until the last possible moment - about five minutes before the last CPU task is expected to finish.

If you don't want to go down the full "work fetch debug" route, you can get a snapshot of what's going on by looking at the 'projects' tab of BOINC Manager (advanced view), and clicking the 'properties' button for each project in turn. You'll see 'Work fetch priority' for both CPU and GPU. If these are zero, it'll ping for work in the first cycle (unless deferred: that will also be shown). If the priority is negative but small, it'll ping for work as soon as the obvious candidates have been tested and failed. If it's below about -100,000, it's "overworked" and won't be pinged until the last moment.
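
(A compact way to read those numbers, following the rules of thumb above - the -100,000 figure is the approximate cut-off mentioned, not an exact client constant.)

def describe_fetch_priority(priority, deferred=False):
    """Rough reading of a project's 'Work fetch priority' figure, per the rules of thumb above."""
    if deferred:
        return "communication deferred - won't be asked yet"
    if priority >= 0:
        return "asked in the first work-fetch cycle"
    if priority > -100_000:            # approximate 'overworked' cut-off (assumed)
        return "asked once the more obvious candidates have been tried and failed"
    return "'overworked' - only asked at the last possible moment"

print(describe_fetch_priority(0))
print(describe_fetch_priority(-42_000))
print(describe_fetch_priority(-250_000))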

You may be interested to know that the parallel discussion with Paul Buck started on the boinc_dev mailing list has unearthed some interesting insights. I had assumed, from observation, that the current 'use CPU' and 'use GPU' settings were being applied on project servers only, and not transmitted to clients. It turns out that they are being sent out to the clients, but included in a "project specific" bundle with things like screensaver colours and movement rates, which BOINC itself doesn't get involved in but simply passes on to the project science application.

Largely as a result of this conversation, I have argued that more settings should be moved to a grouping analogous to "Resource Share": known and understood by the BOINC client across all projects, but with individual values applied separately to each individual project. That seems to have found favour:

I agree.  I'll do this soon
(just the "no CPU" and "no GPU" prefs for now).
-- David

Don't hold your breath - it'll be complicated, and take a while - but there is hope on the horizon.

As to pinging the server extra times -- some day you might tramp through the swamp of traffic over at SETI regarding server I/O issues and what that does to that project -- one which has far more resources than most other projects.

Been there, done that. Did you see my post in Technical News yesterday, reporting 6,377 upload failures on 1,336 finished tasks over a 102 hour period? :-P
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27621 - Posted: 26 Sep 2009, 21:53:57 UTC - in response to Message 27616.

OK -- like I said -- 80% or more in agreement -- if it could get implemented.

I have seen those various numbers for project properties in the 6.10.x client -- I wondered where they got that information. It strikes me that if that information is being used for work fetch choices, at least some of it could be picked up from user configuration choices (like the resource share grouping you suggested).

Still, I have one unanswered question in particular: why, in the middle of a 'CPU fetch' routine, did I get a 'fetch ATI GPU' call (to a project with no GPU support) on a 9800GT system? That seems, at best, just silly.

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 27622 - Posted: 26 Sep 2009, 22:09:17 UTC - in response to Message 27621.

Still, I have one unanswered question in particular: why, in the middle of a 'CPU fetch' routine, did I get a 'fetch ATI GPU' call (to a project with no GPU support) on a 9800GT system? That seems, at best, just silly.

Almost certainly just a random backoff timer decrementing to zero - one of the twice-daily pings to see if they've written an ATI app in the last 12 hours :-)

But an ATI ping on an nVidia system? You've got me there. Sounds like a bug report to boinc_alpha is indicated if you can track down the circumstances. [In particular - and being serious, for a moment - someone needs to enable [wfd] or [sched_op_debug], to see if it's really asking for work or just issuing a 'request' for 0.00 seconds, like the Einstein case I documented in April]
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27626 - Posted: 27 Sep 2009, 4:31:58 UTC - in response to Message 27622.

Below is a repost of the message I posted here earlier.


But an ATI ping on an nVidia system? You've got me there. Sounds like a bug report to boinc_alpha is indicated if you can track down the circumstances. [In particular - and being serious, for a moment - someone needs to enable [wfd] or [sched_op_debug], to see if it's really asking for work or just issuing a 'request' for 0.00 seconds, like the Einstein case I documented in April]


Richard -- got this earlier this evening -- workstation is Windows XP with a 9800GT. Collatz is configured for GPU only at the project level. GPU Grid is, of course, GPU (and CUDA) only. The code behind this sort of work fetch scenario is NOT RIGHT.

9/25/2009 9:34:21 PM Collatz Conjecture Sending scheduler request: To fetch work.
9/25/2009 9:34:21 PM Collatz Conjecture Requesting new tasks for CPU < !!!
9/25/2009 9:34:26 PM Collatz Conjecture Scheduler request completed: got 0 new tasks
9/25/2009 9:34:26 PM Collatz Conjecture Message from server: No work sent
9/25/2009 9:34:26 PM Collatz Conjecture Message from server: Your computer has no ATI GPU < !!!
9/25/2009 9:34:31 PM GPUGRID Sending scheduler request: To fetch work.
9/25/2009 9:34:31 PM GPUGRID Requesting new tasks for CPU <!!!
9/25/2009 9:34:36 PM GPUGRID Scheduler request completed: got 0 new tasks
9/25/2009 9:34:36 PM GPUGRID Message from server: No work sent
9/25/2009 9:34:36 PM GPUGRID Message from server: ACEMD beta version is not available for your type of computer.
9/25/2009 9:34:36 PM GPUGRID Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.