Message boards : Questions and problems : GPU not receiving tasks when CPU computing disabled
Author | Message |
---|---|
Send message Joined: 30 Mar 20 Posts: 400 |
I have absolutely no problems getting iGPU tasks. I have always had CPU tasks set to NO. The same goben. I've been here on the BOINC site for a long time. Have you set all GPU settings to YES, even if you don't have any NVIDIA or AMD GPUs in that computer? Early in the beta, Uplinger mentioned something about that. |
Send message Joined: 29 Apr 21 Posts: 50 |
"I have absolutely no problems getting iGPU tasks. I have always had CPU tasks set to NO." Yes, I have all 4 settings under Graphics Card Usage set to Yes. Meaning, under Graphics Card Usage: "Do work on my graphics card while computer is in use?" Yes; "Use my AMD/ATI graphics card if possible": Yes; "Use my Intel graphics card if possible": Yes; "Use my NVIDIA graphics card if possible": Yes. If it was not clear earlier in the thread. There is only one thing I change in the project preferences to switch between getting the intel GPU tasks and not getting the intel GPU tasks: "Allow research to run on my CPU?" When set to No -> no GPU tasks are received. When set to Yes -> Intel GPU tasks are received. |
Send message Joined: 30 Mar 20 Posts: 400 |
OK! Also, early on in the beta, both Richard and I found it very hard to start getting any iGPU tasks at one point. What we did then was to set the BOINC settings "Store at least" and "Store up to an additional" to very low numbers: "Store at least" to 0.05 or 0.10 days, and "Store up to an additional" to 0.01 days. Then tasks for the iGPU started to flow. Once it was working, it was possible to increase those numbers. Edit, added: But then of course iGPU tasks are not something you get at every request, since WCG has the strange idea that an iGPU must be matched with another iGPU. There are a lot fewer iGPUs than NVIDIA and AMD GPUs, by the looks of things. The same goes for NVIDIA and AMD: they must be matched with the same device type. |
Send message Joined: 29 Apr 21 Posts: 50 |
The thing is that I do not find it hard to get iGPU units. I always get them (up to the max # I have set) when CPU is enabled. When CPU is disabled I never get them. This is the case whether I have set 0.25+0.05 days or 0.75+0.75 days. If I have it set really low, like 0.10+0.01 days, this still occurs (other than filling only up to 0.11 days instead of the max # of tasks I have set for OPN, since the iGPU takes a while to complete each igpu task). Thus I take this as a server-side issue. Please note that I only discovered this while trying to replicate the problem that another WCG user was having getting iGPU units on his machine that only had an iGPU. This is because I only had CPU set to No on my machine with the NVIDIA + Intel GPUs. Edit: changed "to run" to "to complete each igpu task" |
Send message Joined: 5 Oct 06 Posts: 5121 |
"If it was not clear earlier in the thread. There is only one thing I change in the project preferences to switch between getting the intel GPU tasks and not getting the intel GPU tasks:" I think that is possibly a specific WCG problem with the implementation of those controls: their way of transferring those choices from the web site to the scheduler will be unique to them (because their website is unique). But I still intend to look through the generic BOINC server code when I get a chance, but it'll be slow - I have open diagnosis calls to both Einstein and CPDN on the go at the moment. Edit - ah, I'd forgotten that WCG also has a user selection for 'max tasks in progress'. That's another complication: I was going to say 'unique to WCG', but I've recently seen it on 'traditional' BOINC projects too. It'll be in a thread on this board - can anyone remind me while I finish breakfast? OK, you can stop looking: message 103817 |
Send message Joined: 30 Mar 20 Posts: 400 |
OK, got it. Then I'm at a loss, and I do not have any more ideas to come up with. :-( |
Send message Joined: 29 Apr 21 Posts: 50 |
"'If it was not clear earlier in the thread. There is only one thing I change in the project preferences to switch between getting the intel GPU tasks and not getting the intel GPU tasks:' I think that is possibly a specific WCG problem with the implementation of those controls: their way of transferring those choices from the web site to the scheduler will be unique to them (because their website is unique). But I still intend to look through the generic BOINC server code when I get a chance, but it'll be slow - I have open diagnosis calls to both Einstein and CPDN on the go at the moment." OK, if it may be WCG specific, I will post there outlining the issue. For the max tasks: yes, they do have that setting. I only run into it if the cache size is set higher than what the max tasks will allow. In case it is interesting, setting "OpenPandemics - COVID-19" under "Project Limits" on the profile page results in this in the account_www.worldcommunitygrid.org.xml file: `<limited_app> <limited_app_id>92</limited_app_id> <limited_app_max_in_progress>25</limited_app_max_in_progress> </limited_app> <limited_app> <limited_app_id>94</limited_app_id> <limited_app_max_in_progress>25</limited_app_max_in_progress> </limited_app>` Is there a way for me to edit this file and have BOINC use it to test having OPN1 (I think 92) set to 1 and OPNG (I think 94) still set to 25? Just for testing, to be able to give more info to the WCG people. That way it could have CPU enabled but only get a max of 1 task at a time. Unless something (maybe "-1"?) could be put in to mean no tasks. Or probably better to remove 92 from: `<apps_selected> <app_id>92</app_id> <app_id>94</app_id> <app_id>82</app_id> </apps_selected>` OK, got it. It is ok. :) I mostly am trying to pin down where the issue is so that it is more likely to get fixed. |
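For anyone wanting to check what the account file currently holds before editing it, here is a minimal Python sketch. It assumes the fragments quoted above sit somewhere inside a single root element; the `<account>` wrapper and the function names `read_limits` and `read_selected` are illustrative assumptions, not taken from the BOINC source.

```python
import xml.etree.ElementTree as ET

# Fragments copied from the post above. The enclosing <account> element is
# an assumption, added only so the sample parses as one XML document.
SAMPLE = """
<account>
  <limited_app>
    <limited_app_id>92</limited_app_id>
    <limited_app_max_in_progress>25</limited_app_max_in_progress>
  </limited_app>
  <limited_app>
    <limited_app_id>94</limited_app_id>
    <limited_app_max_in_progress>25</limited_app_max_in_progress>
  </limited_app>
  <apps_selected>
    <app_id>92</app_id>
    <app_id>94</app_id>
    <app_id>82</app_id>
  </apps_selected>
</account>
"""


def read_limits(xml_text):
    """Return {app_id: max_in_progress} for every <limited_app> entry."""
    root = ET.fromstring(xml_text.strip())
    limits = {}
    for app in root.iter("limited_app"):
        app_id = int(app.findtext("limited_app_id"))
        limits[app_id] = int(app.findtext("limited_app_max_in_progress"))
    return limits


def read_selected(xml_text):
    """Return the app ids listed under <apps_selected>, in file order."""
    root = ET.fromstring(xml_text.strip())
    selected = []
    for block in root.iter("apps_selected"):
        selected.extend(int(e.text) for e in block.findall("app_id"))
    return selected


if __name__ == "__main__":
    print(read_limits(SAMPLE))
    print(read_selected(SAMPLE))
```

To use it on a real file, pass the file's text instead of `SAMPLE`. Searching with `iter()` keeps the sketch independent of how deeply these elements are nested in the actual file.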
Send message Joined: 5 Oct 06 Posts: 5121 |
"Is there a way for me to edit this file and have BOINC use it to test having OPN1 (I think 92) set to 1 and OPNG (I think 94) still set to 25? Just for testing, to be able to give more info to the WCG people. That way it could have CPU enabled but only get a max of 1 task at a time. Unless something (maybe '-1'?) could be put in to mean no tasks. Or probably better to remove 92 from:" Yes, that should be possible - for testing, at least. But I think it might only work once: it can only have reached your machine via a scheduler request/reply, and it might be repeated in all subsequent replies. Have a look in sched_reply_[WCG].xml (root of BOINC data folder), and see what you can find in there. It might be safest to stop the client before making that change, and see what happens on restart. The value to use for 'no restriction' in BOINC is undefined. Sometimes, David has used 0, and sometimes he's used -1. There's even one file where he's used both within 15 lines of code. [https://github.com/BOINC/boinc/blob/master/lib/common_defs.h#L86]. I think he flipped somewhere around 2015, so 'recent' changes like this one (2016) probably use -1, if it's defined at all. I'll look. |
Send message Joined: 29 Apr 21 Posts: 50 |
"Yes, that should be possible - for testing, at least. But I think it might only work once: it can only have reached your machine via a scheduler request/reply, and it might be repeated in all subsequent replies. Have a look in sched_reply_[WCG].xml (root of BOINC data folder), and see what you can find in there." The information from the account_ file is in the sched_reply file, which gets overwritten during each scheduler request. So far I have not gotten it to use the changes. The account_ file seems to get overwritten on startup when the client contacts the project for prefs. |
Send message Joined: 29 Apr 21 Posts: 50 |
Would it be correct to say that if I get it to use the locally modified account_ file and it starts getting the intel gpu tasks that it may be an issue with the client? |
Send message Joined: 5 Oct 06 Posts: 5121 |
"Would it be correct to say that if I get it to use the locally modified account_ file and it starts getting the intel gpu tasks that it may be an issue with the client?" I think it would be fair to say that if you got it to work - we don't know what the heck will happen! Your mission, should you choose to accept it, is to ... I'm not getting the feeling that this is a client problem. You're getting the right sort of figures for 'work request' in the event log, and I think we'd have noticed by now if those figures didn't also appear in sched_request (though that's something you could check). The problem seems to happen after you send the request, and before you get the reply - and the only thing between those two events is the server. |
Send message Joined: 29 Apr 21 Posts: 50 |
"'Would it be correct to say that if I get it to use the locally modified account_ file and it starts getting the intel gpu tasks that it may be an issue with the client?' I think it would be fair to say that if you got it to work - we don't know what the heck will happen! Your mission, should you choose to accept it, is to ..." I did get it to work using the locally modified account_ file. What was changed: no_cpu was changed from 0 to 1, and app_id 92 (opn1 = 92) was removed from apps_selected. I will have to revert to the account_ file and check the sched_request file. |
Send message Joined: 29 Apr 21 Posts: 50 |
So I saved the scheduler request made using the account_ file from the server, and then the one made using the locally modified account_ file. It looks like the only significant difference is: non-modified: scheduler request -> work_req_seconds is 0 and scheduler request -> cpu_req_seconds is 0; modified: scheduler request -> work_req_seconds is >0 and scheduler request -> cpu_req_seconds is >0; both have: coproc_intel_gpu->req_secs >0. I think the next thing that would be interesting is if scheduler request -> work_req_seconds was >0 and scheduler request -> cpu_req_seconds = 0. If so, I would take it as a client issue. If not, I would take it as a server issue. Do you have any thoughts/ideas? |
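The comparison described above can be repeated mechanically. The following Python sketch pulls the three fields out of two saved scheduler requests and reports which ones flip between zero and non-zero. The element names come from the post; the sample documents, their nesting, the numeric values, and the helper names `summarize` and `differing_fields` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-ins for two saved sched_request files.
# Only the three fields discussed in the post are included; values are made up.
UNMODIFIED = """<scheduler_request>
  <work_req_seconds>0.000000</work_req_seconds>
  <cpu_req_seconds>0.000000</cpu_req_seconds>
  <coprocs><coproc_intel_gpu><req_secs>8640.0</req_secs></coproc_intel_gpu></coprocs>
</scheduler_request>"""

MODIFIED = """<scheduler_request>
  <work_req_seconds>8640.000000</work_req_seconds>
  <cpu_req_seconds>8640.000000</cpu_req_seconds>
  <coprocs><coproc_intel_gpu><req_secs>8640.0</req_secs></coproc_intel_gpu></coprocs>
</scheduler_request>"""


def summarize(xml_text):
    """Extract the work-request fields, treating a missing field as 0."""
    root = ET.fromstring(xml_text)

    def number(path):
        node = root.find(path)
        return float(node.text) if node is not None and node.text else 0.0

    return {
        "work_req_seconds": number(".//work_req_seconds"),
        "cpu_req_seconds": number(".//cpu_req_seconds"),
        "intel_gpu_req_secs": number(".//coproc_intel_gpu/req_secs"),
    }


def differing_fields(a, b):
    """Names of fields that are zero in one request and non-zero in the other."""
    return sorted(k for k in a if (a[k] > 0) != (b[k] > 0))


if __name__ == "__main__":
    print(differing_fields(summarize(UNMODIFIED), summarize(MODIFIED)))
```

Comparing only zero versus non-zero, rather than exact values, mirrors how the difference is described in the post.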
Send message Joined: 5 Oct 06 Posts: 5121 |
The interesting question is "why do we have separate lines for `<work_req_seconds>` and `<cpu_req_seconds>`?". For the project where I get my CPU tasks from, both are populated, and identical to the microsecond. I suspect the answer may be historical: when BOINC was first written, there was only one possible sort of work (GPUs couldn't compute), and maybe the wording was changed in newer servers? The alternative may (and probably should) have been retained for backward compatibility. I'll add that to my code reading list ('blame' is useful in these situations). The other thing I'll look at is the timing evidence from the modification datestamp on the various files - notably, request/reply and the account file. Edit - interesting post at https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,43386_offset,570#657527: "Just mentioned that my GPU cruncher with 1660 Super, 280X and A12-9800 APU gets up to 150 NV wu's and almost non for the ATI/AMD. That's no real problem because I can handle this with a second Boinc instance." With a 50 per GPU limit, is the limit being applied globally? Does the NVidia card get all 150, and leave no quota for the other two? That makes some sense, because NVidia was the first GPU type to be handled by BOINC, followed by AMD and finally Intel. The requests are almost certainly handled in that order in the server code, but I'll check. |
Send message Joined: 5 Oct 06 Posts: 5121 |
"The interesting question is 'why do we have separate lines for `<work_req_seconds>` and `<cpu_req_seconds>`?'." The 'historical legacy' answer seems to hold up (David, Jan 10 2009). My 'account_[WCG]' file still has a datestamp of 28 April - so it's not re-written every time. Probably only when the setting on the server has changed. |
Send message Joined: 29 Apr 21 Posts: 50 |
"The interesting question is 'why do we have separate lines for `<work_req_seconds>` and `<cpu_req_seconds>`?'. For the project where I get my CPU tasks from, both are populated, and identical to the microsecond." Yes, legacy as you pointed out. It appears to be related to the anonymous platform mechanism. It is in work_fetch.cpp: if the project is anonymous_platform, then it sets work_req_seconds to the highest of the req_secs. The code comment reads: "if project is anonymous platform, set the overall work req to the max of the requests of resource types for which we have versions. Otherwise projects with old schedulers won't send us work." I am guessing that WCG falls under the projects with old schedulers even though it is not anonymous_platform. "Edit - interesting post at https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,43386_offset,570#657527:" IIRC, the limit is 50 per GPU with a max of 200 per machine. People with multiple GPUs have pointed that out before. |
Send message Joined: 29 Apr 21 Posts: 50 |
"if project is anonymous platform, set the overall work req to the max of the requests of resource types for which we have versions. Otherwise projects with old schedulers won't send us work." This seems to be borne out by the results of my testing. If work_req_seconds is set to the highest of req_secs, then WCG correctly sends iGPU work units. This is without using the locally modified account_ file. |
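The rule in the quoted comment can be paraphrased in a few lines of Python. This is only a sketch of the described behaviour, not the BOINC C++ code: the function name `overall_work_request`, the resource labels, and the numbers are hypothetical.

```python
def overall_work_request(req_secs, resources_with_versions):
    """Overall request = max of the per-resource requests, counting only
    resource types for which an app version is available."""
    usable = [
        secs
        for resource, secs in req_secs.items()
        if resource in resources_with_versions
    ]
    # With no usable resource types there is nothing to request.
    return max(usable, default=0.0)


if __name__ == "__main__":
    # CPU computing disabled (request 0), Intel GPU asking for work:
    requests = {"cpu": 0.0, "intel_gpu": 8640.0}
    print(overall_work_request(requests, {"cpu", "intel_gpu"}))
```

Under this rule the overall request stays non-zero whenever any usable resource wants work, which matches the case that was observed to produce iGPU tasks.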
Send message Joined: 29 Apr 21 Posts: 50 |
"The interesting question is 'why do we have separate lines for `<work_req_seconds>` and `<cpu_req_seconds>`?'. The 'historical legacy' answer seems to hold up." If I am interpreting the code properly, the settings are sent with every sched_reply. The account_[WCG] file is overwritten if "project->gui_urls != old_gui_urls || update_project_prefs". update_project_prefs is set if the venue has changed or if the sent project settings are different from the current project settings. See cs_scheduler.cpp |
Send message Joined: 5 Oct 06 Posts: 5121 |
"IIRC, the limit is 50 per GPU with a max of 200 per machine. People with multiple GPUs have pointed that out before." Yes, we've got that one sorted. But I was drawing attention to "gets up to 150 NV wu's" in the post I quoted. He shouldn't have space for 150 tasks for a single NV GPU if they were allocated strictly '50 for the NV, 50 for the ATI, 50 for the APU'. It seems to be '150 in total - first come, first served' - that's the effect we're chasing ("Why are there none left for tail-end Charlie?"). |
Send message Joined: 5 Oct 06 Posts: 5121 |
"If I am interpreting the code properly, the settings are sent with every sched_reply. The account_[WCG] file is overwritten if 'project->gui_urls != old_gui_urls || update_project_prefs'. update_project_prefs is set if the venue has changed or if the sent project settings are different from the current project settings. See cs_scheduler.cpp" The alternative way of reading that is "don't send the settings if nothing's changed". I must have changed mine on Wednesday, but not since. They probably haven't considered the case of "user modified client record, so now it's different from what the server remembered" - maybe that's the mismatch that prompts your send. |
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.