Message boards :
Questions and problems :
Is it normal to fulfill all requests but from idle device?
Joined: 9 Apr 06 | Posts: 302
2/15/2014 3:09:24 PM | SETI@home | Requesting new tasks for CPU and NVIDIA and intel_gpu
2/15/2014 3:09:24 PM | SETI@home | [sched_op] CPU work request: 206247.11 seconds; 0.00 devices
2/15/2014 3:09:24 PM | SETI@home | [sched_op] NVIDIA work request: 51163.15 seconds; 0.00 devices
2/15/2014 3:09:24 PM | SETI@home | [sched_op] intel_gpu work request: 181440.00 seconds; 1.00 devices
2/15/2014 3:09:27 PM | SETI@home | Scheduler request completed: got 25 new tasks
2/15/2014 3:09:27 PM | SETI@home | [sched_op] Server version 703
2/15/2014 3:09:27 PM | SETI@home | Project requested delay of 303 seconds
2/15/2014 3:09:27 PM | SETI@home | [sched_op] estimated total CPU task duration: 53254 seconds
2/15/2014 3:09:27 PM | SETI@home | [sched_op] estimated total NVIDIA task duration: 38162 seconds
2/15/2014 3:09:27 PM | SETI@home | [sched_op] estimated total intel_gpu task duration: 0 seconds
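Each per-resource request line in the log above pairs a figure in seconds (the cache shortfall) with a figure in devices (instances idle right now). A minimal sketch of how those fields can be read, using illustrative names rather than real BOINC identifiers:

```python
# Hedged sketch (not actual BOINC code): reading a work request line such as
# "intel_gpu work request: 181440.00 seconds; 1.00 devices".
# All class and function names here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class WorkRequest:
    resource: str       # "CPU", "NVIDIA", "intel_gpu"
    req_seconds: float  # shortfall against the cache setting, in device-seconds
    req_devices: float  # number of currently idle instances of this resource

def is_idle_request(req: WorkRequest) -> bool:
    """A non-zero 'devices' figure means at least one instance is idle now."""
    return req.req_devices > 0.0

# The three requests from the log above.
requests = [
    WorkRequest("CPU", 206247.11, 0.00),
    WorkRequest("NVIDIA", 51163.15, 0.00),
    WorkRequest("intel_gpu", 181440.00, 1.00),
]

idle = [r.resource for r in requests if is_idle_request(r)]
print(idle)  # ['intel_gpu']
```

The non-zero "devices" figure is the client's explicit signal that the Intel GPU is sitting idle, which is the point the rest of the thread turns on: that signal appears to go unused on the server side.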
Joined: 29 Aug 05 | Posts: 15483
Mind telling us which BOINC version this is? But in essence, it's the client asking for work for device X, and the server not giving any. That could be a problem with how BOINC asks for the work, but it could also lie in how the server answers. You've asked the same thing at SETI, right?
Joined: 5 Oct 06 | Posts: 5082
Yes, we did discuss and analyse this situation in some detail at SETI: message 1471467 and following. There are three elements to the problem:

1) Human psychology
2) SETI-specific project management
3) BOINC code objectives

With the current server code, (1) and (2) combine to produce the effect Raistmer has illustrated. SETI, as a project, has chosen to limit the number of 'tasks in progress' for any one host: human psychology seems to dictate that people ask for as much work as possible. We started the discussion at SETI with the work request set at 15 days: we're now down to 2.1 days, but that's probably still more than the project 'in progress' limits will allow. Let's leave the rest of that conversation for the project board.

But BOINC code objectives are appropriate for discussion here. As we all know, BOINC comes in two parts:

a) A client, which requests work
b) A scheduler (on the server), which allocates work

In the client code, there's a strong emphasis (among many others) on finding work for an idle resource. In the server code, the emphasis seems to be on filling requests from the fastest device first. [I say 'seems': I haven't walked the code. It may be simply that the scheduler plods through the available applications in sequential, index, order, and fills the request dumbly as it encounters tasks and applications which match.]

But it certainly appears that the server does nothing to fulfil the client priority of avoiding idle resources: I see no indication - even at Einstein, where the server logs are accessible - that the "1.00 devices" in the work request is used to prioritise the server actions. In an integrated client/server system with shared objectives, surely it should?
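The contrast drawn here - sequential, index-order filling versus honouring the idle-device flag - can be sketched as a toy model. This is an assumption-laden illustration, not the actual BOINC scheduler code: `allocate` and its arguments are hypothetical names, and `limit` stands in for a per-host tasks-in-progress cap.

```python
# Toy model of the two allocation orders discussed above.
# Hypothetical names; not real BOINC server code.

def allocate(requests, limit, idle_first=False):
    """Hand out tasks until the per-host limit is exhausted.

    requests: list of (resource, seconds_wanted, idle_devices, tasks_available)
    Returns {resource: tasks_granted}.
    """
    # Sequential order fills in list (index) order; idle-first serves
    # resources reporting idle devices before topping up busy ones.
    order = sorted(requests, key=lambda r: -r[2]) if idle_first else list(requests)
    granted, remaining = {}, limit
    for resource, seconds, idle_devices, available in order:
        n = min(available, remaining)
        granted[resource] = n
        remaining -= n
    return granted

# Figures loosely modelled on the log in this thread: 25 tasks left under the
# per-host cap, intel_gpu the only resource with an idle device.
reqs = [("CPU", 206247, 0.0, 20),
        ("NVIDIA", 51163, 0.0, 15),
        ("intel_gpu", 181440, 1.0, 10)]

print(allocate(reqs, 25))                   # sequential fill: intel_gpu gets nothing
print(allocate(reqs, 25, idle_first=True))  # idle-first: intel_gpu served first
```

With a sequential fill, the CPU and NVIDIA requests consume the whole allowance before the scheduler ever reaches the idle Intel GPU - exactly the symptom reported in the logs above.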
Joined: 29 Aug 05 | Posts: 15483
> But it certainly appears that the server does nothing to fulfil the client priority of avoiding idle resources: I see no indication - even at Einstein, where the server logs are accessible - that the "1.00 devices" in the work request is used to prioritise the server actions.

Well, in the case of Einstein, they run an old revision of the server software. Are you sure it knows how to handle the priority/fastest-device-first requests from the client? Apart from SETI, I don't know of any other project that has applications for all GPU classes, has work for them, and uses one of the latest BOINC back-end versions. And there you're hampered by a maximum of 100 CPU and 100 GPU tasks (no matter how many CPUs or GPUs you have).
Joined: 9 Apr 06 | Posts: 302
BOINC 7.2.33, the recommended one.

Richard, it's not the case we discussed before. Here one can see 25 received tasks. All tasks are SETI MB, and the Intel GPU requests SETI MB too; so 25 were received/available... but none for the idle device. It's not human psychology IMO, it's a bug in BOINC's work-fetch scheme.

EDIT: I'm trying to get a balance on that host, to have as much work as possible within the 200-task limit for all devices (because the host works with the network disabled most of the time). Balance was reached (with only ~10 Intel GPU tasks in cache), but then SETI started to give out more shorties, hence the 200-task limit hits again. But nevertheless, to supply as many as 25 tasks while ignoring the idle device... it's a plain wrong fetch decision.
Joined: 9 Apr 06 | Posts: 302
And the next work fetch log:

2/15/2014 6:47:17 PM | SETI@home | Reporting 4 completed tasks
2/15/2014 6:47:17 PM | SETI@home | Requesting new tasks for CPU and NVIDIA and intel_gpu
2/15/2014 6:47:17 PM | SETI@home | [sched_op] CPU work request: 174020.43 seconds; 0.00 devices
2/15/2014 6:47:17 PM | SETI@home | [sched_op] NVIDIA work request: 12601.86 seconds; 0.00 devices
2/15/2014 6:47:17 PM | SETI@home | [sched_op] intel_gpu work request: 172800.00 seconds; 1.00 devices
2/15/2014 6:47:19 PM | SETI@home | Scheduler request completed: got 4 new tasks
2/15/2014 6:47:19 PM | SETI@home | [sched_op] Server version 703
2/15/2014 6:47:19 PM | SETI@home | Project requested delay of 303 seconds
2/15/2014 6:47:19 PM | SETI@home | [sched_op] estimated total CPU task duration: 11195 seconds
2/15/2014 6:47:19 PM | SETI@home | [sched_op] estimated total NVIDIA task duration: 2128 seconds
2/15/2014 6:47:19 PM | SETI@home | [sched_op] estimated total intel_gpu task duration: 0 seconds

So it's not a coincidence, it's the way BOINC supplies work. The idle device remains idle even when the host receives more tasks in an ALLOWED category (!)
Joined: 9 Apr 06 | Posts: 302
> You've asked the same thing at SETI, right?

As I answered to Richard - not quite the same. I hope the difference between that situation and the current one is clear. Currently BOINC fails to keep an available device busy even though it COULD under all the applied settings. So, a wrong decision at fetch time, not just obedience to some limits (and to me it's a flaw in the server-side code, because the client did all it could: it asked for work for the idle device and reported to the server that the device was idle).
Joined: 9 Apr 06 | Posts: 302
And one more log:

2/15/2014 7:19:42 PM | SETI@home | [sched_op] Starting scheduler request
2/15/2014 7:19:42 PM | SETI@home | Sending scheduler request: To fetch work.
2/15/2014 7:19:42 PM | SETI@home | Reporting 3 completed tasks
2/15/2014 7:19:42 PM | SETI@home | Requesting new tasks for CPU and NVIDIA and intel_gpu
2/15/2014 7:19:42 PM | SETI@home | [sched_op] CPU work request: 136706.06 seconds; 0.00 devices
2/15/2014 7:19:42 PM | SETI@home | [sched_op] NVIDIA work request: 4142.12 seconds; 0.00 devices
2/15/2014 7:19:42 PM | SETI@home | [sched_op] intel_gpu work request: 164160.00 seconds; 1.00 devices
2/15/2014 7:19:45 PM | SETI@home | Scheduler request completed: got 3 new tasks
2/15/2014 7:19:45 PM | SETI@home | [sched_op] Server version 703
2/15/2014 7:19:45 PM | SETI@home | Project requested delay of 303 seconds
2/15/2014 7:19:45 PM | SETI@home | [sched_op] estimated total CPU task duration: 7626 seconds
2/15/2014 7:19:45 PM | SETI@home | [sched_op] estimated total NVIDIA task duration: 4355 seconds
2/15/2014 7:19:45 PM | SETI@home | [sched_op] estimated total intel_gpu task duration: 12127 seconds

Finally I reduced the cache to a level where all devices were able to get some work. But even this log demonstrates the flaw in the fetch logic: the Intel GPU was the device that needed work most, yet the NVIDIA GPUs were granted work too. To sum up: there is no priority implemented in the server side of the BOINC code inside the fetch decision, though such a priority is required.
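The priority being asked for here could, in principle, look like the following: before topping up busy resources, the server guarantees each resource that reports idle devices at least a minimal grant. This is a hypothetical sketch under assumed names, not the real BOINC server code.

```python
# Hedged sketch of an "idle floor" allocation policy; hypothetical names,
# not actual BOINC server code.

def grant_with_idle_floor(requests, limit, floor=1):
    """requests: {resource: (idle_devices, tasks_available)}
    Returns {resource: tasks_granted}, never exceeding 'limit' in total."""
    granted = {r: 0 for r in requests}
    remaining = limit
    # Pass 1: every resource reporting an idle device gets at least 'floor'
    # tasks, so no device is left empty-handed while others are topped up.
    for r, (idle_devices, available) in requests.items():
        if idle_devices > 0 and available > 0 and remaining > 0:
            n = min(floor, available, remaining)
            granted[r] += n
            remaining -= n
    # Pass 2: fill the rest in whatever order the server normally uses.
    for r, (idle_devices, available) in requests.items():
        n = min(available - granted[r], remaining)
        granted[r] += n
        remaining -= n
    return granted

# Only 3 tasks left under the cap; intel_gpu reports the sole idle device.
print(grant_with_idle_floor(
    {"CPU": (0.0, 10), "NVIDIA": (0.0, 10), "intel_gpu": (1.0, 10)}, 3))
# -> {'CPU': 2, 'NVIDIA': 0, 'intel_gpu': 1}
```

Even when the remaining allowance is tiny, the idle resource gets something before the busy ones are refilled, which is the behaviour the logs above show to be missing.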
Joined: 5 Oct 06 | Posts: 5082
> BOINC 7.2.33, the recommended one.

Well, in the message I linked, you reported 36 tasks, and received 36 tasks in return. Later in the same thread, you reported 43 tasks, and got 43 tasks in return - I commented on that one, as potential evidence that the 'work in progress' limits were coming into play. The only difference this time is that you didn't include the section of the log that would have told us how many (if any) tasks were being reported. "Requesting new tasks for CPU and NVIDIA and intel_gpu" is identical in both cases.

> it's a plain wrong fetch decision.

I see nothing wrong with the fetch decision (the client request). I think it would be less ambiguous in English to describe it as a 'wrong supply decision', to emphasise that it's the server response that needs investigating in detail - as I think you accepted in your later post.
Joined: 9 Apr 06 | Posts: 302
Yes, the supply decision is wrong.
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.