Is it normal to fulfill every request except the one for the idle device?

Message boards : Questions and problems : Is it normal to fulfill every request except the one for the idle device?

Raistmer

Joined: 9 Apr 06
Posts: 302
Message 52558 - Posted: 15 Feb 2014, 11:13:10 UTC

2/15/2014 3:09:24 PM | SETI@home | Requesting new tasks for CPU and NVIDIA and intel_gpu
2/15/2014 3:09:24 PM | SETI@home | [sched_op] CPU work request: 206247.11 seconds; 0.00 devices
2/15/2014 3:09:24 PM | SETI@home | [sched_op] NVIDIA work request: 51163.15 seconds; 0.00 devices
2/15/2014 3:09:24 PM | SETI@home | [sched_op] intel_gpu work request: 181440.00 seconds; 1.00 devices
2/15/2014 3:09:27 PM | SETI@home | Scheduler request completed: got 25 new tasks
2/15/2014 3:09:27 PM | SETI@home | [sched_op] Server version 703
2/15/2014 3:09:27 PM | SETI@home | Project requested delay of 303 seconds
2/15/2014 3:09:27 PM | SETI@home | [sched_op] estimated total CPU task duration: 53254 seconds
2/15/2014 3:09:27 PM | SETI@home | [sched_op] estimated total NVIDIA task duration: 38162 seconds
2/15/2014 3:09:27 PM | SETI@home | [sched_op] estimated total intel_gpu task duration: 0 seconds
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 52559 - Posted: 15 Feb 2014, 11:22:42 UTC - in response to Message 52558.  

Mind telling us which BOINC version this is?
But in essence, it's the client asking for work for device X, and the server not giving any. That could be a problem with how BOINC asks for the work, but it could also be with how the server answers. You've asked the same thing at Seti, right?
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 52560 - Posted: 15 Feb 2014, 12:16:31 UTC - in response to Message 52559.  

Yes, we did discuss and analyse this situation in some detail at SETI: message 1471467 and following.

There are three elements to the problem:

1) Human psychology
2) SETI-specific project management
3) BOINC code objectives

With the current server code, (1) and (2) combine to produce the effect Raistmer has illustrated. SETI, as a project, has chosen to limit the number of 'tasks in progress' for any one host: human psychology seems to dictate that people ask for as much work as possible. We started the discussion at SETI with the work request set at 15 days: we're now down to 2.1 days, but that's probably still more than the project 'in progress' limits will allow. Let's leave the rest of that conversation for the project board.

But BOINC code objectives are appropriate for discussion here.

As we all know, BOINC comes in two parts.
a) A client, which requests work
b) A scheduler (on the server) which allocates work

In the client code, there's a strong emphasis (among many others) on finding work for an idle resource. In the server code, the emphasis seems to be on filling requests from the fastest device first. [I say 'seems': I haven't walked the code. It may simply be that the scheduler plods through the available applications in sequential (index) order and fills the request dumbly as it encounters tasks and applications that match.]

But it certainly appears that the server does nothing to fulfill the client priority of avoiding idle resources: I see no indication - even at Einstein, where the server logs are accessible - that the "1.00 devices" in the work request is used to prioritise the server actions.

In an integrated client/server system with shared objectives, surely it should?
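To make the suggestion concrete, here is a minimal Python sketch of a supply order that gives each idle device one task before filling the remaining requests. It is not BOINC's scheduler code: the `ResourceRequest` type, its field names, the single `slots` limit, the per-task durations, and the "one task per idle device first" rule are all hypothetical, chosen only to illustrate the priority being asked for. The request figures come from a 7:19 PM log posted later in this thread; the `idle_first=False` path is only a stand-in for a scheduler that walks its list in order and does not reproduce the exact split seen in that log.

```python
import math
from dataclasses import dataclass


@dataclass
class ResourceRequest:
    # Hypothetical mirror of one "[sched_op] X work request" log line.
    name: str            # e.g. "CPU", "NVIDIA", "intel_gpu"
    req_secs: float      # seconds of work requested
    idle_devices: float  # the "devices" figure (idle instances)


def supply(requests, est_secs, slots, idle_first=True):
    """Hand out at most `slots` tasks; est_secs[name] is the (assumed)
    duration of one task on that resource. Returns {name: tasks_sent}."""
    sent = {r.name: 0 for r in requests}
    remaining = {r.name: r.req_secs for r in requests}

    def give(name):
        nonlocal slots
        sent[name] += 1
        remaining[name] -= est_secs[name]
        slots -= 1

    if idle_first:
        # Phase 1: one task per idle device before anything else.
        for r in requests:
            for _ in range(math.ceil(r.idle_devices)):
                if slots == 0 or remaining[r.name] <= 0:
                    break
                give(r.name)

    # Phase 2: fill what is left of each request in list order.
    for r in requests:
        while slots > 0 and remaining[r.name] > 0:
            give(r.name)
    return sent


# Request figures from the 7:19 PM log; task durations are invented.
requests = [
    ResourceRequest("CPU", 136706.06, 0.00),
    ResourceRequest("NVIDIA", 4142.12, 0.00),
    ResourceRequest("intel_gpu", 164160.00, 1.00),
]
est_secs = {"CPU": 2500.0, "NVIDIA": 700.0, "intel_gpu": 4000.0}
print(supply(requests, est_secs, slots=3, idle_first=False))
print(supply(requests, est_secs, slots=3))
```

With three free slots, the in-order fill sends all three tasks to the CPU, while the idle-first variant sends one to the Intel GPU and two to the CPU. The design choice is deliberately mild: idle devices get a task first, but they do not starve the other resources of the rest of the allocation.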
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 52561 - Posted: 15 Feb 2014, 12:36:17 UTC - in response to Message 52560.  

But it certainly appears that the server does nothing to fulfill the client priority of avoiding idle resources: I see no indication - even at Einstein, where the server logs are accessible - that the "1.00 devices" in the work request is used to prioritise the server actions.

In an integrated client/server system with shared objectives, surely it should?

Well, in the case of Einstein, they run an old revision of the server software. Are you sure that it knows how to handle the priority/fastest device first requests from the client?

I don't know for sure of any project other than Seti that has applications for all GPU classes, has work for them, and uses one of the latest BOINC back-end versions. And there you're hampered by the maximum of 100 CPU and 100 GPU tasks (no matter how many CPUs or GPUs you have).
Raistmer

Joined: 9 Apr 06
Posts: 302
Message 52563 - Posted: 15 Feb 2014, 14:37:27 UTC
Last modified: 15 Feb 2014, 14:41:06 UTC

BOINC 7.2.33, the recommended version.

Richard, this is not the case we discussed before.
Here you can see 25 received tasks. All of them are SETI MB, and the Intel GPU requests SETI MB too, so 25 were available and received... but none for the idle device.
It's not human psychology, IMO; it's a bug in BOINC's work fetch scheme.

EDIT: I'm trying to balance that host so it has as much work as possible within the 200-task limit for all devices (because the host works with the network disabled most of the time). Balance was reached (with only ~10 Intel GPU tasks in cache), but then SETI started to send more shorties, so the 200-task limit was hit again. Nevertheless, to provide as many as 25 tasks while ignoring the idle device... it's a plain wrong fetch decision.
Raistmer

Joined: 9 Apr 06
Posts: 302
Message 52564 - Posted: 15 Feb 2014, 14:49:30 UTC - in response to Message 52563.  
Last modified: 15 Feb 2014, 14:50:19 UTC

And the next work fetch log:

2/15/2014 6:47:17 PM | SETI@home | Reporting 4 completed tasks
2/15/2014 6:47:17 PM | SETI@home | Requesting new tasks for CPU and NVIDIA and intel_gpu
2/15/2014 6:47:17 PM | SETI@home | [sched_op] CPU work request: 174020.43 seconds; 0.00 devices
2/15/2014 6:47:17 PM | SETI@home | [sched_op] NVIDIA work request: 12601.86 seconds; 0.00 devices
2/15/2014 6:47:17 PM | SETI@home | [sched_op] intel_gpu work request: 172800.00 seconds; 1.00 devices
2/15/2014 6:47:19 PM | SETI@home | Scheduler request completed: got 4 new tasks
2/15/2014 6:47:19 PM | SETI@home | [sched_op] Server version 703
2/15/2014 6:47:19 PM | SETI@home | Project requested delay of 303 seconds
2/15/2014 6:47:19 PM | SETI@home | [sched_op] estimated total CPU task duration: 11195 seconds
2/15/2014 6:47:19 PM | SETI@home | [sched_op] estimated total NVIDIA task duration: 2128 seconds
2/15/2014 6:47:19 PM | SETI@home | [sched_op] estimated total intel_gpu task duration: 0 seconds


So it's not a coincidence; it's the way BOINC supplies work. The idle device remains idle even when the host receives more tasks in an ALLOWED category (!)
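The pattern described here can be checked mechanically from the `[sched_op]` lines. A minimal Python sketch, assuming the client log format shown in the excerpts in this thread and one scheduler exchange per input; the `starved_resources` helper is hypothetical and not part of BOINC. It pairs each resource's work request with the estimated total duration it received and flags resources that reported idle devices but were sent nothing.

```python
import re

# Patterns for the two [sched_op] line types seen in the excerpts above.
REQUEST_RE = re.compile(
    r"\[sched_op\] (\S+) work request: ([\d.]+) seconds; ([\d.]+) devices")
RECEIVED_RE = re.compile(
    r"\[sched_op\] estimated total (\S+) task duration: ([\d.]+) seconds")


def starved_resources(log_text):
    """Return resources that asked for work with idle devices (> 0)
    but were sent 0 seconds of work in the same scheduler exchange."""
    requested = {}  # resource -> (seconds requested, idle devices)
    received = {}   # resource -> estimated seconds of work received
    for line in log_text.splitlines():
        if m := REQUEST_RE.search(line):
            requested[m.group(1)] = (float(m.group(2)), float(m.group(3)))
        elif m := RECEIVED_RE.search(line):
            received[m.group(1)] = float(m.group(2))
    return [name for name, (secs, devices) in requested.items()
            if devices > 0 and secs > 0 and received.get(name, 0.0) == 0.0]
```

Fed the 6:47 PM excerpt above, it flags `intel_gpu`; fed an exchange in which the Intel GPU did receive work, it returns an empty list.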
Raistmer

Joined: 9 Apr 06
Posts: 302
Message 52565 - Posted: 15 Feb 2014, 15:23:48 UTC - in response to Message 52559.  
Last modified: 15 Feb 2014, 15:25:05 UTC

You've asked the same thing at Seti, right?

As I answered Richard: not quite the same. I hope the difference between that situation and the current one is clear. Currently BOINC fails to keep an available device busy, even though it COULD under all the applied settings. So it's a wrong decision at fetch, not just the result of obeying some limits (and, to me, a flaw in the server-side code, because the client did all it could: it asked for work for the idle device and reported to the server that the device is idle).
Raistmer

Joined: 9 Apr 06
Posts: 302
Message 52566 - Posted: 15 Feb 2014, 15:29:19 UTC
Last modified: 15 Feb 2014, 15:30:08 UTC

And one more log:

2/15/2014 7:19:42 PM | SETI@home | [sched_op] Starting scheduler request
2/15/2014 7:19:42 PM | SETI@home | Sending scheduler request: To fetch work.
2/15/2014 7:19:42 PM | SETI@home | Reporting 3 completed tasks
2/15/2014 7:19:42 PM | SETI@home | Requesting new tasks for CPU and NVIDIA and intel_gpu
2/15/2014 7:19:42 PM | SETI@home | [sched_op] CPU work request: 136706.06 seconds; 0.00 devices
2/15/2014 7:19:42 PM | SETI@home | [sched_op] NVIDIA work request: 4142.12 seconds; 0.00 devices
2/15/2014 7:19:42 PM | SETI@home | [sched_op] intel_gpu work request: 164160.00 seconds; 1.00 devices
2/15/2014 7:19:45 PM | SETI@home | Scheduler request completed: got 3 new tasks
2/15/2014 7:19:45 PM | SETI@home | [sched_op] Server version 703
2/15/2014 7:19:45 PM | SETI@home | Project requested delay of 303 seconds
2/15/2014 7:19:45 PM | SETI@home | [sched_op] estimated total CPU task duration: 7626 seconds
2/15/2014 7:19:45 PM | SETI@home | [sched_op] estimated total NVIDIA task duration: 4355 seconds
2/15/2014 7:19:45 PM | SETI@home | [sched_op] estimated total intel_gpu task duration: 12127 seconds


Finally, I reduced the cache to a level where all devices were able to get some work.
But again, even this log demonstrates a flaw in the fetch logic.
The Intel GPU was the device that needed work most; nevertheless, the NV GPUs were granted work too.

To sum up: no priority is implemented on the server side of BOINC in the fetch decision, though such a priority is required.
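The arithmetic behind this summary can be made explicit by comparing, per resource, the seconds received (the "estimated total ... task duration" line) with the seconds requested. A small Python sketch using the figures from the 7:19 PM log above; the "fill ratio" is only an illustrative measure introduced here, not something BOINC reports.

```python
# Figures from the 7:19:42 PM request and 7:19:45 PM reply above:
# resource -> (seconds requested, idle devices, seconds received)
exchange = {
    "CPU":       (136706.06, 0.00, 7626),
    "NVIDIA":    (4142.12,   0.00, 4355),
    "intel_gpu": (164160.00, 1.00, 12127),
}


def fill_ratio(requested, received):
    """Fraction of the requested seconds that the server actually sent."""
    return received / requested if requested > 0 else 0.0


for name, (requested, idle, received) in exchange.items():
    print(f"{name:10s} idle={idle:.2f} filled={fill_ratio(requested, received):6.1%}")
```

By this measure the NVIDIA request, with no idle device, was filled to about 105%, while the Intel GPU, the only resource reporting an idle device, got about 7% of what it asked for.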
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 52567 - Posted: 15 Feb 2014, 15:48:01 UTC - in response to Message 52563.  

BOINC 7.2.33, the recommended version.

Richard, this is not the case we discussed before.
Here you can see 25 received tasks. All of them are SETI MB, and the Intel GPU requests SETI MB too, so 25 were available and received... but none for the idle device.
It's not human psychology, IMO; it's a bug in BOINC's work fetch scheme.

Well, in the message I linked, you reported 36 tasks, and received 36 tasks in return. Later in the same thread, you reported 43 tasks, and got 43 tasks in return - I commented on that one, as potential evidence that the 'work in progress' limits were coming into play. The only difference this time is that you didn't include the section of the log that would have told us how many (if any) tasks were being reported. "Requesting new tasks for CPU and NVIDIA and intel_gpu" is identical in both cases.
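The "tasks reported equals tasks received" reading is what you would expect if a host sits at a "tasks in progress" cap: each completed task that is reported frees exactly one slot. A minimal sketch, assuming a single per-host cap like the 200-task figure mentioned earlier in the thread; the function is hypothetical and is not the scheduler's actual limit logic (Jord notes that Seti splits its limits between CPU and GPU tasks).

```python
def tasks_allowed(in_progress, reported, cap):
    """Tasks the server may send under a per-host in-progress cap.

    Reported (completed) tasks no longer count as in progress, so a host
    that was at the cap can receive at most as many new tasks as it just
    reported."""
    after_reporting = in_progress - reported
    return max(0, cap - after_reporting)


print(tasks_allowed(in_progress=200, reported=4, cap=200))
print(tasks_allowed(in_progress=200, reported=3, cap=200))
```

If the host was at its cap, reporting 4 tasks allows 4 new ones and reporting 3 allows 3, which is consistent with the 6:47 PM and 7:19 PM logs posted in this thread.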

it's a plain wrong fetch decision.

I see nothing wrong with the fetch decision (the client request). I think it would be less ambiguous in English to describe it as a 'wrong supply decision', to emphasise that it's the server response that needs investigating in detail - as I think you accepted in your later post.
Raistmer

Joined: 9 Apr 06
Posts: 302
Message 52571 - Posted: 15 Feb 2014, 17:51:31 UTC - in response to Message 52567.  


it's a plain wrong fetch decision.

I see nothing wrong with the fetch decision (the client request). I think it would be less ambiguous in English to describe it as a 'wrong supply decision', to emphasise that it's the server response that needs investigating in detail - as I think you accepted in your later post.


Yes, the supply decision is wrong.


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.