This seems to be a bug. Work fetch reporting "no tasks available" as "not highest priority"

marmot Joined: 16 Sep 13 Posts: 82
Message 67853 - Posted: 17 Feb 2016, 20:21:00 UTC Last modified: 17 Feb 2016, 20:21:44 UTC
<work_fetch_debug>1</work_fetch_debug>
2/17/2016 2:09:38 PM | MindModeling@Beta | [work_fetch] share 0.519
is the highest work_fetch share in the preparation list.
The client comes back with:
2/17/2016 2:11:01 PM | MindModeling@Beta | [work_fetch] share 0.000 no applications
and the server status does report no available work units.
But the final result is:
2/17/2016 2:09:38 PM | MindModeling@Beta | Not requesting tasks: don't need (CPU: not highest priority project; NVIDIA GPU: )
instead of reporting:
2/17/2016 2:20:09 PM | MindModeling@Beta | Project has no tasks available
which is what a manual update request returns with.

Jord Volunteer tester Help desk expert Joined: 29 Aug 05 Posts: 15480
Message 67957 - Posted: 22 Feb 2016, 9:39:01 UTC - in response to Message 67853.
Best post it as an issue in https://github.com/BOINC/boinc/issues, there it'll get immediate attention from the developers, if need be.

Richard Haselgrove Volunteer tester Help desk expert Joined: 5 Oct 06 Posts: 5081
Message 67958 - Posted: 22 Feb 2016, 10:32:39 UTC Last modified: 22 Feb 2016, 10:35:53 UTC
The log message "not highest priority" comes from your local client, not from the server. It didn't even ask the server for work - so it won't get any reply relating to work availability.

marmot Joined: 16 Sep 13 Posts: 82
Message 67967 - Posted: 22 Feb 2016, 18:32:00 UTC - in response to Message 67958. Last modified: 22 Feb 2016, 19:00:33 UTC
> The log message "not highest priority" comes from your local client, not from the server. It didn't even ask the server for work - so it won't get any reply relating to work availability.
(Thanks for the response)
2/17/2016 2:09:38 PM | MindModeling@Beta | Not requesting tasks: don't need (CPU: not highest priority project; NVIDIA GPU: )
This, I understand (guess I forgot, my memory is decaying), comes from the REC calcs and prio-weighted sorting.
In this case the [work_fetch] share 0.519 was the highest of any in the entire list by 50%. But of the prio numbers, NFS was at something like -373 while MM was at around -3.5, so work fetch share was irrelevant if the project can't beat the most negative prio.
The NFS project is somehow getting outrageous prios and I do not know how to counteract it.
Here is the current situation:
2/22/2016 11:37:55 AM | MindModeling@Beta | [work_fetch] share 0.324
. . .
2/22/2016 11:37:55 AM | NFS@Home | [work_fetch] share 0.001
2/22/2016 11:37:55 AM | MindModeling@Beta | [work_fetch] REC 3509.462 prio -1.989 can request work
. . .
2/22/2016 11:37:55 AM | NFS@Home | [work_fetch] REC 1949.634 prio -263.383 can request work
Resource shares assigned:
MindModeling = 240
NFS = 001
There are open cores waiting for work and sitting idle because the client decides MindModeling is not the highest priority unless manually updated.
MindModeling rarely gives any work, maybe 20,000 WUs once a week. NFS has been sending a steady stream and this machine has been doing NFS WUs for 3 weeks steadily. In order to get any work on MM I have to manually update every few minutes to get past the NFS lock on work.
So is there a bug if the project's prio is about 2 to 3 orders of magnitude greater (abs()) than it should be?
NFS prio should be down around -2.63 to -0.263 as it's already at 45% of maximum potential RAC, while MindModeling reports -1.9 prio and is at 3-7% of potential RAC from days on end of no work.
How does the NFS REC get such a high-magnitude prio of -263 when its credit payout is NOT 150 times greater than MindModeling's? The expected credit per WU is about 3x greater, as is the expected RAC over MindModeling on this particular machine.
BTW, I tried <rec_half_life_days>30</rec_half_life_days> over the last week and it hasn't had any noticeable effect.
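
The two logged prio values above look consistent with a simple ratio rule. As a rough sketch only, assuming (nothing in this thread confirms it) that the client computes prio as minus a project's fraction of total REC divided by its fraction of total resource share, the unknown totals cancel and the NFS value can be estimated from the MindModeling line alone. `estimate_prio` is a hypothetical helper written for this illustration, not BOINC code; the REC, prio and 240:1 resource share figures are the ones quoted in the post.

```python
def estimate_prio(prio_ref, rec_ref, share_ref, rec, share):
    """Estimate a project's prio from a reference project's logged values.

    Assumption (not taken from BOINC source): prio is
    -(REC fraction) / (resource share fraction), so the totals cancel
    when two projects on the same client are compared.
    """
    return prio_ref * (rec / rec_ref) * (share_ref / share)


# Logged values from the post: MindModeling@Beta is the reference project.
mm_prio, mm_rec, mm_share = -1.989, 3509.462, 240
nfs_rec, nfs_share = 1949.634, 1

nfs_estimate = estimate_prio(mm_prio, mm_rec, mm_share, nfs_rec, nfs_share)
print(f"estimated NFS prio: {nfs_estimate:.1f} (logged: -263.383)")
```

Under that assumption the estimate comes out at about -265, within roughly 1% of the logged -263.383, which would mean the large magnitude is driven almost entirely by the 240:1 resource share ratio and not by credit payout. The real client may apply further adjustments that this sketch ignores.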

Richard Haselgrove Volunteer tester Help desk expert Joined: 5 Oct 06 Posts: 5081
Message 67969 - Posted: 22 Feb 2016, 19:51:25 UTC - in response to Message 67967.
> So is there a bug if the project's prio is about 2 to 3 orders of magnitude greater (abs()) than it should be? NFS prio should be down around -2.63 to -0.263 as it's already at 45% of maximum potential RAC, while MindModeling reports -1.9 prio and is at 3-7% of potential RAC from days on end of no work. How does the NFS REC get such a high-magnitude prio of -263 when its credit payout is NOT 150 times greater than MindModeling's? The expected credit per WU is about 3x greater, as is the expected RAC over MindModeling on this particular machine.
Prio is not abs() - it's straight maths. The highest possible priority is zero, everything negative is lower. And -263.383 is very much lower indeed.
All of this is done on REC, not RAC - the project's actual payout is ignored, instead it's done on something a lot closer to the academic definition of a cobblestone.
If MindModeling rarely has work, the chances of hitting it with a random fetch are very low. So most requests will get a response of 'no work available', and will trigger one of the many backoffs - this one per project, per resource. That appears lower down the <work_fetch_debug> log, and looks like
22/02/2016 19:49:52 | FiND@Home | [work_fetch] share 0.000 no applications (resource backoff: 740.17, inc 600.00)
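
For anyone scanning a long <work_fetch_debug> log for these backoff entries, a small parser can pull them out. This is a minimal sketch that assumes every line follows the "timestamp | project | message" layout of the example above; `parse_backoff` and `BACKOFF_RE` are hypothetical names for this illustration and are not part of BOINC.

```python
import re

# Matches the trailing "(resource backoff: X, inc Y)" part of a [work_fetch] line.
BACKOFF_RE = re.compile(r"\(resource backoff: ([\d.]+), inc ([\d.]+)\)")


def parse_backoff(line):
    """Return (project, backoff, inc) for a backoff line, else None."""
    # Tolerate pipes that were escaped as "\|" when the log was pasted.
    fields = [f.strip() for f in line.replace("\\|", "|").split("|", 2)]
    if len(fields) != 3:
        return None
    _timestamp, project, message = fields
    match = BACKOFF_RE.search(message)
    if match is None:
        return None
    return project, float(match.group(1)), float(match.group(2))


line = ("22/02/2016 19:49:52 | FiND@Home | [work_fetch] share 0.000 "
        "no applications (resource backoff: 740.17, inc 600.00)")
result = parse_backoff(line)
print(result)
```

Lines without a backoff clause, or without the three pipe-separated fields, return None, so the function can be applied to every line of a log and the None results filtered out.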

marmot Joined: 16 Sep 13 Posts: 82
Message 67980 - Posted: 23 Feb 2016, 23:52:10 UTC - in response to Message 67969.
>> So is there a bug if the project's prio is about 2 to 3 orders of magnitude greater (abs()) than it should be? NFS prio should be down around -2.63 to -0.263 as it's already at 45% of maximum potential RAC, while MindModeling reports -1.9 prio and is at 3-7% of potential RAC from days on end of no work. How does the NFS REC get such a high-magnitude prio of -263 when its credit payout is NOT 150 times greater than MindModeling's? The expected credit per WU is about 3x greater, as is the expected RAC over MindModeling on this particular machine.
> Prio is not abs()
That was to indicate the absolute magnitude of the value -263; it has nothing to do with how it's calculated, which has yet to be mentioned. abs(-263) is equivalent to |-263| = 263. My degrees are in mathematics and physics, but most on here seem to have computer science and engineering backgrounds, so I used the function call notation.
> - it's straight maths. The highest possible priority is zero, everything negative is lower. And -263.383 is very much lower indeed.
And so is it a bug that that |value| is so large? That's the crux of my last post.
The 'highest' possible priority mathematically is technically 0, which is greater than all negative numbers, but the project that is given highest priority over all others when a work fetch is made is the one with the lowest negative value prio, as shown by how NFS gets priority over all other projects.
> If MindModeling rarely has work, the chances of hitting it with a random fetch are very low. So most requests will get a response of 'no work available', and will trigger one of the many backoffs -
MindModeling immediately cancels backoffs and work is received from regular manual attempts.
2/23/2016 5:31:37 PM | | [work_fetch] No project chosen for work fetch
2/23/2016 5:31:39 PM | | [work_fetch] Request work fetch: Backoff ended for MindModeling@Beta
Backoffs may be the norm but are irrelevant in this situation. Actually, I've noticed other projects that also turn off the backoff feature.
1) So why is NFS dominating at -263 prio?
2) Can a user turn off the REC feature or make adjustments so that work flow is more suited to their needs?
I posted a link over at NFS in hopes that we can figure out why this REC for NFS is so much stronger in magnitude than other projects.

Richard Haselgrove Volunteer tester Help desk expert Joined: 5 Oct 06 Posts: 5081
Message 67981 - Posted: 24 Feb 2016, 0:04:07 UTC - in response to Message 67980.
Too late in my time zone for a full discussion, but my understanding is that work is requested for a resource (CPU or GPU) from the project which is 'highest' priority (in the 'closest below zero' sense) for which work fetch is currently allowed.
"currently allowed" covers a multitude of sins:
Recent scheduler contact, project requested backoff --> disallowed
NNT set by user --> disallowed
Project has no apps for resource --> disallowed
User deselected resource in project preferences --> disallowed
Recent work request received 'no work available' response --> disallowed
The last line is the killer, and 'recent' can be up to 24 hours ago.
It might be helpful if you could post one single, complete [work_fetch_debug] log cycle from your machine, and we could work through it together, project by project, and work out why BOINC didn't request work until it reached NFS.
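
The selection rule described above can be sketched in a few lines. This is an illustration of the rule as stated in the post, not the BOINC client's actual code: the `Project` class, its `blockers` field and `choose_project` are all hypothetical names, and the blocker attached to MindModeling is an assumed example.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    prio: float  # 0 is the best possible value; more negative means lower priority
    blockers: list = field(default_factory=list)  # reasons work fetch is disallowed


def choose_project(projects):
    """Pick the allowed project whose prio is closest below zero, or None."""
    allowed = [p for p in projects if not p.blockers]
    if not allowed:
        return None
    return max(allowed, key=lambda p: p.prio)


# prio values from the logs earlier in the thread; the blocker is assumed.
projects = [
    Project("MindModeling@Beta", -1.989, ["recent 'no work available' response"]),
    Project("NFS@Home", -263.383),
]
chosen = choose_project(projects)
print(chosen.name if chosen else "No project chosen for work fetch")
```

With MindModeling blocked, the only allowed project is NFS, so it is chosen despite a far lower prio. That is one way a project at -263 could keep receiving requests while a project at -1.989 does not.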

marmot Joined: 16 Sep 13 Posts: 82
Message 68098 - Posted: 3 Mar 2016, 5:39:32 UTC - in response to Message 67981.
> Recent work request received 'no work available' response --> disallowed
> The last line is the killer, and 'recent' can be up to 24 hours ago.
Ah, that is a piece of relevant info I did not have.
This is certainly part of the reason MindModeling was losing to NFS, as the server was sending out only small batches of WUs at a time and hitting 0 work available every few minutes.
Why vLHC was failing is still baffling, but instead of spending more time trying to figure out the issue I moved NFS to a couple of single-core VMs and halted any other work for it on BOINC clients with shared projects, and things appear to be running smoothly, for now.
> It might be helpful if you could post one single, complete [work_fetch_debug] log cycle from your machine, and we could work through it together, project by project, and work out why BOINC didn't request work until it reached NFS.
If the problem arises again, will do. Your help is appreciated and thank you for the responses.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.