Why doesn't Boinc schedule earlier deadlines first?

Author	Message
Keith Myers Volunteer tester Help desk expert Send message Joined: 17 Nov 16 Posts: 868	Message 95068 - Posted: 15 Jan 2020, 0:52:58 UTC - in response to Message 94963. That's one specific failure by a project developer that should have known better, not a general rule. In fact, I think the project in question may have lost the ability to write their own applications (staff turnover), and bought in a replacement app not originally designed to run under BOINC (they're now using the wrapper). @Richard, I don't know who wrote the new acemd3 app at GPUGrid. Thought it was Toni. Ostensibly, the reason given for the new app was to get out from under the yearly loss of license for the underlying application and failure to renew the license in time before the expiration. That always caused an upset in keeping tasks running. Now with the wrapper app they don't have to worry about a constant renewal of software licenses. Don't know whether the wrapper app prevents the task from restarting on a different type of card or whether the fault is the science app. Can other projects deploying the wrapper app successfully resume paused tasks on restart on different hardware? ID: 95068 ·

Richard Haselgrove Volunteer tester Help desk expert Send message Joined: 5 Oct 06 Posts: 5081	Message 95107 - Posted: 15 Jan 2020, 8:21:50 UTC - in response to Message 95068. That's one specific failure by a project developer that should have known better, not a general rule. In fact, I think the project in question may have lost the ability to write their own applications (staff turnover), and bought in a replacement app not originally designed to run under BOINC (they're now using the wrapper). @Richard, I don't know who wrote the new acemd3 app at GPUGrid. Thought it was Toni. Ostensibly, the reason given for the new app was to get out from under the yearly loss of license for the underlying application and failure to renew the license in time before the expiration. That always caused an upset in keeping tasks running. Now with the wrapper app they don't have to worry about a constant renewal of software licenses. Don't know whether the wrapper app prevents the task from restarting on a different type of card or whether the fault is the science app. Can other projects deploying the wrapper app successfully resume paused tasks on restart on different hardware? Multiple issues, pointing in multiple different directions, here. Seems I over-simplified them. 1) The licence. It needed renewing periodically, and the project failed to do that in a timely fashion. Buy a diary! But more seriously, it shows that they were using somebody else's software, and - presumably - paying them money for the privilege. Their software isn't home-written by the project. There are different ways of licensing external software - source code, precompiled library - and they may have switched models. I don't know. 2) Can't switch devices. We looked into that. The new Science app compiles CUDA code at startup, for the specific device it's running on. We could work round that, by forcing re-compile at restart. 
But the new app also bakes the device name into the checkpoint (restart data) file: that's the blocker. 3) Using the wrapper. Nothing to do with science and devices - the wrapper handles the communications between the science app and the BOINC client. Previously, this was done by linking the communications tool - the BOINC API library - into the science app at compile time, and calling the API functions directly when needed. That requires in-house code-level knowledge of BOINC (previously held by MJH, but I think he's left the project). The wrapper is a kludge for people who don't have access to the application source code, or who can't/won't learn how to program the API. Those are the separate straws in the wind. I hope that clarifies the separate indicators that led me to my conclusion. ID: 95107 ·

ProDigit Send message Joined: 8 Nov 19 Posts: 718	Message 95132 - Posted: 15 Jan 2020, 15:47:57 UTC - in response to Message 94964. Running a torrent download is VERY, VERY, VERY, VERY much simpler than performing the sort of calculations that most projects do. A torrent download is simply a bit string split into small segments, each segment has a unique identifier which includes a sequence number. Get all the bits and bolt them together in the right order and the job's done. Now most torrent tools set up an array for all the bits and populate it in sequence order as they go along, keeping a "vacant slots" array alongside so it knows what segments are missing, and as progress gets nearer completion so it gets faster to search this array, and so apparently raise the priority of the task. Now think about a science application, there are some pretty simple calculations, like an FFT, then some pattern matching and maybe some 2d-pattern recognition, 3d-shape matching, a bit of matrix rotation and inversion, result collation etc. And that is all being done on apparently random incoming data and returns a "sensible" result. All this has to be done in the correct sequence, and the input from one stage is the output from a previous stage (or several previous stages). Changing the "priority" during such a run is a very good way of getting corruption in the data due to loss of synchronisation in processes. Even a humble single-core application may actually be running several threads which must be correctly synchronised to get the correct answer, and one thing changing priority can do is to upset the exact timing and sequencing. We're not talking about that. We're talking about priority ranking on nearly completed tasks. ID: 95132 ·

robsmith Volunteer tester Help desk expert Send message Joined: 25 May 09 Posts: 1283	Message 95134 - Posted: 15 Jan 2020, 15:55:18 UTC Stop changing your mind about what is being talked about - YOU raised the subject of Torrent priorities, and when someone responds explaining why one cannot compare Torrent priorities with the priorities used by other types of application YOU object. ID: 95134 ·

robsmith Volunteer tester Help desk expert Send message Joined: 25 May 09 Posts: 1283	Message 95136 - Posted: 15 Jan 2020, 16:02:46 UTC - in response to Message 95056. And as such has nothing to do with BOINC, but might have a lot to do with each and every project that uses computers to do calculations of any sort. Remember BOINC does none of the science work, it only provides an environment in which some science may be done. Agreed. Therefore it's on topic. Well, you've talked yourself into the self-contradictory stance - as it has nothing to do with BOINC then it should NOT be being discussed here, but on the PROJECT forum. ID: 95136 ·

Keith Myers Volunteer tester Help desk expert Send message Joined: 17 Nov 16 Posts: 868	Message 95144 - Posted: 15 Jan 2020, 17:59:51 UTC - in response to Message 95107. Multiple issues, pointing in multiple different directions, here. Seems I over-simplified them. 1) The licence. It needed renewing periodically, and the project failed to do that in a timely fashion. Buy a diary! But more seriously, it shows that they were using somebody else's software, and - presumably - paying them money for the privilege. Their software isn't home-written by the project. There are different ways of licensing external software - source code, precompiled library - and they may have switched models. I don't know. 2) Can't switch devices. We looked into that. The new Science app compiles CUDA code at startup, for the specific device it's running on. We could work round that, by forcing re-compile at restart. But the new app also bakes the device name into the checkpoint (restart data) file: that's the blocker. 3) Using the wrapper. Nothing to do with science and devices - the wrapper handles the communications between the science app and the BOINC client. Previously, this was done by linking the communications tool - the BOINC API library - into the science app at compile time, and calling the API functions directly when needed. That requires in-house code-level knowledge of BOINC (previously held by MJH, but I think he's left the project). The wrapper is a kludge for people who don't have access to the application source code, or who can't/won't learn how to program the API. Those are the separate straws in the wind. I hope that clarifies the separate indicators that led me to my conclusion. Thanks for the clarification Richard. Yes I remember now the discussion on the forum about the science app compiling the CUDA code at startup. Didn't remember the device name blocker though. The software that GPUGrid uses is licensed from Acellera. Hence the acemd application name. 
They had to keep renewing the Windows license through that company. https://www.acellera.com/products/molecular-dynamics-software-gpu-acemd/ Strange thing was/is Gianni De Fabritiis is the CEO of the company and one of the principals of GPUGrid. As you said, couldn't somebody have put a license renewal date into a calendar or business planner? ID: 95144 ·

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15480	Message 95155 - Posted: 15 Jan 2020, 21:42:26 UTC 👁️ ID: 95155 ·

robsmith Volunteer tester Help desk expert Send message Joined: 25 May 09 Posts: 1283	Message 95156 - Posted: 15 Jan 2020, 21:57:13 UTC And the fact that what was implied, or said, is totally and utterly wrong? That shows that you do not understand what BOINC is about, and what it can and cannot do, or what it should, or should not do. If you want to change BOINC I suggest that you join the BOINC steering group, properly explain to them your ideas, and listen to their explanation, and don't be surprised if you are grilled on your ideas. Some may join the very long list of possible enhancements, and some may be rejected, such is the way of things.... ID: 95156 ·

ProDigit Send message Joined: 8 Nov 19 Posts: 718	Message 95162 - Posted: 15 Jan 2020, 22:45:29 UTC - in response to Message 95158. Last modified: 15 Jan 2020, 22:50:24 UTC And the fact that what was implied, or said, is totally and utterly wrong? That shows that you do not understand what BOINC is about, and what it can and cannot do, or what it should, or should not do. What's incorrect about saying that Boinc prioritizes tasks (as in which one gets to go first)? If you want to change BOINC I suggest that you join the BOINC steering group, properly explain to them your ideas, and listen to their explanation, and don't be surprised if you are grilled on your ideas. Some may join the very long list of possible enhancements, and some may be rejected, such is the way of things.... Best for us novices to discuss it in here, then someone technical like Richard can go ask :-) Jord, is that a red eye from drink or anger? Probably because someone got offended because he was totally off topic, and couldn't deal with it, and had to go all alpha on the forums. I wasn't impolite, just stating that whatever he talked about had nothing to do with the topic at hand. Obviously people like you (Peter), who read posts, will understand it. I doubt Mr CAPS read any of what this topic is about. But back on topic, Task priorities! In a scenario where a device is frequently turned off, finishing the task that is nearly finished should take priority over starting any other task (even if its deadline is very soon). Otherwise you'll probably run into situations where both tasks expire if the device is turned off for a few days or weeks, rather than losing only one. ID: 95162 ·

robsmith Volunteer tester Help desk expert Send message Joined: 25 May 09 Posts: 1283	Message 95163 - Posted: 15 Jan 2020, 22:51:13 UTC That is not what was said or implied by prodigit's post about torrents -- he said they change priority during operation, which is correct for some torrent tools, and that BOINC should do the same. Actually that is a bit of a simplistic view, in that the performance of a large torrent download appears to increase as it gets nearer completion, and there is a real change as the segment location and place routine has less work to do as the torrent gets towards the end (fewer empty slots to fill). Either of these may appear to be a change in priority. This is very different to the way BOINC operates. In "normal" mode there is a fixed sequence of tasks, first-in-first-out, which applies to both task-swap and task end. There is a "panic" mode, which is that if BOINC notices that a task that isn't running is probably not going to finish by its deadline, it is promoted up the queue and gets run at either the next available task swap, or when a task completes. ID: 95163 ·
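
To make the "normal" versus "panic" description above concrete, here is a minimal Python sketch. It is not BOINC's actual scheduler code: the Task fields, the single-CPU assumption and the one-pass risk check are all assumptions made purely for illustration. Tasks run first-in-first-out unless waiting their turn would make them miss the deadline, in which case they jump the queue.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    arrival: int            # order in which the task was received (hypothetical field)
    remaining_hours: float  # estimated run time still needed
    deadline_hours: float   # hours from now until the deadline


def at_risk(task, hours_queued_ahead):
    """True if the task would miss its deadline by waiting its FIFO turn."""
    return hours_queued_ahead + task.remaining_hours > task.deadline_hours


def run_order(tasks):
    """Return tasks in run order: promoted ("panic") tasks first, then FIFO."""
    fifo = sorted(tasks, key=lambda t: t.arrival)
    promoted, normal = [], []
    waited = 0.0  # hours of work queued ahead of the current task in FIFO order
    for t in fifo:
        (promoted if at_risk(t, waited) else normal).append(t)
        waited += t.remaining_hours
    # Among promoted tasks, the earliest deadline goes first (an assumption).
    promoted.sort(key=lambda t: t.deadline_hours)
    return promoted + normal


tasks = [
    Task("A", arrival=0, remaining_hours=10, deadline_hours=100),
    Task("B", arrival=1, remaining_hours=10, deadline_hours=100),
    Task("C", arrival=2, remaining_hours=5, deadline_hours=12),
]
print([t.name for t in run_order(tasks)])  # ['C', 'A', 'B']
```

Task C arrived last, but with 20 hours of work queued ahead of it and only 12 hours to its deadline it is the one that gets promoted. The sketch does not re-check A and B after the promotion, which a real scheduler would have to do.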

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15480	Message 95166 - Posted: 15 Jan 2020, 23:13:10 UTC - in response to Message 95158. Last modified: 15 Jan 2020, 23:19:29 UTC Jord, is that a red eye from drink or anger? That's me warning everyone that I am keeping an eye on things. But now no more, behave all. 😴 ID: 95166 ·

ProDigit Send message Joined: 8 Nov 19 Posts: 718	Message 95167 - Posted: 15 Jan 2020, 23:20:17 UTC - in response to Message 95166. Last modified: 15 Jan 2020, 23:20:35 UTC Jord, is that a red eye from drink or anger? That's me warning everyone that I am keeping an eye on things. Does that mean we are getting the task priorities sorted out? ;) ID: 95167 ·

robsmith Volunteer tester Help desk expert Send message Joined: 25 May 09 Posts: 1283	Message 95187 - Posted: 16 Jan 2020, 9:25:22 UTC What you are talking about is the "panic" mode - where a task that is in danger of not being completed in time gets promoted up the queue - and this was already mentioned in one of my posts last night. There is a known downside to this - if a project sends out tasks with gross under-estimates of its "FLOP count" then its guessed run-time will be too short. Couple that with a very close "return by" date, while you have other tasks from other projects with reasonably accurate expected run-times, and BOINC can get very confused - particularly if another project does much the same thing at the same time. This is very much an issue with the project(s) concerned, they should be more realistic in both setting the target dates and the FLOPS count. Then there is the situation where a project sends out work with impossible run-time and return-date combinations - sending a task that will take 3 days, but with a deadline in two days. No amount of queue jumping will find that extra day :-( ID: 95187 ·
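
The arithmetic behind the two failure cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the 1 GFLOPS device speed are made up, and the run-time guess is taken to be simply the project's FLOP count divided by the device speed.

```python
SECONDS_PER_DAY = 86_400


def estimated_runtime_seconds(fpops_estimate, device_flops):
    """Guessed run time: the project's FLOP count divided by device speed."""
    return fpops_estimate / device_flops


def can_meet_deadline(fpops_estimate, device_flops, seconds_until_deadline):
    """True if the guessed run time fits before the deadline."""
    runtime = estimated_runtime_seconds(fpops_estimate, device_flops)
    return runtime <= seconds_until_deadline


device_flops = 1e9  # hypothetical device sustaining 1 GFLOPS
true_fpops = 3 * SECONDS_PER_DAY * device_flops  # really needs 3 days here

# Impossible combination: a 3-day task with a 2-day deadline.
print(can_meet_deadline(true_fpops, device_flops, 2 * SECONDS_PER_DAY))  # False

# Gross under-estimate: the project claims a tenth of the real FLOP count,
# so the same task looks as if it fits comfortably.
print(can_meet_deadline(true_fpops / 10, device_flops, 2 * SECONDS_PER_DAY))  # True
```

In the second case the client has no reason to promote the task until it is already too late, which is the confusion described above.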

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.