Bunch 'o Noob questions
Author | Message |
---|---|
Joined: 27 Jul 12 Posts: 4 |
All, I am trying to get my organization to donate its spare cloud (Microsoft Azure) processing power to BOINC. Due to the nature of how we use these instances, a lot of the processing power (possibly hundreds of cores at a time) sits running but unused.
The problem is that these cloud servers get 'deleted' fairly often, and on schedule. The longest a cloud server will run is about 4-5 days, with shutdowns at night. The shorter ones run a day or less. When a cloud server is deleted, presumably any work done on BOINC that has not been reported is lost, and any work queued on it is never going to be done, so it should be assigned to someone else.
So, a few questions:
- What active projects have numerous 'short' tasks, so it is less likely that a deletion will cause the loss of a lot of work?
- Can a single task use more than one core to complete faster (so, fewer half-done abandoned tasks)?
- What happens to a queued task on a system that goes offline permanently? Does BOINC detect that it has been abandoned and re-assign it after a certain period?
- Can I configure BOINC to report immediately on completion of a task to minimize the odds of its loss due to a cloud server deletion?
- Is there a way to minimize the size of the pending task queue or eliminate it?
- Is this kind of use appropriate for BOINC, or does it really just screw things up?
-->Adam |
Joined: 29 Aug 05 Posts: 15484 |
- What active projects have numerous 'short' tasks, so it is less likely that a deletion will cause the loss of a lot of work?
I think you're looking at it from the wrong side here. With short tasks, your cache normally holds more of them than it would with longer-running tasks. But in essence, one project that has small tasks is PrimeGrid.
- Can a single task use more than one core to complete faster (so, fewer half-done abandoned tasks)?
Only if the project has multi-threaded applications, as Milkyway does. Otherwise it's one task per CPU core.
- What happens to a queued task on a system that goes offline permanently? Does Boinc detect that they have been abandoned and re-assign them after a certain period?
The tasks will pass their deadline and time out, then they'll be resent to other computers.
- Can I configure Boinc to report immediately on completion of a task to minimize the odds of its loss due to a cloud server deletion?
Depends on the BOINC version. If you use BOINC 5 or 6, you can use the <report_results_immediately> option in a cc_config.xml file. Under BOINC 7.0 this has changed: the <report_results_immediately> (RRI) option may be ignored, and the normal rules for reporting used instead.
- Is there a way to minimize the size of the pending task queue or eliminate it?
The best way to run on a computer that gets deleted a lot is to run a minimal cache (0 days minimum + 0 days maximum buffer). Then any finished work gets reported immediately anyway.
- Is this kind of use appropriate for Boinc, or does it really just screw things up?
It's possible to use BOINC this way. However, you may get angry PMs from people who think you lose a lot of work. Some people think this stuff is really important and anyone doing it differently than they do must be challenged or flogged. ;-) |
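As a rough illustration of the two settings mentioned above, here is a minimal Python sketch that builds the text of a cc_config.xml with <report_results_immediately> enabled and a preferences override with a zero-day work buffer. The function names are made up for this example. The global_prefs_override.xml file name and the work_buf_min_days / work_buf_additional_days element names are assumptions based on common BOINC client configuration, so check them against the documentation for your client version. The script only prints the XML, since the BOINC data directory differs between installations.

```python
import xml.etree.ElementTree as ET


def build_cc_config() -> str:
    """Return cc_config.xml text asking the client to report finished tasks at once.

    Per the thread, BOINC 7.0 may ignore this option and use its normal rules.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "report_results_immediately").text = "1"
    return ET.tostring(root, encoding="unicode")


def build_prefs_override(min_days: float = 0.0, extra_days: float = 0.0) -> str:
    """Return text for a preferences override with a minimal work buffer.

    Assumption: the element names below are the ones the client reads from
    global_prefs_override.xml; verify before deploying.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = str(min_days)
    ET.SubElement(root, "work_buf_additional_days").text = str(extra_days)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print both fragments so they can be reviewed before being copied
    # into the BOINC data directory of a cloud instance image.
    print(build_cc_config())
    print(build_prefs_override())
```

Baking files like these into the instance image would mean every new cloud server starts with the same small-cache behaviour, without anyone having to click through the BOINC Manager.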
Joined: 27 Jul 12 Posts: 4 |
Many thanks! That is very helpful.
Are there any other 'small task size' projects? Under 6 hours would be good. Under 1 hour would enable us to use the WBT-training instances as well (those will be created and deleted more often, in hours instead of days, so for now I just assume they won't work).
Unfortunately, I think my corporation would rather go for a 'real' charity (medical research, scientific advancement) than PrimeGrid, which, if I understand correctly, is just finding really big prime numbers. Although interesting, it's tough to sell as having a significant benefit to humanity.
My logic behind short tasks was that each time a server is deleted, some work done on the server will be lost. On average, we will lose half a task of work per CPU. If we reduce the size of a task, less work will be lost. If tasks are really big (over 10 hours), the server may go down before even completing one, and no work gets done.
-->Adam |
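The reasoning about task length and lost work can be sketched numerically. The following Python example is only a toy model, not anything BOINC provides: it assumes one task per core, tasks run back to back from the moment the instance starts, no checkpoint survives the deletion, and every finished task is reported before the server goes away. The function names and the default of 4 cores are illustrative.

```python
def expected_loss_core_hours(task_hours: float, cores: int = 4) -> float:
    """Average loss if the deletion lands at a random point inside a task.

    This is the 'half a task per CPU' estimate from the post above.
    """
    return cores * task_hours / 2


def work_summary(lifetime_hours: float, task_hours: float, cores: int = 4) -> dict:
    """Useful vs. lost core-hours for one instance with a known lifetime."""
    if lifetime_hours < 0 or task_hours <= 0 or cores <= 0:
        raise ValueError("lifetime must be >= 0; task length and cores must be > 0")
    # Whole tasks each core finishes before the instance is deleted.
    completed_per_core = int(lifetime_hours // task_hours)
    useful = completed_per_core * task_hours * cores
    # Everything spent on the unfinished task is thrown away.
    lost = lifetime_hours * cores - useful
    return {
        "completed_tasks": completed_per_core * cores,
        "useful_core_hours": useful,
        "lost_core_hours": lost,
    }


if __name__ == "__main__":
    for task_hours in (1, 6, 10):
        print(task_hours, work_summary(lifetime_hours=8, task_hours=task_hours))
```

With an 8-hour instance, this model shows 10-hour tasks producing no useful work at all, while 1-hour tasks lose nothing, which matches the intuition that shorter tasks suit short-lived servers.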
Joined: 29 Aug 05 Posts: 15484 |
Are there any other 'small task size' projects? Under 6 hours would be good.
That depends heavily on the processor, really. Or the GPU, if there is one. Mind giving out that information? Then it's easier to advise on which project(s) to run. |
Joined: 27 Jul 12 Posts: 4 |
These are Microsoft Azure (or another cloud vendor) instances. They rent us a 4-core/7 GB RAM instance, but don't provide details on what those CPUs are. They are supposed to be 1.6 GHz class or better cores. I am assuming the GPU will be weak, as these are intended as server instances. The OS is Windows Server 2008 R2.
I did find this post from someone who ran CPU-Z on an Azure cloud instance, which indicates a 2.1 GHz AMD Opteron core: http://coderead.wordpress.com/2011/11/29/cpu-z-on-an-azure-compute-instance/
However:
- There is no guarantee that the CPUs are consistent.
- These instances are virtualized, and the real CPU may be obfuscated.
- It was the Dublin datacenter; we will primarily use a North America datacenter.
- The poster used one of their 'small' instances, which is single-core. We would buy 'large' instances, which are quad-core.
-->Adam |
Joined: 29 Aug 05 Posts: 15484 |
OK, that gives a ballpark. Tasks under 6 hours can be found on just about every project out there, except for CPDN and some of the render projects. That said, you'll still have to try them, really. There's no real guarantee that an estimated 6-hour task will take 6 hours on a 1.6 GHz CPU. They can take longer, they can take shorter. It depends on the project and the application (optimization, whether it is geared towards using MMX/SSE/SSE2/SSE3 etc. or not). |
Joined: 27 Jul 12 Posts: 4 |
Many thanks. That is very helpful, and I think I can get this to work. -->Adam |
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.