Message boards : API : Can close-coupled workflow type application be deployed on a BOINC server?
Author | Message |
---|---|
Send message Joined: 13 Dec 13 Posts: 21 |
We have an earth science workflow that can be divided into about ten sections. The sections are interdependent: for instance, Section C takes the outputs of Sections A and B as its input. Each section takes roughly 10 to 90 minutes to process on a regular workstation. Would this kind of project be efficient, or even possible, to deploy on a BOINC server? As I understand it, BOINC is most useful for loosely coupled computing tasks, i.e., the computing units should be largely independent of each other and each unit should be very lightweight. |
Send message Joined: 4 Jul 12 Posts: 321 |
This is exactly what BOINC can do. Each of your sections would be a separate BOINC application. You start by generating work for Sections A and B (which are independent) and then generate work for C using the output from A and B. For each application you have to write a work generator, a validator and an assimilator. The assimilators of Applications A and B would put the output data somewhere the work generator of Application C can pick it up. A computer scientist from St. Jude Children's Research Hospital (Memphis, Tennessee) built a Python-based workflow tool for this use case. I don't have a contact, so you should ask on the boinc_projects mailing list whether he is still around. |
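The ordering described above (generate work for A and B first, then for C once both outputs exist) can be sketched as a small dependency schedule. This is only an illustration: the `DEPENDS_ON` map and the `generation_waves` helper are hypothetical names, only the "C needs A and B" relationship comes from the thread, and no BOINC API is called. It uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each section lists the sections whose
# output it needs. Only "C needs A and B" is taken from the thread.
DEPENDS_ON = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
}


def generation_waves(depends_on):
    """Group sections into waves; work for all sections in one wave
    can be generated at the same time."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # In a real project this is where the assimilators would have
        # stored the outputs; here the sections are simply marked done.
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    print(generation_waves(DEPENDS_ON))
```

Each wave corresponds to a set of applications whose work generators can run once the previous wave has been assimilated.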
Send message Joined: 13 Dec 13 Posts: 21 |
Cool. Thanks! |
Send message Joined: 13 Dec 13 Posts: 21 |
Hi Christian, I am pretty new to BOINC. Thank you for your feedback. In our case, we have large data sets for each section of the workflow. For instance, a single output file of Section A could be as large as 3GB, and the processing time could be more than an hour. We cannot split each section into smaller pieces since we don't have the source code. Do you have any further suggestions for deploying the project more efficiently? |
Send message Joined: 4 Jul 12 Posts: 321 |
What do you mean by "as large as"? Is it normally smaller? Is this compressed, or can you reduce the size using compression? A 3GB result file is very bad for volunteer computing, as the volunteer might not have a good upload connection. So this seems unsuitable for volunteer computing. Maybe you have to compute this step on your own and only parallelize the other sections. Volunteer computing is only efficient when you can split the work into small parts. Big input files can also be handled, but not big result files. |
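To put a rough number on the upload concern, here is a back-of-the-envelope sketch. The upload rates are illustrative assumptions, not measurements of any volunteer population, and the calculation ignores protocol overhead and compression. The `upload_hours` function is a hypothetical helper, and 1 GB is taken as 10^9 bytes.

```python
def upload_hours(size_gb, upload_mbit_per_s):
    """Hours needed to upload size_gb gigabytes (10**9 bytes each)
    over a link with the given upload rate in Mbit/s."""
    size_bits = size_gb * 1e9 * 8
    seconds = size_bits / (upload_mbit_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # Assumed example rates, from a slow home uplink to gigabit LAN.
    for rate in (1, 10, 100, 1000):
        print(f"{rate:>5} Mbit/s: {upload_hours(3, rate):.2f} h")
```

Under these assumptions a 3GB result takes several hours on a 1 Mbit/s uplink but well under a minute on a gigabit link.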
Send message Joined: 13 Dec 13 Posts: 21 |
Thanks Christian. By 'as large as', I mean a typical output file size is ~3GB. As you said, that is not good for an average volunteer computing environment. |
Send message Joined: 20 Nov 12 Posts: 801 |
Do you need the output from each part or just the last one? If just the last one, couldn't you send one large task that consists of all ten parts? This way the large output files would just be temporary files that the host can remove after the task is completed. If each part takes about an hour (or two or ten...), the total runtime would still be perfectly reasonable. |
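A minimal sketch of that idea: run all parts inside one task, keep the intermediate files in a scratch directory, and retain only the final output. Everything here is hypothetical (`run_chained`, `make_writer`, the stage names, and the tiny text files standing in for multi-gigabyte outputs). In a real deployment each stage would presumably launch one of the closed-source executables instead of a Python function.

```python
import shutil
import tempfile
from pathlib import Path


def run_chained(stages, final_output):
    """Run stages in order in a scratch directory; keep only the last output.

    `stages` is a list of (name, input_names, func) tuples, where
    func(input_paths, output_path) must write output_path.
    """
    with tempfile.TemporaryDirectory() as scratch:
        outputs = {}
        for name, input_names, func in stages:
            out = Path(scratch) / f"{name}.out"
            func([outputs[i] for i in input_names], out)
            outputs[name] = out
        # Move only the final result out of the scratch directory; all
        # intermediate files are deleted when the with-block exits.
        shutil.move(str(outputs[stages[-1][0]]), str(final_output))
    return Path(final_output)


def make_writer(text):
    """Build a stand-in stage that concatenates its inputs and appends text."""
    def stage(input_paths, output_path):
        merged = "".join(p.read_text() for p in input_paths)
        output_path.write_text(merged + text)
    return stage


if __name__ == "__main__":
    stages = [
        ("A", [], make_writer("a")),
        ("B", [], make_writer("b")),
        ("C", ["A", "B"], make_writer("c")),
    ]
    print(run_chained(stages, "final.out").read_text())
```

The trade-off is that the host needs enough local disk for the intermediate files while the task runs, but nothing large has to be uploaded unless the final result itself is large.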
Send message Joined: 5 Oct 06 Posts: 5130 |
I think the biggest problem is those 3GB upload files. This sort of project might work well in a closed community like a university campus, where you can rely on all users having high-speed (ideally gigabit) bi-directional ethernet connections. Some city-states with high fibre-optic penetration might also be a possibility. But the general volunteer community around the world would struggle with limited upload speeds. I suppose one extra question that needs to be asked is - roughly how many of these multipart jobs are you intending to process (and how quickly)? Do you think recruitment from a closed community pool could supply you with enough participants? |
Send message Joined: 13 Dec 13 Posts: 21 |
If all the clients are assumed to be on the same LAN, then it may not be such a big problem. Thanks for sharing your ideas! |