Resource share not being respected by BOINC

Message boards : Questions and problems : Resource share not being respected by BOINC

Sebh007
Joined: 21 Jun 20
Posts: 2
Message 99386 - Posted: 22 Jun 2020, 6:49:04 UTC

I have been running BOINC for years and am currently running 7.16.7 on a variety of Windows machines.

Until COVID-19, I had run CPDN exclusively; at that point I thought I should run some Rosetta to help identify a protein or similar more quickly. My resource share is set so that CPDN is supposed to get 99.90% of the resources and Rosetta should get 0.10% (at least, that's what is shown on the Projects tab).

In the individual Project Properties windows, the Scheduling priority for CPDN is shown as -0.01 and for Rosetta as -997.20.
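For reference, here is a rough sketch of how the share and priority figures can relate. It assumes a simplified model of the BOINC 7.x client: a project's share fraction is its resource share divided by the total, and its scheduling priority is minus its fraction of recent estimated credit (REC) divided by that share fraction. A project that has done far more recent work than its share therefore gets a large negative priority. The 999/1 shares and the REC numbers are hypothetical, and this is not the client's actual code.

```python
# Simplified model (assumption, not BOINC's exact code):
#   share_frac = resource share / sum of resource shares
#   rec_frac   = recent estimated credit (REC) / sum of REC
#   priority   = -rec_frac / share_frac   (closer to zero runs first)


def share_fractions(shares):
    total = sum(shares.values())
    return {name: s / total for name, s in shares.items()}


def sched_priorities(shares, rec):
    share_frac = share_fractions(shares)
    rec_total = sum(rec.values()) or 1.0  # avoid dividing by zero
    return {name: -(rec[name] / rec_total) / share_frac[name] for name in shares}


if __name__ == "__main__":
    # Hypothetical resource shares giving a 99.90% / 0.10% split
    shares = {"CPDN": 999, "Rosetta": 1}
    # Hypothetical REC values: Rosetta has done nearly all recent work
    rec = {"CPDN": 50.0, "Rosetta": 12000.0}

    for name, frac in share_fractions(shares).items():
        print(f"{name}: share {frac:.2%}")
    for name, prio in sched_priorities(shares, rec).items():
        print(f"{name}: priority {prio:.2f}")
```

In this model CPDN keeps the higher (less negative) priority, just as the Project Properties windows show. As the replies below explain, though, that priority only decides the order when no task is in danger of missing its deadline.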

In practice, since the end of March when I added the Rosetta project to the various machines, they have not run any CPDN at all, even though they have CPDN tasks waiting to run. On one of the machines I manually suspended all the Rosetta tasks just to make sure that the waiting CPDN tasks would actually run. The CPDN tasks do run, until I re-enable the Rosetta tasks, at which point they switch to 'Waiting to run'.

The Projects tab reports the CPDN Work done (over years) as 8,300,790 and the Rosetta Work done as 2,850,662 since March. It reports the Avg. work done as 49.81 for CPDN and 11,970.46 for Rosetta.

Granted, the deadlines for the CPDN tasks are months later than the Rosetta deadlines, but I would have thought that the weighting of the CPDN project would override that. Perhaps I'm being naive.

All comments most welcome. Thanks.

Seb

robsmith
Volunteer tester
Help desk expert
Joined: 25 May 09
Posts: 1283
United Kingdom
Message 99387 - Posted: 22 Jun 2020, 6:55:14 UTC

Just now CPDN doesn't have any tasks to send out, which makes it rather difficult for BOINC to do any sensible resource sharing.

Bryn Mawr
Help desk expert
Joined: 31 Dec 18
Posts: 284
United Kingdom
Message 99388 - Posted: 22 Jun 2020, 8:12:26 UTC

How big a cache do you have? Does your cache mean that when you do get a CPDN job it is put on hold because the Rosetta jobs would not make their deadline?

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2518
United Kingdom
Message 99393 - Posted: 22 Jun 2020, 13:24:27 UTC

The problem is to do with the very long deadlines on the CPDN tasks and the fact that they run for between about five days and over a month, depending on the speed of your machines. The most recent Windows batches may be a little quicker than that. (I only run Windows tasks when there are no Linux ones.) I don't know how short the deadlines are on the Rosetta tasks, but if they would not finish in time while waiting for the CPDN tasks to finish, BOINC will notice this and give them priority. At some point it will decide enough is enough and not download more Rosetta tasks (unless you have free cores due to CPDN not having work). Long term it should average out to what you want, so long as you don't mess around with it as much as I do. If you do, you are stuck micromanaging it, because constantly suspending tasks/projects to make it do what you want messes up its workings completely.
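A rough way to picture the check Dave describes: simulate running the queued tasks on the available cores in project-priority order, flag any task that would finish after its deadline, and run the flagged ones first, earliest deadline first. This is a simplified list-scheduling model, not the BOINC client's actual round-robin simulation. The task sizes, deadlines and core count below are hypothetical; the priorities are the figures from the first post.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    project: str
    remaining_h: float  # estimated hours of CPU time still needed
    deadline_h: float   # hours from now until the deadline


def at_risk(tasks, priority, ncpus):
    """Names of tasks that would miss their deadline if everything ran
    strictly in project-priority order (simplified model)."""
    order = sorted(tasks, key=lambda t: priority[t.project], reverse=True)
    free_at = [0.0] * ncpus  # time at which each core becomes free (min-heap)
    missed = []
    for t in order:
        start = heapq.heappop(free_at)  # earliest free core
        finish = start + t.remaining_h
        heapq.heappush(free_at, finish)
        if finish > t.deadline_h:
            missed.append(t.name)
    return missed


def run_order(tasks, priority, ncpus):
    """At-risk tasks first (earliest deadline first), then the rest by priority."""
    risky = set(at_risk(tasks, priority, ncpus))
    urgent = sorted((t for t in tasks if t.name in risky), key=lambda t: t.deadline_h)
    normal = sorted((t for t in tasks if t.name not in risky),
                    key=lambda t: priority[t.project], reverse=True)
    return [t.name for t in urgent + normal]


if __name__ == "__main__":
    prio = {"CPDN": -0.01, "Rosetta": -997.2}  # figures from the first post
    tasks = [Task("CPDN-1", "CPDN", 400, 5000)]
    tasks += [Task(f"R{i}", "Rosetta", 8, 72) for i in range(1, 13)]
    print(run_order(tasks, prio, ncpus=2))
```

With two cores and twelve 8-hour Rosetta tasks, the last three would miss a 72-hour deadline, so they go to the front and occupy both cores while CPDN-1 sits at 'Waiting to run' despite its better priority.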

The long deadlines for CPDN are to some extent historic as there was a time when a fast machine could take nine months to finish a task and shorter deadlines meant CPDN hogging the computer time at the expense of other projects. My own view is CPDN should now shorten the deadlines to somewhere around the two month mark. (This could of course temporarily mess things up for those who have their settings based on the status quo.)

robsmith
Volunteer tester
Help desk expert
Joined: 25 May 09
Posts: 1283
United Kingdom
Message 99394 - Posted: 22 Jun 2020, 13:38:52 UTC

Rosetta has selectable run times in the range 1-12 hours (I think that's the range you can set), which is far shorter than those of CPDN, which are measured in days. In part, the reason for CPDN's long run times and hence long deadlines is to allow for slower machines, but the tasks are also designed to be able to safely swap out periodically, so they can sit dormant for days or even weeks without causing any problems.
Resource sharing is a long-term thing, not an "instant average" - it can take weeks, if not months, for BOINC to sort things out, particularly if you have a large work cache and one project has no work to send out. As has been said, a small cache works better than a large one - I run 2 or 3 days plus an extra 0.01, which works for me (WCG + CPDN) provided there is a steady flow of work available from both projects. Just now (and for the last few weeks) CPDN hasn't been sending out work; I suspect this is in part down to the fact that the Uni where the servers are based (controlled from) is in lock-down just now.
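To put rough numbers on "weeks, if not months": the client tracks each project's recent estimated credit (REC) as an exponentially decaying average. The sketch below assumes a half-life of 10 days (my understanding of the default for the rec_half_life_days option in cc_config.xml) and uses hypothetical REC figures; it is an illustration, not the client's code.

```python
import math

HALF_LIFE_DAYS = 10.0  # assumption: default REC half-life


def decayed(rec, days, half_life=HALF_LIFE_DAYS):
    """REC left after `days` with no new work for the project."""
    return rec * 0.5 ** (days / half_life)


def days_to_decay(rec_now, rec_target, half_life=HALF_LIFE_DAYS):
    """Days with no new work before REC falls from rec_now to rec_target."""
    return half_life * math.log2(rec_now / rec_target)


if __name__ == "__main__":
    # Hypothetical: Rosetta's REC has to shrink by a factor of 1000
    print(f"{days_to_decay(12000.0, 12.0):.0f} days")
    for d in (10, 30, 60):
        print(d, round(decayed(12000.0, d)))
```

Under these assumptions, even with no Rosetta work at all, closing a thousand-fold gap takes around 100 days, which is why short-term behaviour can look nothing like the configured shares.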

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 99395 - Posted: 22 Jun 2020, 14:01:32 UTC

Rosetta's deadline is 3 days, but the scientists there want you to return the results ASAP as they're really waiting for them.
So when CPDN has a 2-month deadline, and BOINC already knows how long tasks from that project generally take, it's no wonder it puts them on hold and does the Rosetta tasks first. It may also not know yet how long these Rosetta tasks take - even tasks set for 2 hours can take 5 or more, I noticed on my 3900X. Micromanaging to allow CPDN to continue doesn't help.
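Jord's point about estimates can be put into numbers. If the client thinks a Rosetta task takes 2 hours but it really takes 5, a cache sized from that estimate holds far more real work than intended, and with a 3-day deadline some of it has to jump the queue. This is a back-of-envelope sketch, assuming the buffer is filled purely from the per-task estimate and ignoring other projects and any runtime corrections the client makes; the cache size is hypothetical.

```python
import math


def cache_check(cache_days, est_hours, actual_hours, deadline_hours, cores=1):
    """Return (tasks fetched, real hours to finish them, misses deadline?)."""
    buffered_hours = cache_days * 24 * cores          # work the client asks for
    ntasks = math.floor(buffered_hours / est_hours)  # tasks that fill the buffer
    real_hours = ntasks * actual_hours / cores        # wall-clock time on all cores
    return ntasks, real_hours, real_hours > deadline_hours


if __name__ == "__main__":
    # 2-day cache, tasks estimated at 2 h but really taking 5 h, 3-day deadline
    print(cache_check(cache_days=2, est_hours=2, actual_hours=5, deadline_hours=72))
```

In this made-up case a 2-day cache turns into about 5 days of real work against a 3-day deadline, so deadline-driven scheduling would push those tasks ahead of a CPDN task whose deadline is months away.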

I wouldn't pair Rosetta against CPDN, because of the mess it will give due to the wildly differing deadlines.

Sebh007
Joined: 21 Jun 20
Posts: 2
Message 99396 - Posted: 22 Jun 2020, 14:52:55 UTC - in response to Message 99395.  

Thanks all. That all makes complete sense, however frustrating it might be.

I've only got one CPDN task across all machines currently and that has a deadline of 15th Feb 2021, so it's not going to compete for a long time methinks!

I shall micromanage until it's done, I think - it's only another 17 days or so - and then leave BOINC to its own devices. If there's no CPDN work available, then I'm not unduly bothered, but I just don't fancy getting on the wrong side of Greta Thunberg!

I might consider segregating the two projects on a machine by machine basis in the future, but then that would mean all that lovely resource going to waste on a CPDN-dedicated machine with no work.

Thanks again - much appreciated.

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2518
United Kingdom
Message 99397 - Posted: 22 Jun 2020, 15:19:11 UTC
Last modified: 22 Jun 2020, 15:29:28 UTC

robsmith wrote: "Just now (and for the last few weeks) CPDN hasn't been sending out work, I suspect this is in part down to the fact that the Uni where the servers are based (controlled from) is in lock-down just now."


They have set up remote access for all the CPDN staff. I think it is more to do with the universities around the world who use Oxford to get their climate work done by crunchers. Most days I check for hints of work in the pipeline, but the last lot of Windows work came without any hints, unless you count a Trello card appearing a couple of hours or less beforehand. Other times, hints come a month before the actual work appears!

Edit: The last Windows batches didn't even have anything on the Trello cards. The last Linux ones did, however.

Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1654
Australia
Message 99404 - Posted: 22 Jun 2020, 22:55:13 UTC - in response to Message 99396.  

Sebh007 wrote: "I've only got one CPDN task across all machines currently and that has a deadline of 15th Feb 2021, so it's not going to compete for a long time methinks!"
The problem with the long deadline is that the researchers use a different method.
Once they have enough results back, they can close the batch of work, and then people just keep on processing it and accumulating zip uploads for no useful purpose.

All of which has been discussed on the CPDN message board.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.