Projects Running Past Deadline

ritterm
Joined: 4 Jul 08
Posts: 82
United States
Message 18677 - Posted: 20 Jul 2008, 20:33:13 UTC

My BOINC manager got into a situation where two projects were running past their deadlines. Is this the way it's supposed to work? Is all that work for naught? If so, why did the manager start/resume work on those WUs?

I have since suspended those WUs. I'm wondering now if I should abort them or let them finish.

Thanks,

Mark
ID: 18677
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 18679 - Posted: 20 Jul 2008, 21:07:57 UTC

Normally BOINC will try to get all tasks in by the deadline. It won't be able to do this if you constantly tell it which projects to run, or if you let projects fetch work when BOINC didn't want any because it judged your workload to be too heavy already.

So, how many projects and which are you attached to?
How many CPUs/cores do you have?
Which tasks are running over their deadline?
How much time is still estimated on them? Can you get them in before the 3rd person does? If you can, you can still get credit for them.

BOINC won't automatically kill tasks that go over the deadline as there are projects out there that don't care much about the deadline. CPDN and QCN Alpha come to mind. You can keep on running those models till well after the deadline and in the case of CPDN still get credit. QCN doesn't do credits (yet).
ID: 18679
ritterm
Joined: 4 Jul 08
Posts: 82
United States
Message 18682 - Posted: 20 Jul 2008, 21:46:35 UTC - in response to Message 18679.  

So, how many projects and which are you attached to?


On the host in question, I'm running 12 projects and attached to all, although I don't have work for one of them.

How many CPUs/cores do you have?


The host has one T7200 2.0 GHz Core 2 Duo

Which tasks are running over their deadline? How much time is still estimated on them?


Only tasks from Milkyway@home are over their deadline. They had been running for about 7 hours with an estimated 8 hours to go when I suspended them (which was about 6 hours past their deadline). There's a third WU that hasn't started.

Can you get them in before the 3rd person does?


I'm not sure what you mean by "the 3rd person"?

BOINC won't automatically kill tasks that go over the deadline as there are projects out there that don't care much about the deadline. CPDN and QCN Alpha come to mind. You can keep on running those models till well after the deadline and in the case of CPDN still get credit. QCN doesn't do credits (yet).


Since they're MW@H, do you have any experience indicating that I should abort at this point?

This is the first time I've run up against deadlines. I run BOINC on 3 hosts and recently they all got a batch of 6-10 MW@H WUs at the same time. Given the estimated run times and short deadlines, it was immediately apparent that all 3 would have to run virtually nothing but MW@H for 2-3 days to finish them. I did do a little run-time management on my own, but that was because I didn't want to run MW@H exclusively. Other than that, I don't do much tweaking - I let the BOINC manager do its own thing. Also, I haven't changed resource allocations for the projects for quite a while.
ID: 18682
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 18686 - Posted: 20 Jul 2008, 22:47:27 UTC - in response to Message 18682.  
Last modified: 20 Jul 2008, 23:44:16 UTC

Only tasks from Milkyway@home are over their deadline. They had been running for about 7 hours with an estimated 8 hours to go when I suspended them (which was about 6 hours past their deadline). There's a third WU that hasn't started.

OK, Milkyway changed their tasks from the small ones to very big ones, virtually overnight and without changing the deadline. These tasks run long on all computers. I had one and while the old ones took a mere 10 minutes, the new one ran for 10 hours.

Can you get them in before the 3rd person does?


I'm not sure what you mean by "the 3rd person"?

When tasks go over the deadline, they are sent out again to a third computer. This makes sure the work still gets done and the quorum is met. If in the meantime you manage to return the task before the third computer does, you still get credit for it.

Check the tasks on the third computer they've been sent to and see how long these monsters generally take to run on that system. If you doubt yours will get in on time, first set Milkyway to No New Tasks, then abort the ones you have and report them. You won't get credit, of course.

Then check their forums and wait until Travis is back and has changed the deadline, before allowing work fetch on this project again.
ID: 18686
John McLeod VII
Joined: 29 Aug 05
Posts: 147
Message 18695 - Posted: 21 Jul 2008, 0:17:54 UTC - in response to Message 18686.  
Last modified: 21 Jul 2008, 0:19:20 UTC

Milkyway also did not change the estimate of how much work was needed. This is a problem that the project has to correct. Cancel one of these two and let the other run to completion. Letting one run to completion will set the Duration Correction Factor for the project (or you can stop BOINC and hand edit the client_state.xml file to make the <duration_correction_factor> for Milkyway about 100 times what it currently is).
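
For anyone who wants to try the hand edit described above, here is a minimal Python sketch. It assumes client_state.xml stores each project as a <project> element with <master_url> and <duration_correction_factor> children; the function name scale_dcf, the URLs, and the sample XML are made up for illustration and are not copied from a real state file. BOINC writes its own XML, so a strict parser may not accept every real file. Stop BOINC and back up the file before trying anything like this.

```python
import xml.etree.ElementTree as ET


def scale_dcf(root, url_fragment, factor):
    """Multiply <duration_correction_factor> for every matching project.

    Assumes each <project> holds <master_url> and
    <duration_correction_factor> children (this layout is an assumption).
    Returns the new values so the caller can check what changed.
    """
    updated = []
    for project in root.iter("project"):
        url = project.findtext("master_url", default="")
        dcf = project.find("duration_correction_factor")
        if url_fragment in url and dcf is not None:
            new_value = float(dcf.text) * factor
            dcf.text = f"{new_value:.6f}"
            updated.append(new_value)
    return updated


# Illustrative fragment only; a real client_state.xml holds far more.
SAMPLE = """<client_state>
  <project>
    <master_url>http://milkyway.example/</master_url>
    <duration_correction_factor>0.910000</duration_correction_factor>
  </project>
  <project>
    <master_url>http://other.example/</master_url>
    <duration_correction_factor>1.000000</duration_correction_factor>
  </project>
</client_state>"""

root = ET.fromstring(SAMPLE)
changed = scale_dcf(root, "milkyway", 100)
print(changed)
```

Against a real file, the same function could be applied to the root of ET.parse(path) and the tree written back afterwards, provided the file parses cleanly.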

EDIT:

The real kicker that is causing the late returns is the lack of change in fpops_est, not the lack of change in the deadline.

BOINC WIKI
ID: 18695
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 18697 - Posted: 21 Jul 2008, 0:30:17 UTC - in response to Message 18695.  

(or you can stop BOINC and hand edit the client_state.xml file to make the <duration_correction_factor> for Milkyway about 100 times what it currently is).

Wow... just checked mine. I still got it running on one of my hosts, the RDCF moved from around 0.91 to 51.4 after running just one of them big ones. ;-)
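
A rough way to see why one long task moved the factor so far is the simplified model below. It assumes the client estimates run time as fpops_est divided by the host's benchmark speed, multiplied by the duration correction factor. The function name and the host speed are made up for illustration; the 10-minute and 10-hour run times and the 0.91 and 51.4 factors come from the posts in this thread.

```python
def estimated_seconds(fpops_est, host_flops, dcf):
    """Simplified run-time estimate: (work / speed) scaled by the DCF."""
    return fpops_est / host_flops * dcf


HOST_FLOPS = 2.0e9            # hypothetical benchmark speed
OLD_RUNTIME_S = 10 * 60       # old tasks: about 10 minutes
NEW_RUNTIME_S = 10 * 3600     # new tasks: about 10 hours

# fpops_est was left unchanged, so back it out of the old run time
# and the old correction factor of 0.91.
fpops_est = OLD_RUNTIME_S / 0.91 * HOST_FLOPS

before = estimated_seconds(fpops_est, HOST_FLOPS, 0.91)
after = estimated_seconds(fpops_est, HOST_FLOPS, 51.4)

print(f"estimate with DCF 0.91: {before / 60:.0f} minutes")
print(f"estimate with DCF 51.4: {after / 3600:.1f} hours")
print(f"growth in actual run time: {NEW_RUNTIME_S / OLD_RUNTIME_S:.0f}x")
print(f"growth in DCF: {51.4 / 0.91:.1f}x")
```

With fpops_est fixed, the correction factor is the only thing the client can adjust, so it has to absorb nearly the whole jump in run time.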
ID: 18697
ritterm
Joined: 4 Jul 08
Posts: 82
United States
Message 18699 - Posted: 21 Jul 2008, 0:43:43 UTC - in response to Message 18686.  
Last modified: 21 Jul 2008, 0:48:08 UTC

Check the tasks on the third computer they've been sent to and see how long these monsters generally take to run on that system. If you doubt yours will get in on time, first set Milkyway to No New Tasks, then abort the ones you have and report them. You won't get credit, of course.


Hmm...Checking the tasks I have at the MW website would seem to indicate that they've only been sent out to one other host. Am I just seeing the host that the WU got re-issued to because mine was past deadline? Or is it possible that this host is the only other one working on it and I can "safely" continue to run it?
ID: 18699
ritterm
Joined: 4 Jul 08
Posts: 82
United States
Message 18700 - Posted: 21 Jul 2008, 0:53:38 UTC - in response to Message 18697.  

Wow... just checked mine. I still got it running on one of my hosts, the RDCF moved from around 0.91 to 51.4 after running just one of them big ones. ;-)


I don't know what it *was*, but mine is over 54 right now.
ID: 18700
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 18701 - Posted: 21 Jul 2008, 0:57:36 UTC - in response to Message 18699.  

Hmm...Checking the tasks I have at the MW website would seem to indicate that they've only been sent out to one other host. Am I just seeing the host that the WU got re-issued to because mine was past deadline?

No, sorry, my mistake for just typing things without checking them first. The minimum quorum on Milkyway is 1, with initial replication also being 1. So the second host you see is the one it's been resent to.
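
The rule described in this exchange can be sketched as a small Python function. This is a simplified model of what the posts say, not BOINC server code: a task past its deadline is resent to another host, and the late result still earns credit if it is reported before the resent copy comes back. All names and timestamps are hypothetical.

```python
from datetime import datetime
from typing import Optional


def late_result_gets_credit(reported: datetime,
                            deadline: datetime,
                            replacement_returned: Optional[datetime]) -> bool:
    """Model of the thread's rule for a workunit with a quorum of 1."""
    if reported <= deadline:
        return True                      # on time, nothing to decide
    if replacement_returned is None:
        return True                      # resent copy is not back yet
    return reported < replacement_returned


# Hypothetical timestamps, chosen only to exercise both outcomes.
deadline = datetime(2008, 7, 20, 12, 0)
late_report = datetime(2008, 7, 21, 1, 30)

still_open = late_result_gets_credit(late_report, deadline, None)
beaten = late_result_gets_credit(late_report, deadline,
                                 datetime(2008, 7, 20, 23, 0))
print(still_open, beaten)
```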
ID: 18701
ritterm
Joined: 4 Jul 08
Posts: 82
United States
Message 18703 - Posted: 21 Jul 2008, 1:36:52 UTC - in response to Message 18701.  
Last modified: 21 Jul 2008, 1:47:53 UTC

No, sorry, my mistake for just typing things without checking them first. The minimum quorum on Milkyway is 1, with initial replication also being 1. So the second host you see is the one it's been resent to.


Ah, okay, I see what you're looking at for quorum and replication in the WU details.

It looks like I got lucky on one of the three I have past due - it just finished and got what looks like full credit. It shows a little under 10 hours of CPU time, which is about 5 hours less than the expected time after working 50% of the WU. I guess they can run very fast near the end?

A few minutes later...Now my WU status at MW shows only two WUs - the one I finished a bit ago and another that I haven't done much work on. The one I have with over 8 hours invested - and is still running - is no longer listed. :-(
ID: 18703
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 18704 - Posted: 21 Jul 2008, 1:50:34 UTC - in response to Message 18703.  

A few minutes later...Now my WU status at MW shows only two WUs - the one I finished a bit ago and another that I haven't done much work on. The one I have with over 8 hours invested - and is still running - is no longer listed. :-(

Yes, their database purge is measured in mere minutes, which is very annoying.
Hopefully they will fix that as well.
ID: 18704
ritterm
Joined: 4 Jul 08
Posts: 82
United States
Message 18705 - Posted: 21 Jul 2008, 3:14:58 UTC - in response to Message 18704.  

Thanks for all the great info...learning a lot here. :-)
ID: 18705

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.