Projects work units wasting large amounts of time.

adrianxw
Joined: 2 Oct 05
Posts: 400
Denmark
Message 88938 - Posted: 20 Nov 2018, 19:53:14 UTC

If a project's work units normally run to completion in just over an hour, but odd units suddenly start running for enormous amounts of time, and private messages to the admin go unanswered, is there another way of contacting them?
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 88938
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 1283
United Kingdom
Message 88943 - Posted: 20 Nov 2018, 20:25:35 UTC

Without knowing which project is concerned....
The content of the tasks being delivered to you may have changed.
The project administrator could be occupied with other work.
He may be one who doesn't read his private messages very often.
Are the tasks actually taking a very long time to complete, or is it that the estimated time to completion is wrong?
Some projects have a "back door" to the project administrator, but without knowing which project you are talking about nobody can point you in the right direction.
And a whole load more.
ID: 88943
adrianxw
Joined: 2 Oct 05
Posts: 400
Denmark
Message 88947 - Posted: 20 Nov 2018, 21:11:54 UTC - in response to Message 88943.  
Last modified: 20 Nov 2018, 21:36:52 UTC

The work units normally run from 0 to 100% in just over an hour on this machine (4 GHz i7, no overclock), then upload. For example, my results page shows run times from 4600 to 6401 seconds for jobs returned today. The one I noticed, and have suspended, has an elapsed time of 17:38:40 and shows 100% complete, remaining 00:00:00. When I resume it, the elapsed time increases, but that is all. Other crunchers in a thread on the project's message board are reporting similar stories.

Looking at the status page for the work unit I have suspended here, it has a download error, which I discount, and two "Timed out - no reply" jobs, both sent 6 Nov. The implication is that these just ran on until they reached the cutoff point and were posted as no reply; they may still be running! I checked my CPU usage, and when the job is running, all cores and threads show 100%, so the implication is that the job has entered a loop from which it has no way out at termination.

I've not aborted the WU, as it will simply go to someone else and tie up their machine for hours. I've sent a private message, but have no way of seeing whether it has been read by Radim. The project is Asteroids.
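The symptom described above (100% complete, yet elapsed time far beyond the normal run time) can be written down as a rough check. This is only a sketch: the function names and the threshold factor of 3 are assumptions made for illustration, not anything the BOINC client provides.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an elapsed time shown as H:MM:SS into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def looks_stuck(elapsed_s: int, fraction_done: float,
                typical_max_s: int, factor: float = 3.0) -> bool:
    """Flag a task that reports 100% done but has run far past the usual time.

    `factor` is a hypothetical safety margin, not a BOINC setting.
    """
    return fraction_done >= 1.0 and elapsed_s > factor * typical_max_s


# Figures from the post: longest normal run 6401 s, stuck task at 17:38:40.
elapsed = hms_to_seconds("17:38:40")
print(elapsed, looks_stuck(elapsed, 1.0, 6401))
```

With the figures quoted in the post, the stuck task has run roughly ten times longer than the slowest normal job, so any reasonable margin would flag it.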
ID: 88947
adrianxw
Joined: 2 Oct 05
Posts: 400
Denmark
Message 88948 - Posted: 20 Nov 2018, 21:25:20 UTC - in response to Message 88947.  

The forum thread is here...

http://asteroidsathome.net/boinc/forum_thread.php?id=715
ID: 88948
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 1283
United Kingdom
Message 88949 - Posted: 20 Nov 2018, 21:44:50 UTC

Two possible actions.
First, "reset" the project concerned; this can trigger problematic tasks into finishing properly, albeit with a very long run time.
Second, "abort" the task. It may be a problematic one, or it might just be stalled on your PC; either way, the project will send it out to another user, for whom it will either end normally or fail.
Many project administrators only work on the project a few hours a week, so it may take them a long time to actually get to be in a position where they can respond.
ID: 88949
adrianxw
Joined: 2 Oct 05
Posts: 400
Denmark
Message 88950 - Posted: 20 Nov 2018, 21:51:47 UTC - in response to Message 88949.  
Last modified: 20 Nov 2018, 21:59:16 UTC

What does resetting actually do?

max # of error/total/success tasks 7, 20, 20

This could affect a good number of people and waste a large number of hours before it dies by itself.

>>>
Many project administrators only work on the project a few hours a week, so it may take them a long time to actually get to be in a position where they can respond.
<<<

Hence my original question.
ID: 88950
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 1283
United Kingdom
Message 88963 - Posted: 21 Nov 2018, 6:11:44 UTC
Last modified: 21 Nov 2018, 6:14:23 UTC

Sadly, until it dies by itself the majority of project teams will not interfere with a task that is running, or trying to run, just in case it does eventually run properly to completion.
Resetting a project clears various pointers and internal (invisible) caches used by the applications, thus potentially helping them to run more smoothly.
If the project administrator is unreachable for whatever reason, then alternative paths are unlikely to work.
Looking at the Asteroids forum, there are a number of people having the same problem; indeed, you have raised the question there. Judging by the various threads over there, the "best advice" is to abandon any stuck tasks and hope that the administrator does realise there is a batch of damaged tasks around.

Edit-
It looks as if someone knows about this problem and is trying to resolve it: http://asteroidsathome.net/boinc/forum_thread.php?id=713&postid=6037#6037
ID: 88963
adrianxw
Joined: 2 Oct 05
Posts: 400
Denmark
Message 88964 - Posted: 21 Nov 2018, 6:26:27 UTC
Last modified: 21 Nov 2018, 7:22:28 UTC

Surely, aborting the job simply makes it available to someone else, who may waste days of CPU time on it. And then the next person, and the next.....

And resetting the project has killed the damn job and sent it back, exactly what I didn't want to do.
ID: 88964
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2515
United Kingdom
Message 88972 - Posted: 21 Nov 2018, 10:13:51 UTC - in response to Message 88964.  

>>>
Surely, aborting the job simply makes it available to someone else, who may waste days of CPU time on it. And then the next person, and the next.....
<<<


Projects tend to set a maximum number of times a task will be resent. For CPDN this is normally three. A suffix of _0, _1 or _2 indicates whether a task is a retread or not. For some projects, such as CPDN, many tasks fail due to computer problems rather than a problem with the task itself, hence the three (or however many) goes at it.
ID: 88972
adrianxw
Joined: 2 Oct 05
Posts: 400
Denmark
Message 88973 - Posted: 21 Nov 2018, 13:30:29 UTC
Last modified: 21 Nov 2018, 13:36:54 UTC

Yes, I know, but his is set...

max # of error/total/success tasks 7, 20, 20

... and the problem causes tasks to run for ages, and they KNOW the problem exists. He is wasting many hours of crunchers' CPU time. I have dropped the project from my machines. This is totally unacceptable. It should be blacklisted for at least a month.
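As a rough upper bound on what that setting could cost, the sketch below multiplies the total-task limit by the time one stuck attempt burns. It assumes, purely for illustration, that every attempt gets stuck for as long as the 17:38:40 seen on one machine in this thread; how the server actually counts timed-out tasks against these limits is not covered here, and the function name is hypothetical.

```python
def worst_case_elapsed_hours(max_total_tasks: int,
                             hours_per_attempt: float) -> float:
    """Upper bound on elapsed hours one bad work unit could consume.

    Assumes every one of the allowed attempts hangs for the same time.
    """
    return max_total_tasks * hours_per_attempt


# Assumption: each attempt runs 17:38:40 (63520 s) before anyone notices.
hours_per_attempt = 63520 / 3600
print(round(worst_case_elapsed_hours(20, hours_per_attempt), 1))
```

In practice the real figure depends on each host's deadline and on how quickly its owner notices, so treat this only as an order-of-magnitude estimate per damaged work unit.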
ID: 88973
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2515
United Kingdom
Message 88974 - Posted: 21 Nov 2018, 13:36:42 UTC - in response to Message 88973.  

>>>
Yes, I know, but his is set...

max # of error/total/success tasks 7, 20, 20
<<<


On CPDN, where I do most of my crunching, 3 sometimes seems a bit on the low side, especially for Linux tasks, where lots of crunchers don't know how to ensure they have all the requisite 32-bit libs and so crash everything. But that setting seems to me way over the top.
ID: 88974


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.