Detection of stalled projects?

Message boards : Questions and problems : Detection of stalled projects?
ProDigit

Joined: 8 Nov 19
Posts: 718
United States
Message 97188 - Posted: 30 Mar 2020, 14:24:22 UTC

Does the BOINC client/manager detect when a project has stalled?
The reason I ask: I dislike the stock setting of switching tasks every 10 minutes.
Some CPU projects take 6 to 7 hours to finish, and I want to set this to 420 minutes so that each task can run uninterrupted from start to finish.

I just wonder whether a lot of people report projects that stall.
Or, if a project stalls, does the client catch this? (e.g. if a task's progress hasn't increased by 0.001% in 10 minutes, it gets restarted/disconnected)
I don't run beta projects, btw.
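
The check proposed in the question can be sketched in a few lines of Python. This is only an illustration of the idea, not something the BOINC client is known to do; the names `ProgressSample` and `is_stalled` are hypothetical, and the 10-minute window and 0.001% threshold are simply the numbers from the post.

```python
from dataclasses import dataclass


@dataclass
class ProgressSample:
    minutes: float       # time of the sample, in minutes since the task started
    percent_done: float  # reported progress, 0.0 to 100.0


def is_stalled(samples, window_minutes=10.0, min_gain_percent=0.001):
    """Return True if progress rose by less than min_gain_percent over the
    most recent window_minutes. Samples must be in chronological order.
    Returns False while there is not yet a full window of history."""
    if not samples:
        return False
    latest = samples[-1]
    cutoff = latest.minutes - window_minutes
    baseline = None
    for sample in samples:
        if sample.minutes <= cutoff:
            baseline = sample  # newest sample that is at least one window old
        else:
            break
    if baseline is None:
        return False
    return (latest.percent_done - baseline.percent_done) < min_gain_percent


# A slow but healthy task (0.001% per 40 minutes) is flagged as stalled too:
slow_task = [ProgressSample(0, 0.0), ProgressSample(10, 0.00025)]
print(is_stalled(slow_task))
```

A fixed threshold like this cannot tell a genuinely stuck task from one that is merely very slow, so any real check would need a per-project or per-task threshold.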
Dave
Help desk expert

Joined: 28 Jun 10
Posts: 2515
United Kingdom
Message 97189 - Posted: 30 Mar 2020, 14:35:35 UTC - in response to Message 97188.  

With longer CPDN tasks on older machines, you are looking at at least 40 minutes per 0.001%, so they would get caught out by such a check if it existed.

In the past, going back over ten years, CPDN used to have some tasks that would stall or loop without ever crashing or finishing, so the check certainly wasn't there then. It could be there now, but I doubt it.
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 97192 - Posted: 30 Mar 2020, 15:14:18 UTC - in response to Message 97188.  

reason being, I dislike the stock setting of switching tasks every 10 minutes.
BOINC Manager->Computing Preferences->Computing->Switch between tasks every N minutes->set N to the desired value->Save.
The same option is available in the global computing preferences on each project's website.

BOINC will override this value when it finds you have too much work in cache, based on its present guesstimate that the work cannot be done before the deadline. BOINC will then go into earliest-deadline-first mode and run as many tasks as it can for a short amount of time each, to calculate whether all tasks can be done before their deadlines. During this mode, tasks can be swapped in and out after several minutes.
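
As a rough illustration of the earliest-deadline-first idea, here is a minimal Python sketch. It is not BOINC's scheduler, which is considerably more involved; it assumes a single CPU, known remaining run times, and a hypothetical helper name `edf_missed_deadlines`.

```python
def edf_missed_deadlines(tasks, now=0.0):
    """tasks: iterable of (name, remaining_hours, deadline_hour).
    Runs the tasks back to back on one CPU in earliest-deadline-first
    order and returns the names that would finish after their deadline."""
    clock = now
    missed = []
    for name, remaining, deadline in sorted(tasks, key=lambda t: t[2]):
        clock += remaining  # task runs to completion before the next starts
        if clock > deadline:
            missed.append(name)
    return missed


# Three 7-hour tasks; "c" cannot finish by hour 12 once "b" has run first.
cache = [("a", 7, 24), ("b", 7, 10), ("c", 7, 12)]
print(edf_missed_deadlines(cache))
```

An empty result means the cached work fits before its deadlines under these assumptions; a non-empty one is the kind of situation in which the configured switch interval would be overridden.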
ProDigit

Joined: 8 Nov 19
Posts: 718
United States
Message 97201 - Posted: 30 Mar 2020, 19:02:13 UTC - in response to Message 97192.  

Yeah, I know the setting.
I was just worried that by setting it to 420 min, I could have a task get stuck (idle) after 5 minutes and remain that way for the remaining 415 minutes, occupying a CPU slot without actually doing anything.
ProDigit

Joined: 8 Nov 19
Posts: 718
United States
Message 97413 - Posted: 9 Apr 2020, 13:48:28 UTC - in response to Message 97189.  
Last modified: 9 Apr 2020, 13:54:59 UTC


With longer CPDN tasks on older machines, you are looking at at least 40 minutes per 0.001%, so they would get caught out by such a check if it existed.

In the past, going back over ten years, CPDN used to have some tasks that would stall or loop without ever crashing or finishing, so the check certainly wasn't there then. It could be there now, but I doubt it.

The way things are right now, it's very hard to get hold of any device running a CPU below 1.6 GHz. Even Atom processors and mobile processors all run at 2 GHz or more (apart from the occasional small cores running at 1.2 GHz, like on a Raspberry Pi).

Meanwhile, modern multi-threaded CPUs are running somewhere between 3 and 4 GHz or faster.
Even on long tasks, on a 4 GHz CPU, I see 0.01% every 10 seconds.

The only ones that could possibly give errors are the ones that don't report exact percentages, like GPUGrid, which sometimes increases its percentage in blocks of tens of percent.

Even with a turd of a CPU running at 1 GHz, a task that increases 0.01% in 1 minute (10x faster than your example) will need an entire week of running non-stop.
I think such tasks are out of the question, or if they are available, this would not be the hardware to run it on, or such tasks need to be modified so regular hardware can finish it within 1 hour to 1 day.

0.01% every minute is a good threshold for resetting a project if it doesn't make it.
It's 7x slower than the longest WUs I'm currently running.
The only CPUs that would get in trouble (not hitting 0.01% in under a minute) are those that run below 600 MHz.
I doubt anyone has a laptop that slow (desktops this slow went extinct a good 20 years ago) that still runs BOINC.
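
The run times implied by these progress rates can be checked with a little arithmetic. The sketch below assumes progress is reported at a constant rate, which real tasks need not do; `full_run_days` is a hypothetical helper name.

```python
def full_run_days(step_percent, step_minutes):
    """Days needed to reach 100% if every step_percent of progress
    takes step_minutes, assuming a constant rate."""
    steps = 100.0 / step_percent
    return steps * step_minutes / (60 * 24)


print(round(full_run_days(0.01, 1), 1))        # 0.01% per minute
print(round(full_run_days(0.01, 10 / 60), 1))  # 0.01% per 10 seconds
print(round(full_run_days(0.001, 40), 1))      # 0.001% per 40 minutes
```

Under that assumption, 0.01% per minute works out to roughly a week, 0.01% per 10 seconds to a little over a day, and the 40 minutes per 0.001% quoted above to several years, which is far slower than a 0.01%-per-minute threshold would tolerate.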

And if you do have a WU that's 7x longer than my current longest WUs of about 1 day, then well... We need to move forward and send old scrap to the scrapyard. The annual electric bill costs more than buying a $100 Chromebook that will crunch twice as fast on less than $10 of electricity a year.
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 97416 - Posted: 9 Apr 2020, 13:58:39 UTC - in response to Message 97413.  

I ran a Dell Precision workstation, circa 2007 - the first that could be fitted with dual Xeon quad-core CPUs. CPDN tasks took about 4 months on that, even after I'd upgraded the RAM to quad-channel. There's an awful lot of ocean and atmosphere to be modelled.
BOINC Moderator
Volunteer moderator
Project administrator
Joined: 10 Mar 20
Posts: 66
Message 97419 - Posted: 9 Apr 2020, 14:10:08 UTC - in response to Message 97413.  

or such tasks need to be modified so regular hardware can finish it within 1 hour to 1 day.
Here you go again, being uninformed and spouting nonsense because of it. Do you know what CPDN does? Climate prediction. They don't really care about tasks being returned within an hour or a day, because climate simulations can run for many months. That is also why their largest tasks still run for many, many months and essentially have no deadline, even though BOINC puts a maximum deadline of a year on getting the work returned.

I know you don't care about what we tell you. You think you know better than everyone else, you don't want to learn about the program, and if anything you think that the projects and people should do things as you see fit. Did you show the same attitude at Folding@Home, so that they banished you? Because if you did, I think it's no wonder you were kicked.

Now, whether that's going to happen here or not is all up to you. This is a warning.
It's fine to post your opinion, but do so after you have tried to learn why projects run their data the way they do, and before you demand that CPDN run their proprietary multi-million-line FORTRAN code on a GPU, because that's the next thing you're probably going to have a go at.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.