Errors/Timed out

Message boards : Questions and problems : Errors/Timed out

Sirius B
Joined: 12 Jun 09
Posts: 2098
Ireland
Message 100239 - Posted: 7 Aug 2020, 21:20:28 UTC

It is my understanding that as long as a sufficient cache is set for the capacity of one's rig, all tasks should complete within their deadlines.
I don't micromanage Boinc & only manage it when switching to a different project.
As I am solely crunching WCG, I either change profile or project to reach a specific target.
I have just had 20 tasks error & 11 tasks time out.
The tasks that errored do not show up in the error tab but in the aborted tab. I did not abort them, so I'm assuming that they were server aborted, even though they do not show up in that tab. Looking at them, I can see that they would have definitely timed out.
My cache is set for 6 days with 0.01 additional, so I can't understand how this occurred.
Ideas anyone?
ID: 100239 · Report as offensive
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 1283
United Kingdom
Message 100241 - Posted: 7 Aug 2020, 21:49:27 UTC

Given the short deadlines that WCG normally uses it would be worth trying dropping your cache to something like or 3 days plus 0.01
I had a bunch of WCG tasks that were server aborted some time back. I just happened to be looking at the screen when it happened, so I guess that if they find "something wrong" with a block of tasks, or with the application, they dump them and possibly send them out again later once they've solved the problem.
ID: 100241 · Report as offensive
Sirius B
Joined: 12 Jun 09
Posts: 2098
Ireland
Message 100243 - Posted: 7 Aug 2020, 21:53:52 UTC - in response to Message 100241.  
Last modified: 7 Aug 2020, 22:11:59 UTC

Thanks Rob, reduced it to 4. Not that it will matter for some time as now getting 1 for 1 due to the number of failed tasks.
Hopefully it shouldn't take too long. :-)
Must have been a good boy. :-)
07/08/2020 21:43:25 | World Community Grid | Sending scheduler request: To fetch work.
07/08/2020 21:43:25 | World Community Grid | Requesting new tasks for CPU
07/08/2020 21:43:28 | World Community Grid | Scheduler request completed: got 145 new tasks
Much better as got some more then this:
07/08/2020 22:27:53 | World Community Grid | Reporting 1 completed tasks
07/08/2020 22:27:53 | World Community Grid | Not requesting tasks: don't need (CPU: job cache full; NVIDIA GPU: )
07/08/2020 22:27:56 | World Community Grid | Scheduler request completed
ID: 100243 · Report as offensive
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 1283
United Kingdom
Message 100244 - Posted: 8 Aug 2020, 6:23:42 UTC

While you are on the enforced diet you will be able to make sure that everything else is OK - I hope that everything is OK.
ID: 100244 · Report as offensive
Sirius B
Joined: 12 Jun 09
Posts: 2098
Ireland
Message 100245 - Posted: 8 Aug 2020, 6:41:42 UTC - in response to Message 100244.  
Last modified: 8 Aug 2020, 6:48:48 UTC

Diet? I don't think so. Last time this happened was earlier this year with TN-Grid. Thought then that there would be many time-outs. I was wrong as only had a couple.
Getting up a short time ago (too hot to get a really good sleep), I checked on BM, didn't like what I saw, so checked the event log.
Before that 145 batch, it D/L'ed 134, followed by another 38 after the 145 (got it set on NNT now).

Had they all been 3 hr xx min like the last bunch, then maybe, but they are all 5hr 16min. Now WCG has some slack in both Linux & Win. On my rigs it ranges between 15 - 30min between cpu time & real time.
So for argument's sake, let's make that 16min, so in 24 hrs each core should complete 4.8 tasks.
Times 8 cores = 38.4 a day.
So for just 4 days that should be 153.6 tasks.
If the deadline of 7 days is taken into account, that is 268.8.
Oops...
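
The arithmetic above can be checked with a short sketch. It assumes 5 hours per task (the figure implied by 4.8 tasks per core per day), 8 cores crunching non-stop, and that the three batches mentioned (134, 145 and 38) are everything that was downloaded. The function name and constants are illustrative only, not anything from BOINC itself.

```python
# Rough throughput estimate using the figures quoted in the post above.
# HOURS_PER_TASK is an assumption: 24 h / 4.8 tasks per core = 5 h per task.
HOURS_PER_TASK = 5.0
CORES = 8


def tasks_completed(days, hours_per_task=HOURS_PER_TASK, cores=CORES):
    """Tasks the machine can finish in `days` if every core runs non-stop."""
    return days * 24.0 / hours_per_task * cores


per_core_per_day = 24.0 / HOURS_PER_TASK   # tasks per core per day
four_day_capacity = tasks_completed(4)     # what a 4-day cache could hold
seven_day_capacity = tasks_completed(7)    # what fits inside a 7-day deadline
downloaded = 134 + 145 + 38                # the three batches in the event log

print(f"per core per day : {per_core_per_day:.1f}")
print(f"4-day capacity   : {four_day_capacity:.1f}")
print(f"7-day capacity   : {seven_day_capacity:.1f}")
print(f"downloaded       : {downloaded}")
print(f"over 7-day limit : {downloaded > seven_day_capacity}")
```

On these assumptions the downloaded total is larger than what the rig could finish even in the full 7 days, which is the "Oops".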
Will keep an eye on them.
Secondly, with the number of failed tasks returned, shouldn't Boinc have backed off & stuck to what it first did - 1 out, 1 in - regardless of what the cache setting was/is?
Edit: Checking on the device stats on WCG the 8 core has averaged 42 tasks per day this month.
ID: 100245 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 100246 - Posted: 8 Aug 2020, 7:30:07 UTC

Unless that machine is away from the internet for extended periods, you really don't need a multi-day work cache. All my machines are on less than a day, and most of them are on 6 hours.
ID: 100246 · Report as offensive
Sirius B
Joined: 12 Jun 09
Posts: 2098
Ireland
Message 100247 - Posted: 8 Aug 2020, 7:46:45 UTC - in response to Message 100246.  

Okay have set it to 1 day & I'll see what happens when this batch completes. Only had it set on 6 due to crunching Seti too long & didn't want to keep hitting the servers too often. Is there any need to have any additional then?
ID: 100247 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 100248 - Posted: 8 Aug 2020, 8:26:39 UTC - in response to Message 100247.  
Last modified: 8 Aug 2020, 8:27:10 UTC

Well, it's kind not to keep hitting anybody else's servers, too - especially when they're taking a mighty hitting from the influx of SETI exiles.

I actually set mine to 0.25 (that's the six hours) plus 0.05 (about an hour). If you finish any work, you're going to hit the server anyway within the hour, to report it. So you might as well grab a top-up while you're at it - saves making a double hit.
ID: 100248 · Report as offensive
Sirius B
Joined: 12 Jun 09
Posts: 2098
Ireland
Message 100256 - Posted: 9 Aug 2020, 10:59:56 UTC - in response to Message 100248.  

"I actually set mine to 0.25 (that's the six hours) plus 0.05 (about an hour). If you finish any work, you're going to hit the server anyway within the hour, to report it. So you might as well grab a top-up while you're at it - saves making a double hit."
Unless I'm completely misunderstanding the role that a cache plays, I find that such a small cache produces problems with some projects, for example, WCG African Rainfall. That project takes between 32 and 42 hours per task on my dual cores, so with such a small cache, no work would be received.
ID: 100256 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 100257 - Posted: 9 Aug 2020, 11:43:07 UTC - in response to Message 100256.  

Doesn't affect me. I'm just finishing some CPDN tasks, well into their 11th day of running. They were downloaded immediately I released the 'No New Tasks' for the project (it was a special request run - not my usual fare).

I think you're confusing runtimes with deadlines. BOINC won't download a task if it thinks you can't complete it before deadline, but that's not the same thing as the cache setting.
ID: 100257 · Report as offensive
Sirius B
Joined: 12 Jun 09
Posts: 2098
Ireland
Message 100259 - Posted: 9 Aug 2020, 12:15:49 UTC - in response to Message 100257.  

Possibly. With a 0.25 & 0.05 cache, how many tasks will be downloaded per core on an initial run?
ID: 100259 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 100261 - Posted: 9 Aug 2020, 14:20:54 UTC - in response to Message 100259.  

Turn on <sched_op_debug> logging, and it'll tell you the size of the work request in seconds.
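
For anyone who prefers to set the flag by hand, here is a minimal sketch that generates the relevant cc_config.xml fragment. It assumes the usual layout, where log flags sit under `<log_flags>` inside `<cc_config>`. The helper name `build_cc_config` is made up for illustration. The sketch only prints the XML: where the file lives on your system, and how to merge it with any cc_config.xml you already have, are left to you.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config XML text with each named log flag set to 1.

    Assumes the <cc_config><log_flags>...</log_flags></cc_config> layout.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


config_text = build_cc_config(["sched_op_debug"])
print(config_text)
```

The client has to re-read its config files (or be restarted) before the extra [sched_op] lines show up in the event log.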

The very first request (when you first join) is always 1 second.

If you are entirely empty, the first 'real' request will be 25,920 seconds per device (core or GPU). How many tasks that turns into depends on the running average estimate of the speed of your device. If you have some work already, the request will be reduced accordingly.
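
The 25,920 figure is just the two cache settings added together and converted to seconds. A small sketch follows. The function names are illustrative. The task-count estimate is a deliberately naive assumption (request divided by an estimated runtime), not the scheduler's actual logic, and the 5-hour runtime is borrowed from earlier in the thread.

```python
SECONDS_PER_DAY = 86_400


def initial_work_request(min_days, extra_days):
    """Seconds of work requested per device when the cache is empty."""
    return (min_days + extra_days) * SECONDS_PER_DAY


def naive_task_estimate(request_seconds, est_task_seconds):
    """Very rough: how many tasks of a given estimated length fill the request."""
    return request_seconds / est_task_seconds


request = initial_work_request(0.25, 0.05)            # the settings quoted above
tasks_per_core = naive_task_estimate(request, 5 * 3600)  # assumed 5 h per task

print(f"request per device: {round(request)} seconds")
print(f"roughly {tasks_per_core:.2f} five-hour tasks per core")
```

So a 0.25 + 0.05 cache asks for well under two such tasks per core, and anything already on hand reduces the request further.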
ID: 100261 · Report as offensive
Sirius B
Joined: 12 Jun 09
Posts: 2098
Ireland
Message 100308 - Posted: 14 Aug 2020, 18:43:20 UTC - in response to Message 100261.  

It's turned on so I'll see what happens later this evening.
As for the 317 tasks it downloaded, it looks like it is not as bad as I perceived. It has 29 tasks remaining with 8 crunching.
4 will complete within deadline; the other 4 will overrun by 2 to 24 minutes, so more than likely I'll be credited with them.
The remaining 21 will time out at their deadline of 21:45 tonight.
Looking at my results status I am surprised by the rigs stating "no reply". Mostly all 64 bit with all flavours of O/S.
I'm left wondering - Is that an issue with either WCG's or Boinc's scheduler?
ID: 100308 · Report as offensive
Sirius B
Joined: 12 Jun 09
Posts: 2098
Ireland
Message 100309 - Posted: 14 Aug 2020, 21:42:52 UTC

Surprised, as the 4 I thought would overrun completed within 2 minutes of deadline. Actually saw 13 get aborted.
Changed cache to 1 day, 0 additional & resumed download & only got 42 tasks. Good enough, as this rig averages 45-55 a day.
Just cannot understand why this issue arose, as while crunching MCM to get to Ruby I had no problems whatsoever. Changing back to MCM for Emerald after reaching Ruby on MIPS...

As for this, pretty high numbers:
14/08/2020 22:14:25 | World Community Grid | [sched_op] CPU work request: 224340.11 seconds; 0.00 devices
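
To put that number in context, here is a small sketch that pulls the figure out of the log line and converts it to hours. The regular expression is written against the one line shown above, so other [sched_op] lines may need a different pattern. Treating the request as a total shared across 8 cores is an assumption.

```python
import re

LINE = ("14/08/2020 22:14:25 | World Community Grid | "
        "[sched_op] CPU work request: 224340.11 seconds; 0.00 devices")

# Pattern matched to the line format shown above (an assumption for other lines).
PATTERN = re.compile(
    r"\[sched_op\] (?P<resource>.+?) work request: "
    r"(?P<seconds>[\d.]+) seconds; (?P<devices>[\d.]+) devices"
)


def parse_work_request(line):
    """Return (resource, seconds, devices), or None if the line doesn't match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return (match.group("resource"),
            float(match.group("seconds")),
            float(match.group("devices")))


CORES = 8  # assumption: the 8-core rig discussed earlier in the thread
resource, seconds, devices = parse_work_request(LINE)
hours_total = seconds / 3600
hours_per_core = hours_total / CORES
full_buffer = 1.0 * 86_400 * CORES  # 1 day, 0 additional, across all cores

print(f"{resource}: {hours_total:.1f} h total, {hours_per_core:.1f} h per core")
print(f"that is {seconds / full_buffer:.0%} of a full 1-day buffer")
```

Read that way, the request is for under 8 hours of work per core, about a third of a full 1-day buffer, which would fit with the rig still holding work when it asked.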
ID: 100309 · Report as offensive


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.