Adjust time between project contacts?

Gary Charpentier
Joined: 23 Feb 08
Posts: 2462
United States
Message 93387 - Posted: 29 Oct 2019, 20:53:54 UTC - in response to Message 93385.  

Can I change a setting at my end to make BOINC only report tasks every 15 minutes? With Milkyway, they seem to have set it so they only send me new work if I haven't contacted the project in 10 minutes. I assume the admins have done this to reduce server load. I find it absurd that my BOINC client is reporting 3 tasks as completed every 2 minutes after having downloaded a block of 600. This seems to be adding unnecessary server load. It also means it runs out of work and sits idle for a while after each 600.

That doesn't seem to be an option here: https://boinc.berkeley.edu/wiki/Client_configuration (the opposite is, however).
ID: 93387
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 93390 - Posted: 29 Oct 2019, 22:30:35 UTC

In theory, you can do it by careful manipulation of cache settings. My slower machines have settings of 0.25 + 0.05 days: grab an extra hour or so when fetching, then mull things over for another hour or more before requesting a refill. My fastest has 0.5 + 0.01 days - keep well filled up, little and often.

The only hard-wired value is 1 hour: if a completed task has been hanging around that long, report it and (probably) request new work.
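
To put those cache figures in more familiar units, here is a minimal Python sketch that converts them to hours. It assumes the two numbers are the minimum buffer and the additional buffer, both in days; the function name `cache_hours` is made up for illustration and is not part of BOINC.

```python
# Convert "A + B days" cache settings into hours.
# Assumption: A is the minimum buffer and B the additional buffer, both in days.

HOURS_PER_DAY = 24.0


def cache_hours(min_days, extra_days):
    """Return (refill threshold, fill target, gap between them) in hours."""
    low = min_days * HOURS_PER_DAY
    high = (min_days + extra_days) * HOURS_PER_DAY
    return low, high, high - low


settings = {
    "slower machines": (0.25, 0.05),
    "fastest machine": (0.5, 0.01),
}

for label, (min_days, extra_days) in settings.items():
    low, high, gap = cache_hours(min_days, extra_days)
    print(f"{label}: refill below {low:.2f} h, fill to {high:.2f} h, gap {gap:.2f} h")
```

On these numbers the slower machines top up by about 1.2 hours of work at a time, and the fastest by roughly a quarter of an hour.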
ID: 93390
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 93391 - Posted: 29 Oct 2019, 22:33:54 UTC - in response to Message 93388.  

Can the server not set some kind of gap?
Servers can and do set a minimum gap: you see it in the Event Log after every contact.

29/10/2019 22:32:22 | Milkyway@Home | Project requested delay of 91 seconds
I find it hard to believe that Milkyway is clever enough to set a second, secret, delay before issuing work.
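
For anyone who wants to track these delays over time, a small Python sketch along these lines could pull the value out of a saved Event Log. The pattern is based only on the line quoted above, so other client versions or date formats may need a different expression, and `requested_delay` is a hypothetical helper name.

```python
import re

# Matches lines shaped like the Event Log excerpt quoted above:
#   <date> <time> | <project> | Project requested delay of <N> seconds
DELAY_RE = re.compile(
    r"^(?P<stamp>\S+ \S+) \| (?P<project>[^|]+?) \| "
    r"Project requested delay of (?P<seconds>\d+) seconds$"
)


def requested_delay(line):
    """Return (project, delay in seconds), or None if the line is something else."""
    match = DELAY_RE.match(line.strip())
    if match is None:
        return None
    return match.group("project"), int(match.group("seconds"))


line = "29/10/2019 22:32:22 | Milkyway@Home | Project requested delay of 91 seconds"
print(requested_delay(line))
```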
ID: 93391
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 93395 - Posted: 29 Oct 2019, 23:14:32 UTC - in response to Message 93394.  

I don't get that request for a delay
Turns out you need <sched_op_debug> set in the logging options. That should show up as [sched_op], as in the other lines in the following example:

29/10/2019 23:10:26 | SETI@home | Scheduler request completed
29/10/2019 23:10:26 | SETI@home | [sched_op] Server version 709
29/10/2019 23:10:26 | SETI@home | Project requested delay of 303 seconds
29/10/2019 23:10:26 | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task blc22_2bit_guppi_58692_03898_HIP79702_0124.28847.409.22.45.114.vlar_0
29/10/2019 23:10:26 | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task blc22_2bit_guppi_58692_02278_HIP80797_0119.28898.0.22.45.227.vlar_1
29/10/2019 23:10:26 | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task blc22_2bit_guppi_58692_04223_HIP79568_0125.28907.818.21.44.55.vlar_0
29/10/2019 23:10:26 | SETI@home | [sched_op] Deferring communication for 00:05:03
29/10/2019 23:10:26 | SETI@home | [sched_op] Reason: requested by project
That's a bug.
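
As a rough illustration, the flag can be set in `cc_config.xml` under `<log_flags>`. The Python sketch below only builds the XML text; where the file lives and how the client is told to re-read it depend on the installation, and the helper name `build_cc_config` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build cc_config.xml text with each named log flag switched on."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each flag is a child element of <log_flags> whose text is 1 (on).
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


xml_text = build_cc_config(["sched_op_debug"])
print(xml_text)
```

If a `cc_config.xml` already exists, merge the flag into it by hand rather than overwriting the file with this output.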
ID: 93395
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 93396 - Posted: 29 Oct 2019, 23:22:15 UTC

The other advantage of <sched_op_debug> is that you get to see exactly what's going on during the scheduler request:

29/10/2019 22:46:46 | NumberFields@home | Sending scheduler request: To report completed tasks.
29/10/2019 22:46:46 | NumberFields@home | Reporting 4 completed tasks
29/10/2019 22:46:46 | NumberFields@home | Requesting new tasks for CPU
29/10/2019 22:46:46 | NumberFields@home | [sched_op] CPU work request: 2856.25 seconds; 0.00 devices
29/10/2019 22:46:46 | NumberFields@home | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
29/10/2019 22:46:46 | NumberFields@home | [sched_op] Intel GPU work request: 0.00 seconds; 0.00 devices
29/10/2019 22:46:49 | NumberFields@home | Scheduler request completed: got 1 new tasks
29/10/2019 22:46:49 | NumberFields@home | Project requested delay of 61 seconds
29/10/2019 22:46:49 | NumberFields@home | [sched_op] estimated total CPU task duration: 3230 seconds
If the 'estimated total task duration' is less than the work request (for whichever device), then you're asking for too much and hitting some other limit.
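
That comparison can be automated. The following Python sketch reads `[sched_op]` lines shaped like the ones in the log above and reports any device whose estimated duration falls short of what was requested. The regular expressions are assumptions based on this one excerpt, and `shortfall` is a hypothetical helper, not a BOINC function.

```python
import re

# Patterns taken from the [sched_op] lines in the excerpt above.
REQUEST_RE = re.compile(
    r"\[sched_op\] (?P<device>.+?) work request: (?P<seconds>[\d.]+) seconds"
)
ESTIMATE_RE = re.compile(
    r"\[sched_op\] estimated total (?P<device>.+?) task duration: (?P<seconds>[\d.]+) seconds"
)


def shortfall(log_lines):
    """Map device -> seconds of requested work that the server did not send."""
    requested, received = {}, {}
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            requested[match.group("device")] = float(match.group("seconds"))
            continue
        match = ESTIMATE_RE.search(line)
        if match:
            received[match.group("device")] = float(match.group("seconds"))
    return {
        device: wanted - received.get(device, 0.0)
        for device, wanted in requested.items()
        if wanted > 0 and received.get(device, 0.0) < wanted
    }


LOG = [
    "29/10/2019 22:46:46 | NumberFields@home | [sched_op] CPU work request: 2856.25 seconds; 0.00 devices",
    "29/10/2019 22:46:46 | NumberFields@home | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices",
    "29/10/2019 22:46:49 | NumberFields@home | [sched_op] estimated total CPU task duration: 3230 seconds",
]
print(shortfall(LOG))
```

For the request logged above, the CPU estimate (3230 seconds) exceeds the request (2856.25 seconds), so nothing is flagged.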
ID: 93396
Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 863
United States
Message 93398 - Posted: 30 Oct 2019, 0:26:25 UTC

Peter's sched_op_debug log shows the problem very well. If you report any kind of work, you will ask for new work and get none. The only way to get work is to NOT have reported any work in the previous scheduler connection 91 seconds earlier. If you can satisfy that condition, you will get work.

It doesn't matter at all what the cache setting is: 10 days or 0.1 days makes no difference. The only way to get work consistently is to NOT report any work on the previous scheduler connection, which is impossible with multiple fast cards and quickly crunched tasks.

MW really needs to think about increasing the bundle count again so that tasks last longer before reporting. They bundle 5 tasks now, and they really need to bundle at least 10 or maybe 15 into each task so that it takes longer than 91 seconds to finish.
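
The bundle arithmetic can be sketched in Python. Everything numeric here except the bundle size of 5 and the 91-second delay is an assumption: the 40-second runtime is only an illustrative figure, runtime is assumed to scale linearly with bundle size, and `min_bundle` is a hypothetical helper.

```python
import math


def min_bundle(current_bundle, current_runtime_s, delay_s, step=5):
    """Smallest bundle size (a multiple of `step`) whose runtime exceeds delay_s.

    Assumes runtime grows linearly with the number of bundled items.
    """
    per_item = current_runtime_s / current_bundle
    # Smallest item count whose runtime is strictly longer than the delay.
    needed = math.floor(delay_s / per_item) + 1
    return math.ceil(needed / step) * step


# Hypothetical host: a 5-item task finishes in 40 seconds; the project delay is 91 seconds.
print(min_bundle(5, 40.0, 91))
```

With those assumed numbers a bundle of 10 would still finish in 80 seconds, under the delay, while 15 would take 120 seconds. A slower host would get away with a smaller bundle.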
ID: 93398
Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 863
United States
Message 93407 - Posted: 30 Oct 2019, 6:26:14 UTC - in response to Message 93399.  

The only way I know to do that is to stop network communication until you have crunched all 600 tasks, then re-enable comms, report, and get more work. Considering it took the administrators over two months to fix the last server misconfiguration when they upgraded to server version 1.04, I would not hold my breath waiting for the admins to figure out this configuration snafu. This problem has been discussed in the fora for over 3 months now and nothing has been accomplished, though they know about the issue.
ID: 93407
floyd
Help desk expert

Joined: 23 Apr 12
Posts: 77
Message 93411 - Posted: 30 Oct 2019, 8:15:55 UTC
Last modified: 30 Oct 2019, 8:17:43 UTC

Just two things that haven't been addressed yet as far as I can see:
A) Peter's host doesn't connect to report tasks but to fetch tasks. The log shows that. Tasks are only reported as a side effect.
B) If you want to keep the frequency of connections low, the cache setting of 1+0 is a bad idea. It makes the client request very small amounts of work very often. I'd change the 0 to 0.01 or 0.02 at least, then see if BOINC's increasing delays take care of the rest. If that doesn't help, reduce the 1 day cache to a size that can actually be filled.
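
Point B can be illustrated with a toy Python model. It assumes the client asks for work whenever the buffer drops below the first number and is then filled to the sum of both numbers, that every request is granted in full, and that each task takes 40 seconds (a made-up figure, roughly the "3 tasks every 2 minutes" from the opening post). It ignores project delays and back-off entirely, and `count_requests` is a hypothetical name.

```python
SECONDS_PER_DAY = 86400


def count_requests(min_days, extra_days, task_s, run_days=1.0):
    """Count work requests over run_days under a simple low/high buffer model."""
    low = round(min_days * SECONDS_PER_DAY)
    high = low + round(extra_days * SECONDS_PER_DAY)
    buffered = high  # start with a full buffer
    requests = 0
    for _ in range(int(run_days * SECONDS_PER_DAY // task_s)):
        buffered -= task_s  # one task finishes
        if buffered < low:
            requests += 1
            buffered = high  # assume the server refills the buffer completely
    return requests


for extra in (0.0, 0.01, 0.02):
    print(f"1+{extra}: {count_requests(1.0, extra, 40)} requests per day")
```

In this model 1+0 triggers a request after every single task (2160 per day), while 1+0.02 cuts that to 49.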
ID: 93411
floyd
Help desk expert

Joined: 23 Apr 12
Posts: 77
Message 93444 - Posted: 31 Oct 2019, 13:17:13 UTC - in response to Message 93418.  

I'm not sure if I understand. Are you saying the server refuses to send new tasks because you do return results? And that's intentional? I can't believe it.
ID: 93444
Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 863
United States
Message 93445 - Posted: 31 Oct 2019, 14:14:55 UTC - in response to Message 93444.  

I'm not sure if I understand. Are you saying the server refuses to send new tasks because you do return results? And that's intentional? I can't believe it.

Either it is intentional at MW or, more likely, the servers are misconfigured, since none of my other projects display this behavior.
ID: 93445
floyd
Help desk expert

Joined: 23 Apr 12
Posts: 77
Message 93454 - Posted: 31 Oct 2019, 15:51:07 UTC - in response to Message 93449.  

Anybody got something other than 2 GPUs running MW in 1 host that can tell me what their limit is?
See this thread at the Milkyway forum. The limits are mentioned there (600 is the absolute maximum) and the discussion actually is about the effect you describe here, so it is known there. People guessed that you don't get tasks if you make a request too early, while a delay is still active, and the delay then gets reset creating a loop. That was also my first guess. I didn't read beyond the first page - the thread is five pages long - but you might want to do that. Interestingly Keith participated there, maybe he can remember something.
ID: 93454

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.