Posts by Keith Myers

1) Message boards : Projects : GPUGrid (Message 89263)
Posted 17 hours ago by Keith Myers
Post:
Don't think it is necessary to pursue this any further in this thread. I understand where and why I need to change the defaults to enable troubleshooting. Both projects are connecting now. I have changed the connect interval timeout to 60 seconds so shouldn't have any further issues.
2) Message boards : Projects : GPUGrid (Message 89260)
Posted 1 day ago by Keith Myers
Post:
I have seen situations where the project comms will time out after 5 minutes, which is the default setting unless changed via cc_config. It's possible that it would make a first attempt, time out after 5 minutes, and then have another go before going into project backoff. That would mean at least 10 minutes plus the default backoff interval before it goes into project backoff.

Which was exactly the situation I faced with the GPUGrid project being down. It held off Seti connects for at least ten minutes; I'd say it was a lot longer than that. Don't have any proof though. All I know is that even in ten minutes I can process and report dozens of tasks that aren't being replaced because the scheduler connection is being thwarted by another project's connection attempts.
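
As a rough check on the arithmetic above, here is a minimal Python sketch. It assumes the two-attempt guess from the quoted text (not confirmed BOINC behaviour) and leaves out the backoff interval, since its value isn't stated here. The function name is made up for illustration.

```python
def min_delay_before_backoff(timeout_s: int = 300, attempts: int = 2) -> int:
    """Lower bound, in seconds, on how long one unreachable project could
    occupy the scheduler slot before backoff starts.

    Assumption: `attempts` full timeouts happen back to back (a guess taken
    from the post, not from the BOINC source).
    """
    return timeout_s * attempts


print(min_delay_before_backoff())    # default 300 s timeout -> 600
print(min_delay_before_backoff(60))  # 60 s timeout -> 120
```

With the 300-second default that comes to the ten minutes mentioned above. Dropping the timeout to 60 seconds would cut the same guess to two minutes.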
3) Message boards : Projects : GPUGrid (Message 89259)
Posted 1 day ago by Keith Myers
Post:
In general, if something is written in the docs, it can be trusted. Occasionally, things get out of whack (something gets changed in code without updating the docs, for example): if something like that comes to our attention, either Jord or I will try to correct it.

If nothing is written in the docs, it is not safe to make any assumptions: consistency is not guaranteed, however desirable. I've had a search through the code, and I can't find anywhere where the specific case of max_stdout_file_size=0 is handled. My best guess is that a new file would be created every time BOINC is restarted, and just one single run would be kept as the 'old' file. That's probably not what you wanted.


But -

<max_stderr_file_size>0</max_stderr_file_size>
<max_stdout_file_size>0</max_stdout_file_size>

appear to be the default values written into cc_config.xml anytime it gets fully populated. I certainly did not make any change to these parameters on any of my machines. I never even bothered to look at that parameter until you mentioned it. So I assume the developers intend the file to be recreated every time BOINC is started. It only needs to be changed if you are aware of these parameters and their defaults and you intend to use the files for history and troubleshooting.
4) Message boards : Projects : GPUGrid (Message 89245)
Posted 1 day ago by Keith Myers
Post:
Thanks for the clarification Jord. I guess I should increase from the BOINC default. Looks like docs say the value is in bytes. Looks like there isn't any consistency in what 0 means. For example the docs say 0 means no limit for the Event Log lines.
<max_event_log_lines>N</max_event_log_lines>
Maximum number of lines to display in BOINC Manager's Event Log window (default 2000, 0 means no limit).
5) Message boards : Projects : GPUGrid (Message 89236)
Posted 2 days ago by Keith Myers
Post:
If you're seeing, specifically, timeouts - you could try a couple of config options:

        <dont_contact_ref_site>1</dont_contact_ref_site>
Cuts out the 'internet access' check (doesn't pester Google). I guess we tend to know whether the internet is up without BOINC telling us...

        <http_transfer_timeout>60</http_transfer_timeout>
I think this controls scheduler requests as well. Default is 300 seconds - I think 60 is plenty on a decent connection (if it ain't happened by then, it ain't going to happen).

But with GPUGrid, I was getting a more proactive 'Couldn't connect to server' before the timeout. I'll compare your dropbox cc_config.xml with mine more thoroughly after dinner.

Just looked and yes, <http_transfer_timeout>300</http_transfer_timeout> is the default, i.e. 5 minutes, so that explains the long timeout. I agree: if it doesn't happen in 60 seconds, it ain't going to happen. I see that I will have to change the default on all my hosts.
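
Since this change has to be made on every host, here is a minimal Python sketch of applying the two options discussed above to the text of a cc_config.xml. The helper name set_options and the sample XML are made up for illustration, and the sketch assumes both tags live under the <options> section.

```python
import xml.etree.ElementTree as ET


def set_options(xml_text: str, settings: dict) -> str:
    """Return cc_config XML text with each setting written under <options>.

    Hypothetical helper: existing tags are overwritten, missing ones added.
    """
    root = ET.fromstring(xml_text)
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    for name, value in settings.items():
        elem = options.find(name)
        if elem is None:
            elem = ET.SubElement(options, name)
        elem.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Cut-down sample file; a fully populated cc_config.xml has many more tags.
sample = (
    "<cc_config><options>"
    "<http_transfer_timeout>300</http_transfer_timeout>"
    "</options></cc_config>"
)
updated = set_options(
    sample, {"http_transfer_timeout": 60, "dont_contact_ref_site": 1}
)
print(updated)
```

The sketch only edits text. Writing the result back to disk is left out, and the client still has to re-read its config files or be restarted before new values apply.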
6) Message boards : Projects : GPUGrid (Message 89232)
Posted 2 days ago by Keith Myers
Post:
Richard, how do I document the project request failure for you? Right now MilkyWay has its database down and I just had its server connect hang up the machine for over five minutes. I thought you said the server connects can only last 45 seconds before timing out. This prevented Seti from contacting the scheduler.

From the Event Log just now.

Sun 16 Dec 2018 10:04:49 AM PST | Milkyway@Home | [sched_op] Starting scheduler request
Sun 16 Dec 2018 10:04:49 AM PST | Milkyway@Home | Sending scheduler request: To report completed tasks.
Sun 16 Dec 2018 10:04:49 AM PST | Milkyway@Home | Reporting 10 completed tasks
Sun 16 Dec 2018 10:04:49 AM PST | Milkyway@Home | Requesting new tasks for NVIDIA GPU
Sun 16 Dec 2018 10:04:49 AM PST | Milkyway@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Sun 16 Dec 2018 10:04:49 AM PST | Milkyway@Home | [sched_op] NVIDIA GPU work request: 189859.02 seconds; 0.00 devices

bunch of Seti uploads

Sun 16 Dec 2018 10:09:58 AM PST | | Project communication failed: attempting access to reference site
Sun 16 Dec 2018 10:09:58 AM PST | Milkyway@Home | Scheduler request failed: Timeout was reached
Sun 16 Dec 2018 10:09:58 AM PST | Milkyway@Home | [sched_op] Deferring communication for 00:38:41
Sun 16 Dec 2018 10:09:58 AM PST | Milkyway@Home | [sched_op] Reason: Scheduler request failed
Sun 16 Dec 2018 10:10:00 AM PST | | Internet access OK - project servers may be temporarily down.


Seti finally gets a chance to connect

Sun 16 Dec 2018 10:10:03 AM PST | SETI@home | [sched_op] Starting scheduler request
Sun 16 Dec 2018 10:10:03 AM PST | SETI@home | Sending scheduler request: To fetch work.
Sun 16 Dec 2018 10:10:03 AM PST | SETI@home | Reporting 36 completed tasks
Sun 16 Dec 2018 10:10:03 AM PST | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
Sun 16 Dec 2018 10:10:03 AM PST | SETI@home | [sched_op] CPU work request: 1037353.42 seconds; 0.00 devices
Sun 16 Dec 2018 10:10:03 AM PST | SETI@home | [sched_op] NVIDIA GPU work request: 191643.44 seconds; 0.00 devices


My stdoutdae.txt is currently 2.7MB but that only covers one day. So by the time you asked for documentation, the events had been purged from that file. I have the standard log flags plus only one other sched_op_debug so I can see how many seconds of work I request each connection. That is the only "extra" information included in the log other than the normal stuff.

[Edit] Curious as to why my stdoutdae.txt is only 2.7MB and covers only a day. I have <max_stdout_file_size>0</max_stdout_file_size> and I believe that 0 means no limit.
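
To put a number on the stall in the log excerpt above, here is a minimal Python sketch that measures the time between a project's "Starting scheduler request" line and its "Scheduler request failed" line. It assumes the pipe-separated Event Log layout shown in this post. The function names are made up for illustration, and the timezone name is simply dropped because only elapsed time matters.

```python
from datetime import datetime


def parse_line(line: str):
    """Split one Event Log line into (timestamp, project, message)."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    # Drop the trailing timezone name ("PST") before parsing.
    stamp = stamp.rsplit(" ", 1)[0]
    when = datetime.strptime(stamp, "%a %d %b %Y %I:%M:%S %p")
    return when, project, message


def request_duration(lines, project: str) -> float:
    """Seconds between a project's request start and its failure message."""
    started = None
    for line in lines:
        when, name, message = parse_line(line)
        if name != project:
            continue
        if "Starting scheduler request" in message:
            started = when
        elif message.startswith("Scheduler request failed") and started is not None:
            return (when - started).total_seconds()
    raise ValueError(f"no failed request found for {project}")


log = [
    "Sun 16 Dec 2018 10:04:49 AM PST | Milkyway@Home | [sched_op] Starting scheduler request",
    "Sun 16 Dec 2018 10:09:58 AM PST | | Project communication failed: attempting access to reference site",
    "Sun 16 Dec 2018 10:09:58 AM PST | Milkyway@Home | Scheduler request failed: Timeout was reached",
]
print(request_duration(log, "Milkyway@Home"))  # 309.0
```

For the lines quoted here the result is 309 seconds, just over the 300-second default http_transfer_timeout.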
7) Message boards : Projects : GPUGrid (Message 89214)
Posted 3 days ago by Keith Myers
Post:
Impossible to document the fault now that GPUGrid has returned. But that was not what I saw on my client. I had simply shut down BOINC to do some stability testing after changing from UMA to NUMA memory access modes. I had not run any rescheduler. I was simply returning to running BOINC. The GPUGrid request just kept happening over and over. It took something like a minute and a half to time out and then started another request. Over and over again. It must have been a half hour before GPUGrid finally started incrementing additional server connect backoffs, which finally let my other projects in.

As I stated, no supporting documentation available as I have stopped and restarted BOINC too many times now and don't have any Event Log entries from that day.
8) Message boards : Projects : GPUGrid (Message 89197)
Posted 4 days ago by Keith Myers
Post:
And I discovered that it is the first project to attempt communication when BOINC is started, which I think is alphabetical. That promptly stalls out any further communications for my other projects. Didn't realize this was the way BOINC worked. Until GPUGrid finally timed out on its attempt to connect, it held off my other projects' normal connections. This let my Seti cache fall by a hundred tasks until it was finally given the chance to report and request work.

As far as I am concerned this is a flaw in the BOINC communications protocol. No misbehaving project should prevent any normal running project from communicating.
9) Message boards : The Lounge : The Seti is Down Cafe (Message 89146)
Posted 9 days ago by Keith Myers
Post:
Even better, we got to have a second outage for the week. Makes Tuesday's look like a 'walk in the park'!
10) Message boards : The Lounge : The Seti is Down Cafe (Message 88849)
Posted 13 Nov 2018 by Keith Myers
Post:
The San Joaquin and Sacramento Valleys are basins too. Just look at a satellite photo of California in the winter when the tule fog sets in. Just a big long bowl of white.
11) Message boards : The Lounge : The Seti is Down Cafe (Message 88847)
Posted 13 Nov 2018 by Keith Myers
Post:
Yes, the web pages didn't show the servers up. But a little while later I was able to report. No work yet though.
12) Message boards : The Lounge : The Seti is Down Cafe (Message 88845)
Posted 13 Nov 2018 by Keith Myers
Post:
Web pages are back. Servers not yet.
13) Message boards : The Lounge : The Seti is Down Cafe (Message 88841)
Posted 13 Nov 2018 by Keith Myers
Post:
Sky gazing up here has been a lost cause in recent years. We have a naked eye visible comet coming up and unless I travel a couple of states, no chance of seeing it.
14) Message boards : The Lounge : The Seti is Down Cafe (Message 88839)
Posted 13 Nov 2018 by Keith Myers
Post:
Been a really bad year for forest fires and smoke in Northern California. We had two months of fires and smoke earlier in August-September from the Carr Fire near Redding. That one caused really bad air since it was to the north of me and that is where the prevailing winds come from most of the year. Besides having to breathe bad air it has also reduced my solar generation by 2MWh so far and that is what offsets my use of the computers for crunching.
15) Message boards : The Lounge : The Seti is Down Cafe (Message 88833)
Posted 13 Nov 2018 by Keith Myers
Post:
Likely that is your own smoke climate, not from the Camp Fire. Winds have been 90% prevailing from the north-east taking the smoke to the south and west. From the smoke maps, seems like lots of fire in Oregon and Washington that are generating their own smoke. If anyone caught the Monday Night Football game last night from Santa Clara, you would have seen smoke from the Camp Fire in the stadium. Santa Clara is 220 miles to the south-west of Paradise.

Haven't seen the real sun since the fire started. Just an orange dot, if that. PM2.5 particulate matter has been off the charts, literally. The charts only go to 600 and all the peaks are beyond that.
My Purple Air II sensor has been in the orange and red since the beginning of the fire.
Purple Air map
16) Message boards : The Lounge : The Seti is Down Cafe (Message 88702)
Posted 31 Oct 2018 by Keith Myers
Post:
Haha, LOL, I was crunching my backup projects within an hour of the start of the outrage. 400 or 500 tasks only last so long.
17) Message boards : The Lounge : The Seti is Down Cafe (Message 88695)
Posted 31 Oct 2018 by Keith Myers
Post:
This one definitely qualifies as a Grand Mal outrage. Still an hour to go.
18) Message boards : Questions and problems : How to solve libcurl3 dependency in Boinc Manager (Message 88534)
Posted 19 Oct 2018 by Keith Myers
Post:
I will try and use the curl34 ppa that RickToTheMax clued me in on with another attempt to install 18.10 in a test partition later today with TBar's versions of BOINC.
19) Message boards : Questions and problems : How to solve libcurl3 dependency in Boinc Manager (Message 88520)
Posted 19 Oct 2018 by Keith Myers
Post:
Yes, I goofed. The client is the one that has the libcurl3 dependency. What do you mean by "the one from your libcurl3-less distribution"?
20) Message boards : BOINC Manager : IS there a BOINC Manager version that doesn't need libcurl3? (Message 88514)
Posted 19 Oct 2018 by Keith Myers
Post:
IS there a BOINC Manager version that doesn't need libcurl3?



Copyright © 2018 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.