BOINC doing no work since Linux update

joeyjojo
Joined: 30 Apr 13
Posts: 9
United Kingdom
Message 48861 - Posted: 30 Apr 2013, 11:52:49 UTC
Last modified: 30 Apr 2013, 11:54:13 UTC

My Fedora box was recently updated from 17 to 18, and now BOINC won't run any work. boinc-gui currently shows 6 tasks, of which 2 are "Waiting to run" and 4 are "Ready to start".

I had brief success by using "Reset project" on each of the three projects I run, but they have since stalled again.

My cc_config.xml contains only my exclusive apps, which obviously aren't running at the moment (work is suspended correctly when I have them loaded):

<cc_config>
  <options>
     <exclusive_app>eclipse</exclusive_app>
     <exclusive_app>matlab</exclusive_app>
  </options>
</cc_config>


I've deleted global_prefs.xml and restarted the machine. global_prefs_override.xml currently contains:

<global_preferences>
   <run_on_batteries>1</run_on_batteries>
   <run_if_user_active>1</run_if_user_active>
   <run_gpu_if_user_active>0</run_gpu_if_user_active>
   <idle_time_to_run>0.000000</idle_time_to_run>
   <suspend_cpu_usage>0.000000</suspend_cpu_usage>
   <start_hour>0.000000</start_hour>
   <end_hour>0.000000</end_hour>
   <net_start_hour>0.000000</net_start_hour>
   <net_end_hour>0.000000</net_end_hour>
   <leave_apps_in_memory>0</leave_apps_in_memory>
   <confirm_before_connecting>1</confirm_before_connecting>
   <hangup_if_dialed>0</hangup_if_dialed>
   <dont_verify_images>0</dont_verify_images>
   <work_buf_min_days>0.100000</work_buf_min_days>
   <work_buf_additional_days>0.500000</work_buf_additional_days>
   <max_ncpus_pct>100.000000</max_ncpus_pct>
   <cpu_scheduling_period_minutes>60.000000</cpu_scheduling_period_minutes>
   <disk_interval>60.000000</disk_interval>
   <disk_max_used_gb>10.000000</disk_max_used_gb>
   <disk_max_used_pct>50.000000</disk_max_used_pct>
   <disk_min_free_gb>0.100000</disk_min_free_gb>
   <vm_max_used_pct>75.000000</vm_max_used_pct>
   <ram_max_used_busy_pct>50.000000</ram_max_used_busy_pct>
   <ram_max_used_idle_pct>90.000000</ram_max_used_idle_pct>
   <max_bytes_sec_up>0.000000</max_bytes_sec_up>
   <max_bytes_sec_down>0.000000</max_bytes_sec_down>
   <cpu_usage_limit>100.000000</cpu_usage_limit>
   <daily_xfer_limit_mb>0.000000</daily_xfer_limit_mb>
   <daily_xfer_period_days>0</daily_xfer_period_days>
</global_preferences>


The log since restarting says:

Tue 30 Apr 2013 14:47:17 BST | | Starting BOINC client version 7.0.29 for x86_64-pc-linux-gnu
Tue 30 Apr 2013 14:47:17 BST | | log flags: file_xfer, sched_ops, task
Tue 30 Apr 2013 14:47:17 BST | | Libraries: libcurl/7.27.0 NSS/3.14.3.0 zlib/1.2.7 libidn/1.26 libssh2/1.4.3
Tue 30 Apr 2013 14:47:17 BST | | Data directory: /home/scratch/boinc
Tue 30 Apr 2013 14:47:17 BST | | Processor: 4 GenuineIntel Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz [Family 6 Model 58 Stepping 9]
Tue 30 Apr 2013 14:47:17 BST | | Processor: 6.00 MB cache
Tue 30 Apr 2013 14:47:17 BST | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
Tue 30 Apr 2013 14:47:17 BST | | OS: Linux: 3.8.9-200.fc18.x86_64
Tue 30 Apr 2013 14:47:17 BST | | Memory: 3.75 GB physical, 8.01 GB virtual
Tue 30 Apr 2013 14:47:17 BST | | Disk: 234.16 GB total, 170.74 GB free
Tue 30 Apr 2013 14:47:17 BST | | Local time is UTC +1 hours
Tue 30 Apr 2013 14:47:17 BST | | No usable GPUs found
Tue 30 Apr 2013 14:47:17 BST | | Config: don't compute while eclipse is running
Tue 30 Apr 2013 14:47:17 BST | | Config: don't compute while matlab is running
Tue 30 Apr 2013 14:47:17 BST | | A new version of BOINC is available. <a href=http://boinc.berkeley.edu/download.php>Download it.</a>
Tue 30 Apr 2013 14:47:17 BST | rosetta@home | URL http://boinc.bakerlab.org/rosetta/; Computer ID 1571776; resource share 100
Tue 30 Apr 2013 14:47:17 BST | climateprediction.net | URL http://climateprediction.net/; Computer ID 1242828; resource share 100
Tue 30 Apr 2013 14:47:17 BST | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 6802872; resource share 100
Tue 30 Apr 2013 14:47:17 BST | | No general preferences found - using defaults
Tue 30 Apr 2013 14:47:17 BST | | Reading preferences override file
Tue 30 Apr 2013 14:47:17 BST | | Preferences:
Tue 30 Apr 2013 14:47:17 BST | | max memory usage when active: 1919.02MB
Tue 30 Apr 2013 14:47:17 BST | | max memory usage when idle: 3454.23MB
Tue 30 Apr 2013 14:47:17 BST | | max disk usage: 10.00GB
Tue 30 Apr 2013 14:47:17 BST | | don't use GPU while active
Tue 30 Apr 2013 14:47:17 BST | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
Tue 30 Apr 2013 14:47:17 BST | | Not using a proxy
Tue 30 Apr 2013 14:50:28 BST | | System clock was turned backwards; clearing timeouts


Any ideas?
ID: 48861
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 48862 - Posted: 30 Apr 2013, 13:16:50 UTC - in response to Message 48861.  

I see:
Tue 30 Apr 2013 14:50:28 BST | | System clock was turned backwards; clearing timeouts

Try to figure out by how much the system clock was turned backwards and what effect that had on any of the work in cache.

Also check all the status columns in BOINC Manager (Projects tab, Tasks tab). Other than Waiting to Run and Ready to Start, are there any other messages there?

Is this BOINC by Fedora repositories, or BOINC by Berkeley?
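
The event log does not record how far the clock jumped, but it does show when the client noticed the jump relative to startup. A minimal Python sketch, assuming the `timestamp | project | message` layout seen in the log above; `parse_line` and `clock_events` are hypothetical helper names, not part of BOINC:

```python
from datetime import datetime

MARKER = "System clock was turned backwards"

# Two lines copied from the log in the first post.
SAMPLE = [
    "Tue 30 Apr 2013 14:47:17 BST | | Starting BOINC client version 7.0.29 for x86_64-pc-linux-gnu",
    "Tue 30 Apr 2013 14:50:28 BST | | System clock was turned backwards; clearing timeouts",
]

def parse_line(line):
    """Split one event-log line into (timestamp, project, message)."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    # Drop the trailing zone abbreviation ("BST"); the times stay local wall-clock times.
    stamp = stamp.rsplit(" ", 1)[0]
    return datetime.strptime(stamp, "%a %d %b %Y %H:%M:%S"), project, message

def clock_events(lines):
    """Return (timestamp, seconds since first log line) for each clock warning."""
    events = [parse_line(line) for line in lines if line.count("|") >= 2]
    if not events:
        return []
    start = events[0][0]
    return [(ts, (ts - start).total_seconds())
            for ts, _project, message in events
            if message.startswith(MARKER)]

for ts, seconds in clock_events(SAMPLE):
    print(f"{ts:%H:%M:%S}: clock warning {seconds:.0f} s after the first log line")
```

Because the timestamps themselves come from the clock that was stepped, the gap is only a rough indication of when the adjustment happened, not of its size.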
ID: 48862
joeyjojo
Joined: 30 Apr 13
Posts: 9
United Kingdom
Message 48869 - Posted: 30 Apr 2013, 16:29:28 UTC - in response to Message 48862.  

Thanks Jord.

The clock goes back by 1 hour when it updates after turning on as we're in British Summer Time. This hasn't been a problem for the last year or so.

No other messages. Network connection is fine.

This is the Fedora package of BOINC:

Name        : boinc-client
Arch        : x86_64
Version     : 7.0.29
Release     : 2.r25790svn.fc18
ID: 48869
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 48871 - Posted: 30 Apr 2013, 17:21:38 UTC - in response to Message 48869.  

You do have BOINC set to run based on preferences?
The waiting to run and ready to start are tasks run by CPU apps?
Deleting global_prefs.xml will not do much; it'll be remade the next time BOINC contacts any project. So for continuity, can you please post the contents of global_prefs.xml as well? It'll hold some preferences that cannot be set through the override file.

And as you may have noticed, there's a new BOINC out, 7.0.65, though not yet through rpm. If you know what you're doing, you could move everything BOINC to your /home directory and install Berkeley BOINC instead, and see what that does.
ID: 48871
Claggy
Joined: 23 Apr 07
Posts: 1112
United Kingdom
Message 48873 - Posted: 30 Apr 2013, 17:44:00 UTC - in response to Message 48869.  
Last modified: 30 Apr 2013, 17:48:55 UTC

We recently had that same symptom posted to the boinc_alpha list by SekeRob:

[boinc_alpha] Boinc 7.0.xx Computing stops when clock is set backwards!

His parting post was:

Gianfranco,

It's all about what's a sure fire delay to overcome this "BOINC allergic
reaction to" time management quirk. Will test 60 seconds later and
probably shorten it further to find out what's a permanent stable value.
This baby is up in 17 seconds now, but doubt all systems are that fast
at booting into a full operational state. Sacrificing 60-120 seconds
does no one pain, at least it does not to me. To make it easy to
maintain, could create a startup_delay.xml file in the /etc/boinc-client
folder and let the startup script read that. Maybe let it read the
<start_delay> value from the cc_config.xml. If not present, assume X
seconds. At the present the <start_delay> is ineffectual. BOINC starts,
and then just sits there counting off the time. Maybe not, as then a 120
seconds value would become times 2 i.e. 240 seconds. Whatever simple
solution, so folk won't have to get into the script files. Or, check up
with Canonical why that is... serial boot and this being repeated. On
Windows it's a one time thing at End of March and October. Seemingly the
issue goes back to 2007 per this discussion thread:
http://ubuntu.5.n6.nabble.com/DST-changes-lost-with-every-boot-resume-in-feisty-td1398641.html

Meantime, reading up, I may have found the conflict in the Ubuntu
config. There's a ticked Auto-detect location and a time zone named UTC
reading out the system time (which is set to CET), so guess it is
stepping through UTC plus CET, but hey per Auto-detect the time at
CET-Rome is one hour earlier, so pronto, goes back again. Deleted the
UTC zone and will watch on next boot (Don't ask me why it's possible to
have both options ticked at the same time).


-- SekeRob


DA's answer was:

I'll keep looking into this.

But to repeat: the root problem here is that your system
clock is changing by large amounts.
System clock is Unix time,
which doesn't vary with time zone or daylight savings time.


How you set your time zone varies between distros; see
http://www.cyberciti.biz/faq/howto-linux-unix-change-setup-timezone-tz-variable/
None of these involves changing the system clock.

-- David


and Charlie answered:

Changing the clock time is not the same as the switch to and from Daylight Savings. To simulate that, change your setting of the time zone you are in, not your clock. As David says, the underlying system time does not change, only the adjustment used to _show_ the current time of day.

I have just tested this on both Mac OS X (which is built on top of BSD UNIX) and also Windows XP. In both cases, the Manager display continued to update when I switched from Pacific Daylight Time (UTC-8) to Alaska Daylight Time (UTC-9), though the time of day displays all changed.

Cheers,
--Charlie


Claggy
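
David's and Charlie's point, that the underlying Unix time is one number and the time zone only changes how it is displayed, can be illustrated with a short Python sketch. The `render` helper is hypothetical, and the epoch value is simply the first timestamp from the log in this thread converted to UTC:

```python
from datetime import datetime, timedelta, timezone

# Unix time of the first log line: 30 Apr 2013 14:47:17 BST = 13:47:17 UTC.
EPOCH = 1367329637

def render(epoch, utc_offset_hours):
    """Format the same Unix timestamp for a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch, tz).strftime("%Y-%m-%d %H:%M:%S %z")

gmt = render(EPOCH, 0)  # display with no offset
bst = render(EPOCH, 1)  # display as British Summer Time

print(gmt)
print(bst)
```

Both lines are derived from the same `EPOCH`; only the displayed wall-clock time differs by an hour. A correct DST or time zone switch changes the offset, not the epoch value, which is why a message about the system clock itself going backwards points at something else stepping the clock.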
ID: 48873
joeyjojo
Joined: 30 Apr 13
Posts: 9
United Kingdom
Message 48874 - Posted: 30 Apr 2013, 20:39:53 UTC

Thanks both.

Ageless - CPU apps. global_prefs.xml hasn't been recreated on restarting. Before I deleted it it contained:

<global_preferences>
    <source_project>http://setiathome.berkeley.edu/</source_project>
    <source_scheduler>http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi</source_scheduler>
    <mod_time>1305159453</mod_time>
    <run_on_batteries>0</run_on_batteries>
    <run_if_user_active>1</run_if_user_active>
    <run_gpu_if_user_active>0</run_gpu_if_user_active>
    <idle_time_to_run>3</idle_time_to_run>
    <suspend_if_no_recent_input>0</suspend_if_no_recent_input>
    <suspend_cpu_usage>0</suspend_cpu_usage>
    <leave_apps_in_memory>0</leave_apps_in_memory>
    <cpu_scheduling_period_minutes>60</cpu_scheduling_period_minutes>
    <max_cpus>0</max_cpus>
    <max_ncpus_pct>100</max_ncpus_pct>
    <cpu_usage_limit>100</cpu_usage_limit>
    <disk_max_used_gb>20</disk_max_used_gb>
    <disk_min_free_gb>10</disk_min_free_gb>
    <disk_max_used_pct>50</disk_max_used_pct>
    <disk_interval>60</disk_interval>
    <vm_max_used_pct>75</vm_max_used_pct>
    <ram_max_used_busy_pct>50</ram_max_used_busy_pct>
    <ram_max_used_idle_pct>90</ram_max_used_idle_pct>
    <work_buf_min_days>0</work_buf_min_days>
    <work_buf_additional_days>0.25</work_buf_additional_days>
    <confirm_before_connecting>0</confirm_before_connecting>
    <hangup_if_dialed>0</hangup_if_dialed>
    <max_bytes_sec_down>0</max_bytes_sec_down>
    <max_bytes_sec_up>0</max_bytes_sec_up>
    <daily_xfer_limit_mb>0</daily_xfer_limit_mb>
    <daily_xfer_period_days>0</daily_xfer_period_days>
    <dont_verify_images>0</dont_verify_images>
</global_preferences>


which I know because the .xml~ file created by gedit is still sat in the folder from when I was tinkering with it. Installing BOINC myself is not an option, as I don't have root access, only access to the package manager (yum).

Claggy: interesting, thanks. Again, I don't have root access, so none of those linked ways to change the timezone are available to me. According to this, though, the timezone is set by /etc/localtime, which on my machine is currently a link to ../usr/share/zoneinfo/Europe/London, which looks normal (apart from the .., which I wouldn't think was necessary). I'll have a look tomorrow at the GUI to see if it tells me which timezone I'm in, and ask a sysadmin to check it's GMT.

As I said before, though, Fedora has been doing this -1 hour change for a long time and BOINC has only recently started misbehaving. The newest boinc-client package for FC17 is the same version as I have now: 7.0.29-1 (I believe I now have 7.0.29-2).
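
Checking the /etc/localtime link needs no root access. A minimal Python sketch, assuming the link points somewhere under a `zoneinfo/` directory as described above; `zone_from_localtime` is a hypothetical helper name:

```python
import os

def zone_from_localtime(path="/etc/localtime", marker="zoneinfo/"):
    """Return the zone name a localtime symlink resolves to, or None.

    None is returned when the path is not a symlink or does not
    resolve to somewhere below a zoneinfo directory.
    """
    if not os.path.islink(path):
        return None
    # realpath resolves relative targets against the link's own directory,
    # so ../usr/share/zoneinfo/... seen from /etc becomes /usr/share/zoneinfo/...
    target = os.path.realpath(path)
    if marker not in target:
        return None
    return target.split(marker, 1)[1]

print(zone_from_localtime())
```

This also accounts for the `..` in the link target: a relative symlink is resolved from the directory that holds it, so a link inside /etc has to step up one level to reach /usr.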
ID: 48874
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 48875 - Posted: 30 Apr 2013, 21:39:41 UTC - in response to Message 48874.  
Last modified: 30 Apr 2013, 21:40:50 UTC

I'll have a look tomorrow at the GUI to see if it tells me which timezone I'm in and ask a sysadmin to check it's GMT.

Your BOINC states: Tue 30 Apr 2013 14:47:17 BST | | Local time is UTC +1 hours
Which is correct for Great Britain at this time. You changed to summer time at the end of March as well.

global_prefs.xml isn't made on restarting the client, it's made when the client contacts any of the projects. Then the scheduler will check if it's there, and if not, push the project preferences and computing preferences onto the computer.

Okay, in your global_prefs.xml file it says: <max_cpus>0</max_cpus>
This value is set by "On multiprocessors, use at most X processors" and is the maximum number of CPUs you want BOINC to use. If set to zero... how do you expect BOINC to do any work? ;-)

So go to Seti's computing preferences and change that value to 4 if you want to use all 4 cores on your i5. You can then use "On multiprocessors, use at most X% of the processors" to fine-tune the amount.
Then push the update to the client: BOINC Manager->Projects tab->Seti->Update.
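
To see at a glance which CPU-limiting values a preferences file carries, a small Python sketch along these lines may help. It assumes the flat `<global_preferences>` layout posted above; `cpu_limits` is a hypothetical helper, and the sketch only reports values, it does not decide how a given client version interprets a zero:

```python
import xml.etree.ElementTree as ET

CPU_KEYS = ("max_cpus", "max_ncpus_pct", "cpu_usage_limit")

# Trimmed copy of the global_prefs.xml contents posted earlier.
SAMPLE = """<global_preferences>
<max_cpus>0</max_cpus>
<max_ncpus_pct>100</max_ncpus_pct>
<cpu_usage_limit>100</cpu_usage_limit>
</global_preferences>"""

def cpu_limits(prefs_xml):
    """Return the CPU-related preferences present in the XML as floats."""
    root = ET.fromstring(prefs_xml)
    found = {}
    for key in CPU_KEYS:
        text = root.findtext(key)
        if text is not None:
            found[key] = float(text)
    return found

limits = cpu_limits(SAMPLE)
for key, value in limits.items():
    note = "  <-- zero, worth a second look" if value == 0 else ""
    print(f"{key} = {value:g}{note}")
```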


@Claggy, thanks for bringing that up as well. I'd forgotten about that thread in Alpha, mostly as I had only glanced at it, as I was busy with other stuff at the time.
ID: 48875
joeyjojo
Joined: 30 Apr 13
Posts: 9
United Kingdom
Message 48880 - Posted: 30 Apr 2013, 23:12:23 UTC - in response to Message 48875.  

I noticed that too (max cpus) and changed it to several values including 4 with no success (after reloading prefs each time). But you never know, maybe it'll work tomorrow. Cheers.
ID: 48880
joeyjojo
Joined: 30 Apr 13
Posts: 9
United Kingdom
Message 48916 - Posted: 2 May 2013, 11:16:00 UTC
Last modified: 2 May 2013, 11:18:12 UTC

Hi again.

Changing the prefs to 4 CPUs temporarily fixed my problem, but the machine is now again stuck and not running any work. It picked up several new Rosetta tasks and a SETI task, completed them, and is now sat idle. See screenshot.



I notice in the log that the only scheduler requests are the ones requested by me (the last one being yesterday afternoon). Could this be the problem? Before you ask, there are no messages in Projects or Tasks, and nothing strange in the log. Network activity is set to "always available".
ID: 48916
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 48917 - Posted: 2 May 2013, 13:37:54 UTC - in response to Message 48916.  

If it isn't a preference somewhere that's doing this, then to me it looks mostly like a problem with 7.0.29.

Any late entries in stderrdae.txt?
I see in http://rpmfind.net/linux/rpm2html/search.php?query=boinc-client that 7.0.65 is now the client of choice on many of the Fedora versions. Mind checking if there's one for yours and trying to update to it, or trying the one for Fedora 19?
ID: 48917
joeyjojo
Joined: 30 Apr 13
Posts: 9
United Kingdom
Message 48921 - Posted: 2 May 2013, 16:42:16 UTC - in response to Message 48917.  
Last modified: 2 May 2013, 16:52:47 UTC

No stderrae.txt (I checked the usual places and also used `find`).

Prefs are all as above (apart from ncpus of course).

As it says on that list, 7.0.29-2 is the latest for FC18. I don't have root so can't install a newer one myself. I can ask the admin though.

I've aborted those stuck jobs and a load of new tasks have been downloaded. Will see how we get on!

Edit: aha, log after cancelling says:

Thu 02 May 2013 17:39:53 BST | SETI@home | Sending scheduler request: To fetch work.
Thu 02 May 2013 17:39:53 BST | SETI@home | Reporting 1 completed tasks, requesting new tasks for CPU
Thu 02 May 2013 17:39:56 BST | SETI@home | Scheduler request completed: got 26 new tasks
Thu 02 May 2013 17:39:58 BST | SETI@home | Started download of setiathome-5.28.x86_64-pc-linux-gnu
... loads of SETI downloads ...
Thu 02 May 2013 17:40:35 BST | SETI@home | Finished download of 02mr13ac.12074.18068.14.11.232
Thu 02 May 2013 17:42:45 BST | rosetta@home | Sending scheduler request: To report completed tasks.
Thu 02 May 2013 17:42:45 BST | rosetta@home | Reporting 10 completed tasks, not requesting new tasks
Thu 02 May 2013 17:42:47 BST | rosetta@home | Scheduler request completed


Why is it not requesting new tasks? Too many SETI ones in the queue already or something else? It only has new SETI work, not Rosetta or CPDN. All are set to allow new work on the project tab; I checked by clicking the button twice to make sure:

Thu 02 May 2013 17:49:31 BST | climateprediction.net | work fetch suspended by user
Thu 02 May 2013 17:49:32 BST | climateprediction.net | work fetch resumed by user
Thu 02 May 2013 17:49:33 BST | SETI@home | work fetch suspended by user
Thu 02 May 2013 17:49:34 BST | SETI@home | work fetch resumed by user
Thu 02 May 2013 17:49:35 BST | rosetta@home | work fetch suspended by user
Thu 02 May 2013 17:49:36 BST | rosetta@home | work fetch resumed by user


If I click update for Rosetta the log says:
Thu 02 May 2013 17:51:12 BST | rosetta@home | update requested by user
Thu 02 May 2013 17:51:14 BST | rosetta@home | Sending scheduler request: Requested by user.
Thu 02 May 2013 17:51:14 BST | rosetta@home | Not reporting or requesting tasks
Thu 02 May 2013 17:51:16 BST | rosetta@home | Scheduler request completed


CPDN seems to have a valid excuse:
Thu 02 May 2013 17:52:14 BST | climateprediction.net | update requested by user
Thu 02 May 2013 17:52:17 BST | climateprediction.net | Sending scheduler request: Requested by user.
Thu 02 May 2013 17:52:17 BST | climateprediction.net | Reporting 1 completed tasks, requesting new tasks for CPU
Thu 02 May 2013 17:52:19 BST | climateprediction.net | Scheduler request completed: got 0 new tasks
Thu 02 May 2013 17:52:19 BST | climateprediction.net | Project has no tasks available
ID: 48921
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 48924 - Posted: 2 May 2013, 17:10:23 UTC - in response to Message 48921.  
Last modified: 2 May 2013, 17:10:59 UTC

No stderrae.txt (I checked the usual places and also used `find`).

Yeah, well, if you searched for that, you won't find it. I asked for stderrdae.txt :-)

Why is it not requesting new tasks? Too many SETI ones in the queue already or something else?

Now, first off, 7.0.29 was never a recommended version, but has always been a development version only. So it could be a myriad of bugs.
The work request is governed by the low-water mark (the "Maintain enough tasks to keep busy for at least" value). Stay above it and there won't be any work request; fall under it and presto, we'll ask for work, and ask the most eligible project (the one with the lowest RAC really, if all projects use the same resource share).
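
As a toy illustration of that rule only (the real client's work fetch is considerably more involved), the decision can be sketched in Python. The `Project` class, the `project_to_ask` function, and the RAC figures are all made up for the example, and equal resource shares are assumed:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Project:
    name: str
    rac: float  # recent average credit; hypothetical values below

def project_to_ask(buffered_days: float, low_water_days: float,
                   projects: List[Project]) -> Optional[Project]:
    """Pick a project to ask for work, or None while the buffer is full enough."""
    if buffered_days >= low_water_days:
        return None  # above the low-water mark: no work request
    # Below the mark: with equal resource shares, lowest RAC is most eligible.
    return min(projects, key=lambda p: p.rac)

projects = [
    Project("SETI@home", 900.0),
    Project("rosetta@home", 400.0),
    Project("climateprediction.net", 650.0),
]

full = project_to_ask(0.3, 0.1, projects)   # plenty of work buffered
low = project_to_ask(0.05, 0.1, projects)   # buffer has run low

print(full)
print(low.name if low else None)
```

Under this simplified model, a queue stuffed with 26 fresh SETI tasks keeps the buffer above the mark, which would be consistent with "not requesting new tasks" for the other projects.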

If you truly want to see why project A won't take work in, while project B might be on the brink of doing so, you'll have to set the appropriate log flags. The main flags are:

<cpu_sched_debug>: problems involving the choice of applications to run.
<work_fetch_debug>: problems involving work fetch (which projects are asked for work, and how much).
<rr_simulation>: problems involving jobs being run in high-priority mode.
<sched_op_debug>: problems involving scheduler operations and other low level information.

Set these flags in the <log_flags> section of the cc_config.xml file.

In your case, I would set work_fetch_debug and cpu_sched_debug.
But as said, your BOINC is an old one, the bug --if it is a bug-- could've been fixed already. Only way to know is by using a newer version. And so we'll have to wait for your admin.
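
For anyone who would rather not hand-edit the file, here is a minimal Python sketch that adds the two suggested flags to the cc_config.xml posted at the top of the thread while leaving the exclusive apps alone. `enable_log_flags` is a hypothetical helper, and writing the result back to the data directory is left out on purpose:

```python
import xml.etree.ElementTree as ET

# The cc_config.xml contents from the first post.
CURRENT = """<cc_config>
  <options>
     <exclusive_app>eclipse</exclusive_app>
     <exclusive_app>matlab</exclusive_app>
  </options>
</cc_config>"""

def enable_log_flags(config_xml, flags):
    """Return config XML with each named flag set to 1 under <log_flags>."""
    root = ET.fromstring(config_xml)
    log_flags = root.find("log_flags")
    if log_flags is None:
        # No <log_flags> section yet: create it next to <options>.
        log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        node = log_flags.find(name)
        if node is None:
            node = ET.SubElement(log_flags, name)
        node.text = "1"
    return ET.tostring(root, encoding="unicode")

updated = enable_log_flags(CURRENT, ["work_fetch_debug", "cpu_sched_debug"])
print(updated)
```

The output is not pretty-printed, which is only cosmetic. The client has to re-read the file before the flags take effect; restarting the client does that.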
ID: 48924
joeyjojo
Joined: 30 Apr 13
Posts: 9
United Kingdom
Message 48927 - Posted: 2 May 2013, 17:26:37 UTC - in response to Message 48924.  

Whoops ;-) stderrdae.txt doesn't exist either.

I'll let it work overnight and see what happens. If tomorrow it's still being odd I'll send the admin a note. As you say, there is a 7.0.65 package for FC19 which he can probably put on.

Thanks for your help.
ID: 48927
joeyjojo
Joined: 30 Apr 13
Posts: 9
United Kingdom
Message 48952 - Posted: 3 May 2013, 13:51:01 UTC

Updated to 7.0.65 and all appears to work normally. Thanks. Shame it was just a stupid bug.
ID: 48952
SekeRob2
Joined: 6 Jul 10
Posts: 585
Italy
Message 48958 - Posted: 3 May 2013, 16:58:30 UTC

In console:

1) sudo updatedb
2) locate stderrdae.txt

or any other file... near instant reply if it exists [Private hot tip from an Alpha Mail List tester]
Coelum Non Animum Mutant, Qui Trans Mare Currunt
ID: 48958
joeyjojo
Joined: 30 Apr 13
Posts: 9
United Kingdom
Message 48980 - Posted: 4 May 2013, 16:59:47 UTC - in response to Message 48958.  
Last modified: 4 May 2013, 16:59:54 UTC

Thanks. I always use locate as a quick check, followed by find in cases like this as some paths are not kept on the db.
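
The `find` half of that routine can also be done without any database and without root, for instance with a short Python walk. This is a sketch only: `find_file` is a hypothetical helper, and starting from the data directory named in the startup log (/home/scratch/boinc) is an assumption about where to look:

```python
import os

def find_file(name, roots):
    """Return every path below the given roots whose file name matches exactly."""
    hits = []
    for root in roots:
        # os.walk skips directories it cannot read instead of raising.
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                hits.append(os.path.join(dirpath, name))
    return hits

print(find_file("stderrdae.txt", ["/home/scratch/boinc"]))
```

An empty list means the file is not there, or the name was mistyped.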
ID: 48980

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.