BOINC doing no work since Linux update
Author | Message |
---|---|
Send message Joined: 30 Apr 13 Posts: 9 |
My Fedora box was recently updated from 17 to 18, and now BOINC won't run any work. My boinc-gui currently shows 6 tasks, of which 2 are "Waiting to run" and 4 are "Ready to start". I had brief success by using "Reset project" on each of the three projects I run, but they have since stalled again. My cc_config.xml contains only my exclusive apps (which obviously aren't running at the moment; work is suspended correctly when I have them loaded): <cc_config> <options> <exclusive_app>eclipse</exclusive_app> <exclusive_app>matlab</exclusive_app> </options> </cc_config> I've deleted global_prefs.xml and restarted the machine. global_prefs_override.xml currently contains: <global_preferences> <run_on_batteries>1</run_on_batteries> <run_if_user_active>1</run_if_user_active> <run_gpu_if_user_active>0</run_gpu_if_user_active> <idle_time_to_run>0.000000</idle_time_to_run> <suspend_cpu_usage>0.000000</suspend_cpu_usage> <start_hour>0.000000</start_hour> <end_hour>0.000000</end_hour> <net_start_hour>0.000000</net_start_hour> <net_end_hour>0.000000</net_end_hour> <leave_apps_in_memory>0</leave_apps_in_memory> <confirm_before_connecting>1</confirm_before_connecting> <hangup_if_dialed>0</hangup_if_dialed> <dont_verify_images>0</dont_verify_images> <work_buf_min_days>0.100000</work_buf_min_days> <work_buf_additional_days>0.500000</work_buf_additional_days> <max_ncpus_pct>100.000000</max_ncpus_pct> <cpu_scheduling_period_minutes>60.000000</cpu_scheduling_period_minutes> <disk_interval>60.000000</disk_interval> <disk_max_used_gb>10.000000</disk_max_used_gb> <disk_max_used_pct>50.000000</disk_max_used_pct> <disk_min_free_gb>0.100000</disk_min_free_gb> <vm_max_used_pct>75.000000</vm_max_used_pct> <ram_max_used_busy_pct>50.000000</ram_max_used_busy_pct> <ram_max_used_idle_pct>90.000000</ram_max_used_idle_pct> <max_bytes_sec_up>0.000000</max_bytes_sec_up> <max_bytes_sec_down>0.000000</max_bytes_sec_down> <cpu_usage_limit>100.000000</cpu_usage_limit> 
<daily_xfer_limit_mb>0.000000</daily_xfer_limit_mb> <daily_xfer_period_days>0</daily_xfer_period_days> </global_preferences> The log since restarting says: Tue 30 Apr 2013 14:47:17 BST | | Starting BOINC client version 7.0.29 for x86_64-pc-linux-gnu Any ideas? |
Send message Joined: 29 Aug 05 Posts: 15483 |
I see: Tue 30 Apr 2013 14:50:28 BST | | System clock was turned backwards; clearing timeouts Try to figure out by how much the system clock was turned backwards and what effect that had on any of the work in cache. Also check all the status columns in BOINC Manager (Projects tab, Tasks tab). Other than Waiting to Run and Ready to Start, are there any other messages there? Is this the BOINC from the Fedora repositories, or the one from Berkeley? |
Send message Joined: 30 Apr 13 Posts: 9 |
Thanks Jord. The clock goes back by 1 hour when it updates after turning on as we're in British Summer Time. This hasn't been a problem for the last year or so. No other messages. Network connection is fine. This is the Fedora package of BOINC: Name : boinc-client Arch : x86_64 Version : 7.0.29 Release : 2.r25790svn.fc18 |
Send message Joined: 29 Aug 05 Posts: 15483 |
You do have BOINC set to run based on preferences? Are the "Waiting to run" and "Ready to start" tasks run by CPU apps? Deleting global_prefs.xml will not do much; it'll be remade the next time BOINC contacts any project. So for continuity, can you please post the contents of global_prefs.xml as well? It'll hold some preferences that cannot be set through the override file. And as you may have noticed, there's a new BOINC out, 7.0.65, though not yet through rpm. But if you know what you're doing you could move everything BOINC to your /home directory and install Berkeley BOINC instead, and see what that does. |
Send message Joined: 23 Apr 07 Posts: 1112 |
We recently had that same symptom posted to the boinc_alpha list by SekRob: [boinc_alpha] Boinc 7.0.xx Computing stops when clock is set backwards! His parting post was: Gianfranco, DA's answer was: I'll keep looking into this. and Charlie answered: Changing the clock time is not the same as the switch to and from Daylight Savings. To simulate that, change your setting of the time zone you are in, not your clock. As David says, the underlying system time does not change, only the adjustment used to _show_ the current time of day. Claggy |
Send message Joined: 30 Apr 13 Posts: 9 |
Thanks both. Ageless - CPU apps. global_prefs.xml hasn't been recreated on restarting. Before I deleted it it contained: <global_preferences> <source_project>http://setiathome.berkeley.edu/</source_project> <source_scheduler>http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi</source_scheduler> <mod_time>1305159453</mod_time> <run_on_batteries>0</run_on_batteries> <run_if_user_active>1</run_if_user_active> <run_gpu_if_user_active>0</run_gpu_if_user_active> <idle_time_to_run>3</idle_time_to_run> <suspend_if_no_recent_input>0</suspend_if_no_recent_input> <suspend_cpu_usage>0</suspend_cpu_usage> <leave_apps_in_memory>0</leave_apps_in_memory> <cpu_scheduling_period_minutes>60</cpu_scheduling_period_minutes> <max_cpus>0</max_cpus> <max_ncpus_pct>100</max_ncpus_pct> <cpu_usage_limit>100</cpu_usage_limit> <disk_max_used_gb>20</disk_max_used_gb> <disk_min_free_gb>10</disk_min_free_gb> <disk_max_used_pct>50</disk_max_used_pct> <disk_interval>60</disk_interval> <vm_max_used_pct>75</vm_max_used_pct> <ram_max_used_busy_pct>50</ram_max_used_busy_pct> <ram_max_used_idle_pct>90</ram_max_used_idle_pct> <work_buf_min_days>0</work_buf_min_days> <work_buf_additional_days>0.25</work_buf_additional_days> <confirm_before_connecting>0</confirm_before_connecting> <hangup_if_dialed>0</hangup_if_dialed> <max_bytes_sec_down>0</max_bytes_sec_down> <max_bytes_sec_up>0</max_bytes_sec_up> <daily_xfer_limit_mb>0</daily_xfer_limit_mb> <daily_xfer_period_days>0</daily_xfer_period_days> <dont_verify_images>0</dont_verify_images> </global_preferences> which I know because the .xml~ file created by gedit is still sat in the folder from when I was tinkering with it. Installing myself is not an option as I don't have root access, only access to the package manager (yum). Claggy: interesting, thanks. Again, I don't have root access so none of those linked ways to change the timezone are available to me. 
According to this though the timezone is set by /etc/localtime, which on my machine is currently a link to ../usr/share/zoneinfo/Europe/London, which looks normal (apart from the .. which I wouldn't think was necessary). I'll have a look tomorrow at the GUI to see if it tells me which timezone I'm in and ask a sysadmin to check it's GMT. As I said before though, Fedora has been doing this -1 hour change for a long time and only recently started misbehaving. The newest boinc-client package for FC17 is the same one as I have now, 7.0.29-1 (I have 7.0.29-2 now, I believe). |
Send message Joined: 29 Aug 05 Posts: 15483 |
I'll have a look tomorrow at the GUI to see if it tells me which timezone I'm in and ask a sysadmin to check it's GMT. Your BOINC states: Tue 30 Apr 2013 14:47:17 BST | | Local time is UTC +1 hours Which is correct for Great Britain at this time. You changed to summer time at the end of March as well. global_prefs.xml isn't made on restarting the client; it's made when the client contacts any of the projects. Then the scheduler will check if it's there, and if not, push the project preferences and computing preferences onto the computer. Okay, in your global_prefs.xml file it says: <max_cpus>0</max_cpus> This value is set by "On multiprocessors, use at most X processors" and is the maximum number of CPUs you want BOINC to use. If set to zero... how do you expect BOINC to do any work? ;-) So go to Seti's computing preferences and change that value to 4, if you want to use all 4 cores on your i5. You can then use "On multiprocessors, use at most X% of the processors" to fine-tune the amount. Then push the update to the client: BOINC Manager->Projects tab->Seti->Update. @Claggy, thanks for bringing that up as well. I'd forgotten about that thread in Alpha, mostly as I had only glanced at it, as I was busy with other stuff at the time. |
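For reference, once the changed preference has been pushed to the client, the relevant lines of global_prefs.xml would be expected to look roughly like the fragment below. This is only a sketch based on the file quoted earlier in the thread, with 4 as the suggested value. The file is written by the client from the project's web preferences, so the fragment is something to check for, not something to edit by hand.

```xml
<global_preferences>
  <!-- "On multiprocessors, use at most X processors"; this was 0 in the posted file -->
  <max_cpus>4</max_cpus>
  <!-- "On multiprocessors, use at most X% of the processors" -->
  <max_ncpus_pct>100</max_ncpus_pct>
  <!-- all other preferences unchanged and omitted here -->
</global_preferences>
```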
Send message Joined: 30 Apr 13 Posts: 9 |
I noticed that too (max cpus) and changed it to several values including 4 with no success (after reloading prefs each time). But you never know, maybe it'll work tomorrow. Cheers. |
Send message Joined: 30 Apr 13 Posts: 9 |
Hi again. Changing the prefs to 4 CPUs temporarily fixed my problem, but the machine is now again stuck and not running any work. It picked up several new Rosetta tasks and a SETI task, completed them, and is now sat idle. See screenshot. I notice in the log that the only scheduler requests are the ones requested by me (the last one being yesterday afternoon). Could this be the problem? Before you ask, there are no messages in Projects or Tasks, and nothing strange in the log. Network activity is set to "always available". |
Send message Joined: 29 Aug 05 Posts: 15483 |
If it isn't a preference somewhere that's doing this, then to me it looks mostly like it's a problem with 7.0.29. Any late entries in stderrdae.txt? I see in http://rpmfind.net/linux/rpm2html/search.php?query=boinc-client that 7.0.65 is now the client of choice on many of the Fedora versions. Mind checking if there's one for yours and trying to update to it, or trying the one for Fedora 19? |
Send message Joined: 30 Apr 13 Posts: 9 |
No stderrae.txt (I checked the usual places and also used `find`). Prefs are all as above (apart from ncpus of course). As it says on that list, 7.0.29-2 is the latest for FC18. I don't have root so can't install a newer one myself. I can ask the admin though. I've aborted those stuck jobs and a load of new tasks have been downloaded. Will see how we get on! Edit: aha, log after cancelling says: Thu 02 May 2013 17:39:53 BST | SETI@home | Sending scheduler request: To fetch work. Why is it not requesting new tasks? Too many SETI ones in the queue already or something else? It only has new SETI work, not Rosetta or CPDN. All are set to allow new work on the project tab; I checked by clicking the button twice to make sure: Thu 02 May 2013 17:49:31 BST | climateprediction.net | work fetch suspended by user If I click update for Rosetta the log says: Thu 02 May 2013 17:51:12 BST | rosetta@home | update requested by user CPDN seems to have a valid excuse: Thu 02 May 2013 17:52:14 BST | climateprediction.net | update requested by user |
Send message Joined: 29 Aug 05 Posts: 15483 |
No stderrae.txt (I checked the usual places and also used `find`). Yeah, well, if you searched for that, you won't find it. I asked for stderrdae.txt :-) Why is it not requesting new tasks? Too many SETI ones in the queue already or something else? Now, first off, 7.0.29 was never a recommended version, but has always been a development version only. So it could be any of a myriad of bugs. The work request is governed by the low water mark, the "Maintain enough tasks to keep busy for at least" value. Stay above it and there won't be any work request; fall under it and presto, the client asks for work, and it asks the most eligible project (the one with the lowest RAC really, if all projects use the same resource share). If you truly want to see why project A won't take work in, while project B might be on the brink of doing so, you'll have to set the appropriate debug flags. The main flags are: <cpu_sched_debug>: problems involving the choice of applications to run. <work_fetch_debug>: problems involving work fetch (which projects are asked for work, and how much). <rr_simulation>: problems involving jobs being run in high-priority mode. <sched_op_debug>: problems involving scheduler operations and other low level information. Use these flags from the cc_config.xml file. In your case, I would set work_fetch_debug and cpu_sched_debug. But as said, your BOINC is an old one; the bug --if it is a bug-- could've been fixed already. The only way to know is by using a newer version. And so we'll have to wait for your admin. |
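The debug flags named above go in a `<log_flags>` section of cc_config.xml, alongside the existing `<options>` section. A minimal sketch, assuming the file otherwise holds only the two exclusive apps quoted at the start of the thread:

```xml
<cc_config>
  <log_flags>
    <!-- which projects are asked for work, and how much -->
    <work_fetch_debug>1</work_fetch_debug>
    <!-- how the client chooses which applications to run -->
    <cpu_sched_debug>1</cpu_sched_debug>
  </log_flags>
  <options>
    <!-- existing exclusive apps, unchanged -->
    <exclusive_app>eclipse</exclusive_app>
    <exclusive_app>matlab</exclusive_app>
  </options>
</cc_config>
```

The client should pick the file up on restart, or when it is told to re-read its config file (Advanced -> Read config file in BOINC Manager, or `boinccmd --read_cc_config`); menu wording may differ between versions. The extra output then appears in the event log. Both flags are verbose, so setting them back to 0 after collecting a sample keeps the log readable.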
Send message Joined: 30 Apr 13 Posts: 9 |
Whoops ;-) stderrdae.txt doesn't exist either. I'll let it work overnight and see what happens. If tomorrow it's still being odd I'll send the admin a note. As you say, there is a 7.0.65 package for FC19 which he can probably put on. Thanks for your help. |
Send message Joined: 30 Apr 13 Posts: 9 |
Updated to 7.0.65 and all appears to work normally. Thanks. Shame it was just a stupid bug. |
Send message Joined: 6 Jul 10 Posts: 585 |
In console: 1) sudo updatedb 2) locate stderrdae.txt or any other file... near instant reply if it exists [Private hot tip from an Alpha Mail List tester] Coelum Non Animum Mutant, Qui Trans Mare Currunt |
Send message Joined: 30 Apr 13 Posts: 9 |
Thanks. I always use locate as a quick check, followed by find in cases like this as some paths are not kept on the db. |
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.