Message boards : BOINC Manager : 6.12.26 Mac BOINC as Service Broken
Author | Message |
---|---|
Send message Joined: 15 Jan 11 Posts: 24 |
I run BOINC as a service on my PPC & Intel Macs using "Make_BOINC_Service.sh" script. After installing 6.12.26 on my Mac Pro running SL 10.6.7 the service became flaky. Sometimes it would be running when I logged in, sometimes it would not be running but would start after I logged in, and sometimes it wouldn't run and BOINCManager wouldn't be able to connect to boinc. Reverting to 6.10.58 resolved the problem. |
Send message Joined: 29 Aug 05 Posts: 15569 |
Can you at least test with 6.12.28 if that one works better? http://boinc.berkeley.edu/download_all.php I will forward your complaint to the developer, see what he says. |
Send message Joined: 15 Jan 11 Posts: 24 |
I'll post my findings when I've had time to test it for a while. |
Send message Joined: 25 May 11 Posts: 1 |
I have encountered the same problem on my MacBook Pro 10.6.7. And to add to the above comments, when the daemon quits boinc_master proceeds to run out of control, using all cpu, massively overheating my MBP. I have reverted to 6.10.58 which does not do this. I have only one project running as this is my daily machine and have gone through detach, uninstall, remove Make_BOINC_service and rerun that, reinstall, reattach, etc. and then switched to 6.10.58 which runs fine. I will test the 6.12.28 when I can. |
Send message Joined: 15 Jan 11 Posts: 24 |
6.12.28 hung for a minute after installation, but then it connected to localhost and work commenced. Today upon unlocking the screen I found my CPUs at idle, launching BOINCManager showed it was disconnected from localhost. |
Send message Joined: 17 Jul 06 Posts: 287 |
6.12.28 hung for a minute after installation, but then it connected to localhost and work commenced. Today upon unlocking the screen I found my CPUs at idle, launching BOINCManager showed it was disconnected from localhost. I have not been able to reproduce any problems with running BOINC 6.12.28 as a service on my Mac Pro running OS 10.6.7. It sounds as if perhaps your BOINC Client might be crashing. Please check for a crash report for BOINC that coincides with your BOINC Manager's disconnection from localhost. It should be either in /Library/Logs/CrashReporter/ or /Users/USERNAME/Library/Logs/CrashReporter/. If you find it, please post the part up to, but not including, the section titled Binary Images. Thank you. Charlie Fenton BOINC / SETI@home Macintosh & Windows Programmer |
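For anyone who would rather script the crash report check than browse the folders by hand, here is a minimal Python sketch. It assumes that crash report file names contain the word "boinc" (not confirmed in this thread), and the helper name `find_boinc_crash_reports` is hypothetical. It only lists candidate files, newest first; it does not extract the part before Binary Images.

```python
from pathlib import Path


def find_boinc_crash_reports(directories):
    """Return files whose name mentions 'boinc' (case-insensitive), newest first.

    Assumption: crash reports for the client or Manager carry 'boinc' in
    their file name. Directories that do not exist are silently skipped.
    """
    reports = []
    for directory in directories:
        directory = Path(directory).expanduser()
        if not directory.is_dir():
            continue
        for entry in directory.iterdir():
            if entry.is_file() and "boinc" in entry.name.lower():
                reports.append(entry)
    # Sort by modification time so the most recent report comes first.
    return sorted(reports, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # The two locations named in the post above.
    locations = ["/Library/Logs/CrashReporter", "~/Library/Logs/CrashReporter"]
    for report in find_boinc_crash_reports(locations):
        print(report)
```

If nothing is printed, no matching file exists in either location.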
Send message Joined: 15 Jan 11 Posts: 24 |
I have no crash logs related to boinc in either of those locations, or in Console. I haven't pruned my crash logs since these incidents so it would seem that no crash entries were written. |
Send message Joined: 17 Jul 06 Posts: 287 |
I have no crash logs related to boinc in either of those locations, or in Console. I haven't pruned my crash logs since these incidents so it would seem that no crash entries were written. OK, I am puzzled. Loss of connection between the BOINC Manager and the BOINC Client is usually caused by the Client no longer running. If I might ask you to check a couple more things: The next time this happens, please run the /Applications/Utilities/Activity Monitor. The Manager will appear as BOINC (upper case) with its icon and the Client as boinc (lower case) with no icon. If it is running as a service, the Client will have a small process ID (PID). If the Client was launched by the Manager, the Client will have a larger PID than the Manager and its parent will be the Manager (to see the parent, select the Client in the list and click the Inspect icon at the top of the Activity Monitor window.) There may also be useful information at the end of the stdoutdae.txt and stderrdae.txt files in the /Library/Application Support/BOINC Data/ directory. Finally, please confirm that you are using the current version of the Make_BOINC_Service.sh script. The text of the script should include the line revised 1/6/08 to use launchd. Thanks for helping solve this puzzle. Cheers, --Charlie Charlie Fenton BOINC / SETI@home Macintosh & Windows Programmer |
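The parent-process check described above can also be done from a process listing instead of Activity Monitor. The following is a rough Python sketch that classifies the Client from the text output of `ps -axo pid,ppid,comm`. The executable names `boinc` and `BOINCManager` and the idea that a service-launched Client has launchd (PID 1) as its parent are assumptions, and `describe_client` is a hypothetical helper name.

```python
def parse_ps(output):
    """Parse `ps -axo pid,ppid,comm` text into (pid, ppid, command) tuples."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)  # the command may contain spaces
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return rows


def describe_client(ps_output):
    """Say how the BOINC Client appears to have been started.

    Assumptions: the Client executable is named 'boinc', the Manager
    executable is named 'BOINCManager', and a Client started as a
    service has launchd (PID 1) as its parent.
    """
    rows = parse_ps(ps_output)
    commands = {pid: comm for pid, _ppid, comm in rows}
    for pid, ppid, comm in rows:
        if comm.rsplit("/", 1)[-1] != "boinc":
            continue
        parent_name = commands.get(ppid, "").rsplit("/", 1)[-1]
        if parent_name == "BOINCManager":
            return "client launched by the Manager"
        if ppid == 1:
            return "client launched by launchd (probably as a service)"
        return "client launched by another process"
    return "client not running"
```

To use it, capture the output of `ps -axo pid,ppid,comm` in Terminal and pass the text to `describe_client`.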
Send message Joined: 10 Jan 11 Posts: 58 |
I have the same problem as Penguirl but I do not run BOINC as a service. I simply launch the manager from the Dock. It runs under my account, which is fine since it is the only account on this computer and it runs 24/7. This problem never seems to arise if I am actively using the machine, which comprises a good chunk of the day. However, if I leave it overnight, I find in the morning that attempting to click anything results in an infinite "Communicating with BOINC Manager..." message and I need to restart BOINC entirely. Oddly it would appear my tasks continue to run even when I cannot click anything in the Manager. Sorry if this is throwing a wrench in the mix. Next time this problem crops up I will go into Activity Monitor and also the Console, as the current apps I am running use Java (Constellation's TrackJack). Beyond that...bit lost. |
Send message Joined: 15 Jan 11 Posts: 24 |
The only BOINC related process running at the latest event (all CPUs are idle) is BOINCMenubar 2, no other BOINC processes are running with Activity Monitor.app set to show all processes. BOINCMenubar 2 says "This Computer (localhost) Host not connected." stderrdae.txt has a LOT of "shmat: Too many open files" at the beginning of the document, followed by a bunch of "md5_file: can't open projects/szdg.lpds.sztaki.hu_szdg/caa03215-1b38-489c-abd4-252c594ad6ff_f1075c82-cfb2-4ba5-bb7a-0b69cbe700c4_565740_1_1" in the middle, then "GetMACAddress returned 0x00000005" appears twice, and then a LOT of "shmat:Too many open files" again. stdoutdae.txt shows normal looking start, resuming, uploading, etc… of workunits but ends with "12-Jun-2011 19:49:16 [yoyo@home] Computation for task ogr_110610103054_79_0 finished 12-Jun-2011 19:49:16 [yoyo@home] Resuming task ecm_es_1307868011_2_1232P.C271_1455_0 using ecm version 1 12-Jun-2011 19:49:18 [yoyo@home] Started upload of ogr_110610103054_79_0_0 12-Jun-2011 19:49:18 [yoyo@home] Started upload of ogr_110610103054_79_0_1 12-Jun-2011 19:49:18 [---] Can't open client_state_next.xml: fopen() failed 12-Jun-2011 19:49:18 [---] Couldn't write state file: fopen() failed; giving up" I am using Make_BOINC_Service.sh dated 01/06/08 to use launchd BOINCManager says at launch "BOINC Manager - Daemon Start Failed BOINCManager is not able to start a BOINC client. Please start the daemon and try again." And then the CPUs jumped to 100%, relaunch of BOINCManager works normally. |
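Instead of estimating "a LOT" by eye, the messages in stderrdae.txt can be tallied. This is a small Python sketch under the assumption that each message sits on its own line; the three categories are simply the messages quoted in the post above, and `summarize_errors` is a hypothetical helper name.

```python
from collections import Counter

# Keys are lowercase fragments with all whitespace removed, so that
# "shmat: Too many open files" and "shmat:Too many open files" both match.
CATEGORIES = {
    "shmat:toomanyopenfiles": "shmat: Too many open files",
    "md5_file:can'topen": "md5_file: can't open",
    "getmacaddressreturned": "GetMACAddress returned",
}


def summarize_errors(lines):
    """Count how many lines fall into each known message category."""
    counts = Counter()
    for line in lines:
        squeezed = "".join(line.lower().split())
        if not squeezed:
            continue  # ignore blank lines
        for fragment, label in CATEGORIES.items():
            if fragment in squeezed:
                counts[label] += 1
                break
        else:
            # No known fragment matched this line.
            counts["other"] += 1
    return counts
```

Calling `summarize_errors(open(path))` on the stderrdae.txt file in the BOINC Data directory would give the totals per message type, which makes it easier to compare runs of different BOINC versions.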
Send message Joined: 17 Jul 06 Posts: 287 |
stderrdae.txt has a LOT of "shmat: Too many open files" at the beginning of the document, followed by a bunch of "md5_file: can't open projects/szdg.lpds.sztaki.hu_szdg/caa03215-1b38-489c-abd4-252c594ad6ff_f1075c82-cfb2-4ba5-bb7a-0b69cbe700c4_565740_1_1" in the middle, then "GetMACAddress returned 0x00000005" appears twice, and then a LOT of "shmat:Too many open files" again. It sounds like this has nothing to do with running BOINC as a service. Have you tried removing the file /Library/LaunchDaemons/edu.berkeley.boinc.plist and restarting the computer to run for a while not as a service? The messages you report indicate that you may be running out of shared memory segments. Please see this post for an explanation and a possible workaround. Please let me know if this helps. By the way, because of the small number of shared memory segments available on the Mac and some other UNIX / Linux systems, BOINC moved away from using shmget and shmat almost 4 years ago in favor of memory-mapped files, and only supports the older method for backward compatibility with legacy project applications. All BOINC projects should have upgraded years ago. Perhaps some of the projects you are running may be using very old code. Cheers, --Charlie Charlie Fenton BOINC / SETI@home Macintosh & Windows Programmer |
Send message Joined: 17 Jul 06 Posts: 287 |
By the way, because of the small number of shared memory segments available on the Mac and some other UNIX / Linux systems, BOINC moved away from using shmget and shmat almost 4 years ago in favor of memory-mapped files, and only supports the older method for backward compatibility with legacy project applications. All BOINC projects should have upgraded years ago. Perhaps some of the projects you are running may be using very old code. You can tell whether a given project application uses the old shared memory logic by checking your /Library/Application Support/BOINC Data/client_state.xml file. Each If you prefer, send me a private message and I'll respond with an email address where you can send the file and we will examine it. Thanks. Cheers, --Charlie Charlie Fenton BOINC / SETI@home Macintosh & Windows Programmer |
Send message Joined: 15 Jan 11 Posts: 24 |
I have configured shared memory in the past on my PPC machines, does this still apply to a SL Xeon? It seems odd that this wasn't an issue with 6.10.x, it only started with the update to 6.12.x. All of the <api_version> are at or above 6. I have not yet tested without the launch daemon but I will let you know as soon as I do. |
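The `<api_version>` check can be scripted rather than read off by eye. The sketch below is Python and rests on several assumptions not confirmed in this thread: that client_state.xml parses as well-formed XML, that each application sits in an `<app_version>` element with `<app_name>` and `<api_version>` children, and that a missing or pre-6 `<api_version>` marks an application worth a closer look. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def api_versions(xml_text):
    """Return (app_name, api_version or None) for every <app_version> element.

    Assumption: <app_name> and <api_version> are direct children of
    <app_version>; this layout is not confirmed by the thread.
    """
    root = ET.fromstring(xml_text)
    found = []
    for app_version in root.iter("app_version"):
        name = (app_version.findtext("app_name") or "").strip()
        api = (app_version.findtext("api_version") or "").strip()
        found.append((name, api or None))
    return found


def is_legacy(api_version, threshold=6):
    """Treat a missing, unparsable, or pre-threshold major version as legacy.

    The threshold of 6 follows the "at or above 6" remark in the post
    above; it is an assumption, not an official cutoff.
    """
    if not api_version:
        return True
    try:
        major = int(api_version.split(".")[0])
    except ValueError:
        return True
    return major < threshold
```

Reading /Library/Application Support/BOINC Data/client_state.xml and passing its text to `api_versions` lists every application; filtering with `is_legacy` leaves the ones to examine further.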
Send message Joined: 17 Jul 06 Posts: 287 |
I have configured shared memory in the past on my PPC machines, does this still apply to a SL Xeon? My experience is that it applies even more because, to the best of my knowledge, Apple has not increased the shared memory segment limit even though the Mac Pro has more cores and hence can run more processes. It seems odd that this wasn't an issue with 6.10.x, it only started with the update to 6.12.x. I agree, but that seems to be what the error messages indicate. I'm afraid we have little choice but to find the problem by trial and error. All of the <api_version> are at or above 6. Well, that does seem to reduce the likelihood that I'm on the right track, but you never know .... I have not yet tested without the launch daemon but I will let you know as soon as I do. Thanks. Cheers, --Charlie Charlie Fenton BOINC / SETI@home Macintosh & Windows Programmer |
Send message Joined: 13 Oct 10 Posts: 120 |
Hi, for your information I was forced to downgrade to 6.10.58 after having 6.12.26 on my iMac (OS X latest version) for some days: I would regularly find the CPU completely idle in the morning, not running any project, or possibly with lots of file errors (I can't remember which error) on ALL the projects at the same time (I run many different projects). It would then be quite difficult to restart boinc: it behaved erratically, with boinc no longer being run by boinc_master but by my user, BM not being able to connect to boinc again, and/or two different boinc processes running at the same time... Putting back 6.10.58 did fix everything, so it cannot be related to some specific project using the shared memory stuff as mentioned above. I'm running boinc as a service, but I don't know whether the issue was related to this or not. I had difficulty reinstalling 6.10.58 (only one project at a time was running on my i7, even with the proper multi-CPU option set to 100%); I had to use the uninstall script and find and delete every boinc-related file before reinstalling 6.10.58, and only then would it work again with 8 projects in parallel. I also had to set it up as a service again. |
Send message Joined: 17 Jul 06 Posts: 287 |
Hi all, Thank you all for your input. We have found a number of serious bugs in 6.12.26. If you are feeling a bit adventurous, you might want to try BOINC version 6.12.33, which is currently in testing. You can get BOINC 6.12.33 here. Cheers, --Charlie Charlie Fenton BOINC / SETI@home Macintosh & Windows Programmer |
Send message Joined: 10 Jan 11 Posts: 58 |
Thanks Charlie (and others who posted here) for letting me know it's not something I did wrong whilst installing BOINC! I have downloaded the 6.12.33 for Mac (I run the latest 64-bit Snow Leopard) and will see how it fares. The 6.12.26 never lasted through the night - would always disconnect from the client, similar problems to those stated above. Might I recommend you remove 6.12.26 entirely from the download list, if it is in fact full of serious bugs. Or at least make it something other than the "recommended version" ;) Gonna test that now. |
Send message Joined: 15 Jan 11 Posts: 24 |
Turns out I already have /etc/sysctl.conf with the settings: kern.sysv.shmmax=16777216 kern.sysv.shmmin=1 kern.sysv.shmmni=128 kern.sysv.shmseg=32 kern.sysv.shmall=4096 It must have carried over as I migrated machines. |
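As a quick sanity check on those values, the sketch below (Python) parses the settings and compares them. It assumes that `kern.sysv.shmall` is counted in 4096-byte pages while `kern.sysv.shmmax` is in bytes, which is how these settings are commonly described for Mac OS X but is not stated in this thread. `parse_sysctl` and `check_shm` are hypothetical helper names.

```python
PAGE_SIZE = 4096  # assumption: kern.sysv.shmall is counted in 4096-byte pages


def parse_sysctl(text):
    """Collect key=integer settings from whitespace-separated text."""
    settings = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep and value.isdigit():
            settings[key] = int(value)
    return settings


def check_shm(settings):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    shmmax = settings.get("kern.sysv.shmmax")
    shmall = settings.get("kern.sysv.shmall")
    if shmmax is not None and shmmax % PAGE_SIZE != 0:
        problems.append("shmmax is not a multiple of the page size")
    if shmmax is not None and shmall is not None and shmall * PAGE_SIZE < shmmax:
        problems.append("shmall (in pages) covers less memory than shmmax")
    return problems
```

With the values from the post, shmall covers 4096 × 4096 = 16777216 bytes, which equals shmmax, so under the stated assumption the two settings agree with each other. Whether they are large enough for the workload is a separate question.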
Send message Joined: 10 Jan 11 Posts: 58 |
I don't have that file...a bunch of other kernel configurations but not that one in particular. Interesting. At any rate the 6.12.33 client made it through the night alright and shows no signs of serious bugs. Good! |
Send message Joined: 15 Jan 11 Posts: 24 |
Currently I am running BOINC 6.12.33 as an app, 5 of the CPUs are running at low percentage just slightly above the system requirements, and 3 are at 0%. BOINC is connected to localhost, the WUs that are "running" are not elapsing any time. Quitting BOINCManager brought the 5 active CPUs down to nearly 0%. |
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.