Mac OS X: Can't destroy/create shared memory

Message boards : BOINC client : Mac OS X: Can't destroy/create shared memory



Nathan Herring

Joined: 7 Sep 05
Posts: 13
United States
Message 1586 - Posted: 6 Dec 2005, 0:35:36 UTC

After many days of running on Mac OS X 10.4.3 on a dual G5 machine, the message log eventually starts showing errors whenever the client switches to a different work unit:

<date> <time> Couldn't destroy shared memory: system shmctl


After several of those during a work unit switch, it is eventually followed by:

<date> <time> <project> Can't create shared memory: system shmget

<date> <time> <project> Unrecoverable error for result <result_name>


The client then rapidly downloads and aborts work units, one after another, until it hits its work unit quota.
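Both errors name System V shared-memory calls: shmget creates a segment, and shmctl with IPC_RMID marks it for removal. The Python sketch below (using ctypes) shows that create/remove lifecycle and turns a failed call into an error that carries the errno text, which says more than the bare "system shmctl" in the log. It is an illustration under assumptions, not BOINC's code: it assumes the client follows this usual create/remove pattern, the helper names are hypothetical, and the constant values (IPC_PRIVATE = 0, IPC_CREAT = 0o1000, IPC_RMID = 0) are the ones Linux and Mac OS X headers use; check <sys/ipc.h> on your own system.

```python
import ctypes
import errno
import os

# Assumed constant values from <sys/ipc.h>; they match Linux and Mac OS X.
IPC_PRIVATE = 0
IPC_CREAT = 0o1000
IPC_RMID = 0

# dlopen(NULL): the C library is already loaded into the Python process.
libc = ctypes.CDLL(None, use_errno=True)
libc.shmget.argtypes = [ctypes.c_int, ctypes.c_size_t, ctypes.c_int]
libc.shmget.restype = ctypes.c_int
libc.shmctl.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_void_p]
libc.shmctl.restype = ctypes.c_int


def create_segment(size):
    """Create a private segment of `size` bytes and return its id (hypothetical helper)."""
    shmid = libc.shmget(IPC_PRIVATE, size, IPC_CREAT | 0o600)
    if shmid == -1:
        err = ctypes.get_errno()
        # ENOSPC is what shmget reports when the system-wide segment limit is hit.
        hint = " (segment limit reached?)" if err == errno.ENOSPC else ""
        raise OSError(err, "Can't create shared memory: shmget: " + os.strerror(err) + hint)
    return shmid


def destroy_segment(shmid):
    """Mark a segment for removal; it is freed once no process has it attached."""
    if libc.shmctl(shmid, IPC_RMID, None) == -1:
        err = ctypes.get_errno()
        raise OSError(err, "Couldn't destroy shared memory: shmctl: " + os.strerror(err))


if __name__ == "__main__":
    seg = create_segment(4096)
    destroy_segment(seg)
    print("created and removed segment", seg)
```

On an affected machine, the errno text (for example EINVAL versus EPERM) would help narrow down why removal fails, and `ipcs -m` lists the segments that are still present.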

This is my work development machine, so I run CodeWarrior, Xcode and various versions of MacBU products. I know Office uses shared memory, but I am not personally familiar with the implementation.

Is anyone else seeing this behavior?

-nh
Bill Michael

Joined: 30 Aug 05
Posts: 297
Message 1587 - Posted: 6 Dec 2005, 0:57:48 UTC

There is one other person having this same problem, but they are running OS X Server. The thread is over on SETI: "Problem with Mac clients?". I'm afraid my Mac development days ended about the same time OS X came out, so I haven't been able to be much help. I'm pointing him over here as well - maybe the two of you can figure something out!

Eric Stewart

Joined: 6 Dec 05
Posts: 10
United States
Message 1589 - Posted: 6 Dec 2005, 2:57:43 UTC

Hmmm ... I too am working on an up-to-date dual G5 (2.7 GHz). How much RAM do you have?

Are you running the Server version or just the normal one?

So far the only solution I've found is to reboot (this is quite impractical).

I don't think I'm running CodeWarrior, but the Xcode tools are installed as well ... I'm also not familiar with the MacBU products ...

The machine spends its days running terminals for me (ssh'd into my Linux boxes), Firefox, RDC to a Windows 2003 Server, and iTunes. That's pretty much it, beyond working on SETI packets ...

I'm no programmer - I'm a systems administrator. So even if I could get the BOINC source, I don't know what kind of assistance I could offer ... I'm fairly good at running things in debugging mode and foisting the output off onto someone else, once I know *how* to run something in debugging mode. Thing is, I think the client software is already giving us the important information ...
Eric Stewart

Joined: 6 Dec 05
Posts: 10
United States
Message 1606 - Posted: 6 Dec 2005, 10:48:07 UTC

I had an epiphany this morning ...

What if this is a "memory management on a dual-processor system" thing? I know it should be the same as on a single-processor system, but maybe it's not ...
Paul D. Buck

Joined: 29 Aug 05
Posts: 225
Message 1646 - Posted: 7 Dec 2005, 11:09:34 UTC

If the fix suggested in the link works, I would love to have your log files and a note so I could add this as an example to the Wiki ...

Zip up the TXT files in the BOINC directory and send them to p.d.buck@comcast.net

Thanks!
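Bundling the logs Paul asks for amounts to zipping every .txt file in the BOINC data directory. A minimal Python sketch follows; the directory location varies by install, so the path is passed in rather than assumed, and `zip_boinc_logs` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def zip_boinc_logs(boinc_dir, archive_path):
    """Zip every top-level *.txt file in boinc_dir; return the names added."""
    boinc_dir = Path(boinc_dir)
    names = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for txt in sorted(boinc_dir.glob("*.txt")):
            zf.write(txt, arcname=txt.name)  # store without the directory prefix
            names.append(txt.name)
    return names
```

Usage: `zip_boinc_logs("/path/to/BOINC", "boinc_logs.zip")`, then attach the resulting archive to the email.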
Eric Stewart

Joined: 6 Dec 05
Posts: 10
United States
Message 1687 - Posted: 8 Dec 2005, 14:41:40 UTC

Paul:

It may fend off the eventuality, but no - it is not a permanent fix.

And, now that I think about it, it may have just been a placebo - the "fix" requires a reboot to implement, and when the problem recurs after the "fix", a reboot makes it go away (until it happens again).

It would be my guess that there's an issue with memory access on dual processor Macs - creation doesn't seem to be the problem (until you run out), but destruction (as per Nathan's original post, and my comments in the SETI thread) remains a problem, and the "fix" may only delay the inevitable recurrence of "create" errors.

Mind you this is with the full blown GUI client - I haven't had a chance to try any others yet ...
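Eric's reasoning can be made concrete with a toy model: if every work unit switch creates a segment and the matching removal always fails, segments accumulate until the system's segment limit is reached, and only a reboot (a fresh table) clears them. The limit values below are made up for illustration, not actual Mac OS X defaults, and the model ignores segments held by other programs. It shows why a fix that only raises the limit would postpone the failure rather than prevent it.

```python
class SegmentTable:
    """Toy model of a system-wide table with room for `limit` segments."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0

    def create(self):
        """Return False when the table is full (analogous to shmget failing)."""
        if self.in_use >= self.limit:
            return False
        self.in_use += 1
        return True

    def destroy(self, succeeds):
        """A failed removal (analogous to shmctl failing) leaks the segment."""
        if succeeds and self.in_use > 0:
            self.in_use -= 1


def switches_until_failure(limit, destroy_succeeds, max_switches=10_000):
    """Return the first work unit switch whose create fails, or None if none does."""
    table = SegmentTable(limit)  # a fresh table is what a reboot gives you
    for switch in range(1, max_switches + 1):
        if not table.create():
            return switch
        table.destroy(destroy_succeeds)
    return None


if __name__ == "__main__":
    print(switches_until_failure(limit=32, destroy_succeeds=False))   # 33
    print(switches_until_failure(limit=128, destroy_succeeds=False))  # 129
    print(switches_until_failure(limit=32, destroy_succeeds=True))    # None
```

Quadrupling the limit quadruples the time to failure, but as long as removal fails, the create errors still arrive eventually.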
Paul D. Buck

Joined: 29 Aug 05
Posts: 225
Message 1692 - Posted: 8 Dec 2005, 17:09:39 UTC

Yes, well, that is why I wanted the logs. And I can say all of that ...
Bill Michael

Joined: 30 Aug 05
Posts: 297
Message 1700 - Posted: 8 Dec 2005, 23:38:03 UTC - in response to Message 1687.  

> It would be my guess that there's an issue with memory access on dual processor Macs


I would limit this even further, as _most_ dual processor Macs aren't reporting the problem. Both of you who are reporting it have "more advanced" things running - OS X Server in one case, and a lot of development tools in the other. While I would understand Server being "different", I don't know why XTools, etc., would make any difference... but there are an awful lot of dual and quad systems on 10.4.3 out there crunching. If the problem were common, I think we'd see a lot more reports of it. Not that this helps _you_ two! :-/

Paul D. Buck

Joined: 29 Aug 05
Posts: 225
Message 1710 - Posted: 9 Dec 2005, 8:17:41 UTC

Well, I am running "Tiger" and x-code so I am not sure it is that ...

But, the server version is different so ...
Eric Stewart

Joined: 6 Dec 05
Posts: 10
United States
Message 1716 - Posted: 9 Dec 2005, 12:50:55 UTC

Nathan didn't specify "Server" in his original message, and I haven't seen a response from him confirming or ruling that out, so it could still be limited to a "Server on a dual PowerMac G5" thing ...

Paul: You've got my logs - do you need them again? I can resend them to you; or if I didn't provide what you wanted, could you be more specific?

I wonder how many folks running Tiger on a dual processor G5 turn their machines off? Or reboot them daily (or every couple of days)? Both of those situations would resolve the problem. Given how the PowerMac G5s are presented when you configure one at the Apple site, I would say very, very few people probably order them with Server installed. And then how many of those that do keep them on 24 hours a day running BOINC? So far, it may just be me. And maybe Nathan.

My G5 is currently having issues uploading/downloading packets (I guess because the seti database servers are busy), so I can't provide much information at this time ... it's finished with all the work it has ...

I'm willing to give all information I possibly could to resolve this issue - I just need to know what I should try (and the database servers to talk to me ;) or what information people need ...
Bill Michael

Joined: 30 Aug 05
Posts: 297
Message 1717 - Posted: 9 Dec 2005, 12:57:14 UTC - in response to Message 1716.  

> I'm willing to give all information I possibly could to resolve this issue - I just need to know what I should try (and the database servers to talk to me ;) or what information people need ...


Are you running any other projects? If you suspended SETI while they're having trouble and ran, say, Einstein, it would tell us whether it's SETI-specific (their app) or something system-wide... If it's application-specific, then possibly changing to an AltiVec-optimized app would solve it.

Eric Stewart

Joined: 6 Dec 05
Posts: 10
United States
Message 1720 - Posted: 9 Dec 2005, 15:27:53 UTC

I had to reboot - when I signed on to Einstein, I couldn't create any shared memory objects.

After the reboot though, Einstein exhibits the same behavior:

Fri Dec 9 10:23:09 2005|Einstein@Home|Pausing result l1_0197.0__0197.4_0.1_T04_S4lD_2 (removed from memory)
Fri Dec 9 10:23:09 2005|Einstein@Home|Pausing result l1_0197.0__0197.0_0.1_T05_S4lD_2 (removed from memory)
Fri Dec 9 10:23:10 2005||Couldn't destroy shared memory: system shmctl
Fri Dec 9 10:23:10 2005||request_reschedule_cpus: process exited
Fri Dec 9 10:23:11 2005||Couldn't destroy shared memory: system shmctl
Fri Dec 9 10:23:11 2005||request_reschedule_cpus: process exited

It would appear that the main BOINC client is where the issue lies. But that's just me talking out of my butt ...
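One detail in these lines fits that reading: the messages have the form date|project|message, and the "Couldn't destroy shared memory" lines carry an empty project field, like the client's own request_reschedule_cpus messages, and appear right after a task exits, for Einstein just as for SETI. The sketch below tallies shared-memory errors by project field so the pattern can be checked across a whole log; the pipe-separated layout is taken from the excerpts in this thread, and `shm_errors_by_project` is a hypothetical helper.

```python
from collections import Counter


def shm_errors_by_project(lines):
    """Count shared-memory error lines per project field ('' means no project)."""
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("|", 2)  # date | project | message
        if len(parts) != 3:
            continue  # not in the date|project|message layout
        _date, project, message = parts
        if "shared memory" in message:  # matches both the shmget and shmctl errors
            counts[project] += 1
    return counts


if __name__ == "__main__":
    log = [
        "Fri Dec 9 10:23:09 2005|Einstein@Home|Pausing result l1_0197.0__0197.4_0.1_T04_S4lD_2 (removed from memory)",
        "Fri Dec 9 10:23:10 2005||Couldn't destroy shared memory: system shmctl",
        "Fri Dec 9 10:23:10 2005||request_reschedule_cpus: process exited",
        "Fri Dec 9 10:23:11 2005||Couldn't destroy shared memory: system shmctl",
    ]
    print(shm_errors_by_project(log))  # Counter({'': 2})
```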
Bill Michael

Joined: 30 Aug 05
Posts: 297
Message 1721 - Posted: 9 Dec 2005, 15:49:35 UTC - in response to Message 1720.  

> It would appear that the main BOINC client is where the issue lies. But that's just me talking out of my butt ...


I suspect you're right though. Are you running 5.2.13? If so, we might try _older_ versions, see if this is something they messed up recently... I don't know how much time you want to spend on this. If you're willing, I'll dig up URLs for two older versions that I had good luck with, that should be compatible with your current xml files and such.

I probably won't be back on for at least 12 hours though...

Eric Stewart

Joined: 6 Dec 05
Posts: 10
United States
Message 1722 - Posted: 9 Dec 2005, 16:14:32 UTC

Well it's my work machine we're talking about, and the weekend is coming ... so *I* probably won't get to it until Monday.

I am using 5.2.13 now but I'm pretty sure I saw the behavior in earlier versions. I can always give a couple a try though ...
Paul D. Buck

Joined: 29 Aug 05
Posts: 225
Message 1749 - Posted: 10 Dec 2005, 10:04:45 UTC - in response to Message 1716.  

> Paul: You've got my logs - do you need them again? I can resend them to you; or if I didn't provide what you wanted, could you be more specific?

Eric,

Sorry ... yes I have them ... mind is not working well ... :(
Eric Stewart

Joined: 6 Dec 05
Posts: 10
United States
Message 1868 - Posted: 14 Dec 2005, 11:32:13 UTC

Bill:

Haven't seen you post the versions; I can tell you you might have to go back before 5.2.8; I think I saw the behavior on that one ...

I've checked and I've cleaned up after myself :( I don't have any older versions on my hard drive to check at this time.
Bill Michael

Send message
Joined: 30 Aug 05
Posts: 297
Message 1875 - Posted: 14 Dec 2005, 14:22:21 UTC - in response to Message 1868.  

> Haven't seen you post the versions; I can tell you you might have to go back before 5.2.8; I think I saw the behavior on that one ...


Sorry to let this slip through the cracks... been rather busy around BOINCdom the last few days! :-(

I tried to check the bugs database, just to make sure you weren't doing all this hassle unnecessarily, if someone had already identified the problem - but the bugs database is down... perfect. I did go through the last couple of months of developer mailing list archives looking for anything (and I'm impressed by how much Mac stuff IS being done...) but didn't see anything that looked related to this.

Here is V4.72, and V5.2.4. I wouldn't go back any further than 4.72 as that is where the "new scheduler" code came in, and I'm not sure how the project website info would react to having missing information.

If we can definitely say "this started happening between version x and version y", then I know who to get that info sent to now, anyway. I _am_ pretty sure by this point that this is an OS X Server problem, or at least that it's activated by something that Server does that 'normal' installs don't do. It could be something that is set up within OS X like the journalled file system was though, where before 10.4 only server versions had it "turned on", but you _could_ turn it on in any version.
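Pinning down "between version x and version y" is a bisection over client versions. A small sketch, assuming each tested version is recorded as True (problem seen) or False (ran clean) and the versions are listed oldest to newest; the helper names are hypothetical, and the usage lines reflect what the thread reports up to this point, with 5.2.8 marked bad on Eric's tentative recollection. Because each test here takes days of uptime, the helper only suggests the next version to try rather than automating the search.

```python
def bracket(versions, results):
    """Return (last_good, first_bad) for versions ordered oldest to newest.

    results maps version -> True if the problem was seen, False if it ran clean;
    untested versions are simply absent. Either element may be None.
    """
    first_bad = next((i for i, v in enumerate(versions) if results.get(v) is True), None)
    end = len(versions) if first_bad is None else first_bad
    good = [i for i in range(end) if results.get(versions[i]) is False]
    last_good = good[-1] if good else None
    return (
        versions[last_good] if last_good is not None else None,
        versions[first_bad] if first_bad is not None else None,
    )


def next_to_test(versions, results):
    """Suggest the next version to try, or None if nothing is left to narrow."""
    good, bad = bracket(versions, results)
    if bad is None:
        return None  # problem never seen: nothing to bisect
    hi = versions.index(bad)
    if good is None:
        # No clean baseline yet; everything older than the first bad version is untested.
        return versions[0] if hi > 0 else None
    lo = versions.index(good)
    if hi - lo <= 1:
        return None  # transition located between two adjacent listed versions
    return versions[(lo + hi) // 2]


if __name__ == "__main__":
    versions = ["4.72", "5.2.4", "5.2.8", "5.2.13"]
    seen = {"5.2.8": True, "5.2.13": True}
    print(bracket(versions, seen))       # (None, '5.2.8')
    print(next_to_test(versions, seen))  # 4.72
```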

I've also learned a bit about how BOINC uses shared memory, but nothing that helps with the problem. Sigh.

Eric Stewart

Joined: 6 Dec 05
Posts: 10
United States
Message 1893 - Posted: 14 Dec 2005, 17:50:09 UTC

V5.2.4:

Wed Dec 14 12:38:59 2005||Suspending computation and network activity - user is active
Wed Dec 14 12:38:59 2005|Einstein@Home|Pausing result l1_0197.0__0197.3_0.1_T12_S4lD_1 (removed from memory)
Wed Dec 14 12:38:59 2005|Einstein@Home|Pausing result l1_0197.0__0197.4_0.1_T12_S4lD_1 (removed from memory)
Wed Dec 14 12:39:00 2005||Couldn't destroy shared memory: system shmctl
Wed Dec 14 12:39:00 2005||request_reschedule_cpus: process exited
Wed Dec 14 12:39:01 2005||Couldn't destroy shared memory: system shmctl
Wed Dec 14 12:39:01 2005||request_reschedule_cpus: process exited

V4.72 wanted an account key; I found my SETI key, but it wouldn't let me attach to the project. I don't think I ever got one for Einstein. :(
Bill Michael

Joined: 30 Aug 05
Posts: 297
Message 1894 - Posted: 14 Dec 2005, 18:01:54 UTC

Eric, thanks for all the time you've put into this. I think at this point it's going to require a developer to take a look at it. I'll try to get the right guy looking at this thread...

Nathan Herring

Joined: 7 Sep 05
Posts: 13
United States
Message 2147 - Posted: 20 Dec 2005, 22:03:10 UTC

Sorry I haven't responded quickly!

I am running 10.4.3 non-server. I run Xcode, and by MacBU products, I mean the Microsoft MacBU (Microsoft Office, Microsoft Messenger, Remote Desktop Connection, etc.) Again, Microsoft Office uses shared memory and may be influencing how it is used by BOINC.

Eric: Are you running Microsoft Office or Messenger on that machine?

If necessary, I can make myself familiar with the shared memory implementation in Office, but that does not seem like a fun task. If Eric isn't running Office or Messenger then we can remove that from the list of problem sources.


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.