(Waiting for GPU Memory) Status OSX

Jimmy G (BA)

Send message
Joined: 26 Sep 11
Posts: 41
Message 40382 - Posted: 26 Sep 2011, 22:13:11 UTC

Hi Folks,

After a several-year hiatus from BOINC I returned to crunching the other day with a new multiprocessor computer...

iMac 12,2
Intel Core i7 @ 3.4GHz 16GB RAM
AMD Radeon HD 6970M 2GB VRAM
OSX 10.6.8

...running...

BOINC Manager 6.12.35 (x86)

...for...

SETI@Home
ClimatePrediction.net

I spent the first couple of days experimenting with the preferences (both online and in BOINC Manager) to see how to get the best (optimal) performance out of this setup.

I quickly found that running either SETI or Climate individually at a "100 resource share" setting (other process not running) worked best. I found that Climate was being a system hog with just 4 WUs on 4 CPUs (slowing things down too much for my liking). So I decided to run SETI (full 8 processors) while I was using the computer during the day and relegated Climate (4 processors) to the overnight hours.

And this worked fine until this morning when I went to make the "Suspend/Resume" swap between the two processes. The eight (8) suspended processes from last night would not run in the queue and one-by-one began to show "Waiting to run (waiting for GPU memory)" in their Status in BOINC Manager. Then new WUs began to run from the queue but began to have the same problem. So I suspended SETI completely and went to check to make sure all my preferences were as I wanted them...everything seemed fine, so I resumed SETI to complete the queue (No New Tasks) to see if the errant (now) ten (10) WUs would finally run...after a full day of waiting all of the remaining WUs have run without problem and the 10 "(waiting on GPU memory)" WUs are still waiting patiently in line.

I've spent the day looking for posts on this issue, reading the manual, learning about OSX and how GPU crunching is still in the pipe...I've tried playing with settings online and offline, Run based on Preferences, Run Always, Suspend, Resume and...nada.

I've considered "Quitting" BOINC Manager but was concerned that my 4 Climate WUs with over 100 hours invested would get zapped out of existence.

Anyone have any suggestions, work-arounds, chants, etc.? I'd hate to have to abort those WUs!

Thanks in advance! :)
Jimmy G
ID: 40382 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 29 Aug 05
Posts: 15478
Netherlands
Message 40389 - Posted: 27 Sep 2011, 7:10:44 UTC - in response to Message 40385.  
Last modified: 27 Sep 2011, 7:11:03 UTC

No matter what you do, Seti still doesn't run any science apps by default on an ATI GPU. You'd have to use third party applications for that, as in ones from Dotsch or Lunatics. But as far as I know, there are none for OS X, as ATI GPU detection itself is flaky still under OS X.

@Jimmy: The "waiting for GPU memory" thing is a bug that got squashed only recently, but only in 6.13 if I am not mistaken. However I am not telling you to go use 6.13, as this is extremely buggy. Instead I'll go ask the developer for the Mac to back-port the fix to a next 6.12, or ask if he has done so already.
ID: 40389 · Report as offensive
Jimmy G (BA)

Send message
Joined: 26 Sep 11
Posts: 41
Message 40394 - Posted: 27 Sep 2011, 14:40:27 UTC - in response to Message 40385.  

Hi Dagorath,

Thanks so much for the clues and insights...I do appreciate it! Some commentary and feedback on your points below...

Your climateprediction (CPDN) tasks won't disappear if you shutdown BOINC. Be sure to shutdown BOINC client not just BOINC manager. Click Advanced -> Shutdown Connected Client to do that.


Using Advanced->Shutdown Connected Client in BOINC Manager 6.12.35 (x86) in OSX 10.6.8 results in a dialog box asking for confirmation; clicking OK then results in a secondary dialog box asking the user to connect to another host. So "Yes or No at this point?" was answered here in this post...

Connected client won't shut down:
http://setiathome.berkeley.edu/forum_thread.php?id=53235#887451

..."No" apparently being the correct answer.

As a matter of note, there is no confirmation of this decision sequence shown to the user nor do the actions appear in any form in the Event Log!

Also, quitting BOINC Manager (after performing the Quit Client routine above) and restarting BOINC Manager did not resolve the (waiting for GPU memory) issue!

You should make backups of CPDN tasks so that if they crash you can restore them. As you can see they take many hours to run and it's a shame when you get hit by a power outage or whatever and lose a CPDN task after 20 days of time invested in it. The easiest way to backup CPDN is to just copy the entire BOINC folder to BOINC.bkp or whatever but you have to do that when BOINC client is shutdown. So maybe do that when you shutdown now.
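For anyone who would rather script that backup than drag folders around, here is a minimal Python sketch of the "copy the entire BOINC folder" idea. The function name is made up for illustration, and the paths in the usage comment are assumptions based on the OS X folder names mentioned in this thread, so adjust them to your own installation. As the advice above says, only run it while the BOINC client is shut down.

```python
import shutil
from pathlib import Path


def backup_boinc_data(data_dir, backup_dir):
    """Copy the whole BOINC data folder to backup_dir and return the new path.

    Hypothetical helper: run it only while the BOINC client is shut down.
    """
    src = Path(data_dir)
    dst = Path(backup_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no such data folder: {src}")
    if dst.exists():
        # Refuse to overwrite an older backup silently
        raise FileExistsError(f"backup already exists: {dst}")
    # copytree recreates the full folder tree, including the projects subfolders
    shutil.copytree(src, dst)
    return dst


# Example call (assumed OS X locations, not verified for every install):
# backup_boinc_data("/Library/Application Support/BOINC Data",
#                   "/Library/Application Support/BOINC Data.bkp")
```

The helper refuses to overwrite an existing backup on purpose, so an older good copy is never replaced by accident; rename or remove the old backup first if you want a fresh one.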


Backing up important data is good advice here. However, after examining the resource folders for SETI and CPDN (Climate)...

Macintosh Hard Drive -> Library -> Application Support -> BOINC Data -> projects -> climateprediction.net

Macintosh Hard Drive -> Library -> Application Support -> BOINC Data -> projects -> setiathome.berkeley.edu

...I felt confident enough that this information would remain intact and in place after both a Client and Application shut down and run. And, such was the case.

When you restart BOINC client it might fix the problem with "waiting for GPU memory". It might not.


As noted, it did not.

You may have to reboot the OS or even power down the computer.


I was almost tempted to go this route for my next step, but being a curious fellow I decided to investigate further into what was going on...

I decided to fire up Activity Monitor (Applications -> Utilities -> Activity Monitor) to see what was happening with the Shutdown Connected Client and Quit and Restart BOINC Manager path. What I discovered was that all of the malfunctioning SETI WUs were still being held in memory...there was no memory resource dump on quitting! Aha! So, I decided to play my hunch and quit one of the WUs using Activity Monitor and restarted Boinc Manager and SETI...voila!...no more (waiting for GPU memory) hang, the WU functioned properly! I "rinsed and repeated" with the remaining WUs and, now...all is well! :)
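The same check can be done without Activity Monitor. Below is a rough Python sketch that asks `ps` for the process list and keeps the entries whose command name contains a search word. The function names are made up, and the default search word "setiathome" is only an assumption about how the science apps are named on a given machine, so compare it with what Activity Monitor shows before trusting the result.

```python
import subprocess


def find_processes(ps_output, needle):
    """Return (pid, command) pairs whose command contains needle (case-insensitive)."""
    matches = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        # The header row is skipped because its first column is not a number
        if len(parts) == 2 and parts[0].isdigit() and needle.lower() in parts[1].lower():
            matches.append((int(parts[0]), parts[1]))
    return matches


def list_leftover_tasks(needle="setiathome"):
    """List running processes that look like science apps (the name is an assumption)."""
    result = subprocess.run(
        ["ps", "-A", "-o", "pid,comm"],
        capture_output=True, text=True, check=True,
    )
    return find_processes(result.stdout, needle)


# Example: print whatever is still running after the client was shut down
# for pid, command in list_leftover_tasks():
#     print(pid, command)
```

If the list is not empty after the client has been shut down, those are the candidates to quit by hand, as described above.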

I suspect what has happened is you've got 8 SETI GPU tasks and they all want memory at the same time. Perhaps you haven't seen this problem before because you've never had 8 SETI GPU tasks all line up at the same time, perhaps until now it was a mixture of GPU and non-GPU tasks which would demand less GPU memory.


I had only tried this project swapping routine a few times and I am still not sure what might have precipitated this problem. FWIW, I did a subsequent swap back into CPDN for the remainder of the day and into the overnight with no problems going in that "direction". We'll see what happens tomorrow morning when I again switch from CPDN back into SETI.

I see you have 2 GB vid RAM so it should handle more than 1 GPU task easily. You may have to do some configuring with an app_info.xml file. Jord can tell you more about that.


It was my understanding that GPU memory/processing is not yet supported using SETI@Home in 6.12.35 (x86) on OSX 10.6.8 on a "Sandy Bridge" iMac with an AMD HD 6970M?!

FWIW, toggling the Use GPU while computer is in use setting in preferences on my setup does nothing...I confirmed this in my setup experiments using OpenGL Driver Monitor...

(Macintosh Hard Drive -> Developer -> Applications -> Graphics Tools -> OpenGL Driver Monitor)

...and neither the ATIRADEONX3000GLDriver nor the AppleIntelHDGraphicsGLDriver show any activity while the BOINC and client softwares are running. I also confirmed that no VRAM was being used in these applications using a 3rd party OSX monitoring utility.



One more thought, before you shutdown BOINC, make a note of whether they're all GPU tasks then suspend all of them and set SETI to No New Tasks. When you restart BOINC, Resume the SETI tasks one by one, slowly, and see at which point the "waiting for Memory" status returns, if it returns at all.


More excellent advice...as noted in my original post, I did suspend all tasks and set the program to "No New Tasks" to see if the errant tasks would perform properly when all of the other tasks had completed...and, as noted, that did not solve the problem. The problem, apparently, lies in the fact that a quit does not perform a memory dump of WUs whether good or rogue, so a restart of softwares will only pick up where one left off with the WU...good or rogue! A System restart will clean out the errant WU RAM as you note!

My conclusion...

The softwares need to dump their RAM assets on a quit...perhaps the programmers can effect a fix to this?!


Error corrections, insights and questions welcomed!

Thanks again for your time and energy on this issue!

:)
Jimmy G

ID: 40394 · Report as offensive
Jimmy G (BA)

Send message
Joined: 26 Sep 11
Posts: 41
Message 40395 - Posted: 27 Sep 2011, 15:13:08 UTC - in response to Message 40389.  

No matter what you do, Seti still doesn't run any science apps by default on an ATI GPU. You'd have to use third party applications for that, as in ones from Dotsch or Lunatics. But as far as I know, there are none for OS X, as ATI GPU detection itself is flaky still under OS X.


Apparently an issue with AVX support using GCC in OSX...

Re: Moving to newer version of GCC:
http://lists.apple.com/archives/xcode-users/2011/May/msg00228.html

...though I thought I read somewhere about AVX support being added to 10.6.8. and Sandy Bridge being AVX enabled/native/ready?

@Jimmy: The "waiting for GPU memory" thing is a bug that got squashed only recently, but only in 6.13 if I am not mistaken. However I am not telling you to go use 6.13, as this is extremely buggy. Instead I'll go ask the developer for the Mac to back-port the fix to a next 6.12, or ask if he has done so already.


Well, who knows what'll happen when you start letting me hit the buttons and flip the switches! Ha! As I said, I'm not sure what precipitated the problem to begin with...!

Follow up...all the previously errant WUs processed without a hitch and SETI is currently downloading and processing a fresh batch of WUs as we speak!

I love happy endings. :)
Jimmy G
ID: 40395 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 29 Aug 05
Posts: 15478
Netherlands
Message 40404 - Posted: 27 Sep 2011, 23:01:43 UTC

Jimmy, the developers ask you to run with a debug option on.
To do so, use any plain-text editor to make a file called cc_config.xml in your BOINC Data directory (default in OS X: /Library/Application Support/BOINC Data/).

In it add the following lines:
<cc_config>
<log_flags>
<task_debug>1</task_debug>
</log_flags>
</cc_config>


When done and the file is saved (make sure it's only got the .xml extension, nothing else; save in ANSI format if possible), open BOINC Manager->Advanced view->Advanced->Read config file. That will start the debugging. Then wait until you see the same behaviour on any of your tasks and post the part of the log here that then hopefully shows what the problem is. You can recognize the debug lines of the tasks as they start with [task].
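If no plain-text editor is at hand, the file can also be written by a short script, which sidesteps any "Save As" format choice. This is only a sketch: the function name is made up, and the folder you pass in is whatever your BOINC Data directory really is. The XML content is exactly the five lines given above.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# The exact lines from the post above, nothing added
CC_CONFIG = """<cc_config>
<log_flags>
<task_debug>1</task_debug>
</log_flags>
</cc_config>
"""


def write_cc_config(data_dir):
    """Write cc_config.xml as plain text into data_dir and return its path."""
    path = Path(data_dir) / "cc_config.xml"
    # The content is ASCII only, so the bytes are the same in ANSI and UTF-8
    path.write_text(CC_CONFIG, encoding="ascii")
    # Parse the file back to confirm it is well-formed XML
    ET.parse(str(path))
    return path


# Example call (assumed OS X location; writing there may need admin rights):
# write_cc_config("/Library/Application Support/BOINC Data")
```

After writing the file, Read config file in BOINC Manager is still needed to make the client pick it up.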
ID: 40404 · Report as offensive
Jimmy G (BA)

Send message
Joined: 26 Sep 11
Posts: 41
Message 40406 - Posted: 28 Sep 2011, 0:01:04 UTC - in response to Message 40404.  
Last modified: 28 Sep 2011, 0:03:20 UTC

Hi Jord,

Glad to help!

I copy-and-pasted the text you provided into TextEdit and saved the document using the Word 2003 Format (.xml) choice in the File Format drop down menu that appeared in the Save As dialog box...page saved as "cc_config.xml" as requested. Get info shows the document to be an XML Text Document that will open with Dashcode. If you'd like to see the contents of how that document opens in Dashcode, kindly let me know.

Just so I know (and can get more familiar with the workings of BOINC Manager)...I then installed the document in Macintosh HD/Library/Application Support/BOINC Data and then tried to read/run(?) the config file as you described, however nothing happened...e.g. no window/document/popup/whatever opened to indicate that the request was being carried out...um, so what should happen when I "open BOINC Manager->Advanced view->Advanced->Read config file"?

Does BOINC Manager need to be quit and restarted for this to take effect?

:)
Jimmy G
ID: 40406 · Report as offensive
Jimmy G (BA)

Send message
Joined: 26 Sep 11
Posts: 41
Message 40407 - Posted: 28 Sep 2011, 0:14:24 UTC - in response to Message 40406.  
Last modified: 28 Sep 2011, 0:15:02 UTC

Hi Jord,

Followup from my previous post...

I ran the BOINC Manager Event Log to see if anything showed up in there...

Tue Sep 27 19:38:06 2011 Re-reading cc_config.xml (black text)
Tue Sep 27 19:38:06 2011 Missing start tag in cc_config.xml (red text)
Tue Sep 27 19:38:06 2011 log flags: file_xfer, sched_ops, task (black text)

...hope that helps!

:)
JG
ID: 40407 · Report as offensive
Jimmy G (BA)

Send message
Joined: 26 Sep 11
Posts: 41
Message 40409 - Posted: 28 Sep 2011, 1:32:06 UTC - in response to Message 40408.  
Last modified: 28 Sep 2011, 1:39:23 UTC

Hi Dagorath,

Thanks for the update on this...I was thinking perhaps some code was missing or maybe it didn't like the format.

For the benefit of our viewing audience it should be noted by Mac users that TextEdit v1.6 saves only in the following formats (as appears in the Save As drop down menu)...

Rich Text Format
Rich Text Format with Attachments
Web Page (.html)
Web Archive
OpenDocument Text (.odt)
Word 2007 Format (.docx)
Word 2003 (.xml)
Word 97 Format (.doc)

...and is not suitable for creating XML documents for these purposes. Apparently the old .txt format went bye-bye along the way... :(



You should not have saved the file in Word 2003 (.xml) format. That's what's caused the error.


Noted.

Delete cc_config.xml, copy and paste that xml code Jord gave you in his last post into TextEdit. Double check that you did the copy/paste properly because it won't work if even 1 character is missing. When you save the file save it in "plain text (ANSI)" format if that's one of the available formats in the list. If not then "plain text" or "text" will do. After saving it, double check that it has the .xml extension and not .txt. If it has .txt extension then rename it. Do not save it in Word 2003 format.
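The double-checking described above can be roughly automated. The sketch below is a hypothetical checker, not anything BOINC itself runs: it confirms the file name, confirms the file parses as XML, and confirms the outermost element is `<cc_config>`. A file exported in a word-processor XML format is still XML, but its outermost element is something else, so it gets flagged.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def check_cc_config(path):
    """Return a list of problems found in a cc_config.xml candidate (empty if none)."""
    path = Path(path)
    problems = []
    if path.name != "cc_config.xml":
        problems.append(f"unexpected file name: {path.name}")
    try:
        root = ET.parse(str(path)).getroot()
    except ET.ParseError as err:
        # Rich text or other non-XML content ends up here
        return problems + [f"not well-formed XML: {err}"]
    if root.tag != "cc_config":
        problems.append(f"root element is <{root.tag}>, expected <cc_config>")
    elif root.find("log_flags/task_debug") is None:
        problems.append("no <task_debug> flag inside <log_flags>")
    return problems


# Example: an empty list means the file passed these basic checks
# print(check_cc_config("/Library/Application Support/BOINC Data/cc_config.xml"))
```

An empty result only means the file passed these three checks; the Event Log after Read Config File remains the real test.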


Noted above. I saved the document using Bare Bones Software's TextWrangler and reinstalled the new document in the BOINC Data folder...and, voila!...Read Config File is generating an ever growing list of [task] entries in the Event Log.


Then do Advanced -> Read Config File again and look in the Event Log to see if it read the file without error(s).


Thanks so much for your help in troubleshooting that!

My next question...now that this is running I see that it is doing an endless cycle of [task] result (WU#) checkpointed messages for each of the eight (8) SETI WUs every minute, on-the-minute and the log is quickly growing exceptionally long! Is this right?

Thanks so much again! :)
JG
ID: 40409 · Report as offensive
Jimmy G (BA)

Send message
Joined: 26 Sep 11
Posts: 41
Message 40411 - Posted: 28 Sep 2011, 1:52:34 UTC - in response to Message 40410.  

Hi Dagorath,

Well, hopefully, this will reveal something!

Thanks so much for your patience on working through this with me!

:)
JG
ID: 40411 · Report as offensive
Jimmy G (BA)

Send message
Joined: 26 Sep 11
Posts: 41
Message 40417 - Posted: 28 Sep 2011, 13:00:27 UTC - in response to Message 40412.  

Hi Dagorath,

Thanks for the tips on searching for messages in the log, it'll come in handy should the need arise!

As for the ever growing log file...are there any max size limitations I should be "mindful of"? It would be nice to be able to purge the log file every day, short of having to do a shutdown and restart of the softwares.

FWIW, during this morning's "suspend CPDN/resume SETI" project swap I put a couple of the WUs into suspend (the rest being not) to see if that precipitated any bad behavior...nothing to report. I'll be tossing in other possible user variables between swaps to see if I can break the thing again...I'll keep you posted here should I succeed!

:)
JG
ID: 40417 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 29 Aug 05
Posts: 15478
Netherlands
Message 40418 - Posted: 28 Sep 2011, 15:13:41 UTC - in response to Message 40417.  

BOINC writes the log to a file in the data directory; this happens automatically. That file's called stdoutdae.txt and has a maximum size of 2 megabytes. When it grows beyond that size, BOINC renames stdoutdae.txt to stdoutdae.old, overwriting any file that had that name before, and then starts a fresh stdoutdae.txt.
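Since the log is split over those two files, a search for the debug lines has to look in both. A small Python sketch of that, reading the older file first so the results come out in time order; the function name is made up, and the folder is whatever your data directory is.

```python
from pathlib import Path


def find_log_lines(data_dir, needle):
    """Yield log lines containing needle, reading the older file first."""
    for name in ("stdoutdae.old", "stdoutdae.txt"):
        path = Path(data_dir) / name
        if not path.is_file():
            continue  # a file that does not exist yet is simply skipped
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if needle in line:
                    yield line.rstrip("\n")


# Example: show every task debug line that is still on disk
# for line in find_log_lines("/Library/Application Support/BOINC Data", "[task]"):
#     print(line)
```

Searching for "waiting for GPU memory" instead of "[task]" works the same way, assuming that text appears in the log at all.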

BOINC Manager can store a lot of these lines. I'm not sure anymore these days how many, somewhere between 900 and 5,000 at least.

You don't want to stop & restart every so often, as that resets the science apps as well. We're trying to reproduce the problem you ran into, which won't happen if you reset BOINC's status every day.
ID: 40418 · Report as offensive
Jimmy G (BA)

Send message
Joined: 26 Sep 11
Posts: 41
Message 40443 - Posted: 29 Sep 2011, 16:44:19 UTC - in response to Message 40418.  

Hi Jord,

Thanks for the info on the stdoutdae.txt and stdoutdae.old files...good to know.

Update...

On a hunch I decided to rifle through the stdoutdae.old file and discovered a few things. But, first a bit of back story...

I recall that early in my preference tweaking I was trying to determine just how many processor cores to allot to BOINC and had settled on 4, thinking that since the 3.4GHz i7 in my iMac has only 4 physical cores and the remaining 4 cores are virtual, there would be no sense in trying to use them. So I had set my preferences, both in BOINC Manager and online, to "On multiprocessors, use at most 4 processors" with "On multiprocessors, use at most 100% of the processors" selected. And for the first few days BOINC Manager used 4 processors only.


In using the software I was continuing to play with preference settings, both in software and online to see if I could discern how things were being applied to BOINC Manager (trying to learn, here). And somewhere along the way BOINC Manager was showing me that 8 WUs were now being worked on instead of, what up until then had been, the "normal" 4! I remember thinking, "what the hey?" and tried to investigate how that had occurred. Preferences both in software and online still said 4 and 100%. Hmmm.

Later that evening when I went to suspend SETI and resume CPDN for the night an anomaly occurred...on putting SETI into Suspend the Project window in Advanced View went blank, no projects to be seen! "What the hey?" So I went under the Task Tab and saw everything gone in there, too. "Hmmm, what to do? Restart the software? Don't want to lose everything. Hmmm. Checked preferences, everything seems fine there. Decided to look into the menus to see if there was anything of help there to be found." While in the View drop down I decided to go back to Project view (instead of hitting the window tab) and...voila!...everything came back into view! Cool, problem solved, I switched CPDN on for the night and called it a day.

The following morning when I went to swap CPDN for SETI is when I began encountering the Waiting for GPU memory problem.

Back to the stdoutdae.old file...

I decided to investigate that file to see if it would tell me where this change over from 4 to 8 took place and why...what happened during the whiteout that occurred when changing from SETI to CPDN...what happened when the first Waiting for GPU appeared. And, there are some notations at those events in the log, but I have no way of knowing what they're telling me. Possibly this is all related?

Would you folks like to take a look at this information? I can either post what I think are the important lines here or I can send the file along to you in an email with the times marked for when those events took place. Let me know what works for you.

:)
JG
ID: 40443 · Report as offensive
Profile Gundolf Jahn

Send message
Joined: 20 Dec 07
Posts: 1069
Germany
Message 40448 - Posted: 29 Sep 2011, 17:18:31 UTC - in response to Message 40443.  

So I had set my preferences, both in BOINC Manager and online, to "On multiprocessors, use at most 4 processors" with "On multiprocessors, use at most 100% of the processors" selected. And for the first few days BOINC Manager used 4 processors only.

That must have been a coincidence, since the "On multiprocessors, use at most 4 processors" preference is obsolete.

The correct one to use if you want to use 4 out of 8 cores (either real or virtual) is: On multiprocessors, use at most 50% of the processors.

If you have ever used the local preferences, you'll have to either continue using only them or clear them to return to the online ones, since the local preferences always take precedence.

Gruß,
Gundolf
ID: 40448 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 29 Aug 05
Posts: 15478
Netherlands
Message 40449 - Posted: 29 Sep 2011, 17:28:52 UTC - in response to Message 40443.  

Can you email me the file, please? I've sent you my email address in PM.
CPDN does take up quite a bit of memory. I have noticed at times when heavy memory using applications start up or stop, that BOINC Manager loses the connection with the client (the blanking of the tabs). But usually afterwards, when all's loaded in memory, that fixes itself. That was probably causing your Projects tab going blank.

Similarly, 4 or 8 CPDN tasks being loaded in memory can take up quite a bit of memory. When you then switch to Seti, the CPDN tasks need to be unloaded from memory first before Seti can take up that memory space. With slow(er) memory this can take a while. Heck, even with DDR3 it can take multiple tens of seconds before a 1GB file is unloaded from memory.

And if you were already low on memory and possibly swapping out to virtual memory (on disk), plus now switching to Seti (depending on what you run there, Multibeam or Astropulse), one application trying to unload from memory, another trying to load into memory at the same time... I can see how that could cause memory constraints, even on 16GB.

I do know that in older BOINC versions there's a bug where when the application does not have enough memory to live in, that it shows the "Waiting for GPU memory" error. That one was fixed... in 6.13. ;-)
Charlie will back-port it to the next 6.12, but since it's not a showstopper bug, this can take a while. And besides, all the developers would like to know exactly what causes the bug. Meet Jimmy G: 'Guinea pig extraordinaire'. :-)
ID: 40449 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 29 Aug 05
Posts: 15478
Netherlands
Message 40450 - Posted: 29 Sep 2011, 17:35:08 UTC - in response to Message 40448.  

That must have been a coincidence, since the "On multiprocessors, use at most 4 processors" preference is obsolete.

Actually, it isn't obsolete. It's still being used by BOINC 6. That preference sets the minimum amount of processors to use, while the Percentage preference will set the limit.

So if your global_prefs.xml file contains <max_ncpus>2</max_ncpus> and your global_prefs_override.xml file contains <max_ncpus_pct>0.000000</max_ncpus_pct>, BOINC will still use 2 CPUs.
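To see which of those two values a preferences file actually holds, the tags can be read with a few lines of Python. This is only a sketch: the function names are made up, the `<global_preferences>` wrapper in the usage comment is an assumption about the file layout, and the code deliberately does not try to model which setting the client lets win. The second helper just does the percentage arithmetic from the earlier reply (50% of 8 cores is 4).

```python
import xml.etree.ElementTree as ET


def read_cpu_prefs(xml_text):
    """Return (max_ncpus, max_ncpus_pct) from preferences XML, None where absent."""
    root = ET.fromstring(xml_text)
    ncpus = root.findtext(".//max_ncpus")
    pct = root.findtext(".//max_ncpus_pct")
    return (
        int(ncpus) if ncpus is not None else None,
        float(pct) if pct is not None else None,
    )


def cores_for_percentage(total_cores, pct):
    """Whole number of cores that pct percent of total_cores amounts to."""
    return int(total_cores * pct / 100)


# Example (the wrapper element name is an assumption):
# text = open("global_prefs_override.xml").read()
# print(read_cpu_prefs(text))
# print(cores_for_percentage(8, 50))
```

Running the reader on both global_prefs.xml and global_prefs_override.xml shows at a glance whether the count and the percentage disagree between the online and the local preferences.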


ID: 40450 · Report as offensive
Jimmy G (BA)

Send message
Joined: 26 Sep 11
Posts: 41
Message 40466 - Posted: 29 Sep 2011, 23:54:27 UTC - in response to Message 40450.  
Last modified: 29 Sep 2011, 23:55:18 UTC

@ Gundolf...

Well, all I can say is that when I had it set to 1 processor it used 1 processor, when I bumped it up to 2 it used 2, and when I asked for it to use 4 it used 4. (OSX 10.6.8 BOINC Manager 6.12.35 (x86)) What I was noting, earlier, was the change from 4 to 8 processors occurring without user interaction. At all times in my setup experiments 100% CPU had been constant....at least I don't recall ever having changed that parameter.


@ Jord...

You've got mail...the file and notes are in your mailbox. :)

FWIW, as for memory issues. I can say, from monitoring these activities in realtime, that excessive RAM usage has never been an issue with SETI...at most, system, browser, other softwares and SETI working simultaneously never exceeded 60% of the 16GB RAM.

Forgive my ignorance, but how would one know if a SETI WU was either Astropulse or Multibeam?

v6.13 being, um, beta? Let me get comfortable driving this Buick, first...then I'll consider your offer to take the new one for a spin. :) FWIW, a lot has changed here in SETIland since I last crunched numbers on a 2GHz G5 single-processor iMac a few years back...Rip Van Winkle effect...heck, in one week's time I've more than doubled what that old machine had taken months to produce...all that, without any GPU-crunching! (BTW, are those numbers for real?! Yikes!)

So, uh, Guinea pig is higher rank than Lab Rat? ;)

Thanks for the xml cpu lines...not sure how to put those to use... :)

Best,
JG
ID: 40466 · Report as offensive

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.