(Waiting for GPU Memory) Status OSX

Jimmy G (BA)

Joined: 26 Sep 11
Posts: 41
Message 40488 - Posted: 30 Sep 2011, 13:24:28 UTC - in response to Message 40469.  
Last modified: 30 Sep 2011, 13:26:07 UTC

Well, all I can say is that when I had it set to 1 processor it used 1 processor, when I bumped it up to 2 it used 2, and when I asked it to use 4 it used 4. (OSX 10.6.8, BOINC Manager 6.12.35 (x86)) What I was noting earlier was the change from 4 to 8 processors occurring without user interaction. At all times in my setup experiments the 100% CPU setting had been constant...at least I don't recall ever having changed that parameter.


I've changed parameters and totally forgotten about it. It happens. Another explanation is that you started with hyper-threading off and later turned hyper-threading on and forgot about that. BOINC would see and use 4 CPUs with hyper-threading off, 8 with hyper-threading on, if your % CPUs setting in BOINC was always at 100%.


While playing with the parameters I was trying to maintain a systematic approach to my experiments...my primary (initial) focus, as I recall, was to discover how the extra processors and their usage would effect a workable balance for me between WU processing and personal system performance. In that regard, my adjustments were made only to the On multiprocessors, use at most # processors setting.

I also investigated what the project-specific parameter Resource Share did, plus what the globally-applied parameters On multiprocessors, use at most % of the processors and On multiprocessors, use at most % of CPU time did.

I noted in my email to Jord a CPU change from 4 to 8 which occurred during one of my Preference tweaks...

25-Sep-2011 10:09:13 [---] Preferences:
25-Sep-2011 10:09:13 [---] max memory usage when active: 8192.00MB
25-Sep-2011 10:09:13 [---] max memory usage when idle: 14745.60MB
25-Sep-2011 10:09:13 [---] max disk usage: 100.00GB
25-Sep-2011 10:09:13 [---] Number of usable CPUs has changed from 4 to 8.
25-Sep-2011 10:09:13 [---] suspend work if non-BOINC CPU load exceeds 25 %
25-Sep-2011 10:09:13 [---] (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)


...however, on reexamining both my software and web-based preferences, they are all still set to 4 CPUs. So that I can learn: what exactly is causing this seeming disagreement between what the software is set to (4 CPUs) and what the software is actually using (8 CPUs)? Whatever it is, it is not intuitive or obvious, and if it's confusing to me I'm sure it's confusing to a lot of other folks. (Um, GUI feedback?!)

Furthermore, in Gundolf's post he states...

"The correct one to use if you want to use 4 out of 8 cores (either real or virtual) is: On multiprocessors, use at most 50% of the processors."


...which to me is redundant and not intuitive. My interpretation (usage) of On multiprocessors, use at most % of the processors was that a value of 100% meant I was using 100% of the 4 CPUs I was allowing...I mean, really, what would lead me to believe, as a user, that there were two identically purposed (though differently worded) settings for the total number of CPUs? Put another way, I thought the choice was, "how many cylinders do you want to fire, and how much throttle are you going to allow?" but in reality the choice is, "how many cylinders do you want to fire, and how many cylinders do you want to fire?" with the caveat that the % selection will override the # selection. Or, am I still not getting it?
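For my own notes, here is roughly how I now picture the two global settings combining. This is just my guess at the logic, and the names are mine, not the actual BOINC client source:

// "On multiprocessors, use at most N% of the processors" seems to cap how
// many cores BOINC will schedule on, while "use at most N% of CPU time"
// only throttles how hard those cores are driven.
#include <algorithm>
#include <cmath>
#include <cstdio>

static int usable_cpus(int host_ncpus, double max_ncpus_pct) {
    int n = (int)std::floor(host_ncpus * max_ncpus_pct / 100.0);
    return std::max(1, n);          // never drop below one core
}

int main() {
    std::printf("%d\n", usable_cpus(8, 50.0));    // 8 logical cores at 50%  -> 4
    std::printf("%d\n", usable_cpus(8, 100.0));   // 8 logical cores at 100% -> 8
    return 0;
}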

On your comment about hyper-threading...I am not seeing where I might have had the opportunity in the preferences (either in BOINC Manager or web-based) to select for hyper-threading. How might this have occurred and/or how does one set this parameter?

:)
JG
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15478
Netherlands
Message 40580 - Posted: 7 Oct 2011, 11:22:44 UTC

A big quiet has thundered in here... so luckily another person manages to reproduce the error quite easily on his Mac. We're keeping an eye on him, here. :)
Jimmy G (BA)

Joined: 26 Sep 11
Posts: 41
Message 40585 - Posted: 7 Oct 2011, 14:09:28 UTC - in response to Message 40580.  

A big quiet has thundered in here... so luckily another person manages to reproduce the error quite easily on his Mac. We're keeping an eye on him, here. :)


Belated Update...

My apologies for not weighing in sooner on this issue, but I've been trying to take a more reasoned approach to determining what sequence of actions is actually precipitating this phenomenon. Unfortunately, that has required some time on my end. Also, I had been waiting on a few replies from you to my earlier thread questions about the different SETI WUs, GPU usage and the ranking of guinea pigs... :) ...that, and I figured you folks needed some time to look over the information I sent in the email. So, figuring you were busy, I just went about my investigations.

I have been able to reproduce the (Waiting for GPU Memory) status, and have also succeeded in getting the BOINC Manager to crash completely, and I'm looking into an overnight log gap that appears to be telling me there was an unattended BOINC Manager restart somewhere along the way. However, I've only managed to accomplish this while working in a haphazard fashion, switching between the software preferences and the online, web-based preferences. I can compile those log findings for you and send them along if you'd like, but they're incomplete as far as which preference settings I was using.

Forgive my not using the proper terminologies as I go along, but my suspicion is that the (Waiting for GPU Memory) status is being caused by some sort of WU tag in its preferences when a user switches from using the BOINC Manager's software preferences to the online, web-based preferences and/or vice versa. To pursue that line of discovery I decided that I needed a stable baseline to work off of...

So earlier this week I decided to quit all projects and BOINC Manager, do a system restart, and fire everything up in a logical fashion using only the BOINC Manager software preferences to run the show. The first test was to continue my preferred method of having SETI and MilkyWay run during the day and CPDN run overnight. I also tried just SETI during the day with CPDN at night, and then MilkyWay during the day with CPDN at night. So far, all the swaps have run without event.

As of this morning I have a diminishing queue of both SETI and MilkyWay WUs of the stable BOINC Manager software-preferences flavor. What I'm looking to do next is add a batch of online, web-based-preferences WUs into the mix to see if that path causes any disruption. Then I plan to switch the SETI WUs back to BOINC Manager-set WUs and see what happens. Something along this chain of switching back and forth has been causing the (Waiting for GPU Memory) status to appear...I'm looking to see if it is repeatable at will.
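For reference, from poking around the BOINC data directory it looks like the two sets of preferences live in separate files. The file names and tags below are just what I think I'm seeing on my machine, so correct me if I've got them wrong...

<!-- global_prefs_override.xml : what (I believe) the Manager writes when
     local preferences are in use -->
<global_preferences>
    <max_ncpus_pct>50.000000</max_ncpus_pct>
    <cpu_usage_limit>100.000000</cpu_usage_limit>
</global_preferences>

<!-- global_prefs.xml : the copy of the web-based preferences fetched from
     the project -->

Whatever the WUs are tripping over, I suspect it's somewhere in the hand-off between those two.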

That's all I've got, so far.

:)
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15478
Netherlands
Message 40586 - Posted: 7 Oct 2011, 14:46:20 UTC - in response to Message 40585.  

I have been informed that a fix for this problem has been around since the middle of September, but that it wasn't back-ported into the 6.12 range. It's been put on the to-do list and will be included in a future 6.12 release; release date as yet unknown.
Jimmy G (BA)

Joined: 26 Sep 11
Posts: 41
Message 40588 - Posted: 7 Oct 2011, 15:33:35 UTC - in response to Message 40586.  
Last modified: 7 Oct 2011, 15:36:21 UTC

I have been informed that a fix for this problem has been around since the middle of September, but that it wasn't back-ported into the 6.12 range. It's been put on the to-do list and will be included in a future 6.12 release; release date as yet unknown.


Thanks for the update, Jord...any way of them knowing/telling whether this was being caused by some sort of WU pref/config file setting, as I suspect?

:)
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15478
Netherlands
Message 40589 - Posted: 7 Oct 2011, 15:41:53 UTC - in response to Message 40588.  

No, it's caused by the GUI_RPC.
Changeset [trac]changeset:24176[/trac] writes:
client, GUI RPC, Manager:
in GUI RPC, change RESULT.gpu_mem_wait to scheduler_wait. It means that the app did a boinc_temporary_exit(), and is waiting to be rescheduled. GPU mem wait is one source of this, not the only one.


As was seen in the log on the SETI forums, the task suspends and unsuspends very rapidly. This rapid cycle causes BOINC to temporarily exit the science application. Since the code up to that point assumed this could only happen when the GPU was out of memory, the "Waiting for GPU memory" error was shown.

The fix adds other causes to the case.
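In code terms, here is a rough sketch of what the rename means. This is my paraphrase, not the actual client/Manager source:

// Before the fix, the only reason the client knew for an app doing a
// temporary exit was "out of GPU memory", so that is what the Manager
// displayed. After changeset 24176 the RPC field just says the task is
// waiting to be rescheduled, whatever the underlying cause.
#include <cstdio>

struct Result {
    bool scheduler_wait;    // was: gpu_mem_wait -- app called boinc_temporary_exit()
};

int main() {
    Result r;
    r.scheduler_wait = true;
    if (r.scheduler_wait) {
        // A rapid suspend/resume cycle can trigger this just as easily as
        // a GPU that is genuinely out of memory.
        std::printf("Task postponed: waiting to be rescheduled\n");
    }
    return 0;
}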
Jimmy G (BA)

Joined: 26 Sep 11
Posts: 41
Message 40597 - Posted: 7 Oct 2011, 21:48:02 UTC - in response to Message 40589.  

No, it's caused by the GUI_RPC.
Changeset [trac]changeset:24176[/trac] writes:
client, GUI RPC, Manager:
in GUI RPC, change RESULT.gpu_mem_wait to scheduler_wait. It means that the app did a boinc_temporary_exit(), and is waiting to be rescheduled. GPU mem wait is one source of this, not the only one.


As was seen in the log on the SETI forums, the task suspends and unsuspends very rapidly. This rapid cycle causes BOINC to temporarily exit the science application. Since the code up to that point assumed this could only happen when the GPU was out of memory, the "Waiting for GPU memory" error was shown.

The fix adds other causes to the case.


His log looks nearly identical to the log section I encountered with MilkyWay (and then SETI) when BOINC Manager decided to go south on me, giving me a cascade of (Waiting for GPU Memory) statuses. Here's the beginning of the train wreck...

30-Sep-2011 22:39:31 [Milkyway@home] [task] result ps_nbody_test3_3876806_0 checkpointed
30-Sep-2011 22:40:31 [Milkyway@home] [task] result ps_nbody_test3_3876806_0 checkpointed
30-Sep-2011 22:41:31 [Milkyway@home] [task] result ps_nbody_test3_3876806_0 checkpointed
30-Sep-2011 22:41:52 [Milkyway@home] [task] Process for ps_nbody_test3_3876806_0 exited
30-Sep-2011 22:41:52 [Milkyway@home] [task] task_state=EXITED for ps_nbody_test3_3876806_0 from handle_exited_app
30-Sep-2011 22:41:52 [Milkyway@home] [task] process exited with status 0
30-Sep-2011 22:41:52 [Milkyway@home] Computation for task ps_nbody_test3_3876806_0 finished
30-Sep-2011 22:41:52 [Milkyway@home] [task] result state=FILES_UPLOADING for ps_nbody_test3_3876806_0 from CS::app_finished
30-Sep-2011 22:41:52 [Milkyway@home] [task] result state=FILES_UPLOADED for ps_nbody_test3_3876806_0 from CS::update_results
30-Sep-2011 22:41:52 [Milkyway@home] [task] ACTIVE_TASK::start(): forked process: pid 89358
30-Sep-2011 22:41:52 [Milkyway@home] [task] task_state=EXECUTING for ps_nbody_test3_3876933_0 from start
30-Sep-2011 22:41:52 [Milkyway@home] Starting task ps_nbody_test3_3876933_0 using milkyway_nbody version 60
30-Sep-2011 22:42:53 [Milkyway@home] [task] result ps_nbody_test3_3876933_0 checkpointed
30-Sep-2011 22:43:54 [Milkyway@home] [task] result ps_nbody_test3_3876933_0 checkpointed
30-Sep-2011 22:44:55 [Milkyway@home] [task] result ps_nbody_test3_3876933_0 checkpointed
30-Sep-2011 22:45:54 [Milkyway@home] [task] result ps_nbody_test3_3876933_0 checkpointed
30-Sep-2011 22:46:54 [Milkyway@home] [task] result ps_nbody_test3_3876933_0 checkpointed
30-Sep-2011 22:47:54 [Milkyway@home] [task] result ps_nbody_test3_3876933_0 checkpointed
30-Sep-2011 22:48:04 [Milkyway@home] [task] Process for ps_nbody_test3_3876933_0 exited
30-Sep-2011 22:48:04 [Milkyway@home] [task] task_state=EXITED for ps_nbody_test3_3876933_0 from handle_exited_app
30-Sep-2011 22:48:04 [Milkyway@home] [task] process exited with status 0
30-Sep-2011 22:48:04 [Milkyway@home] Computation for task ps_nbody_test3_3876933_0 finished
30-Sep-2011 22:48:04 [Milkyway@home] [task] result state=FILES_UPLOADING for ps_nbody_test3_3876933_0 from CS::app_finished
30-Sep-2011 22:48:04 [Milkyway@home] [task] result state=FILES_UPLOADED for ps_nbody_test3_3876933_0 from CS::update_results
30-Sep-2011 22:48:04 [Milkyway@home] [task] ACTIVE_TASK::start(): forked process: pid 89402
30-Sep-2011 22:48:04 [Milkyway@home] [task] task_state=EXECUTING for ps_nbody_test3_3857021_1 from start
30-Sep-2011 22:48:04 [Milkyway@home] Starting task ps_nbody_test3_3857021_1 using milkyway_nbody version 60
30-Sep-2011 22:49:05 [Milkyway@home] [task] result ps_nbody_test3_3857021_1 checkpointed
30-Sep-2011 22:50:05 [Milkyway@home] [task] result ps_nbody_test3_3857021_1 checkpointed
30-Sep-2011 22:51:05 [Milkyway@home] [task] result ps_nbody_test3_3857021_1 checkpointed
30-Sep-2011 22:52:05 [Milkyway@home] [task] result ps_nbody_test3_3857021_1 checkpointed
30-Sep-2011 22:52:07 [Milkyway@home] [task] Process for ps_nbody_test3_3857021_1 exited
30-Sep-2011 22:52:07 [Milkyway@home] [task] task_state=EXITED for ps_nbody_test3_3857021_1 from handle_exited_app
30-Sep-2011 22:52:07 [Milkyway@home] [task] process exited with status 0
30-Sep-2011 22:52:07 [Milkyway@home] Computation for task ps_nbody_test3_3857021_1 finished
30-Sep-2011 22:52:07 [Milkyway@home] [task] result state=FILES_UPLOADING for ps_nbody_test3_3857021_1 from CS::app_finished
30-Sep-2011 22:52:07 [Milkyway@home] [task] result state=FILES_UPLOADED for ps_nbody_test3_3857021_1 from CS::update_results
30-Sep-2011 22:52:07 [SETI@home] [task] ACTIVE_TASK::start(): forked process: pid 89431
30-Sep-2011 22:52:07 [SETI@home] [task] task_state=EXECUTING for 21jn11ac.11318.17659.15.10.46_0 from start
30-Sep-2011 22:52:07 [SETI@home] Restarting task 21jn11ac.11318.17659.15.10.46_0 using setiathome_enhanced version 605
30-Sep-2011 22:52:07 [Milkyway@home] [task] ACTIVE_TASK::start(): forked process: pid 89432
30-Sep-2011 22:52:07 [Milkyway@home] [task] task_state=EXECUTING for ps_separation_82_2s_mix0_1_3208472_1 from start
30-Sep-2011 22:52:07 [Milkyway@home] Restarting task ps_separation_82_2s_mix0_1_3208472_1 using milkyway version 82
30-Sep-2011 22:52:07 [Milkyway@home] [task] ACTIVE_TASK::start(): forked process: pid 89433
30-Sep-2011 22:52:07 [Milkyway@home] [task] task_state=EXECUTING for ps_separation_17_3s_fix_2_4253692_1 from start
30-Sep-2011 22:52:07 [Milkyway@home] Restarting task ps_separation_17_3s_fix_2_4253692_1 using milkyway version 82
30-Sep-2011 22:52:07 [Milkyway@home] [task] ACTIVE_TASK::start(): forked process: pid 89434
30-Sep-2011 22:52:07 [Milkyway@home] [task] task_state=EXECUTING for ps_separation_17_3s_fix_2_4252652_1 from start
30-Sep-2011 22:52:07 [Milkyway@home] Restarting task ps_separation_17_3s_fix_2_4252652_1 using milkyway version 82
30-Sep-2011 22:52:07 [Milkyway@home] [task] ACTIVE_TASK::start(): forked process: pid 89435
30-Sep-2011 22:52:07 [Milkyway@home] [task] task_state=EXECUTING for ps_separation_10_3s_fix20_2_4272531_1 from start
30-Sep-2011 22:52:07 [Milkyway@home] Restarting task ps_separation_10_3s_fix20_2_4272531_1 using milkyway version 82
30-Sep-2011 22:52:07 [Milkyway@home] [task] ACTIVE_TASK::start(): forked process: pid 89436
30-Sep-2011 22:52:07 [Milkyway@home] [task] task_state=EXECUTING for ps_separation_82_2s_mix0_1_3223144_0 from start
30-Sep-2011 22:52:07 [Milkyway@home] Restarting task ps_separation_82_2s_mix0_1_3223144_0 using milkyway version 82
30-Sep-2011 22:52:07 [Milkyway@home] [task] ACTIVE_TASK::start(): forked process: pid 89437
30-Sep-2011 22:52:07 [Milkyway@home] [task] task_state=EXECUTING for ps_separation_82_2s_mix4_1_3223104_0 from start
30-Sep-2011 22:52:07 [Milkyway@home] Restarting task ps_separation_82_2s_mix4_1_3223104_0 using milkyway version 82
30-Sep-2011 22:52:07 [Milkyway@home] [task] ACTIVE_TASK::start(): forked process: pid 89442
30-Sep-2011 22:52:07 [Milkyway@home] [task] task_state=EXECUTING for ps_separation_13_3s_fix20_2_4279916_0 from start
30-Sep-2011 22:52:07 [Milkyway@home] Starting task ps_separation_13_3s_fix20_2_4279916_0 using milkyway version 82
30-Sep-2011 22:52:43 [Milkyway@home] [task] Process for ps_separation_82_2s_mix4_1_3223104_0 exited
30-Sep-2011 22:52:43 [Milkyway@home] Task ps_separation_82_2s_mix4_1_3223104_0 exited with zero status but no 'finished' file
30-Sep-2011 22:52:43 [Milkyway@home] If this happens repeatedly you may need to reset the project.
30-Sep-2011 22:52:43 [Milkyway@home] [task] task_state=UNINITIALIZED for ps_separation_82_2s_mix4_1_3223104_0 from handle_premature_exit
30-Sep-2011 22:52:43 [Milkyway@home] [task] task called temporary_exit(600.000000)
30-Sep-2011 22:52:43 [Milkyway@home] [task] task_state=UNINITIALIZED for ps_separation_82_2s_mix4_1_3223104_0 from temporary exit
30-Sep-2011 22:52:43 [SETI@home] [task] task_state=SUSPENDED for 21jn11ac.11318.17659.15.10.46_0 from suspend
30-Sep-2011 22:52:43 [Milkyway@home] [task] task_state=SUSPENDED for ps_separation_82_2s_mix0_1_3208472_1 from suspend
30-Sep-2011 22:52:43 [Milkyway@home] [task] task_state=SUSPENDED for ps_separation_17_3s_fix_2_4253692_1 from suspend
30-Sep-2011 22:52:43 [Milkyway@home] [task] task_state=SUSPENDED for ps_separation_17_3s_fix_2_4252652_1 from suspend
30-Sep-2011 22:52:43 [Milkyway@home] [task] task_state=SUSPENDED for ps_separation_10_3s_fix20_2_4272531_1 from suspend
30-Sep-2011 22:52:43 [Milkyway@home] [task] task_state=SUSPENDED for ps_separation_82_2s_mix0_1_3223144_0 from suspend
30-Sep-2011 22:52:43 [Milkyway@home] [task] task_state=SUSPENDED for ps_separation_13_3s_fix20_2_4279916_0 from suspend
30-Sep-2011 22:52:43 [Milkyway@home] [task] ACTIVE_TASK::start(): forked process: pid 89453
30-Sep-2011 22:52:43 [Milkyway@home] [task] task_state=EXECUTING for ps_nbody_test3_3877257_0 from start
30-Sep-2011 22:52:43 [Milkyway@home] Starting task ps_nbody_test3_3877257_0 using milkyway_nbody version 60
30-Sep-2011 22:52:43 [Milkyway@home] Sending scheduler request: To fetch work.
30-Sep-2011 22:52:43 [Milkyway@home] Reporting 15 completed tasks, requesting new tasks for CPU
30-Sep-2011 22:52:44 [Milkyway@home] [task] Process for ps_separation_82_2s_mix0_1_3223144_0 exited
30-Sep-2011 22:52:44 [Milkyway@home] Task ps_separation_82_2s_mix0_1_3223144_0 exited with zero status but no 'finished' file
30-Sep-2011 22:52:44 [Milkyway@home] If this happens repeatedly you may need to reset the project.

:)
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15478
Netherlands
Message 40628 - Posted: 11 Oct 2011, 9:50:14 UTC

A fix is available in BOINC 6.12.41. See change log.
You can get 6.12.41 from here.

Jimmy G (BA)

Joined: 26 Sep 11
Posts: 41
Message 40631 - Posted: 11 Oct 2011, 12:00:10 UTC - in response to Message 40628.  

A fix is available in BOINC 6.12.41. See change log.
You can get 6.12.41 from here.


Hi Jord,

Thanks for the update on this... :)

From your link...

Disclaimer
...
- Expect work failures, deadline misses and losing all your accumulated work in progress, ...



I'm currently sitting on several CPDN eggs that I'd like to see hatch (one already cracked before it was ready...>schniff<), but I'll keep an eye on the updates for BOINC Manager over there at the change log and consider "upgrading" when the time comes.

Kudos and thanks to all who've been making these fixes happen!

:)
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15478
Netherlands
Message 40632 - Posted: 11 Oct 2011, 12:44:50 UTC - in response to Message 40631.  

That disclaimer was necessary as a lot of people would use Alpha versions as if they were recommended versions, and complain loudly if their new client crashed and took all their work with it.

But these 6.12s are at end-of-test-life. Any day now they'll become recommended. As long as the developers do not receive many complaints or reports of other weird bugs about them, that is. ;-)