Thread '6.12.5 (Alpha) - Breaks GPU Pause settings?'

Jacob Klein, Volunteer tester, Help desk expert. Joined: 9 Nov 10. Posts: 63
Message 35655 - Posted: 9 Nov 2010, 5:27:06 UTC

I'm not sure this is the correct place to post this, but... I think I found a bug in 6.12.5 (Alpha).

I have 2 GPUs, and I'm using 6.12.5 (Alpha) with the settings: "Use GPU while computer is in use" unchecked, with an idle timeout of 1 minute.

For previous releases like 6.12.4, after the GPUs started crunching, if I started using the computer, BOINC would pause the GPU crunching until the 1 minute timeout was reached. This new release does not seem to be pausing GPU crunching at all when I am using the computer, despite my settings.

Regards,
Jacob Klein

Jord, Volunteer tester, Help desk expert. Joined: 29 Aug 05. Posts: 15649
Message 35658 - Posted: 9 Nov 2010, 6:28:06 UTC - in response to Message 35655.

If you want to use an alpha and report on possible bugs, do use debug flags in cc_config.xml. In this case, make sure you post a start-up log with the <coproc_debug>, <sched_op_debug> and <cpu_sched_debug> flags enabled.

I don't mind warning the developers about this thread, but they'll be expecting more than just your word for it. Hence the request for the debug log. Let BOINC speak for itself.
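For readers who have not set these flags before, a minimal sketch of a cc_config.xml with the three flags named above switched on might look like this. It assumes the usual layout in which log flags sit inside a <log_flags> element and the file lives in the BOINC data directory; check the client configuration documentation for your version before relying on it.

```xml
<!-- Sketch only: enables the three debug flags requested in this thread. -->
<cc_config>
  <log_flags>
    <coproc_debug>1</coproc_debug>
    <sched_op_debug>1</sched_op_debug>
    <cpu_sched_debug>1</cpu_sched_debug>
  </log_flags>
</cc_config>
```

The client normally picks the file up at start-up, which fits the request for a start-up log.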

Pepo. Joined: 3 Apr 06. Posts: 547
Message 35671 - Posted: 9 Nov 2010, 18:53:26 UTC - in response to Message 35658.

Not just the idle-time GPU usage. My Seti Beta Enh 6.08 cuda task is ignoring manual GPU snooze and continues to run. In one case it eventually got preempted after a checkpoint (18:29:48), maybe 10 minutes after previously snoozing GPU use. But it does not happen after many other checkpoints, although snoozing all applications works fine:

    18:14:09 | | [cpu_sched] suspending GPU activity
    18:15:41 | | [cpu_sched] resuming GPU activity
    18:15:46 | | [cpu_sched] suspending GPU activity
    18:16:15 | | Suspending computation - user request
    18:16:15 | SETI@home Beta Test | [cpu_sched] Preempting 18no09aj.27102.890.6.13.116.vlar_2 (removed from memory)
    18:16:15 | SETI@home Beta Test | [task] task_state=QUIT_PENDING for 18no09aj.27102.890.6.13.116.vlar_2 from preempt
    18:16:16 | SETI@home Beta Test | [task] Process for 18no09aj.27102.890.6.13.116.vlar_2 exited
    18:16:16 | SETI@home Beta Test | [task] task_state=UNINITIALIZED for 18no09aj.27102.890.6.13.116.vlar_2 from handle_premature_exit
    18:16:20 | | Resuming computation
    18:16:20 | SETI@home Beta Test | [cpu_sched] Starting 18no09aj.27102.890.6.13.116.vlar_2(resume)
    18:16:20 | SETI@home Beta Test | [task] task_state=EXECUTING for 18no09aj.27102.890.6.13.116.vlar_2 from start
    18:16:39 | | [cpu_sched] resuming GPU activity
    18:19:22 | | [cpu_sched] suspending GPU activity
    18:19:28 | | Suspending computation - user request
    18:19:28 | SETI@home Beta Test | [cpu_sched] Preempting 18no09aj.27102.890.6.13.116.vlar_2 (removed from memory)
    18:19:28 | SETI@home Beta Test | [task] task_state=QUIT_PENDING for 18no09aj.27102.890.6.13.116.vlar_2 from preempt
    18:19:29 | SETI@home Beta Test | [task] Process for 18no09aj.27102.890.6.13.116.vlar_2 exited
    18:19:29 | SETI@home Beta Test | [task] task_state=UNINITIALIZED for 18no09aj.27102.890.6.13.116.vlar_2 from handle_premature_exit
    18:19:33 | | Resuming computation
    18:19:33 | SETI@home Beta Test | [cpu_sched] Starting 18no09aj.27102.890.6.13.116.vlar_2(resume)
    18:19:33 | SETI@home Beta Test | [task] task_state=EXECUTING for 18no09aj.27102.890.6.13.116.vlar_2 from start
    18:24:02 | | Suspending computation - user request
    18:24:02 | SETI@home Beta Test | [cpu_sched] Preempting 18no09aj.27102.890.6.13.116.vlar_2 (removed from memory)
    18:24:02 | SETI@home Beta Test | [task] task_state=QUIT_PENDING for 18no09aj.27102.890.6.13.116.vlar_2 from preempt
    18:24:03 | SETI@home Beta Test | [task] Process for 18no09aj.27102.890.6.13.116.vlar_2 exited
    18:24:03 | SETI@home Beta Test | [task] task_state=UNINITIALIZED for 18no09aj.27102.890.6.13.116.vlar_2 from handle_premature_exit
    18:24:11 | | Resuming computation
    18:24:11 | SETI@home Beta Test | [cpu_sched] Starting 18no09aj.27102.890.6.13.116.vlar_2(resume)
    18:24:11 | SETI@home Beta Test | [task] task_state=EXECUTING for 18no09aj.27102.890.6.13.116.vlar_2 from start
    18:29:48 | SETI@home Beta Test | [task] result 18no09aj.27102.890.6.13.116.vlar_2 checkpointed
    18:29:48 | SETI@home Beta Test | [cpu_sched] Preempting 18no09aj.27102.890.6.13.116.vlar_2 (removed from memory)
    18:29:48 | SETI@home Beta Test | [task] task_state=QUIT_PENDING for 18no09aj.27102.890.6.13.116.vlar_2 from preempt
    18:29:49 | SETI@home Beta Test | [task] Process for 18no09aj.27102.890.6.13.116.vlar_2 exited
    18:29:49 | SETI@home Beta Test | [task] task_state=UNINITIALIZED for 18no09aj.27102.890.6.13.116.vlar_2 from handle_premature_exit
    18:53:53 | | [cpu_sched] resuming GPU activity
    18:53:53 | SETI@home Beta Test | [cpu_sched] Starting 18no09aj.27102.890.6.13.116.vlar_2(resume)
    18:53:53 | SETI@home Beta Test | [task] task_state=EXECUTING for 18no09aj.27102.890.6.13.116.vlar_2 from start
    18:53:53 | SETI@home Beta Test | Restarting task 18no09aj.27102.890.6.13.116.vlar_2 using setiathome_enhanced version 608
    18:56:03 | | [cpu_sched] suspending GPU activity
    18:59:19 | SETI@home Beta Test | [task] result 18no09aj.27102.890.6.13.116.vlar_2 checkpointed
    19:04:53 | SETI@home Beta Test | [task] result 18no09aj.27102.890.6.13.116.vlar_2 checkpointed

It is either the client not telling the app to exit, or the app ignoring the request, for many checkpoints in a row:

    19:29:52 | | [cpu_sched] Request CPU reschedule: GPU mode changed
    19:29:53 | | [cpu_sched] suspending GPU activity
    19:29:53 | | [cpu_sched] Request CPU reschedule: GPU suspension
    19:29:53 | | [cpu_sched] schedule_cpus(): start
    19:29:53 | SETI@home Beta Test | [coproc] CUDA instance 0: confirming for 18no09aj.27102.890.6.13.116.vlar_2
    19:29:53 | | [cpu_sched] enforce_schedule: end

I have no other cuda tasks onboard ATM, so I can't test other apps.

Peter

Pepo. Joined: 3 Apr 06. Posts: 547
Message 35673 - Posted: 9 Nov 2010, 22:00:50 UTC - in response to Message 35671.

Reported to the developers and confirmed by them. It should be "fixed later today" and appear in the next release.

Peter

Richard Haselgrove, Volunteer tester, Help desk expert. Joined: 5 Oct 06. Posts: 5149
Message 35675 - Posted: 10 Nov 2010, 0:25:22 UTC

There's also a report at GPUGrid of a breakage:

"It's a disaster here. All 5 machines that I installed 6.12.5 on quit running a full complement of CPU programs and refused to run AQUA at all. Reverted to 6.12.4 and they all started behaving properly again except of course for the GPU cache shrinkage bug."

User later reports running FreeHAL [nci], which I can't reproduce at the moment, but might be a clue. Too late for me to work on tonight, but I'll have another look in the morning.

Reference: http://www.gpugrid.net/forum_thread.php?id=2231#19393

Jord, Volunteer tester, Help desk expert. Joined: 29 Aug 05. Posts: 15649
Message 35676 - Posted: 10 Nov 2010, 1:36:31 UTC. Last modified: 10 Nov 2010, 1:36:49 UTC

Changeset 22668:

client: change scheduling policy to allow multithread jobs to coexist with GPU jobs that use significant CPU time.

-- Old: run a MT job only if total CPU usage will be < #CPUs + 1. So if you have some GPU jobs running and their CPU usage sums to < 1, BOINC will run a MT job too. But if CPU usage > 1 BOINC won't run the MT job, and some CPUs will be idle.

Note: to maximize throughput, it might be better to run either GPU jobs or MT jobs, but not both at the same time. However, volunteers don't like it when CPUs are idle. So...

-- New: ignore the CPU usage of GPU jobs in deciding whether to run MT jobs. So we'll run a 4-core MT job (at low priority) even if GPU jobs (which run at normal priority) use > 1 CPU. (Yes, the MT job might run very slow)
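The difference between the old and new rules in the changeset note can be illustrated with a small Python sketch. This is not BOINC's code: the `Job` class and both function names are hypothetical, and keeping the same "< #CPUs + 1" threshold in the new rule is an assumption, since the note only says that the CPU usage of GPU jobs is ignored.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical stand-in for a running task (not a BOINC type)."""
    name: str
    cpu_usage: float        # average number of CPUs the job keeps busy
    uses_gpu: bool = False


def can_run_mt_old(mt_job, running, ncpus):
    # Old rule: count every running job, GPU jobs included.
    total = mt_job.cpu_usage + sum(j.cpu_usage for j in running)
    return total < ncpus + 1


def can_run_mt_new(mt_job, running, ncpus):
    # New rule: skip the CPU usage of GPU jobs.
    # Assumption: the "< ncpus + 1" threshold itself is unchanged.
    total = mt_job.cpu_usage + sum(
        j.cpu_usage for j in running if not j.uses_gpu
    )
    return total < ncpus + 1


if __name__ == "__main__":
    # Two GPU jobs whose CPU usage sums to more than 1, on a 4-CPU host.
    gpu_jobs = [Job("gpu_a", 0.6, uses_gpu=True), Job("gpu_b", 0.6, uses_gpu=True)]
    mt_job = Job("mt_4core", 4.0)
    print("old rule:", can_run_mt_old(mt_job, gpu_jobs, ncpus=4))
    print("new rule:", can_run_mt_new(mt_job, gpu_jobs, ncpus=4))
```

With two GPU jobs using 0.6 CPU each on a 4-CPU host, the old rule refuses the 4-core MT job and leaves CPUs idle, while the new rule lets it start, which matches the behaviour the note describes.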

jjwhalen. Joined: 10 Jan 06. Posts: 17
Message 35695 - Posted: 11 Nov 2010, 20:10:00 UTC - in response to Message 35676.

"Changeset 22668: client: change scheduling policy to allow multithread jobs to coexist with GPU jobs that use significant CPU time. -- Old: run a MT job only if total CPU usage will be < #CPUs + 1. So if you have some GPU jobs running and their CPU usage sums to < 1, BOINC will run a MT job too. But if CPU usage > 1 BOINC won't run the MT job, and some CPUs will be idle. Note: to maximize throughput, it might be better to run either GPU jobs or MT jobs, but not both at the same time. However, volunteers don't like it when CPUs are idle. So... -- New: ignore the CPU usage of GPU jobs in deciding whether to run MT jobs. So we'll run a 4-core MT job (at low priority) even if GPU jobs (which run at normal priority) use > 1 CPU. (Yes, the MT job might run very slow)"

Sounds like a good change. None of my hosts would run (MT) AQUA@home under BOINC 6.12.5α/Windows, including the two that have no GPU installed.

Jacob Klein, Volunteer tester, Help desk expert. Joined: 9 Nov 10. Posts: 63
Message 35699 - Posted: 12 Nov 2010, 13:55:44 UTC

I'm now using 6.12.6 (Alpha), and my GPUs now pause correctly. Thanks for helping to get this fixed.

Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.