6.12.5 (Alpha) - Breaks GPU Pause settings?

Message boards : BOINC client : 6.12.5 (Alpha) - Breaks GPU Pause settings?
Jacob Klein
Volunteer tester
Help desk expert

Joined: 9 Nov 10
Posts: 63
United States
Message 35655 - Posted: 9 Nov 2010, 5:27:06 UTC

I'm not sure this is the correct place to post this, but... I think I found a bug in 6.12.5 (Alpha).

I have 2 GPUs, and I'm using 6.12.5 (Alpha) with the settings: "Use GPU while computer is in use" unchecked, with an idle timeout of 1 minute.

For previous releases like 6.12.4, after the GPUs started crunching, if I started using the computer, BOINC would pause the GPU crunching until the 1 minute timeout was reached.

This new release does not seem to be pausing GPU crunching at all when I am using the computer, despite my settings.

Regards,
Jacob Klein
ID: 35655
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15482
Netherlands
Message 35658 - Posted: 9 Nov 2010, 6:28:06 UTC - in response to Message 35655.  

If you want to run an alpha and report possible bugs, do use the debug flags in cc_config.xml.

In this case make sure you post a start-up log with the <coproc_debug>, <sched_op_debug> and <cpu_sched_debug> flags enabled.

I don't mind warning the developers about this thread, but they'll be expecting more than just your word for it. Hence the request for the debug log. Let BOINC speak for itself.
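For anyone unsure what that file should look like, here is a minimal sketch in Python that builds a cc_config.xml body with the three flags named above switched on. The flag names come from this post; the helper name `build_cc_config` is made up for illustration, and where you save the result (normally the BOINC data directory) depends on your install, so treat this as a starting point rather than the official procedure.

```python
import xml.etree.ElementTree as ET

# Flags requested in this thread; add or remove names as needed.
DEBUG_FLAGS = ("coproc_debug", "sched_op_debug", "cpu_sched_debug")


def build_cc_config(flags=DEBUG_FLAGS):
    """Return a cc_config.xml document (as a string) with each flag set to 1.

    Hypothetical helper: it only builds the text, it does not write the file.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each log flag is an element whose text is 1 (on) or 0 (off).
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config())
```

Save the printed text as cc_config.xml, then restart the client or have it re-read the config file so the extra messages show up in the start-up log.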
ID: 35658
Pepo
Joined: 3 Apr 06
Posts: 547
Slovakia
Message 35671 - Posted: 9 Nov 2010, 18:53:26 UTC - in response to Message 35658.  

It's not just the idle-time GPU usage. My Seti Beta Enh 6.08 cuda task is ignoring a manual GPU snooze and continues to run. In one case it eventually got preempted after a checkpoint (18:29:48), about 10 minutes after I had snoozed GPU use. But that does not happen after many other checkpoints, although snoozing all applications works fine:
18:14:09 |  | [cpu_sched] suspending GPU activity
18:15:41 |  | [cpu_sched] resuming GPU activity
18:15:46 |  | [cpu_sched] suspending GPU activity
18:16:15 |  | Suspending computation - user request
18:16:15 | SETI@home Beta Test | [cpu_sched] Preempting 18no09aj.27102.890.6.13.116.vlar_2 (removed from memory)
18:16:15 | SETI@home Beta Test | [task] task_state=QUIT_PENDING for 18no09aj.27102.890.6.13.116.vlar_2 from preempt
18:16:16 | SETI@home Beta Test | [task] Process for 18no09aj.27102.890.6.13.116.vlar_2 exited
18:16:16 | SETI@home Beta Test | [task] task_state=UNINITIALIZED for 18no09aj.27102.890.6.13.116.vlar_2 from handle_premature_exit
18:16:20 |  | Resuming computation
18:16:20 | SETI@home Beta Test | [cpu_sched] Starting 18no09aj.27102.890.6.13.116.vlar_2(resume)
18:16:20 | SETI@home Beta Test | [task] task_state=EXECUTING for 18no09aj.27102.890.6.13.116.vlar_2 from start
18:16:39 |  | [cpu_sched] resuming GPU activity
18:19:22 |  | [cpu_sched] suspending GPU activity
18:19:28 |  | Suspending computation - user request
18:19:28 | SETI@home Beta Test | [cpu_sched] Preempting 18no09aj.27102.890.6.13.116.vlar_2 (removed from memory)
18:19:28 | SETI@home Beta Test | [task] task_state=QUIT_PENDING for 18no09aj.27102.890.6.13.116.vlar_2 from preempt
18:19:29 | SETI@home Beta Test | [task] Process for 18no09aj.27102.890.6.13.116.vlar_2 exited
18:19:29 | SETI@home Beta Test | [task] task_state=UNINITIALIZED for 18no09aj.27102.890.6.13.116.vlar_2 from handle_premature_exit
18:19:33 |  | Resuming computation
18:19:33 | SETI@home Beta Test | [cpu_sched] Starting 18no09aj.27102.890.6.13.116.vlar_2(resume)
18:19:33 | SETI@home Beta Test | [task] task_state=EXECUTING for 18no09aj.27102.890.6.13.116.vlar_2 from start
18:24:02 |  | Suspending computation - user request
18:24:02 | SETI@home Beta Test | [cpu_sched] Preempting 18no09aj.27102.890.6.13.116.vlar_2 (removed from memory)
18:24:02 | SETI@home Beta Test | [task] task_state=QUIT_PENDING for 18no09aj.27102.890.6.13.116.vlar_2 from preempt
18:24:03 | SETI@home Beta Test | [task] Process for 18no09aj.27102.890.6.13.116.vlar_2 exited
18:24:03 | SETI@home Beta Test | [task] task_state=UNINITIALIZED for 18no09aj.27102.890.6.13.116.vlar_2 from handle_premature_exit
18:24:11 |  | Resuming computation
18:24:11 | SETI@home Beta Test | [cpu_sched] Starting 18no09aj.27102.890.6.13.116.vlar_2(resume)
18:24:11 | SETI@home Beta Test | [task] task_state=EXECUTING for 18no09aj.27102.890.6.13.116.vlar_2 from start
18:29:48 | SETI@home Beta Test | [task] result 18no09aj.27102.890.6.13.116.vlar_2 checkpointed
18:29:48 | SETI@home Beta Test | [cpu_sched] Preempting 18no09aj.27102.890.6.13.116.vlar_2 (removed from memory)
18:29:48 | SETI@home Beta Test | [task] task_state=QUIT_PENDING for 18no09aj.27102.890.6.13.116.vlar_2 from preempt
18:29:49 | SETI@home Beta Test | [task] Process for 18no09aj.27102.890.6.13.116.vlar_2 exited
18:29:49 | SETI@home Beta Test | [task] task_state=UNINITIALIZED for 18no09aj.27102.890.6.13.116.vlar_2 from handle_premature_exit
18:53:53 |  | [cpu_sched] resuming GPU activity
18:53:53 | SETI@home Beta Test | [cpu_sched] Starting 18no09aj.27102.890.6.13.116.vlar_2(resume)
18:53:53 | SETI@home Beta Test | [task] task_state=EXECUTING for 18no09aj.27102.890.6.13.116.vlar_2 from start
18:53:53 | SETI@home Beta Test | Restarting task 18no09aj.27102.890.6.13.116.vlar_2 using setiathome_enhanced version 608
18:56:03 |  | [cpu_sched] suspending GPU activity
18:59:19 | SETI@home Beta Test | [task] result 18no09aj.27102.890.6.13.116.vlar_2 checkpointed
19:04:53 | SETI@home Beta Test | [task] result 18no09aj.27102.890.6.13.116.vlar_2 checkpointed


It is either the client not telling the app to exit, or the app ignoring the request, for many checkpoints in a row:
19:29:52 |  | [cpu_sched] Request CPU reschedule: GPU mode changed
19:29:53 |  | [cpu_sched] suspending GPU activity
19:29:53 |  | [cpu_sched] Request CPU reschedule: GPU suspension
19:29:53 |  | [cpu_sched] schedule_cpus(): start
19:29:53 | SETI@home Beta Test | [coproc] CUDA instance 0: confirming for 18no09aj.27102.890.6.13.116.vlar_2
19:29:53 |  | [cpu_sched] enforce_schedule: end
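
To make the pattern in the first log excerpt easier to spot, here is a rough Python sketch that scans a log in this `time | project | message` layout and lists every time a task was started while GPU activity was marked as suspended. It assumes every started task is a GPU task (true for the log here, which has a single cuda task) and the function name is invented for illustration; it is a reading aid, not a diagnostic tool from BOINC itself.

```python
import re

# One log line: "HH:MM:SS | project (may be empty) | message"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s*\|\s*(.*?)\s*\|\s*(.*)$")


def starts_while_gpu_suspended(log_text):
    """Return timestamps of '[cpu_sched] Starting' lines seen while GPU use is suspended.

    Assumption: all started tasks are GPU tasks, as in the log above.
    """
    gpu_suspended = False
    flagged = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        timestamp, _project, message = match.groups()
        if "suspending GPU activity" in message:
            gpu_suspended = True
        elif "resuming GPU activity" in message:
            gpu_suspended = False
        elif "[cpu_sched] Starting" in message and gpu_suspended:
            flagged.append(timestamp)
    return flagged
```

Run against the first excerpt, the task starts that follow a "Resuming computation" while the GPU is still snoozed are the ones that get flagged, whereas the start at 18:53:53 is not, because the GPU resume is logged just before it.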


I have no other cuda tasks on board at the moment, so I can't test other apps.

Peter
ID: 35671
Pepo
Joined: 3 Apr 06
Posts: 547
Slovakia
Message 35673 - Posted: 9 Nov 2010, 22:00:50 UTC - in response to Message 35671.  

Reported & got confirmed by developers.
Should be "fixed later today", appearing in the next release.

Peter
ID: 35673
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 35675 - Posted: 10 Nov 2010, 0:25:22 UTC

There's also a report at GPUGrid of a breakage:

"It's a disaster here. All 5 machines that I installed 6.12.5 on quit running a full complement of CPU programs and refused to run AQUA at all. Reverted to 6.12.4 and they all started behaving properly again except of course for the GPU cache shrinkage bug."

User later reports running FreeHAL [nci], which I can't reproduce at the moment, but might be a clue. Too late for me to work on tonight, but I'll have another look in the morning.

Reference:
http://www.gpugrid.net/forum_thread.php?id=2231#19393
ID: 35675
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15482
Netherlands
Message 35676 - Posted: 10 Nov 2010, 1:36:31 UTC
Last modified: 10 Nov 2010, 1:36:49 UTC

Changeset 22668:


  • client: change scheduling policy to allow multithread jobs to coexist with GPU jobs that use significant CPU time.
    -- Old: run a MT job only if total CPU usage will be < #CPUs + 1. So if you have some GPU jobs running and their CPU usage sums to < 1, BOINC will run a MT job too. But if CPU usage > 1 BOINC won't run the MT job, and some CPUs will be idle.

    Note: to maximize throughput, it might be better to run either GPU jobs or MT jobs, but not both at the same time. However, volunteers don't like it when CPUs are idle. So...
    -- New: ignore the CPU usage of GPU jobs in deciding whether to run MT jobs. So we'll run a 4-core MT job (at low priority) even if GPU jobs (which run at normal priority) use > 1 CPU. (Yes, the MT job might run very slow)
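
As a rough illustration of the old and new rules in that changeset message, here is a small Python sketch. It is only my reading of the text above, not the client's actual scheduler code, and the function and parameter names are invented. The message does not say what happens when GPU CPU usage is exactly 1, so the strict `<` below is an assumption.

```python
def can_run_mt_job_old(ncpus, mt_job_cpus, gpu_jobs_cpu_usage):
    """Old rule: total CPU usage, including GPU jobs' CPU share, must stay below #CPUs + 1."""
    total = mt_job_cpus + sum(gpu_jobs_cpu_usage)
    return total < ncpus + 1


def can_run_mt_job_new(ncpus, mt_job_cpus, gpu_jobs_cpu_usage):
    """New rule: the CPU usage of GPU jobs is ignored when deciding on the MT job."""
    # gpu_jobs_cpu_usage is accepted only to keep the two signatures comparable.
    return mt_job_cpus < ncpus + 1


if __name__ == "__main__":
    # 4-core host, 4-core MT job, two GPU jobs using 0.6 CPU each.
    print(can_run_mt_job_old(4, 4, [0.6, 0.6]))
    print(can_run_mt_job_new(4, 4, [0.6, 0.6]))
```

With two GPU jobs at 0.6 CPU each, the old rule holds the 4-core MT job back and leaves cores idle, while the new rule lets it run alongside them.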


ID: 35676
jjwhalen
Joined: 10 Jan 06
Posts: 17
United States
Message 35695 - Posted: 11 Nov 2010, 20:10:00 UTC - in response to Message 35676.  

Changeset 22668:


  • client: change scheduling policy to allow multithread jobs to coexist with GPU jobs that use significant CPU time.
    -- Old: run a MT job only if total CPU usage will be < #CPUs + 1. So if you have some GPU jobs running and their CPU usage sums to < 1, BOINC will run a MT job too. But if CPU usage > 1 BOINC won't run the MT job, and some CPUs will be idle.

    Note: to maximize throughput, it might be better to run either GPU jobs or MT jobs, but not both at the same time. However, volunteers don't like it when CPUs are idle. So...
    -- New: ignore the CPU usage of GPU jobs in deciding whether to run MT jobs. So we'll run a 4-core MT job (at low priority) even if GPU jobs (which run at normal priority) use > 1 CPU. (Yes, the MT job might run very slow)



Sounds like a good change.

None of my hosts would run (MT) AQUA@home under BOINC 6.12.5α/Windows, including the two that have no GPU installed.

ID: 35695
Jacob Klein
Volunteer tester
Help desk expert

Joined: 9 Nov 10
Posts: 63
United States
Message 35699 - Posted: 12 Nov 2010, 13:55:44 UTC

I'm now using 6.12.6 (Alpha), and my GPUs now pause correctly.
Thanks for helping to get this fixed.
ID: 35699


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.