BOINC 6.10.43 - Runs two tasks on a single GPU

Sebastian Bobrecki

Joined: 1 Oct 09
Posts: 11
Poland
Message 32054 - Posted: 9 Apr 2010, 10:34:36 UTC

Hello

I have a problem with the new BOINC 6.10.43 on 32-bit Windows XP. I have six NVIDIA GPUs in this box. The problem is that BOINC sometimes runs two tasks on a single GPU.

Some examples:
There are six GPUs numbered from 0 to 5. Shortly after a reboot everything works great: six tasks, each running on its own GPU. After some time I see that it still runs six tasks, but two of them are on GPU number 1 and GPU number 5 is unoccupied. Sometimes it is even worse: there are two tasks on GPU number 0, two on GPU number 1, and both number 4 and number 5 are free. The second scenario is rare but the first is very frequent.

Link to this host on SETI
ID: 32054 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 32056 - Posted: 9 Apr 2010, 10:59:35 UTC - in response to Message 32054.  

Please enable the following debug flags in cc_config.xml:

<cc_config>
  <log_flags>
    <coproc_debug>1</coproc_debug>
    <cpu_sched_debug>1</cpu_sched_debug>
  </log_flags>
</cc_config>


Run with those flags on until you hit the problem again, then post the log from just before it happened to the point where it happened (approximately). Please do not post the whole log (all 1,000 lines).

You can also find this info in the stdoutdae.txt file in BOINC's Data directory.
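To cut the log down to the part around the incident, something like the following may help. It is only a minimal sketch: it assumes that stdoutdae.txt uses the same "09-Apr-2010 17:30:11" timestamp prefix as the log lines quoted later in this thread, and the function name `lines_near` and the window sizes are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumed timestamp prefix, e.g. "09-Apr-2010 17:30:11" (20 characters).
# Note: %b depends on the locale; this expects English month abbreviations.
STAMP_FORMAT = "%d-%b-%Y %H:%M:%S"
STAMP_LENGTH = 20


def lines_near(log_lines, moment, before_s=60, after_s=30):
    """Return the log lines stamped within a window around `moment`."""
    start = moment - timedelta(seconds=before_s)
    end = moment + timedelta(seconds=after_s)
    kept = []
    for line in log_lines:
        try:
            stamp = datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
        except ValueError:
            # Line does not start with a timestamp in the assumed format.
            continue
        if start <= stamp <= end:
            kept.append(line.rstrip("\n"))
    return kept


if __name__ == "__main__":
    sample = [
        "09-Apr-2010 17:29:00 [---] an earlier line",
        "09-Apr-2010 17:30:11 [---] [cpu_sched_debug] schedule_cpus(): start",
        "09-Apr-2010 17:31:00 [---] a later line",
    ]
    for kept_line in lines_near(sample, datetime(2010, 4, 9, 17, 30, 12), 10, 5):
        print(kept_line)
```

For a real log you would pass the lines of stdoutdae.txt (for example from `open(path)`) instead of the sample list, and adjust the window until the output is short enough to post.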
ID: 32056 · Report as offensive
Sebastian Bobrecki

Joined: 1 Oct 09
Posts: 11
Poland
Message 32061 - Posted: 9 Apr 2010, 14:06:04 UTC

OK. I set these options. Now waiting...
ID: 32061 · Report as offensive
Sebastian Bobrecki

Joined: 1 Oct 09
Posts: 11
Poland
Message 32067 - Posted: 9 Apr 2010, 15:59:30 UTC

Hmm. I got one at about 17:30 CEST. But the file has over 800 lines from just one minute. So maybe it would be better if I give you a link to download this file gzipped?
ID: 32067 · Report as offensive
Sebastian Bobrecki

Joined: 1 Oct 09
Posts: 11
Poland
Message 32068 - Posted: 9 Apr 2010, 16:06:29 UTC

ID: 32068 · Report as offensive
Sebastian Bobrecki

Joined: 1 Oct 09
Posts: 11
Poland
Message 32069 - Posted: 9 Apr 2010, 16:10:41 UTC

Additionally, when I hit "Suspend" on one of these tasks, it starts the next task on the proper GPU, number 5.
ID: 32069 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 32073 - Posted: 9 Apr 2010, 16:51:56 UTC

I've forwarded the thread to the developers. It may be that they need some extra logs with other/extra flags. I'll give those if need be.

And with thanks.
ID: 32073 · Report as offensive
Sebastian Bobrecki

Joined: 1 Oct 09
Posts: 11
Poland
Message 32077 - Posted: 9 Apr 2010, 17:08:55 UTC

I have found that it happens at a specific moment. One of these GPUs is significantly slower than the others, by about 6 times. When it finishes a computation, the estimated time for the queued tasks jumps from about 13 minutes to almost 2 hours. I then see a message that BOINC thinks it can't finish the queued tasks in time. And then it does this "silliness".
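A later reply in this thread points at the client's duration correction factor (DCF). One way to picture the jump in estimates is the toy model below. It is a simplified sketch and not BOINC's actual code: it assumes a single DCF shared by all GPUs of a project, assumes the factor rises immediately to the observed ratio and only drifts down slowly, and uses illustrative numbers that are not meant to reproduce the exact figures reported here.

```python
def update_dcf(dcf, estimated_s, actual_s, decay=0.1):
    """Assumed update rule: jump up at once, drift down slowly."""
    ratio = actual_s / estimated_s
    if ratio > dcf:
        return ratio
    return dcf + decay * (ratio - dcf)


def corrected_estimate(raw_estimate_s, dcf):
    """Estimate shown for a queued task under the shared factor."""
    return raw_estimate_s * dcf


if __name__ == "__main__":
    raw = 13 * 60   # hypothetical raw estimate: 13 minutes
    dcf = 1.0       # hypothetical starting factor

    # The slow GPU finishes a task that took six times the estimate.
    dcf = update_dcf(dcf, estimated_s=raw, actual_s=6 * raw)

    # Every queued task now gets the inflated estimate, including the
    # ones that would actually run on the fast GPUs.
    print(corrected_estimate(raw, dcf) / 60, "minutes")
```

Under these assumptions a single result from the slow card inflates the estimate for the whole queue, which would be enough to trigger "projected to miss deadline" scheduling for many tasks at once.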
ID: 32077 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 32078 - Posted: 9 Apr 2010, 17:29:07 UTC

You double-compressed the log file, which confused me for a moment.

Yes, it starts with

09-Apr-2010 17:30:11 [SETI@home] Computation for task 21fe07ae.3786.14387.8.10.4_0 finished
09-Apr-2010 17:30:11 [---] [cpu_sched_debug] Request CPU reschedule: handle_finished_apps
09-Apr-2010 17:30:11 [---] [cpu_sched_debug] schedule_cpus(): start
09-Apr-2010 17:30:11 [SETI@home] [cpu_sched_debug] Result 13dc06ae.20481.890.15.10.196_2 projected to miss deadline.
...
09-Apr-2010 17:30:11 [SETI@home] [cpu_sched_debug] Project has 313 projected NVIDIA GPU deadline misses

In theory, the allocation is right:

09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 0 to 13dc06ae.20481.890.15.10.196_2
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 1 to 12ja07ae.7563.19295.13.10.254_0
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 2 to 12ja07ae.7563.19295.13.10.252_0
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 3 to 12ja07ae.7563.19295.13.10.249_1
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 4 to 12ja07ae.7563.19295.13.10.248_1
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 5 to 12ja07ae.7563.19295.13.10.247_0

but it may have been confused by

09-Apr-2010 17:30:12 [---] [cpu_sched_debug] coproc quit pending, deferring start
09-Apr-2010 17:30:12 [---] [cpu_sched_debug] Request enforce CPU schedule: coproc quit retry

Roll on per device client DCF!
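The "Assigning CUDA instance" lines quoted above can also be checked mechanically. The sketch below relies only on the line format visible in this excerpt; the function name is made up, and it should be fed the lines of one scheduling pass at a time, since over a longer log the same instance is legitimately reused by different tasks.

```python
import re
from collections import defaultdict

# Pattern taken from the [coproc_debug] lines quoted in this thread.
ASSIGN_RE = re.compile(
    r"\[coproc_debug\] Assigning CUDA instance (\d+) to (\S+)"
)


def instances_with_multiple_tasks(log_lines):
    """Map each CUDA instance to its tasks if more than one was assigned."""
    tasks_by_instance = defaultdict(set)
    for line in log_lines:
        match = ASSIGN_RE.search(line)
        if match:
            tasks_by_instance[int(match.group(1))].add(match.group(2))
    return {
        instance: sorted(tasks)
        for instance, tasks in tasks_by_instance.items()
        if len(tasks) > 1
    }


if __name__ == "__main__":
    one_pass = [
        "09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 0 to 13dc06ae.20481.890.15.10.196_2",
        "09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 1 to 12ja07ae.7563.19295.13.10.254_0",
    ]
    print(instances_with_multiple_tasks(one_pass))
```

An empty result only says that the assignments in the log are distinct. As noted above, the log can look right while the tasks still end up sharing a device, so this check cannot rule the bug out.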
ID: 32078 · Report as offensive
Claggy

Joined: 23 Apr 07
Posts: 1112
United Kingdom
Message 32079 - Posted: 9 Apr 2010, 17:37:07 UTC - in response to Message 32078.  

Hiamps has been complaining about this issue on and off for months,
but never bothered posting any logs, even when reminded about it.

Claggy
ID: 32079 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 32085 - Posted: 9 Apr 2010, 21:36:39 UTC

Sebastian, the developers thank you for the logs; they think the bug is extremely serious. In response, they'll soon (within 24 hours) post a new private BOINC for you to test with. It'll have extra messages for the <coproc_debug> flag and a possible first fix.

In the meantime, you can edit your cc_config.xml file to run only with the <coproc_debug> flag (change <cpu_sched_debug>1</cpu_sched_debug> to <cpu_sched_debug>0</cpu_sched_debug>, save the file and re-read the config file from the Advanced menu), or temporarily disable it until you have the new BOINC if you're worried about all the extra messages.
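If you would rather not edit the file by hand, the same change can be scripted. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name `set_log_flag` is made up, and it works on the XML text, so reading and writing the actual cc_config.xml is left to you.

```python
import xml.etree.ElementTree as ET


def set_log_flag(config_xml, flag, enabled):
    """Return the cc_config XML text with one log flag set to 1 or 0."""
    root = ET.fromstring(config_xml)
    log_flags = root.find("log_flags")
    if log_flags is None:
        # No <log_flags> section yet: create one.
        log_flags = ET.SubElement(root, "log_flags")
    element = log_flags.find(flag)
    if element is None:
        # Flag not present yet: add it.
        element = ET.SubElement(log_flags, flag)
    element.text = "1" if enabled else "0"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    config = (
        "<cc_config><log_flags>"
        "<coproc_debug>1</coproc_debug>"
        "<cpu_sched_debug>1</cpu_sched_debug>"
        "</log_flags></cc_config>"
    )
    print(set_log_flag(config, "cpu_sched_debug", False))
```

After saving the result, the config file still has to be re-read from the Advanced menu before the change takes effect.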
ID: 32085 · Report as offensive
Profile Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 32087 - Posted: 10 Apr 2010, 1:39:54 UTC
Last modified: 10 Apr 2010, 1:59:46 UTC

I was able to duplicate the problem on Vista 64, 6.10.43. I had two 6.08 tasks running and "resumed" Collatz. Collatz immediately went to device 0.



I brought up GPU-Z and the load on both my GTS 250 and 9800 GTX+ is "0". After about 2 minutes one of the tasks switched to device 1. Currently, both tasks seem to be making progress, but GPU-Z and MSI Afterburner both show 0 GPU load.

[EDIT] I do not remember if Collatz was originally on device 0 when suspended. Perhaps it just took a minute or two before the SETI task was switched to 1.


ID: 32087 · Report as offensive
Profile Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 32089 - Posted: 10 Apr 2010, 3:03:33 UTC
Last modified: 10 Apr 2010, 3:30:24 UTC

Please ignore the "GPU load is 0" statement I posted above. There is a load and the two boards are working, but I am getting 0 for the GPU load, which is incorrect. I checked another system (XP and a single GTS 250) and GPU-Z and MSI Afterburner both show 0 for its GPU load, which I know is incorrect. From the test I ran, it would appear that it simply takes about 1-2 minutes for one of the tasks to switch from device 0 to device 1 after a task is resumed. During that time both Collatz and SETI were seemingly on device 0. This may not be the same problem as reported in this thread. HTH.

[EDIT]

I have had two Collatz tasks supposedly running on device 0 for the last 10 minutes. Both GPUs are running warm, so I assume both are being used. By alternately suspending and resuming tasks I was able to get two tasks stuck on device 0. Since both are crunching and both GPUs are running warm, I suspect both are being used, although I do see "Device 0" for both.

again, HTH.
ID: 32089 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 32137 - Posted: 12 Apr 2010, 17:29:53 UTC

With apologies for the late delivery, some other things were in the way. Sebastian, please check your private messages, you have a personal BOINC to play with. :-)
ID: 32137 · Report as offensive
Sebastian Bobrecki

Joined: 1 Oct 09
Posts: 11
Poland
Message 32168 - Posted: 13 Apr 2010, 15:12:21 UTC - in response to Message 32137.  

With apologies for the late delivery, some other things were in the way. Sebastian, please check your private messages, you have a personal BOINC to play with. :-)


Ok. I have it installed and running.
ID: 32168 · Report as offensive
Sebastian Bobrecki

Joined: 1 Oct 09
Posts: 11
Poland
Message 32175 - Posted: 13 Apr 2010, 19:37:54 UTC - in response to Message 32168.  

Currently everything looks ok.
ID: 32175 · Report as offensive
Sebastian Bobrecki

Joined: 1 Oct 09
Posts: 11
Poland
Message 32203 - Posted: 15 Apr 2010, 14:35:56 UTC - in response to Message 32175.  

Still no symptoms.
ID: 32203 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 32204 - Posted: 15 Apr 2010, 14:47:06 UTC - in response to Message 32203.  

It never happens when you want it to. :-)
ID: 32204 · Report as offensive


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.