Work fetch problem with more than 1 ATI GPU, app_info & exclusions

Sunny129
Joined: 17 Aug 12
Posts: 11
United States
Message 45920 - Posted: 8 Oct 2012, 15:50:08 UTC
Last modified: 8 Oct 2012, 16:29:08 UTC

Oops...didn't realize we had already established that this bug does not affect hosts w/ just 1 AMD GPU and 1 nVidia GPU. When I discovered this for myself last night, it dawned on me that I could use it to my advantage. You see, while my two GTX 560 Ti's both crunch the same project in the same box (and therefore aren't affected by the bug), my two HD 6950's crunch separate projects in the same box, and consequently don't maintain work buffers properly. I then realized that I could swap one GTX 560 Ti for one HD 6950 (leaving each of my machines with one GTX 560 Ti and one HD 6950), and that I would no longer have to deal with the work buffer bug...

...so I removed one of the HD 6950's from my dual HD 6950 box and replaced it w/ one of the GTX 560 Ti's from my dual GTX 560 Ti box...unfortunately, with both cards installed, Windows refused to recognize the nVidia GPU unless it was installed in the lower of the 2 PCIe x16 slots. This forced me to put the HD 6950 in the upper PCIe x16 slot (sandwiched between the CPU fan/heatsink and the lower GPU), which pushed HD 6950 temps toward 90°C while crunching Milkyway@Home...totally unacceptable, to say the least. It turns out that Windows' failure to recognize the GTX 560 Ti when installed in the upper PCIe x16 slot is an issue w/ my specific motherboard, an ASUS M4A89GTD PRO/USB3, rather than a general chipset issue. I believe this b/c my initial testing was also done on an 890GX chipset motherboard (an MSI 890GXM-G65, to be exact), and I had no trouble running the nVidia GPU in the upper PCIe x16 slot while running an AMD GPU in the lower one.
Beyond
Joined: 16 Aug 12
Posts: 39
United States
Message 45936 - Posted: 10 Oct 2012, 4:21:59 UTC - in response to Message 45920.  

It turns out that Windows' failure to recognize the GTX 560 Ti when installed in the upper PCIe x16 slot is an issue w/ my specific motherboard, an ASUS M4A89GTD PRO/USB3, rather than a general chipset issue. I believe this b/c my initial testing was also done on an 890GX chipset motherboard (an MSI 890GXM-G65, to be exact), and I had no trouble running the nVidia GPU in the upper PCIe x16 slot while running an AMD GPU in the lower one.

A bit OT, as it's unrelated to the bug in question, but yes, this is a problem with many GX motherboards: the ATI card has to be in the primary PCIe slot for both GPUs to be recognized.
Beyond
Joined: 16 Aug 12
Posts: 39
United States
Message 46013 - Posted: 16 Oct 2012, 21:07:23 UTC

Any chance of getting a fix for this bug?
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 46130 - Posted: 26 Oct 2012, 20:29:23 UTC - in response to Message 46013.  

Wheels are in motion, developers have been kicked. Perhaps in next week's BOINC release.
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 46156 - Posted: 29 Oct 2012, 21:38:19 UTC

In the upcoming client, 7.0.37, there's code that should fix this problem. We expect this client to be released for testing sometime this week.
Sunny129
Joined: 17 Aug 12
Posts: 11
United States
Message 46157 - Posted: 29 Oct 2012, 21:51:27 UTC

Can't wait to try it!
Beyond
Joined: 16 Aug 12
Posts: 39
United States
Message 46185 - Posted: 1 Nov 2012, 17:38:18 UTC - in response to Message 46156.  

In the upcoming client, 7.0.37, there's code that should fix this problem. We expect this client to be released for testing sometime this week.

Ageless, thank you very much for following up on this issue.
Beyond
Joined: 16 Aug 12
Posts: 39
United States
Message 46222 - Posted: 7 Nov 2012, 14:21:19 UTC - in response to Message 46156.  

In the upcoming client, 7.0.37, there's code that should fix this problem.

Unfortunately there is no improvement at all in 7.0.38. The GPU projects still refuse to request work until the queues run completely dry. The message is usually:

Not requesting tasks: project is not highest priority

but sometimes:

Not requesting tasks: don't need

I guess it's back to 7.0.2 for now. :-(
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 46223 - Posted: 7 Nov 2012, 16:04:15 UTC - in response to Message 46222.  

Please email David about this; he isn't following this thread.
Since you have all the necessary info, or can easily get it, you'd best email him directly. His email address can be found at http://boinc.berkeley.edu/trac/wiki/ProjectPeople. You can also email the boinc_alpha list.
Beyond
Joined: 16 Aug 12
Posts: 39
United States
Message 46708 - Posted: 10 Dec 2012, 21:19:20 UTC

I posted to the alpha list a while back: no reply, and it's still not fixed in 7.0.40. In fact 7.0.40 introduces new work fetch problems :-(
I'll try again...
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 46814 - Posted: 15 Dec 2012, 9:12:08 UTC

What about BOINC 7.0.42? Among its claimed fixes is:
- Avoid GPU starvation in certain situations where <exclude_gpu> is used in cc_config.xml
Beyond
Joined: 16 Aug 12
Posts: 39
United States
Message 46816 - Posted: 15 Dec 2012, 15:25:49 UTC - in response to Message 46814.  

What about BOINC 7.0.42? Among its claimed fixes is:
- Avoid GPU starvation in certain situations where <exclude_gpu> is used in cc_config.xml

Hi Ageless, I've been testing 7.0.42, which was supposed to fix this issue. Unfortunately, there is no improvement at all. The last version of BOINC that fetches GPU work properly in this scenario is still 7.0.2, which has its own issues but is at least usable.
Beyond
Joined: 16 Aug 12
Posts: 39
United States
Message 47145 - Posted: 9 Jan 2013, 14:04:16 UTC

Still not fixed, no improvement with 7.0.44 :-(
Joe Bloggs
Joined: 6 Jan 13
Posts: 40
Hong Kong
Message 47165 - Posted: 11 Jan 2013, 5:14:31 UTC

May I ask what is the point of making these exclusions? Does one app run better on one gpu model and another app run better on the other model? Is the difference big enough to warrant micromanagement on this scale?

Also, version 7.0.4x introduces the new app_config.xml mechanism, with which you can set things like the number of tasks of a particular app to run on one GPU without a full app_info.xml. Perhaps you can try setting the exclusions in that file and see if the results are any different?
Sunny129
Joined: 17 Aug 12
Posts: 11
United States
Message 47177 - Posted: 11 Jan 2013, 12:48:30 UTC - in response to Message 47165.  

May I ask what is the point of making these exclusions? Does one app run better on one gpu model and another app run better on the other model? Is the difference big enough to warrant micromanagement on this scale?

Also, version 7.0.4x introduces the new app_config.xml mechanism, with which you can set things like the number of tasks of a particular app to run on one GPU without a full app_info.xml. Perhaps you can try setting the exclusions in that file and see if the results are any different?

In short, yes: some projects run better on AMD/ATI hardware, while others run better on nVidia hardware, and the same can be said for certain applications within a particular project. Beyond only has to run (and micromanage) an older version of BOINC that doesn't have these work fetch issues b/c of the applications he's running. If he wanted to, he could certainly run a current version of BOINC and just deal with the work fetch issues that crop up when using more than one AMD or nVidia GPU in the same host. Lately I haven't had to deal with the issue b/c both GPUs in my dual AMD GPU box run the same project/application, as do both GPUs in my dual nVidia GPU box.

That said, it hadn't dawned on me yet that we might be able to skirt the issue by dropping the <exclude_gpu> statements from the cc_config.xml file altogether and using the new app_config.xml feature instead. I'm not sure, though, whether you can tell a GPU to run "zero" tasks from a particular project/app. Traditionally in the app_info.xml file, and now in the new app_config.xml file, the GPU usage value n sets how many tasks share a GPU: n=1 is 1 task, n=0.5 is 2 simultaneous tasks, n=0.33 is 3 simultaneous tasks, and so on (roughly 1/n tasks per GPU). Taken to the limit, n=0 would correspond to an infinite number of simultaneous tasks, impossible as we know that to be...so perhaps n=0 has simply been coded as "run zero tasks." I'm not set up to test this for the reasons mentioned above. Perhaps someone running different projects/applications on two AMD GPUs or two nVidia GPUs in the same box could try it?
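For reference, a minimal app_config.xml sketch showing where the GPU usage value n goes. The app name "appname" is a placeholder (it is the same short name used in the <app> tag of an <exclude_gpu> block), and the values are illustrative assumptions, not settings anyone in this thread posted. With <gpu_usage> at 0.5, up to two tasks of this app run on a GPU at once. The file goes in that project's folder under the BOINC data directory's projects directory.

```xml
<!-- Hypothetical example: "appname" is a placeholder for the app's short name -->
<app_config>
   <app>
      <name>appname</name>
      <gpu_versions>
         <!-- each task claims 0.5 of a GPU, i.e. up to 2 tasks per GPU -->
         <gpu_usage>0.5</gpu_usage>
         <!-- CPU budgeted per GPU task -->
         <cpu_usage>1.0</cpu_usage>
      </gpu_versions>
   </app>
</app_config>
```

Note that this sets a usage value for the app as a whole; nothing in it names a device number the way <device_num> does in cc_config.xml.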
Beyond
Joined: 16 Aug 12
Posts: 39
United States
Message 47409 - Posted: 20 Jan 2013, 0:11:51 UTC - in response to Message 47165.  

May I ask what is the point of making these exclusions? Does one app run better on one gpu model and another app run better on the other model? Is the difference big enough to warrant micromanagement on this scale?

Yes. For instance, POEM uses so much CPU that one GPU has to be excluded from it to avoid contention for CPU resources. In this case it has nothing to do with GPU models.
Beyond
Joined: 16 Aug 12
Posts: 39
United States
Message 47550 - Posted: 28 Jan 2013, 5:51:38 UTC - in response to Message 47165.  

Also, version 7.0.4x introduces the new app_config.xml mechanism, with which you can set things like the number of tasks of a particular app to run on one GPU without a full app_info.xml. Perhaps you can try setting the exclusions in that file and see if the results are any different?

Tried the app_config.xml on POEM, no difference in work fetch.

I had high hopes for 7.0.45. Unfortunately work fetch has not improved and the exclusions problem has not been helped at all.
SekeRob2
Joined: 6 Jul 10
Posts: 585
Italy
Message 47551 - Posted: 28 Jan 2013, 10:08:22 UTC - in response to Message 47550.  

It's a bit involved, but GPGPU exclusions are set in cc_config.xml; e.g. you can tell a specific app from a specific project not to run on device 0 or 1:
<exclude_gpu>
   <url>project_URL</url>
   [<device_num>N</device_num>]
   [<type>nvidia|ati</type>]
   [<app>appname</app>]
</exclude_gpu>


<app> is the app's short name; the equivalent tag in app_config.xml is called <name>.
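To put the template above in context, here is a sketch of a complete cc_config.xml with one exclusion filled in, roughly matching Beyond's POEM case of keeping one app off one of two ATI GPUs. The URL, device number and app name are placeholders, not values from this thread; the part to note is that <exclude_gpu> sits inside the <options> section.

```xml
<cc_config>
   <options>
      <!-- Hypothetical: keep app "appname" of this project off ATI device 1 -->
      <exclude_gpu>
         <url>project_URL</url>
         <device_num>1</device_num>
         <type>ati</type>
         <app>appname</app>
      </exclude_gpu>
   </options>
</cc_config>
```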

Why it is the way it is, I don't know, but I understand that backward compatibility has to be maintained: a volunteer who upgrades should not find that cc_config.xml settings have changed in function [something that has happened once or twice].

I guess the type field has to be expanded to also work for the Intel_gpu.
Coelum Non Animum Mutant, Qui Trans Mare Currunt
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 47553 - Posted: 28 Jan 2013, 13:54:36 UTC - in response to Message 47550.  

I had high hopes for 7.0.45. Unfortunately work fetch has not improved and the exclusions problem has not been helped at all.
You will have to email the BOINC Alpha mailing list about that; otherwise the developers will not know about it.

Include logs with the appropriate debug flags enabled to show what you are seeing.
If necessary, add screenshots, but when you do, CC David, as the list will drop those attachments.
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 47554 - Posted: 28 Jan 2013, 13:55:02 UTC - in response to Message 47551.  

I guess the type field has to be expanded to also work for the Intel_gpu.

I just sent a request for that to the alpha list.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.