Work fetch problem with more than 1 ATI GPU, app_info & exclusions
Author | Message |
---|---|
Send message Joined: 17 Aug 12 Posts: 11 |
oops...didn't realize we had already established that this bug does not affect hosts w/ just 1 AMD GPU and 1 nVidia GPU. upon discovering this for myself last night, it dawned on me that i could use it to my advantage. you see, while my two GTX 560 Ti's both crunch the same project in the same box (and therefore aren't affected by the bug), my two HD 6950's do crunch separate projects in the same box, and consequently don't maintain work buffers properly. it then dawned on me that i could swap one GTX 560 Ti for one HD 6950 (resulting in each of my machines running one GTX 560 Ti and one HD 6950), and that i would no longer have to deal with the work buffer bug... so i removed one of the HD 6950's from my dual HD 6950 box, and replaced it w/ one of the GTX 560 Ti's from my dual GTX 560 Ti box...unfortunately, with both cards installed, Windows refused to recognize the nVidia GPU unless it was installed in the lower of 2 PCIe x16 slots. this forced me to put the HD 6950 in the upper PCIe x16 slot (sandwiched between the CPU fans/heatsink and the lower GPU), and it resulted in HD 6950 temps approaching 90°C while crunching Milkyway@Home...totally unacceptable to say the least. it turns out that Windows' failure to recognize the GTX 560 Ti when installed in the upper PCIe x16 slot is an issue w/ my specific motherboard, an ASUS M4A89GTD PRO/USB3 (as opposed to a general chipset issue). i believe this to be true b/c the initial testing i did on this was also done on an 890GX chipset motherboard (an MSI 890GXM-G65 to be exact), and i had no trouble running the nVidia GPU in the upper PCIe x16 slot while running an AMD GPU in the lower one. |
Send message Joined: 16 Aug 12 Posts: 39 |
[quote]it turns out that Windows' failure to recognize the GTX 560 Ti when installed in the upper PCIe x16 slot is an issue w/ my specific motherboard, an ASUS M4A89GTD PRO/USB3 (as opposed to a general chipset issue). i believe this to be true b/c the initial testing i did on this was also done on an 890GX chipset motherboard (an MSI 890GXM-G65 to be exact), and i had no trouble running the nVidia GPU in the upper PCIe x16 slot while running an AMD GPU in the lower one.[/quote] A bit OT as it is unrelated to the bug in question, but yes, this is a problem with many GX motherboards. The ATI card has to be in the primary PCIe slot for both GPUs to be recognized. |
Send message Joined: 16 Aug 12 Posts: 39 |
Any chance of getting a fix for this bug? |
Send message Joined: 29 Aug 05 Posts: 15484 |
Wheels are in motion, developers have been kicked. Perhaps in next week's BOINC release. |
Send message Joined: 29 Aug 05 Posts: 15484 |
In the upcoming client, 7.0.37, there's code that should fix this problem. We expect this client to be released for testing sometime this week. |
Send message Joined: 17 Aug 12 Posts: 11 |
can't wait to try it! |
Send message Joined: 16 Aug 12 Posts: 39 |
[quote]In the upcoming client, 7.0.37, there's code that should fix this problem. We expect this client to be released for testing sometime this week.[/quote] Ageless, thank you very much for following up on this issue. |
Send message Joined: 16 Aug 12 Posts: 39 |
[quote]In the upcoming client, 7.0.37, there's code that should fix this problem.[/quote] Unfortunately there is no improvement at all in 7.0.38. The GPU projects still refuse to request work until the queues run completely dry. The message is usually "Not requesting tasks: project is not highest priority" but sometimes "Not requesting tasks: don't need". I guess it's back to 7.0.2 for now. :-( |
Send message Joined: 29 Aug 05 Posts: 15484 |
Please email David on this. He isn't keeping track of this thread. And since you have all the necessary info or can easily get it, you best email him about it. His email address can be found at http://boinc.berkeley.edu/trac/wiki/ProjectPeople. You can also email the boinc_alpha list. |
Send message Joined: 16 Aug 12 Posts: 39 |
I posted to the alpha list a while back: no reply, and it's still not fixed in 7.0.40. In fact 7.0.40 introduces new work fetch problems :-( I'll try again... |
Send message Joined: 29 Aug 05 Posts: 15484 |
What about with BOINC 7.0.42, which amongst its fix-claims has: - Avoid GPU starvation in certain situations where <exclude_gpu> is used in cc_config.xml |
Send message Joined: 16 Aug 12 Posts: 39 |
[quote]What about with BOINC 7.0.42, which amongst its fix-claims has:[/quote] Hi Ageless, I've been testing 7.0.42, which was supposed to fix this issue. Unfortunately there is no improvement at all. The last version of BOINC that can fetch GPU work properly in this scenario is still 7.0.2, which has its own issues but at least it's usable. |
Send message Joined: 16 Aug 12 Posts: 39 |
Still not fixed, no improvement with 7.0.44 :-( |
Send message Joined: 6 Jan 13 Posts: 40 |
May I ask what is the point of making these exclusions? Does one app run better on one gpu model and another app run better on the other model? Is the difference big enough to warrant micromanagement on this scale? Also version 7.0.4x introduces the new app_config.xml mechanism, by which you can set stuff like number of a particular app to run on one gpu without a full app_info. Perhaps you can try setting the exclusions in that file and see if the results are any different? |
Send message Joined: 17 Aug 12 Posts: 11 |
[quote]May I ask what is the point of making these exclusions? Does one app run better on one gpu model and another app run better on the other model? Is the difference big enough to warrant micromanagement on this scale?[/quote] in short, yes - some projects run better on AMD/ATI hardware, while others run better on nVidia hardware. likewise, the same can be said for certain applications within a particular project. Beyond only has to micromanage an older version of BOINC that doesn't have these work fetch issues b/c he's running certain applications. if he wanted to, he could certainly choose to run a current version of BOINC and just deal with the work fetch issues that crop up when using more than one AMD or nVidia GPU in the same host. lately i haven't had to deal with the issue b/c i've got the same project/application running on both GPUs in my dual AMD GPU box, and likewise in my dual nVidia GPU box. that said, it hadn't dawned on me yet that we just might be able to skirt the issue by bypassing the use of <exclude_gpu> statements in the cc_config.xml file altogether, and using the new app_config.xml feature instead. although i'm not sure if you can tell a GPU to run "zero" tasks from a particular project/app. traditionally in the app_info.xml file, and now in the new app_config.xml file, n=1 is for 1 task, n=0.5 is for 2 simultaneous tasks, n=0.33 is for 3 simultaneous tasks, and so on and so forth. following that pattern to its limit, n=0 would correspond to an infinite number of simultaneous tasks, as impossible as we know that to be...so perhaps n=0 has simply been coded as "run zero tasks." i'm not set up to test this for reasons mentioned above. perhaps someone running different projects/applications on two AMD GPUs or two nVidia GPUs in the same box could try it? |
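The n values described in the post above can be sketched in an app_config.xml like the one below. This is only an illustration: it assumes that n refers to the `<gpu_usage>` element, the app name `example_app` is a hypothetical placeholder, and the file would sit in the project's folder inside the BOINC data directory. With 0.5, each task claims half a GPU, so two tasks run at once on each GPU. The sketch says nothing about whether 0 would mean "run zero tasks"; that question stays open.

```xml
<!-- app_config.xml: a sketch only; "example_app" is a hypothetical short name -->
<app_config>
  <app>
    <name>example_app</name>
    <gpu_versions>
      <!-- fraction of a GPU each task claims: 1 = one task, 0.5 = two tasks, 0.33 = three tasks -->
      <gpu_usage>0.5</gpu_usage>
      <!-- fraction of a CPU core budgeted per GPU task (value chosen for illustration) -->
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

Note that this setting applies to every GPU that runs the app; as far as I know it has no per-device form, which is why it may not be able to stand in for `<exclude_gpu>`.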
Send message Joined: 16 Aug 12 Posts: 39 |
[quote]May I ask what is the point of making these exclusions? Does one app run better on one gpu model and another app run better on the other model? Is the difference big enough to warrant micromanagement on this scale?[/quote] Yes. For instance POEM uses so much CPU that one GPU has to be excluded in order to avoid contention for CPU resources. It has nothing to do with GPU models. |
Send message Joined: 16 Aug 12 Posts: 39 |
[quote]Also version 7.0.4x introduces the new app_config.xml mechanism, by which you can set stuff like number of a particular app to run on one gpu without a full app_info. Perhaps you can try setting the exclusions in that file and see if the results are any different?[/quote] Tried the app_config.xml on POEM, no difference in work fetch. I had high hopes for 7.0.45. Unfortunately work fetch has not improved and the exclusions problem has not been helped at all. |
Send message Joined: 6 Jul 10 Posts: 585 |
It's a bit involved, but the GPGPU exclusions are set in cc_config.xml, e.g. to tell a specific app from a specific project not to run on device 0 or 1: <exclude_gpu> <url>project_URL</url> [<device_num>N</device_num>] [<type>nvidia|ati</type>] [<app>appname</app>] </exclude_gpu> Here <app> is the app's short name; the same tag in app_config.xml is called <name>. Why it is the way it is, I don't know, but I understand that backward compatibility has to be maintained: a volunteer upgrading should not find that the cc_config.xml settings have changed in functionality [something that has happened once or twice]. Guess the type field has to be expanded to also function for the Intel_gpu. Coelum Non Animum Mutant, Qui Trans Mare Currunt |
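To make the template in the post above concrete, here is a minimal sketch of a complete cc_config.xml that keeps one project's app off one GPU. The project URL and the app short name are hypothetical placeholders, not values taken from this thread, and the choice of device 1 is arbitrary. The optional elements narrow the exclusion; leaving them out widens it.

```xml
<!-- cc_config.xml: a sketch only; URL and app name below are hypothetical placeholders -->
<cc_config>
  <options>
    <exclude_gpu>
      <!-- master URL of the project, exactly as the client lists it -->
      <url>http://example.com/project/</url>
      <!-- optional: GPU index as reported by the client at startup; omit to exclude all GPUs of this type -->
      <device_num>1</device_num>
      <!-- optional: restrict the exclusion to one GPU vendor -->
      <type>ati</type>
      <!-- optional: app short name; omit to exclude every app of the project -->
      <app>example_app</app>
    </exclude_gpu>
  </options>
</cc_config>
```

The file goes in the BOINC data directory, and the client has to re-read its config files (or be restarted) before the exclusion takes effect.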
Send message Joined: 29 Aug 05 Posts: 15484 |
[quote]I had high hopes for 7.0.45. Unfortunately work fetch has not improved and the exclusions problem has not been helped at all.[/quote] You will have to email the BOINC Alpha email list about that, as otherwise the developers will not know about it. Add logs with appropriate debug flags enabled to show what you see. If necessary add screenshots, but when you do, add David as a CC, as the list will drop these attachments. |
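As a hedged sketch of what "appropriate debug flags" might look like for a work fetch report: the two flags below are existing client log flags, but picking these two is my assumption, not something the developers asked for in this thread.

```xml
<!-- cc_config.xml: a sketch only; which flags the developers want is an assumption -->
<cc_config>
  <log_flags>
    <!-- log the client's work fetch decisions for each project and resource -->
    <work_fetch_debug>1</work_fetch_debug>
    <!-- log details of each scheduler request and reply -->
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
</cc_config>
```

These flags make the event log verbose, so it is probably best to turn them off again once the log has been captured.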
Send message Joined: 29 Aug 05 Posts: 15484 |
[quote]Guess the type field has to be expanded to also function for the Intel_gpu.[/quote] I just sent a request for that to the alpha list. |
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.