Message boards : BOINC client : multiple NVIDIA GPUs
Author | Message |
---|---|
Joined: 9 Sep 05 Posts: 128 |
I've recently installed two video cards in my Linux box. They are Nvidia GT 430. BOINC CC sees them just fine:
02-Mar-2012 13:30:51 [---] Starting BOINC client version 7.0.2 for x86_64-pc-linux-gnu
02-Mar-2012 13:30:51 [---] This a development version of BOINC and may not function properly
02-Mar-2012 13:30:51 [---] log flags: file_xfer, sched_ops, task, coproc_debug, cpu_sched, cpu_sched_debug
02-Mar-2012 13:30:51 [---] Libraries: libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
02-Mar-2012 13:30:51 [---] Data directory: censored
02-Mar-2012 13:30:51 [---] Processor: 8 AuthenticAMD Dual-Core AMD Opteron(tm) Processor 8218 [Family 15 Model 65 Stepping 2]
02-Mar-2012 13:30:51 [---] Processor: 1.00 MB cache
02-Mar-2012 13:30:51 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
02-Mar-2012 13:30:51 [---] OS: Linux: 2.6.36.1
02-Mar-2012 13:30:51 [---] Memory: 63.05 GB physical, 56.83 GB virtual
02-Mar-2012 13:30:51 [---] Disk: 66.40 GB total, 20.64 GB free
02-Mar-2012 13:30:51 [---] Local time is UTC +0 hours
02-Mar-2012 13:30:51 [---] NVIDIA GPU 0: GeForce GT 430 (driver version unknown, CUDA version 4.20, compute capability 2.1, 1024MB, 1001MB available, 280 GFLOPS peak)
02-Mar-2012 13:30:51 [---] NVIDIA GPU 1: GeForce GT 430 (driver version unknown, CUDA version 4.20, compute capability 2.1, 1024MB, 1001MB available, 280 GFLOPS peak)
02-Mar-2012 13:30:51 [---] OpenCL: NVIDIA GPU 0: GeForce GT 430 (driver version 295.20, device version OpenCL 1.1 CUDA, 1024MB)
02-Mar-2012 13:30:51 [---] OpenCL: NVIDIA GPU 1: GeForce GT 430 (driver version 295.20, device version OpenCL 1.1 CUDA, 1024MB)
02-Mar-2012 13:30:51 [---] NVIDIA library reports 2 GPUs
Tasks requiring an NVIDIA coprocessor run fine as long as their requirement is <coproc><count>1.0</count></coproc>.
If this gets larger than 1.0, then it doesn't work at all. One example is the project Moo! Wrapper, which is a wrapper for DNETC projects. Their science application grabs all available GPUs. The project scheduler indicates this correctly by setting the coproc count to 2.0. However, BOINC CC claims that my system doesn't have enough NVIDIA GPUs available:
02-Mar-2012 13:30:53 [Moo! Wrapper] [cpu_sched_debug] insufficient NVIDIA for dnetc_r72_1330613363_72_192_0
Any idea about this? Metod ... |
Joined: 23 Apr 07 Posts: 1112 |
"Any idea about this?" Try BOINC 7.0.18. BOINC 7.0.2 is a very early alpha. Claggy |
Joined: 5 Oct 06 Posts: 5137 |
Are you attached to other projects? Was anything else active at the time? I think that's the message you would see if, for example: Two count=1.0 applications are running separately, one on each GPU. One of the two tasks finishes, leaving one GPU free and one GPU occupied. The client scheduler considers scheduling a Moo! app next: it finds one available GPU, needs two, and backs out with the 'insufficient' message. In general, BOINC doesn't like to pre-empt CUDA applications, because it can be inefficient. So Moo! probably won't run until both GPUs are freed by single tasks finishing at exactly the same time (unlikely), the other projects run out of work, or Moo! is forced to run in High Priority by deadline pressure. One possible workaround would be to set a really low Task Switch Interval, so that the other tasks become pre-emptible quickly. |
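The scenario described above can be sketched as a toy model. This is only an illustration of the reasoning in the post, not the BOINC client's actual scheduler code: the `Task` class, `try_schedule` function and task names are hypothetical, and the model assumes running tasks are never pre-empted.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    gpus_needed: float  # plays the role of <coproc><count> in this toy model


def try_schedule(task, total_gpus, running):
    """Start the task only if enough GPUs are free; never pre-empt."""
    in_use = sum(t.gpus_needed for t in running)
    free = total_gpus - in_use
    if task.gpus_needed > free:
        return False  # corresponds to the 'insufficient NVIDIA' case
    running.append(task)
    return True


TOTAL_GPUS = 2
running = [Task("single_a", 1.0), Task("single_b", 1.0)]
moo = Task("moo", 2.0)

# Both GPUs are occupied by single-GPU tasks.
both_busy = try_schedule(moo, TOTAL_GPUS, running)

# One single-GPU task finishes: one GPU free, still not enough.
running.pop(0)
one_free = try_schedule(moo, TOTAL_GPUS, running)

# The second single-GPU task finishes: now the two-GPU task fits.
running.pop(0)
all_free = try_schedule(moo, TOTAL_GPUS, running)

print(both_busy, one_free, all_free)
```

In this model the two-GPU task only starts once both single-GPU tasks are gone, which is why it can wait a long time if the freed GPU is refilled by another single-GPU task each time.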
Joined: 5 Oct 06 Posts: 5137 |
Afterthought - has anybody discussed this potential cross-project "play nice" issue at Moo! ? It's similar to issues we saw with BOINC in the early days of multi-threaded CPU applications. I think the solution there was to unceremoniously dump all single-threaded tasks when a MT app was scheduled to run (whether or not they should have continued running under 'Task Switch' rules). That wasn't very nice, either. |
Joined: 9 Sep 05 Posts: 128 |
On this host I have multiple projects running: CPDN, Seti, Einstein and now Moo!. I don't have a proper CUDA application for Seti (yet, see problem description right at the end of this post); Einstein provides a nice CUDA application. However, during these tests I disabled Einstein, so it's only Moo! that wants to use NVIDIA. As BOINC CC decided the NVIDIA resources were insufficient, both GPUs stayed idle. If I edit client_state.xml and decrease the CUDA count for Moo! to 1.0, BOINC CC starts a couple of Moo! tasks. This is far from ideal, as both tasks want to run on both GPUs. There's some discussion on the Moo! boards about making the dnetc application nicer, so that it would only occupy its designated GPU and not all of them. However, as this project is only a wrapper around a different type of distributed computing project, it's not totally in the hands of project management to make the needed changes to the crunching (I just can't call it science) application. I already tried to run a newer BOINC CC, but I'm unable to use the Berkeley-provided executable: my OS distribution is somewhat elderly, so the system libraries are too old. I should probably compile BOINC CC on my own. Before I jump on it, I would like to know whether this problem (a single task requiring a multi-GPU resource) is known to the developers and whether any work has been done on it. Metod ... |
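For anyone wanting to reproduce the client_state.xml edit mentioned above, here is a minimal sketch of the idea. It works on a made-up fragment, not a real client_state.xml: only the `<coproc><count>` element is taken from this thread, while the `<app_version>` wrapper, the `SAMPLE` text and the `set_coproc_count` helper are assumptions for illustration. Stop the client and back up the file before trying anything like this on real data.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; a real client_state.xml contains far more than this.
SAMPLE = """
<app_version>
    <coproc>
        <count>2.0</count>
    </coproc>
</app_version>
"""


def set_coproc_count(xml_text, new_count):
    """Return xml_text with every <coproc><count> set to new_count."""
    root = ET.fromstring(xml_text.strip())
    for coproc in root.iter("coproc"):
        count = coproc.find("count")
        if count is not None:
            count.text = f"{new_count:.1f}"
    return ET.tostring(root, encoding="unicode")


patched = set_coproc_count(SAMPLE, 1.0)
print(patched)
```

As noted in the post, lowering the count only gets the tasks started. Each task still tries to use both GPUs, so this is a workaround at best.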
Joined: 29 Aug 05 Posts: 15581 |
I forwarded it to Rom, asked him to pass by. Can't promise anything. :-) |
Joined: 9 Sep 05 Posts: 128 |
Just for the record: now I'm running BOINC CC 7.0.18 and it still doesn't want to run tasks with requirement of more than 1.0 NVIDIA resource (thus leaving both GPUs idle). Metod ... |
Joined: 3 Mar 12 Posts: 27 |
I can report the same problem running Moo! Wrapper on the ATI HD6990 card. If I use an app_info.xml file and specify one GPU per task, it will load and run two tasks... But the Dnet application appears to use ALL GPUs in real time, so if you suspend one task, the other starts using both GPUs even though app_info doesn't prescribe that. That causes more problems, as eventually it's trying to run two tasks on two GPUs at the same time. This behavior is also seen when running a Collatz task on one GPU and Moo! on another. If the Collatz task is suspended, the Moo! (distributed.net) task starts using both GPUs! Without an app_info file, Moo! will never start. :) |
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.