GPU idling just because I set from 10 days to 1 day of work storage

Sandman192

Joined: 28 Aug 19
Posts: 49
United States
Message 93029 - Posted: 2 Oct 2019, 23:47:13 UTC

OK, I've had no problems with BOINC running both a 980 and a 1080 Ti in the same computer.
All I did was select everything but PrimeGrid on the Projects tab and change the work storage setting from 10 days of work to 1 day.
After a day I had only one GPU running. Instead of fetching more work for the idle GPU, and regardless of how much work I still had for that idle GPU, BOINC never got more PrimeGrid work for it, nor did it use the GPU work from other projects still stored on my computer. That 1 day should be multiplied by the number of GPUs I have, keeping the second GPU busy. Again, I had GPU work from other projects that I've seen run on the 980 before, but it isn't running those projects even though their tasks are still stored on my computer.

Moral of the story: BOINC is not keeping all my GPUs supplied with work. Changing the setting I described shouldn't idle the second GPU.
Gary Roberts

Joined: 7 Sep 05
Posts: 130
Australia
Message 93031 - Posted: 3 Oct 2019, 1:19:23 UTC - in response to Message 93029.  

All I did was select everything but PrimeGrid on the Projects tab and change the work storage setting from 10 days of work to 1 day.
Seeing as we have no idea which projects make up "everything but PrimeGrid", perhaps you might like to mention the ones you selected rather than the ones you didn't.

When you select a project on the Projects tab, a number of commands become available for you to click: things like 'Update', 'Suspend', 'No new tasks', etc. There is no command there that changes the "days of work" for the selected project. If you changed your work cache setting, it was done elsewhere, and it applies to all projects, not just to a selected project.

Perhaps you should check the Projects tab to see whether any project that normally supplies you with GPU tasks has accidentally been suspended, set to 'No new tasks', or something like that. Remember that the wording on a command button shows the action that will happen if you click it. For example, if a button for a selected project reads 'Allow new tasks', that project is currently set to NOT supply new tasks.
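For what it's worth, when a local (client) preference override is in use, that client-wide work cache setting lives in global_prefs_override.xml in the BOINC data directory. A minimal sketch with illustrative values:

<global_preferences>
   <!-- "Store at least N days of work" -->
   <work_buf_min_days>1.0</work_buf_min_days>
   <!-- "Store up to an additional N days of work" -->
   <work_buf_additional_days>0.25</work_buf_additional_days>
</global_preferences>

Note that this is a single file for the whole client, which is why the setting can never apply to just one selected project.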

After a day I had only one GPU running. Instead of fetching more work for the idle GPU, and regardless of how much work I still had for that idle GPU, BOINC never got more PrimeGrid work for it, nor did it use the GPU work from other projects still stored on my computer. That 1 day should be multiplied by the number of GPUs I have, keeping the second GPU busy. Again, I had GPU work from other projects that I've seen run on the 980 before, but it isn't running those projects even though their tasks are still stored on my computer.
I don't understand what you mean by "ideal GPU". I really don't understand what this whole paragraph is describing. I gather that only 1 of your 2 GPUs is actually crunching something (project and number of available tasks not specified) and that the other GPU is idle. You say that you have "other GPU work from other projects". If both GPUs are available and one of them is idle, it seems like you must have done something like suspend the project those "other GPU tasks" belong to. I don't know how available tasks would fail to start crunching if there was an idle, 'ready to go' GPU. Your assertion that the change in work cache size is somehow responsible just can't be correct. There is something else you've changed. In other words, BOINC is being restricted by something you have (perhaps accidentally) changed, and you need to find what that is.
Cheers,
Gary.
Sandman192

Joined: 28 Aug 19
Posts: 49
United States
Message 93063 - Posted: 4 Oct 2019, 19:00:14 UTC - in response to Message 93031.  
Last modified: 4 Oct 2019, 19:17:24 UTC

perhaps you might like to mention the ones you selected rather than the ones you didn't

What? This has nothing to do with the projects that are not selected.
The point is that my GPU is idle and it's not getting any work. Not even PrimeGrid, which is the only project I want, is getting any work, even though PrimeGrid has run on my second 980 GPU before.
Here's the kicker... I just set BOINC back to 10 days of work and it grabbed more PrimeGrid work for the 980, and it seemed to grab other work as well. I have "Send work from any subproject if selected projects have no work" checked, and just like that the idle GPU sprang back to life. Note: I ran with 10 days of work plus up to an additional 10 days of work for years with all GPUs running just fine, until I set it to 1 day of work - meaning it still had lots of work for 2 GPUs and BOINC just stopped one of them the next day.

I don't understand what you mean by "ideal GPU".

I'm sorry, I meant idle. My spelling is off.

This is one for the BOINC developers: I've run into what seems to be a bug that will idle a GPU.
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 1283
United Kingdom
Message 93064 - Posted: 4 Oct 2019, 19:31:41 UTC

What operating system are you running? Windows has a nasty habit of updating GPU drivers with ones that don't have the parts needed for computational work. Two actions: first, stop Windows from updating your drivers; second, update the drivers from the NVIDIA website - see the next note about versions.
Have you recently updated the GPU drivers using the very latest from NVIDIA? There are reports that these are "not as good as they might be" when it comes to computational work. This has happened before; GPU driver updates are mostly aimed at gaming performance. If so, roll back to one no newer than 430.x.
In both cases you need to do a "clean" installation - this is a somewhat hidden option, buried beneath the "advanced installation" button. When the installation is complete, do a reboot to ensure you are using the driver you just installed, not the "old" one that might be lurking around in memory.
Sandman192

Joined: 28 Aug 19
Posts: 49
United States
Message 93070 - Posted: 5 Oct 2019, 5:01:35 UTC - in response to Message 93064.  
Last modified: 5 Oct 2019, 5:02:37 UTC

What operating system are you running?
Windows 10

Have you recently updated the GPU drivers using the very latest from NVIDIA
Video driver v436.48. I know the video drivers are not the issue.
And as I said before, all I changed was setting BOINC to 1 day and selecting only PrimeGrid for CPU and GPU (and PrimeGrid has run on the 980 before), and it stopped working on the second GPU (the 980) the next day, even though I still had 10 days of GPU work stored.

So it's not the drivers.

first, stop Windows from updating your drivers
I can't tell Windows 10 NOT to update drivers, or anything else for that matter.
The first GPU kept working and the second just idled.

you need to do a "clean" installation

How often do you think people do a clean install?? Windows was only installed a year ago.
Sandman192

Joined: 28 Aug 19
Posts: 49
United States
Message 93432 - Posted: 31 Oct 2019, 1:52:06 UTC
Last modified: 31 Oct 2019, 2:07:38 UTC

I can't keep my second GPU running without keeping BOINC on the 10 days of work setting.
BOINC stops my second GPU as soon as I switch to the 1 day of work setting.

Windows 10 and BOINC 7.14.2
iowapipe

Joined: 18 Sep 19
Posts: 2
United States
Message 93488 - Posted: 2 Nov 2019, 17:16:54 UTC - in response to Message 93432.  

Just a note: you have been given some advice already and seem unwilling to give it a try.

You may also not understand that the 'clean install' refers to the driver package. When you download the driver from NVIDIA, you can get to advanced options... which is what the previous advice was addressing.

You can also try a fresh Windows installation if you have a spare disk handy: use it as your boot disk, install Windows, a fresh video driver, and BOINC, attach to the project, and see whether the problem is solved or replicated.
On my system (Windows 10 1903) I cannot run NVIDIA drivers newer than 430.86, or BOINC crashes the system when running GPU tasks. Rolling back the driver fixed the problem after methodical testing.

As for not allowing Windows 10 to update drivers, a quick Google will turn up detailed instructions in Advanced System Settings. Avail yourself.
Nick Name

Joined: 14 Aug 19
Posts: 55
United States
Message 93491 - Posted: 2 Nov 2019, 18:42:41 UTC - in response to Message 93432.  

Is this your host?

http://www.primegrid.com/results.php?hostid=488838

If that's yours, I can see work has completed and validated on both GPUs, both CUDA and OpenCL, so we can probably rule out a driver problem. We can also rule out the usual suspects like BOINC only seeing one GPU (the most capable by default) or one having too low of compute capability.

My best guess without knowing more is that the 1080 Ti is so much faster than the 980 that BOINC is just not using it. I started using separate clients for each processor years ago because of this kind of silliness. I would try the following, in rough order of importance.

1. Check your BOINC startup log with both cache settings to see if they look the same.
2. If you're using web preferences, try changing them, preferably on a different project than where they are now (don't set them on World Community Grid, it's buggy even if you aren't using web prefs). Maybe something weird is going on there.
3. Try a GPU exclusion (<ignore_cuda_dev>N</ignore_cuda_dev> in cc_config) for the 1080 Ti to see if you can force the 980 to start working. It won't solve the problem but will help verify that BOINC can use the 980 with a low cache; see the cc_config sketch after this list.
https://boinc.berkeley.edu/wiki/Client_configuration
4. Try a different version of BOINC.
5. If you can, set up another client with default options to see if the problem follows. If you don't feel comfortable with that, reinstall BOINC to a different directory so that it's using default options. Just make sure that you set the use_all_gpus option in cc_config. If you try this, your BOINC data folder with your project data etc. should remain, but I'd copy it somewhere just to be safe.
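A minimal cc_config.xml sketch covering points 3 and 5. The device number 0 for the 1080 Ti is an assumption - use whatever number BOINC reports for it at startup - and newer clients may spell the exclusion <ignore_nvidia_dev>:

<cc_config>
   <log_flags>
   </log_flags>
   <options>
      <!-- use every detected GPU, not just the most capable one -->
      <use_all_gpus>1</use_all_gpus>
      <!-- temporarily hide the 1080 Ti so only the 980 gets work -->
      <ignore_cuda_dev>0</ignore_cuda_dev>
   </options>
</cc_config>

Save it in the BOINC data directory and restart the client (or use the Manager's option to re-read config files) for it to take effect; remove the exclusion line again once the test is done.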

It's not clear from the original post if this only affects PrimeGrid or all projects. If it's only PrimeGrid I would also try a different project to see if it's project specific or a general BOINC problem.

Finally, how long have you had both cards in the system? Did you always have this problem or did it start after a hardware / software change?
Ghost Rider 51

Joined: 3 Nov 19
Posts: 2
United States
Message 93505 - Posted: 3 Nov 2019, 21:49:19 UTC - in response to Message 93063.  

perhaps you might like to mention the ones you selected rather than the ones you didn't

What? This has nothing to do with the projects that are not selected.
The point is that my GPU is idle and it's not getting any work. Not even PrimeGrid, which is the only project I want, is getting any work, even though PrimeGrid has run on my second 980 GPU before.
Here's the kicker... I just set BOINC back to 10 days of work and it grabbed more PrimeGrid work for the 980, and it seemed to grab other work as well. I have "Send work from any subproject if selected projects have no work" checked, and just like that the idle GPU sprang back to life. Note: I ran with 10 days of work plus up to an additional 10 days of work for years with all GPUs running just fine, until I set it to 1 day of work - meaning it still had lots of work for 2 GPUs and BOINC just stopped one of them the next day.

I don't understand what you mean by "ideal GPU".

I'm sorry, I meant idle. My spelling is off.

This is one for the BOINC developers: I've run into what seems to be a bug that will idle a GPU.


I wonder if, because you have two different GPUs (a 980 and a 1080 Ti), one empties its queue faster than the other (and the work units are different). Then, because the remaining work for the GPU that is still active exceeds the 1-day limit you set, BOINC may be delaying work fetch until both GPUs are below the set amount before it requests more work for the idle one.

I had a similar situation when crunching for two different projects, in that BOINC never seemed to balance the load between them. It would seesaw back and forth from doing work on one project 100% of the time to doing work on the other project 100% of the time, and was never able to balance it (on one machine).
My solution was to dedicate one machine to one project and the other machine to the other project.

It could be doing the same for the different GPUs. Maybe if those GPUs were in different hosts this conflict would not occur.

I agree that load balancing should be a developer priority.
Sandman192

Joined: 28 Aug 19
Posts: 49
United States
Message 93512 - Posted: 4 Nov 2019, 7:52:21 UTC - in response to Message 93491.  
Last modified: 4 Nov 2019, 7:57:18 UTC

Is this your host?

http://www.primegrid.com/results.php?hostid=488838

If that's yours, I can see work has completed and validated on both GPUs, both CUDA and OpenCL, so we can probably rule out a driver problem. We can also rule out the usual suspects like BOINC only seeing one GPU (the most capable by default) or one having too low of compute capability.

My best guess without knowing more is that the 1080 Ti is so much faster than the 980 that BOINC is just not using it. I started using separate clients for each processor years ago because of this kind of silliness. I would try the following, in rough order of importance.

1. Check your BOINC startup log with both cache settings to see if they look the same.
2. If you're using web preferences, try changing them, preferably on a different project than where they are now (don't set them on World Community Grid, it's buggy even if you aren't using web prefs). Maybe something weird is going on there.
3. Try a GPU exclusion (<ignore_cuda_dev>N</ignore_cuda_dev> in cc_config) for the 1080 Ti to see if you can force the 980 to start working. It won't solve the problem but will help verify that BOINC can use the 980 with a low cache.
https://boinc.berkeley.edu/wiki/Client_configuration
4. Try a different version of BOINC.
5. If you can, set up another client with default options to see if the problem follows. If you don't feel comfortable with that, reinstall BOINC to a different directory so that it's using default options. Just make sure that you set the use_all_gpus option in cc_config. If you try this, your BOINC data folder with your project data etc. should remain, but I'd copy it somewhere just to be safe.

It's not clear from the original post if this only affects PrimeGrid or all projects. If it's only PrimeGrid I would also try a different project to see if it's project specific or a general BOINC problem.

Finally, how long have you had both cards in the system? Did you always have this problem or did it start after a hardware / software change?

It ran both GPUs just fine for months on the same BOINC v7.14.2 with the 10 days of work setting, and it stopped halfway through a WU on the second GPU (the 980) as soon as I set it to 1 day. That should be a big clue right there that something is wrong with BOINC.
If there is a specific log you want, tell me which one, because BOINC sees both GPUs and I don't see anything that would stop it from running both.
11/3/2019 9:04:30 PM | | Starting BOINC client version 7.14.2 for windows_x86_64
11/3/2019 9:04:30 PM | | log flags: file_xfer, sched_ops, task, task_debug
11/3/2019 9:04:30 PM | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
11/3/2019 9:04:30 PM | | Data directory: D:\ProgramData\BOINC
11/3/2019 9:04:30 PM | | Running under account Sandman192
11/3/2019 9:04:31 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 1080 Ti (driver version 436.15, CUDA version 10.1, compute capability 6.1, 4096MB, 3548MB available, 12064 GFLOPS peak)
11/3/2019 9:04:31 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 980 (driver version 436.15, CUDA version 10.1, compute capability 5.2, 4096MB, 3378MB available, 4979 GFLOPS peak)
11/3/2019 9:04:31 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 1080 Ti (driver version 436.15, device version OpenCL 1.2 CUDA, 11264MB, 3548MB available, 12064 GFLOPS peak)
11/3/2019 9:04:31 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 980 (driver version 436.15, device version OpenCL 1.2 CUDA, 4096MB, 3378MB available, 4979 GFLOPS peak)
11/3/2019 9:04:32 PM | | Host name: Puget-135563
11/3/2019 9:04:32 PM | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz [Family 6 Model 63 Stepping 2]
11/3/2019 9:04:32 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx tm2 dca pbe fsgsbase bmi1 smep bmi2
11/3/2019 9:04:32 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.18362.00)
11/3/2019 9:04:32 PM | | Memory: 46.41 GB physical, 53.16 GB virtual
11/3/2019 9:04:32 PM | | Disk: 1.82 TB total, 165.13 GB free


No web preferences used. Just client preferences here.

Sorry, I'm not going to try a different version of BOINC, since it works fine except with the 1 day of work setting.

Sorry again, I am not going to change the config files. Having it stop in the middle of work on the second GPU when set to 1 day is not normal in any way.

It worked on WUs for BOTH GPUs from SETI, PrimeGrid, Asteroids@home and so on for months, UNTIL ALL I DID WAS SET THE BOINC CLIENT TO ONE DAY.
Nick Name

Joined: 14 Aug 19
Posts: 55
United States
Message 93517 - Posted: 4 Nov 2019, 16:22:05 UTC - in response to Message 93512.  

I'd enable <coproc_debug> to see if that sheds any light, and compare output using both settings.
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 93518 - Posted: 4 Nov 2019, 17:21:54 UTC - in response to Message 93517.  

I think coproc_debug is mainly concerned with device detection. Despite the name, cpu_sched_debug gives more information about which tasks to run, or not run, on all devices - GPUs in particular.
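For reference, both flags go in the <log_flags> section of cc_config.xml; a minimal sketch (save it in the BOINC data directory, then re-read config files or restart the client):

<cc_config>
   <log_flags>
      <!-- scheduler decisions: which tasks run, or don't run, on each device -->
      <cpu_sched_debug>1</cpu_sched_debug>
      <!-- coprocessor (GPU) detection and assignment details -->
      <coproc_debug>1</coproc_debug>
   </log_flags>
</cc_config>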
Nick Name

Joined: 14 Aug 19
Posts: 55
United States
Message 93521 - Posted: 4 Nov 2019, 20:58:24 UTC - in response to Message 93518.  

Thanks - I've never had reason to use either before; I was just going off the description.
Sandman192

Joined: 28 Aug 19
Posts: 49
United States
Message 93522 - Posted: 4 Nov 2019, 22:50:08 UTC
Last modified: 4 Nov 2019, 23:03:22 UTC

Aha - remember I told you that it stopped my second 980 GPU? I just let it go, and the next day it started the 980 back up???

BOINC was already running both GPUs again before I enabled the <cpu_sched_debug> and <coproc_debug> log flags.
11/4/2019 4:37:02 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239106_2039_0 (high priority)
11/4/2019 4:37:02 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239222_0864_0 (high priority)
11/4/2019 4:37:02 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239222_0387_0 (high priority)
11/4/2019 4:37:02 PM | PrimeGrid | [cpu_sched_debug] scheduling genefer_extreme_28200381_0
11/4/2019 4:37:02 PM | PrimeGrid | [cpu_sched_debug] scheduling genefer21_27130980_1
11/4/2019 4:37:02 PM | | [cpu_sched_debug] enforce_run_list: end
11/4/2019 4:37:05 PM | | Re-reading cc_config.xml
11/4/2019 4:37:05 PM | | Using proxy info from GUI
11/4/2019 4:37:05 PM | | Config: don't compute while Defraggler64.exe is running
11/4/2019 4:37:05 PM | | Config: don't use GPUs while Defraggler64.exe is running
11/4/2019 4:37:05 PM | | Config: use all coprocessors
11/4/2019 4:37:05 PM | | log flags: file_xfer, sched_ops, task, coproc_debug, cpu_sched_debug
11/4/2019 4:37:05 PM | | [cpu_sched_debug] Request CPU reschedule: Core client configuration
11/4/2019 4:37:06 PM | | [cpu_sched_debug] schedule_cpus(): start
11/4/2019 4:37:06 PM | PrimeGrid | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
11/4/2019 4:37:06 PM | PrimeGrid | [cpu_sched_debug] add to run list: genefer_extreme_28200381_0 (NVIDIA GPU, FIFO) (prio -10.966601)
11/4/2019 4:37:06 PM | PrimeGrid | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
11/4/2019 4:37:06 PM | PrimeGrid | [cpu_sched_debug] add to run list: genefer21_27130980_1 (NVIDIA GPU, FIFO) (prio -11.194488)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239229_0537_0 (CPU, EDF) (prio -0.003340)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239212_0902_0 (CPU, EDF) (prio -0.003362)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239229_3153_0 (CPU, EDF) (prio -0.003384)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239220_1905_0 (CPU, EDF) (prio -0.003406)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239208_1573_1 (CPU, EDF) (prio -0.003428)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239106_2039_0 (CPU, EDF) (prio -0.003450)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239222_0864_0 (CPU, EDF) (prio -0.003472)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239222_0387_0 (CPU, EDF) (prio -0.003494)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239228_0151_0 (CPU, EDF) (prio -0.003516)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239220_0561_0 (CPU, EDF) (prio -0.003539)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239232_0607_0 (CPU, EDF) (prio -0.003561)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239219_0989_0 (CPU, EDF) (prio -0.003583)
11/4/2019 4:37:06 PM | | [cpu_sched_debug] enforce_run_list(): start
11/4/2019 4:37:06 PM | | [cpu_sched_debug] preliminary job list:
11/4/2019 4:37:06 PM | PrimeGrid | [cpu_sched_debug] 0: genefer_extreme_28200381_0 (MD: no; UTS: no)
11/4/2019 4:37:06 PM | PrimeGrid | [cpu_sched_debug] 1: genefer21_27130980_1 (MD: no; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 2: MIP1_00239229_0537_0 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 3: MIP1_00239212_0902_0 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 4: MIP1_00239229_3153_0 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 5: MIP1_00239220_1905_0 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 6: MIP1_00239208_1573_1 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 7: MIP1_00239106_2039_0 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 8: MIP1_00239222_0864_0 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 9: MIP1_00239222_0387_0 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 10: MIP1_00239228_0151_0 (MD: yes; UTS: yes)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 11: MIP1_00239220_0561_0 (MD: yes; UTS: yes)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 12: MIP1_00239232_0607_0 (MD: yes; UTS: yes)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 13: MIP1_00239219_0989_0 (MD: yes; UTS: yes)
11/4/2019 4:37:06 PM | | [cpu_sched_debug] final job list:
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 0: MIP1_00239228_0151_0 (MD: yes; UTS: yes)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 1: MIP1_00239220_0561_0 (MD: yes; UTS: yes)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 2: MIP1_00239232_0607_0 (MD: yes; UTS: yes)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 3: MIP1_00239219_0989_0 (MD: yes; UTS: yes)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 4: MIP1_00239229_0537_0 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 5: MIP1_00239212_0902_0 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 6: MIP1_00239229_3153_0 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 7: MIP1_00239220_1905_0 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 8: MIP1_00239208_1573_1 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 9: MIP1_00239106_2039_0 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 10: MIP1_00239222_0864_0 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] 11: MIP1_00239222_0387_0 (MD: yes; UTS: no)
11/4/2019 4:37:06 PM | PrimeGrid | [cpu_sched_debug] 12: genefer_extreme_28200381_0 (MD: no; UTS: no)
11/4/2019 4:37:06 PM | PrimeGrid | [cpu_sched_debug] 13: genefer21_27130980_1 (MD: no; UTS: no)
11/4/2019 4:37:06 PM | PrimeGrid | [coproc] NVIDIA instance 0; 1.000000 pending for genefer_extreme_28200381_0
11/4/2019 4:37:06 PM | PrimeGrid | [coproc] NVIDIA instance 0; 1.000000 pending for genefer21_27130980_1
11/4/2019 4:37:06 PM | PrimeGrid | [coproc] NVIDIA instance 1: confirming 1.000000 instance for genefer_extreme_28200381_0
11/4/2019 4:37:06 PM | PrimeGrid | [coproc] NVIDIA instance 0: confirming 1.000000 instance for genefer21_27130980_1
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239228_0151_0 (high priority)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239220_0561_0 (high priority)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239232_0607_0 (high priority)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239219_0989_0 (high priority)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239229_0537_0 (high priority)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239212_0902_0 (high priority)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239229_3153_0 (high priority)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239220_1905_0 (high priority)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239208_1573_1 (high priority)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239106_2039_0 (high priority)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239222_0864_0 (high priority)
11/4/2019 4:37:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239222_0387_0 (high priority)
11/4/2019 4:37:06 PM | PrimeGrid | [cpu_sched_debug] scheduling genefer_extreme_28200381_0
11/4/2019 4:37:06 PM | PrimeGrid | [cpu_sched_debug] scheduling genefer21_27130980_1
11/4/2019 4:37:06 PM | | [cpu_sched_debug] enforce_run_list: end
11/4/2019 4:38:06 PM | | [cpu_sched_debug] Request CPU reschedule: periodic CPU scheduling
11/4/2019 4:38:06 PM | | [cpu_sched_debug] schedule_cpus(): start
11/4/2019 4:38:06 PM | PrimeGrid | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
11/4/2019 4:38:06 PM | PrimeGrid | [cpu_sched_debug] add to run list: genefer_extreme_28200381_0 (NVIDIA GPU, FIFO) (prio -10.966596)
11/4/2019 4:38:06 PM | PrimeGrid | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
11/4/2019 4:38:06 PM | PrimeGrid | [cpu_sched_debug] add to run list: genefer21_27130980_1 (NVIDIA GPU, FIFO) (prio -11.194483)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239229_0537_0 (CPU, EDF) (prio -0.003340)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239212_0902_0 (CPU, EDF) (prio -0.003362)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239229_3153_0 (CPU, EDF) (prio -0.003385)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239220_1905_0 (CPU, EDF) (prio -0.003407)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239208_1573_1 (CPU, EDF) (prio -0.003429)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239106_2039_0 (CPU, EDF) (prio -0.003451)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239222_0864_0 (CPU, EDF) (prio -0.003473)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239222_0387_0 (CPU, EDF) (prio -0.003495)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239228_0151_0 (CPU, EDF) (prio -0.003517)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239220_0561_0 (CPU, EDF) (prio -0.003539)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239232_0607_0 (CPU, EDF) (prio -0.003561)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] add to run list: MIP1_00239219_0989_0 (CPU, EDF) (prio -0.003583)
11/4/2019 4:38:06 PM | | [cpu_sched_debug] enforce_run_list(): start
11/4/2019 4:38:06 PM | | [cpu_sched_debug] preliminary job list:
11/4/2019 4:38:06 PM | PrimeGrid | [cpu_sched_debug] 0: genefer_extreme_28200381_0 (MD: no; UTS: no)
11/4/2019 4:38:06 PM | PrimeGrid | [cpu_sched_debug] 1: genefer21_27130980_1 (MD: no; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 2: MIP1_00239229_0537_0 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 3: MIP1_00239212_0902_0 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 4: MIP1_00239229_3153_0 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 5: MIP1_00239220_1905_0 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 6: MIP1_00239208_1573_1 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 7: MIP1_00239106_2039_0 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 8: MIP1_00239222_0864_0 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 9: MIP1_00239222_0387_0 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 10: MIP1_00239228_0151_0 (MD: yes; UTS: yes)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 11: MIP1_00239220_0561_0 (MD: yes; UTS: yes)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 12: MIP1_00239232_0607_0 (MD: yes; UTS: yes)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 13: MIP1_00239219_0989_0 (MD: yes; UTS: yes)
11/4/2019 4:38:06 PM | | [cpu_sched_debug] final job list:
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 0: MIP1_00239228_0151_0 (MD: yes; UTS: yes)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 1: MIP1_00239220_0561_0 (MD: yes; UTS: yes)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 2: MIP1_00239232_0607_0 (MD: yes; UTS: yes)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 3: MIP1_00239219_0989_0 (MD: yes; UTS: yes)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 4: MIP1_00239229_0537_0 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 5: MIP1_00239212_0902_0 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 6: MIP1_00239229_3153_0 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 7: MIP1_00239220_1905_0 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 8: MIP1_00239208_1573_1 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 9: MIP1_00239106_2039_0 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 10: MIP1_00239222_0864_0 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] 11: MIP1_00239222_0387_0 (MD: yes; UTS: no)
11/4/2019 4:38:06 PM | PrimeGrid | [cpu_sched_debug] 12: genefer_extreme_28200381_0 (MD: no; UTS: no)
11/4/2019 4:38:06 PM | PrimeGrid | [cpu_sched_debug] 13: genefer21_27130980_1 (MD: no; UTS: no)
11/4/2019 4:38:06 PM | PrimeGrid | [coproc] NVIDIA instance 0; 1.000000 pending for genefer_extreme_28200381_0
11/4/2019 4:38:06 PM | PrimeGrid | [coproc] NVIDIA instance 0; 1.000000 pending for genefer21_27130980_1
11/4/2019 4:38:06 PM | PrimeGrid | [coproc] NVIDIA instance 1: confirming 1.000000 instance for genefer_extreme_28200381_0
11/4/2019 4:38:06 PM | PrimeGrid | [coproc] NVIDIA instance 0: confirming 1.000000 instance for genefer21_27130980_1
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239228_0151_0 (high priority)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239220_0561_0 (high priority)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239232_0607_0 (high priority)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239219_0989_0 (high priority)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239229_0537_0 (high priority)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239212_0902_0 (high priority)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239229_3153_0 (high priority)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239220_1905_0 (high priority)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239208_1573_1 (high priority)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239106_2039_0 (high priority)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239222_0864_0 (high priority)
11/4/2019 4:38:06 PM | World Community Grid | [cpu_sched_debug] scheduling MIP1_00239222_0387_0 (high priority)
11/4/2019 4:38:06 PM | PrimeGrid | [cpu_sched_debug] scheduling genefer_extreme_28200381_0
11/4/2019 4:38:06 PM | PrimeGrid | [cpu_sched_debug] scheduling genefer21_27130980_1
11/4/2019 4:38:06 PM | | [cpu_sched_debug] enforce_run_list: end

I only have WCG attached for CPU work, just to let my 10 days of stored GPU work dwindle down.

P.S. Has anyone ever received WCG GPU work? I've heard for years that WCG supports GPUs, or is going to, but I've never gotten any - not even on my 1080.
Nick Name

Joined: 14 Aug 19
Posts: 55
United States
Message 93524 - Posted: 5 Nov 2019, 3:08:19 UTC - in response to Message 93522.  

I'm not going to pretend I can decipher all of that, but I expect the issue is the drastic change in the cache. When you change the cache setting from ten days to one, the scheduler loses its mind, goes into high-priority mode, and idles the 980 to save a thread. Based on my experience I'd expect it to idle both GPUs, but the deadlines on those PrimeGrid jobs are likely playing a role there.

This doesn't look like a bug to me, just the scheduler trying to cope with an extreme change in the work cache. If you leave things alone it will probably work itself out, but it may take a few days. If it were my system I'd leave the cache alone, set all projects to No New Tasks (NNT), let everything run out, and then change the cache. That keeps the client from going into high-priority mode when it doesn't really need to.
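A rough sketch of that sequence with boinccmd, run from a command prompt in the BOINC program directory (the PrimeGrid URL is just an example; repeat the nomorework line for every attached project):

boinccmd --project http://www.primegrid.com/ nomorework
rem ...let the queued tasks finish and report...
rem ...then lower the work cache in Options > Computing preferences, and re-enable fetching:
boinccmd --project http://www.primegrid.com/ allowmorework

The same nomorework/allowmorework toggles are also available as the 'No new tasks' / 'Allow new tasks' buttons on the Manager's Projects tab, if you prefer the GUI.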

There definitely isn't any GPU work for WCG now. I think there was talk of developing it years ago but it never happened.
Sandman192

Joined: 28 Aug 19
Posts: 49
United States
Message 93525 - Posted: 5 Nov 2019, 3:59:47 UTC - in response to Message 93524.  

Thank you. That was a good explanation, with sound logic about what may be going on with BOINC.
