GPU tasks skipped after scheduler overcommits CPU cores

Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5078
United Kingdom
Message 103673 - Posted: 22 Mar 2021, 22:41:35 UTC - in response to Message 103672.  

... no real re-write with re-design ...
Guess who was bemoaning that last week.

https://setiathome.berkeley.edu/forum_thread.php?id=85717&postid=2070907#2070907
ID: 103673
Sandman192

Joined: 28 Aug 19
Posts: 49
United States
Message 103725 - Posted: 28 Mar 2021, 18:40:54 UTC - in response to Message 103594.  

You probably pushed up the priority of the other projects by being locked into working on TN-Grid for so long by deadline pressure. It will return to normal gradually, but over a period of several days.

See the Configuration Options page of the User Manual. Try setting the line

<rec_half_life_days>X</rec_half_life_days>
A project's scheduling priority is determined by its estimated credit in the last X days. Default is 10; set it larger if you run long high-priority jobs.
to something much smaller: one day, instead of the default 10, would sort things out quicker.
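
For reference, a minimal sketch of where that line would go, assuming the standard cc_config.xml layout with the option inside the <options> section. The value 1 is the one-day setting suggested above; the file lives in the BOINC data directory, whose location depends on the installation.

```xml
<!-- cc_config.xml (sketch; place in the BOINC data directory) -->
<cc_config>
  <options>
    <!-- Half-life, in days, of the estimated-credit average that drives
         project scheduling priority. Default is 10; 1 is the value
         suggested in this thread. -->
    <rec_half_life_days>1</rec_half_life_days>
  </options>
</cc_config>
```

The client should pick up the change after a restart, or after re-reading the config files from the Manager's Options menu.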

If I have to edit a setting to make it work, then it's a bug: if the default value doesn't behave as the devs intended, it's a BUG. Does that setting stand for 10 days of extra work? If so, changing it from the BOINC options doesn't work. I've tried 1 day and 10 days; nothing changes.
It's not just TN-Grid; WCG is the same way. Right now I have it set to get more work for WCG, and as soon as the other projects' tasks are done, all I'll get is WCG and nothing else. I've done this twice already. I never had this problem until I updated BOINC. For years, ever since BOINC came out, I had 20 projects and all 20 had tasks "waiting to run". Now only 1 project is "waiting to run" once WCG or TN-Grid is added. 10 days or 1, it does not matter.

I'm not changing any settings. I'm here to report a bug. The point is: why should I have to change settings every time I update BOINC? That's when I notice problems.

P.S. I had no work at all "waiting to run", so I added Milky Way only for GPU work. All it did was run on 1 of my 2 installed GPUs. Only after I added another GPU project, Moo Wrapper, did Milky Way start on the second GPU just fine??? 😲
ID: 103725
Keith Myers
Volunteer tester
Help desk expert

Joined: 17 Nov 16
Posts: 863
United States
Message 103726 - Posted: 28 Mar 2021, 20:50:59 UTC

If you reduce your work cache size, you will never have tasks going into EDF (earliest deadline first) mode and pre-empting other projects. Then the client can work as it is supposed to and obey project resource shares.

Also, the <rec_half_life_days>X</rec_half_life_days> value is a legacy from the CPU-only project days and doesn't compensate for GPU work.
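
As a rough sketch of what a smaller work cache could look like, assuming the local override file global_prefs_override.xml in the BOINC data directory is used: the two elements below correspond to the "store at least" and "store up to an additional" days-of-work preferences. The values 0.1 and 0 are illustrative assumptions only, not figures from this thread.

```xml
<!-- global_prefs_override.xml (sketch; values are illustrative) -->
<global_preferences>
  <!-- "Store at least X days of work" -->
  <work_buf_min_days>0.1</work_buf_min_days>
  <!-- "Store up to an additional X days of work" -->
  <work_buf_additional_days>0</work_buf_additional_days>
</global_preferences>
```

The same two values can be set from the Manager's computing preferences instead of editing the file by hand.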
ID: 103726
Sandman192

Joined: 28 Aug 19
Posts: 49
United States
Message 104586 - Posted: 17 Jun 2021, 2:41:24 UTC - in response to Message 103726.  
Last modified: 17 Jun 2021, 2:41:38 UTC

If you reduce your work cache size, you will never have tasks going into EDF (earliest deadline first) mode and pre-empting other projects. Then the client can work as it is supposed to and obey project resource shares.

Also, the <rec_half_life_days>X</rec_half_life_days> value is a legacy from the CPU-only project days and doesn't compensate for GPU work.


Reducing cache size does nothing.
Still same results.
ID: 104586

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.