possible sched bug: einstein needs 1 cpu + 1 gpu

Message boards : Questions and problems : possible sched bug: einstein needs 1 cpu + 1 gpu

Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 31904 - Posted: 2 Apr 2010, 2:26:03 UTC
Last modified: 2 Apr 2010, 2:30:59 UTC

On a quad-core system with 3 CPU tasks running and 2 Collatz tasks (0.18 CPU + 1 NVIDIA GPU each), I decided to suspend one of the Collatz tasks so that the Einstein task could start up, finish, and get out the door. It had been waiting for a free GPU. Since I already had 1 CPU free, I assumed that the newly freed NVIDIA GPU and the unused CPU would allow Einstein to start up. It didn't happen, and after about 15 minutes of waiting I closed BOINC and restarted it. That worked.

So counting my CPUs: even with one supposedly free, part of it (0.36 exactly) was probably running the two Collatz tasks. Suspending one of them freed up a full GPU, but it seems that only 0.82 of a CPU was really available for Einstein, so it would not start. Restarting BOINC must have added the same numbers up in a different order, thus seemingly breaking symmetry.
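
A rough sketch of that arithmetic, assuming the client simply sums each running task's fractional CPU reservation against the 4 cores. This is my guess at the bookkeeping, not BOINC's actual scheduler code; `free_cpus` and the constants are made-up names for illustration.

```python
# Hypothetical bookkeeping only; not taken from the BOINC client source.
NCPUS = 4  # quad-core host


def free_cpus(reservations):
    """Return the CPUs left after summing every running task's CPU reservation."""
    return NCPUS - sum(reservations)


# 3 CPU tasks plus two Collatz tasks at 0.18 CPU each
before = free_cpus([1.0, 1.0, 1.0, 0.18, 0.18])

# Same host with one Collatz task suspended
after = free_cpus([1.0, 1.0, 1.0, 0.18])

EINSTEIN_CPU = 1.0  # Einstein is shown as "1.0 cpu + 1 nvidia"

print(f"free before: {before:.2f}, free after: {after:.2f}")
print("einstein fits:", after >= EINSTEIN_CPU)
```

Under that assumption the free share after the suspend is still below the full CPU that Einstein asks for, which would explain why it sat there waiting.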

Thanks for looking at my new math.
BOINC 6.10.43
ID: 31904
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15492
Netherlands
Message 31909 - Posted: 2 Apr 2010, 9:40:07 UTC - in response to Message 31904.  

You do know that Einstein's ABP2 is a hybrid app which only does the FFT calculations on the GPU and all the rest on the CPU?
ID: 31909
Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 31910 - Posted: 2 Apr 2010, 15:04:36 UTC
Last modified: 2 Apr 2010, 15:30:25 UTC

Hi Jord

Yeah - I sort of knew that, as BOINC Manager shows me "1.0 cpu + 1 nvidia". However, I suspect there is some type of scheduling problem going on when more than 1 GPU is being used. For example, this Windows 7 system had an idle 9800 GTX+ for some reason and I do not know how to debug it.



So where is the other nvidia gpu? I would think that one of the einstein or one of the seti would be running based on the following "all tasks".





So what is causing the other NVIDIA GPU not to run? It is shown in the messages.



I just closed BOINC and restarted it, and it picked up an Einstein task.



Note that the above shows only 4 tasks running, whereas the one with the unused NVIDIA GPU had 6 tasks running. There are only 4 CPUs on this system and all are allocated. I wonder if that supposedly "not cpu-intensive" FreeHAL is pulling more CPU work than it is supposed to? I just checked again and I picked up two more tasks for a total of 6, including 2 GPU tasks.

ID: 31910
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15492
Netherlands
Message 31911 - Posted: 2 Apr 2010, 15:32:31 UTC - in response to Message 31910.  

As ever, when in doubt about whether or not BOINC does things correctly, run with debug flags and post the resulting logs. Without logs I won't forward this thread to the developers; they won't even try to figure out what did or didn't go wrong.

<cpu_sched_debug>: Problems involving the choice of applications to run.
<cpu_sched_status>: Shows what is running, although it won't tell you why.
<coproc_debug>: Shows details of coprocessor (GPU) scheduling.
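
For reference, a minimal cc_config.xml sketch that switches on those three flags. It assumes the usual layout, with the file saved in the BOINC data directory; if you already have a cc_config.xml with other options, merge the log_flags section into it rather than overwriting the file.

```xml
<cc_config>
  <log_flags>
    <!-- why the client picks the applications it runs -->
    <cpu_sched_debug>1</cpu_sched_debug>
    <!-- what is currently running -->
    <cpu_sched_status>1</cpu_sched_status>
    <!-- coprocessor (GPU) scheduling details -->
    <coproc_debug>1</coproc_debug>
  </log_flags>
</cc_config>
```

After saving it, restart the client (or have it re-read the config file, which the Manager's Advanced menu should let you do) and the extra lines will show up in the Messages tab.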

ID: 31911
Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 31912 - Posted: 2 Apr 2010, 16:00:24 UTC
Last modified: 2 Apr 2010, 16:02:13 UTC

I have not seen this problem before. I only recently started crunching FreeHAL, which does not seem to use the CPU pool. On a system with 1 GPU I typically get 2 tasks beyond the CPU count: one is FreeHAL and the other is a GPU task. On systems with 2 GPUs I see 6 tasks running (4 CPU + 2 GPU) while FreeHAL is not running because it is waiting to upload. I have never seen 7 tasks. Then when FreeHAL gets a slice I still have 6 tasks but have lost one of the others.

I am just guessing that the FreeHAL program is the problem. Possibly the scheduler marks it as non-CPU-intensive when it really is CPU-intensive?
ID: 31912
Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 31913 - Posted: 2 Apr 2010, 16:11:24 UTC
Last modified: 2 Apr 2010, 16:43:19 UTC

I used the term "non cpu" for FreeHAL, but I have only seen that term used on my Linux systems, as shown here:



[EDIT AGAIN]

Not sure who / what is doing this, but when I view all my systems using BoincView, all FreeHAL tasks show up as non-CPU-intensive, not just the ones on Linux.

ID: 31913
Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 31916 - Posted: 2 Apr 2010, 19:12:32 UTC
Last modified: 2 Apr 2010, 19:13:38 UTC

Got 7 tasks running on my Windows 7 system after all. I didn't think it was possible. This is a dual Opteron (4 cores total) with one 9800 GTX (BFG) and one 9800 GTX+ (XFX), although BOINC 6.10.43 thinks they are the same.

Anyway, here are my 7 active tasks:



I can see where scheduling can get confused. Note that einstein (the 1 + 1 guy) is not running anymore.

Supposedly, MilkyWay can run two tasks at once on the same GPU. Unfortunately, I have only single-precision GPUs on this system, though I read that somebody actually got single precision to work on that project.
ID: 31916


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.