BOINC 6.12.26 and CUDA Task Problem

Alan Jordan

Joined: 10 May 11
Posts: 11
United Kingdom
Message 37755 - Posted: 11 May 2011, 7:18:32 UTC

I recently installed BOINC 6.12.26, following advice after problems with BOINC 6.10.59, the standard version for Ubuntu Linux 11.04 (see my Thread 'BOINC Manager wastes available resources'). Unfortunately, I'm a novice when it comes to the more technical aspects of Linux, and in installing BOINC 6.12.26 I managed to lose my previous Task configuration, so I had to re-link to all my Projects and download new Tasks. To date the problem configuration described in my previous Thread has not recurred, so I am not sure whether BOINC 6.12.26 would handle it or not.

So far I would say that the Task Scheduling in BOINC 6.12.26 is improved. HOWEVER, there appears to be an issue with Einstein@Home Tasks labelled 'BRP3cuda32nv270', which want to run on my NVidia 9600GT GPU, using '0.2 CPUs + 1 NVIDIA GPUs'. Currently I have 5 such Tasks queued 'Waiting to run', ALL showing 'Waiting for GPU Memory' (I have not seen the latter Message before, so I assume this is a new feature of BOINC 6.12.26). Periodically the 'Waiting for GPU Memory' Messages disappear and the queued Tasks attempt to run, but ALL fail after a few seconds and return to the 'Waiting for GPU Memory' status. The Event Log shows Messages saying that the Tasks exited with Zero Status and no Finished File, and that if this happens repeatedly you may need to Reset the Project.

I have seen such Messages before with such Einstein@Home Tasks, and in my experience resetting the Project has NO effect! HOWEVER, going by my experience with BOINC 6.10.59, I would expect 2 or 3 of these Tasks to eventually complete normally, and the other 2 or 3 to eventually abort; for ALL 5 such Tasks to remain queued is unprecedented. Since I have made no other changes to my Ubuntu Linux configuration, I can only assume that this is a problem with BOINC 6.12.26; any suggestions?

Alan Jordan.
ID: 37755 · Report as offensive
Jord
Volunteer tester
Help desk expert

Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 37756 - Posted: 11 May 2011, 10:19:50 UTC - in response to Message 37755.  

GPU memory problems cannot be fixed with a project reset or any other tinkering with BOINC; you will have to restart the computer. Any videocard memory trouble needs a full power cycle. I suspect that there's something stuck in video memory, which is taking up space. That is why BOINC gives you the "Waiting for GPU memory" message.
ID: 37756 · Report as offensive
Alan Jordan

Joined: 10 May 11
Posts: 11
United Kingdom
Message 37760 - Posted: 11 May 2011, 11:44:11 UTC - in response to Message 37756.  

Many thanks for your very helpful comments and suggestion. I have tried restarting my PC several times, and it appears that my traditional combination of having Firefox and Thunderbird running in separate Ubuntu Workspaces, usually on the right-hand of my two Monitors, OFTEN, but NOT consistently, causes the problem. This inconsistency prompts two thoughts:-

1. Could this explain why some Einstein@Home Tasks labelled 'BRP3cuda32nv270' aborted under the old BOINC 6.10.59, whereas others completed OK?

2. Would more GPU Memory help? I am planning to replace my Desktop PC in the reasonably near future, and it is likely that the new PC would have a Graphics Adapter with more GPU Memory.

Alan Jordan.

PS - apologies to the Moderator for the 'typo' in the version number of BOINC!
ID: 37760 · Report as offensive
Jord
Volunteer tester
Help desk expert

Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 37761 - Posted: 11 May 2011, 12:18:01 UTC - in response to Message 37760.  

1. Could this explain why some Einstein@Home Tasks labelled 'BRP3cuda32nv270' aborted under the old BOINC 6.10.59, whereas others completed OK?

How did they abort? By running them (calculation error), or with manual intervention (User abort)?

There's a myriad of things that can go wrong when doing calculations this intense on different computers. However, project-specific questions are best asked at the project, since it's their science application running under BOINC. They ought to have an inkling of why their app breaks on some tasks and not on others.

2. Would more GPU Memory help? I am planning to replace my Desktop PC in the reasonably near future, and it is likely that the new PC would have a Graphics Adapter with more GPU Memory.

Only in the sense that more things can get stuck in memory before you run into the same problems. ;)

Also, there's a chance that you buy the latest newest newfanglest GPU there is and that your Linux doesn't have drivers for it for the next 7 fortnights. ;)

So choose wisely.

In short, the answer is no. It doesn't help as such.
ID: 37761 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 37762 - Posted: 11 May 2011, 13:33:46 UTC - in response to Message 37761.  

There's also 'abort because not started before deadline', which has happened at Einstein with some over-filled caches recently. That's an automatic BOINC (core client) action, although it shows as 'aborted by user' on the website.
ID: 37762 · Report as offensive
Alan Jordan

Joined: 10 May 11
Posts: 11
United Kingdom
Message 37764 - Posted: 11 May 2011, 15:55:13 UTC - in response to Message 37761.  

Einstein@Home CUDA Tasks aborting - I've probably used the wrong term here. Specifically these Tasks did not end because I manually aborted them, nor were they an 'abort because not started before deadline'.

Typically I would see such Tasks stuck in a form of Loop where the Elapsed Time indication would start at some value, increase for a few seconds, then change back to the first value, and repeat continuously. Sometimes such Tasks would suddenly 'break out' of the Loop and complete normally; more often I would find that they had failed after some time in the Loop with a 'Calculation Error'. I would say that under BOINC 6.10.59 about 40% of my Einstein@Home CUDA Tasks ended that way!

The reason I asked this question was that I have seen such Tasks behaving the same way under BOINC 6.12.26, but now they stop after a few minutes showing 'Waiting for GPU Memory'. Given the inconsistency of my GPU Memory problem (my PC has just run for 5 hours in my normal configuration without problems), I wondered whether a lack of GPU Memory was a situation that BOINC 6.10.59 couldn't handle correctly, so that Tasks failed with a 'Calculation Error', whereas BOINC 6.12.26 handles it by making the Tasks wait?

More GPU Memory - I appreciate the point about taking longer before failing. I have no intention of buying the 'latest newest newfanglest GPU' for which there are no Linux drivers. It's just that even a fairly low spec. Graphics Card these days would have 1GB of GPU Memory, whereas my existing Nvidia 9600GT only has 512 MB.

Alan Jordan.
ID: 37764 · Report as offensive
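
For anyone wanting to see how much video memory is actually in use when the 'Waiting for GPU Memory' status appears, here is a minimal Python sketch. It assumes the nvidia-smi tool shipped with the NVIDIA driver accepts the --query-gpu and --format options, which recent releases do but which may not be available for older drivers or cards such as the 9600GT. The function names are made up for illustration, and this is not how BOINC itself measures GPU memory.

```python
import subprocess

# Assumed query; older drivers or cards may not support these options.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(text):
    """Parse lines such as '498, 512' into (used_mib, total_mib) tuples."""
    result = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2:
            continue  # not a 'used, total' line
        try:
            used, total = int(parts[0]), int(parts[1])
        except ValueError:
            continue  # e.g. a field the driver does not report
        result.append((used, total))
    return result


def free_mib(text):
    """Return the free memory in MiB for each GPU found in the text."""
    return [total - used for used, total in parse_gpu_memory(text)]


def query_gpu_memory():
    """Run nvidia-smi (hypothetical helper) and parse its output."""
    out = subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout
    return parse_gpu_memory(out)
```

Calling query_gpu_memory() once with Firefox and Thunderbird open and once with them closed would show whether the desktop applications are what is using up the 512 MB on the card.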


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.