GPU restarts from zero

Message boards : Questions and problems : GPU restarts from zero

Pierre Farrier-Wade
Joined: 20 May 15
Posts: 2
South Africa
Message 62260 - Posted: 20 May 2015, 9:11:42 UTC

Hello

I've been running BOINC for quite some time on various machines here.
My work PC runs 4 tasks at once, plus the NVIDIA GPU task.

The standard tasks run all the time (i.e. even while the PC is in use); the GPU task doesn't, because that slows things down too much.
Today it was running POEM tasks, and now Milkyway tasks.

Now, the problem is that when the GPU task suspends because the PC is in use, then restarts after the idle delay, the work resets and starts again from zero (even if it has been busy for 1 or 2 hours).
I aborted the POEM task because it won't meet the deadline at this rate.
The Milkyway tasks may finish in time as they are 37:50.

BUT MY QUESTION IS:
Why is this happening? Any settings I need to change?
It's really pointless having the computer process away only to throw all that work away and never finish.
I can't say I noticed this happening before. Could the version be causing the issue?

Thank you, I really appreciate your time helping me with this.
ID: 62260
Pierre Farrier-Wade
Joined: 20 May 15
Posts: 2
South Africa
Message 62261 - Posted: 20 May 2015, 10:10:16 UTC

UPDATE:
Milkyway tasks are running fine after all.
They roll back about 2 minutes, then carry on.

So it looks like a POEM issue.
ID: 62261
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 62262 - Posted: 20 May 2015, 10:39:29 UTC

Sounds like a timing issue with checkpoints.

Project applications (both POEM and Milkyway) periodically write a 'checkpoint' file - a summary of everything done so far. They need to read this file back in every time they restart, to be able to carry on where they left off: without a checkpoint, they wind back to the beginning.

In the early stages of a run, before the first checkpoint, BOINC displays an estimate of pseudo-progress, but the application hasn't actually done enough real work yet to create a checkpoint and restart file. Once Milkyway has wound back to zero displayed progress and started again on the real work, a checkpoint will be created and progress should continue from there. I imagine POEM will do the same, but may take longer to reach the first checkpoint.
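
As a rough illustration of the checkpoint mechanism described above, here is a generic Python sketch. It is not the BOINC API and not the actual POEM or Milkyway code; the function names, the JSON file format and the `stop_after` parameter are all made up for the example.

```python
import json
import os


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint file exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        # No checkpoint yet: the task has to start from the beginning.
        return {"step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it over the old checkpoint,
    so an interruption never leaves a half-written file behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, n_steps, checkpoint_every, stop_after=None):
    """Do up to n_steps units of work, resuming from the checkpoint at `path`.

    `stop_after` (hypothetical) simulates the task being suspended after that
    many steps in this session. Returns the in-memory state when the session
    ended; only what was checkpointed survives to the next session.
    """
    state = load_checkpoint(path)
    done_this_session = 0
    while state["step"] < n_steps:
        if stop_after is not None and done_this_session >= stop_after:
            break  # suspended: work since the last checkpoint is lost
        state["total"] += state["step"]  # stand-in for the real computation
        state["step"] += 1
        done_this_session += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state
```

In this sketch, a session that is interrupted before the first checkpoint (say `run(path, 100, 10, stop_after=5)`) leaves nothing on disk, so the next session starts again from step 0. A session interrupted after 25 steps restarts from step 20, the last checkpoint written. That matches the behaviour in this thread: a task with a long gap before its first checkpoint loses everything when it is suspended early.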

If you can, adjust your timing preferences so that the GPU tasks aren't interrupted so often - they will make better progress then.

Interrupting a CPU task isn't as bad, because you can ask for 'Leave applications in memory' when paused - they can then restart anywhere, without needing the checkpoint file. That's because CPU tasks use the computer's main memory - there's usually plenty of that, and it can be swapped to disk if needed. GPU tasks use the internal memory on the video card: that is more limited, and may be needed by your real work, so it's cleared every time the GPU task pauses, whatever your 'leave in memory' setting.
ID: 62262

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.