Finish WU before reschedule

Knorr

Joined: 30 Mar 06
Posts: 2
Denmark
Message 3722 - Posted: 30 Mar 2006, 17:59:52 UTC

Hi there...

I'm working on several projects on one computer.

When BOINC is running, it has access to the internet constantly, so I aim to have a small work buffer.

One minor problem though.

When a WU is about 30 min from completion the client asks for new work.

When the new WU has been downloaded the client forces a reschedule of the CPU time.

This results in the client switching project every time.

How (if possible) can I fine tune the client to either:

- Download new work as now, but without switching project

- Wait with the download until a project's work buffer is COMPLETELY empty

?

Currently running LHC and Rosetta.

Rosetta WU takes 2 hr, and LHC about 3 hr at the moment.

I have the switch app at 130 min
Connect to network at 0.1 days

- Knorr

Aurora Borealis
Joined: 8 Jan 06
Posts: 448
Canada
Message 3723 - Posted: 30 Mar 2006, 18:41:14 UTC
Last modified: 30 Mar 2006, 18:41:51 UTC

As far as I know there is no way to override the algorithm used by Boinc.
BOINC was designed to re-evaluate which WU to crunch at the interval set for the connect time, at completion of a WU, and when new work is downloaded. It checks due dates, time to completion, project resource share and short-term debt to make its decision.

Boinc V 7.4.36
Win7 i5 3.33G 4GB NVidia 470

W-K ID 666
Joined: 30 Dec 05
Posts: 459
United Kingdom
Message 3736 - Posted: 1 Apr 2006, 11:10:35 UTC

The BOINC scheduler has to work the way Aurora Borealis explained because some projects require a very short (hours) turn-round. If a host, running such a project, had a very long switch between projects time then deadlines could be missed.

feet1st
Joined: 4 Apr 06
Posts: 12
United States
Message 3769 - Posted: 4 Apr 2006, 22:33:00 UTC - in response to Message 3722.  

How (if possible) can I fine tune the client


I believe you are saying your objective is simply to complete the work unit that's already in progress before switching. Bottom line, BOINC doesn't work that way... unless you only have one project installed with work.

But that's ok, the WUs are made to be interrupted and take the CPU time they get from the scheduling algorithms. You can change your general preferences to "Leave applications in memory while preempted? YES". This will minimize any efficiency that may be lost when a WU is suspended without a recent checkpoint. You can also minimize the effect a bit if you increase your "Switch between applications every... minutes" setting to exceed your WU size preference.

Rosetta allows a project specific preference for your preferred WU size. If you've got high bandwidth available, and want to keep a tidy WU list, perhaps a short 2hr WU is for you. Switch between apps every 120 minutes or a bit more, and you'll be closer or totally completed more frequently. I have mine set to switch every 360min.

Also, every time you "connect to network" (in your case every 2.4hrs, .1 days) the scheduler comes in and makes a new assessment as to who should be running next. So, if you'd increase that value to connect say every .25 or .5 days, then you'd be reducing the number of interruptions from connections, and therefore reducing the number of times the scheduler makes a new assessment.
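
To make the arithmetic above concrete, here is a small sketch in plain Python. It is not part of BOINC, and the function names are made up for illustration. It converts a "connect to network" value in days into hours and into connections per day:

```python
def connect_interval_hours(days: float) -> float:
    """Convert a 'connect to network every N days' value into hours."""
    return days * 24.0


def connects_per_day(days: float) -> float:
    """How many connections (and hence scheduler re-assessments) per day."""
    return 1.0 / days


# Compare the setting from the first post with the two suggested values.
for setting in (0.1, 0.25, 0.5):
    print(f"{setting} days -> every {connect_interval_hours(setting):.1f} h, "
          f"{connects_per_day(setting):.0f} connections per day")
```

Going from 0.1 to 0.5 days cuts the connection-triggered re-assessments from about ten a day to two, assuming nothing else (a finished WU, a download) triggers one in between.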

W-K ID 666
Joined: 30 Dec 05
Posts: 459
United Kingdom
Message 3773 - Posted: 5 Apr 2006, 12:51:33 UTC

If you were to ask the scheduler to check that it is not in the last few minutes of completing a unit, I would agree with you.
I've had a unit interrupted at 99.34% complete, 'To completion' 14s. As that project only gets a 28% resource share, it sat in that condition, in memory, for over 3 hours before completing.

Andy

Knorr
Joined: 30 Mar 06
Posts: 2
Denmark
Message 3774 - Posted: 5 Apr 2006, 12:59:24 UTC - in response to Message 3769.  


But that's ok, the WUs are made to be interrupted and take the CPU time they get from the scheduling algorithms. You can change your general preferences to "Leave applications in memory while preempted? YES". This will minimize any efficiency that may be lost when a WU is suspended without a recent checkpoint. You can also minimize the effect a bit if you increase your "Switch between applications every... minutes" setting to exceed your WU size preference.


But a problem occurs when your computer isn't turned on 24/7.

mFluids' WUs haven't got checkpoints. If you run mFluids with a small share of resources and it gets paused at 90+%, it could take several hours before it gets restarted. And if the computer is turned off before this happens, it starts all over again from 0% when the computer is booted.

I agree that there's not much of a problem IF the computer is on 24/7 and has got enough memory to leave all apps in memory.

- Knorr

Aurora Borealis
Joined: 8 Jan 06
Posts: 448
Canada
Message 3777 - Posted: 5 Apr 2006, 13:27:40 UTC

The amount of RAM is not really a consideration, apart from the minimum that individual projects need. Most of the time the idle project is dumped into virtual memory on your hard drive. I personally have as many as 6 or 7 projects 'in memory' at a time with only 500 MB of RAM. More RAM just saves the swap time.

Boinc V 7.4.36
Win7 i5 3.33G 4GB NVidia 470

tralala
Joined: 6 Apr 06
Posts: 11
Germany
Message 3781 - Posted: 6 Apr 2006, 9:31:26 UTC

I think it is really urgent to allow switching of a WU only after a checkpoint has been reached. As others pointed out, mFluids for example has no checkpoints at the moment, and attribution.cpdn.org checkpoints only once every one to several hours. It also consumes about 500 MB of RAM. Losing several hours of work and paging 500 MB in and out is annoying, and unnecessary if you allow switching only after reaching a checkpoint (and then safely exit the application).

Furthermore, an option to allow switching only after the completion of a WU would be nice. It would give the user more control over what BOINC does and minimize any risk of losing work due to switching.

Metod, S56RKO
Joined: 9 Sep 05
Posts: 128
Slovenia
Message 3800 - Posted: 8 Apr 2006, 12:51:52 UTC - in response to Message 3781.  

I think it is really urgent to allow switching of a WU only after a checkpoint has been reached. As others pointed out, mFluids for example has no checkpoints at the moment, and attribution.cpdn.org checkpoints only once every one to several hours. It also consumes about 500 MB of RAM. Losing several hours of work and paging 500 MB in and out is annoying, and unnecessary if you allow switching only after reaching a checkpoint (and then safely exit the application).

Furthermore, an option to allow switching only after the completion of a WU would be nice. It would give the user more control over what BOINC does and minimize any risk of losing work due to switching.


Or the other way around: project science applications should be making checkpoints regularly. There's even a setting which could be used to indicate the frequency of checkpoints (if feasible for the given project): Write to disk at most every.
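
As a rough illustration of what "checkpoint regularly, but write to disk at most every N seconds" means for an application, here is a minimal sketch in plain Python. It does not use the BOINC API, and the function names, the JSON state file, and the step counter are assumptions made up for the example:

```python
import json
import os
import time


def load_step(state_path: str) -> int:
    """Return the last checkpointed step, or 0 if no checkpoint exists."""
    if not os.path.exists(state_path):
        return 0
    with open(state_path) as f:
        return json.load(f)["step"]


def save_step(state_path: str, step: int) -> None:
    """Write the checkpoint to a temp file, then swap it into place."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp_path, state_path)


def run_with_checkpoints(total_steps: int, state_path: str,
                         min_interval_s: float) -> int:
    """Resume from the last checkpoint; save at most every min_interval_s."""
    step = load_step(state_path)
    last_save = time.monotonic()
    while step < total_steps:
        step += 1  # stand-in for one unit of real science work
        if time.monotonic() - last_save >= min_interval_s:
            save_step(state_path, step)
            last_save = time.monotonic()
    return step
```

An app built this way loses at most the work done since the last save when it is removed from memory. The hard part for real projects is that "one unit of work" may itself take hours.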

Why put your grief on BOINC developers when it's up to project developers to produce decent science code? This reminds me of recent Einstein@Home development, where a skilled user made science app up to 4 times faster than official one - even without laying his hands on source code.
Metod ...

tralala
Joined: 6 Apr 06
Posts: 11
Germany
Message 3801 - Posted: 8 Apr 2006, 13:46:14 UTC - in response to Message 3800.  

Or the other way around: project science applications should be making checkpoints regularly. There's even a setting which could be used to indicate the frequency of checkpoints (if feasible for the given project): Write to disk at most every.

Why put your grief on BOINC developers when it's up to project developers to produce decent science code? This reminds me of recent Einstein@Home development, where a skilled user made science app up to 4 times faster than official one - even without laying his hands on source code.


AFAIK for the Seasonal Attribution Project from CPDN there is no feasible way to increase the number of checkpoints. The climate model they're using is checkpointed every model day, which takes quite long for the high-resolution model used in the seasonal project. Rewriting the model to checkpoint every hour is beyond the financial and personnel resources of this project. They take already-developed climate models as input and tune them for PCs. These climate models are highly complex and it is already a huge task to convert them for PCs. Changing the model itself would only lead to greater instability. It requires several years and dozens of scientists/programmers to write a climate model from scratch.

Science is too complex for every application to checkpoint at convenient time intervals. Furthermore, even with a checkpoint every 10 minutes one might lose up to 10 minutes per hour given the current default switching time. That's about 17%. What a waste! The best solution would be to switch only at checkpoints and only if the time to completion for the current WU is > 1 hour.
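
The worst-case figure can be worked out with a small sketch. It assumes the app is removed from memory at every switch, so everything since the last checkpoint is lost. The function name is made up for illustration and none of this is BOINC code:

```python
def worst_case_loss_fraction(checkpoint_min: float, switch_min: float) -> float:
    """Worst-case share of each time slice lost when the app is preempted.

    Assumes the app is unloaded at every switch, so all work since the
    last checkpoint is thrown away. If the checkpoint interval is longer
    than the slice, the whole slice is lost every time.
    """
    return min(checkpoint_min, switch_min) / switch_min


# Checkpoint every 10 minutes, switch every 60 minutes (the case above).
print(f"{worst_case_loss_fraction(10, 60):.1%}")
# A WU that needs 90 minutes to reach a checkpoint, same 60-minute slice.
print(f"{worst_case_loss_fraction(90, 60):.1%}")
```

The second case is the one where a WU never makes progress: each slice ends before the first checkpoint is written.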

P.S.: I totally agree with you; I too wonder how inefficient some apps are. The speedup in Einstein is really unbelievable and makes one doubt whether any effort was put into optimization in the first place. It seems the attitude is that since BOINC computers are free, there is no need to really bother optimizing the science apps. :-(
However, BOINC itself should operate as optimally as possible, hence the suggestion to switch only on checkpoints and not if the WU is almost finished.
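
The suggested rule is small enough to write down. This is only a sketch of the proposal in this post, not how the BOINC client decides. The function name, its parameters, and the one-hour default are hypothetical:

```python
def may_preempt(at_checkpoint: bool, remaining_s: float,
                min_remaining_s: float = 3600.0) -> bool:
    """Proposed rule: switch away from a WU only if it has just
    checkpointed AND it still needs more than min_remaining_s to finish."""
    return at_checkpoint and remaining_s > min_remaining_s


# A WU 14 seconds from completion would be left to finish.
print(may_preempt(at_checkpoint=True, remaining_s=14))
# A WU with two hours left, sitting on a fresh checkpoint, may be switched out.
print(may_preempt(at_checkpoint=True, remaining_s=7200))
# Mid-computation with no fresh checkpoint: keep running.
print(may_preempt(at_checkpoint=False, remaining_s=7200))
```

A real client would still have to override such a rule when another project's deadline is at risk, which is the concern raised earlier in the thread.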

Keck_Komputers
Joined: 29 Aug 05
Posts: 304
United States
Message 3822 - Posted: 10 Apr 2006, 3:53:42 UTC

JM7 is already on this. The first try did not work correctly and had to be backed out but I'm sure we will get this improvement eventually.
BOINC WIKI

BOINCing since 2002/12/8

tralala
Joined: 6 Apr 06
Posts: 11
Germany
Message 3892 - Posted: 14 Apr 2006, 22:47:00 UTC - in response to Message 3822.  

JM7 is already on this. The first try did not work correctly and had to be backed out but I'm sure we will get this improvement eventually.


Well, it seems it didn't make it into 5.4.X; at least it is not listed in the version history.

Switching only on checkpoints could save a lot of wasted computer time. It's no longer a problem of alpha projects only. Rosetta, for example, nowadays has big WUs which can't checkpoint within an hour and which will be started over and over indefinitely if "Leave app in memory" is not checked.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 3893 - Posted: 14 Apr 2006, 23:29:17 UTC - in response to Message 3892.  

Rosetta, for example, nowadays has big WUs which can't checkpoint within an hour and which will be started over and over indefinitely if "Leave app in memory" is not checked.

So you set your switch between applications time to 70, 80, 90 or more minutes.

tralala
Joined: 6 Apr 06
Posts: 11
Germany
Message 3903 - Posted: 15 Apr 2006, 9:34:29 UTC - in response to Message 3893.  

Rosetta, for example, nowadays has big WUs which can't checkpoint within an hour and which will be started over and over indefinitely if "Leave app in memory" is not checked.

So you set your switch between applications time to 70, 80, 90 or more minutes.


No problem for me, I take care of all my projects/tasks. But it is a big problem for the majority of users, who run BOINC unattended with default settings, and a big waste of valuable computing time for the science.

Metod, S56RKO
Joined: 9 Sep 05
Posts: 128
Slovenia
Message 3924 - Posted: 16 Apr 2006, 7:24:28 UTC - in response to Message 3903.  
Last modified: 16 Apr 2006, 7:27:22 UTC

Rosetta, for example, nowadays has big WUs which can't checkpoint within an hour and which will be started over and over indefinitely if "Leave app in memory" is not checked.

So you set your switch between applications time to 70, 80, 90 or more minutes.


No problem for me, I take care of all my projects/tasks. But it is a big problem for the majority of users, who run BOINC unattended with default settings, and a big waste of valuable computing time for the science.


My guess is that if science project developers wanted to make their scientific code BOINC-friendly, they would put in code to do the check points. If they don't do it, this could mean several things. Right now I can think of:

  • it really is impossible (I can hardly believe this)
  • nobody actually thought about it, as they only run their own project and the scheduler doesn't stop the apps
  • they just won't bother


In any case, if I wasn't secretly in love with that particular project, I'd just stay away from them. CPU time on my machines is a precious resource and I won't let some project scientists throw it away with BOINC-unfriendly apps.

[addendum]
Anyhow, I wouldn't expect BOINC CC developers to put in special hooks just to save faces of those science project developers. They've got the toolkit, they should produce decent apps that use up all the available hooks and tools.


Metod ...

Augustine
Joined: 10 Mar 06
Posts: 73
Message 4014 - Posted: 21 Apr 2006, 15:55:23 UTC - in response to Message 3893.  
Last modified: 21 Apr 2006, 15:55:36 UTC


So you set your switch between applications time to 70, 80, 90 or more minutes.

Which may work on single-processor systems, but not on multi-processor or multi-core systems, where a processor may finish a WU, sometimes many times an hour for short-lived WUs from XtremLab, Leiden, HashClash, Tanpaku, etc., and cause rescheduling on the other processors or cores.

FWIW, I suggested here to move scheduling from the computer to each processor or core.

HTH

behemoth
Joined: 6 May 06
Posts: 6
United Kingdom
Message 4218 - Posted: 6 May 2006, 23:26:10 UTC - in response to Message 3773.  

If you were to ask the scheduler to check that it is not in the last few minutes of completing a unit, I would agree with you.
I've had a unit interrupted at 99.34% complete, 'To completion' 14s. As that project only gets a 28% resource share, it sat in that condition, in memory, for over 3 hours before completing.

Andy


This very situation happens to my WUs too. Some workunits stop with 8 seconds to go till completion, others with about 5 minutes.

a) Is there some sort of new setting being made, called 'endgame leeway', that the user could configure so that the BOINC client completes a work unit in these circumstances?

Another thing is that completed work units are not automatically uploaded immediately upon completion. My comps are on and connected 24/7.

b) There seems to me a need for a setting to allow for the upload of work units immediately that they are completed.

I run 12 projects - some on 1% some on 50%.

Aurora Borealis
Joined: 8 Jan 06
Posts: 448
Canada
Message 4220 - Posted: 6 May 2006, 23:52:46 UTC - in response to Message 4218.  


This very situation happens to my WUs too. Some workunits stop with 8 seconds to go till completion, others with about 5 minutes.

a) Is there some sort of new setting being made, called 'endgame leeway', that the user could configure so that the BOINC client completes a work unit in these circumstances?

I believe this is on the todo list.

Another thing is that completed work units are not automatically uploaded immediately upon completion. My comps are on and connected 24/7.

b) There seems to me a need for a setting to allow for the upload of work units immediately that they are completed.

I run 12 projects - some on 1% some on 50%.


Boinc is designed to do the upload as a two stage process. This is to reduce load on the project servers. The uploading occurs immediately on completion of the work. The reporting is done separately when the project is next contacted.
Please read for a fuller explanation of the Reporting Process
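
The two stages can be pictured with a toy model. This is an illustrative sketch only, not the client's actual code. The class and method names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class ResultQueue:
    """Toy model: upload output files at once, report them in batches later."""
    uploaded: list = field(default_factory=list)  # files sent, not yet reported
    reported: list = field(default_factory=list)  # acknowledged by the scheduler

    def finish(self, name: str) -> None:
        # Stage 1: output files are uploaded as soon as the task completes.
        self.uploaded.append(name)

    def contact_scheduler(self) -> int:
        # Stage 2: a single scheduler request reports every pending result.
        count = len(self.uploaded)
        self.reported.extend(self.uploaded)
        self.uploaded.clear()
        return count


queue = ResultQueue()
queue.finish("wu_a")
queue.finish("wu_b")
print("results reported in one request:", queue.contact_scheduler())
```

Batching the reports means one scheduler request can cover several finished results, which is where the reduced server load comes from.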

Boinc V 7.4.36
Win7 i5 3.33G 4GB NVidia 470

behemoth
Joined: 6 May 06
Posts: 6
United Kingdom
Message 4229 - Posted: 7 May 2006, 11:19:04 UTC

Is it possible to get sight of the todo list?


Regarding the upload of data.
I have for example a completed work unit in my BOINC client. Looking through the messages tab there _is_ a message saying it has been uploaded.

07/05/2006 09:57:53|SZTAKI Desktop Grid|Started upload of 2c2d0a79-89c4-49be-ac4b-5340c28df44d_1_0
07/05/2006 09:57:56|SZTAKI Desktop Grid|Finished upload of 2c2d0a79-89c4-49be-ac4b-5340c28df44d_1_0
07/05/2006 09:57:56|SZTAKI Desktop Grid|Throughput 11029 bytes/sec


I guess this is a connection to Berkeley (it's none too clear on the wiki or in the app where this connection is made to, and I really can't be bothered to be arsing around with netstat at just the right moment to ascertain this simple fact)? It would be nice to have a more informative message in the messages tab, something like 'uploading results file to dodar.com'.

Additionally, I cannot see the point in not uploading the other half of the unit immediately to the project's server. In my particular setup it is unlikely that there is any benefit in waiting until more of the same project's work units are completed within a reasonable time; it is far more likely that other projects' workunits will complete. The load on my internet connection and on the destination servers is the same 99% of the time.

A full 2 hours plus change and still the result had not cleared from my client so I clicked update on that project:

07/05/2006 12:13:13|SZTAKI Desktop Grid|Sending scheduler request to http://szdg.lpds.sztaki.hu/szdg/cgi-bin/scheduler
07/05/2006 12:13:13|SZTAKI Desktop Grid|Reason: Requested by user
07/05/2006 12:13:13|SZTAKI Desktop Grid|Reporting 1 results
07/05/2006 12:13:17|SZTAKI Desktop Grid|Scheduler request to http://szdg.lpds.sztaki.hu/szdg/cgi-bin/scheduler succeeded

A slightly more informative message, but still not very informative.





Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.