Client task scheduling

Message boards : BOINC client : Client task scheduling

Paul Schauble
Joined: 29 Aug 05
Posts: 68
Message 40428 - Posted: 29 Sep 2011, 2:46:26 UTC

I'm currently running BOINC 6.12.34. I am subscribed to 4 projects.

I've recently been watching how BOINC manages tasks. I've read the discussion of project debt and scheduling for this version and I'm still puzzled by how it works.

All of the discussions of task scheduling talk about the normal mode being to use debt to select a project to run. Nothing talks about how tasks are selected once a project is selected. It seems obvious that once you select a project you should run the available task with the earliest deadline, but that's not the way it works.

I currently have 22 work units for Cosmology@Home. More than half (13) are partially completed. The last time BOINC switched to the project, instead of picking up one of the partially finished units, it started a new one that didn't have the earliest deadline.

Can someone please explain how tasks are selected to run within a project?

Thanks
ID: 40428
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 40433 - Posted: 29 Sep 2011, 6:38:29 UTC - in response to Message 40428.  

I'll answer that with a quote of how John McLeod VII over at Seti explained how BOINC follows the FIFO track at all times that it can:

The BOINC CPU scheduler can be in either of two modes at any given time for each processor independently (OK, this slides over multi-CPU tasks a bit; it is covered in the code, though).

At each task start or task switch opportunity, the CPU scheduler estimates the completion times of all tasks in the system and compares each one to 90% of the time from now to the computation deadline (which is earlier than the report deadline that shows in the UI - more on that in a bit). If a task's estimated completion time is >= 90% of the time remaining until its computation deadline, that task is marked as requiring high priority.

For each high priority task in deadline order from earliest to latest, assign the task to a CPU (or GPU as the case may be) until either the supply of CPUs (or GPUs) is exhausted or all of the high priority tasks are assigned to a CPU (or GPU).

For the remaining CPUs and GPUs, assign tasks in Round Robin order by project and FIFO within each project.
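The two modes described above can be sketched roughly as follows. This is only an illustration of the description in this post, not the client's actual code: the Task fields, the function names and the project_order argument are all made up for the example, and each task's estimate is treated in isolation, whereas the real client estimates completion times for all tasks together.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    project: str
    arrival: int                 # order received; lower means earlier (FIFO key)
    est_remaining: float         # estimated hours of computation still needed
    computation_deadline: float  # hours from now until the computation deadline


def needs_high_priority(task, threshold=0.9):
    """Flag a task whose estimate is >= 90% of the time left to its deadline."""
    return task.est_remaining >= threshold * task.computation_deadline


def schedule(tasks, n_cpus, project_order):
    """Return the names of the tasks to run, one per CPU (hypothetical sketch)."""
    # Mode 1: high-priority tasks, earliest deadline first.
    urgent = sorted((t for t in tasks if needs_high_priority(t)),
                    key=lambda t: t.computation_deadline)
    chosen = urgent[:n_cpus]

    # Mode 2: remaining CPUs go round robin by project, FIFO within a project.
    queues = {p: sorted((t for t in tasks
                         if t.project == p and t not in chosen),
                        key=lambda t: t.arrival)
              for p in project_order}
    while len(chosen) < n_cpus and any(queues.values()):
        for p in project_order:
            if len(chosen) < n_cpus and queues[p]:
                chosen.append(queues[p].pop(0))
    return [t.name for t in chosen]


tasks = [
    Task("cosmo_old", "Cosmology", 1, 10.0, 100.0),
    Task("cosmo_new", "Cosmology", 2, 10.0, 11.0),
    Task("abc_1", "ABC", 3, 5.0, 200.0),
]
print(schedule(tasks, 2, ["Cosmology", "ABC"]))
```

In this made-up example the later arrival cosmo_new is picked first because it is the only task over the 90% mark; the remaining CPU then goes to cosmo_old by FIFO. That is one way a task that arrived later can run ahead of one that arrived earlier.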

The two different operational modes do several things:

1) Meets all deadlines if it is at all possible to do so.
2) Ensures that long running tasks with distant deadlines are not starved.
3) Cycles through the different projects in what is hopefully an interesting order.

The computation deadline is the report deadline - (task switch interval + Connect Every X days + possibly 1 day (version dependent, removed in later versions)). This ensures that the task should be completed early enough to be updated and reported during a connection that falls between the completion of the task and the report date.
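As a worked example of that subtraction, here is a minimal sketch. The function and parameter names are invented for illustration, and the sample values (a 60-minute switch interval, a half-day connect interval) are arbitrary rather than anything taken from a real host.

```python
from datetime import datetime, timedelta


def computation_deadline(report_deadline, task_switch_interval,
                         connect_interval, extra_day=False):
    """Report deadline minus the safety margin described above."""
    margin = task_switch_interval + connect_interval
    if extra_day:
        # The additional day is version dependent and was removed later.
        margin += timedelta(days=1)
    return report_deadline - margin


report = datetime(2011, 10, 15, 12, 0)
deadline = computation_deadline(report,
                                task_switch_interval=timedelta(minutes=60),
                                connect_interval=timedelta(days=0.5))
print(deadline)  # 2011-10-14 23:00:00
```

So with these sample settings a task due to be reported at noon on the 15th has to be computed by 23:00 on the 14th, and a full day earlier than that on versions that still add the extra day.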

The estimation takes into account factors such as the fraction of the time BOINC gets to do computation and the resource shares of all projects with tasks on the computer, among other things. This leads to the non-obvious outcome that projects with a low resource share that actually get to download work typically get a lot of high priority CPU time in order to complete by the deadline.
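To see why a low resource share pushes tasks into high priority, here is a deliberately simplified model. It is an assumption made for illustration, not the client's real formula: it just stretches the remaining CPU time by the fraction of time BOINC computes and by the project's share of the total. All names are hypothetical.

```python
def estimated_wall_hours(cpu_hours_remaining, on_fraction,
                         project_share, total_share):
    """Rough wall-clock estimate under a simplified, assumed model."""
    if on_fraction <= 0 or project_share <= 0 or total_share <= 0:
        raise ValueError("fractions and shares must be positive")
    share_fraction = project_share / total_share
    return cpu_hours_remaining / (on_fraction * share_fraction)


# Same 10 CPU hours of work, BOINC computing half of the time.
print(estimated_wall_hours(10, 0.5, 100, 400))  # 80.0
print(estimated_wall_hours(10, 0.5, 25, 400))   # 320.0
```

Under this model the project with a share of 25 out of 400 needs four times as long on the wall clock for the same work, so its tasks reach the 90% mark much sooner and are more likely to be run in high priority.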



ID: 40433
Paul Schauble
Joined: 29 Aug 05
Posts: 68
Message 40440 - Posted: 29 Sep 2011, 16:14:27 UTC - in response to Message 40435.  

My problem is that in this case it isn't completing all the work units.

My machine was shut off for several days. As a result, in Cosmology (long work units, all about the same length), ordered by deadline, I have 7 work units, then a gap of several days, then more work units. It appears that the early WUs will not finish by deadline.

At this point, I have BOINC running 4 high priority WUs from Cosmology, but these are NOT 4 units from the early group. BOINC has run and completed units from the late group and even downloaded additional units, but it seemingly will not run any of the early units.

I've finally shut off downloading; more units have finished, but not from the early group.

I'd offer to volunteer to work on the scheduler, but I understand it is rewritten in the next version.
ID: 40440
AmigaForever
Joined: 14 Jun 11
Posts: 46
Germany
Message 40441 - Posted: 29 Sep 2011, 16:35:13 UTC - in response to Message 40435.  

But when 13 tasks from 1 project have been started and the scheduler then gives the CPU to a task that hasn't been started, it's clear the scheduler is not following the FIFO rule. If 13 tasks are already started then, according to John, they must have started because they were in first. If they were in first, then why doesn't the scheduler give the CPU back to one of them? Instead it starts a task that hasn't started because, according to John, it came in later.


I am not deep into this issue, but I have noticed some seemingly weird behavior of the scheduler myself. As far as I can tell, it is not just the FIFO rule; the scheduler also takes the (estimated) time needed to complete a task and compares it to that of other tasks (of the same project, of course). So if a partially completed task needs only a little time to complete, and a new task would otherwise get close to or even past its deadline, the new task gets CPU time so that all tasks can complete within their deadlines.

Well, that is my theory, at least... but I think it is reasonable and would be practical, too.

And anyway, it seems to work so far.
ID: 40441
AmigaForever
Joined: 14 Jun 11
Posts: 46
Germany
Message 40442 - Posted: 29 Sep 2011, 16:38:03 UTC - in response to Message 40440.  

My problem is that in this case it isn't completing all the work units.

My machine was shut off for several days. As a result, in Cosmology (long work units, all about the same length), ordered by deadline, I have 7 work units, then a gap of several days, then more work units. It appears that the early WUs will not finish by deadline.

At this point, I have BOINC running 4 high priority WUs from Cosmology, but these are NOT 4 units from the early group. BOINC has run and completed units from the late group and even downloaded additional units, but it seemingly will not run any of the early units.

I've finally shut off downloading; more units have finished, but not from the early group.



That would be weird indeed. Please let it run to completion and tell us how it finished. I would be really interested; maybe it offers some insight into how the scheduler works.
ID: 40442
whynot
Joined: 8 May 10
Posts: 89
Ukraine
Message 40502 - Posted: 1 Oct 2011, 14:58:04 UTC - in response to Message 40433.  


3) Cycles through the different projects in what is hopefully an interesting order.


May I hear more about 'interesting order', please?

I'm counting for science,
points just make me sick.
ID: 40502
Claggy
Joined: 23 Apr 07
Posts: 1112
United Kingdom
Message 40504 - Posted: 1 Oct 2011, 20:31:08 UTC - in response to Message 40502.  


3) Cycles through the different projects in what is hopefully an interesting order.


May I hear more about 'interesting order', please?

The interesting order depends on work availability at the attached projects.

Claggy
ID: 40504
whynot
Joined: 8 May 10
Posts: 89
Ukraine
Message 40606 - Posted: 8 Oct 2011, 15:16:27 UTC - in response to Message 40504.  


The interesting order depends on work availability at the attached projects.


Does that mean the client has some paranormal wisdom about WU availability in the future? Or not? I'm confused.

And a bit of observation on FIFO. Before PG challenges I set every attached project to nomorework except SZTAKI, which has four-week deadlines. At a challenge start I suspend it, and its WUs are eventually finished after the challenge ends. This time, three days before the Equinox, SZTAKI stopped distributing work, so I resumed ABC in order to keep the cores busy. ABC has two-week deadlines and WOO-RLLs are quite time consuming, so I decided that as soon as I saw that a started WU wouldn't finish before the challenge end, I could safely resume ABC.

So, within 48 hours of the bottom deadline, the client has three cores and three days of work. What could possibly go wrong? The client starts to empty the queue from the end. That was scary. Within eight hours of the top deadline, three hours of work hadn't even been touched. Those three hours were finished and uploaded; however, they weren't reported. I had to manually report them two minutes after the deadline. Hopefully they weren't sent to someone else in that time.

In short, I fail to see any FIFO.

ID: 40606
AmigaForever
Joined: 14 Jun 11
Posts: 46
Germany
Message 40612 - Posted: 9 Oct 2011, 17:30:58 UTC - in response to Message 40608.  

Those three hours were finished and uploaded; however, they weren't reported. I had to manually report them two minutes after the deadline. Hopefully they weren't sent to someone else in that time.


That sucks. Use the report_results_immediately option in a cc_config.xml file.


That is a good point to bring up - I never understood what the reporting feature does. Maybe someone can explain this to me, because I always assumed that when results are uploaded the project is notified automatically (shouldn't it always be this way?).

So what's the difference anyway? Why the need to report separately? Who does this report go to?


And moreover, why is there even a report_results_immediately option for this? It should always be done immediately by default (aka hardcoded), right?

Thanks guys.....
ID: 40612
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 40613 - Posted: 9 Oct 2011, 19:56:36 UTC - in response to Message 40612.  

Uploading is just moving data from your hard drive to a hard drive on the server.
Reporting accesses the database, and it must be noted that reporting one task takes as much overhead on the database server as reporting ten tasks or one hundred tasks. So projects and BOINC prefer it if you report tasks in bunches, to keep things nice and tidy.

The RRI (report results immediately) option is there to allow reports to be done immediately on projects like WCG which make daily backups of their data. That way you have a good chance that your reported tasks were in before the backup.

Other projects don't like RRI that much. Einstein and CPDN have been reported as disliking RRI, whereby you can lose data.

So use at your own discretion.
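For anyone who wants to try it, here is a small sketch that builds such a file. As far as I know the option is spelled report_results_immediately (hence "RRI") and sits inside an options element; treat that, and the helper name build_cc_config, as assumptions and check the client configuration documentation for your BOINC version before relying on it.

```python
import xml.etree.ElementTree as ET


def build_cc_config(report_immediately):
    """Return cc_config.xml content with the RRI flag set to 1 or 0."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    flag = ET.SubElement(options, "report_results_immediately")
    flag.text = "1" if report_immediately else "0"
    return ET.tostring(root, encoding="unicode")


print(build_cc_config(True))
```

The printed text would be saved as cc_config.xml in the BOINC data directory, after which the client has to re-read its config file or be restarted for the change to take effect.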
ID: 40613
SekeRob2
Joined: 6 Jul 10
Posts: 585
Italy
Message 40616 - Posted: 10 Oct 2011, 9:12:41 UTC - in response to Message 40613.  

"Daily Backups"? Maybe a blond moment. No, the RRI was [reluctantly activated again] at the request of WCG for large contributors who re-image daily [DeepFreeze], regardless of whether work is finished or not... call it a Daily Restore [to virgin set-up state] ;-).
ID: 40616
AmigaForever
Joined: 14 Jun 11
Posts: 46
Germany
Message 40617 - Posted: 10 Oct 2011, 13:21:34 UTC - in response to Message 40613.  

Uploading is just moving data from your hard drive to a hard drive on the server.
Reporting accesses the database, and it must be noted that reporting one task takes as much overhead on the database server as reporting ten tasks or one hundred tasks. So projects and BOINC prefer it if you report tasks in bunches, to keep things nice and tidy.

The RRI option is there to allow reports to be done immediately on projects like WCG which make daily backups of their data. That way you have a good chance that your reported tasks were in before the backup.

Other projects don't like RRI that much. Einstein and CPDN have been reported as disliking RRI, whereby you can lose data.

So use at your own discretion.


Thanks Ageless, that really cleared things up.
:)
ID: 40617
AmigaForever
Joined: 14 Jun 11
Posts: 46
Germany
Message 40618 - Posted: 10 Oct 2011, 13:28:46 UTC - in response to Message 40608.  

Those three hours were finished and uploaded; however, they weren't reported. I had to manually report them two minutes after the deadline. Hopefully they weren't sent to someone else in that time.


I think you are on the safe side - AFAIK most projects do not enforce their deadlines strictly but allow a bit of extra time for exactly those cases.

Sorry, I forgot to mention that in my earlier post.
ID: 40618
SekeRob2
Joined: 6 Jul 10
Posts: 585
Italy
Message 40622 - Posted: 10 Oct 2011, 14:28:12 UTC - in response to Message 40618.  

Those three hours were finished and uploaded; however, they weren't reported. I had to manually report them two minutes after the deadline. Hopefully they weren't sent to someone else in that time.


I think you are on the safe side - AFAIK most projects do not enforce their deadlines strictly but allow a bit of extra time for exactly those cases.

Sorry, I forgot to mention that in my earlier post.

More accurately, projects (WCG at least) do make a replacement task immediately ready in the distributor, and if the No Reply still shows up, a message is sent to the repair men not to start the task, i.e. to abort it. If the No Reply shows up before someone fetches the repair task, it will be changed to 'Other' status and withdrawn. Some members will see that and come to the forums to ask what that was about :D

It's rather odd that the client did not report immediately of its own volition with that deadline about to be exceeded. Maybe that could get incorporated into the coming client, if it isn't already.

--//--
ID: 40622

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.