Projects with no work

thierry.l (Joined: 18 Sep 07, Posts: 19)
Message 12610 - Posted: 19 Sep 2007, 23:46:18 UTC

Hi,

Sometimes a project gives no work for a long time, for whatever reason (SIMAP, LHC, Proteins, ...). This causes a problem for the other projects, which are distributing WUs correctly, because long_term_debt is normalized: a project with no work gets a large positive long_term_debt that keeps increasing, while the others get a long_term_debt that keeps decreasing until it falls below -cpu_scheduling_period, and they finally stop requesting new jobs.

I suggest limiting long_term_debt for projects that are not runnable():

void CLIENT_STATE::adjust_debts() {
    ...
    // adjust long-term debts
    //
    if (p->runnable() || p->wall_cpu_time_this_debt_interval || ((p->long_term_debt < 0) && p->potentially_runnable())) {
        share_frac = p->resource_share/prrs;
        p->long_term_debt += share_frac*total_wall_cpu_time_this_debt_interval - p->wall_cpu_time_this_debt_interval;
    }
    total_long_term_debt += p->long_term_debt;

Regards

Keck_Komputers (Joined: 29 Aug 05, Posts: 304)
Message 12628 - Posted: 20 Sep 2007, 8:38:05 UTC

There is already a mechanism in place to reduce the amount the LTD changes for projects with no work. While your method may work better, I for one would still like to see the LTD increase for any project that is not being worked on. The later clients also do a better job of getting work from somewhere even when LTD is not balanced.

thierry.l (Joined: 18 Sep 07, Posts: 19)
Message 12631 - Posted: 20 Sep 2007, 10:08:20 UTC - in response to Message 12628.

I still have to reset long_term_debt even on 5.10.20. Anyway, I thought it was a good idea to stabilize long_term_debt around zero, as if those projects were new at restart, or around cpu_scheduling_period, ready to run.

John McLeod VII (Joined: 29 Aug 05, Posts: 147)
Message 12632 - Posted: 20 Sep 2007, 10:43:11 UTC
Last modified: 20 Sep 2007, 10:48:38 UTC

BOINC will not keep work from all projects on the host. Therefore the LTD must be calculated for projects that are not runnable. All of the BOINC versions will get work from projects that have too low an LTD if there is not "enough" work on the system. The meaning of "enough" varies depending on the version. In 5.10.x it means that the specified queue ("Connect every X" plus "Extra Work") is not filled. In earlier versions, it meant either that "Connect every X" was not filled (5.8.x) or that a CPU had run dry (earlier).

[edit - clarification]
Under certain circumstances, BOINC will not make any attempt to keep a task from each project on hand, in particular when a task is in EDF and the queue is full. For example, I have one computer that has been in EDF with a CPDN task for several months and has several months more to go. Any work that is downloaded will make this task even later than it is already going to be. If the not-runnable rule were in place, CPDN would be immediately available for another download and another year of running solo, breaking my resource shares completely.

The best compromise that we have been able to come up with is the current one: if the project has no work on the system, and it is in communications deferral, then the LTD does not increase.

thierry.l (Joined: 18 Sep 07, Posts: 19)
Message 12634 - Posted: 20 Sep 2007, 11:17:43 UTC

Sorry, I made a mistake. I thought that having no active task for a project together with dont_request_more_work had the same effect as suspended_via_gui. I was wrong, and I may be responsible for those negative values. So, in order to suspend a project for any reason, we have to set dont_request_more_work to true until the active task has ended, and then suspend the project (and reset dont_request_more_work).

What is EDF?

Jord (Volunteer tester, Help desk expert; Joined: 29 Aug 05, Posts: 15482)
Message 12635 - Posted: 20 Sep 2007, 11:40:36 UTC - in response to Message 12634.

"What is EDF?"

Earliest Deadline First.

thierry.l (Joined: 18 Sep 07, Posts: 19)
Message 12639 - Posted: 20 Sep 2007, 12:36:43 UTC

Thanks!

I see what you mean: if you have an EDF task running and a project with no work, the long_term_debt of the EDF project will not have a negative value, as it is normalized. But anyway, if there is no work for the second project when the EDF task ends, the first should download another WU, no? And if you have another project with work, the EDF project will have a large negative long_term_debt, so it won't download.

I mean, you may decide on the shares, but if the work distributed by the different projects does not follow these shares, you will have permanent misbehavior. My purpose was only to treat a project with no WUs as having no impact on the others, since it doesn't need my CPU for now (like a suspended project, but keeping an eye open in case it restarts).

thierry.l (Joined: 18 Sep 07, Posts: 19)
Message 20402 - Posted: 22 Sep 2008, 17:38:05 UTC - in response to Message 12639.

Hi,

I finally decided to change the BOINC core in order to manage shares in two ways:

- The current one: share is managed over the very long term. For example: 2 projects on one client with the same share; one gives no WUs for 2 months and then has work to do, so it runs alone for 1 month to catch up with the other. The share is then near 66% - 33% (100 - 0 x 2, 0 - 100 x 1).
- The new one: share is managed between the projects that have WUs. You select a delay (as a parameter), which is the limit after which a project with no work becomes invisible (in resource share). In the previous example, with a delay between 1 week and 2 months, the share is near 83% - 17% (100 - 0 x 2, 50 - 50 x 1); with a delay above 2 months the share is as usual.

A new attribute 'out_of_work_time' is added to the project (and to the .xml too). If a project has no results (even suspended ones) and is suspended, or doesn't need more work, or needs work but doesn't get any, then this attribute is set to 1. Each time debts are adjusted, the wall CPU time is added to it. If the attribute exceeds the delay, the project is temporarily suspended (long-term debt is reset to zero, and the project is no longer potentially runnable, except for work requests).

NB:
- wall_cpu_time is stored when tasks are suspended and restored when they resume (I don't think adjusting debt on a project while the client was sleeping is fair).
- I kept in mind the highland cow's message about EDF, so if all the CPUs are running EDF projects, wall_cpu_time is not added to out_of_work_time.

If you're interested in this new share management, contact me.
- Source code based on 6.2.18 -

Regards

John McLeod VII (Joined: 29 Aug 05, Posts: 147)
Message 20404 - Posted: 22 Sep 2008, 18:42:37 UTC - in response to Message 20402.

Really bad idea. Suppose that CPDN takes over for a year (yes, I have had this happen to some of my computers). Do you REALLY want CPDN to take over for ANOTHER year when it gets the first task downloaded (after a day or less of processing the other project)? What about your resource shares?

thierry.l (Joined: 18 Sep 07, Posts: 19)
Message 20414 - Posted: 22 Sep 2008, 22:56:16 UTC - in response to Message 20404.
Last modified: 22 Sep 2008, 23:38:06 UTC

Yes, I know, but it works. My attribute is based on work requests; if there are no work requests, it works as it does today. There are 3 cases:

- You have CPDN on your computer (I've got one too) and it is always in EDF mode (not mine; you'd better move this project to a faster computer, don't you think?). Then there are no work requests for other projects on this computer, the field is not set, and the share is as it is now.
- You have CPDN on your computer and it runs correctly, along with another project that has work. The share is as it is now.
- You have CPDN on your computer and it runs correctly, but you also have another project like harvard clean energy, hydrogen, ... that has no work for now. The long-term debt is set to 0 on this project until WUs arrive.

My share is not based on eternity, but on the projects that need my CPU now (running projects). If they don't have jobs for me now, that's fine, but I don't want that to cause problems for the others (a very low long-term debt causes the other projects to run one by one), and I don't want to detach and reattach in order to reset this attribute. If I have 10 projects on my computer, my share is 1/10 for each if they all give me WUs; if one doesn't give me WUs, the share becomes 1/9 for each, and so on.

As I said, you may decide your shares, but projects don't care about them. If they don't have any jobs to do, c'est la vie. Some projects are closing; where is your share on those after 1 year of a very large long-term debt? Are you sure Riesel Sieve will continue? University projects change from one year to another, and they need my CPU now, not next year because today I'm running a very long project.

I prefer this, but it's a choice. If you want it to run as it does today, just set the parameter to zero (the default); if you want to put a project that has had no jobs for 2 months into a pending state, just set this parameter to 61 days. I think it's a good idea, no?

Add-on: suppose there are 1 million hosts with a fifty-fifty share between SETI and SIMAP. As SIMAP doesn't give work all the time, SETI will lose 1 million hosts each time SIMAP gives work. That's not the way I see the share.

John McLeod VII (Joined: 29 Aug 05, Posts: 147)
Message 20419 - Posted: 23 Sep 2008, 1:55:34 UTC - in response to Message 20414.

But you still have the problem where one task uses extra CPU time. The entire point of long term debt and resource shares is to share the CPU over time. With your modification, you might as well do away with these concepts entirely, leaving absolutely no method of specifying how the projects should share resources.

thierry.l (Joined: 18 Sep 07, Posts: 19)
Message 20422 - Posted: 23 Sep 2008, 10:18:54 UTC - in response to Message 20419.

I'm sorry, but I don't understand.

- First, you may run BOINC as it works now, if you want to.
- Second, if you run many projects that have work, long-term debt is used as it is now, so if one project is running in EDF mode, the long-term debt for this project will be negative and the others positive.
- Third, the purpose of this change is to make it possible to apply shares to active projects, not to projects that may someday be running.

I think it's easier to manage active projects like this. As you can see in the forum, there are some users who cannot manage their shares correctly on projects like Simap, because there are not always jobs to do. For example, you're running predictor on some computer: what is the long-term debt for this project and for the others? But maybe you detached it?

I don't have to, because the new management is like a permanent detach/reattach. You may consider that a project with no work for 1 month, 6 months or 1 year, as you wish, is an inactive project, so you could manually detach it (and lose the share on this project). Here you don't have to: the long-term debt of this project is reset to 0 and the project becomes invisible to share management, but it continues to look for new work just in case. So you no longer have to monitor your clients for inactive projects; the client does it based on your choice (the inactive delay).

John McLeod VII (Joined: 29 Aug 05, Posts: 147)
Message 20424 - Posted: 23 Sep 2008, 11:17:18 UTC - in response to Message 20422.

What I understood:

If a project is running long term in EDF, no other project would get asked for work. Those projects that are not asked for work will get their long term debts set to zero. Since all long term debts are shifted such that the mean long term debt is zero, the long term debt of the project that is hogging all of the CPU time will quickly become zero, thus effectively removing long term debt and resource shares from the program when one project ends up hogging the CPU for a while.

You have the same problem if your host is attached to many projects. The host cannot handle work from all projects at once, and starts zeroing out the LTD of projects for which work cannot be downloaded at the moment.

Another thing that happens in that situation is that some projects end up with very large negative LTD values and aren't asked for work for a very long time. Your modification would reset these projects back to 0 LTD long before their resource share indicates that it is their time.

thierry.l (Joined: 18 Sep 07, Posts: 19)
Message 20436 - Posted: 23 Sep 2008, 17:59:34 UTC - in response to Message 20424.
Last modified: 23 Sep 2008, 18:03:44 UTC

OK, so you didn't understand, sorry about that. As you wrote:

"If a project is running long term in EDF, no other project would get asked for work. ... Another thing that happens in that situation is that some projects end up with very large negative LTD values and aren't asked for work for a very long time."

So if they are not asked for work, they don't have a work request that fails, so the attribute is not set, and everything is as usual.

The conditions to initialize the counter are:
- there are no results for the project
- the project is suspended by the user OR "don't request more work" is set by the user OR there is a work request > 0 that failed.

The counter increases if:
- at least one CPU is not running an EDF task

The project becomes invisible if:
- the counter reaches the limit the user set in "Computing preferences"

As you can see, only the user decides how shares are managed for inactive projects (inactive by the user's choice, through suspending or stopping work requests, or really inactive, as judged by the deadline the user set beyond which a project with no work is considered inactive).
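The counter rules listed above can be sketched as a small update function called once per debt adjustment. This is a hedged reading of the posts, not the actual 6.2.18 patch: the struct, the function name and the reset of the counter when the project is no longer idle are assumptions; the initial value of 1 and the "limit of zero disables the feature" behaviour are taken from earlier messages in the thread.

```cpp
// Hypothetical per-project state for this sketch.
struct ProjectState {
    bool has_results;
    bool suspended_by_user;
    bool dont_request_more_work;
    bool work_request_failed;   // a work request > 0 returned no work
    double out_of_work_time;    // 0 means "not counting"
    double long_term_debt;
};

// Returns true when the project should become invisible to share management.
// wall_cpu_time:    wall time covered by this debt adjustment
// some_cpu_not_edf: at least one CPU is not running an EDF task
// limit:            user-chosen delay; 0 disables the mechanism
bool update_out_of_work(ProjectState& p, double wall_cpu_time,
                        bool some_cpu_not_edf, double limit) {
    bool idle = !p.has_results
        && (p.suspended_by_user || p.dont_request_more_work
            || p.work_request_failed);
    if (!idle) {
        p.out_of_work_time = 0.0;  // assumed: counting stops once work exists
        return false;
    }
    if (p.out_of_work_time == 0.0) {
        p.out_of_work_time = 1.0;  // start counting (the post sets it to 1)
    } else if (some_cpu_not_edf) {
        p.out_of_work_time += wall_cpu_time;
    }
    if (limit > 0.0 && p.out_of_work_time >= limit) {
        p.long_term_debt = 0.0;    // reset described in the proposal
        return true;
    }
    return false;
}
```

Note that the time to reach the limit depends only on elapsed wall time after the first failed request, not on how many requests fail in between.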

John McLeod VII (Joined: 29 Aug 05, Posts: 147)
Message 20441 - Posted: 23 Sep 2008, 21:11:44 UTC - in response to Message 20436.

It is already the case that if a project is being constantly asked for work and not providing any, the LTD does not increase. Your proposal is much more drastic in that it not only stops the increase, but drops the LTD to 0. The time before the LTD drops to 0 is completely random, in that under some circumstances a project can be asked for work every minute or two, and under others it can be days between requests. Setting up for the days between requests could mean that the standard S@H outage is enough to trigger the reduction to 0.

thierry.l (Joined: 18 Sep 07, Posts: 19)
Message 20450 - Posted: 24 Sep 2008, 0:42:21 UTC - in response to Message 20441.
Last modified: 24 Sep 2008, 0:56:33 UTC

Negative. The LTD still increases on a project without work each time there is a work request or a task in EDF mode, even with the min_rpc_time limit, and also through normalization. For example, if you detach a project with a negative LTD, you will see a suspended project's LTD increase. It keeps increasing, and after 3 or 4 months the working projects are under -global_prefs.cpu_scheduling_period. That's the point I want to avoid, because work fetch starts to behave strangely. For example, I've got 7 projects on one computer, 2 of them without any work for months. The other projects end up waiting for the LTD needed for a work request, so I finally have only one or two running projects while work is available on the servers for the others, and the 2 idle projects keep increasing their LTD.

Only the first failed work request starts the process. I'm not counting the number of failures, but the time between the first failure and now, based on wall_cpu_time. If one project requests work now and another 3 days later, what does it matter? It's a small gap next to 6 months. Remember that if there is a successful work request between the start of the process and the deadline, the LTD is the same as with the current process.

Yes, it's drastic, but what is the difference between that and detaching a project because it hasn't given any work for 5 months? Only that the project stays under the client's control in case it restarts, which avoids a reattach on the server side (two hosts instead of one). In any case, you could use the same kind of process to stop the LTD increase for good, as min_rpc_time is not enough.

thierry.l (Joined: 18 Sep 07, Posts: 19)
Message 20461 - Posted: 24 Sep 2008, 12:03:47 UTC - in response to Message 20450.

Example, based on one of my computers (which hasn't reached the limit yet):

Project        LTD
superlink      -6999
einstein       -13357
hydrogen       44912
lhc            28974
milkyway       -13826
mindmodeling   -5173
qmc            -13770
seti           -35781
cosmology      23756
AIS            -8732

As you can see, the LTD is positive only on projects with no work (or little). In the 08/31 backup, hydrogen was at 37921: about +7000 in 24 days. That's a lot, due in large part to superlink, which often runs in EDF mode. As my cpu_scheduling_period is 7200, it means that einstein, milkyway, qmc, seti and AIS are overworked(). My process should reset the LTD on hydrogen and cosmology soon.
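The overworked() claim can be checked against the table. The sketch below assumes, following the wording in this thread, that a project counts as overworked when its LTD is below -cpu_scheduling_period; the `Entry` struct and both function names are hypothetical, and the values are copied from the post.

```cpp
#include <string>
#include <vector>

struct Entry {
    std::string name;
    double ltd;
};

// LTD values reported in Message 20461.
std::vector<Entry> sample_ltd_table() {
    return {
        {"superlink", -6999.0},  {"einstein", -13357.0},
        {"hydrogen", 44912.0},   {"lhc", 28974.0},
        {"milkyway", -13826.0},  {"mindmodeling", -5173.0},
        {"qmc", -13770.0},       {"seti", -35781.0},
        {"cosmology", 23756.0},  {"AIS", -8732.0},
    };
}

// Assumed rule: overworked means LTD < -cpu_scheduling_period.
std::vector<std::string> overworked_projects(const std::vector<Entry>& table,
                                             double cpu_scheduling_period) {
    std::vector<std::string> names;
    for (const Entry& e : table) {
        if (e.ltd < -cpu_scheduling_period) names.push_back(e.name);
    }
    return names;
}
```

With a period of 7200, this selects einstein, milkyway, qmc, seti and AIS, matching the list in the post; superlink (-6999) and mindmodeling (-5173) stay just above the threshold.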

thierry.l (Joined: 18 Sep 07, Posts: 19)
Message 20514 - Posted: 26 Sep 2008, 17:22:12 UTC - in response to Message 20461.

New values from today:

Project        LTD
superlink      -10404
einstein       -14965
hydrogen       51294
lhc            33060
milkyway       -16793
mindmodeling   -9831
qmc            -8641
seti           -41535
cosmology      29699
AIS            -11883

Other changes: I also changed the rr_simulation and total_resource_share functions in order to exclude the time used by a suspended project with no active or pending results, so each active project now requests more work than before (these changes are not on this computer). I suppose it may cause overwork when a project restarts, but that's not very different from adding a project. I think it's much more like the work buffer settings, as those projects no longer reserve work time.

John McLeod VII (Joined: 29 Aug 05, Posts: 147)
Message 20517 - Posted: 26 Sep 2008, 21:41:00 UTC - in response to Message 20514.

Since projects that are in communications deferral are already removed from these calculations, you are potentially double removing them, which could lead to a negative total resource share.

thierry.l (Joined: 18 Sep 07, Posts: 19)
Message 20582 - Posted: 29 Sep 2008, 17:58:40 UTC - in response to Message 20517.
Last modified: 29 Sep 2008, 17:59:15 UTC

New values from today:

Project        LTD
superlink      -5040
einstein       -20145
hydrogen       0
lhc            47388
milkyway       -5563
mindmodeling   3140
qmc            -3563
seti           -7288
cosmology      0
AIS            -8927

LTD reset.

"Since projects that are in communications deferral are already removed from these calculations, you are potentially double removing them which could lead to a negative total resource share."

Well, I didn't see it that way, as the code has:

if (!p->active.size()) {
    double rsf = trs ? p->resource_share/trs : 1;
    p->cpu_shortfall = work_buf_total() * overall_cpu_frac() * ncpus * rsf;

So there is a shortfall for a suspended project with no work, and a project with no work that is not suspended has a smaller shortfall (as total_resource_share includes suspended projects too). And it seems to me that there is a link between shortfall and work_request.

Another point I'm testing: when a project with no work gets work, set the STD to a value linked to the share and the LTD. The purpose is to quickly start a project with a large LTD, in order to avoid its LTD increasing further because its STD is smaller than that of some other projects. The current LHC case: a small amount of work, then nothing to do for 4 or 5 days. So in this case, the LTD is larger after the computation than before (when many projects are running).

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.