Resource share not regarded when requesting new work?

Message boards : BOINC client : Resource share not regarded when requesting new work?

AuthorMessage
Metod, S56RKO

Joined: 9 Sep 05
Posts: 128
Slovenia
Message 1146 - Posted: 26 Nov 2005, 9:38:17 UTC
Last modified: 26 Nov 2005, 9:40:31 UTC

Lately I'm seeing the following: when my cache drops below the "connect every" setting, the BOINC CC requests new work from the project. The amount of work requested is the difference, in seconds, between the amount of work in cache and the "connect every" setting. And the project assigns me about that amount of work.

Now the problem: I'm running 3 projects and the resource share is not even (40:40:20). I have "connect every" set to 3 days. If the work cache of my 20% project drops below 3 days (say to 2 days 6.5 hours), BOINC CC will request 63000 seconds of new work. And the project server returned (in this particular case, but it's similar all the time) about 52100 seconds of work.
The problematic part is that with a resource share of 20% this means I end up with more than 6 days of cache for this particular project. With my "connect every" setting of 3 days this is not a problem, I'll make it within the deadline. But if I had "connect every" set to, say, 8 days, I'd be missing deadlines. The same thing happens with all the projects I'm attached to.
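
For anyone who wants to check these numbers, here is a minimal Python sketch. It assumes that the cache figure is a wall-clock estimate, that the request is simply the shortfall against "connect every", and that the work the server returns is measured in CPU seconds, which then stretch out according to the resource share. The function names are made up for illustration; this is not BOINC code.

```python
SECONDS_PER_DAY = 86_400


def shortfall_seconds(connect_every_days, cached_days, cached_hours=0.0):
    """Seconds missing from the cache relative to the "connect every" target."""
    target = connect_every_days * SECONDS_PER_DAY
    cached = cached_days * SECONDS_PER_DAY + cached_hours * 3600
    return max(0.0, target - cached)


def wall_clock_days(cpu_seconds, share_percent):
    """Days needed to crunch cpu_seconds when the project only gets share_percent of the CPU."""
    return cpu_seconds * 100 / share_percent / SECONDS_PER_DAY


requested = shortfall_seconds(3, 2, 6.5)    # cache at 2 days 6.5 hours, target 3 days
granted_days = wall_clock_days(52_100, 20)  # what the server returned, at a 20% share

print(requested)               # 63000.0
print(round(granted_days, 2))  # 3.02
```

Under those assumptions the 52100 seconds granted amount to roughly 3 more days of wall-clock time at a 20% share, on top of what was already in the cache.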

I'm running 5.2.8 (but I'm seeing this with any 5.2.x) and all three projects I'm attached to are returning work.

[edit]Fixed typing mistkaes[/edit]
Metod ...
ID: 1146
Keck_Komputers
Joined: 29 Aug 05
Posts: 304
United States
Message 1157 - Posted: 26 Nov 2005, 13:05:51 UTC

Yes, it is good wisdom to divide the number of days of cache you want by the number of projects you are attached to. So if you are attached to 4 projects and want a 3-day cache, put 0.75 in the web preferences.
BOINC WIKI

BOINCing since 2002/12/8
ID: 1157
Metod, S56RKO

Joined: 9 Sep 05
Posts: 128
Slovenia
Message 1166 - Posted: 26 Nov 2005, 17:22:40 UTC - in response to Message 1157.  

Yes, it is good wisdom to divide the number of days of cache you want by the number of projects you are attached to. So if you are attached to 4 projects and want a 3-day cache, put 0.75 in the web preferences.


The setting says "connect every ...", which means that BOINC CC should make sure that the cache for any project, given its resource share, does not exceed (at least not greatly) the amount desired by the user.

There are numerous lengthy threads in the SETI@Home forum about how users should not micro-manage BOINC and should just let it do its job. Posts in the fora are usually about scheduling, but the same applies to caching.

My humble opinion is that when BOINC CC contacts a server to request more work, it should only request an amount of work proportional to the resource share for that project. It could take into consideration the resource shares of projects which are not allowed to get more work (e.g. due to too much negative LTD or due to no work being available, etc.).
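
A minimal Python sketch of what such a proportional request could look like. This is one possible reading of the suggestion, not how BOINC actually works: the function name, its parameters and the project names "A", "B", "C" are hypothetical, and redistributing the share among only the projects that may currently fetch work is an assumption.

```python
def proportional_request(connect_every_s, cached_s, shares, project, eligible=None):
    """Seconds of work to request, scaled by the project's resource share.

    shares   -- mapping of project name to resource share
    eligible -- projects currently allowed to fetch work (default: all of them)
    """
    if eligible is None:
        eligible = set(shares)
    if project not in eligible:
        raise ValueError(f"{project} is not allowed to fetch work right now")
    total = sum(shares[name] for name in eligible)
    shortfall = max(0, connect_every_s - cached_s)
    # Multiply before dividing so that whole-number shares give exact results.
    return shortfall * shares[project] / total


shares = {"A": 40, "B": 40, "C": 20}

# Cache at 2 days 6.5 hours (196200 s) against a 3-day target (259200 s).
print(proportional_request(259_200, 196_200, shares, "C"))                       # 12600.0
# Same situation, but project "B" cannot get work, so its share is left out.
print(proportional_request(259_200, 196_200, shares, "C", eligible={"A", "C"}))  # 21000.0
```

With all three projects eligible, the 20% project would ask for 12600 seconds instead of the full 63000.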
Metod ...
ID: 1166
Keck_Komputers
Joined: 29 Aug 05
Posts: 304
United States
Message 1203 - Posted: 27 Nov 2005, 15:50:08 UTC - in response to Message 1166.  

The setting says "connect every ...", which means that BOINC CC should make sure that the cache for any project, given its resource share, does not exceed (at least not greatly) the amount desired by the user.

snip

My humble opinion is that when BOINC CC contacts a server to request more work, it should only request an amount of work proportional to the resource share for that project. It could take into consideration the resource shares of projects which are not allowed to get more work (e.g. due to too much negative LTD or due to no work being available, etc.).

I tend to agree with you here, and the 5.x.x versions are better about not getting as big a cache with the same settings. But the client does still try to have at least one workunit from every project with positive LTD, and this tends to make the cache larger than the settings would have it.
BOINC WIKI

BOINCing since 2002/12/8
ID: 1203
Metod, S56RKO

Joined: 9 Sep 05
Posts: 128
Slovenia
Message 1234 - Posted: 28 Nov 2005, 8:12:22 UTC - in response to Message 1203.  

The setting says "connect every ...", which means that BOINC CC should make sure that the cache for any project, given its resource share, does not exceed (at least not greatly) the amount desired by the user.

snip

My humble opinion is that when BOINC CC contacts a server to request more work, it should only request an amount of work proportional to the resource share for that project. It could take into consideration the resource shares of projects which are not allowed to get more work (e.g. due to too much negative LTD or due to no work being available, etc.).

I tend to agree with you here, and the 5.x.x versions are better about not getting as big a cache with the same settings. But the client does still try to have at least one workunit from every project with positive LTD, and this tends to make the cache larger than the settings would have it.


That's just fine. However, in my particular case I already had some work in cache for that particular project. It was not a case of fetching one single WU because no WU was in cache. It was a case of adding several WUs to a cache that already held some WUs.

In particular: it's SETI@Home and Einstein@Home. WUs for those two projects take about 80 and 450 minutes respectively, which is much less than my cache setting (3 days at the moment). 3 days' worth of cache for SETI (20% resource share) normally means about 10 WUs in cache, and for Einstein (40% resource share) this means about 3 WUs. These are the numbers of WUs I normally see if I let BOINC CC connect to projects at any time. When I take this laptop home (e.g. during the weekend), the cache drains and on the next connection to the projects, the cache fills up too much. This is what I'm complaining about.
Metod ...
ID: 1234
ksnash

Joined: 19 Nov 05
Posts: 10
United States
Message 1252 - Posted: 28 Nov 2005, 18:57:44 UTC

I don't know how the scheduler decides how much in cache is enough. There is no way to control it. I would like to have at least 3 days in cache with CPDN, Einstein and Setiathome, with resource shares of 10, 10, 4, which is approximately 1 WU per day, which should be: one CPDN, 3 Einstein and 3 Setiathome. What I get is that Einstein will download 15. Setiathome will download 17. Recently I only have the one CPDN like I am supposed to. I get the impression that the scheduler is trying to inch forward until it fails. The computer is fast enough that it can handle that much work and I didn't have to use it until recently.
ID: 1252
Metod, S56RKO

Joined: 9 Sep 05
Posts: 128
Slovenia
Message 1271 - Posted: 29 Nov 2005, 7:03:17 UTC
Last modified: 29 Nov 2005, 7:05:09 UTC

I don't know how the scheduler decides how much work in cache is enough. But to me it seems that it works as follows:


  • let's assume that connect every is set to 3 days, which is 259200 seconds.
  • let's assume that we are attached to 3 different projects: SETI, Einstein and CPDN with resource shares of 10, 20 and 20 respectively
  • let's assume that LTD is not an issue here
  • let's assume that we have one CPDN WU which will take some more weeks to finish
  • let's assume that CDF has already settled
  • let's assume we have a bit more than 1 day worth of work in cache for both SETI and Einstein
  • let's assume that BOINC runs 100% of time, nothing else takes CPU cycles and that we really connect every 2.5 days
  • let's assume that estimated time for SETI WU is 1h30 (which is 5400 seconds) and estimated time for Einstein WU is 8h30 (which is 30600 seconds). This means that ideally (considering resource share), we would have 9.6 (actually 10) SETI and 3.39 (actually 4) Einstein WUs in cache or being crunched
  • let's assume that SETI and Einstein WUs just started to be crunched



Now let's assume that after 2.5 days we can connect again. During 2.5 days (which is 216000 seconds) this host will crunch 8 SETI WUs and 2.82 (actually 2) Einstein WUs. Cache levels will be at 2 WUs for SETI and 2 WUs (one being crunched) for Einstein, which is (considering resource shares) 54000 and 153000 seconds respectively.

Next, the scheduler will request new work from both project servers. As it seems to me, it will request (259200 - 54000 = 205200 seconds) of work from the SETI project server. The project server will accordingly assign (205200 seconds / 5400 seconds/WU) 38 WUs. On top of the 2 WUs already in cache this means (regarding resource share) actually 12.5 days of work.
Similarly, the scheduler will request (259200 - 153000 = 106200 seconds) of work from the Einstein project server, which in turn will assign 3.47 (actually 4) WUs. Which is (regarding resource share) actually 306000 seconds. On top of the 1 uncrunched and 1 partly crunched WUs in cache this means 396000 seconds or 4.58 days.
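
The same calculation as a small Python sketch. It models only the behaviour I suspect (cache estimated in wall-clock seconds, request equal to the shortfall, server filling the request in CPU seconds and rounding up to whole WUs); it is not taken from the BOINC source, and the function name refill is made up.

```python
import math

SECONDS_PER_DAY = 86_400


def refill(connect_every_s, wu_cpu_s, cached_wus, share, total_share):
    """Return (WUs assigned, wall-clock seconds of new work, total cache in days)."""
    # Wall-clock time needed for the cached WUs at this project's resource share.
    cached_wall_s = cached_wus * wu_cpu_s * total_share / share
    # The client asks for the shortfall against the "connect every" target ...
    request_s = max(0.0, connect_every_s - cached_wall_s)
    # ... and the server fills it with whole WUs, counted in CPU seconds.
    new_wus = math.ceil(request_s / wu_cpu_s)
    new_wall_s = new_wus * wu_cpu_s * total_share / share
    total_days = (cached_wall_s + new_wall_s) / SECONDS_PER_DAY
    return new_wus, new_wall_s, total_days


# Resource shares: SETI 10, Einstein 20, CPDN 20 (total 50).
seti = refill(259_200, 5_400, 2, share=10, total_share=50)
einstein = refill(259_200, 30_600, 2, share=20, total_share=50)

print(seti)      # (38, 1026000.0, 12.5)
print(einstein)  # (4, 306000.0, 5.3125)
```

The Einstein total comes out higher here than the 4.58 days above because the sketch counts both cached WUs in full, while the 4.58 days figure discounts the part of one WU that has already been crunched.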

Both of the above means that suddenly we have more than 17 days of work in cache (not regarding CPDN). Which is not OK.

Most probably things are not that extreme. I'd really like to get some insight from somebody who actually develops this part of the scheduler (even if it is telling me that I'm plain wrong).

BTW, I normally connect more frequently than every 3 days, so the phenomenon is not so pronounced. Additionally, most of the time the scheduler doesn't request new work from more than one project due to LTD. But still ...

[edit]typoes[/edit]


Metod ...
ID: 1271
Metod, S56RKO

Joined: 9 Sep 05
Posts: 128
Slovenia
Message 1290 - Posted: 29 Nov 2005, 13:38:31 UTC - in response to Message 1271.  
Last modified: 29 Nov 2005, 13:39:07 UTC

Ahemm ...

Both of the above means that suddenly we have more than 17 days of work in cache (not regarding CPDN). Which is not OK.


The above is not correct of course.

The point still remains: with "connect every" set to 3 days one can easily end up with 12 days of cache (or even more; this is inversely proportional to the resource share). BOINC CC can easily enter EDF mode, which pisses off most users.

Metod ...
ID: 1290
Paul D. Buck

Joined: 29 Aug 05
Posts: 225
Message 1447 - Posted: 2 Dec 2005, 10:30:12 UTC

We also still have the complication that the "connect every" setting is attempting to serve two functions, meaning it is not doing either very well ...

I would love to have a connect of 0.01 with a cache of 0.5 (to tide me over CLIENT-Side problems) where Dr. Anderson has typically only been interested in the issues of server side problems (run multiple projects).

Though my recollection is that he has agreed, at least in theory, that it may be a good idea to split the two values... however, no telling when that might take place, if ever.
ID: 1447
Bill Michael

Joined: 30 Aug 05
Posts: 297
Message 1466 - Posted: 2 Dec 2005, 17:18:00 UTC - in response to Message 1447.  
Last modified: 2 Dec 2005, 17:24:13 UTC

I would love to have a connect of 0.01 with a cache of 0.5 (to tide me over CLIENT-Side problems)


I've been thinking of ways to "make everybody happy"... and I see two basic situations, and then one "I want" condition. The situations are dial-up and always-connected, and the desire is to have work cached to get through server-side problems, or just to have work locally because it makes people more comfortable.

First off - Let's give up and call the current "connect-every" setting a cache and be done with it, since that's what it really is. We don't need a separate "connect-every" value, that's not what we have now anyway, what we have now is a kludge. Adding a _real_ "connect every" value, where BOINC won't connect until that much time has passed no matter what, would require massive changes to everything. So...

Rather than splitting it into two numbers, I think it should be a question and a number. This would mean only the question needs to be added, which would simplify the server changes a bit.

Just ask - "Do you have an always-on internet connection?" - if so, then BOINC wouldn't look at anything else in deciding when/if to connect - effectively a 0.001 "connect every" setting, as far as actual connection goes. It'll connect every time a result is ready to upload, it'll connect when it wants to report, it'll connect when it needs work (based on cache same as now on deciding "IF" it needs work). No delaying until some particular time has passed, if you need to connect, connect. (Code change required is simple; anywhere it currently says "if now>connect_time", just say "if now>connect_time or always_on". If there really are any such places.)
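
To make that concrete, a tiny Python sketch of the check. The names now, connect_time and always_on are just the ones used above; whether the client really has such a test anywhere is an open question, so treat this as illustration only.

```python
def may_connect(now, connect_time, always_on):
    """True if the client may contact the server right now.

    With an always-on connection the timer is ignored entirely;
    otherwise the existing "now > connect_time" test decides.
    """
    return always_on or now > connect_time


print(may_connect(now=100, connect_time=500, always_on=False))  # False
print(may_connect(now=100, connect_time=500, always_on=True))   # True
```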

If not, if they are on dial-up, then BOINC should try to connect when it needs work, but NOT just when it has a result ready to return. Also, if the person is on dial-up, the "split" between uploading and reporting should go away. If BOINC is connected only "sometimes", whether that be once every few hours or once a week, you just _can't_ "assume" there will be another connection soon, and BOINC should do ALL the work it can while it's connected. BOINC _already_ ignores the "connect every" setting when deciding IF it should connect - it just does. That's fine, if we reduced the number of connection attempts a bit, namely by not connecting after every result is finished and ready to upload. (Code change needed in the result-complete-upload-it section only? I think so.)

The only change that I think is necessary for the SCHEDULER is that there should be a "lag" on requesting work if the user is on dial-up. Not a "min/max" queue situation, but just a hard-coded value on the LTD needed above what is present; instead of ">0", say, ">14400". In other words, today it says "I have a positive LTD for SETI, and a cache setting of .5. I have .49999 on hand, so I'll get work." - If they are on dial-up, it should allow the work on hand for a project to fall to four hours _less_ than it does now, to minimize the number of connections needed. This will also have the benefit of starting a debate on the boards about whether four hours is too much, too little, or just right. Can't have the boards getting boring. (Edit here:: If JMVII determines that a ratio would be better than a fixed value, for example, wait an extra "1/10 of cache size" or whatever, instead of a fixed 4 hours, he's the expert, and I certainly wouldn't argue. The goal is just to reduce connections for those on dial-up.)
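
Roughly what I mean, as a Python sketch. This is one reading of the idea: the function wants_work and all of its parameters are made up, the four-hour lag is the 14400 from above, and the optional ratio is the "1/10 of cache size" variant. It is not the actual work-fetch code.

```python
DIAL_UP_LAG_S = 14_400  # four hours, the fixed value floated above


def wants_work(cache_setting_s, on_hand_s, ltd_s, dial_up, lag_ratio=None):
    """Decide whether to ask a project for more work.

    cache_setting_s -- the user's cache setting, in seconds
    on_hand_s       -- work on hand for this project, in seconds
    ltd_s           -- long-term debt; only projects with positive LTD fetch work
    lag_ratio       -- optional fraction of the cache to use as the lag instead
    """
    if ltd_s <= 0:
        return False
    lag_s = 0.0
    if dial_up:
        lag_s = DIAL_UP_LAG_S if lag_ratio is None else cache_setting_s * lag_ratio
    # Always-on: fetch as soon as there is any shortfall.
    # Dial-up: let the shortfall grow past the lag first.
    return cache_setting_s - on_hand_s > lag_s


half_day = 43_200  # a cache setting of 0.5 days
print(wants_work(half_day, 43_199, ltd_s=1, dial_up=False))  # True
print(wants_work(half_day, 43_199, ltd_s=1, dial_up=True))   # False
print(wants_work(half_day, 28_799, ltd_s=1, dial_up=True))   # True
```

The dial-up user in the last two calls only triggers a fetch once the work on hand has dropped more than four hours below the cache setting.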

Now - the cache becomes a true cache. If you're on dial-up, on a laptop, and you really can't connect but on the weekend, then you know you need to have 5-7 days worth of work. There's no point in insulting your intelligence by asking "how much work do you want" and then "how often can you connect". Just ask how much work you want. If you're always-on, then you can set this to what you prefer, to be able to ride out either server-side or host-side problems, or for whatever reason. Paul and I can set it at 0.5 and see no change from what is happening right now. Those who are always connected but insist on having 10 days work stuffed up, can continue forcing BOINC into EDF mode. (Edit:: Could we finally lower the maximum to 7 if we do this?) BUT... those who are on dial-up will have FEWER connection attempts to put up with, and the actual amount of work on hand will fall a little farther below the ideal before it tries to get more. This wouldn't get them a true "only connect every 7th day", which is what the current wording _implies_ but doesn't deliver, but it _would_ mean that BOINC only tries to connect a couple of times a day instead of constantly.

If nobody tears this apart and screams that I'm an idiot, I'll try to write it up a little more "formally" and post it somewhere where it might actually be read...

ID: 1466
Andrew Hingston

Joined: 25 Nov 05
Posts: 55
United Kingdom
Message 1478 - Posted: 2 Dec 2005, 22:42:11 UTC
Last modified: 2 Dec 2005, 22:43:29 UTC

Well, maybe I haven't understood but it looks good to me.

I approach this from the perspective of a climateprediction user, and every project has different imperatives. We are hoping for a boost to BOINC in the UK in February in association with a television programme about climate change, and that will bring people who know nothing of SETI or of running multi-project. Complex options, or kludges designed to stop people overloading the Berkeley server, will baffle them.

So far as climateprediction is concerned, caches are looking more and more irrelevant. At present the project is not offering the basic slab models that some impatient people with faster machines were managing to polish off in under three weeks, and some of us with the most powerful processors are currently doing work on WUs that will finish in March or April. Even the others have been given WUs that require well over 1,000 hours on any machine. The point there is that the issue is not keeping the computer fed with work, but determining when BOINC should report on progress. For those with an always on connection, we don't want it to wait but do so without fuss, whether that is every hour or less often. For dial-up users, on the other hand, so long as the user checks in every month or so to say that the WU is still live, it does not really matter that there are progress files waiting to upload. What the user wants and needs is for BOINC to be able to upload when they are online, and to keep quiet when they are not.

The exception is at the end of a run. There is a problem there, because the user does need a prompt that the work is coming to an end and more is needed, but it might also be helpful after the run is complete and the final results are waiting to upload. If we forget the latter, the question 'how many days work do you want?' isn't very helpful or useful. Whatever the answer, the user will get lots. But I've no better idea.
ID: 1478
Paul D. Buck

Joined: 29 Aug 05
Posts: 225
Message 1483 - Posted: 3 Dec 2005, 7:38:39 UTC

Bill,

The only thing that I would add to your proposal would be to allow me to tell CPDN I want one model per CPU. :)

Failing that, I like what you said. It would do the same things that I would do with two numbers.
ID: 1483
Gary Roberts

Joined: 7 Sep 05
Posts: 130
Australia
Message 1486 - Posted: 3 Dec 2005, 10:32:24 UTC

Bill,

I've just reread your proposal for about the fourth time and each time I do I get more out of it and more inclined to congratulate you for the amount of effort you have obviously made in thinking through the issues. I don't have any problem with your proposals at all. We really do need the ability to independently set cache size and have it divorced from connection interval issues.

A long time ago I can clearly remember JM7 stating that his preference was to have two separate variables. He also stated that that was unacceptable to Dr A. Maybe a variable and a question might be more acceptable.

In any case, JM7 needs to be involved as surely his knowledge of what is the best way forward would be invaluable to the discussion. I've seen a few posts in various places with little snippets from JM7 which lead me to believe that he is far from finished with what he wants to achieve :). Discussing it all, with him chained down to the spot, might help clarify what is achievable and what is not :).

In any case, please feel encouraged to write it up more formally, particularly if you can get some detailed input from JM7.

Finally, we have taken this thread away from Metod's initial point about the resource share not being properly taken into account and each project filling up its cache as if it had 100% resource share. You don't notice this if your cache setting is 0.2 days but you sure as hell do if you bump it up a bit, say to 3.2 days to get a burst of work for say a long weekend when you won't be connecting. It's quite a shock to see a 15% resource share project suddenly get about two weeks (at 15%) work. Of course, EDF and LTD will fix this in the long run but why can't the scheduler just get 3.2 days work at 15% in the first place?


Cheers,
Gary.
ID: 1486


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.