Posts by thierry.l

1) Message boards : Projects : News on Project Outages (Message 43406)
Posted 11 Apr 2012 by thierry.l
Post:
Yes, I saw that too, but Collatz was unreachable (it did not even answer a ping, either from France or through an online ping service).

It's running now (upload / download / website).
2) Message boards : Projects : News on Project Outages (Message 43396)
Posted 11 Apr 2012 by thierry.l
Post:
Nope, the website is still offline, and project upload is still down.
Slicker must pedal faster.
3) Message boards : BOINC client : Projects with no works (Message 20598)
Posted 1 Oct 2008 by thierry.l
Post:

Yes, there is a link between shortfall and work request; however, that is NOT how the work requests are ordered. The project with the highest LTD that is contactable gets first chance. The reason that projects with negative LTD values get a shortfall requested is to avoid having a dry CPU if there is no contactable project with a higher LTD. Calculating the shortfall does not affect which project is contacted for work.


Yes, I know, that is not the purpose in this case, but the work request gets larger. As an example, I have 10 projects on one computer: 5 are suspended (total share 250), 3 are running with work (total share 550), and 2 have no work (total share 200). Before the change, the work request computed for those 2 projects was near 8000 seconds (about 84600 * 100 / 1000); now it is more like 11000 seconds, so near 84600 * 100 / 750. So I may get more work if a project restarts, and I think this matches my settings better (as I suspended those projects, I want the others to use the freed CPU time as if the suspended ones were detached).
It works on active projects too, as total_resource_share is lower.
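
To make the arithmetic above concrete, here is a minimal C++ sketch, not the client's actual code. It assumes the request for one project is simply the buffer length multiplied by resource_share / total_share; the function name work_request_seconds and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not a BOINC function): seconds of work requested
// for one project, assuming request = buffer * share / total share.
double work_request_seconds(double work_buf_seconds,
                            double resource_share,
                            double total_share) {
    // Guard against a zero total share by falling back to a fraction of 1.
    double rsf = total_share > 0.0 ? resource_share / total_share : 1.0;
    return work_buf_seconds * rsf;
}
```

With a buffer of 84600 seconds and a share of 100, a total share of 1000 gives 8460 seconds, and a total share of 750 (suspended projects left out) gives 11280 seconds, which is the kind of increase described in the post.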

About the other point,

When a project with no work gets work, set its STD to a value linked to its share and its LTD. The purpose is to quickly start a project with a big LTD, so that the LTD does not keep increasing while its STD is smaller than that of some other projects.
Current LHC case: a small amount of work, then nothing to do for 4 or 5 days.
So in this case, the LTD is bigger after the computation than before (when many projects are running).

It seems OK for now. In tests on LHC with the STD set to LTD * (runnable) share_frac at restart, the project now runs as soon as the download completes and the CPU scheduler swaps tasks, so its big LTD starts to decrease. I don't know yet whether this setting is correct: depending on the LTD, it must be neither too large nor too small, and I'm thinking about a limit based on cpu_scheduling_period.
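
A rough C++ sketch of that idea follows. It is only an illustration under stated assumptions: the function name restart_std is hypothetical, and the clamp to the range [0, cpu_scheduling_period] is one possible form of the limit that the post says is still undecided.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: short-term debt given to a project that just
// received work after a dry period, computed as LTD * share_frac.
// Assumption: the result is clamped to [0, cpu_scheduling_period];
// the post only says that some limit of this kind is being considered.
double restart_std(double long_term_debt,
                   double share_frac,
                   double cpu_scheduling_period) {
    double candidate = long_term_debt * share_frac;
    return std::max(0.0, std::min(candidate, cpu_scheduling_period));
}
```

For instance, an LTD of 47388 with a share fraction of 0.1 gives a starting STD of about 4739 seconds, while a share fraction of 0.5 would hit the 7200-second cap.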
4) Message boards : BOINC client : Projects with no works (Message 20582)
Posted 29 Sep 2008 by thierry.l
Post:
New values from today

Project        LTD
superlink      -5040
einstein       -20145
hydrogen       0
lhc            47388
milkyway       -5563
mindmodeling   3140
qmc            -3563
seti           -7288
cosmology      0
AIS            -8927

LTD reset.


Since projects that are in communications deferral are already removed from these calculations, you are potentially double removing them which could lead to a negative total resource share.

Well, I didn't see it that way, because of this code:

        if (!p->active.size()) {
            double rsf = trs ? p->resource_share/trs : 1;
            p->cpu_shortfall = work_buf_total() * overall_cpu_frac() * ncpus * rsf;

So there is a shortfall for a suspended project with no work, and a project with no work that is not suspended gets a smaller shortfall (as total_resource_share includes suspended projects too).
It also seems to me that there is a link between shortfall and work_request.

Another point I'm testing:
when a project with no work gets work, set its STD to a value linked to its share and its LTD. The purpose is to quickly start a project with a big LTD, so that the LTD does not keep increasing while its STD is smaller than that of some other projects.
Current LHC case: a small amount of work, then nothing to do for 4 or 5 days.
So in this case, the LTD is bigger after the computation than before (when many projects are running).
5) Message boards : BOINC client : Projects with no works (Message 20514)
Posted 26 Sep 2008 by thierry.l
Post:
New values from today

Project        LTD
superlink      -10404
einstein       -14965
hydrogen       51294
lhc            33060
milkyway       -16793
mindmodeling   -9831
qmc            -8641
seti           -41535
cosmology      29699
AIS            -11883

Other changes:
I also changed the rr_simulation and total_resource_share functions so that a suspended project with no active or pending results no longer reserves any time.
So each active project now requests more work than before.
- these changes are not on this computer -
I suppose this may cause overwork when a project restarts, but it is not very different from adding a project.
I think it is much closer to the work buffer settings, as those projects no longer reserve work time.
6) Message boards : BOINC client : Projects with no works (Message 20461)
Posted 24 Sep 2008 by thierry.l
Post:
Example, based on one of my computers (which has not reached the limit yet):

Project        LTD
superlink      -6999
einstein       -13357
hydrogen       44912
lhc            28974
milkyway       -13826
mindmodeling   -5173
qmc            -13770
seti           -35781
cosmology      23756
AIS            -8732

As you can see, the LTD is positive only on projects with no work (or little).
Based on the 08/31 backup, hydrogen was at 37921, so about +7000 in 24 days.
That's a lot, due mostly to superlink, which often runs in EDF mode.
As my cpu_scheduling_period is 7200, it means that einstein, milkyway, qmc, seti and AIS are overworked().
My process should reset the LTD on hydrogen and cosmology soon.
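
The list of overworked projects can be checked with a small C++ sketch. It assumes that "overworked" means a long-term debt below -cpu_scheduling_period, which is how the threshold is described in this thread; the struct ProjectDebt and the function overworked_projects are hypothetical names, not client code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record: one row of the LTD table above.
struct ProjectDebt {
    std::string name;
    double long_term_debt;
};

// Assumption: a project is overworked when its LTD is below
// -cpu_scheduling_period. Returns the names in table order.
std::vector<std::string> overworked_projects(
        const std::vector<ProjectDebt>& projects,
        double cpu_scheduling_period) {
    std::vector<std::string> names;
    for (const ProjectDebt& p : projects) {
        if (p.long_term_debt < -cpu_scheduling_period) {
            names.push_back(p.name);
        }
    }
    return names;
}
```

Applied to the table above with a period of 7200, this selects einstein, milkyway, qmc, seti and AIS; superlink (-6999) and mindmodeling (-5173) stay just above the threshold.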
7) Message boards : BOINC client : Projects with no works (Message 20450)
Posted 24 Sep 2008 by thierry.l
Post:

It is already the case that if a project is being constantly asked for work and not providing any that the LTD does not increase. Your proposal is much more drastic in that it not only stops the increase, but drops the LTD to 0. The time before the LTD drops to 0 is completely random in that under some circumstances a project can be asked for work every minute or two, and under others it can be days between requests. Setting up for the days between requests could mean that the standard S@H outage is enough to trigger the reduction to 0.


Negative. The LTD still increases on a project without work each time there is a work request or a task in EDF mode, even with the min_rpc_time limit, and also through normalization: for example, if you detach a project with a negative LTD, you will see the LTD of a suspended project increase. It keeps increasing, and after 3 or 4 months the working projects are below -global_prefs.cpu_scheduling_period. That is the point I want to avoid, because work fetch starts to behave strangely. For example, I have 7 projects on one computer, 2 of them without any work for months, so the other projects start waiting for the LTD they need for a work request. I finally end up with only one or two running projects while work is available on the servers for the others, and the 2 idle projects keep increasing their LTD.

Only the first failed work request starts the process. I'm not counting the number of failures, but the time elapsed between the first failure and now, based on wall_cpu_time.
If one project requests work now and another 3 days later, what does it matter? It's a small gap compared with nearly 6 months.

Remember that if there is a successful work request between the start of the process and the deadline, the LTD is the same as with the current process.

Yes, it's drastic, but what is the difference between that and detaching a project because it gave no work for 5 months? Only that the project stays under the control of the client in case it restarts, which avoids a reattach on the server side (two hosts instead of one).

In any case, you may use the same kind of process to definitively stop the LTD, as min_rpc_time is not enough.
8) Message boards : BOINC client : Projects with no works (Message 20436)
Posted 23 Sep 2008 by thierry.l
Post:
OK, so you didn't understand, sorry about that.
As you wrote:

If a project is running long term in EDF, no other project would be asked for work. ...
Another thing that happens in that situation is that some projects end up with very large negative LTD values and aren't asked for work for a very long time.


So if they don't request work, they have no work request that fails, so the attribute is not set, and everything works as usual.
The conditions to initialize the counter are:
- there are no results for the project
- the project is suspended by the user OR "don't request more work" is set by the user OR a work request > 0 failed.

The counter increases if:
- at least one CPU is not running an EDF task

The project becomes invisible if:
- the counter reaches the limit the user set in "Computing preferences"

As you can see, only the user decides how the share is managed for inactive projects (inactive by the user's choice, through suspending or stopping work requests, or really inactive in the user's view, since he sets the deadline beyond which a project with no work is considered inactive).
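
The three rules listed above can be sketched in C++ as follows. This is a simplified illustration and not the modified client: ProjectState, counter_applies and update_out_of_work are hypothetical names, the start value of 1 and the field name out_of_work_time follow the author's original description of the change, and resetting the counter when the conditions stop holding is an assumption.

```cpp
#include <cassert>

// Hypothetical, simplified per-project state.
struct ProjectState {
    int n_results = 0;                    // results currently on the client
    bool suspended_by_user = false;
    bool dont_request_more_work = false;
    bool work_request_failed = false;     // a request > 0 returned no work
    double out_of_work_time = 0.0;        // 0 means the counter is not started
};

// Rule 1: when the counter may be started or kept running.
bool counter_applies(const ProjectState& p) {
    return p.n_results == 0 &&
           (p.suspended_by_user || p.dont_request_more_work ||
            p.work_request_failed);
}

// Called once per debt adjustment. Returns true when the project
// should become invisible to share management.
bool update_out_of_work(ProjectState& p, double wall_seconds, int ncpus,
                        int ncpus_running_edf, double inactive_delay) {
    // A delay of 0 disables the feature; the reset is an assumption.
    if (inactive_delay <= 0.0 || !counter_applies(p)) {
        p.out_of_work_time = 0.0;
        return false;
    }
    if (p.out_of_work_time == 0.0) p.out_of_work_time = 1.0;  // start marker
    // Rule 2: count time only if at least one CPU is not running EDF.
    if (ncpus_running_edf < ncpus) p.out_of_work_time += wall_seconds;
    // Rule 3: invisible once the counter exceeds the user's limit.
    return p.out_of_work_time > inactive_delay;
}
```

With a limit of 7200 seconds on a 2-CPU host, an interval in which both CPUs run EDF tasks only starts the counter, and two further 3600-second intervals with a free CPU push it past the limit.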
9) Message boards : BOINC client : Projects with no works (Message 20422)
Posted 23 Sep 2008 by thierry.l
Post:

But you still have the problem where one task uses extra CPU time - the entire point of long term debt and resource shares is to share the CPU over time. With your modification, you might as well do away with these concepts entirely - leaving absolutely no method of specifying how the projects should share resources.


I'm sorry, but I don't understand.
- First, you may run BOINC as now, if you want to.
- Second, if you run many projects with work, long-term debt is used as now, so if one project is running in EDF mode, the long-term debt for this project will be negative and for the others positive.
- Third, the purpose of this change is to make it possible to apply shares to active projects, not to projects that may perhaps run some day. I think it's easier to manage active projects this way; as you can see in the forums, there are users who cannot correctly manage their share on projects like Simap, because there are not always jobs to do.
For example, you're running predictor on some computer: what is the long-term debt for this project and for the others? But maybe you detached it?
I don't have to, because the new management is like a permanent detach/reattach. You may consider that a project with no work for 1 month, 6 months or 1 year, as you wish, is inactive, so you might manually detach it (and lose the share on it). Here you don't have to: the long-term debt of this project is reset to 0 and the project becomes invisible to share management, but it continues to look for new work in case some appears.
So you no longer have to monitor your clients for inactive projects; the client does it based on your choice (the inactivity delay).
10) Message boards : BOINC client : Projects with no works (Message 20414)
Posted 22 Sep 2008 by thierry.l
Post:

Really bad idea. Suppose that CPDN takes over for a year (yes, I have had this happen to some of my computers). Do you REALLY want CPDN to take over for ANOTHER year when it gets the first task downloaded (after a day or less of processing the other project)? What about your resource shares?


Yes, I know, but it works. My attribute is based on work requests: if there are no work requests, then it works as it does today.
3 cases:
- you have CPDN on your computer (I've got one too), and it is always in EDF mode (not mine; you'd better move this project to a faster computer, don't you think?), so there are no work requests for other projects on this computer, the field is not set, and the share is as now.
- you have CPDN on your computer and it runs correctly, along with another project with work, so the share is as now.
- you have CPDN on your computer and it runs correctly, but you also have another project like harvard clean energy, hydrogen, ... which has no work for now, so the long-term debt is set to 0 on this project until WUs arrive.

My share is not based on eternity, but on projects that need my CPU now (running projects). If they don't have jobs for me now, that's fine, but I don't want that to cause problems for the others (a very low long-term debt causes other projects to run one by one), and I don't want to detach and reattach in order to reset this attribute.
If I have 10 projects on my computer, my share is 1/10 for each if they give me WUs; if one doesn't give me WUs, the share becomes 1/9 for each, and so on.

As I said, you may decide your share, but projects don't care about it. If they don't have any jobs to do, c'est la vie. Some projects are closing: where is your share on those after 1 year of a very big long-term debt? Are you sure riesel sieve will continue? University projects change from one year to the next, and they need my CPU now, not next year because today I'm running a very large long-term project.

I prefer this, but it's a choice: if you want it to run as today, just set the parameter to zero (the default), but if you want a project with no jobs for 2 months to go into a pending state, just set this parameter to 61 days.
I think it's a good idea, no?

Add-on: suppose there are 1 million hosts with a fifty-fifty share between seti and simap. As simap doesn't give work all the time, it means that seti will lose 1 million hosts each time simap gives work. That's not the way I see the share.
11) Message boards : BOINC client : Projects with no works (Message 20402)
Posted 22 Sep 2008 by thierry.l
Post:
Hi,
I finally decided to change the BOINC core in order to manage shares in two ways:
- The current one: share is managed over the very long term. For example: 2 projects on one client, same share; one gives no WUs for 2 months, and then there is work to do, so it will run alone for 1 month in order to catch up with the other, so the share is near 66% - 33% (100 - 0 x 2, 0 - 100 x 1).
- The new one: share is managed between projects that have WUs. You select a delay (as a parameter), which is the limit after which a project with no work becomes invisible (to resource share). In the previous example, with a delay of 1 week to 2 months, the share is near 83% - 17% (100 - 0 x 2, 50 - 50 x 1); with a delay above 2 months, the share is as usual.
A new attribute 'out_of_work_time' is added to the project (and to the .xml too). If a project has no results (even suspended ones) and is suspended, or is set to not request more work, or needs work but doesn't get any, then this attribute is set to 1. Each time debts are adjusted, the wall_cpu_time is added to it. If the attribute exceeds the delay, then the project is temporarily suspended (its long-term debt is reset to zero and the project is no longer potentially runnable, except for work requests).
NB:
- wall_cpu_time is stored when suspending tasks and restored when resuming (I don't think adjusting debt on a project while the client was sleeping is fair).
- I kept in mind the message of the highland cow about EDF, so if all the CPUs are running EDF projects, wall_cpu_time is not added to out_of_work_time.
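
The 66% - 33% and 83% - 17% figures in the two-project example can be reproduced with a short C++ sketch. It is only a worked calculation over the 3-month window of the example; the struct Period and the function share_of_a are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical record: a stretch of time and the fraction of the CPU
// that project A (the one that always has work) gets during it.
struct Period {
    double months;
    double frac_a;
};

// Time-weighted share of the CPU obtained by project A over the timeline.
double share_of_a(const std::vector<Period>& timeline) {
    double total = 0.0, a = 0.0;
    for (const Period& t : timeline) {
        total += t.months;
        a += t.months * t.frac_a;
    }
    return total > 0.0 ? a / total : 0.0;
}
```

Current behaviour: A runs alone for 2 months, then B runs alone for 1 month, so A gets 2/3 (about 66%). New behaviour with B invisible during its dry period: A runs alone for 2 months, then both share the third month equally, so A gets 2.5/3 (about 83%).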

If you're interested in this new share management, contact me.
- source code based on 6.2.18 -

regards
12) Message boards : BOINC Manager : My Wish List - 2 (Message 17324)
Posted 14 May 2008 by thierry.l
Post:

I've just confirmed that on my computers that property was already set to normal window. But after a reboot the BOINC manager window still opens first time as super-mini.


Sorry about that, it works on my computers (W2K and Vista, BOINC 5.10.45).
When the shortcut to boincmgr.exe /s is set to normal window, it starts as an icon in the taskbar, but after opening, the size is fine (not the position, which is a little too far right). When it is set to minimized, as defined at install, the start is the same, but after opening, the size and position seem to be randomly defined and really small.
13) Message boards : BOINC Manager : My Wish List - 2 (Message 17318)
Posted 13 May 2008 by thierry.l
Post:

2) the BOINC manager should remember its size and screen position after reboot. it lost that feature a time ago, but I do not remember the first version this did not work anymore.


Hi,

If the property of the BOINC Manager shortcut in the startup directory is set to normal window instead of minimized, the size is kept.

My wish list:
Would it be possible to have one more option?
- On multiprocessors, use only n CPUs when the computer is in use -

regards
14) Message boards : BOINC Manager : manager not responding (Message 16290)
Posted 1 Apr 2008 by thierry.l
Post:
Probably the same error.
01-Apr-2008 11:22:06 [lhcathome] Computation for task wm72A_m72allA__1__64.275_59.305__12_14__6__36_1_sixvf_boinc336910_5 finished
01-Apr-2008 11:22:06 [Milkyway@home] Starting gs_364_1207000870_1537501_0
01-Apr-2008 11:22:06 [Milkyway@home] Starting task gs_364_1207000870_1537501_0 using astronomy version 122
01-Apr-2008 11:22:08 [lhcathome] Started upload of wm72A_m72allA__1__64.275_59.305__12_14__6__36_1_sixvf_boinc336910_5_0
01-Apr-2008 11:22:09 [lhcathome] Temporarily failed upload of wm72A_m72allA__1__64.275_59.305__12_14__6__36_1_sixvf_boinc336910_5_0: transient upload error
01-Apr-2008 11:22:09 [lhcathome] Backing off 1 min 0 sec on upload of wm72A_m72allA__1__64.275_59.305__12_14__6__36_1_sixvf_boinc336910_5_0
01-Apr-2008 11:23:10 [lhcathome] Started upload of wm72A_m72allA__1__64.275_59.305__12_14__6__36_1_sixvf_boinc336910_5_0
01-Apr-2008 11:23:11 [lhcathome] Temporarily failed upload of wm72A_m72allA__1__64.275_59.305__12_14__6__36_1_sixvf_boinc336910_5_0: transient upload error
01-Apr-2008 11:23:11 [lhcathome] Backing off 1 min 0 sec on upload of wm72A_m72allA__1__64.275_59.305__12_14__6__36_1_sixvf_boinc336910_5_0
01-Apr-2008 11:24:12 [lhcathome] Started upload of wm72A_m72allA__1__64.275_59.305__12_14__6__36_1_sixvf_boinc336910_5_0
01-Apr-2008 13:38:36 [---] Starting BOINC client version 5.10.45 for windows_intelx86


BOINC Manager froze, with no way to suspend computation, so I shut down and restarted Vista.
At startup, BOINC restarted completely (as if it were a new install) and performed the init for all projects (previous tasks were lost):
01-Apr-2008 13:38:43 [proteins@home] URL: http://biology.polytechnique.fr/proteinsathome/; Computer ID: not assigned yet; location: (none); project prefs: default
01-Apr-2008 13:38:43 [rosetta@home] URL: http://boinc.bakerlab.org/rosetta/; Computer ID: not assigned yet; location: (none); project prefs: default
01-Apr-2008 13:38:43 [boincsimap] URL: http://boinc.bio.wzw.tum.de/boincsimap/; Computer ID: not assigned yet; location: (none); project prefs: default
01-Apr-2008 13:38:43 [BURP] URL: http://burp.boinc.dk/; Computer ID: not assigned yet; location: (none); project prefs: default
01-Apr-2008 13:38:43 [superlinkattechnion] URL: http://cbl-boinc-server2.cs.technion.ac.il/superlinkattechnion/; Computer ID: not assigned yet; location: (none); project prefs: default
01-Apr-2008 13:38:44 [Cosmology@Home] URL: http://cosmologyathome.org/; Computer ID: not assigned yet; location: (none); project prefs: default
01-Apr-2008 13:38:44 [Hydrogen@Home] URL: http://hydrogenathome.org/; Computer ID: not assigned yet; location: (none); project prefs: default
01-Apr-2008 13:38:44 [lhcathome] URL: http://lhcathome.cern.ch/lhcathome/; Computer ID: not assigned yet; location: (none); project prefs: default
01-Apr-2008 13:38:44 [Milkyway@home] URL: http://milkyway.cs.rpi.edu/milkyway/; Computer ID: not assigned yet; location: (none); project prefs: default
01-Apr-2008 13:38:44 [QMC@HOME] URL: http://qah.uni-muenster.de/; Computer ID: not assigned yet; location: (none); project prefs: default
01-Apr-2008 13:38:44 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: not assigned yet; location: (none); project prefs: default
01-Apr-2008 13:38:44 [Spinhenge@home] URL: http://spin.fh-bielefeld.de/; Computer ID: not assigned yet; location: (none); project prefs: default
01-Apr-2008 13:38:45 [SZTAKI Desktop Grid] URL: http://szdg.lpds.sztaki.hu/szdg/; Computer ID: not assigned yet; location: (none); project prefs: default
01-Apr-2008 13:38:45 [malariacontrol.net beta] URL: http://www.malariacontrol.net/; Computer ID: not assigned yet; location: (none); project prefs: default
01-Apr-2008 13:38:45 [uFluids] URL: http://www.ufluids.net/; Computer ID: not assigned yet; location: (none); project prefs: default
01-Apr-2008 13:38:45 [World Community Grid] URL: http://www.worldcommunitygrid.org/; Computer ID: not assigned yet; location: (none); project prefs: default


Too bad
15) Message boards : BOINC client : Projects with no works (Message 12639)
Posted 20 Sep 2007 by thierry.l
Post:
Thanks !
I see what you mean: if you have an EDF task running and a project with no work, the long_term_debt of the EDF project will not go negative, as it is normalized.
But anyway, if there is no work for the 2nd project at the end of the EDF task, the first should download another WU, no?
And if you have another project with work, the EDF project will have a big negative long_term_debt, so it won't download.
I mean, you may decide the share, but if the work distributed by the different projects does not follow this share, you will have permanent misbehavior.
My purpose was only to treat a project with no WUs as having no impact on the others, as it doesn't need my CPU for now (like a suspended project, but keeping an eye open in case it restarts).
16) Message boards : BOINC client : Projects with no works (Message 12634)
Posted 20 Sep 2007 by thierry.l
Post:
Sorry,
I made a mistake: I thought that having no active task for a project plus dont_request_more_work had the same effect as suspended_via_gui.
In fact, I was wrong and may be responsible for those negative values. So, in order to suspend a project for any reason, we have to set dont_request_more_work to true until the active task ends, and then suspend the project (and reset dont_request_more_work).

What is EDF ?
17) Message boards : BOINC client : Projects with no works (Message 12631)
Posted 20 Sep 2007 by thierry.l
Post:
I still have to reset long_term_debt even on 5.10.20.
Anyway, I thought it was a good idea to stabilize long_term_debt around zero, as if those projects were new at restart, or around cpu_scheduling_period, ready to run.
18) Message boards : BOINC client : Projects with no works (Message 12610)
Posted 19 Sep 2007 by thierry.l
Post:
Hi,
Sometimes some projects give no work for a long time, for whatever reason: SIMAP, LHC, Proteins, ...
This causes a problem for other projects (which are distributing WUs correctly) because long_term_debt is normalized: a project with no work gets a big positive long_term_debt that keeps increasing, while the others get a long_term_debt that decreases below -cpu_scheduling_period, and they finally stop requesting new jobs.
I suggest limiting long_term_debt for projects that are not runnable().
void CLIENT_STATE::adjust_debts() {
    ...
    // adjust long-term debts
    //
    if (p->runnable() || p->wall_cpu_time_this_debt_interval ||
        ((p->long_term_debt < 0) && p->potentially_runnable())) {
        share_frac = p->resource_share/prrs;
        p->long_term_debt += share_frac*total_wall_cpu_time_this_debt_interval
            - p->wall_cpu_time_this_debt_interval;
    }
    total_long_term_debt += p->long_term_debt;
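
For readers without the client source at hand, the effect of the modified condition can be tried with this self-contained C++ sketch. It is a simplification under stated assumptions, not the BOINC implementation: the struct Project and the function adjust_long_term_debts are hypothetical, prrs is assumed to be the sum of the shares of potentially runnable projects, and the normalization step that follows in the real function is left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, flattened stand-in for the client's per-project data.
struct Project {
    double resource_share;
    double long_term_debt;
    double wall_cpu_time_this_debt_interval;
    bool runnable;              // has a task that can run now
    bool potentially_runnable;  // would run if it had work
};

// Applies the proposed rule: a project that is not runnable and already
// has a non-negative debt stops accumulating long-term debt.
void adjust_long_term_debts(std::vector<Project>& projects) {
    double prrs = 0.0;        // assumed: total share of potentially runnable projects
    double total_wall = 0.0;  // wall CPU time used by all projects this interval
    for (const Project& p : projects) {
        if (p.potentially_runnable) prrs += p.resource_share;
        total_wall += p.wall_cpu_time_this_debt_interval;
    }
    if (prrs <= 0.0) return;  // nothing to share out
    for (Project& p : projects) {
        bool accrue = p.runnable ||
                      p.wall_cpu_time_this_debt_interval > 0.0 ||
                      (p.long_term_debt < 0.0 && p.potentially_runnable);
        if (accrue) {
            double share_frac = p.resource_share / prrs;
            p.long_term_debt += share_frac * total_wall
                              - p.wall_cpu_time_this_debt_interval;
        }
    }
}
```

With two equal-share projects, where the first runs for 3600 seconds and the second has no work, the first project's debt drops by 1800 while the second keeps its debt unchanged if it was already positive, and still recovers toward zero if it was negative.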

regards
19) Message boards : BOINC client : Folks who disappear after downloading only one WU. (Message 12587)
Posted 18 Sep 2007 by thierry.l
Post:
In perusing WUs, I’m surprised how many instances there are where a person has downloaded ONE (1) WU, and then is never heard from again. I know this doesn’t stop, but only delays, WU completion due to WU being reissued.

What I wonder is: Has there ever been any effort to CONTACT these one-WU folks? Maybe see if they had some problem or need help, clarification, or encouragement?

Maybe if first WU is not returned by completion date …
(1) the server side sends a simple canned e-mail encouraging them & pointing them to forums or someplace
(2) the client side pops up a message suggesting <something> to remind & encourage them to continue participating


Since some projects are having longer and longer WUs, has there been any discussion of making the FIRST WU a new client gets be a TEST-WU which just ensures client can go thru process?


Strange, I thought BOINC was a free GNU project, based on volunteers who are free to choose what they want to do, to help one or many scientists, universities or companies and not others, for reason X or Y.
So we're free to work on one project, but if we choose to quit, here come the newsletters, spam or whatever.

Maybe it is just a hardware problem?
Disk crash, restore, not enough memory or GHz, human error, ...

Like Helfin, some people want to help CPDN, but receiving the first WU and seeing an estimated CPU time of 5000 hours may lead to a hasty uninstall of BOINC: because the CPU is not powerful enough, because the person doesn't want to leave the computer running night and day since it adds to global warming, because the computer is too noisy to sleep next to, or whatever ...
Some projects need too many resources, and there are 2 solutions: quit or buy. I think it's easier to quit (PacMan or the latest 3D game?) ... for now.

Please, no mails, newsletters or anything else, just freedom (and maybe shorter WUs?).




Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.