Scheduler and CPDN

Message boards : BOINC client : Scheduler and CPDN
[AF>Linux]Arnaud
Joined: 30 Aug 05
Posts: 58
Message 2825 - Posted: 28 Jan 2006, 11:42:28 UTC
Last modified: 28 Jan 2006, 11:58:01 UTC

Hi,

Something urgently needs to be done about the scheduler and CPDN, especially on Linux.
With the soon-to-be-released new CPDN model, my 2 GHz computer is now constantly in EDF mode: BOINC estimates a completion time of 333 days instead of the 194 days I estimate myself from the s/ts (seconds per timestep) shown in the visualisation.
As I don't want to babysit BOINC (suspend, download, upload, resume, etc.), I have only two choices: detach this machine from CPDN or stop the other projects (Rosetta and E@H).
Wouldn't it be possible, at least, to modify the code so that BOINC estimates the completion time from the s/ts?
Thanks.
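
An estimate based on s/ts is simple arithmetic. Here is a minimal sketch in Python, assuming the client could read the measured seconds per timestep and the number of timesteps left. The function name `eta_days` and the sample numbers are hypothetical, chosen only so the result lands on the 194-day figure; this is not BOINC or CPDN code.

```python
SECONDS_PER_DAY = 86400


def eta_days(sec_per_timestep, timesteps_left, cpu_share=1.0):
    """Wall-clock days left, given measured seconds per timestep.

    cpu_share is the fraction of CPU time the model receives
    (1.0 means the model runs exclusively).
    """
    if not 0 < cpu_share <= 1:
        raise ValueError("cpu_share must be in (0, 1]")
    cpu_seconds = sec_per_timestep * timesteps_left
    return cpu_seconds / cpu_share / SECONDS_PER_DAY


# Hypothetical inputs: 4.0 s/ts and 4,190,400 timesteps remaining.
print(eta_days(4.0, 4_190_400))
```

Because the s/ts figure is measured on the machine itself, an estimate like this would track the real speed of the host rather than a benchmark.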

ID: 2825
Jim K
Joined: 8 Sep 05
Posts: 168
Message 2834 - Posted: 28 Jan 2006, 15:10:48 UTC

Change your "connect to" time to a low figure like 0.5 and see if that helps.
ID: 2834
Bill Hepburn
Joined: 12 Sep 05
Posts: 12
Message 2845 - Posted: 29 Jan 2006, 0:49:56 UTC - in response to Message 2825.  
Last modified: 29 Jan 2006, 0:52:16 UTC

That's a problem with those long run times for CPDN. It's even worse because CPDN does such a poor job of estimating how long a work unit will take. BOINC is supposed to work out a correction factor for each project after it has run a number of work units. The rub is that with a CPDN work unit taking several months, it will be a year before the correction factor does much good. There doesn't seem to be a lot of interest on the part of the program (priorities and staffing, I assume) in fixing the problem.
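
To illustrate why a per-project correction factor converges so slowly here, this is a rough Python sketch. The update rule (move the factor a fixed fraction toward the observed ratio each time a work unit finishes) and the `weight` value are assumptions for illustration only; BOINC's real rule is not reproduced. The hours are hypothetical: a work unit estimated at 3000 hours that really takes 2000.

```python
def corrected_estimate(raw_hours, correction_factor):
    """Scale the project's raw estimate by the client's correction factor."""
    return raw_hours * correction_factor


def update_factor(old_factor, raw_hours, actual_hours, weight=0.1):
    """Move the factor a fraction `weight` toward actual/raw (assumed rule).

    This can only be called when a work unit has actually finished.
    """
    observed = actual_hours / raw_hours
    return old_factor + weight * (observed - old_factor)


factor = 1.0
for finished in range(1, 4):
    factor = update_factor(factor, raw_hours=3000, actual_hours=2000)
    print(finished, round(factor, 3), round(corrected_estimate(3000, factor)))
```

Each pass through the loop stands for one completed work unit. At 2000 hours apiece, three completions already cost about 250 days of dedicated CPU time, and the estimate has still moved only part of the way toward the real figure.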

In theory, you should be able to just not worry about it and let it run in and out of EDF mode.

I have a machine (1.7 GHz) that will really take about 2000 hours to do a work unit (which it thinks will take 3000). Since there are only 8760 hours in a year (24 * 365), it enters EDF fairly often (I normally run 4 projects on that machine). It goes into EDF, runs off the other projects' work (since their deadlines come first), then runs CPDN for several days until it decides it is out of danger. Then it downloads something else and runs it pretty much exclusively (CPDN has run up a lot of debt by this time), and then it goes into EDF again. In a weird sort of way, it's neat to watch.
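
The arithmetic behind that can be sketched in a few lines of Python. This assumes four projects with equal resource shares, a machine running 24/7, and a simple "does the work fit before the deadline at the normal share" check. The real scheduler's test is more involved, so treat the function names and the check itself as hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, the one-year CPDN deadline


def wall_hours_needed(cpu_hours, share):
    """Wall-clock hours needed when the project gets `share` of the CPU."""
    return cpu_hours / share


def misses_deadline(cpu_hours, share, deadline_hours=HOURS_PER_YEAR):
    """True if the work cannot finish in time at the normal share."""
    return wall_hours_needed(cpu_hours, share) > deadline_hours


share = 1 / 4  # four projects, equal shares (assumption)
for label, hours in (("estimated", 3000), ("actual", 2000)):
    print(label, wall_hours_needed(hours, share), misses_deadline(hours, share))
```

With the 3000-hour estimate the work unit appears to need 12000 wall-clock hours and so looks late, while the real 2000 hours would need 8000 and fit inside the year. Under these assumptions the overestimate alone is what makes the deadline look threatened.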

Having said all that, if the estimates were better, the problem wouldn't be quite as bad, but CPDN is starting to look like a project that isn't good for these "slow" computers unless you want to run it exclusively.
ID: 2845
[AF>Linux]Arnaud
Joined: 30 Aug 05
Posts: 58
Message 2854 - Posted: 29 Jan 2006, 7:17:45 UTC
Last modified: 29 Jan 2006, 7:30:44 UTC

Hi Bill,

The problem is that with the Transient Coupled Model (TCM), BOINC doesn't even go in and out of EDF (as it did with the sulphur model), because BOINC estimates the completion time at 7500 to 8000 hours on Linux (I know the estimated completion time is better on Windows).
BOINC is constantly in EDF mode, 24/7, and never leaves this state.
So if I don't manually suspend the TCM, BOINC will never download work for other projects. That's not acceptable, as the real time to finish this model is 6 months, so the 1-year deadline of the CPDN model will easily be met and BOINC should download work for other projects.

As for the correction factor, it's a joke.
It probably works fine for short WUs like SETI's, but it clearly doesn't work for long WUs: even when I manually change this factor in the XML file, the estimated completion time doesn't change enough to get out of constant EDF mode.

I still think that CPDN is not an "elite" project reserved for people with dual-core or "Formula 1" computers, and that it should be accessible to "slow" machines with BOINC behaving normally.
ID: 2854
Bill Hepburn
Joined: 12 Sep 05
Posts: 12
Message 2856 - Posted: 29 Jan 2006, 8:29:03 UTC - in response to Message 2854.  

I certainly won't try to justify bad estimates. It is annoying that a bad estimate causes trouble for the work scheduler even though the scheduler is working exactly as it should.

I wonder what would happen if you let CPDN run in EDF until it runs through about half the model. That would probably take about three months. By then, it would probably think it would need about 5000 hours more and probably drop out of EDF. Figuring out exactly when that would happen is more than my brain can handle right now.
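
A toy model can at least put a rough number on that. The Python sketch below assumes an estimated total of 8000 hours, a real total of about 4380 hours (6 months), a deadline of 8760 hours, and three projects with equal shares (CPDN, Rosetta, E@H). It also assumes the remaining estimate shrinks in proportion to progress, and that the client leaves EDF once the estimated remaining work fits before the deadline at the normal share. That exit criterion is a guess, not BOINC's actual test, and the function name is hypothetical.

```python
def hours_until_edf_exit(est_total, real_total, deadline, share, step=24):
    """Elapsed hours of exclusive running before the estimated remaining
    work fits ahead of the deadline at the normal share, or None if the
    model finishes first. Checked once per `step` hours."""
    elapsed = 0
    while elapsed <= real_total:
        done = elapsed / real_total           # true fraction completed
        est_remaining = est_total * (1 - done)
        if est_remaining / share <= deadline - elapsed:
            return elapsed
        elapsed += step
    return None


hours = hours_until_edf_exit(8000, 4380, 8760, share=1 / 3)
print(hours, hours / 24)
```

Under these assumptions the model would have to run exclusively for 3408 hours (142 days) before dropping out of EDF, which is well past the halfway point. Different shares or a different exit test would change the figure considerably.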

I have to wonder if the folks at CPDN think they only want (or need) really hot computers. In the past few months we have gone from the SLABs to the sulphur models. Now you say there is a new one that will take even longer. I suppose that as long as they get enough computer horsepower from hot machines, they are happy, and nothing much will change. If their requirements get too out of hand, they may have to rethink things, or maybe Moore's law will keep them going indefinitely.

Cheers.
ID: 2856
Andrew Hingston
Joined: 25 Nov 05
Posts: 55
United Kingdom
Message 2862 - Posted: 29 Jan 2006, 16:15:50 UTC

I seem to recall a recent posting from Carl at CPDN saying that it was going to tackle the time estimate problem, but he is very busy trying to meet deadlines on a project development at the moment. But the estimated duration did get a lot more inaccurate with BOINC 5 - it isn't that the project itself did it. CPDN is sufficiently different that a method that works for SETI, say, won't necessarily be applicable to CPDN.

As for CPDN needing fast machines, it is true that climate and weather has always wanted machines at the cutting edge. That's in the nature of the subject, because the complexity of the real world is beyond anything current computing can match. It is a pity if that is putting people off, because anyone with a machine of 1 GHz or so can usefully participate. On the other hand, it is necessary that they understand what they are taking on. The advice to join more than one project is good for those coming from SETI classic, but if you have a three or four year old computer switched on for a few hours a day, it isn't realistic to combine CPDN with three or four other projects.
ID: 2862
[AF>Linux]Arnaud
Joined: 30 Aug 05
Posts: 58
Message 2864 - Posted: 29 Jan 2006, 17:16:17 UTC
Last modified: 29 Jan 2006, 17:24:52 UTC

Hi Andrew,

When I speak of "slow" machines, I mean machines around 2 GHz or equivalent (two or three years old).
Of course, it's unrealistic to run CPDN on 1 GHz machines :o).

The problem is that a lot of users currently running CPDN also run other BOINC projects, and if BOINC is constantly in EDF mode on average machines of 2 or 2.5 GHz, we can expect a lot of ranting or, even worse, detaching.

As I don't have access to the boinc_dev list, I posted my message here to say that BOINC shouldn't enter constant EDF mode on an average 2 GHz "Mr Smith" machine, as the TCM can be finished in less than 6 months on this kind of machine (running 24/7, of course).
ID: 2864
Andrew Hingston
Joined: 25 Nov 05
Posts: 55
United Kingdom
Message 2867 - Posted: 29 Jan 2006, 20:36:35 UTC

I agree, Arnaud. Even with a 1 GHz machine, the computer shouldn't enter EDF mode for the CPDN model for some time (and anyway the user would be well advised to detach from the other projects or keep the machine on longer).

Nobody who thought about it would ever have supposed that scheduling would be easy to implement satisfactorily, and when BOINC launched the two main projects were SETI and CPDN, so it was always appreciated that these would have to be reconciled.
ID: 2867
Paul D. Buck
Joined: 29 Aug 05
Posts: 225
Message 2871 - Posted: 30 Jan 2006, 5:10:12 UTC

You can get into a couple of other problems with the various modes and CPDN (and a few other combinations). The good news is that John has once again figured out what most of them are, and we do have a slightly better CPU scheduler and work scheduler in the mill.

The issue has been that this is not an easy component to work on. It has gone through at least 3 major generations with new projects, like CPDN, causing possible issues.

Heck, I have the opposite problem ... I have a CPDN work unit I can't keep IN EDF mode so it will finish on time ... :)
ID: 2871


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.