Posts by David Ball

1) Message boards : BOINC client : What does busy mean in work_fetch event log entry, also rr_sim doing something strange with time slice. (Message 91439)
Posted 8 May 2019 by David Ball
Post:
Thanks for the help. I realize now that busy just meant that it was giving an extra little bit of runtime to Rosetta. My original post was on the 5th. I was configured to get 4 days of work. The Rosetta work units were due on the 12th. rr_sim had apparently figured out that they would miss the deadline due to additional work being fetched to maintain 4 days of work.

One thing I noticed: While it was still showing some busy time, but not as much as in the original post, I shortened the configuration for the amount of work that it was getting by 0.25 days (from 4.00 to 3.75) and that made the "busy" work go to zero. rr_sim sure has to deal with a lot of complicated things.

Thanks,

David
2) Message boards : BOINC client : What does busy mean in work_fetch event log entry, also rr_sim doing something strange with time slice. (Message 91383)
Posted 5 May 2019 by David Ball
Post:
5/5/2019 1:35:28 AM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 395513.63 busy 19859.02
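For anyone reading along, here is a minimal Python sketch that splits the log line above into its named values and converts the two large ones into friendlier units. It assumes the numbers are in seconds, which the log line itself does not state, and the helper name `parse_work_fetch` is made up for this example.

```python
import re

# Sample line copied from the event log entry above.
LINE = ("5/5/2019 1:35:28 AM | | [work_fetch] shortfall 0.00 nidle 0.00 "
        "saturated 395513.63 busy 19859.02")


def parse_work_fetch(line):
    """Return the name/value pairs that follow the [work_fetch] tag as floats."""
    _, _, tail = line.partition("[work_fetch]")
    pairs = re.findall(r"([A-Za-z_]+)\s+(-?\d+(?:\.\d+)?)", tail)
    return {name: float(value) for name, value in pairs}


fields = parse_work_fetch(LINE)

# Assumption: both values are reported in seconds.
busy_hours = fields["busy"] / 3600
saturated_days = fields["saturated"] / 86400
print(f"busy = {busy_hours:.2f} h, saturated = {saturated_days:.2f} days")
```

If the seconds assumption holds, the "busy" value works out to roughly 5.5 hours and "saturated" to roughly 4.6 days.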


I've tried looking through the source code but that left me even more confused. It's reset to zero in "struct BUSY_TIME_ESTIMATOR", and BUSY_TIME_ESTIMATOR apparently goes through the instances, finds the lowest value, and then adds that value to the busy time of the others.

rr_sim sets the busy time to non-zero but I've only looked at it for a few minutes and that's not long enough to fully understand what's going on. Also, why does rr_sim use a time slice of 3600 seconds when I have it set to 720 minutes (12 hours aka 43200 secs) so my large jobs will run straight through? The large time slice keeps me from having a bunch of partially complete jobs that I have to wonder when it will get around to finishing, and the extra jobs that are started increase the memory footprint. The large time slice is especially good on projects where the jobs have about a 1GB memory footprint.

What does the "busy 19859.02" in the above event log entry mean?

Does a non-zero value for "busy" in the log entry mean that it's run into trouble completing the work on time?

Sorry if I'm being a pain with my question. It sort of grew beyond the basic question when I looked at the source code.

-- David
3) Message boards : Questions and problems : Rosetta hogging resources (Message 89659)
Posted 17 Jan 2019 by David Ball
Post:
When I want to limit how many tasks Rosetta can run at once, I go to the \projects\boinc.bakerlab.org_rosetta subdirectory of the BOINC program data directory and add an app_config.xml file with the following in it:

<!-- Rosetta -->

<app_config>
    <project_max_concurrent>4</project_max_concurrent>
</app_config>


The above will prevent BOINC from running more than 4 Rosetta tasks (aka workunits) at once. It does not affect how many Rosetta workunits are downloaded from the server - it just prevents more than 4 from running at a time. You can change the 4 to any number you want that is 1 or greater, up to the number of cores you have BOINC running on.

NOTE: if you already have an app_config.xml, you will need to read the docs to find out how to merge them.
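As a rough illustration of the merge case, the Python sketch below adds or updates `<project_max_concurrent>` in an app_config.xml while keeping the other elements already in the file. The function name `set_project_max_concurrent` and the `example_app` entry are made up for this example, and Python's ElementTree drops XML comments when it rewrites a file, so treat this as a sketch and not a replacement for reading the docs.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def set_project_max_concurrent(path, limit):
    """Create or update app_config.xml so <project_max_concurrent> equals limit."""
    if limit < 1:
        raise ValueError("limit must be 1 or greater")
    path = Path(path)
    if path.exists():
        tree = ET.parse(str(path))      # keep whatever is already configured
        root = tree.getroot()
        if root.tag != "app_config":
            raise ValueError(f"unexpected root element <{root.tag}>")
    else:
        root = ET.Element("app_config")
        tree = ET.ElementTree(root)
    node = root.find("project_max_concurrent")
    if node is None:
        node = ET.SubElement(root, "project_max_concurrent")
    node.text = str(limit)
    tree.write(str(path), encoding="utf-8")


# Demo on a throwaway file so no real BOINC directory is touched.
# "example_app" is a hypothetical application name used only for illustration.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "app_config.xml"
    cfg.write_text(
        "<app_config><app><name>example_app</name>"
        "<max_concurrent>2</max_concurrent></app></app_config>"
    )
    set_project_max_concurrent(cfg, 4)
    merged = ET.parse(str(cfg)).getroot()
    print(ET.tostring(merged, encoding="unicode"))
```

The existing `<app>` entry survives and the project-wide limit is added next to it. BOINC still has to be told to reread its config files before the change takes effect.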

On Windows you can use advanced view on the interface and select from the top menu "Options" and "read config files" to get BOINC to start using the setting in the app_config.xml. BOINC will also place an error message in the log file if it finds an error in the file. I haven't run BOINC on Linux for several years so I can't be sure of the command line to cause it to read the config files, but I think it's

boinccmd --read_cc_config

The --read_cc_config option rereads the configuration files, including cc_config.xml and any app_config.xml files in the project folders.


BTW, my contact info is out of date so posting in the forum is the only way to get a message to me.
4) Message boards : Projects : Anyone else having WCG problems? (Message 62228)
Posted 18 May 2015 by David Ball
Post:
I don't know what caused it but I've kept trying and the error eventually went away and it accepted the report for that workunit. I didn't reboot or reset the project or anything like that. I still wonder if anyone else was having the same problem.
5) Message boards : Projects : Anyone else having WCG problems? (Message 62202)
Posted 17 May 2015 by David Ball
Post:
Here's what happens when my system tries to report a result to WCG:

5/17/2015 10:28:29 AM | World Community Grid | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (0.00 sec, 0.00 inst)
5/17/2015 10:28:29 AM | World Community Grid | Sending scheduler request: Requested by user.
5/17/2015 10:28:29 AM | World Community Grid | Reporting 1 completed tasks
5/17/2015 10:28:29 AM | World Community Grid | Not requesting tasks: don't need (CPU: job cache full; NVIDIA GPU: job cache full)
5/17/2015 10:28:32 AM | | Project communication failed: attempting access to reference site
5/17/2015 10:28:32 AM | World Community Grid | Scheduler request failed: Unrecognized or bad HTTP Content or Transfer-Encoding
5/17/2015 10:28:32 AM | | [work_fetch] Request work fetch: RPC complete
5/17/2015 10:28:34 AM | | Internet access OK - project servers may be temporarily down.


It didn't have any problem uploading the result, just reporting it.
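In case it helps anyone compare logs, here is a small Python sketch that splits event log lines like the ones above into timestamp, project and message, and then picks out the failures. It assumes the "date | project | message" layout shown in this post and an English locale for the AM/PM marker. The name `parse_entry` is made up for this example.

```python
from datetime import datetime

# Three lines copied from the event log excerpt above.
LOG = """\
5/17/2015 10:28:29 AM | World Community Grid | Sending scheduler request: Requested by user.
5/17/2015 10:28:32 AM | World Community Grid | Scheduler request failed: Unrecognized or bad HTTP Content or Transfer-Encoding
5/17/2015 10:28:34 AM | | Internet access OK - project servers may be temporarily down.
"""


def parse_entry(line):
    """Split one log line into (timestamp, project, message)."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    # %p (AM/PM) parsing depends on the locale; this assumes an English one.
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return when, project, message


entries = [parse_entry(line) for line in LOG.splitlines() if line.strip()]
failures = [entry for entry in entries if "failed" in entry[2].lower()]

for when, project, message in failures:
    # An empty project column means the message came from the client itself.
    print(when.isoformat(), project or "(client)", message)
```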

It's not having any problem with the 10+ other projects the computer is attached to.

-- David
6) Message boards : Projects : News on Project Outages (Message 60704)
Posted 6 Mar 2015 by David Ball
Post:
Haven't gotten any work from Convector for a while. For the last 2 or 3 days, I can't even connect to the server either through boinc or through a web browser. Boincsynergy says it hasn't been able to get stats for 3.55 days. Does anyone know if the project is going away or just having problems?

http://convector.fsv.cvut.cz/

edit: Add project URL
7) Message boards : Projects : News on Project Outages (Message 60206)
Posted 10 Feb 2015 by David Ball
Post:
EDGeS@Home has been down for a couple of days. The BOINC client gets a "cannot connect" error message, and so does a browser trying to reach the website.
8) Message boards : Projects : QMC@home/cleanmobility.now (Message 60034)
Posted 28 Jan 2015 by David Ball
Post:
It was 6 months ago. What is status now?


I had the same type of problems at the beginning of this month. I finally aborted the WU after about 340 thousand seconds. I selected to not run the cleanmobility application and have since run a QMC@HOME-orca-alphabeta and it finished in less than 2 hours and got credit.

One thing they might want to do is only distribute the cleanmobility work units to Windows for a week or two starting a couple of days after Patch Tuesday, since the monthly patches always seem to require a reboot.

-- David
9) Message boards : Questions and problems : Ubuntu Linux with GeForce GT 630 - No usable GPUs found (Message 58846)
Posted 22 Dec 2014 by David Ball
Post:
I don't know how to solve your problem but I can tell you that the GT 630 is a very strange card. It comes in 3 types. I am listing them in performance order from slowest to fastest.

GT 630 D3 = Fermi / basically a GT 440 with DDR3 memory and a 700 MHz clock / 49 watts

GT 630 G5 = Fermi / basically a GT 440 with GDDR5 memory and an 810 MHz clock / 65 watts

The above 2 types are based on the Fermi architecture and have 96 CUDA cores and a 128 bit memory interface. The third type is:

GT 630 = Kepler architecture GPU with 384 CUDA cores at 902 MHz with a 64 bit memory interface and DDR3 memory. It uses only 25 watts.

Note that I don't know which is faster on the few projects requiring DP (double precision floating point) math, the Fermi GDDR5 or the Kepler card. nVidia changed the way they do DP on the Kepler.

On Fermi they had cores that could do either single or double precision math and turned DP math off on most cores when they sold them in graphics cards. IIRC, the Fermi graphics cards only had 1/8th or 1/12th of the cores DP enabled. The reason Fermi had all of the cores DP capable was because they could use the same chip with all of the CUDA cores DP enabled in supercomputers or expensive workstations and sell them for several thousand dollars per card. I think the reason for using them in expensive workstations was because the expensive cards had different drivers and were the only ones that were certified for some professional software that had to produce perfect results. When companies are designing something like an entire aircraft they need those perfect results so they only use the expensive graphics cards that are certified for the design software.

On Kepler graphics cards the regular CUDA cores are single precision only (SP is what is needed for graphics) and they have a few separate double precision only cores (8 IIRC). I think the Kepler chips also have much less cache since they are graphics cards only. By not having all of the cores be double precision capable like in Fermi, the Kepler can have more SP CUDA cores in a similar sized chip. BTW, it was not unusual for the older graphics cards (pre-Fermi) to have single precision only. My NVIDIA 9600 GT card didn't have double precision at all so it could only run the BOINC GPU projects that didn't require double precision.

You can find more information on the GT 630 versions at http://www.geforce.com/hardware/desktop-gpus/geforce-gt-630/specifications

I have run a Fermi based GT 630 under Windows Vista and it worked great with Seti, Milkyway, Einstein, GPUGRID and PrimeGrid. It also worked with the original POEM GPU project (haven't tried the new one that POEM has now) and it worked with the WCG GPU sub-project that used to exist but that sub-project has finished.

Does your GPU have 96 CUDA cores (Fermi architecture) or 384 CUDA cores (Kepler architecture)?
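To make the comparison easier, the three variants listed in this post can be put into a small lookup table. This is only a Python sketch: the numbers are the ones quoted above, and the class and function names are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gt630Variant:
    name: str
    architecture: str
    cuda_cores: int
    clock_mhz: int
    memory: str
    bus_bits: int
    watts: int


# Values as listed in the post, slowest to fastest.
VARIANTS = [
    Gt630Variant("GT 630 D3", "Fermi", 96, 700, "DDR3", 128, 49),
    Gt630Variant("GT 630 G5", "Fermi", 96, 810, "GDDR5", 128, 65),
    Gt630Variant("GT 630", "Kepler", 384, 902, "DDR3", 64, 25),
]


def candidates(cuda_cores):
    """Return the variants that match a reported CUDA core count."""
    return [v for v in VARIANTS if v.cuda_cores == cuda_cores]


for cores in (96, 384):
    names = ", ".join(f"{v.name} ({v.architecture})" for v in candidates(cores))
    print(f"{cores} CUDA cores -> {names}")
```

A core count of 96 still leaves two Fermi candidates, so the memory type (DDR3 or GDDR5) is what separates them.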
10) Message boards : Projects : News on Project Outages (Message 58756)
Posted 19 Dec 2014 by David Ball
Post:
POEM is still having problems. I managed to get some work from them but I can't turn it in because I'm getting a scheduler response that it can't connect to server (database server I think).

If you go to the POEM main web page, part of it isn't displaying and it's saying something about too many database connections.

Oops, just checked the home page and now I'm getting can't connect to server errors in Firefox.

On the asteroids@home problem, I was under the impression it would be up sometime on the 18th but no such luck. I doubt we'll see it open before next week. It's basically supposed to be open from 9 AM to 7 PM UTC, which sounds a lot like normal weekday business hours, so I'm guessing that it might be closed on weekends. Sure do wish we had something definite about when it will be back up. I suspended it and went into the BOINC XML files and lowered its priority so the percentages that the BOINC manager shows will be more accurate. I wish there was an option in the BOINC manager to have it display the real percentages each project has with the suspended projects having a share of zero. Currently, I've got work_fetch_debug turned on in the cc_config file so that I can see the real percentages in the event log.

-- David
11) Message boards : News : FightMalaria@Home relaunched (Message 58732)
Posted 17 Dec 2014 by David Ball
Post:
Thanks for the info. BTW, even though it has a new name, our old accounts still exist and have the same username (email address) and password.
12) Message boards : Projects : Is Volpex dead? (Message 58731)
Posted 17 Dec 2014 by David Ball
Post:
Volpex is still handing out WU but they're having a problem with too many connections to the server so the WU either abort themselves because the deadline was reached before the job could be downloaded or the job starts and gets a compute error in the first minute because the job can't connect to the server.
13) Message boards : Projects : Is Volpex dead? (Message 58377)
Posted 2 Dec 2014 by David Ball
Post:
They're still working. I got a WU during the night and this morning stats.free-dc.org says I gained 68 points today.
14) Message boards : Projects : News on Project Outages (Message 55753)
Posted 30 Aug 2014 by David Ball
Post:
convector.fsv.cvut.cz is back up.
15) Message boards : Projects : News on Project Outages (Message 55698)
Posted 29 Aug 2014 by David Ball
Post:
convector.fsv.cvut.cz is still down and can't even be connected to with a browser. It's been close to a week. Has anyone heard anything?
16) Message boards : Projects : QMC@home/cleanmobility.now (Message 55587)
Posted 26 Aug 2014 by David Ball
Post:
Apparently the new version of the project isn't running the server code to export credits yet. Boincsynergy (http://www.boincsynergy.com/stats/) says the last time they updated credits was 627 days ago.


EDIT: FYI, while the stats sites say I have 1,134,443 credits, my BOINC manager on a machine that is attached to QMC says I have 1,141,172 credits so I am getting credits for the work units that I've run but the server isn't creating the file that the stats sites read to get the current credits.
17) Message boards : Projects : News on Project Outages (Message 55586)
Posted 26 Aug 2014 by David Ball
Post:
My BOINC client has been unable to connect to convector.fsv.cvut.cz for about 3 days now. I can't even connect with a browser.
18) Message boards : Projects : Projects are offline. Often. (Message 55447)
Posted 17 Aug 2014 by David Ball
Post:
Also, it's normal for projects to go offline at least a few hours (some even a day or two) once a week for weekly maintenance on the BOINC databases.

Some projects are online but aren't distributing new work because they are going to introduce new software whose results are in a different format than the existing software and that would cause any existing jobs that haven't been validated to fail. They stop distributing new jobs, except for the ones that are already in the queue for the old software, until all the existing jobs have been processed and returned or it's been so long that they abandon them. Then they kill any old jobs left over and switch to the new software and distribute new jobs.

Many of the projects are run from universities and I've been amazed at how often the project has to go down because the university is doing maintenance and cutting power to the building that the servers happen to be in for a day or two (usually a weekend since the buildings house mostly classrooms).

Some of the smaller ( or even mid-range ) projects can have problems because
A) A large team with 1000+ users and some users having as many as 50 dedicated BOINC crunchers decides to make that the project of the month and swamps the project's servers.
B) If the project has a GPU application and one or two of the major GPU projects have either dropped their GPU application or are down for some reason, a massive number of clients who do GPU computing may switch to that smaller project and it can't cope with the several hundred percent increase in clients. For instance, World Community Grid and POEM have had GPU projects that finished and they suddenly had no more GPU work. FYI, POEM is starting a new GPU project but it's been a long time and I've also heard people complaining about the credit the new POEM GPU application is going to award. I don't know if the new POEM GPU project is live yet or still in testing phase. I have my machines set to only run the POEM CPU projects. The machine that I did have trying to get GPU work from POEM had a motherboard failure (SATA controller) so it's not running any more.
19) Message boards : Projects : NRG@Home (Message 54043)
Posted 10 May 2014 by David Ball
Post:
I attached NRG with one of my C2D machines and it's working great for me. Since the machine I tested it on is running several projects and keeps about 1.25 days of work, some of the workunits from NRG are running in high priority mode. You might want to add another day to the deadline unless you need them very quickly for some reason. I'll start adding NRG to the other projects on some of my dedicated C2Q Q6600 crunchers.

For those who are interested, I haven't gotten ANY errors from the workunits. Well done.
20) Message boards : Projects : News on Project Outages (Message 54042)
Posted 10 May 2014 by David Ball
Post:
Anybody know what's going on with EON? They're usually extremely reliable but BOINC can't contact them and I haven't even been able to reach their website for a couple of days. I get "Server Not Found" in Firefox.

http://eon.ices.utexas.edu/eon2/



Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.