Message boards : Projects : News on Project Outages
Author | Message |
---|---|
Send message Joined: 4 Sep 09 Posts: 381 |
Over the past few weeks, the outages have been relatively short -- say four to eight hours. This one is now at about 15 hours.... |
Send message Joined: 4 Sep 09 Posts: 381 |
That the Collatz database server has been offline this long over the weekend raises the possibility that the admin there (it's a one-person effort, run out of his home, I believe) may not be around this weekend to do a server restart. |
Send message Joined: 4 Sep 09 Posts: 381 |
Collatz database server was back and running Monday morning -- it is offline again as of 1PM PST today. |
Send message Joined: 29 Aug 05 Posts: 15560 |
Seti going to be out of work for a day or more. Matt Lebofsky wrote: BUT ALSO we needed to update some fields in the current science database schema to also make the database itself telescope agnostic. Just a few "alter table" commands to lengthen the tape name fields beyond 20 characters. We thought these alters would take a few hours (and completed before the end of today's Tuesday outage). Now it looks like it might take a day. We can't split/assimilate any new work until the alters are finished. Oh well. We're going to run out of work tonight, but should have fresh work sometime tomorrow morning. It is a holiday tomorrow, so cut us some slack, if it's later than tomorrow morning :). Source. |
Send message Joined: 18 Jul 11 Posts: 217 |
Not getting any work from WCG Fight AIDS Phase 2 |
Send message Joined: 18 Jul 11 Posts: 217 |
Update to my earlier post ("Not getting any work from WCG Fight AIDS Phase 2"): received work. |
Send message Joined: 4 Sep 09 Posts: 381 |
As is frequently the case with the higher workload presented by the shorter work units, the Collatz database server is offline (this now happens 3 or more times a week). The outages typically run from 4 hours to 24 hours at which time the database server gets recycled and the clock on the next outage is restarted. |
Send message Joined: 4 Sep 09 Posts: 381 |
Collatz is back up -- outage this time was about 8 hours. |
Send message Joined: 4 Sep 09 Posts: 381 |
Collatz appears to be in a bit of a yo-yo mode. The database was offline yesterday (11/22) for about 8 hours. Back up around noon yesterday. It was up this morning about 8AM, and is offline again as of 9AM. One thing of note: the problem there (which is clearly chronic) doesn't extend to uploads -- they seem to go through. That means that when the database server is rebooted (and it appears that is all that is being done at the moment when it crashes), it validates a very large collected set of uploaded work units, then processes and validates the large set of new reports that arrive once the database server is alive. Then it sends out new work. Until the next time it crashes. With the very much increased workload that the sieve units bring (they complete in a much shorter time than the previous work units), the database server has been crashing quite a bit more frequently of late -- about 10 times in the past month. |
Send message Joined: 4 Sep 09 Posts: 381 |
The Collatz database remains offline -- this outage is a bit longer than the average (and frequent) outage for Collatz -- it is now at about 12 hours. |
Send message Joined: 4 Sep 09 Posts: 381 |
Collatz was back online for almost an entire day before it crashed yet again about 2 hours ago (2:30PM PST). Given the regularity with which it crashes and the seemingly undefined problem, which has persisted for a VERY long time, perhaps some effort could be put toward developing a shut down / restart script which could be run automatically on a daily basis.... |
Send message Joined: 4 Sep 09 Posts: 381 |
It seems that the frequency of Collatz database crashes is increasing while the return to operation cycle is getting longer.... I am not sure that anyone else is noticing this.... |
Send message Joined: 4 Sep 09 Posts: 381 |
Collatz is currently back up and running... |
Send message Joined: 28 May 10 Posts: 52 |
Does anyone have information about Citizen Science Grid (DNA@home and Subset Sum)? Most servers are down and questions on the message board are not being answered. |
Send message Joined: 4 Sep 09 Posts: 381 |
Collatz is down yet again -- went down about 4AM this morning. It is still down. This has become an almost daily event for Collatz -- followed by a downtime which runs between 4 and 24 hours, at which time the database server has been manually rebooted. At this stage, the down time for Collatz is running equal to its up time. Given that there has been no discussion over at the project regarding the problems, it seems that folks may do as I'm about to do and consign Collatz to secondary project status -- to be run only when its status can be closely watched externally by the users. For me that means, before I go to sleep, I'll suspend Collatz, as I have some faith that the database server will crash overnight. Then, during days when I can check Collatz status regularly, I'll let the project process. Then when (not if) it goes offline during the day, I'll suspend it again until the next moment when it is running. I realize that Collatz is a one-person enterprise; I just wish it was somewhat less unreliable these days. |
Send message Joined: 23 Feb 08 Posts: 2493 |
"Collatz is down yet again -- went down about 4AM this morning. It is still down." The reality is it suspends itself when it becomes unreachable and your BOINC scheduler will automatically grab work from other projects. There isn't much data traffic in sending out a few packets that go unanswered while waiting for a timeout. BOINC keeps on with other projects and work units. But if you want the aggravation of manually starting and stopping a project, you can choose to give yourself this headache. |
Send message Joined: 4 Sep 09 Posts: 381 |
Gary, thanks for the reply. Yeah, I get that Gary, but I had shifted it to a primary on multiple systems -- which I had also done years ago as well. Over the past few years I had shifted to Milkyway, MooWrap, and GPUGrid. With Collatz shifting to the sieve work units I shifted back to Collatz as the primary project. With GPUGrid though, given its long run work units -- it is either primary or the work units can time out. So that's the major change. I just don't like seeing those 10 minute completed work units stack up with Collatz. Though as long as Collatz has a 50 unit limit and I have the other projects in a receive work mode, I suppose (aside from the systems running GPUGrid) that would work out. It is rather unfortunate that Collatz breaks down so regularly these days. Today's daily outage is at 9 hours or more... |
Send message Joined: 4 Sep 09 Posts: 381 |
As I noted before, it was almost inevitable that with the short run sieve work units, the stability problems that have manifested themselves with the Collatz project would become far more frequent. To me, this has the feel of some form of memory leak where the database server simply runs out of available memory and collapses under the strain. Not knowing how one would deal with it, my simplistic thinking would be to run a script or scripts. One would take the database server down 'gracefully' based on a clock. One would then restart the database server as part of a *planned* server reset to restore the memory. But that's just me and I am likely to be rather clueless about the issue and a workaround. |
Send message Joined: 4 Sep 09 Posts: 381 |
Collatz is running again -- at the moment. I'm wondering with the increasingly frequent database server crashes whether something might be done to make them planned instead of unplanned and thus much shorter in duration. I get it that with the very short sieve units, the processing load has increased a lot. My own suspicion, perhaps ill-informed, is that the database server is encountering some memory leak (as I suspect it always had), which is made worse by the higher volume processing. Since it appears that resolving the actual problem is not an option for whatever reason, how about pre-empting it? My (admittedly novice) suggestion would be a pair of scripts. One would take down the database server *gracefully* at a programmed time of day (perhaps every day). The other would restart the database server about 10 minutes later. Perhaps something along these lines would restore the server to a 'memory clean slate' each cycle. Just a thought from one of the users. |
Send message Joined: 8 Nov 14 Posts: 11 |
Anyone have any word on QCN? |
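
The planned stop / restart idea floated in the posts above (take the database server down gracefully on a clock, then bring it back about 10 minutes later) could be sketched roughly as follows. This is only an illustration under assumptions: nothing in the thread says which database or service manager the Collatz server uses, so `STOP_CMD` and `START_CMD` are hypothetical placeholders, and the function name `planned_restart` is made up for this sketch.

```python
import subprocess
import time
from typing import Callable, Sequence

# Hypothetical commands: the actual database and service manager on the
# Collatz server are not known from the thread. Adapt these to the real setup.
STOP_CMD = ["systemctl", "stop", "mysql"]
START_CMD = ["systemctl", "start", "mysql"]


def planned_restart(
    stop_cmd: Sequence[str] = STOP_CMD,
    start_cmd: Sequence[str] = START_CMD,
    pause_seconds: float = 600.0,  # "about 10 minutes later"
    run: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Stop the database gracefully, wait, then start it again.

    Returns True only if both commands exit with status 0.
    """
    if run(stop_cmd) != 0:
        # The graceful stop failed, so the server state is unknown.
        # Do not wait or issue a start; leave it for a human to check.
        return False
    sleep(pause_seconds)
    return run(start_cmd) == 0
```

Run once a day from a scheduler such as cron, this would turn an unplanned crash of unknown length into a planned outage of roughly ten minutes. Whether it would actually help depends on the memory-leak guess being right, which the thread does not confirm. The `run` and `sleep` parameters exist only so the logic can be exercised without touching a real service.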
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.