Posts by BarryAZ

1) Message boards : Projects : News on Project Outages (Message 71963)
Posted 30 Aug 2016 by BarryAZ
Post:
Looks like the various Boinc stat sites are having problems.
2) Message boards : Projects : News on Project Outages (Message 71758)
Posted 21 Aug 2016 by BarryAZ
Post:
Collatz is offline again
3) Message boards : Projects : News on Project Outages (Message 71469)
Posted 11 Aug 2016 by BarryAZ
Post:
Collatz is back online -- fixed a problem with the firewall -- fix number 2 appears to work.

Seti is unreachable this morning.
4) Message boards : Projects : News on Project Outages (Message 71440)
Posted 10 Aug 2016 by BarryAZ
Post:
Collatz was back online around 10AM PDT yesterday -- a problem with his firewall device. He reverted to an older device -- but that seems to have the same problem 24 hours later. No access to the home page, let alone the database.
5) Message boards : Projects : News on Project Outages (Message 71320)
Posted 9 Aug 2016 by BarryAZ
Post:
Collatz site is offline - since about 7 PM PDT
6) Message boards : Projects : News on Project Outages (Message 68743)
Posted 3 Apr 2016 by BarryAZ
Post:
Add another 19 hours to the "POEM is not around" lament.
7) Message boards : Projects : News on Project Outages (Message 68738)
Posted 2 Apr 2016 by BarryAZ
Post:
And POEM is still vapor 17 hours later.
8) Message boards : Projects : News on Project Outages (Message 68710)
Posted 2 Apr 2016 by BarryAZ
Post:
Yup -- and POEM remains offline -- not even the home page.
9) Message boards : Projects : News on Project Outages (Message 66325)
Posted 21 Dec 2015 by BarryAZ
Post:
POEM site currently can't be accessed at all
10) Message boards : Projects : News on Project Outages (Message 65952)
Posted 10 Dec 2015 by BarryAZ
Post:
The Collatz database crash sequence seems to be on the order of -- expect it to happen and plan for it with available alternative projects.

The crash happens when MySQL goes unresponsive. There is no solid way to anticipate when that will happen and thus no means to pre-empt an unplanned crash via a planned shutdown (or a scripted shutdown and restart).

The database crash happens after the server has been running for somewhere between 20 and 48 hours.

When it crashes, the recovery cycle takes between 4 hours and 24 hours, depending on a number of variables.

Until the length of the work units is significantly increased (something that is being worked on), the workload on the server (one of the apparent variables regarding the semi-regular crashes) will continue to result in the current frequency of the crashes.

The hope is that with a shift to the longer work units, the server load will be abated and the frequency of crashes reduced from the current 2 to 3 times a week to less than once a week.
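
To put the figures above in perspective, here is a rough back-of-the-envelope sketch. It assumes a deliberately simple model that is not from the project itself: each cycle is one stretch of uptime (20 to 48 hours) followed by one recovery stretch (4 to 24 hours). The function name `availability` is only illustrative.

```python
def availability(uptime_hours: float, recovery_hours: float) -> float:
    """Fraction of one crash/recovery cycle during which the server is up.

    Assumed model: a cycle is one uptime stretch followed by one recovery
    stretch, using the hour ranges quoted in the post.
    """
    return uptime_hours / (uptime_hours + recovery_hours)


# Best case: longest uptime quoted (48 h) with the shortest recovery (4 h).
best = availability(48, 4)
# Worst case: shortest uptime quoted (20 h) with the longest recovery (24 h).
worst = availability(20, 24)

print(f"best case:  {best:.0%} of the time up")
print(f"worst case: {worst:.0%} of the time up")
```

Under that assumption the server is reachable somewhere between roughly 45% and 92% of the time, which is why longer work units and fewer crashes would matter so much.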
11) Message boards : Projects : News on Project Outages (Message 65659)
Posted 28 Nov 2015 by BarryAZ
Post:
Collatz is running again -- at the moment.

I'm wondering, with the increasingly frequent database server crashes, whether something might be done to make them planned instead of unplanned and thus much shorter in duration.

I get it that with the very short sieve units, the processing load has increased a lot.

My own suspicion, perhaps ill-informed, is that the database server is encountering some memory leak (as I suspect it always had), which is made worse by the higher volume processing.

Since it appears that resolving the actual problem is not an option for whatever reason, how about pre-empting it?

My (admittedly novice) suggestion would be a pair of scripts.

One would take down the database server *gracefully* at a programmed time of day (perhaps every day).

The other would restart the database server about 10 minutes later.

Perhaps something along these lines would restore the server to a 'memory clean slate' each cycle.
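
As a rough illustration of the two-script idea, here is a minimal Python sketch. Everything specific in it is an assumption rather than the project's actual setup: that the database runs as a systemd service named `mysql`, that `systemctl stop` shuts it down gracefully on that machine, and that 3 AM is a sensible time of day. The names `next_window` and `restart_cycle` are hypothetical.

```python
import subprocess
import time
from datetime import datetime, timedelta

# Assumptions: a systemd service called "mysql"; the 10-minute pause is the
# figure suggested in the post.
STOP_CMD = ["systemctl", "stop", "mysql"]
START_CMD = ["systemctl", "start", "mysql"]
DOWNTIME = timedelta(minutes=10)


def next_window(now: datetime, hour: int, minute: int = 0):
    """Return (stop_time, start_time) for the next daily restart window."""
    stop = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if stop <= now:
        # Today's slot has already passed, so use tomorrow's.
        stop += timedelta(days=1)
    return stop, stop + DOWNTIME


def restart_cycle(run=subprocess.run, sleep=time.sleep):
    """Stop the database, wait out the pause, then start it again.

    `run` and `sleep` are injectable so the sequence can be tried out
    without touching a real server.
    """
    run(STOP_CMD, check=True)
    sleep(DOWNTIME.total_seconds())
    run(START_CMD, check=True)


if __name__ == "__main__":
    # Only reports the next window; it does not stop or start anything.
    stop, start = next_window(datetime.now(), hour=3)
    print(f"next planned stop: {stop}, restart: {start}")
```

In practice the clock part would more likely be handled by cron or a similar scheduler calling `restart_cycle` once a day, rather than by the script itself.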

Just a thought from one of the users.
12) Message boards : Projects : News on Project Outages (Message 65653)
Posted 27 Nov 2015 by BarryAZ
Post:
As I noted before, it was almost inevitable that with the short run sieve work units, the stability problems that have manifested themselves with the Collatz project would become far more frequent.

To me, this has the feel of some form of memory leak where the database server simply runs out of available memory and collapses under the strain.

Not knowing how one would deal with it, my simplistic thinking would be to run a script or scripts.

One would take the database server down 'gracefully' based on a clock.

One would then restart the database server as part of a *planned* server reset to restore the memory.

But that's just me and I am likely to be rather clueless about the issue and a work around.
13) Message boards : Projects : News on Project Outages (Message 65650)
Posted 27 Nov 2015 by BarryAZ
Post:
Gary, thanks for the reply.

Yeah, I get that Gary, but I had shifted it to a primary on multiple systems -- which I had also done years ago as well.

Over the past few years I had shifted to Milkyway, MooWrap, and GPUGrid.

With Collatz shifting the sieve work units I shifted back to Collatz as the primary project.

With GPUGrid though, given its long run work units -- it is either primary or the work units can time out.

So that's the major change.

I just don't like seeing those 10 minute completed work units stack up with Collatz.

Though as long as Collatz has a 50 unit limit and I have the other projects in a receive work mode, I suppose (aside from the systems running GPUGrid) that would work out.

It is rather unfortunate that Collatz breaks down so regularly these days.

Today's daily outage is at 9 hours or more...
14) Message boards : Projects : News on Project Outages (Message 65648)
Posted 27 Nov 2015 by BarryAZ
Post:
Collatz is down yet again -- went down about 4AM this morning. It is still down.

This has become an almost daily event for Collatz -- followed by a downtime which runs between 4 and 24 hours, ending when the database server is manually rebooted.

At this stage, the down time for Collatz is running equal to its up time.

Given that there has been no discussion over at the project regarding the problems, it seems that folks may do as I'm about to do and consign Collatz to secondary project status -- to be run only when its status can be closely watched externally by the users.

For me that means, before I go to sleep, I'll suspend Collatz as I have some faith that the data base server will crash over night. Then, during days when I can check Collatz status regularly, I'll let the project process. Then when (not if) it goes offline during the day, I'll suspend it again until the next moment when it is running.

I realize that Collatz is a one person enterprise, I just wish it was somewhat less unreliable these days.
15) Message boards : Projects : News on Project Outages (Message 65643)
Posted 26 Nov 2015 by BarryAZ
Post:
Collatz is currently back up and running...
16) Message boards : Projects : News on Project Outages (Message 65635)
Posted 26 Nov 2015 by BarryAZ
Post:
It seems that the frequency of Collatz database crashes is increasing while the return to operation cycle is getting longer....

I am not sure that anyone else is noticing this....
17) Message boards : Projects : News on Project Outages (Message 65633)
Posted 26 Nov 2015 by BarryAZ
Post:
Collatz was back online for almost an entire day before it crashed yet again about 2 hours ago (2:30PM PST).

It seems, given the regularity with which it crashes and the seemingly undefined problem which has persisted for a VERY long time, perhaps some effort could be put toward developing a shutdown / restart script which could be run automatically on a daily basis....
18) Message boards : Projects : News on Project Outages (Message 65577)
Posted 24 Nov 2015 by BarryAZ
Post:
The Collatz database remains offline -- this outage is a bit longer than the average (and frequent) outage for Collatz -- it is now at about 12 hours.
19) Message boards : Projects : News on Project Outages (Message 65568)
Posted 23 Nov 2015 by BarryAZ
Post:
Collatz appears to be in a bit of a yo-yo mode. The database was offline yesterday (11/22) for about 8 hours. Back up around noon yesterday. It was up this morning about 8AM, and is offline again as of 9AM.

One thing of note, the problem there (which is clearly chronic) doesn't extend to uploads -- they seem to go through. That means that when the database server is rebooted (and it appears that is all that is being done at the moment when it crashes) it validates a very large collected set of uploaded work units, processes the large set of new reports that occur once the data base server is alive and validates these. Then it sends out new work.

Until the next time it crashes.

With the very much increased workload that the sieve units (which complete in a much shorter time than the previous work units) generate, the database server has been crashing quite a bit more frequently of late -- about 10 times in the past month.
20) Message boards : Projects : News on Project Outages (Message 65423)
Posted 13 Nov 2015 by BarryAZ
Post:
Collatz is back up -- outage this time was about 8 hours.



Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.