Computation error

Message boards : Questions and problems : Computation error

Profile Wizardling
Joined: 4 Mar 09
Posts: 39
New Zealand
Message 62625 - Posted: 21 Jun 2015, 19:34:21 UTC

I'm getting a lot of computation errors for projects I've recently added or re-added, after a few years of mostly (and trouble-free) running only SETI@home and World Community Grid on this computer.

Any idea as to why? I would post in the NumberFields@home board, for example, but lacking any credit I'm prevented from letting them know they have a problem/finding out what my problem is.

Mid-2007 24" 2.8GHz iMac, 4GB RAM, ATI Radeon HD 2600 Pro 256MB VRAM, OS X 10.6.8
"The surest sign that intelligent life exists elsewhere in the universe is that it has never tried to contact us." - Calvin and Hobbes (Bill Watterson)
ID: 62625
Claggy

Joined: 23 Apr 07
Posts: 1112
United Kingdom
Message 62626 - Posted: 21 Jun 2015, 20:12:34 UTC - in response to Message 62625.  
Last modified: 21 Jun 2015, 20:12:48 UTC

I'm getting a lot of computation errors for projects I've recently added or re-added, after a few years of mostly - and trouble free - running of only SETI@home and World Community Grid on this computer

Any idea as to why?

How about checking the error codes of the failed tasks at the projects concerned?

Claggy
ID: 62626
Profile Wizardling
Joined: 4 Mar 09
Posts: 39
New Zealand
Message 62632 - Posted: 21 Jun 2015, 22:37:52 UTC - in response to Message 62626.  

If you can tell me how, I will. The only stderr.txt files (is that what I'm after?) are in /Library/Application\ Support/BOINC\ Data/slots and have nothing to do with the projects that have failing apps.
ID: 62632
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15466
Netherlands
Message 62633 - Posted: 22 Jun 2015, 1:13:30 UTC

In the list of tasks for your computer, http://numberfields.asu.edu/NumberFields/results.php?hostid=23920, click on a Task ID for details of what that task sent back. In your case it is signal 11, which translates to a segmentation fault. That points to a problem with your memory or virtual memory (page file), or to a bad batch of tasks.

However, if you're the only one returning these as errors, and consistently over two or more projects, you'd best look into a problem with the RAM or page file on that computer.
ID: 62633
Profile Wizardling
Joined: 4 Mar 09
Posts: 39
New Zealand
Message 62634 - Posted: 22 Jun 2015, 2:35:18 UTC - in response to Message 62633.  

I have no other issues, so I have to blame BOINC given it's happening across multiple projects, all newly added on this machine (or re-added after being removed years ago) and applications within projects, but not all projects.

If I had RAM or VM problems I'd see evidence elsewhere, unless it was an extremely subtle issue. But it has been months since my last kernel panic, and I get one or two of those a year, which is normal enough, given this Mac runs 24/7 and is heavily used.

Restarting BOINC and restarting my machine have had no effect. Several projects continue to run unaffected. I guess I could try a completely clean reinstall of BOINC and the projects. That would entail losing some work, which I dislike, but I guess that is no big thing.
ID: 62634
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15466
Netherlands
Message 62635 - Posted: 22 Jun 2015, 6:21:10 UTC - in response to Message 62634.  
Last modified: 22 Jun 2015, 6:27:13 UTC

BOINC doesn't do any calculations and doesn't load your memory to capacity; it merely manages things. Its memory use is minor. To blame it and not look elsewhere (memtest), only because you feel it cannot be true, is just wishful thinking.
 
Project science applications running under BOINC stress your hardware in ways that the other software on your system, including games, has never done before. Your wingmen (the others running the same work as you) manage to finish the work without errors, so what, their BOINC is better?
ID: 62635
Profile Wizardling
Joined: 4 Mar 09
Posts: 39
New Zealand
Message 62639 - Posted: 22 Jun 2015, 11:04:13 UTC - in response to Message 62635.  

This Mac only has 4GB RAM, which is always all in use, and I've been running SETI@home and World Community Grid without any troubles, not to mention other apps, for years. I still run a bunch of BOINC projects and apps OK.

If this is a RAM problem, it's the most subtle one I've ever encountered...
ID: 62639
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15466
Netherlands
Message 62644 - Posted: 22 Jun 2015, 14:57:04 UTC - in response to Message 62639.  
Last modified: 22 Jun 2015, 16:44:22 UTC

The computation error is something that the science application does.
The contents of stderr.txt, as shown on the Task ID page, are what the science application exited with.

So I searched the Numberfields forums for signal 11 and immediately got this thread dated yesterday, in which Eric Driver, admin of Numberfields, writes:
"Looking through recent Mac results I found several other Mac users having the same problem, but the majority were fine. The ones getting the signal 11 all had Darwin OS version 11.4.2 or earlier. Also, all the good results were from Darwin version 12.6.0 or higher."
and
"My guess is that it's related to the version of the build tools and/or OS used to build the apps. Since I don't have a mac, I use a virtual machine, so I am kind of stuck with the OS version that came with it (Mountain Lion). Again, I'm not that familiar with the Mac OS, but it appears that any OS predating Mountain Lion (12.0.0) may have problems running NumberFields@home."

So in this case, at Numberfields, it would appear to be an incompatibility between your operating system and their science application.
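For anyone who wants to check their own Mac against the version boundary Eric describes, here is a rough sketch in Python. The 12.0.0 threshold comes from the quoted post; the function names are made up for illustration, and this is only a guess at a useful check, not anything the project itself ships. For reference, OS X 10.6.8 reports itself as Darwin 10.8.0.

```python
import platform

# Threshold taken from the quoted post: Darwin 12.0.0 (Mountain Lion).
MIN_DARWIN = (12, 0, 0)


def parse_version(release: str) -> tuple:
    """Turn a string like '11.4.2' into (11, 4, 2); non-numeric parts are skipped."""
    return tuple(int(piece) for piece in release.split(".") if piece.isdigit())


def predates_mountain_lion(release: str) -> bool:
    """True if the Darwin release is older than the 12.0.0 boundary from the post."""
    return parse_version(release) < MIN_DARWIN


if platform.system() == "Darwin":
    # platform.release() returns the Darwin kernel version on a Mac.
    release = platform.release()
    print(release, "predates Mountain Lion:", predates_mountain_lion(release))
```

On a non-Mac machine the last block prints nothing; the two functions can still be called directly with a version string copied from a task page.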

Edit: I've asked Eric to make a news item out of it, which he has done.

Apropos, for those trying to post on forums without having credit or RAC, even the Numberfields project has a Questions and Answers forum where you can post without requiring a RAC higher than 1 and at least N credit. Most projects do.
ID: 62644
Profile Wizardling
Joined: 4 Mar 09
Posts: 39
New Zealand
Message 62649 - Posted: 22 Jun 2015, 22:16:51 UTC - in response to Message 62644.  

Cheers for the info, though I'd still need any credit to post on a totally new (to me) project's page.

Also - this has happened with a bunch of projects, not just Numberfields, so a bunch of them are obviously mis-compiling their apps for older versions of OS X.
ID: 62649
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15466
Netherlands
Message 62650 - Posted: 22 Jun 2015, 23:46:41 UTC - in response to Message 62649.  

Cheers for the info, though I'd still need any credit to post on a totally new (to me) project's page.

Not on the Questions and Answers forum. These forums are specifically added without the requirement to have credit or RAC, so you can post when you have problems from the start. Just to prove this, see user Recliner posting message nr 1273 with zero credit and zero RAC.

The main NF forums require credit and a RAC higher than 1, which makes sure that spammers can't post their crap there.

this has happened with a bunch of projects

Which ones?
.. so a bunch of them are obviously mis-compiling their apps for older versions of OS X.

Not so obvious. It can be lots of other things. But without you telling us about the projects that have the problems, that's difficult to figure out. Thus far you only said you do not have problems with Seti and WCG.
ID: 62650
Profile Wizardling
Joined: 4 Mar 09
Posts: 39
New Zealand
Message 62651 - Posted: 23 Jun 2015, 2:25:13 UTC - in response to Message 62650.  

The support forum you linked to wasn't one I could see.
ID: 62651
Profile Wizardling
Joined: 4 Mar 09
Posts: 39
New Zealand
Message 62652 - Posted: 23 Jun 2015, 2:46:22 UTC - in response to Message 62650.  

this has happened with a bunch of projects

Which ones?
.. so a bunch of them are obviously mis-compiling their apps for older versions of OS X.

Not so obvious. It can be lots of other things. But without you telling us about the projects that have the problems, that's difficult to figure out. Thus far you only said you do not have problems with Seti and WCG.


All the below contain computation or similar looking errors.

https://einstein.phys.uwm.edu/results.php?userid=81109
http://csgrid.org/csg/results.php?userid=94101
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?userid=98031
http://escatter11.fullerton.edu/nfs/results.php?userid=31713
http://boinc.fzk.de/poem/results.php?userid=10911
https://boinc.bakerlab.org/rosetta/results.php?userid=2041
http://boinc.umiacs.umd.edu/results.php?userid=3381

All were recently added or returned to this machine. Some have been running on other computers I only see every few months, so maybe things have also been going wrong there and I've not noticed.

But on this particular machine, at least while booted into OS X, I've only been running SETI and WCG, both of which almost always run fine. (I dual boot into WinXP, mainly for gaming and a couple of other tasks, during which I run a couple of projects lacking Mac support.) For example:

http://setiathome.berkeley.edu/results.php?userid=8098294
ID: 62652
Claggy

Joined: 23 Apr 07
Posts: 1112
United Kingdom
Message 62653 - Posted: 23 Jun 2015, 3:43:57 UTC - in response to Message 62651.  

The support forum you linked to wasn't one I could see.

Straight under the search forums box it says 'If you have a question or problem, please use the Questions & Answers section of the message boards.'

Claggy
ID: 62653
Profile Wizardling
Joined: 4 Mar 09
Posts: 39
New Zealand
Message 62654 - Posted: 23 Jun 2015, 4:28:28 UTC - in response to Message 62653.  

I admit I did not notice that. But why would I? It's in text where I would not expect a forum link to be. I wasn't reading the rest of the page. I was reading the list of forums, which oddly does not include the support one. So I think this is both a problem of me not looking hard enough, and poor page design :-)
ID: 62654
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15466
Netherlands
Message 62656 - Posted: 23 Jun 2015, 5:49:34 UTC - in response to Message 62652.  
Last modified: 23 Jun 2015, 5:53:06 UTC

http://einstein.phys.uwm.edu/show_user.php?userid=81109
taskID http://einstein.phys.uwm.edu/result.php?resultid=504231187 shows process exited with code 193 (0xc1, -63)
Crashed executable name: einstein_S6BucketFU2UB_1.01_x86_64-apple-darwin__X64
Machine type Intel 80486 (64-bit executable)
System version: Macintosh OS 10.6.8 build 10K549
Sun Jun 21 04:09:14 2015

That's something you can post at the Help Desk sections of their forum, where you can post without needing credit or RAC.

http://csgrid.org/csg/show_user.php?userid=94101
Does indeed exit with process got signal 11, something you should be able to ask about in their forums.

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/show_user.php?userid=98031
The erroneous models crashed with process exited with code 9 (0x9, -247), but then you also managed to bring a couple of them home and to fruition. One possible bottleneck as always on CPDN is that your computer has only got 4GB of memory, in which it has to store the OS, CPDN and anything else running at the time. CPDN is memory hungry.
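As an aside on how to read these lines: both exit lines quoted in this post print the code three ways, "193 (0xc1, -63)" and "9 (0x9, -247)". The second number is the code in hex, and in both cases the third equals the code minus 256. That last part is only inferred from these two examples, not taken from the BOINC source, and the function name below is made up for illustration.

```python
def format_exit_code(code: int) -> str:
    """Render an exit code the way the task pages above show it.

    Assumption: the third number is code - 256. This matches both
    examples in this thread but has not been checked against BOINC itself.
    """
    return f"{code} ({code:#x}, {code - 256})"


for code in (193, 9):
    print("process exited with code", format_exit_code(code))
```

So the three numbers carry the same information; only the first (or its hex form) is needed when searching a project's forum.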

http://escatter11.fullerton.edu/nfs/show_user.php?userid=31713
No problems with 15e Lattice Sieve v1.10, but signal 11 again with 16e Lattice Sieve V5 v1.11. That is something you will have to ask about at their forums, although apparently they already know about it for an older app.

http://boinc.fzk.de/poem/show_user.php?userid=10911
process got signal 5; that one I don't know. It also shows dyld: unknown required load command 0x80000022, which you might want to report at their forums.

But doesn't everyone run POEM on a GPU these days? Perhaps they don't test the CPU apps that thoroughly.
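For what it's worth when reporting it, the value in the dyld message (unknown required load command 0x80000022) can be split into two parts. The sketch below assumes the constant names and values from Apple's mach-o/loader.h header as best I recall them (LC_REQ_DYLD as the "required" flag bit, 0x22 as LC_DYLD_INFO); the helper name is made up. It only decodes the number and does not explain why the loader on a given system rejects it.

```python
# Assumed values, recalled from Apple's <mach-o/loader.h>.
LC_REQ_DYLD = 0x80000000   # flag bit: the dynamic loader must understand this command
LC_DYLD_INFO = 0x22        # compressed dyld information


def split_load_command(value: int) -> tuple:
    """Return (is_required, base_command) for a Mach-O load command value."""
    is_required = bool(value & LC_REQ_DYLD)
    base_command = value & ~LC_REQ_DYLD
    return is_required, base_command


required, base = split_load_command(0x80000022)
print("required:", required, "base command:", hex(base))
print("matches LC_DYLD_INFO:", base == LC_DYLD_INFO)
```

Quoting both the raw value and the decoded form in a forum post should make it easier for the project's developer to see which part of the binary the loader is objecting to.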

https://boinc.bakerlab.org/rosetta/show_user.php?userid=2041
Same error as at POEM, process got signal 5, something you'll have to ask at the Rosetta forums.

http://boinc.umiacs.umd.edu/show_user.php?userid=3381
And also signal 5.

http://setiathome.berkeley.edu/show_user.php?userid=8098294
Great, not one error in sight. But then, one cannot easily compare the CPU and memory load of Seti to that of any of the other projects you tried. The nearest is Einstein, and those two can only be compared because both do things in space and with the Arecibo dish; beyond that the comparison quickly evaporates.

And as shown, it's not always signal 11.
Checking the man page for signal 5 (SIGTRAP), I see that it is thrown on a Trace/Breakpoint Trap, so it was deliberately added by the application's programmer. He should know why his app throws the error, and thus you should ask at their forums.
Signal 11 (SIGSEGV) is a segmentation fault, or an invalid memory reference. It can be caused by problematic memory, or by an application addressing a wrong part of the memory.
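If you'd rather not dig through the man page, a small Python sketch can translate the numbers from a "process got signal N" line into their symbolic names. It uses the standard signal module; the helper name is made up, and the names it returns are those of the system it runs on, which for 5 and 11 are the same on OS X and Linux.

```python
import signal


def describe_signal(number: int) -> str:
    """Return the symbolic name for a signal number, e.g. 11 -> 'SIGSEGV'."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # Raised when the number is not a signal known on this platform.
        return f"unknown signal {number}"


for number in (5, 11):
    print("signal", number, "is", describe_signal(number))
```

The name is usually a better search term on a project's forum than the bare number.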
ID: 62656
Profile Wizardling
Joined: 4 Mar 09
Posts: 39
New Zealand
Message 62660 - Posted: 23 Jun 2015, 8:19:06 UTC - in response to Message 62656.  

Cheers Ageless for that analysis :-) I shall post about the respective issues to their respective forums, as per your advice.

I'd also add that, looking at my WCG results, I see no errors either. I couldn't see an easy way to post a link that would work outside my login, but here's a screenshot. I mention this because some of the WCG projects, like the current The Clean Energy Project - Phase 2, require a bunch of RAM; in this application's case, 1,024 MB. SETI@home by comparison only needs around 64MB of RAM. So if I had some kind of subtle RAM issue, it feels like it ought to have shown up in other ways, like in the beefier WCG apps, and of course in all the other stuff I have this aging 2007-era iMac doing.

With its paltry 4GB RAM it uses lots of VM and there is considerable read/write to and from the VM pagefile. My RAM and VM system get quite the workout 24/7, and till adding/re-adding these extra BOINC projects, I saw no obvious issues, big or small.

I'm not saying to those who have suggested RAM issues that it can't be them. It just doesn't feel like a RAM issue, if you know what I mean :-) RAM issues in the past were sometimes subtle, but always noticeable in a more widespread manner. I'd see a variety of apps and system processes giving errors, crashing and experiencing unstable behaviour, kernel panics, freezes, unusual RAM activity, etc. It has been a long time since I owned a computer that had enough RAM, and apps simple or efficiently written enough, that a bad RAM issue only noticeable at a certain usage point wouldn't happen constantly.

That said, I'll try to spend overnight running a memtest as soon as I can spare this machine (it does more than crunch BOINC project data, though I often enjoy seeing the BOINC results more than the serious work I do on it myself, heh). If you guys have any other suggestions troubleshooting-wise, I'm open to some more investigative work!

TIA :-)
ID: 62660


Copyright © 2024 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.