Raspberry Pi client causing kernel error and freeze

hartacus

Joined: 6 May 14
Posts: 11
Australia
Message 54046 - Posted: 10 May 2014, 14:43:22 UTC

Hi,
I have a Raspberry Pi running Raspbian (fully updated as of today, including firmware). I'm using the version of BOINC from the repos, attached to SETI, Asteroids, Einstein and WUProp. Work units for WUProp and Einstein complete and are validated; I haven't yet had time for a work unit on the other two projects to complete.

Problem is, BOINC and/or its projects seem to be causing a kernel error. I first noticed it as being unable to connect to the BOINC client remotely and being unable to SSH into the Pi (as I'm running it headless) after the Pi had been on and crunching for a few hours/overnight. Connecting a monitor shows this issue:

BUG: unsupported FP instruction in kernel mode
internal error: Oops - undefined instruction: 0 [#1] PREEMPT ARM


which then appears to send the kernel into debug mode and hard freeze the Pi. The associated process is "period_search_1" because at that moment I was running an Asteroids task. To test, I suspended Asteroids and ran a SETI task, which also resulted in a freeze.

So, my questions:
1. Anyone else have this problem and fix it?

2. Is this the right place to ask, or should I be asking in the projects' forums? (I suspect this is the right place, because I got this happening with multiple projects.)

3. It appears to be a problem with CPU preemption being unsupported on a stock Raspbian kernel. Short of compiling a custom kernel (something I can do but would rather not), is there any way to make BOINC/projects not use CPU preemption? Say, if I set the client to run at 100% CPU all of the time regardless of CPU load from other programs could that help?

4. Could this be at all related to the Explorer crashes I've noticed on my Win7 BOINC box (see recent thread)? Drawing a long bow I know, but if Windows just has a nicer crash recovery setup than the Raspbian kernel the symptoms could be paralleled.

5. Since it's a problem with an unsupported kernel instruction, why isn't it happening immediately/far more frequently? Sometimes it happens within a few minutes of startup, sometimes it'll run overnight without a problem.

Thanks for any ideas you may have.
ID: 54046
Claggy
Joined: 23 Apr 07
Posts: 1112
United Kingdom
Message 54050 - Posted: 10 May 2014, 17:18:39 UTC - in response to Message 54046.  
Last modified: 10 May 2014, 17:27:54 UTC

I have the same problem on my Pi running Raspbian Jessie and the 3.12.x kernel; the 3.10.x kernel had no problem:

http://www.raspberrypi.org/forums/viewtopic.php?p=547713#p547713

Claggy
ID: 54050
hartacus
Joined: 6 May 14
Posts: 11
Australia
Message 54054 - Posted: 11 May 2014, 1:02:32 UTC - in response to Message 54050.  

Interesting... I've posted over there so I can follow along and add momentum to the investigation. Let me know if you come across a solution.
ID: 54054
MarkJ
Volunteer tester
Help desk expert
Joined: 5 Mar 08
Posts: 272
Australia
Message 54064 - Posted: 11 May 2014, 15:07:59 UTC

I've not had this problem running e@h work. I have a couple of Pis running Jessie with BOINC 7.2.42 from the repo. I have 3 more running Wheezy with BOINC 7.0.27, also from the repo.

I used to run the a@h app. That was before they increased the size of their work units for the 2nd time. I will try a few a@h tasks and report back how it's going. It might take a while as their tasks are fairly large now (for a Pi at least).
MarkJ
ID: 54064
hartacus
Joined: 6 May 14
Posts: 11
Australia
Message 54067 - Posted: 12 May 2014, 1:02:26 UTC - in response to Message 54064.  

It might take a while as their tasks are fairly large now (for a Pi at least).


I'm getting a freeze roughly every day, sometimes more often, so if you're getting the issue you should get it soon. Let us know how you go!
ID: 54067
MarkJ
Volunteer tester
Help desk expert
Joined: 5 Mar 08
Posts: 272
Australia
Message 54075 - Posted: 12 May 2014, 10:27:40 UTC
Last modified: 12 May 2014, 10:31:13 UTC

When I log on to the Pi it says:
Linux *** 3.10.25+ #622 PREEMPT Fri Jan 3 18:41:00 GMT 2014 armv6l


I assume that means it's using the 3.10.25 kernel. This one has just started an Asteroids task which it's estimating at 115 hours. It's running Wheezy and BOINC 7.0.27 at the moment. I was going to update it but I will see how it goes with this one.

The other ones that I have updated also claim to be running a 3.10.25+ PREEMPT kernel so it might be pointless doing this test. All I did to upgrade them was an apt-get dist-upgrade after changing my apt sources.list, which is probably why the kernel is the same on both.
MarkJ
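
To compare a Pi against the kernel versions mentioned so far, the release string can be checked with a few lines of shell. This is only a minimal sketch: the function names kernel_series and reported_status are made up for illustration, and the "affected" and "unaffected" labels simply repeat what posters in this thread observed (freezes on 3.12.x, none on 3.10.x). They are not an authoritative list.

```shell
#!/bin/sh
# Extract "major.minor" from a kernel release string such as "3.12.18+".
# (kernel_series is a hypothetical helper name, not part of any tool.)
kernel_series() {
    printf '%s\n' "$1" | sed -n 's/^\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Map a release string to what posters in this thread reported for it.
reported_status() {
    case "$(kernel_series "$1")" in
        3.12) echo "3.12.x: freezes reported in this thread" ;;
        3.10) echo "3.10.x: no freezes reported in this thread" ;;
        *)    echo "not discussed in this thread" ;;
    esac
}

# uname -r prints the running kernel release, e.g. "3.10.25+".
reported_status "$(uname -r)"
```

The same release string appears in the login banner quoted above; uname -v would additionally show the build number and date (the "#622 PREEMPT Fri Jan 3 ..." part).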
ID: 54075
Claggy
Joined: 23 Apr 07
Posts: 1112
United Kingdom
Message 54087 - Posted: 12 May 2014, 17:45:40 UTC - in response to Message 54075.  

When I log on to the Pi it says:
Linux *** 3.10.25+ #622 PREEMPT Fri Jan 3 18:41:00 GMT 2014 armv6l


I assume that means it's using the 3.10.25 kernel. This one has just started an Asteroids task which it's estimating at 115 hours. It's running Wheezy and BOINC 7.0.27 at the moment. I was going to update it but I will see how it goes with this one.

The other ones that I have updated also claim to be running a 3.10.25+ PREEMPT kernel so it might be pointless doing this test. All I did to upgrade them was an apt-get dist-upgrade after changing my apt sources.list, which is probably why the kernel is the same on both.

To update the kernel, modules, SDK, etc., run sudo rpi-update && sudo reboot (assuming you have rpi-update installed). Make sure you back up your SD card beforehand, and do not contact any projects between updating and restoring the backup, as that will abandon any work received. (The scheduler contact sequence numbers will no longer line up; the workaround is to edit client_state.xml and increase the <rpc_seqno> value to one or two above what the project server reports.)

Claggy
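
Before editing client_state.xml as described, it may help to see what <rpc_seqno> values the client currently holds. The following is a rough sketch, not an official BOINC tool: the path /var/lib/boinc-client is an assumption (the usual data directory for the Debian/Raspbian package), and show_rpc_seqno is a made-up helper name. It only reads the file and changes nothing.

```shell
#!/bin/sh
# Assumed default data directory of the Debian/Raspbian boinc-client package;
# adjust if your installation keeps client_state.xml elsewhere.
STATE=/var/lib/boinc-client/client_state.xml

# Print every <rpc_seqno> value found in the given file, one per line
# (the client stores one value per attached project).
show_rpc_seqno() {
    sed -n 's:.*<rpc_seqno>\([0-9][0-9]*\)</rpc_seqno>.*:\1:p' "$1"
}

# Read-only check: nothing is modified here.
if [ -r "$STATE" ]; then
    show_rpc_seqno "$STATE"
else
    echo "cannot read $STATE (different path, or insufficient permissions?)"
fi
```

If you do go on to edit the value by hand, it is probably safest to stop the client first and keep a copy of the original file, since the client rewrites client_state.xml while it runs.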
ID: 54087
MarkJ
Volunteer tester
Help desk expert
Joined: 5 Mar 08
Posts: 272
Australia
Message 54092 - Posted: 13 May 2014, 10:49:06 UTC - in response to Message 54087.  

$ sudo rpi-update
sudo: rpi-update: command not found

No it isn't installed.
MarkJ
ID: 54092
Claggy
Joined: 23 Apr 07
Posts: 1112
United Kingdom
Message 54094 - Posted: 13 May 2014, 12:46:35 UTC - in response to Message 54093.  

$ sudo rpi-update
sudo: rpi-update: command not found

No it isn't installed.


not having a ras... PI myself
might try.

sudo apt-get update
sudo apt-get dist-upgrade
sudo rpi-update

maybe the last one should be sudo apt-get rpi-update


sudo apt-get install rpi-update will install it.

Claggy
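
The steps discussed here (install rpi-update if it is missing, run it, reboot) could be laid out as a dry run that only prints the commands, so nothing changes until the SD card has been backed up. This is a sketch under that assumption; have_cmd and plan_kernel_update are hypothetical names, and the commands printed are just the ones already quoted in this thread.

```shell
#!/bin/sh
# Return success if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print (rather than run) the update steps from this thread.
plan_kernel_update() {
    if ! have_cmd rpi-update; then
        # Only needed when "sudo rpi-update" says "command not found".
        echo "sudo apt-get install rpi-update"
    fi
    echo "sudo rpi-update"
    echo "sudo reboot"
}

plan_kernel_update
```

Printing the plan first keeps the backup step in your own hands: review the output, back up the card, then run the lines manually.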
ID: 54094
hartacus
Joined: 6 May 14
Posts: 11
Australia
Message 54109 - Posted: 13 May 2014, 23:17:41 UTC - in response to Message 54075.  

When I log on to the Pi it says:
Linux *** 3.10.25+ #622 PREEMPT Fri Jan 3 18:41:00 GMT 2014 armv6l


I assume that means it's using the 3.10.25 kernel. This one has just started an Asteroids task which it's estimating at 115 hours. It's running Wheezy and BOINC 7.0.27 at the moment. I was going to update it but I will see how it goes with this one.

The other ones that I have updated also claim to be running a 3.10.25+ PREEMPT kernel so it might be pointless doing this test. All I did to upgrade them was an apt-get dist-upgrade after changing my apt sources.list, which is probably why the kernel is the same on both.


Mine's running 3.12.18+ #679 PREEMPT Thu May 1 14:40:27 BST 2014 armv6l. Clearly the kernel was intended to have properly functioning preemption.

Through a remote GUI, I've set the client to run while the Pi is active (ticking the checkbox and removing the CPU usage threshold) and to use 100% of the CPU, but I'm still getting hangs. Are there any other BOINC settings that might limit preemption (or am I barking up the wrong tree)?
ID: 54109
MarkJ
Volunteer tester
Help desk expert
Joined: 5 Mar 08
Posts: 272
Australia
Message 54111 - Posted: 14 May 2014, 11:45:20 UTC - in response to Message 54094.  

sudo apt-get install rpi-update will install it.

Claggy


Updated kernel, but still running Wheezy and BOINC 7.0.27 at the moment. Will let it continue its a@h task
Linux *** 3.12.19+ #682 PREEMPT Mon May 12 23:27:36 BST 2014 armv6l

MarkJ
ID: 54111
MarkJ
Volunteer tester
Help desk expert
Joined: 5 Mar 08
Posts: 272
Australia
Message 54123 - Posted: 15 May 2014, 11:17:05 UTC
Last modified: 15 May 2014, 11:20:01 UTC

It's been running overnight and all day, still running the a@h task with 108 hours to go (estimated).
MarkJ
ID: 54123
hartacus
Joined: 6 May 14
Posts: 11
Australia
Message 54124 - Posted: 15 May 2014, 12:46:04 UTC - in response to Message 54111.  

Updated kernel, but still running Wheezy and BOINC 7.0.27 at the moment. Will let it continue its a@h task
Linux *** 3.12.19+ #682 PREEMPT Mon May 12 23:27:36 BST 2014 armv6l


Huh, they snuck in a kernel update. I've updated from 3.12.18+ to 3.12.19+ and will test.
ID: 54124
MarkJ
Volunteer tester
Help desk expert
Joined: 5 Mar 08
Posts: 272
Australia
Message 54137 - Posted: 16 May 2014, 11:06:14 UTC

It's still going. It's at 47% after 96 hours, so it looks like it will be another 100 or so hours to completion. I will upgrade it to Jessie and BOINC 7.2.42 and see if it can finish this task off.
MarkJ
ID: 54137
hartacus
Joined: 6 May 14
Posts: 11
Australia
Message 54142 - Posted: 16 May 2014, 15:34:27 UTC - in response to Message 54137.  

Interesting. The upgrade to kernel 3.12.19+ did nothing for me, it's still hanging. Keen to see how your upgrade affects things!
ID: 54142
MarkJ
Volunteer tester
Help desk expert
Joined: 5 Mar 08
Posts: 272
Australia
Message 54151 - Posted: 18 May 2014, 4:17:55 UTC
Last modified: 18 May 2014, 4:19:11 UTC

Well, it crashed, although I don't have any error messages from it. I had to power cycle it to get it going again.

Not sure if there's any kernel logging anywhere; if so, where should I look?
MarkJ
ID: 54151
Claggy
Joined: 23 Apr 07
Posts: 1112
United Kingdom
Message 54158 - Posted: 18 May 2014, 17:00:07 UTC - in response to Message 54142.  

Interesting. The upgrade to kernel 3.12.19+ did nothing for me, it's still hanging. Keen to see how your upgrade affects things!

I updated to another 3.12.19+ release later on Friday; that didn't help.

Claggy
ID: 54158
hartacus
Joined: 6 May 14
Posts: 11
Australia
Message 54164 - Posted: 19 May 2014, 0:12:56 UTC - in response to Message 54151.  

Well, it crashed, although I don't have any error messages from it. I had to power cycle it to get it going again.

Not sure if there's any kernel logging anywhere; if so, where should I look?


Haven't been able to find a log of it anywhere, but if you have a monitor attached and disable screen blanking/powerdown you can see it as the last thing displayed on the screen when it's frozen. This will disable blanking temporarily (google for permanent solutions):
setterm -blank 0 -powerdown 0
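
On the question of kernel logging: if the Oops happened to reach disk before the freeze, it might show up in the system logs after the next boot. The sketch below searches for it. The log paths /var/log/kern.log and /var/log/syslog are an assumption (the usual rsyslog locations on Debian-based systems), and find_oops is a made-up helper name. A hard freeze can easily prevent the message from being written at all, so an empty result would be consistent with what was found above.

```shell
#!/bin/sh
# Search the given log files for the kernel messages quoted in this thread.
# Unreadable or missing files are skipped silently.
find_oops() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        # /dev/null as a second file makes grep prefix each hit with the
        # file name; -n adds the line number.
        grep -n -E 'Oops|unsupported FP instruction' "$f" /dev/null
    done
    return 0
}

# Assumed default log locations; reading them may require root.
find_oops /var/log/kern.log /var/log/syslog
```

Each hit is printed as file:line:text, which makes it easy to open the log at that line and read the surrounding messages.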
ID: 54164
MarkJ
Volunteer tester
Help desk expert
Joined: 5 Mar 08
Posts: 272
Australia
Message 54194 - Posted: 20 May 2014, 11:30:49 UTC

Crashed again. I'm going back to the backup I took of the SD card. Will update to Jessie and 7.2.42 and leave the kernel alone.
MarkJ
ID: 54194
Claggy
Joined: 23 Apr 07
Posts: 1112
United Kingdom
Message 54249 - Posted: 23 May 2014, 20:00:33 UTC - in response to Message 54194.  

I've been setting up another SD card with Jessie on it, but with the 3.10.25+ kernel; that was still ongoing when I was there last.

Browsing GitHub, I've found the ticket for our bug:

https://github.com/raspberrypi/linux/issues/600

I do have a serial cable, although I've never used it before. I'm able to do the last request without a problem, and I can reimage my new SD card with Wheezy and the latest firmware if necessary, and put BOINC on it.

Claggy
ID: 54249

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.