Compute error - SIGSEGV: segmentation violation

Message boards : Questions and problems : Compute error - SIGSEGV: segmentation violation

Radjin

Joined: 17 Sep 19
Posts: 12
United States
Message 92864 - Posted: 20 Sep 2019, 16:18:53 UTC

I am sure others are having this issue or have, but a search of the forum didn’t pop up any recent posts or resolutions.

I decided to add my web server to my computers as it sits mostly idle. It’s a Debian 10 (Buster) Linux box (details below). First I tried the repository install with apt. Then people on the Seti@Home forums suggested using a Berkeley version that they claimed would be more efficient and include everything needed. Once I got that running I was back to the same errors. A pretty knowledgeable friend had me look at a number of things and convinced me to return to the repository version, which I did and am currently running.

Any help to get past or at least understand this would be appreciated.

My skill level on Linux is just enough to be dangerous so please be a bit more -verbose in explanations or how to do something.

Radjin~
======
A typical error as listed on my accounts/computers/tasks page:

Task 8058757637
Name blc11_2bit_guppi_58692_04223_HIP79568_0125.25756.0.21.44.68.vlar_0
Workunit 3657386096
Created 18 Sep 2019, 11:11:31 UTC
Sent 18 Sep 2019, 16:48:51 UTC
Report deadline 21 Nov 2019, 3:42:30 UTC
Received 18 Sep 2019, 16:56:17 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 11 (0x0000000B) Unknown error code
Computer ID 8816958
Run time
CPU time
Validate state Invalid
Credit 0.00
Device peak FLOPS 4.14 GFLOPS
Application version SETI@home v8 v8.00
x86_64-pc-linux-gnu
Stderr output
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
process got signal 11</message>
<stderr_txt>
SIGSEGV: segmentation violation

</stderr_txt>
]]>
======
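The "Exit status 11" and "process got signal 11" lines in the report above describe the same event: on Linux x86_64, signal number 11 is SIGSEGV. A minimal sketch for translating such numbers, assuming a POSIX shell whose built-in `kill -l` accepts a signal number; the helper name `signal_name` is made up for illustration and is not part of BOINC:

```shell
#!/bin/sh
# signal_name: hypothetical helper, not part of BOINC.
# Accepts a raw signal number (11) or a shell exit status of the
# form 128+N (139) and prints the signal's short name.
signal_name() {
    n=$1
    if [ "$n" -gt 128 ]; then
        n=$((n - 128))
    fi
    kill -l "$n"
}

signal_name 11
```

The 128+N case is included because a shell reports a process killed by signal N with that exit status, which is the form a crash takes when the application is started by hand.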
My computer:

CPU type GenuineIntel
Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz [Family 6 Model 42 Stepping 7]
Number of processors 8
Coprocessors ---
Virtualization None
Operating System Linux Debian
Debian GNU/Linux 10 (buster) [4.19.0-6-amd64|libc 2.28 (Debian GLIBC 2.28-10)]
BOINC version 7.14.2
Memory 31.3 GB
Cache 8192 KB
Swap space 15.89 GB
Total disk space 884.49 GB
Free Disk Space 751.27 GB
Measured floating point speed 4.14 billion ops/sec
Measured integer speed 63.45 billion ops/sec
Average upload rate 147 KB/sec
Average download rate 2478.34 KB/sec
Average turnaround time 0 days
Application details Show
Tasks 307
Number of times client has contacted server 37
Last time contacted server 20 Sep 2019, 11:08:20 UTC
Fraction of time BOINC is running 98.97%
While BOINC is running, fraction of time computing is allowed 100.00%
While BOINC is running, fraction of time GPU computing is allowed 100.00%
Task duration correction factor 1
ID: 92864 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 92869 - Posted: 20 Sep 2019, 22:06:54 UTC - in response to Message 92864.  

From https://setiathome.berkeley.edu/forum_thread.php?id=84658&postid=2012358
Keith Myers wrote:
Sigsegv errors are usually caused by unstable CPU clocks or unstable memory clocks. Something is corrupting memory addresses. This is an OS issue and not a BOINC or Seti issue.

He's right, you know.
ID: 92869 · Report as offensive
Radjin

Joined: 17 Sep 19
Posts: 12
United States
Message 92871 - Posted: 20 Sep 2019, 23:53:19 UTC - in response to Message 92869.  
Last modified: 20 Sep 2019, 23:55:09 UTC

I am not disputing that it is a memory issue, either corruption or a fixed memory location that is out of range; everything I have read says that. However, it only happens with seti/BOINC. I have also read, in posts as late as 2018, that with vsyscall disabled in later kernels this error is a known issue with BOINC.

What I am asking is since this is a known issue, how does one diagnose the cause of the error, memory, OS, BOINC? How did others resolve the issue? I can find dozens of references to the issue, all with a BOINC project, but only two resolutions, where vsyscall was put into emulate mode.

This is not a bash post; it’s a call to the experts to help explain and resolve a problem that affects a number of people. So far all I have gotten is the equivalent of: dump your computer and build a new one; dump your OS and do a clean install of this OS; don’t use the stable repository to install BOINC as recommended by the BOINC literature, install this custom package instead (which gave me the same error). I’m open to trying new things, except running unstable software.

At this point the only time I get this error is with BOINC. Is the reason that everyone goes silent on solving this issue because it’s unsolvable?
ID: 92871 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 92873 - Posted: 21 Sep 2019, 1:50:09 UTC - in response to Message 92871.  
Last modified: 21 Sep 2019, 1:58:21 UTC

Actually, you get the error with Seti, as it is their science application that runs the work and gives the error. BOINC is the managing software; it doesn't do any of the calculations, heavy RAM use, or anything else intensive. As you have shown in your other post, you can run BOINC until you press Ctrl+C without problems. So BOINC isn't causing the SIGSEGV error.

So what you can test is run another project. See if that project's science application(s) return the same error, and if it doesn't, it's something specific about Seti's application/work form that reacts this way on your system. Then you'll have to go back to them to work that out.
If another project returns the same errors, it may be your hardware/OS after all, or that project's code may be similar to Seti's. The one I know that differs wildly is Climateprediction.net, as they used to use Fortran for their code. See https://boinc.berkeley.edu/projects.php for a list of projects and whether they support Linux.

You can also try different managers, such as Prime95+ or Folding@Home.
ID: 92873 · Report as offensive
Radjin

Joined: 17 Sep 19
Posts: 12
United States
Message 92874 - Posted: 21 Sep 2019, 2:19:40 UTC - in response to Message 92873.  
Last modified: 21 Sep 2019, 3:03:06 UTC

Thanks for the suggestions, I will add the project tonight to see what happens.
ID: 92874 · Report as offensive
Profile Dave
Help desk expert

Joined: 28 Jun 10
Posts: 2518
United Kingdom
Message 92877 - Posted: 21 Sep 2019, 5:38:59 UTC

Keith Myers wrote:
Sigsegv errors are usually caused by unstable CPU clocks or unstable memory clocks. Something is corrupting memory addresses. This is an OS issue and not a BOINC or Seti issue.


There have been batches of work in the past from CPDN when computers with very high success rates on completing tasks have produced over 25% of these errors. If SETI is producing a significant number of these on any particular type of task, it doesn't preclude a hardware problem but it certainly suggests that at the very least, these tasks stress the hardware in a way the other tasks don't.

My other argument against it being hardware is that the tasks on CPDN would mostly fail at the same point, e.g. just before creation of the first zip or at the end of the first model day, almost always at a point that could be pinpointed.

So yes, checking with other projects that stress the hardware to the same level is a good idea.

I haven't seen it with CPDN for over a year now but unfortunately those at Oxford have had problems with the latest batch due to go out under Linux so I doubt there will be any work there before Monday at the earliest.
ID: 92877 · Report as offensive
floyd
Help desk expert

Joined: 23 Apr 12
Posts: 77
Message 92878 - Posted: 21 Sep 2019, 7:19:00 UTC - in response to Message 92871.  

What I am asking is since this is a known issue, how does one diagnose the cause of the error, memory, OS, BOINC? How did others resolve the issue? I can find dozens of references to the issue, all with a BOINC project, but only two resolutions, where vsyscall was put into emulate mode.
So you have identified a possible (and IMO very likely) cause, and you know a workaround. But you don't mention that you have tried it, or an outcome. What about that?
ID: 92878 · Report as offensive
Profile Dave
Help desk expert

Joined: 28 Jun 10
Posts: 2518
United Kingdom
Message 92879 - Posted: 21 Sep 2019, 8:00:33 UTC - in response to Message 92878.  

With CPDN, it stopped I think after a newer version of the particular model type so not a lot you can do at the user end. I can't comment on the Seti@home because I have never seen it with them.
ID: 92879 · Report as offensive
Radjin

Joined: 17 Sep 19
Posts: 12
United States
Message 92880 - Posted: 21 Sep 2019, 14:28:57 UTC - in response to Message 92878.  

What I am asking is since this is a known issue, how does one diagnose the cause of the error, memory, OS, BOINC? How did others resolve the issue? I can find dozens of references to the issue, all with a BOINC project, but only two resolutions, where vsyscall was put into emulate mode.
So you have identified a possible (and IMO very likely) cause, and you know a workaround. But you don't mention that you have tried it, or an outcome. What about that?


This is another interesting conundrum.

I added to grub: GRUB_CMDLINE_LINUX_DEFAULT="VSYSCALL=EMULATE"
sudo update-grub

Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.19.0-6-amd64
Found initrd image: /boot/initrd.img-4.19.0-6-amd64
Found linux image: /boot/vmlinuz-4.19.0-5-amd64
Found initrd image: /boot/initrd.img-4.19.0-5-amd64
Found memtest86+ image: /boot/memtest86+.bin
Found memtest86+ multiboot image: /boot/memtest86+_multiboot.bin
done
The grub update seemed to go ok.

cat /usr/src/linux-headers-$(uname -r)/.config | grep 

cat: /usr/src/linux-headers-4.19.0-6-amd64/.config: No such file or directory
For some reason I don’t get the confirmation of the emulation mode.
ID: 92880 · Report as offensive
Radjin

Joined: 17 Sep 19
Posts: 12
United States
Message 92881 - Posted: 21 Sep 2019, 14:34:37 UTC - in response to Message 92879.  

With CPDN, it stopped I think after a newer version of the particular model type so not a lot you can do at the user end. I can't comment on the Seti@home because I have never seen it with them.


I haven’t seen any work being downloaded but I will wait a week and see what happens.

Aside from trying different projects to see what happens, how can I test for possible hardware issues? I see a memtest86+ but it comes with mixed reviews.
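One low-effort check before a full memory test is to look at what the kernel logged when a task crashed. A rough sketch, assuming the x86 kernel's usual "segfault at" wording in its log and that you are allowed to read the kernel log (on Debian 10 that can require root); `count_segfaults` is an illustrative helper name only:

```shell
#!/bin/sh
# count_segfaults: hypothetical helper. Reads kernel log text on
# stdin and prints how many lines report a user-space segfault.
count_segfaults() {
    # grep -c exits non-zero when the count is 0; carry on anyway.
    grep -c 'segfault at' || true
}

# Usage (reading the log may need root on Debian 10):
#   sudo dmesg | count_segfaults
#   sudo dmesg | grep 'segfault at'    # show the lines themselves
```

If the logged crashes all name the same program, that is a hint rather than proof that the problem is in software; the memtest86+ entry that update-grub found remains the direct test of the memory itself.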
ID: 92881 · Report as offensive
Profile Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 92884 - Posted: 21 Sep 2019, 17:44:46 UTC - in response to Message 92881.  

With CPDN, it stopped I think after a newer version of the particular model type so not a lot you can do at the user end. I can't comment on the Seti@home because I have never seen it with them.


I haven’t seen any work being downloaded but I will wait a week and see what happens.

Aside from trying different projects to see what happens, how can I test for possible hardware issues? I see a memtest86+ but it comes with mixed reviews.



That memtest works fine. I have used it on everything from the latest Dell Area51 back to old dual Opteron servers. The Ubuntu install usually comes with it.
ID: 92884 · Report as offensive
floyd
Help desk expert

Joined: 23 Apr 12
Posts: 77
Message 92885 - Posted: 21 Sep 2019, 17:57:24 UTC - in response to Message 92880.  

I added to grub: GRUB_CMDLINE_LINUX_DEFAULT="VSYSCALL=EMULATE"
Better make that "vsyscall=emulate". I wouldn't be surprised to see that the upper case version doesn't work.

The grub update seemed to go ok.
So it seems. Of course you rebooted after that?

For some reason I don’t get the confirmation of the emulation mode.
I'm not aware of a way to query the current mode. You could look at /proc/cmdline. And of course the best confirmation would be if your application didn't segfault any longer.

Be aware that vsyscall is just a run time parameter, it overrides the kernel default but doesn't change it permanently.
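
Looking at /proc/cmdline as suggested can be sketched as follows. This only shows what was passed at boot, not what the kernel made of it. `vsyscall_param` is a made-up helper name, and the match is deliberately case-sensitive, so an upper-case VSYSCALL=EMULATE is reported as unset:

```shell
#!/bin/sh
# vsyscall_param: hypothetical helper. Prints the value of the
# vsyscall= boot parameter found in the given command-line file
# (default: /proc/cmdline), or "unset" if it is absent.
vsyscall_param() {
    file=${1:-/proc/cmdline}
    # One parameter per line, keep the last vsyscall= value seen.
    value=$(tr ' ' '\n' < "$file" | sed -n 's/^vsyscall=//p' | tail -n 1)
    echo "${value:-unset}"
}

vsyscall_param
```

If this prints "unset" after a reboot, the edited GRUB configuration did not reach the kernel command line.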
ID: 92885 · Report as offensive
Profile Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 863
United States
Message 92886 - Posted: 21 Sep 2019, 19:13:50 UTC - in response to Message 92871.  


At this point the only time I get this error is with BOINC. Is the reason that everyone goes silent on solving this issue because it’s unsolvable?

In your case the answer is yes, since you aren't open to any of the suggestions. The posts you referenced are from years ago, and those problems have since been solved by updating the OS, the science apps, or BOINC. Working around the disabled vsyscall was only needed at Einstein for a little while, until they produced science apps compatible with the older distributions and put a computer preference in the science app selection in Project preferences.
OTHER SETTINGS
Run Linux app versions built with LIBC 2.15: YES / NO
This ensures compatibility with new Linux systems that have virtual syscalls disabled, but breaks compatibility with older systems with (G)LIBC prior to 2.15
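
Which side of that setting a machine falls on depends on its C library version. A small sketch for comparing it against 2.15, assuming a glibc system where `getconf GNU_LIBC_VERSION` is available and GNU `sort -V` for version ordering; `version_ge` is an illustrative name, not an existing tool:

```shell
#!/bin/sh
# version_ge A B: hypothetical helper, succeeds when version A >= B.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# getconf prints something like "glibc 2.28"; keep the second field.
libc_version=$(getconf GNU_LIBC_VERSION | awk '{print $2}')

if version_ge "$libc_version" 2.15; then
    echo "libc $libc_version is 2.15 or newer"
else
    echo "libc $libc_version is older than 2.15"
fi
```

The host described at the top of this thread reports libc 2.28, so it is well on the newer side.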
ID: 92886 · Report as offensive
Radjin

Joined: 17 Sep 19
Posts: 12
United States
Message 92887 - Posted: 21 Sep 2019, 19:23:18 UTC - in response to Message 92885.  

I added to grub: GRUB_CMDLINE_LINUX_DEFAULT="VSYSCALL=EMULATE"
Better make that "vsyscall=emulate". I wouldn't be surprised to see that the upper case version doesn't work.

Replaced with lower case.

The grub update seemed to go ok.
So it seems. Of course you rebooted after that?

Yes

For some reason I don’t get the confirmation of the emulation mode.
I'm not aware of a way to query the current mode. You could look at /proc/cmdline. And of course the best confirmation would be if your application didn't segfault any longer.

What does
cat /usr/src/linux-headers-$(uname -r)/.config | grep VSYSCALL
tell me? I have seen this suggested in a number of posts where they received a reply of
cat /usr/src/linux-headers-$(uname -r)/.config | grep VSYSCALL 
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_X86_VSYSCALL_EMULATION=y
CONFIG_LEGACY_VSYSCALL_EMULATE=y
# CONFIG_LEGACY_VSYSCALL_NONE is not set

Be aware that vsyscall is just a run time parameter, it overrides the kernel default but doesn't change it permanently.

Yes, thank you. I understand it is a temporary thing.
ID: 92887 · Report as offensive
Radjin

Joined: 17 Sep 19
Posts: 12
United States
Message 92888 - Posted: 21 Sep 2019, 19:33:20 UTC - in response to Message 92886.  

OTHER SETTINGS
Run Linux app versions built with LIBC 2.15: YES / NO
This ensures compatibility with new Linux systems that have virtual syscalls disabled, but breaks compatibility with older systems with (G)LIBC prior to 2.15


Is this a specific version of BOINC, or some sort of library I don’t already have? If I add or switch to this library, will it break the apt update process?
ID: 92888 · Report as offensive
floyd
Help desk expert

Joined: 23 Apr 12
Posts: 77
Message 92889 - Posted: 21 Sep 2019, 20:10:34 UTC - in response to Message 92887.  
Last modified: 21 Sep 2019, 20:12:19 UTC

What does
cat /usr/src/linux-headers-$(uname -r)/.config | grep VSYSCALL
tell me? I have seen this suggested in a number of posts where they received a reply of
cat /usr/src/linux-headers-$(uname -r)/.config | grep VSYSCALL 
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_X86_VSYSCALL_EMULATION=y
CONFIG_LEGACY_VSYSCALL_EMULATE=y
# CONFIG_LEGACY_VSYSCALL_NONE is not set
If you installed the kernel headers for your currently running kernel - $(uname -r) is the version string - this shows you how the kernel is configured regarding VSYSCALL. This is purely informational. In this example the vsyscall emulation is built in and enabled by default (VSYSCALL_EMULATE=y). For your kernel, it is built in and disabled by default (VSYSCALL_NONE=y). I suspect that's what the Seti application can't cope with, so you override it with the vsyscall=emulate boot parameter and then it's time for a test to see if we're on the right path.
ID: 92889 · Report as offensive
Les Bayliss
Help desk expert

Joined: 25 Nov 05
Posts: 1654
Australia
Message 92890 - Posted: 21 Sep 2019, 21:37:43 UTC - in response to Message 92881.  

Radjin

If you're going to try models from cpdn, you need to be aware that they are 32 bit, and sometimes needed libraries aren't installed by default.
This is the usual culprit:
libstdc++.so.6

If it's not there, the models will crash at about 6 seconds.
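
A quick way to see whether a 32-bit libstdc++.so.6 is visible to the dynamic linker is to filter the `ldconfig -p` cache listing. This is a sketch under the assumption that 64-bit entries are tagged `x86-64` in that listing and x32 entries `x32`, so any other entry is taken to be 32-bit; `has_32bit_libstdcxx` is a made-up name:

```shell
#!/bin/sh
# has_32bit_libstdcxx: hypothetical helper. Reads `ldconfig -p`
# output on stdin and succeeds if there is a libstdc++.so.6 entry
# that is not tagged as x86-64 or x32.
has_32bit_libstdcxx() {
    grep -F 'libstdc++.so.6 ' | grep -v -e 'x86-64' -e 'x32' | grep -q .
}

# Usage (ldconfig usually lives in /sbin on Debian):
#   /sbin/ldconfig -p | has_32bit_libstdcxx && echo "32-bit libstdc++ found"
```

If nothing is found, the library has to come from a 32-bit package; the exact package name depends on the distribution and should be looked up with apt rather than taken from this sketch.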
ID: 92890 · Report as offensive
Radjin

Joined: 17 Sep 19
Posts: 12
United States
Message 92891 - Posted: 21 Sep 2019, 21:40:55 UTC - in response to Message 92889.  
Last modified: 21 Sep 2019, 21:53:39 UTC

What does
cat /usr/src/linux-headers-$(uname -r)/.config | grep VSYSCALL
tell me? I have seen this suggested in a number of posts where they received a reply of
cat /usr/src/linux-headers-$(uname -r)/.config | grep VSYSCALL 
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_X86_VSYSCALL_EMULATION=y
CONFIG_LEGACY_VSYSCALL_EMULATE=y
# CONFIG_LEGACY_VSYSCALL_NONE is not set
If you installed the kernel headers for your currently running kernel - $(uname -r) is the version string - this shows you how the kernel is configured regarding VSYSCALL. This is purely informational. In this example the vsyscall emulation is built in and enabled by default (VSYSCALL_EMULATE=y). For your kernel, it is built in and disabled by default (VSYSCALL_NONE=y). I suspect that's what the Seti application can't cope with, so you override it with the vsyscall=emulate boot parameter and then it's time for a test to see if we're on the right path.


Thanks. A prior post suggested there may be no way to check if the option was activated. When I run:
cat /usr/src/linux-headers-$(uname -r)/.config | grep
I get:
cat: /usr/src/linux-headers-4.19.0-6-amd64/.config: No such file or directory
even though I should have activated it in grub with:
GRUB_CMDLINE_LINUX_DEFAULT="vsyscall=emulate"
and:
sudo update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.19.0-6-amd64
Found initrd image: /boot/initrd.img-4.19.0-6-amd64
Found linux image: /boot/vmlinuz-4.19.0-5-amd64
Found initrd image: /boot/initrd.img-4.19.0-5-amd64
Found memtest86+ image: /boot/memtest86+.bin
Found memtest86+ multiboot image: /boot/memtest86+_multiboot.bin
done
sudo reboot
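
The steps above replace the whole GRUB_CMDLINE_LINUX_DEFAULT value. A slightly more cautious sketch appends the parameter to whatever is already there and prints the result for review instead of editing the file in place. It assumes the usual Debian layout, where the setting lives in /etc/default/grub as a double-quoted, single-line value; `with_kernel_param` is an illustrative helper, not a GRUB tool:

```shell
#!/bin/sh
# with_kernel_param FILE PARAM: hypothetical helper. Prints FILE with
# PARAM appended inside the GRUB_CMDLINE_LINUX_DEFAULT="..." value.
# It does not modify FILE itself.
with_kernel_param() {
    sed -E "s#^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"#\1 $2\"#" "$1"
}

# Usage: review the output, then install it and regenerate the config.
#   with_kernel_param /etc/default/grub vsyscall=emulate > /tmp/grub.new
#   sudo cp /tmp/grub.new /etc/default/grub
#   sudo update-grub && sudo reboot
```

Note that the helper is not idempotent: running it on an already modified file adds the parameter a second time.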

This is likely a moot point as some pretty knowledgeable people have told me the issue I am trying to resolve is likely not with the vsyscall at all. I am just trying all options in order of complexity given I am pretty much a noob learning as I go along. I can’t be certain I have activated the option if I don’t get the expected output when I use the cat command.

I have always run Debian Linux via command line, always used apt, and never had to step into the realm of compiling or updating outside of apt. So every time I get a suggestion beyond that realm I spend hours reading what I am doing and what has happened to others who did it. I’m quite thankful there are others out there who can help even if we noobs irritate them.
ID: 92891 · Report as offensive
Radjin

Joined: 17 Sep 19
Posts: 12
United States
Message 92894 - Posted: 21 Sep 2019, 21:59:41 UTC - in response to Message 92890.  
Last modified: 21 Sep 2019, 21:59:53 UTC

Radjin

If you're going to try models from cpdn, you need to be aware that they are 32 bit, and sometimes needed libraries aren't installed by default.
This is the usual culprit:
libstdc++.so.6

If it's not there, the models will crash at about 6 seconds.


Thanks for that piece of information. I added CPDN just to test the vsyscall setting on something other than seti@home. I haven’t downloaded any work units yet; I hear there was some bug with creating them for Linux and that some may be ready this coming week. Nothing just works...
ID: 92894 · Report as offensive
floyd
Help desk expert

Joined: 23 Apr 12
Posts: 77
Message 92902 - Posted: 22 Sep 2019, 9:10:09 UTC - in response to Message 92891.  

A prior post suggested there may be no way to check if the option was activated. When I run:
cat /usr/src/linux-headers-$(uname -r)/.config | grep
I get:
cat: /usr/src/linux-headers-4.19.0-6-amd64/.config: No such file or directory
even though I should have activated it in grub with:
GRUB_CMDLINE_LINUX_DEFAULT="vsyscall=emulate"
and:
sudo update-grub
Most likely you don't have the kernel headers installed. You don't need them. If you wish you can do
grep VSYSCALL /boot/config-$(uname -r)
instead but both will only show you the default values compiled into the kernel, not the current state as you seem to think.
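
To make that distinction concrete, here is a sketch that reports only the compiled-in default from a kernel config file, using the CONFIG_LEGACY_VSYSCALL_* option names quoted earlier in this thread. `vsyscall_default` is a made-up helper, and the option names may differ between kernel versions:

```shell
#!/bin/sh
# vsyscall_default: hypothetical helper. Prints the suffix of the
# CONFIG_LEGACY_VSYSCALL_* option that is set to "y" in the given
# kernel config (default: the running kernel's file under /boot),
# or "unknown" if none is found.
vsyscall_default() {
    file=${1:-/boot/config-$(uname -r)}
    mode=$(sed -n 's/^CONFIG_LEGACY_VSYSCALL_\([A-Z]*\)=y$/\1/p' "$file" | head -n 1)
    echo "${mode:-unknown}"
}

vsyscall_default
```

A vsyscall= parameter on the kernel command line overrides this default for the current boot, so the two have to be read together.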

This is likely a moot point as some pretty knowledgeable people have told me the issue I am trying to resolve is likely not with the vsyscall at all.
They may be right. Another reason to verify my theory soon.

I am just trying all options in order of complexity
And this one, besides the expected effects matching what you see, is easily tested. One changed configuration line, one update command; all you need to do now is run BOINC as usual and see if something has changed. Before, all your SETI tasks segfaulted immediately. If you now see one run for some minutes, you have most likely identified the issue. Also, I see you have another computer running Debian 10 at SETI. Activate that; if the problem is caused by vsyscall, it must show the same errors.
ID: 92902 · Report as offensive

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.