Trouble starting client daemon on openSuSE11.0

Marc Chamberlin
Joined: 24 Nov 09
Posts: 14
United States
Message 28958 - Posted: 24 Nov 2009, 5:47:07 UTC

I have attempted to set up BOINC on an openSuSE 11.0 computer and have run into a snag... I am following the instructions at http://www.spy-hill.net/~myers/help/boinc/unix.html. I cannot launch the client daemon because it keeps telling me another BOINC process is already running. I KNOW that is not true, as there is no process running with the name boinc in it... So either some test is failing, or this is a bogus and misleading error message...

FYI, I changed the script for starting the BOINC client daemon slightly in order to bring out this error better. Just before it actually starts the daemon process, there is a sleep command that pauses the script for 4 seconds. If I leave that as is, the script exits and reports success. BUT when I check the status slightly later (boinc status), I discover that the daemon has stopped running, and the error log file contains a message saying another process was already running... When I change that 4-second pause to 20 seconds, the attempt to start the BOINC client daemon also reports a failure. To me this is a strong indication that something goes wrong when startproc is actually called and attempts to start the BOINC client daemon.
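
The timing effect described above (the script reports success because it checks before the client has had time to die) can be illustrated with a small start-then-verify helper. This is a hypothetical sketch, not part of the BOINC init script or of openSUSE's startproc; the name start_and_verify is made up for illustration, and the delay simply plays the role of the sleep that was lengthened from 4 to 20 seconds.

```shell
# Hypothetical helper, not part of the BOINC init script: start a command in
# the background, wait DELAY seconds, then report whether it is still alive.
# A daemon that exits a few seconds after launch looks "started" if it is
# checked too early.
start_and_verify() {
    delay=$1
    shift
    # Detach output so callers using $(...) do not wait for the child.
    "$@" >/dev/null 2>&1 &
    pid=$!
    sleep "$delay"
    # kill -0 only tests that the PID exists; on Linux, also rule out a
    # zombie (a child that exited but has not been reaped yet).
    if kill -0 "$pid" 2>/dev/null &&
        ! grep -q '^State:[[:space:]]*Z' "/proc/$pid/status" 2>/dev/null; then
        echo "running (pid $pid)"
        return 0
    fi
    echo "exited"
    return 1
}
```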

I have also checked under /var/lock/subsys for a boinc lock file and, if I find one, I delete it. I have noticed that the lock file sometimes does not get deleted, but I dunno why, and I dunno how to debug this any further. Any and all help/suggestions will be much appreciated! Thanks in advance.
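
The two manual checks in this post (looking for a running process, and looking for a leftover lock file under /var/lock/subsys) can be combined in a short sketch. It assumes the lock file is named /var/lock/subsys/boinc, that the client process is named boinc or boinc_client, and that pgrep (from procps) is installed; all three are assumptions about this particular setup, and check_boinc_lock is a hypothetical name. Unlike a `ps | grep` pipeline, pgrep never reports its own process.

```shell
# Hypothetical helper: classify a BOINC subsystem lock file.
# The default path /var/lock/subsys/boinc is an assumption for this setup.
check_boinc_lock() {
    lockfile=${1:-/var/lock/subsys/boinc}
    if [ ! -e "$lockfile" ]; then
        echo "no lock file"
    elif pgrep -x 'boinc|boinc_client' >/dev/null 2>&1; then
        # -x requires the whole process name to match one of the two names.
        # If pgrep is not installed this branch is skipped, so the result
        # would be reported as stale.
        echo "lock file present and a boinc process is running"
    else
        echo "stale lock file: $lockfile"
    fi
}
```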

Marc Chamberlin
KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 28962 - Posted: 24 Nov 2009, 10:08:33 UTC

I'll forward this thread to Eric who wrote the scripts.

I'm afraid I only know how to follow the instructions, not debug it.
Kathryn :o)
Eric Myers
Joined: 12 Feb 06
Posts: 232
United States
Message 28968 - Posted: 24 Nov 2009, 14:37:33 UTC - in response to Message 28962.  

Marc, did you copy the working directory from another machine?
That would include files that make it think the client is already running.

Or it could be a lock file in one of several places, depending on the
Linux distro. You can run the init script with "stop"; even if the
client is not running, it should clear out any old lock files. Then give it "start" to restart it, something like:

# /etc/init.d/boinc stop
# /etc/init.d/boinc start

-Eric

-- Eric Myers

"Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats
Marc Chamberlin
Joined: 24 Nov 09
Posts: 14
United States
Message 28973 - Posted: 24 Nov 2009, 17:40:50 UTC - in response to Message 28968.  

Thanks again, Eric, for your quick reply.

HeHe, I had to smile after reading your first thought about copying files from another system. I had actually done that initially (on another computer, trying to save time of course) and then discovered it did not work because I had copied files from a 64-bit (x64) system to a 32-bit system. Oops! Not a good idea! Since then I have been downloading the install shell script separately for each system, including the one I am having trouble with (which is also a 64-bit system). So no, this is an install done from scratch.

As for stopping and starting the daemon process, I had not been doing it explicitly the way you said, but via the restart command, which should effectively be the same as a stop followed by a start. And yes, I have found from reading the scripts that the stop command should clear all the lock files automatically. My earlier comment about finding a lock file still around most likely came from checking right after trying to start the daemon: it had died, and the lock file was still there. Doing the stop (or restart) does clear the lock files, so you can ignore that comment of mine. Doing an explicit stop then start makes no difference.

I ran diff to compare the various files on my SuSE 11.0 64-bit system, which is failing, with the files on another SuSE 11.0 64-bit system where BOINC is working. No differences! I also augmented the boinc daemon script to trace it, but didn't see anything obvious. Yet something goes wrong in the startproc call that runs the actual boinc client: the client seems to believe it is already running and therefore refuses to let the daemon run.

So my question is: how does the boinc client test whether another instance of itself is running? Unless it is doing something wrong or out of sequence, I don't see how it could be examining the list of running processes, because when I do something like ps aux | grep boinc, it simply does not show up. Is it looking for lock files? Maybe, but I dunno where one might exist that the scripts don't know about... Or could this be some sort of fall-through error message (my best guess), and thus bogus?

Marc...

Marc Chamberlin
Joined: 24 Nov 09
Posts: 14
United States
Message 28974 - Posted: 24 Nov 2009, 19:30:07 UTC - in response to Message 28973.  

HA! Found the problem, and fixed it!


Nasty little bug, and the error message about "Another instance of BOINC is running" is bogus and very misleading. This should be addressed in the client code and fixed...

I decided to get even more serious about debugging this problem and ran the boinc client program under a system call trace (strace) to see if that would reveal anything. It did! I had created the boinc working directory, /var/lib/boinc, as root (since only root has write permission in the /var/lib directory). This gave only root write permission on the boinc directory. strace revealed that the user boinc was trying to open a file called lockfile in the working directory, and since boinc did not have permission, the open failed.
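
strace is what exposed the failing open of lockfile here (for example by running the client under `strace -f -e trace=open,openat`). A lighter first check is simply to look at who owns the working directory and whether the account that runs the client can create files in it. A minimal sketch, assuming GNU stat; describe_workdir is a hypothetical name, and it tests writability for whichever user runs it, so it would have to be run as the boinc user to reproduce this case.

```shell
# Hypothetical helper: show the owner, group and mode of a directory and
# whether the *current* user can create files in it. Run it as the account
# that runs the BOINC client (the thread's account is "boinc").
describe_workdir() {
    dir=${1:-/var/lib/boinc}
    # GNU stat: %U = owner name, %G = group name, %a = octal permissions.
    stat -c '%U:%G %a' "$dir" || return 1
    # Creating a file needs write and search (execute) permission on the dir.
    if [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "writable by $(id -un)"
    else
        echo "NOT writable by $(id -un)"
    fi
}
```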

I assume the boinc client code is written so that any failure to access this particular lockfile falls through to an error message that assumes the failure was caused by another instance of BOINC already running. THIS IS WRONG, and the code should be rewritten to produce a more informative message about the failure to open the lockfile.
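
The distinction being asked for can be sketched in a few lines: first decide whether the lock file can be opened at all, and only then whether another process holds the lock. This is an illustration using the util-linux flock(1) command, not BOINC's actual client code, and it is not claimed to interoperate with the lock the real client takes; try_lock is a hypothetical name.

```shell
# Illustration only: report *why* a lock file could not be taken instead of
# always claiming that another instance is running. Uses flock(1) from
# util-linux; the real BOINC client does not use this script.
try_lock() {
    lockfile=$1
    dir=$(dirname "$lockfile")
    # Failure 1: the file exists but is not writable, or it does not exist
    # and the directory does not allow creating it (the case in this thread).
    if [ -e "$lockfile" ] && [ ! -w "$lockfile" ]; then
        echo "permission denied: cannot open $lockfile"
        return 2
    fi
    if [ ! -e "$lockfile" ] && [ ! -w "$dir" ]; then
        echo "permission denied: cannot create $lockfile"
        return 2
    fi
    # Failure 2: the file opens, but another process holds an exclusive lock.
    # The subshell holds descriptor 9 (and the lock) only while it runs;
    # a real daemon would keep it open for its whole lifetime.
    if ( exec 9>>"$lockfile" || exit 3; flock -n 9 ); then
        status=0
    else
        status=$?
    fi
    case $status in
        0) echo "lock acquired" ;;
        1) echo "another instance holds $lockfile"; return 1 ;;
        *) echo "could not open or lock $lockfile"; return 3 ;;
    esac
}
```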

Changing the ownership and permissions of the boinc working directory corrected this problem, but IMHO the real fix is a more informative error message, and that must be done in the code itself, something I as a user cannot do. If there is a Bugzilla or other bug tracking system for BOINC, let me know and I will report this there...
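
The fix applied here (giving the BOINC account ownership of its working directory) amounts to something like the following, run as root. The directory /var/lib/boinc and the account name boinc come from this thread; fix_boinc_dir is a hypothetical helper, and whether you also change the group or tighten the mode depends on how the install is set up.

```shell
# Hypothetical helper: hand a BOINC working directory over to the account
# that runs the client, and make sure that account can create files in it.
# Real use (as root): fix_boinc_dir /var/lib/boinc boinc
fix_boinc_dir() {
    dir=${1:-/var/lib/boinc}
    user=${2:-boinc}
    # -R also fixes files that root may already have created inside it.
    chown -R "$user" "$dir" &&
        chmod u+rwx "$dir"
}
```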

Thanks again Eric for your help, you made me think hard! LOL

Marc...
Eric Myers
Joined: 12 Feb 06
Posts: 232
United States
Message 29002 - Posted: 25 Nov 2009, 23:29:27 UTC - in response to Message 28974.  

Marc Chamberlin wrote:

Changing the ownership and permissions of the boinc working directory corrected this problem, but IMHO the real fix lies in a better informative error message and that must be done in the code itself.

I agree, this is certainly worthy of a ticket, if there is not one already. I now remember getting bitten by this myself a couple of years ago, but didn't remember it until you found the answer.

Perhaps one of us who has a Trac account could check whether there is already a ticket for this and, if not, create one? I'll try to do it if I get some time before someone beats me to it.


-- Eric Myers
Eric Myers
Joined: 12 Feb 06
Posts: 232
United States
Message 30373 - Posted: 30 Dec 2009, 17:46:36 UTC - in response to Message 29002.  

I finally had some time to catch up on some tickets. This one is #970 - "ERR_ALREADY_RUNNING is misleading if it's just a file permission problem"
-- Eric Myers

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.