Boinc breaking down / reinstall needed

JeromeC
Joined: 13 Oct 10
Posts: 118
France
Message 40423 - Posted: 28 Sep 2011, 21:34:27 UTC
Last modified: 28 Sep 2011, 21:35:32 UTC

Hi,

I have been facing this issue for some time now with my iMac: at some point I get a connection error to the client, and then it is impossible to restart it. I'm forced to reinstall to get BOINC running again.

I've been a beta tester with Test4Theory. They have a very specific script that changes permissions on the Mac to allow BOINC to run their virtual machine. It actually changes the users from boinc_master / boinc_project to one's own user (if I understood correctly); after that I can see all BOINC tasks running under my own user.

So I thought the problem was linked to their system and mentioned it in their forum, but nothing came of it. It didn't happen for a (long) time and I thought all was OK ("they must have fixed something"), but then today it happened again: BOINC Manager lost its connection, impossible to restart, error.


stdoutdae.txt says :
-----------------
28-Sep-2011 19:46:01 [malariacontrol.net] Sending scheduler request: To fetch work.
28-Sep-2011 19:46:01 [malariacontrol.net] Reporting 3 completed tasks, requesting new tasks for CPU
28-Sep-2011 19:46:04 [malariacontrol.net] Scheduler request completed: got 2 new tasks
28-Sep-2011 19:46:06 [malariacontrol.net] Started download of wu_1177_417_42697_0_1317228015
28-Sep-2011 19:46:06 [malariacontrol.net] Started download of wu_1200_29_42698_0_1317228128
28-Sep-2011 19:46:07 [malariacontrol.net] Finished download of wu_1177_417_42697_0_1317228015
28-Sep-2011 19:46:08 [malariacontrol.net] Finished download of wu_1200_29_42698_0_1317228128
28-Sep-2011 19:49:22 [Leiden Classical] Computation for task wu_78596477_1316764335_9219_1 finished
28-Sep-2011 19:49:22 [malariacontrol.net] Starting task wu_1200_29_42698_0_1317228128_1 using openMalariaB version 657
28-Sep-2011 19:49:24 [Leiden Classical] Started upload of wu_78596477_1316764335_9219_1_0
28-Sep-2011 19:49:24 [Leiden Classical] Started upload of wu_78596477_1316764335_9219_1_1
28-Sep-2011 19:49:25 [Leiden Classical] Finished upload of wu_78596477_1316764335_9219_1_0
28-Sep-2011 19:49:25 [Leiden Classical] Finished upload of wu_78596477_1316764335_9219_1_1
28-Sep-2011 20:06:22 [Leiden Classical] Computation for task wu_898976128_1316764334_7041_1 finished
28-Sep-2011 20:06:22 [Leiden Classical] Starting task wu_78596477_1316764335_7530_0 using classical version 556
28-Sep-2011 20:06:25 [Leiden Classical] Started upload of wu_898976128_1316764334_7041_1_0
28-Sep-2011 20:06:25 [Leiden Classical] Started upload of wu_898976128_1316764334_7041_1_1
28-Sep-2011 20:06:25 [---] Can't open client_state_next.xml: fopen() failed
28-Sep-2011 20:06:25 [---] Couldn't write state file: fopen() failed; giving up
File ownership or permissions are set in a way that
does not allow sandboxed execution of BOINC applications.
To use BOINC anyway, use the -insecure command line option.
To change ownership/permission, reinstall BOINC or run
the shell script Mac_SA_Secure.sh. (Error code -1202)
File ownership or permissions are set in a way that
does not allow sandboxed execution of BOINC applications.
To use BOINC anyway, use the -insecure command line option.
To change ownership/permission, reinstall BOINC or run
the shell script Mac_SA_Secure.sh. (Error code -1024)
File ownership or permissions are set in a way that
does not allow sandboxed execution of BOINC applications.
To use BOINC anyway, use the -insecure command line option.
To change ownership/permission, reinstall BOINC or run
the shell script Mac_SA_Secure.sh. (Error code -1024)


stderrdae.txt says :
-----------------
GetMACAddress returned 0x00000005
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
ftok: No such file or directory
md5_file: can't open projects/abcathome.com/abc_sieve_2.10_x86_64-apple-darwin
md5_file: Too many open files
error: verify_file: md5_file error -108
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/1/stderrgfx.txt
Another instance of BOINC is running.
SIGPIPE: write on a pipe with no reader
md5_file: can't open projects/wuprop.boinc-af.org/wu_v3_1314843905_30353_0_0
md5_file: Too many open files
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/4/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/4/stderrgfx.txt
Another instance of BOINC is running.
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/3/stderrgfx.txt


stderrgui.txt says :
---------------
Exiting...
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/12/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/12/stderrgfx.txt
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/1/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/5/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/0/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/4/stderrgfx.txt
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/6/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/6/stderrgfx.txt
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/4/stderrgfx.txt
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/3/stderrgfx.txt


Since the first log mentions Mac_SA_Secure.sh, I got the latest version from the BOINC server and tried it. It seemed to run fine but didn't fix the issue, so I had to reinstall.


Thanks for your help and suggestions.


My config details :
----------------

System Version: Mac OS X 10.6.8 (10K549)
Kernel Version: Darwin 10.8.0

Model Name: iMac
Model Identifier: iMac11,1
Processor Name: Intel Core i7
Processor Speed: 2.8 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per core): 256 KB
L3 Cache: 8 MB
Memory: 8 GB

Macintosh HD:
Capacity: 2 TB (2,000,054,960,128 bytes)
Available: 1.31 TB (1,312,226,381,824 bytes)
Writable: Yes
File System: Journaled HFS+
BSD Name: disk0s2
Mount Point: /


My boinc details :
---------------
Mer 28 sep 22:44:50 2011 | | Starting BOINC client version 6.12.35 for x86_64-apple-darwin
Mer 28 sep 22:44:50 2011 | | Config: GUI RPC allowed from any host
Mer 28 sep 22:44:50 2011 | | log flags: file_xfer, sched_ops, task
Mer 28 sep 22:44:50 2011 | | Libraries: libcurl/7.19.7 OpenSSL/0.9.7l zlib/1.2.3 c-ares/1.6.0
Mer 28 sep 22:44:50 2011 | | Data directory: /Library/Application Support/BOINC Data
Mer 28 sep 22:44:50 2011 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz [x86 Family 6 Model 30 Stepping 5]
Mer 28 sep 22:44:50 2011 | | Processor features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 POPCNT
Mer 28 sep 22:44:50 2011 | | OS: Mac OS X 10.6.8 (Darwin 10.8.0)
Mer 28 sep 22:44:50 2011 | | Memory: 8.00 GB physical, 1.19 TB virtual
Mer 28 sep 22:44:50 2011 | | Disk: 1.82 TB total, 1.19 TB free
Mer 28 sep 22:44:50 2011 | | Local time is UTC +2 hours
Mer 28 sep 22:44:50 2011 | | VirtualBox version: 4.1.2
Mer 28 sep 22:44:50 2011 | | No usable GPUs found
Mer 28 sep 22:44:50 2011 | ABC@home | URL http://abcathome.com/; Computer ID 113645; resource share 10
Mer 28 sep 22:44:50 2011 | Constellation | URL http://aerospaceresearch.net/constellation/; Computer ID 2772; resource share 100
Mer 28 sep 22:44:50 2011 | wanless2 | URL http://bearnol.is-a-geek.com/wanless2/; Computer ID 14411; resource share 10
Mer 28 sep 22:44:50 2011 | rosetta@home | URL http://boinc.bakerlab.org/rosetta/; Computer ID 1349340; resource share 200
Mer 28 sep 22:44:50 2011 | Poem@Home | URL http://boinc.fzk.de/poem/; Computer ID 65114; resource share 200
Mer 28 sep 22:44:50 2011 | Leiden Classical | URL http://boinc.gorlaeus.net/; Computer ID 73340; resource share 200
Mer 28 sep 22:44:50 2011 | Evo@Home | URL http://boinc.run.montefiore.ulg.ac.be/evo/; Computer ID 1; resource share 1000
Mer 28 sep 22:44:50 2011 | Test4Theory@Home | URL http://boinc01.cern.ch/test4theory/; Computer ID 443; resource share 1000
Mer 28 sep 22:44:50 2011 | boincsimap | URL http://boincsimap.org/boincsimap/; Computer ID 166736; resource share 200
Mer 28 sep 22:44:50 2011 | climateprediction.net | URL http://climateprediction.net/; Computer ID 1140665; resource share 100
Mer 28 sep 22:44:50 2011 | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 3550876; resource share 100
Mer 28 sep 22:44:50 2011 | NFS@Home | URL http://escatter11.fullerton.edu/nfs/; Computer ID 13976; resource share 10
Mer 28 sep 22:44:50 2011 | Milkyway@home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 145860; resource share 100
Mer 28 sep 22:44:50 2011 | ibercivis | URL http://registro.ibercivis.es/; Computer ID 140921; resource share 500
Mer 28 sep 22:44:50 2011 | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 5297103; resource share 50
Mer 28 sep 22:44:50 2011 | WUProp@Home | URL http://wuprop.boinc-af.org/; Computer ID 40; resource share 100
Mer 28 sep 22:44:50 2011 | Enigma@Home | URL http://www.enigmaathome.net/; Computer ID 23753; resource share 10
Mer 28 sep 22:44:50 2011 | malariacontrol.net | URL http://www.malariacontrol.net/; Computer ID 154129; resource share 200
Mer 28 sep 22:44:50 2011 | primaboinca | URL http://www.primaboinca.com/; Computer ID 4432; resource share 100
Mer 28 sep 22:44:50 2011 | PrimeGrid | URL http://www.primegrid.com/; Computer ID 139192; resource share 10
Mer 28 sep 22:44:50 2011 | yoyo@home | URL http://www.rechenkraft.net/yoyo/; Computer ID 36124; resource share 10
Mer 28 sep 22:44:50 2011 | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 1178424; resource share 500
Mer 28 sep 22:44:50 2011 | | General prefs: from http://bam.boincstats.com/ (last modified 21-Jun-2011 06:48:53)
Mer 28 sep 22:44:50 2011 | | Computer location: home
Mer 28 sep 22:44:50 2011 | | General prefs: no separate prefs for home; using your defaults
Mer 28 sep 22:44:50 2011 | | Reading preferences override file
Mer 28 sep 22:44:50 2011 | | Preferences:
Mer 28 sep 22:44:50 2011 | | max memory usage when active: 7372.80MB
Mer 28 sep 22:44:50 2011 | | max memory usage when idle: 7782.40MB
Mer 28 sep 22:44:50 2011 | | max disk usage: 20.00GB
Mer 28 sep 22:44:50 2011 | | don't use GPU while active
Mer 28 sep 22:44:50 2011 | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
Mer 28 sep 22:44:50 2011 | | Not using a proxy

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15484
Netherlands
Message 40424 - Posted: 28 Sep 2011, 22:04:49 UTC - in response to Message 40423.  
Last modified: 28 Sep 2011, 22:21:35 UTC

Are you sure it's T4T doing this changing of user account/permissions, not that you installed BOINC as a daemon first and didn't do so on an upgrade?

I mean, BOINC only uses the boinc_master/boinc_project accounts when it is installed or run as a daemon. When you run it normally, it runs under your own account. Even a BOINC that is installed as a daemon can be run normally (not as a daemon) if you start the client first and BOINC Manager second, or don't start BOINC with the --daemon option.

By the way, it's not possible to run T4T as a daemon on any of the platforms, as that'll fail all tasks running in the virtual machine due to permission problems.

Edit: Oh, I found the old thread. It looks like you're telling the truth. I'm passing this on to the developers.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15484
Netherlands
Message 40427 - Posted: 28 Sep 2011, 23:04:06 UTC - in response to Message 40424.  

OK, I did get an answer from the developers. They know about this script and they share my concerns about the security of that script. However, they've also got a solution for the problem, which will be installed on the T4T project in October, when Daniel is back from honeymoon.

This fix requires that the project installs the newer server software and uses the latest vboxwrapper application. A big change that they don't want to do while Daniel is away.

Now then, this doesn't fix your immediate problem, and I'm not sure why the script breaks your BOINC. For the moment, you may want to set this project to No New Tasks (NNT) or detach from it. Then wait for T4T to make the necessary changes before continuing to test for them with the newer wrapper etc.

JeromeC
Joined: 13 Oct 10
Posts: 118
France
Message 40438 - Posted: 29 Sep 2011, 14:46:48 UTC

Thanks a lot for the quick research and the update with fresh info!

Good to know that someone is taking care of that and that they will implement a cleaner solution soon... though the strange thing is I'm not 100% sure it's because of T4T that this is happening: you can see in the log above that BOINC was in the middle of a Leiden Classical upload when the error happened...

But then again, this clearly started to happen after I started to test T4T...

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15484
Netherlands
Message 40444 - Posted: 29 Sep 2011, 16:52:28 UTC - in response to Message 40438.  

though the strange thing is I'm not 100% sure it's because of T4T that this is happening: you can see in the log above that BOINC was in the middle of a Leiden Classical upload when the error happened...

Or that's the only thing you see. It may well be that T4T tried to start up but fell flat on the 'wrong' ownership and took your BOINC out with it.

The most difficult thing when trying to reproduce the problem is to get the exact same circumstances to happen. You can only try. ;-)

So why not test running without T4T for a while, where that while is any length of time you decide for yourself. Then run with T4T for the same amount of time and see if anything triggers the problem again. What you can do is add some debug flags that log information about things that now happen silently.

I suggest using the following cc_config.xml file in your BOINC Data directory:
<cc_config>
  <log_flags>
    <checkpoint_debug>1</checkpoint_debug>
    <cpu_sched>1</cpu_sched>
    <file_xfer_debug>1</file_xfer_debug>
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
</cc_config>


Save it with a plain-text editor as an ANSI-formatted file, NOT as a Word 2003 XML file: that adds stuff to the file that makes it unreadable for BOINC. (BOINC's XML is, at this point, still specially designed for BOINC only; it's not real XML.) Make sure the file has only the .xml extension. If it also got a .txt extension, rename it so it is called exactly cc_config.xml. Save it to your BOINC Data directory, which according to your own start-up messages is /Library/Application Support/BOINC Data/

When saved, open BOINC Manager->Advanced view->Advanced->Read config file. That'll start the flags. You'll see a lot of extra information in your event log now.
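
Before re-reading the config from the Manager, a quick well-formedness check can catch a stray tag. Here is a minimal sketch in Python, assuming the standard library's XML parser is an acceptable stand-in. BOINC uses its own parser, so passing this check does not guarantee that BOINC accepts the file. The function name enabled_log_flags is made up for illustration.

```python
import xml.etree.ElementTree as ET


def enabled_log_flags(xml_text):
    """Return the names of the log flags set to 1 in a cc_config.xml document."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if the text is malformed
    if root.tag != "cc_config":
        raise ValueError("root element must be <cc_config>")
    flags = root.find("log_flags")
    if flags is None:
        return []
    # Iterating over an element yields its direct children.
    return [flag.tag for flag in flags if (flag.text or "").strip() == "1"]


# The file contents suggested in this post.
SAMPLE = """<cc_config>
<log_flags>
<checkpoint_debug>1</checkpoint_debug>
<cpu_sched>1</cpu_sched>
<file_xfer_debug>1</file_xfer_debug>
<sched_op_debug>1</sched_op_debug>
</log_flags>
</cc_config>"""

if __name__ == "__main__":
    print(enabled_log_flags(SAMPLE))
```

To check the real file, read its text from the BOINC Data directory and pass it to the function instead of SAMPLE.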

Let's hope this catches your crash in the act. :-)

JeromeC
Joined: 13 Oct 10
Posts: 118
France
Message 40452 - Posted: 29 Sep 2011, 18:07:04 UTC
Last modified: 29 Sep 2011, 18:07:39 UTC

I already had a cc_config.xml to use this option (I think I had mentioned it in the other topic you discovered) :

<options>
<allow_remote_gui_rpc>1</allow_remote_gui_rpc>
</options>

So I added the debug flags you mentioned. I'm gonna have a hell of a log file now!!! (Especially because of the checkpoints: I can see T4T and primaboinca are doing one almost every second!!!)

Since I hadn't had the issue with BOINC for a long time (since July), I'd rather do the test the other way round: I'm gonna crunch with T4T for a while, and if I have the issue again after some time (to be decided), I'll stop T4T and see what happens without it...

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15484
Netherlands
Message 40453 - Posted: 29 Sep 2011, 18:15:53 UTC - in response to Message 40452.  
Last modified: 29 Sep 2011, 20:24:28 UTC

So I added the debug flags you mentioned. I'm gonna have a hell of a log file now!!! (Especially because of the checkpoints: I can see T4T and primaboinca are doing one almost every second!!!)

In case you worry about the log not surviving the night, you can increase the maximum size of stdoutdae.txt by adding the following line to cc_config.xml, in the <options> section. I'll write it out as it would appear in the cc_config.xml file:

<cc_config>
  <options>
    <max_stdout_file_size>12000000</max_stdout_file_size>
  </options>
</cc_config>

This lets stdoutdae.txt grow to 12,000,000 bytes (12 MB in decimal units, or about 11.4 MiB). You can fill in any value, of course. The size is in bytes. The default is 2097152 (2 MiB). (Thanks Richard for teaching me math. :-))

Don't forget to do the re-read of the config file after adding the line to the <options> section. :-)
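
Combined with the debug flags and the remote GUI RPC option mentioned earlier in the thread, the whole file can be generated instead of edited by hand. The following is only a sketch in Python: build_cc_config and bytes_to_mib are hypothetical helper names, the element names are the ones quoted in this thread, and the indentation is an assumption about what BOINC's parser tolerates.

```python
def build_cc_config(log_flags, options):
    """Return cc_config.xml text with the given log flags set to 1 and the given options."""
    lines = ["<cc_config>", "  <log_flags>"]
    lines += [f"    <{name}>1</{name}>" for name in log_flags]
    lines += ["  </log_flags>", "  <options>"]
    lines += [f"    <{name}>{value}</{name}>" for name, value in options.items()]
    lines += ["  </options>", "</cc_config>"]
    return "\n".join(lines) + "\n"


def bytes_to_mib(size_in_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return size_in_bytes / (1024 * 1024)


CONFIG = build_cc_config(
    ["checkpoint_debug", "cpu_sched", "file_xfer_debug", "sched_op_debug"],
    {"allow_remote_gui_rpc": 1, "max_stdout_file_size": 12000000},
)

if __name__ == "__main__":
    print(CONFIG)
    print(round(bytes_to_mib(12000000), 2), "MiB requested")
    print(bytes_to_mib(2097152), "MiB default")
```

The output still has to be saved as cc_config.xml in the BOINC Data directory and re-read from the Manager, as described above.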

And thanks for testing. Fingers crossed you crash again. :-)

JeromeC
Joined: 13 Oct 10
Posts: 118
France
Message 40459 - Posted: 29 Sep 2011, 21:47:53 UTC

Good to know !

Done :)

Thx

JeromeC
Joined: 13 Oct 10
Posts: 118
France
Message 40546 - Posted: 5 Oct 2011, 6:19:38 UTC

Oops, it did it again: BOINC has not been active since yesterday evening.

BOINC has no tasks running any more, but it says it has...

The very last line in stdoutdae.txt is

04-Oct-2011 18:24:28 [Test4Theory@Home] [checkpoint] result uc_1316531402_49376_0 checkpointed

and the only other data file dated the same hour as this last activity is

sched_request_boincsimap.org_boincsimap.xml

BUT now it's different, because I restarted BOINC and it's working... So it started with the same symptoms, but it seems to be a different issue.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15484
Netherlands
Message 40547 - Posted: 5 Oct 2011, 6:44:18 UTC - in response to Message 40546.  

Sanity check. Did you check with "show processes of all users" (if you have that choice), to see if the science apps weren't running under another account not called Jerome?
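
For a text version of that check, the per-user CPU totals can be pulled from ps. This is a rough sketch in Python, assuming the `ps -axo user,pcpu,comm` column layout; cpu_by_user is a made-up helper name and the sample rows are invented for illustration, not taken from the machine discussed here. The comma handling is an assumption about how a French locale might print decimals.

```python
import subprocess
from collections import defaultdict


def cpu_by_user(ps_output):
    """Sum the %CPU column per user from 'ps -axo user,pcpu,comm' style output."""
    totals = defaultdict(float)
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)          # user, %cpu, command (may contain spaces)
        if len(parts) < 3:
            continue
        user, pcpu, _command = parts
        try:
            totals[user] += float(pcpu.replace(",", "."))
        except ValueError:
            continue                         # ignore rows that do not parse
    return dict(totals)


def read_ps():
    """Run ps and return its text output."""
    result = subprocess.run(["ps", "-axo", "user,pcpu,comm"],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Invented sample rows, for illustration only.
SAMPLE = """USER     %CPU COMM
jerome   87,0 boinc
jerome    0,5 Terminal
root      1,0 launchd
"""

if __name__ == "__main__":
    print(cpu_by_user(SAMPLE))
```

If the science apps show up under boinc_project or another account rather than your own, the totals make that visible at a glance.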

JeromeC
Joined: 13 Oct 10
Posts: 118
France
Message 40549 - Posted: 5 Oct 2011, 13:20:02 UTC

On Mac OS X the tool shows all processes from all users. And yes, my i7 was at close to 100%, not the 800% cumulative it should be using when running all projects (that's the way it shows CPU use: each core counts for 100%). The strange thing, as you can see in the screenshot, is that all projects are unloaded but BOINC is using 87% of a core by itself, just like when it's doing a CPU check (almost: on Mac OS X it uses 8 boinc instances for the check). But it was stuck there. Besides, it should not be doing a CPU check when it was started a long time ago, should it?


JeromeC
Joined: 13 Oct 10
Posts: 118
France
Message 40550 - Posted: 5 Oct 2011, 21:31:51 UTC

ARG, I had a kernel panic... it had been a long time since I last had this pleasure on my Mac... After restarting: "authorizations not setup properly, please reinstall boinc..."

I think the crash was not BOINC-related (I was installing an upgrade of Parallels Desktop, always a heavy install... plus doing several other things at the same time... plus BOINC...), so now I have to reinstall and rerun the T4T script.

Just for the record, last words of boinc (before crash I assume) were :

05-Oct-2011 23:20:05 [Test4Theory@Home] [checkpoint] result uc_1316531402_57056_0 checkpointed

JeromeC
Joined: 13 Oct 10
Posts: 118
France
Message 40610 - Posted: 9 Oct 2011, 8:48:51 UTC
Last modified: 9 Oct 2011, 8:51:02 UTC

Boinc breaking down again :

08-Oct-2011 20:51:30 [Test4Theory@Home] [checkpoint] result uc_1316531402_67196_0 checkpointed
08-Oct-2011 20:51:31 [Test4Theory@Home] [checkpoint] result uc_1316531402_67196_0 checkpointed
08-Oct-2011 20:51:33 [Test4Theory@Home] [checkpoint] result uc_1316531402_67196_0 checkpointed
08-Oct-2011 20:51:35 [Test4Theory@Home] [checkpoint] result uc_1316531402_67196_0 checkpointed
08-Oct-2011 20:51:36 [climateprediction.net] Can't get task disk usage: opendir() failed
08-Oct-2011 20:51:36 [Test4Theory@Home] Can't get task disk usage: opendir() failed
08-Oct-2011 20:51:36 [World Community Grid] Can't get task disk usage: opendir() failed
08-Oct-2011 20:51:36 [SETI@home] Can't get task disk usage: opendir() failed
08-Oct-2011 20:51:36 [ibercivis] Can't get task disk usage: opendir() failed
08-Oct-2011 20:51:36 [rosetta@home] Can't get task disk usage: opendir() failed
08-Oct-2011 20:51:36 [ibercivis] Can't get task disk usage: opendir() failed
08-Oct-2011 20:51:36 [SETI@home] Can't get task disk usage: opendir() failed
08-Oct-2011 20:51:36 [WUProp@Home] Can't get task disk usage: opendir() failed
08-Oct-2011 20:51:37 [Test4Theory@Home] [checkpoint] result uc_1316531402_67196_0 checkpointed
08-Oct-2011 20:51:38 [Test4Theory@Home] [checkpoint] result uc_1316531402_67196_0 checkpointed
08-Oct-2011 20:51:39 [Test4Theory@Home] [checkpoint] result uc_1316531402_67196_0 checkpointed
08-Oct-2011 20:51:39 [ibercivis] [checkpoint] result Mr.Wilson_07_19_51_44_794869011_2669028199_0 checkpointed
08-Oct-2011 20:51:40 [Test4Theory@Home] [checkpoint] result uc_1316531402_67196_0 checkpointed
08-Oct-2011 20:51:40 [ibercivis] Computation for task Mr.Wilson_07_19_51_44_794869011_2669028199_0 finished
08-Oct-2011 20:51:40 [malariacontrol.net] Failed to open init file slots/7/init_data.xml
08-Oct-2011 20:51:40 [malariacontrol.net] Signature verification error for openMalariaB_6.57_i686-apple-darwin
08-Oct-2011 20:51:40 [malariacontrol.net] [sched_op] Deferring communication for 1 min 28 sec
08-Oct-2011 20:51:40 [malariacontrol.net] [sched_op] Reason: Unrecoverable error for task wu_1183_503_50752_0_1318091353_0 (couldn't start Can't write init file: -108: -108)
08-Oct-2011 20:51:40 [---] Can't open client_state_next.xml: fopen() failed
08-Oct-2011 20:51:40 [---] Couldn't write state file: fopen() failed; giving up
File ownership or permissions are set in a way that
does not allow sandboxed execution of BOINC applications.
To use BOINC anyway, use the -insecure command line option.
To change ownership/permission, reinstall BOINC or run
the shell script Mac_SA_Secure.sh. (Error code -1024)
File ownership or permissions are set in a way that
does not allow sandboxed execution of BOINC applications.
To use BOINC anyway, use the -insecure command line option.
To change ownership/permission, reinstall BOINC or run
the shell script Mac_SA_Secure.sh. (Error code -1024)


What the hell is happening ?! what should I do ??


(Needless to say, I cannot start it again; I have to reinstall it, etc.)

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15484
Netherlands
Message 40690 - Posted: 17 Oct 2011, 13:35:41 UTC - in response to Message 40610.  
Last modified: 17 Oct 2011, 13:37:29 UTC

The developers still haven't got back to me on this, other than to say that they suspect some permissions problem between BOINC and T4T, but that was something you'd figured out already. ;-)

I'll keep pushing.
In the meantime, can you try running BOINC 6.13.8 to see if that fixes your instability problems a bit? This is a completely new BOINC version with plenty of bugs of its own, so do heed the warnings in the change log (make a backup of your BOINC Data directory). But 6.13.8 is more stable than previous versions were; we're just trying to get it to crash again. :-)

JeromeC
Joined: 13 Oct 10
Posts: 118
France
Message 40705 - Posted: 18 Oct 2011, 6:18:42 UTC

Actually, I've had the crash / reboot situation again twice in the last week (the last one this morning), while I was in another account doing something else. After each reboot: BOINC error, reinstall. But I really don't know whether it has anything to do with the issue I've been having or not.

So this time I installed 6.13.8 as you suggested, completely forgetting to back up anything, and so far all is going well.

See you later for my next adventures :D

(I'm in a good mood this morning !)

JeromeC
Joined: 13 Oct 10
Posts: 118
France
Message 40760 - Posted: 21 Oct 2011, 20:03:11 UTC

6.13.8 is really behaving strangely regarding project resource share: it started downloading far too many WUs for several of my running projects (I have many), especially considering I only set a 0.2-day network buffer. That included 8 CPDN units (so completely beyond the time estimate); I had to cancel 7 this morning before going to work. Now I can see it has only downloaded WUs for one project (Ibercivis), which has an equal share with WCG, for which it didn't download any WU, not to mention all the other projects with a smaller share, but still...

... but it has not crashed or forced me to reinstall, so far...

... I started to write this 2 days ago and forgot to post it, today I started to have many errors with ibercivis (one of the sub-projects I think) and then boinc stopped responding, I closed it and then it wouldn't start again, reinstall, blah blah.

Maybe I'm gonna stop this thread cause I guess I'm boring everybody here :D

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15484
Netherlands
Message 40761 - Posted: 21 Oct 2011, 20:17:56 UTC - in response to Message 40760.  
Last modified: 21 Oct 2011, 20:33:25 UTC

6.13.8 is really behaving strangely regarding project resource share

Yes, that's known and a fix for that will come in 6.13.9
I have a Seti RAC of 272 on an RS of 50, compared to the other projects eligible to fetch work, which have RSes of 400, 800 and 1000 and lower RACs. ;-)

... but it has not crashed or forced me to reinstall, so far...

That's what we were testing for. :)

... I started to write this 2 days ago and forgot to post it, today I started to have many errors with ibercivis (one of the sub-projects I think) and then boinc stopped responding, I closed it and then it wouldn't start again, reinstall, blah blah.

Do you have any error output of that? If so, please post it in this thread. I'll forward the info to the developers.

You can check for error output of the BOINC client (crashes, dumps) in stderrdae.txt in the BOINC Data directory (/Library/Application Support/BOINC Data). If you want, you can email it to me. I sent you my email address in PM.

And please, keep it up. You don't bore me. :)

JeromeC
Joined: 13 Oct 10
Posts: 118
France
Message 40764 - Posted: 21 Oct 2011, 22:01:17 UTC
Last modified: 21 Oct 2011, 22:03:27 UTC

Mmmm actually not much :

stdoutdae.txt
-----------
21-Oct-2011 20:41:27 [Test4Theory@Home] [checkpoint] result uc_1318931023_17066_0 checkpointed
21-Oct-2011 20:41:27 [ibercivis] Computation for task soluvel_Oc7c16_prod070_7_1319200544_1 finished
21-Oct-2011 21:18:39 [---] Starting BOINC client version 6.13.8 for x86_64-apple-darwin
21-Oct-2011 21:18:39 [---] This a development version of BOINC and may not function properly

It crashed around 20:40. I waited 30 min before reinstalling and restarting; you can see there is no specific message in between.
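
A silent stop like this shows up as a jump between consecutive timestamps, which can be searched for mechanically. Here is a minimal sketch in Python, assuming the `DD-Mon-YYYY HH:MM:SS` prefix seen in the stdoutdae.txt excerpts in this thread and English month abbreviations; find_gaps and the ten-minute threshold are arbitrary choices for illustration.

```python
from datetime import datetime

STAMP_FORMAT = "%d-%b-%Y %H:%M:%S"  # matches e.g. "21-Oct-2011 20:41:27"
STAMP_LENGTH = 20                   # number of characters in that prefix


def find_gaps(lines, min_gap_seconds=600):
    """Return (previous, next, seconds) for each silence of at least min_gap_seconds."""
    gaps = []
    previous = None
    for line in lines:
        try:
            stamp = datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
        except ValueError:
            continue  # continuation lines carry no timestamp
        if previous is not None:
            seconds = int((stamp - previous).total_seconds())
            if seconds >= min_gap_seconds:
                gaps.append((previous, stamp, seconds))
        previous = stamp
    return gaps


# The four lines quoted above.
SAMPLE = [
    "21-Oct-2011 20:41:27 [Test4Theory@Home] [checkpoint] result uc_1318931023_17066_0 checkpointed",
    "21-Oct-2011 20:41:27 [ibercivis] Computation for task soluvel_Oc7c16_prod070_7_1319200544_1 finished",
    "21-Oct-2011 21:18:39 [---] Starting BOINC client version 6.13.8 for x86_64-apple-darwin",
    "21-Oct-2011 21:18:39 [---] This a development version of BOINC and may not function properly",
]

if __name__ == "__main__":
    for before, after, seconds in find_gaps(SAMPLE):
        print(f"silent for {seconds} s between {before} and {after}")
```

On the four lines quoted above it reports a single gap of 2232 seconds (about 37 minutes), between the last ibercivis message and the client restart.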

stderrdae.txt is still dated from an older crash (14/10/11).


stderrgui.txt is dated from today but strangely it starts with
-----------
SIGPIPE: write on a pipe with no reader
SIGBUS: bus error

Crashed executable name: BOINCManager.virtualbox
built using BOINC library version 6.12.33
Machine type Intel 80486 (32-bit executable)
System version: Macintosh OS 10.6.8 build 10K540
Sun Jul 17 23:22:18 2011

and then there is no timestamp inside, so I'm not sure; it finished with:

Exiting...
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/12/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/12/stderrgfx.txt
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/1/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/5/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/0/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/4/stderrgfx.txt
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/6/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/6/stderrgfx.txt
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/4/stderrgfx.txt
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/3/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/2/stderrgfx.txt
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
SIGPIPE: write on a pipe with no reader
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/2/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/1/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/6/stderrgfx.txt
SIGPIPE: write on a pipe with no reader
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/4/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/1/stderrgfx.txt

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15484
Netherlands
Message 40800 - Posted: 23 Oct 2011, 23:42:23 UTC

Progress, of sorts.

In your stderrdae.txt file, there were a couple of lines that showed this:
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/1/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/4/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/4/stderrgfx.txt
Permissions error -1202 at /Library/Application Support/BOINC Data/slots/3/stderrgfx.txt

When BOINC displays that "Permissions error" message, it will refuse to run (i.e., exit) and display an alert telling the user to reinstall BOINC.

Now then, what we're trying to find out is which project(s) cause this and what is in the stderrgfx.txt file at that time. So for this, you'll need to run BOINC again, with all the projects that you normally run (probably including T4T), and wait until BOINC crashes.

Now before you go reinstall BOINC, please first open stderrdae.txt in your BOINC Data directory and check if at the end of the file there's such a "Permissions error" line. If there is, follow the path to that slot directory and post the contents of the stderrgfx.txt file that's in there. Also post the last line(s) of stderrdae.txt
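
The lookup described above can be scripted so nothing gets missed in a hurry. This is a sketch in Python, assuming the "Permissions error <code> at <path>" line format shown in the logs in this thread; permission_error_paths and describe are hypothetical helper names, and the sample lines are the four quoted earlier in this post.

```python
import os
import re

PERM_ERROR = re.compile(r"^Permissions error (-?\d+) at (.+)$")


def permission_error_paths(lines):
    """Return unique (code, path) pairs in order of first appearance."""
    seen = []
    for line in lines:
        match = PERM_ERROR.match(line.strip())
        if match:
            pair = (int(match.group(1)), match.group(2))
            if pair not in seen:
                seen.append(pair)
    return seen


def describe(path):
    """Report owner uid, group gid and permission bits of a path, or why it cannot be read."""
    try:
        info = os.stat(path)
    except OSError as exc:
        return f"{path}: {exc.strerror}"
    return f"{path}: uid={info.st_uid} gid={info.st_gid} mode={oct(info.st_mode & 0o777)}"


SAMPLE = [
    "Permissions error -1202 at /Library/Application Support/BOINC Data/slots/1/stderrgfx.txt",
    "Permissions error -1202 at /Library/Application Support/BOINC Data/slots/4/stderrgfx.txt",
    "Permissions error -1202 at /Library/Application Support/BOINC Data/slots/4/stderrgfx.txt",
    "Permissions error -1202 at /Library/Application Support/BOINC Data/slots/3/stderrgfx.txt",
]

if __name__ == "__main__":
    for code, path in permission_error_paths(SAMPLE):
        print(code, describe(path))
```

Run it against the last lines of stderrdae.txt before reinstalling; the uid and gid values can then be compared with the account IDs on the machine.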

Think you can do that? :-)

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15484
Netherlands
Message 40804 - Posted: 24 Oct 2011, 6:43:30 UTC
Last modified: 24 Oct 2011, 11:41:13 UTC

During the night I got a couple more instructions from the developers:

1. I need to remind you that you are a beta tester of T4T, and that LHC@home requires Mac users to run their special script that changes permissions for all BOINC executables and data and disables the normal sandbox security. This is something we at BOINC strongly discourage and do not support.

It is therefore important that you tell the T4T developers that this may be badly messing up your BOINC and other projects. The developers have been working with T4T to set up a way for them to run VirtualBox without changing BOINC's normal permissions and without disabling sandbox security, but this isn't finished yet.

2. BOINC 6.13.8 introduced significant changes to the way the Mac BOINC installer sets up the boinc_master and boinc_project user groups, and is still experimental. The developers have received confirmation that it solves the old "You currently are not authorized to manage the client" errors, at least for one user. However, there may still be other issues, and perhaps even new issues caused by the installer changes.

Please run the Terminal application (/Applications/Utilities/Terminal) and enter the following 2 commands:
dscl . list /users UniqueID
dscl . list /groups PrimaryGroupID

Then post the output of those, or email it to me. I will pass it on to the developer. To copy the results to a file, you can either select "Export Text As..." in Terminal's Shell menu, or copy and paste the text into a text file.

3. Installing an older version of BOINC will _not_ reverse the new installer's changes to the boinc_master and boinc_project user groups. To do that, you must run the Uninstall BOINC application.

4. We suggest that you suspend T4T, uninstall BOINC, then reinstall an older version. Do not run the LHC@home script and do not run T4T. See if that fixes things. If not, you may also need to reset all projects to get things back to a normal state (the servers should then resend the lost tasks at those projects that have this feature on; these tasks will start from zero again).
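
The output of the two dscl commands in step 2 is long, and only the BOINC entries matter here. A small sketch in Python that filters them, assuming each output line is a name followed by a numeric ID separated by whitespace; the helper names are made up, and the IDs in the sample are invented placeholders, not values from any real machine.

```python
import subprocess


def parse_dscl_list(output):
    """Turn 'name   value' lines into a dict mapping name to integer ID."""
    result = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            result[parts[0]] = int(parts[1])
    return result


def boinc_entries(output):
    """Keep only the entries whose name starts with 'boinc_'."""
    return {name: value for name, value in parse_dscl_list(output).items()
            if name.startswith("boinc_")}


def run_dscl(path, key):
    """Run 'dscl . list <path> <key>' (macOS only) and return its text output."""
    result = subprocess.run(["dscl", ".", "list", path, key],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Invented placeholder output, for illustration only.
SAMPLE = """root                    0
nobody                  -2
boinc_master            25
boinc_project           26
jerome                  501
"""

if __name__ == "__main__":
    print(boinc_entries(SAMPLE))
    # On a Mac: print(boinc_entries(run_dscl("/users", "UniqueID")))
    #           print(boinc_entries(run_dscl("/groups", "PrimaryGroupID")))
```

The full, unfiltered output is still what the developers asked for; the filter only helps to see quickly whether both accounts and both groups exist.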