6.6.38 problems

David Ball

Joined: 2 Dec 06
Posts: 66
United States
Message 27791 - Posted: 6 Oct 2009, 10:05:50 UTC

6.6.38 is now the recommended version so I upgraded to it on a couple of machines. Both machines have Intel integrated graphics so no CUDA available.

Machine 1: Vista 64 bit, C2Q Q6600, no CUDA GPU

After the install, BOINC would run for a few seconds (I could see the 4 workunit processes running in Task Manager) and then just exit. I tried different options and finally had to wipe BOINC and do a non-service install. It seems to be working OK now.

Machine 2: Vista 32 bit, C2D 6420, 3.x GB RAM, no CUDA GPU

Upgrade wasn't too painful on this machine. It's running as a service. Left machine alone for a few hours and came back to find that BOINC wasn't running. Downgraded back to 6.6.36 and all is well.

Something needs to be done about this. 6.6.36 has work fetch problems which cause the server to refuse to send work, even when the machine would obviously be able to complete it on time. 6.6.36 also tends to underfetch work, even though the server does send work when asked. 6.6.38 has serious problems, especially on 64 bit Vista. There really needs to be a stable, reliable version of the client.

BTW, I've seen some posts in project message boards that seem to indicate that others are having similar problems with 6.6.38 and 6.10.x.

Thanks,
David
ID: 27791
Ageless
Volunteer moderator
Project administrator
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 12110
Netherlands
Message 27793 - Posted: 6 Oct 2009, 10:22:45 UTC - in response to Message 27791.  
Last modified: 6 Oct 2009, 10:23:03 UTC

Without error messages from either BOINC itself (to be found in the stderrdae.txt file in your Data directory) or from Windows (to be found in your Event Log), there's not much to go on to try to fix this.

Alternatively, download the symbol files (see DebugWinClient) and run BOINC through the Windows debugger Windbg.

Then...
Right-click My Computer->Properties
Advanced
Environment variables

Add New System variable.
Name it _NT_SYMBOL_PATH
Give it this value:
srv*C:\windows\symbols*http://msdl.microsoft.com/download/symbols;srv*c:\windows\symbols*http://boinc.berkeley.edu/symstore/

Click OK,
OK again

Install the symbols package if you haven't done so.
Just let it unpack to C:\Windows\Symbols

Now start Windbg (Start->Programs->Debugging Tools for Windows->Windbg)
Go File->Symbol Search Path
Click Browse
Add your BOINC directory (normally C:\Program Files\BOINC\)
Click Browse
Add your BOINC Data directory (on Vista normally C:\ProgramData\BOINC\)
Click OK.

From the menu bar: File->Open Executable.
Browse to your BOINC directory (normally C:\Program Files\BOINC)
Click on boinc.exe and click on Open.

A command line window will open as well as a Windbg window.
Minimize the command line window.

From the menu bar: Debug->Go
Now let the debugger run until BOINC crashes.

When BOINC has crashed: In the Windbg window, there's a command bar at the bottom. Click it and type in 'kb' (no quotes), then hit Enter.

A stack trace is being built. It'll say **BUSY** while that happens.

When that's done, go Edit->Write Window Text to File.
Save this log as my-debug.txt to your documents or somewhere where you can easily find it.

File->Exit, you can save the workspace if you want to (but it isn't necessary).

Post the log.
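
For reference, the _NT_SYMBOL_PATH value given above is a semicolon-separated list of srv*<local cache>*<symbol server URL> entries. A minimal Python sketch of how that string is put together follows; the function name build_symbol_path is made up for illustration and is not part of BOINC or the Debugging Tools.

```python
# Illustrative only: composes a symbol path in the "srv*cache*url" form
# used in the steps above. build_symbol_path is a hypothetical helper.

def build_symbol_path(cache_dir, servers):
    """Join one srv*<cache>*<url> entry per symbol server with semicolons."""
    return ";".join(f"srv*{cache_dir}*{url}" for url in servers)


symbol_path = build_symbol_path(
    r"C:\Windows\Symbols",
    [
        "http://msdl.microsoft.com/download/symbols",
        "http://boinc.berkeley.edu/symstore/",
    ],
)
print(symbol_path)
```

Both entries share the same local cache directory, so symbols downloaded from either server end up under C:\Windows\Symbols.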
Jord
Please do not private message me for tech support. Use the forums for that. Tech PMs will be ignored.
ID: 27793
David Ball

Joined: 2 Dec 06
Posts: 66
United States
Message 27813 - Posted: 6 Oct 2009, 18:05:14 UTC - in response to Message 27793.  

OK, it crashed on the Vista 64 bit machine and I saved the std*.txt files before I did anything to the system. You can find them at

http://www.booksnbytes.com/Boinc.6.6.38CrashFiles/

There's also a file called after_crash_revert_6.6.36.txt which is the initial log after I downgraded to 6.6.36... This is significant because it shows a successful upload of the file which 6.6.38 failed to upload immediately prior to crash.

The dumps cover 2 crashes. Each crash is preceded by file upload errors in stdoutdae.txt. Apparently something restarted boinc after the first crash and it crashed again within seconds. This is the tail end of stdoutdae.txt just after the crash, beginning with the completion of the Lattice project task that crashes 6.6.38 on upload.



06-Oct-2009 10:49:11 [The Lattice Project] Computation for task 310025290.402308572247175.9_2 finished
06-Oct-2009 10:49:12 [Einstein@Home] Resuming task h1_0934.80_S5R4__506_S5R5a_1 using einstein_S5R5 version 305
06-Oct-2009 10:49:14 [The Lattice Project] Started upload of 310025290.402308572247175.9_2_0
06-Oct-2009 10:49:14 [The Lattice Project] Started upload of 310025290.402308572247175.9_2_1

06-Oct-2009 10:49:15 [Virtual Prairie] Sending scheduler request: To fetch work.
06-Oct-2009 10:49:15 [Virtual Prairie] Reporting 1 completed tasks, requesting new tasks
06-Oct-2009 10:49:16 [The Lattice Project] [error] Error reported by file upload server: nbytes missing or negative
06-Oct-2009 10:51:57 [---] Starting BOINC client version 6.6.38 for windows_x86_64
06-Oct-2009 10:51:57 [---] log flags: task, file_xfer, sched_ops
06-Oct-2009 10:51:57 [---] Libraries: libcurl/7.19.4 OpenSSL/0.9.8k zlib/1.2.3
06-Oct-2009 10:51:57 [---] Data directory: C:\ProgramData\BOINC
06-Oct-2009 10:51:57 [---] Running under account dball
06-Oct-2009 10:51:57 [---] Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz [Intel64 Family 6 Model 15 Stepping 11]
06-Oct-2009 10:51:57 [---] Processor features: fpu tsc pae nx sse sse2 pni
06-Oct-2009 10:51:57 [---] OS: Microsoft Windows Vista: Home Premium x64 Edition, Service Pack 2, (06.00.6002.00)
06-Oct-2009 10:51:57 [---] Memory: 4.99 GB physical, 10.16 GB virtual
06-Oct-2009 10:51:57 [---] Disk: 687.57 GB total, 619.56 GB free
06-Oct-2009 10:51:57 [---] Local time is UTC -5 hours
06-Oct-2009 10:51:57 [---] No CUDA-capable NVIDIA GPUs found
06-Oct-2009 10:51:57 [---] No coprocessors
06-Oct-2009 10:51:57 [---] Not using a proxy
06-Oct-2009 10:51:57 [ABC@home] URL: http://abcathome.com/; Computer ID: 97508; location: home; project prefs: default
06-Oct-2009 10:51:57 [boincsimap] URL: http://boinc.bio.wzw.tum.de/boincsimap/; Computer ID: 132285; location: home; project prefs: default
06-Oct-2009 10:51:57 [Poem@Home] URL: http://boinc.fzk.de/poem/; Computer ID: 36646; location: home; project prefs: default
06-Oct-2009 10:51:57 [The Lattice Project] URL: http://boinc.umiacs.umd.edu/; Computer ID: 26874; location: home; project prefs: default
06-Oct-2009 10:51:57 [Docking@Home] URL: http://docking.cis.udel.edu/; Computer ID: 10262; location: home; project prefs: default
06-Oct-2009 10:51:57 [Einstein@Home] URL: http://einstein.phys.uwm.edu/; Computer ID: 1841138; location: home; project prefs: default
06-Oct-2009 10:51:57 [Virtual Prairie] URL: http://vcsc.cs.uh.edu/virtual-prairie/; Computer ID: 8518; location: (none); project prefs: default
06-Oct-2009 10:51:57 [Cosmology@Home] URL: http://www.cosmologyathome.org/; Computer ID: 40229; location: home; project prefs: default
06-Oct-2009 10:51:57 [malariacontrol.net] URL: http://www.malariacontrol.net/; Computer ID: 114795; location: home; project prefs: default
06-Oct-2009 10:51:57 [PrimeGrid] URL: http://www.primegrid.com/; Computer ID: 81986; location: (none); project prefs: default
06-Oct-2009 10:51:57 [World Community Grid] URL: http://www.worldcommunitygrid.org/; Computer ID: 794219; location: (none); project prefs: default
06-Oct-2009 10:51:57 [Poem@Home] General prefs: from Poem@Home (last modified 03-Oct-2009 02:36:15)
06-Oct-2009 10:51:57 [Poem@Home] Computer location: home
06-Oct-2009 10:51:57 [Poem@Home] General prefs: no separate prefs for home; using your defaults
06-Oct-2009 10:51:57 [---] Preferences limit memory usage when active to 3576.57MB
06-Oct-2009 10:51:57 [---] Preferences limit memory usage when idle to 4649.55MB
06-Oct-2009 10:51:57 [---] Preferences limit disk usage to 40.00GB
06-Oct-2009 10:51:57 [The Lattice Project] Started upload of 310025290.402308572247175.9_2_0
06-Oct-2009 10:51:57 [The Lattice Project] Started upload of 310025290.402308572247175.9_2_1

06-Oct-2009 10:51:57 [ABC@home] Restarting task abc_sieve_wu_00163113_0 using abc-sieve version 200
06-Oct-2009 10:51:58 [Einstein@Home] Restarting task h1_0934.80_S5R4__506_S5R5a_1 using einstein_S5R5 version 305
06-Oct-2009 10:51:58 [Docking@Home] Restarting task 1hpx_31_mod0014_56522_196997_0 using charmm34 version 623
06-Oct-2009 10:51:58 [Virtual Prairie] Restarting task vip_multi_run9_1254698122_1814088_0 using virtual_prairie_multi version 9
06-Oct-2009 10:51:58 [Virtual Prairie] Sending scheduler request: To fetch work.
06-Oct-2009 10:51:58 [Virtual Prairie] Reporting 1 completed tasks, requesting new tasks
06-Oct-2009 10:51:59 [The Lattice Project] [error] Error reported by file upload server: nbytes missing or negative
06-Oct-2009 10:51:59 [The Lattice Project] [error] Error reported by file upload server: nbytes missing or negative
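
The pattern in the log above (an "nbytes missing or negative" upload error followed by a fresh "Starting BOINC client" line) can be looked for mechanically. This is a rough Python sketch, not a BOINC tool: the function name is made up, and a restart after an upload error is only a hint of a crash, not proof of one.

```python
import re

# Strings taken from the log excerpt above.
UPLOAD_ERROR = "Error reported by file upload server: nbytes missing or negative"
CLIENT_START = re.compile(r"Starting BOINC client version (\S+)")


def restarts_after_upload_error(lines):
    """Return (error_line, restarted_version) for each client start that
    follows an upload error logged since the previous start."""
    hits = []
    last_error = None
    for line in lines:
        if UPLOAD_ERROR in line:
            last_error = line.strip()
            continue
        match = CLIENT_START.search(line)
        if match:
            if last_error is not None:
                hits.append((last_error, match.group(1)))
            last_error = None  # start counting again after each restart
    return hits


sample = [
    "06-Oct-2009 10:49:16 [The Lattice Project] [error] Error reported by file upload server: nbytes missing or negative",
    "06-Oct-2009 10:51:57 [---] Starting BOINC client version 6.6.38 for windows_x86_64",
    "06-Oct-2009 10:51:59 [The Lattice Project] [error] Error reported by file upload server: nbytes missing or negative",
]
for error_line, version in restarts_after_upload_error(sample):
    print(version, "restarted after:", error_line)
```

To run it against a real log, pass the lines of stdoutdae.txt from the BOINC Data directory instead of the sample list.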



After installing 6.6.36 over the top of 6.6.38, you can see that it successfully uploaded the files.

10/6/2009 12:21:45 PM Starting BOINC client version 6.6.36 for windows_x86_64
10/6/2009 12:21:45 PM log flags: task, file_xfer, sched_ops
10/6/2009 12:21:45 PM Libraries: libcurl/7.19.4 OpenSSL/0.9.8j zlib/1.2.3
10/6/2009 12:21:45 PM Data directory: C:\ProgramData\BOINC
10/6/2009 12:21:45 PM Running under account dball
10/6/2009 12:21:46 PM Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz [Intel64 Family 6 Model 15 Stepping 11]
10/6/2009 12:21:46 PM Processor features: fpu tsc pae nx sse sse2 pni
10/6/2009 12:21:46 PM OS: Microsoft Windows Vista: Home Premium x64 Edition, Service Pack 2, (06.00.6002.00)
10/6/2009 12:21:46 PM Memory: 4.99 GB physical, 10.16 GB virtual
10/6/2009 12:21:46 PM Disk: 687.57 GB total, 619.36 GB free
10/6/2009 12:21:46 PM Local time is UTC -5 hours
10/6/2009 12:21:46 PM No CUDA devices found
10/6/2009 12:21:46 PM No coprocessors
10/6/2009 12:21:46 PM Not using a proxy
10/6/2009 12:21:46 PM Version change (6.6.38 -> 6.6.36)
10/6/2009 12:21:46 PM ABC@home URL: http://abcathome.com/; Computer ID: 97508; location: home; project prefs: default
10/6/2009 12:21:46 PM boincsimap URL: http://boinc.bio.wzw.tum.de/boincsimap/; Computer ID: 132285; location: home; project prefs: default
10/6/2009 12:21:46 PM Poem@Home URL: http://boinc.fzk.de/poem/; Computer ID: 36646; location: home; project prefs: default
10/6/2009 12:21:46 PM The Lattice Project URL: http://boinc.umiacs.umd.edu/; Computer ID: 26874; location: home; project prefs: default
10/6/2009 12:21:46 PM Docking@Home URL: http://docking.cis.udel.edu/; Computer ID: 10262; location: home; project prefs: default
10/6/2009 12:21:46 PM Einstein@Home URL: http://einstein.phys.uwm.edu/; Computer ID: 1841138; location: home; project prefs: default
10/6/2009 12:21:46 PM Virtual Prairie URL: http://vcsc.cs.uh.edu/virtual-prairie/; Computer ID: 8518; location: (none); project prefs: default
10/6/2009 12:21:46 PM Cosmology@Home URL: http://www.cosmologyathome.org/; Computer ID: 40229; location: home; project prefs: default
10/6/2009 12:21:46 PM malariacontrol.net URL: http://www.malariacontrol.net/; Computer ID: 114795; location: home; project prefs: default
10/6/2009 12:21:46 PM PrimeGrid URL: http://www.primegrid.com/; Computer ID: 81986; location: (none); project prefs: default
10/6/2009 12:21:46 PM World Community Grid URL: http://www.worldcommunitygrid.org/; Computer ID: 794219; location: (none); project prefs: default
10/6/2009 12:21:46 PM Poem@Home General prefs: from Poem@Home (last modified 03-Oct-2009 02:36:15)
10/6/2009 12:21:46 PM Poem@Home Computer location: home
10/6/2009 12:21:46 PM Poem@Home General prefs: no separate prefs for home; using your defaults
10/6/2009 12:21:46 PM Preferences limit memory usage when active to 3576.57MB
10/6/2009 12:21:46 PM Preferences limit memory usage when idle to 4649.55MB
10/6/2009 12:21:46 PM Preferences limit disk usage to 40.00GB
10/6/2009 12:21:46 PM Running CPU benchmarks
10/6/2009 12:21:46 PM Suspending computation - running CPU benchmarks
10/6/2009 12:21:46 PM The Lattice Project Started upload of 310025290.402308572247175.9_2_0
10/6/2009 12:21:46 PM The Lattice Project Started upload of 310025290.402308572247175.9_2_1
10/6/2009 12:21:47 PM The Lattice Project Finished upload of 310025290.402308572247175.9_2_0
10/6/2009 12:21:47 PM The Lattice Project Finished upload of 310025290.402308572247175.9_2_1

10/6/2009 12:21:47 PM The Lattice Project Started upload of 310025290.402308572247175.9_2_2
10/6/2009 12:21:47 PM The Lattice Project Started upload of 310025290.402308572247175.9_2_3
10/6/2009 12:21:50 PM The Lattice Project Finished upload of 310025290.402308572247175.9_2_2
10/6/2009 12:21:58 PM The Lattice Project Finished upload of 310025290.402308572247175.9_2_3



You can find the full stderrdae.txt file at the URL I specified, but here's the beginning and end of the data for the second crash. All the ModLoad entries are in the full file.

BOINC Windows Runtime Debugger Version 6.6.38


Dump Timestamp : 10/06/09 10:51:59
Loaded Library : dbghelp.dll
Loaded Library : symsrv.dll
Loaded Library : srcsrv.dll
Loaded Library : version.dll
Debugger Engine : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC;C:\Program Files\BOINC;srv*C:\ProgramData\BOINC\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\ProgramData\BOINC\symbols*http://boinc.berkeley.edu/symstore


ModLoad: 40000000 000ef000 C:\Program Files\BOINC\boinc.exe (6.6.38.0) (-nosymbols- Symbols Loaded)
Linked PDB Filename : c:\Src\BOINCSVN\branches\boinc_core_release_6_6a\win_build\Build\x64\Release\boinc_exe.pdb
File Version : 6.6.38
Company Name : Space Sciences Laboratory
Product Name : BOINC client
Product Version : 6.6.38


(Skipped the middle of the file; the full file can be found at the URL above.)

*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 26320, Write: 1050, Other 4523

- I/O Transfers Counters -
Read: 107420877, Write: 681838, Other 384800

- Paged Pool Usage -
QuotaPagedPoolUsage: 108272, QuotaPeakPagedPoolUsage: 114304
QuotaNonPagedPoolUsage: 63584, QuotaPeakNonPagedPoolUsage: 81692

- Virtual Memory Usage -
VirtualSize: 9252864, PeakVirtualSize: 66867200

- Pagefile Usage -
PagefileUsage: 9252864, PeakPagefileUsage: 9252864

- Working Set Size -
WorkingSetSize: 13668352, PeakWorkingSetSize: 13668352, PageFaultCount: 3429

*** Dump of thread ID 1944 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x000000014004482E read attempt to address 0x00001D10

- Registers -
rax=0000000000000000 rbx=0000000000e67810 rcx=0000000002deeeb6 rdx=000000003d28f1a0 rsi=0000000000000000 rdi=0000000000dd83e0
r8=0000000000000000 r9=0000000000000000 r10=00000000739e0000 r11=0000000002deeea0 r12=0000000000000000 r13=0000000000e65a90
r14=0000000000000003 r15=0000000000000000 rip=000000004004482e rsp=000000000012fc40 rbp=0000000000000000
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206

- Callstack -
ChildEBP RetAddr Args to Child
0012fc80 4004497c 00e67810 00e65a90 00000000 00000003 boinc!+0x0
0012fcb0 40015e7a 00000000 00000000 40096da0 00000000 boinc!+0x0
0012fcf0 40042e6f 00000000 4007abf3 00000000 40042823 boinc!+0x0
0012fd40 400432d7 00000001 00000020 00000020 73a317be boinc!+0x0
0012ff20 40053a80 00130000 0e744320 00130000 00000000 boinc!+0x0
0012ff50 7789be3d 00000000 00000000 00000000 00000000 boinc!+0x0
0012ff80 779d6a51 00000000 00000000 00000000 00000000 kernel32!BaseThreadInitThunk+0x0
0012ffd0 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0


*** Debug Message Dump ****


*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0

Exiting...


I hope this helps. From what I'm seeing on the Lattice message boards, others are experiencing the same thing. Again, you can find the FULL files at
http://www.booksnbytes.com/Boinc.6.6.38CrashFiles/

--David
David Ball
ID: 27813
Ageless
Joined: 29 Aug 05
Posts: 12110
Netherlands
Message 27814 - Posted: 6 Oct 2009, 18:41:41 UTC - in response to Message 27813.  

Next test. Can you reproduce the problem with the latest BOINC 6.10.13?

If you can, I'll inform the developers.
If you cannot, then we can assume it's already fixed in the upcoming client.

The 6.6 range is out of development; any big problems with it will not be fixed in that version range.
Jord
ID: 27814
David Ball

Joined: 2 Dec 06
Posts: 66
United States
Message 27815 - Posted: 6 Oct 2009, 19:21:20 UTC - in response to Message 27814.  

OK, I've loaded 6.10.13 and will see what happens.

It might be fixed in the 6.10.x series. I found someone on the Lattice boards who was running 6.10.6 and, instead of a crash, they got the following error.

04/10/2009 17:04:44 The Lattice Project [error] Error reported by file upload server: nbytes missing or negative
04/10/2009 17:04:44 The Lattice Project [error] Error reported by file upload server: nbytes missing or negative
04/10/2009 17:04:44 The Lattice Project Giving up on upload of 275167640.7145602532405452.10_1_0: permanent upload error
04/10/2009 17:04:44 The Lattice Project Giving up on upload of 275167640.7145602532405452.10_1_1: permanent upload error


The WU on 6.10.6 did validate on Lattice though, even with the 2 permanent upload errors, so either the file was zero length or it wasn't actually needed by Lattice.

So:
6.6.36 - uploads without error
6.6.38 - crashes BOINC client after "Error reported by file upload server: nbytes missing or negative"
6.10.6 - gets permanent upload error but the WU validates.

The main problem I see with this is that 6.6.38 is the default/recommended BOINC client to download and that is the version that crashes. Some people running Lattice are getting frustrated and canceling WU's or dropping Lattice because of this.

It will take a couple of days for the current Lattice WU to finish on the machine that I loaded 6.10.13 on (estimated 45 CPU hours to go). I will post to this thread if it has a problem.

-- David
David Ball
ID: 27815
David Ball

Joined: 2 Dec 06
Posts: 66
United States
Message 27817 - Posted: 6 Oct 2009, 23:32:42 UTC

OK, I loaded 6.10.13 on a second machine that had a Lattice WU near completion. It has now completed, uploaded, and validated. Here's the section of the log pertaining to the upload.

10/6/2009 5:36:47 PM The Lattice Project Computation for task 16466160.6849746529756418.1_0 finished
10/6/2009 5:36:47 PM Virtual Prairie Starting vip_multi_run9_1254698122_1819755_0
10/6/2009 5:36:47 PM Virtual Prairie Starting task vip_multi_run9_1254698122_1819755_0 using virtual_prairie_multi version 9
10/6/2009 5:36:49 PM The Lattice Project Started upload of 16466160.6849746529756418.1_0_0
10/6/2009 5:36:49 PM The Lattice Project Started upload of 16466160.6849746529756418.1_0_1
10/6/2009 5:36:50 PM The Lattice Project [error] Error reported by file upload server: nbytes missing or negative
10/6/2009 5:36:50 PM The Lattice Project [error] Error reported by file upload server: nbytes missing or negative
10/6/2009 5:36:50 PM The Lattice Project Giving up on upload of 16466160.6849746529756418.1_0_0: permanent upload error
10/6/2009 5:36:50 PM The Lattice Project Giving up on upload of 16466160.6849746529756418.1_0_1: permanent upload error
10/6/2009 5:36:50 PM The Lattice Project Started upload of 16466160.6849746529756418.1_0_2
10/6/2009 5:36:50 PM The Lattice Project Started upload of 16466160.6849746529756418.1_0_3

10/6/2009 5:36:51 PM The Lattice Project Sending scheduler request: To fetch work.
10/6/2009 5:36:51 PM The Lattice Project Requesting new tasks
10/6/2009 5:36:52 PM The Lattice Project Finished upload of 16466160.6849746529756418.1_0_2
10/6/2009 5:36:57 PM The Lattice Project Scheduler request completed: got 1 new tasks
10/6/2009 5:36:59 PM The Lattice Project Started download of 248365700.01770482587345712_0
10/6/2009 5:36:59 PM The Lattice Project Started download of 248365700.01770482587345712_1
10/6/2009 5:37:00 PM The Lattice Project Finished download of 248365700.01770482587345712_0
10/6/2009 5:37:00 PM The Lattice Project Started download of 248365700.01770482587345712.9_2
10/6/2009 5:37:01 PM The Lattice Project Finished download of 248365700.01770482587345712.9_2
10/6/2009 5:37:04 PM The Lattice Project Finished upload of 16466160.6849746529756418.1_0_3
10/6/2009 5:38:09 PM The Lattice Project Finished download of 248365700.01770482587345712_1
10/6/2009 5:47:05 PM The Lattice Project Sending scheduler request: To report completed tasks.
10/6/2009 5:47:05 PM The Lattice Project Reporting 1 completed tasks, not requesting new tasks
10/6/2009 5:47:11 PM The Lattice Project Scheduler request completed



It still gets the error from the upload server and it logs a permanent upload error, while 6.6.36 didn't get an error at all. It doesn't crash. It reports the WU and the WU validates with a quorum of 2.

It looks like it might still be sending a bad record to the server but it recovers and doesn't crash now. Someone should probably try to figure out why it's getting that error from the server when 6.6.36 didn't.

I'm sticking with 6.10.13 on those 2 machines.

Thanks,

David
David Ball
ID: 27817
Ageless
Joined: 29 Aug 05
Posts: 12110
Netherlands
Message 27822 - Posted: 7 Oct 2009, 6:27:25 UTC - in response to Message 27817.  
Last modified: 7 Oct 2009, 14:26:32 UTC

OK, I have kept the developers up-to-date about this thread. The answer is really simple: Lattice should update their server software.

The "nbytes missing or negative" message is a server error.

So why will 6.6.36 upload and report where later versions either crash while trying or immediately discard the problematic result files? Because of changes to the client meant to stop it from sending back garbage results, among other things. The gist is in the following changes. The code as it was before these changes was causing problems for other projects with 6.6.36 and earlier.

Change Log for 6.6.38 wrote:
David 16 July 2009
- client: code cleanup for project-level file xfer backoff

- client: fix backoff logic

- client/manager/GUI RPC: show project-level backoffs

- client: changed file upload logic

Old: each upload attempt consists of two HTTP requests:
-- the 1st to get the current file size on server,
-- the 2nd to upload the remainder of the file.

Problem:

a) if the upload server is overloaded and requests are succeeding with probability X, then the chance of both requests succeeding is X^2. So e.g. a per-request success rate of 0.1 becomes an overall success rate of 0.01.

b) the "get file size" request can be avoided in some cases.

New:

If we've already queried the file size and haven't uploaded any additional bytes, don't query the file size again.

- client: if file < 8KB, upload it in its entirety and skip size check

- client: (refinement to previous checkin) don't skip file size check if file has multiple upload URLs. We might have uploaded different amounts on different servers.

- client: change the way a resource's "estimated delay" (passed to server for crude deadline check) is computed.

Old: estimated delay is the interval for which the resource is fully used (i.e., all instances busy).

Problem: this may cause unnecessary project starvation.
example: 1 CPU machine, has a month-long CPDN job with a 1-year deadline (it's not in deadline trouble). Then the CPU estimated delay will be 1 month, and the client won't get any work from projects with deadlines shorter than 1 month.

New: estimated delay is the latest time at which the resource is fully used and is being used by at least 1 job that is projected to miss its deadline under RR.

Note: this isn't precise, but I don't think we can improve it much without getting a lot more complex.
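
To make the upload part of the 16 July entry concrete, here is a small Python sketch of the arithmetic and of the size-check rules as I read them. All names are made up for illustration and none of this is taken from the BOINC source. In particular, applying the multiple-URL exception only to the small-file rule is my reading of "refinement to previous checkin", and the exact byte value of "8KB" is an assumption.

```python
# Hypothetical sketch of the upload rules described in the change log above.
# None of these names come from the BOINC source.

SMALL_FILE_BYTES = 8 * 1024  # "8KB" in the change log; exact constant is an assumption


def overall_success(per_request, n_requests):
    """Chance that n independent requests all succeed."""
    return per_request ** n_requests


def needs_size_query(file_bytes, n_upload_urls,
                     size_already_queried, bytes_sent_since_query):
    """Decide whether to ask the server for the current file size first."""
    # Already asked and nothing new was sent: the earlier answer still holds.
    if size_already_queried and bytes_sent_since_query == 0:
        return False
    # Small file with a single upload URL: send it whole, skip the check.
    if file_bytes < SMALL_FILE_BYTES and n_upload_urls == 1:
        return False
    return True


print(overall_success(0.1, 2))  # old scheme, two requests: roughly 0.01
print(overall_success(0.1, 1))  # one request: 0.1
print(needs_size_query(4096, 1, False, 0))
print(needs_size_query(4096, 2, False, 0))
```

The point of the change is the first function: every request that can be skipped removes one factor of X from the chance that an upload attempt goes through on an overloaded server.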

David 17 July 2009
- client: 2nd try on my last checkin.

We need to estimate 2 different delays for each resource type:
1) "saturated time": the time the resource will be fully utilized (new name for the old "estimated delay"). This is used to compute work requests.
2) "busy time": the time a new job would have to wait to start using this resource. This is passed to the scheduler and used for a crude deadline check.

Note: this is ill-defined; a single number doesn't suffice. But as a very rough estimate, I'll use the sum of (J.duration * J.ninstances)/ninstances over all jobs that miss their deadline under RR sim.
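
The "busy time" estimate in the 17 July entry is a single sum, which can be written out as below. This is only a sketch of the formula as worded in the change log: the Job class, its field names and the seconds unit are assumptions for illustration, and whether a job misses its deadline under the RR simulation is taken as given input here rather than computed.

```python
from dataclasses import dataclass


@dataclass
class Job:
    duration: float        # estimated remaining run time (seconds, assumed unit)
    ninstances: float      # instances of the resource this job uses
    misses_deadline: bool  # outcome of the RR simulation, supplied by the caller


def busy_time(jobs, ninstances):
    """Sum of (J.duration * J.ninstances) / ninstances over all jobs
    that are projected to miss their deadline."""
    return sum(
        j.duration * j.ninstances / ninstances
        for j in jobs
        if j.misses_deadline
    )


jobs = [
    Job(duration=3600, ninstances=1, misses_deadline=True),
    Job(duration=7200, ninstances=2, misses_deadline=True),
    Job(duration=100000, ninstances=1, misses_deadline=False),  # ignored
]
print(busy_time(jobs, ninstances=4))  # (3600*1 + 7200*2) / 4 = 4500.0
```

Note how the long job that is not in deadline trouble contributes nothing, which matches the CPDN example in the 16 July entry: a month-long job with a 1-year deadline should not block work from projects with shorter deadlines.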

Jord
ID: 27822
Mike W

Joined: 18 Nov 08
Posts: 9
United States
Message 27825 - Posted: 7 Oct 2009, 13:31:36 UTC - in response to Message 27822.  

I am having this problem as well. I have several completed Lattice projects, and I don't want to have to delete them.

Where can I find a previous version of BOINC to re-download? Or when do you think the problem will be fixed?
ID: 27825
Gundolf Jahn
Joined: 20 Dec 07
Posts: 1069
Germany
Message 27827 - Posted: 7 Oct 2009, 13:41:16 UTC - in response to Message 27825.  

Mike W wrote:
where can i find a previous version of boinc to re-download?

That would be here.

Gruß,
Gundolf
ID: 27827
Stick

Joined: 10 Oct 09
Posts: 20
United States
Message 27859 - Posted: 10 Oct 2009, 4:20:38 UTC
Last modified: 10 Oct 2009, 4:38:39 UTC

I had problems uploading these SZTAKI results with 6.6.38:

http://szdg.lpds.sztaki.hu/szdg/result.php?resultid=10415951
http://szdg.lpds.sztaki.hu/szdg/result.php?resultid=10417381

SZTAKI results generate several files for upload, one of which is apparently empty (i.e., zero bytes), and I believe the problem occurred when BOINC attempted to upload the empty file. When this happened, the BOINC client disconnected from BOINC Manager. All attempts to reconnect failed, including BOINC restarts and BOINC "repair" installations. But reverting back to 6.6.36 fixed it. Note that I had the same problem with both my laptop and desktop computers.
ID: 27859
Stick

Joined: 10 Oct 09
Posts: 20
United States
Message 27890 - Posted: 11 Oct 2009, 23:17:36 UTC - in response to Message 27815.  

Another instance of the problem with SZTAKI posted here. So I tend to agree with the comment below - 6.6.38 should not be the recommended version. I would also note that the BOINC version history page does not yet acknowledge the existence of 6.6.38.

David Ball wrote:
The main problem I see with this is that 6.6.38 is the default/recommended BOINC client to download and that is the version that crashes. Some people running Lattice are getting frustrated and canceling WU's or dropping Lattice because of this.

ID: 27890
Ageless
Joined: 29 Aug 05
Posts: 12110
Netherlands
Message 27892 - Posted: 11 Oct 2009, 23:54:14 UTC - in response to Message 27890.  

6.6.36 has a problem where the scheduler will eventually answer that your work request cannot go through, as your BOINC is on for so much percent, with calculation enabled 100%. 6.6.38 has a fix for that on the client side.

Just because some people at one project have problems uploading, doesn't mean BOINC has to change its recommended version to something lower with bugs that hit more people at more projects, when the solution for the project with the problems now is simple: update their server software. (*)

Looking at SZTAKI's global preferences, they don't even have BOINC 6 preferences support, meaning their server software is most probably still at the 5.11 release that they speak about in this thread.

You could just as well revert back to BOINC 5.10.45 then and say that's better off made recommended.

(*) I see that the Lattice project has just updated their server software.
Jord
ID: 27892
Stick

Joined: 10 Oct 09
Posts: 20
United States
Message 27901 - Posted: 12 Oct 2009, 3:17:59 UTC - in response to Message 27892.  
Last modified: 12 Oct 2009, 3:50:39 UTC

Ageless wrote:
(*) I see that the Lattice project has just updated their server software.

If that means SZTAKI is the lone hold-out, then I agree. But . . .

Ageless wrote:
6.6.36 has a problem where the scheduler will eventually answer that your work request cannot go through, as your BOINC is on for so much percent, with calculation enabled 100%. 6.6.38 has a fix for that on the client side.

. . . that was an annoyance. This problem causes 6.6.38 to crash. While that may not be enough for it to rate a "MAY BE UNSTABLE" label, in my book, the "Recommended" rating is questionable.

Ageless wrote:
You could just as well revert back to BOINC 5.10.45 then and say that's better off made recommended.

When I started with BOINC, I think the recommended version was 4.06. And, I generally upgrade when a new recommended version is released. Having done that, I have seen a lot of improvements to the program and also a lot of minor glitches. 5.10.45 had glitches and so does 6.6.36. Progress isn't always smooth. But I had come to expect a level of stability with the BOINC "recommended version" that 6.6.38 just didn't live up to. That is why I finally joined the BOINC forum and made my report.
ID: 27901
Ageless
Joined: 29 Aug 05
Posts: 12110
Netherlands
Message 27902 - Posted: 12 Oct 2009, 5:27:57 UTC - in response to Message 27901.  

Stick wrote:
. . . that was an annoyance. This problem causes 6.6.38 to crash. While that may not be enough for it to rate a "MAY BE UNSTABLE" label, in my book, the "Recommended" rating is questionable.

If you're new to the scene, a BOINC version that doesn't fetch work from your projects due to it being stuck on "calculations enabled 100% of that" is not an annoyance, but something you want to get rid of. As in Add/remove Programs->Uninstall BOINC. Do better next time. This affected all projects out there.

For you too, try BOINC 6.10.13. It's the release candidate for the 6.10 series. If it also has crashing problems, let me know and I'll notify the developers.

But until that time, there is some responsibility for the Projects as well to keep their server software up-to-date, so try bombarding the admin there with requests to get the 2 year old server software updated.
Jord
ID: 27902
Stick

Joined: 10 Oct 09
Posts: 20
United States
Message 27919 - Posted: 12 Oct 2009, 15:17:20 UTC - in response to Message 27902.  
Last modified: 12 Oct 2009, 15:34:51 UTC

Stick wrote:
. . . that was an annoyance. This problem causes 6.6.38 to crash. While that may not be enough for it to rate a "MAY BE UNSTABLE" label, in my book, the "Recommended" rating is questionable.

Ageless wrote:
If you're new to the scene, a BOINC version that doesn't fetch work from your projects due to it being stuck on "calculations enabled 100% of that" is not an annoyance, but something you want to get rid of. As in Add/remove Programs->Uninstall BOINC. Do better next time. This affected all projects out there.

As I tried to say earlier, if the 6.6.38 problem is now isolated to SZTAKI, I agree with you. But, having experienced the "calculations enabled 100% of that" problem with 6.6.36, I still think that "annoyance" is an apt descriptor. The point is, neither problem is something that should be in a "recommended" release. (And, in that regard, maybe 5.10.45 was better.) ;-)

For you too, try BOINC 6.10.13. It's the release candidate for the 6.10 series. If it also has crashing problems, let me know and I'll notify the developers.

I'll do it sometime soon (with SZTAKI) and report back.

But until that time, there is some responsibility for the Projects as well to keep their server software up-to-date, so try bombarding the admin there with requests to get the 2 year old server software updated.

Knowing what I do about the admins/devs at SZTAKI, I am surprised to hear their software is only 2 years out of date. ;-) But I'll do what I can to prod them and also to warn SZTAKI users against upgrading to 6.6.38.
ID: 27919
Profile Ageless
Volunteer moderator
Project administrator
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 12110
Netherlands
Message 27923 - Posted: 12 Oct 2009, 15:38:09 UTC - in response to Message 27919.  

Knowing what I do about the admins/devs at SZTAKI, I am surprised to hear their software is only 2 years out of date. ;-) But I'll do what I can to prod them and also to warn SZTAKI users against upgrading to 6.6.38.

Yes, Adam is a bit slow.

But that's fair. :-)
Jord
Please do not private message me for tech support. Use the forums for that. Tech PMs will be ignored.
ID: 27923
Roger Rasmussen

Joined: 12 Oct 09
Posts: 1
United States
Message 27931 - Posted: 12 Oct 2009, 18:16:54 UTC

I just "upgraded" to 6.6.38 and now my Vista machine won't communicate with the server. Version 6.6.38 works fine on my XP machine.
ID: 27931
Profile Ageless
Volunteer moderator
Project administrator
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 12110
Netherlands
Message 27939 - Posted: 12 Oct 2009, 20:21:07 UTC - in response to Message 27931.  

I just "upgraded" to 6.6.38 and now my Vista machine won't communicate with the server.

Which server? What are your messages, if any?
Jord
Please do not private message me for tech support. Use the forums for that. Tech PMs will be ignored.
ID: 27939
Stick

Joined: 10 Oct 09
Posts: 20
United States
Message 27949 - Posted: 12 Oct 2009, 23:41:08 UTC - in response to Message 27919.  

For you too, try BOINC 6.10.13. It's the release candidate for the 6.10 series. If it also has crashing problems, let me know and I'll notify the developers.

I'll do it sometime soon (with SZTAKI) and report back.


Looks like 6.10.13 works like it should with SZTAKI - at least it did with this result. Here are the messages related to its upload:

10/12/2009 7:04:01 PM SZTAKI Desktop Grid Started upload of caa03215-1b38-489c-abd4-252c594ad6ff_6f4d3cfd-b2ab-4430-b46d-7f9eadb8435f_450036_2_0
10/12/2009 7:04:01 PM SZTAKI Desktop Grid Started upload of caa03215-1b38-489c-abd4-252c594ad6ff_6f4d3cfd-b2ab-4430-b46d-7f9eadb8435f_450036_2_1
10/12/2009 7:04:03 PM SZTAKI Desktop Grid Finished upload of caa03215-1b38-489c-abd4-252c594ad6ff_6f4d3cfd-b2ab-4430-b46d-7f9eadb8435f_450036_2_0
10/12/2009 7:04:03 PM SZTAKI Desktop Grid Finished upload of caa03215-1b38-489c-abd4-252c594ad6ff_6f4d3cfd-b2ab-4430-b46d-7f9eadb8435f_450036_2_1
10/12/2009 7:04:03 PM SZTAKI Desktop Grid Started upload of caa03215-1b38-489c-abd4-252c594ad6ff_6f4d3cfd-b2ab-4430-b46d-7f9eadb8435f_450036_2_2
10/12/2009 7:04:03 PM SZTAKI Desktop Grid Started upload of caa03215-1b38-489c-abd4-252c594ad6ff_6f4d3cfd-b2ab-4430-b46d-7f9eadb8435f_450036_2_3
10/12/2009 7:04:04 PM SZTAKI Desktop Grid [error] Error reported by file upload server: nbytes missing or negative
10/12/2009 7:04:04 PM SZTAKI Desktop Grid Giving up on upload of caa03215-1b38-489c-abd4-252c594ad6ff_6f4d3cfd-b2ab-4430-b46d-7f9eadb8435f_450036_2_2: permanent upload error
10/12/2009 7:04:04 PM SZTAKI Desktop Grid Finished upload of caa03215-1b38-489c-abd4-252c594ad6ff_6f4d3cfd-b2ab-4430-b46d-7f9eadb8435f_450036_2_3
10/12/2009 7:04:04 PM SZTAKI Desktop Grid Started upload of caa03215-1b38-489c-abd4-252c594ad6ff_6f4d3cfd-b2ab-4430-b46d-7f9eadb8435f_450036_2_4
10/12/2009 7:04:05 PM SZTAKI Desktop Grid Finished upload of caa03215-1b38-489c-abd4-252c594ad6ff_6f4d3cfd-b2ab-4430-b46d-7f9eadb8435f_450036_2_4
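For anyone who wants to check a longer message log for the same pattern, here is a minimal sketch in Python. It assumes the lines look like the ones pasted above ("Started upload of", "Finished upload of", "Giving up on upload of"); the function names and the short file names in the sample are made up for illustration and are not part of BOINC.

```python
import re

# Assumed message format, based only on the log lines quoted above:
#   "... Started upload of <file>"
#   "... Finished upload of <file>"
#   "... Giving up on upload of <file>: <reason>"
EVENT = re.compile(
    r"(?P<action>Started|Finished|Giving up on) upload of (?P<file>[^\s:]+)"
)


def summarize_uploads(lines):
    """Map each file name to the last upload action seen for it."""
    state = {}
    for line in lines:
        match = EVENT.search(line)
        if match:
            state[match.group("file")] = match.group("action")
    return state


def failed_uploads(lines):
    """Return the sorted names of files whose upload was given up on."""
    summary = summarize_uploads(lines)
    return sorted(name for name, action in summary.items()
                  if action == "Giving up on")


if __name__ == "__main__":
    # Hypothetical shortened file names; real ones are much longer.
    sample = [
        "10/12/2009 7:04:03 PM SZTAKI Desktop Grid Started upload of wu_2_2",
        "10/12/2009 7:04:04 PM SZTAKI Desktop Grid Giving up on upload of wu_2_2: permanent upload error",
        "10/12/2009 7:04:04 PM SZTAKI Desktop Grid Finished upload of wu_2_3",
    ]
    for name in failed_uploads(sample):
        print("upload failed:", name)
```

Paste the lines from the Messages tab into a list (or read them from a text file) and pass them to `failed_uploads` to see which output files never made it to the server.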
ID: 27949
John Beck

Joined: 15 Oct 09
Posts: 5
United States
Message 28018 - Posted: 15 Oct 2009, 8:22:31 UTC - in response to Message 27902.  

I am not sure what is going on, but my BOINC account keeps being lost. I create one, then the next time I try to log in it says that there is no account. I have 2 computers. Both have v...38. One works. My new computer, however, keeps trying to connect but can't. I have uninstalled and reinstalled multiple times. It will work for a little while, then start the whole thing over. When it disconnects I can't access anything. No settings, no buttons. All greyed out. I am really curious why the program is getting worse, not better. It is almost treating me as an invalid user, or as if my account is bad. I was using a USB thumb drive as a cache with eBooster. Would that have anything to do with it? If it does, it should be listed with the install requirements, as a warning at least. I hope whatever is plaguing the system gets fixed soon. This is really annoying. I've wasted hours trying to solve the issue myself. It looks like it is at least being looked at. I put my symptoms out to see if mine are different from the rest.

Thanks,
John

ID: 28018

Copyright © 2017 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.