Posts by Joe Bloggs

1) Message boards : BOINC client : BOINC 7.0.40-42 and new app_config.xml
Message 47478
Posted 23 Jan 2013 by Joe Bloggs
One thing I'm testing is how to not process a science app that might slip through due to the "If there is no work for your selection, send me something else" setting. Possible scenarios are a project that always fails on a specific device or is really just plain too heavy. Is <max_instances>0</max_instances> going to stop the execution [on CPU, GPU or both], or is 0, as so often, interpreted as "no restriction"?


A workaround I've thought of is to set the cpu_usage or gpu_usage tag to a number higher than the number available in the system, say 10 on a quad-core system. That way the scheduler would never find enough cpus to run it. But I understand these tags are only available for gpu apps as of now?
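To make the workaround concrete, here is a minimal sketch that writes out the kind of app_config.xml described above, using only Python's standard library. The app name example_app is hypothetical, and the tag layout (app, name, gpu_versions, gpu_usage, cpu_usage) is my understanding of the 7.0.40+ format rather than something confirmed in this thread. Whether a cpu_usage larger than the real core count actually keeps the task from ever starting is exactly the untested assumption from the post.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, gpu_usage, cpu_usage):
    """Build an app_config.xml document as a string.

    app_name is a placeholder; a real file needs the project's own
    short application name.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    # Assumed layout: usage tags live inside <gpu_versions>.
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# Ask for 10 CPUs on a quad-core box, as in the workaround above.
xml_text = build_app_config("example_app", gpu_usage=1, cpu_usage=10)
print(xml_text)
```

The resulting text would go into a file named app_config.xml in the project's directory; the client then has to re-read its config files or be restarted before it takes effect.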
2) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them
Message 47414
Posted 20 Jan 2013 by Joe Bloggs
Turns out the hard disk my OS and BOINC were on was dying, dying, dead as of yesterday. Guess we can really close the case on this one. "no finished file"... indeed :rolleyes:
3) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them
Message 47382
Posted 18 Jan 2013 by Joe Bloggs
... and right after I posted that, I witnessed another of these events again. So I guess that malware-turned-malware-sweeper wasn't the culprit either. This time I saw cpu utilization drop to zero and firefox labelled unresponsive under resource monitor. Closing firefox for the night to see if that fixes this.
4) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them
Message 47381
Posted 18 Jan 2013 by Joe Bloggs
Turned out I had a persistent bugger of an anti-malware program running. I thought I'd disabled it but it's borderline malware itself. Just uninstalled it, we'll see what happens tonight.
5) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them
Message 47371
Posted 18 Jan 2013 by Joe Bloggs
What I meant was that I am still getting these errors (more of them today in fact) but these short apps all make their way to completion regardless. And if I could understand the code I would be coding boinc instead of just running it ;)
6) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them
Message 47366
Posted 18 Jan 2013 by Joe Bloggs

Why do you torture yourself? YOU have it running there, not US. Try different settings, (10, 20, 30...99). Just do it and never mind asking us for permission or recommendation or whatever it is you seem to think you need. It won't blow up and destroy your house, kill your children or anything else bad. At worst it will not crunch, not a big deal. Too much thinking/talking beating around the bush; too little doing, testing and verifying.


I am trying things... one at a time. Last things I tried were Bil's clock tracking script and the processor scheduling thing I mentioned. Neither made any difference. If I put every change proposed in the stew at the same time, then supposing it actually worked, I'd still not know the root cause in my case right?

It seems the sky's the limit as far as the number of potential things to try to remedy these mysterious zero errors so I do want some expert input on prioritizing which Chinese herbal treatments to try next, yeah. ;) (just kidding)

Setting the processor load threshold back to 25% for now. If these errors keep up (but they don't seem to be impacting production what with the mix of tasks I have running atm, mostly 10-20min WUs which seem to continue running just fine after one of these errors--which is why I'm taking the one-at-a-time troubleshooting approach for now) I'll try exiting my audio app.
7) Message boards : Questions and problems : Multi-core Computing
Message 47363
Posted 18 Jan 2013 by Joe Bloggs
Might you be able to specify the number of CPUs used using the new app_config.xml feature of 7.0.4x?
8) Message boards : BOINC Manager : BOINC Manager hogging memory (7.0.42)
Message 47362
Posted 18 Jan 2013 by Joe Bloggs
I noticed my computer running low on memory, and found that exiting BOINC freed me 3 gigs or so. I opened task manager and found that boincmgr.exe (which had been running about 18 hours) was taking over 2GB of memory before I shut it down--but when I restart BOINC, it only takes about 23MB.

Repeat: it's not the apps nor the boinc client, but boinc manager (boincmgr.exe) taking up all the memory.

Here's the config I've been using:
18/1/2013 9:49:18 | | Starting BOINC client version 7.0.42 for windows_x86_64
18/1/2013 9:49:18 | | log flags: file_xfer, sched_ops, task, cpu_sched, cpu_sched_debug, rr_simulation
18/1/2013 9:49:18 | | log flags: sched_op_debug, task_debug, work_fetch_debug
18/1/2013 9:49:18 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
18/1/2013 9:49:18 | | Data directory: D:\ProgramData\BOINC
18/1/2013 9:49:18 | | Running under account User
18/1/2013 9:49:18 | | Processor: 6 AuthenticAMD AMD Phenom(tm) II X6 1090T Processor [Family 16 Model 10 Stepping 0]
18/1/2013 9:49:18 | | Processor: 512.00 KB cache
18/1/2013 9:49:18 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow
18/1/2013 9:49:18 | | OS: Microsoft Windows 7: Ultimate x64 Edition, Service Pack 1, (06.01.7601.00)
18/1/2013 9:49:18 | | Memory: 7.50 GB physical, 14.99 GB virtual
18/1/2013 9:49:18 | | Disk: 746.35 GB total, 278.53 GB free
18/1/2013 9:49:18 | | Local time is UTC +8 hours
18/1/2013 9:49:18 | | CAL: ATI GPU 0: ATI Radeon HD 5x00 series (Redwood) (CAL version 1.4.1734, 512MB, 479MB available, 1120 GFLOPS peak)
18/1/2013 9:49:18 | | OpenCL: ATI GPU 0: ATI Radeon HD 5x00 series (Redwood) (driver version 1016.4 (VM), device version OpenCL 1.2 AMD-APP (1016.4), 512MB, 479MB available, 1120 GFLOPS peak)
18/1/2013 9:49:18 | Poem@Home | Found app_config.xml
18/1/2013 9:49:18 | Collatz Conjecture | Found app_config.xml
18/1/2013 9:49:18 | SETI@home | Found app_config.xml
18/1/2013 9:49:18 | World Community Grid | Found app_config.xml
18/1/2013 9:49:18 | Poem@Home | URL http://boinc.fzk.de/poem/; Computer ID 153066; resource share 100
18/1/2013 9:49:18 | Collatz Conjecture | URL http://boinc.thesonntags.com/collatz/; Computer ID 119253; resource share 20
18/1/2013 9:49:18 | fightmalaria@home | URL http://boinc.ucd.ie/fmah/; Computer ID 12544; resource share 100
18/1/2013 9:49:18 | climateprediction.net | URL http://climateprediction.net/; Computer ID 1261116; resource share 200
18/1/2013 9:49:18 | MindModeling@Beta | URL http://mindmodeling.org/; Computer ID 31472; resource share 100
18/1/2013 9:49:18 | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 6873848; resource share 100
18/1/2013 9:49:18 | Cosmology@Home | URL http://www.cosmologyathome.org/; Computer ID 181717; resource share 100
18/1/2013 9:49:18 | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 2265514; resource share 100
18/1/2013 9:49:18 | World Community Grid | General prefs: from World Community Grid (last modified 17-Jan-2013 23:30:17)
18/1/2013 9:49:18 | World Community Grid | Host location: none
18/1/2013 9:49:18 | World Community Grid | General prefs: using your defaults
18/1/2013 9:49:18 | | Reading preferences override file
18/1/2013 9:49:18 | | Preferences:
18/1/2013 9:49:18 | | max memory usage when active: 3838.26MB
18/1/2013 9:49:18 | | max memory usage when idle: 6908.87MB
18/1/2013 9:49:18 | | max disk usage: 100.00GB
18/1/2013 9:49:18 | | don't use GPU while active
18/1/2013 9:49:18 | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
9) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them
Message 47361
Posted 18 Jan 2013 by Joe Bloggs
HDD load: I don't think so, but then I wasn't paying attention to that.
Antivirus: Avast! (with exceptions for boinc program and data directory set for the realtime scan)
Firewall: windows' bundled (with exceptions set for the wrong directory until this morning :oops: )
10) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them
Message 47355
Posted 17 Jan 2013 by Joe Bloggs
Just witnessed another of these events happening right in front of my eyes.

Bil's script didn't log any system clock misbehaviour.

What I saw was that the "time elapsed" for running apps froze up, although BOINC Manager itself continued to function.

My hunch is that BOINC.EXE, but not boincmgr.exe, was being starved of CPU cycles for some reason.

Although I said I'd raise boinc.exe to realtime priority yesterday, I didn't, because I wanted to see whether the other changes I made made the difference first.

So now there are two things I think I can try: one, set boinc.exe to realtime priority; two, set processor scheduling to "background services" instead of "programs".

One other significant thing I haven't tried yet is the suggestion to pause BOINC depending on "non-BOINC" processor load. A while back I found BOINC see-sawing between running and pausing every 10 seconds or so with a 25% non-BOINC load threshold for suspending, and setting it to 0 (no restriction) didn't seem to impact system performance at the time, so that's where I've set it. Am I way off base here?

Will report back with findings...
11) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them
Message 47342
Posted 17 Jan 2013 by Joe Bloggs
Thanks. It's not exactly running with millisecond precision, but it correctly logged the time skip when I moved the time forward by 1 minute. I'll leave this running and see what it says the next time one of the exit-with-zero errors happens.
12) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them
Message 47339
Posted 17 Jan 2013 by Joe Bloggs
I thought it's the clock going forward that does that?

I'd be happy to run such a script if anybody can supply one :S

Anyway, if there are methods to track time independently of the system clock (and of course there are--the High Precision Event Timer is just the latest in a whole series of kernel timers, old and new, but all with millisecond precision or much better) wouldn't it make more sense to track time using one of these instead of the system clock for something as critical as the continued survival of a science app?
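As an illustration of the idea, here is a rough sketch of a watcher that compares the wall clock against a monotonic timer and reports when the two disagree. It is not the script mentioned elsewhere in this thread and not how BOINC itself tracks heartbeats; the function names, the 1-second polling interval and the 0.5-second threshold are all made up for the example. It relies on time.monotonic(), which Python documents as unaffected by system clock updates.

```python
import time


def clock_skew(wall_start, mono_start, wall_now, mono_now):
    """Seconds the wall clock moved beyond what the monotonic clock saw."""
    return (wall_now - wall_start) - (mono_now - mono_start)


def watch(interval=1.0, threshold=0.5, iterations=5):
    """Poll both clocks and collect any jumps larger than threshold."""
    wall_prev, mono_prev = time.time(), time.monotonic()
    jumps = []
    for _ in range(iterations):
        time.sleep(interval)
        wall_now, mono_now = time.time(), time.monotonic()
        skew = clock_skew(wall_prev, mono_prev, wall_now, mono_now)
        if abs(skew) > threshold:
            jumps.append(skew)
            print(f"system clock jumped by {skew:+.3f} s")
        wall_prev, mono_prev = wall_now, mono_now
    return jumps


if __name__ == "__main__":
    # Short demo run; a real watcher would loop for hours.
    watch(interval=1.0, iterations=5)
```

Moving the system time forward by a minute while this runs should show up as a skew of roughly +60 seconds, since only the wall clock follows the change.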

And if these exit-with-zero errors occur at the drop of a hat, shouldn't there be a way to ignore them completely, or at least remove the 100 error limit for long-running apps?
13) Message boards : Questions and problems : Multi-core Computing
Message 47338
Posted 17 Jan 2013 by Joe Bloggs
The WUs ARE multithreaded though?

I open process explorer and each WU process under BOINC (e.g. wcg_hcc1 for a WCG help conquer cancer WU) has at least 3 threads, most of them 4?
14) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them
Message 47334
Posted 17 Jan 2013 by Joe Bloggs
And thanks for the detailed explanation Ageless.
15) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them
Message 47332
Posted 17 Jan 2013 by Joe Bloggs
WTF?


+1


Where's the repository for the stderr of an ongoing task? I want to inspect one to see if this 30s discrepancy between the time BOINC calls a task dead and the time a task calls BOINC dead exists on other tasks.

Anyway I'm filing this in the alpha mailing list if nobody here finds this normal.
16) Message boards : Questions and problems : I assume this is a windows-7 "Bug" ...
Message 47331
Posted 17 Jan 2013 by Joe Bloggs
Has anyone posted a wish for the "count" parameter of gpus to be user-editable--so that a fast gpu can count as two to receive more jobs while a slow one can count as 0.5 to receive no jobs other than those that the user hand-edits to run on 0.5 gpus?
17) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them
Message 47316
Posted 16 Jan 2013 by Joe Bloggs
From the log:

16-Jan-2013 20:19:58 [MindModeling@Beta] [task] Process for MindModeling-279-50f690bd5c5e7_0 exited, exit code 0, task state 1
16-Jan-2013 20:19:58 [MindModeling@Beta] Task MindModeling-279-50f690bd5c5e7_0 exited with zero status but no 'finished' file

From the corresponding stderr of the (thankfully finished) WU:
Beginning run at 20:13:09
Starting model with IV vector: 5.0 2.3500 0.89375 0.06400 0.04020:20:28 (5676): No heartbeat from core client for 30 sec - exiting
wrapper: starting
20:21:12 (6164): wrapper: running ../../projects/mindmodeling.org/1.88_windows_intelx86_ccl.exe (-I ccl.image -l letf.lisp -b -- windows_intelx86 mm_config_file.txt in.txt)

Note that the time 20:20:28 is 30 seconds AFTER the log declares that the WU had exited.
WTF?
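For anyone who wants to check the arithmetic, a small sketch that computes the gaps between the timestamps quoted above. The helper name is made up, and it assumes all three times fall on the same day, which holds for the lines in this post.

```python
from datetime import datetime


def seconds_between(earlier, later, fmt="%H:%M:%S"):
    """Seconds from earlier to later, assuming both are on the same day."""
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return delta.total_seconds()


client_declared_exit = "20:19:58"  # event log: "exited with zero status"
task_gave_up = "20:20:28"          # stderr: "No heartbeat from core client"
wrapper_restarted = "20:21:12"     # stderr: "wrapper: running ..."

gap = seconds_between(client_declared_exit, task_gave_up)
restart_delay = seconds_between(task_gave_up, wrapper_restarted)
print(gap, restart_delay)
```

The first value is the 30-second discrepancy in question; the second is just how long it took before the wrapper started running again.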
18) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them
Message 47314
Posted 16 Jan 2013 by Joe Bloggs
Well, today was defrag day, and defrag hour passed by without incident. But later in the day I started getting these zero exit errors again.

Another few things I'm wondering about:
1. I had started running the ASIO audio application I talked about in my last post. This had to be run as a Realtime priority application in order to get any reliability at all. Maybe that's the problem? It uses very little CPU time though and I did not notice any impact on system response using it before installing BOINC. I'm an audiophile before I'm a BOINCer and I have to run this app just to get any sound out of my computer so...

I'd closed and restarted BOINC to do some of my own troubleshooting and now I can't see the old log entries. I know they're kept in a file somewhere but what's it called?

2. One of the lines in the stderr for the failed CPDN task mentioned not being able to get a lock on a file. I went to my ProgramData directory and noted in its Properties that Read-Only was marked with a square, which if I understand correctly means that some items inside had been marked read-only... so I cleared the checkbox, told windows to make the change (to writeable) to all files in the directory and subdirectories... but when I opened its properties again, the checkbox was marked with a square again...

I opened a command line and did "dir *.* /s /ar" to look for any files marked read-only and came up blank though.
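A rough Python equivalent of that dir search, in case someone wants to script the check. This is only a sketch: the function name is made up, and it treats a file as read-only when the owner write bit is missing from its mode, which is how Python reports the Windows read-only attribute. The path in the demo is the data directory from my own startup log, so adjust it for your machine.

```python
import os
import stat


def find_read_only(root):
    """Return paths of files under root that lack the owner write bit."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.stat(path).st_mode
            except OSError:
                # File vanished or is inaccessible; skip it.
                continue
            if not mode & stat.S_IWRITE:
                hits.append(path)
    return hits


if __name__ == "__main__":
    for path in find_read_only(r"D:\ProgramData\BOINC"):
        print(path)
```

An empty result matches what the dir command showed: no individual files marked read-only, whatever the folder's checkbox says.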

While I'm at it, let's mention a little fiasco I had a while ago: a few days ago, before I knew about 7.0.42, I was experimenting with GPU allocation and wanted my video card to be recognized as having a count of 2 to get more WUs running on it at the same time. I edited the relevant count entry in client_state.xml. This of course didn't do anything while BOINC was running, and when I closed and restarted BOINC the proper count of 1 of course reasserted itself. In a bout of frustration (what did I have to lose, right? I was just horsing around with a new hobby) I set client_state.xml to read-only. Immediately all kinds of sh!t started going down. Connecting to BOINC client took forever. Event log complains of not being able to update the client state every time anything happened. I closed BOINC and unchecked the read-only attribute but for some reason it wouldn't stick. BOINC kept making the same complaints. I messed with the security settings trying to grant everyone full control and that wouldn't stick either. Finally, in a panic, I deleted client_state.xml and client_state_prev.xml. That didn't go well, of course: all the job data was lost and BOINC started downloading all-new tasks.

Finally I restored the xml files, and somehow when I selected both files at the same time and unchecked the read-only property in the property box for the group selection, the change stuck and BOINC recovered.

Now, I don't know what lasting damage might have resulted from my shenanigans... :S

3. Speaking of permissions, I was surprised to find, in the properties for the folder ProgramData\BOINC, that there were a bunch of underprivileged users. Only "boinc_admins" had anything near full control: every permission was checked (modify, read & execute, list folder contents, read, write) except for "full control" and "special permissions". Neither boinc_users nor boinc_projects had modify or write permissions, and one of them (I don't remember which, I think it was boinc_projects) had NO permissions at all. I've since edited all these boinc users to get all the permissions I could grant (which was every tick, including "full control", except "special permissions").

Was what I saw normal? Or do they need these permissions I just granted them? Or are those non-permissions normal safeguards that I should restore?


I've also set BOINC and boincmgr to run as administrator, and, in a random punt at the "no heartbeat for 30 seconds" problem, given boinc.exe Realtime thread priority.

I'll leave BOINC and my audio program running overnight and see how it goes. :S :S
19) Message boards : Questions and problems : Automatic Temperature regulation
Message 47305
Posted 16 Jan 2013 by Joe Bloggs
TThrottle (and BOINCTasks) were specifically developed to work with BOINC, so your supposition is right. ;>)

If you have BOINCTasks also running and allow TThrottle to interact with BOINCTasks and v.v, you can even see the hosts CPU temps in there and efficiency %, translating average throttle with which tasks are allowed to run.


The temp sensor for the CPU doesn't seem to work, but I bought a buff 12" heatsink+fan along with my new toy (AMD Phenom X6 1090T 3.2GHz) so I'm not worried there ;)

It should be noted that my old clunky ATI Radeon HD5670 gfx card smokes the CPU for points generated though, even when underclocked and throttled :S
20) Message boards : Questions and problems : I assume this is a windows-7 "Bug" ...
Message 47304
Posted 16 Jan 2013 by Joe Bloggs
Just curious, how are you stipulating that two WUs run on the fast card and one on the slow card? All your config is way over my head but this is something I may want to do if I ever get a new card...

Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.