Posts by Joe Bloggs

1) Message boards : BOINC client : BOINC 7.0.40-42 and new app_config.xml (Message 47478)
Posted 23 Jan 2013 by Joe Bloggs
Post:
One thing I'm testing is how to avoid processing a science that might slip through due to the "If there is no work for your selection, send me something else" setting. Possible scenarios are a project that always fails on a specific device or is just plain too heavy. Is <max_instances>0</max_instances> going to stop the execution [on CPU, GPU or both], or is 0, as so often, interpreted as "no restriction"?


A workaround I've thought of is to set the cpu_usage or gpu_usage tag to a number higher than the number available in the system, say 10 on a quad-core system. That way the scheduler would never find enough cpus to run it. But I understand these tags are only available for gpu apps as of now?
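
For reference, here is a minimal sketch of that workaround, written as a small Python script that prints an app_config.xml. The element layout (app_config / app / gpu_versions holding gpu_usage and cpu_usage) is my understanding of the 7.0.4x format and should be treated as an assumption, and the app name example_app is hypothetical. Whether an oversized cpu_usage really keeps the task from being scheduled is untested.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, gpu_usage, cpu_usage):
    """Return app_config.xml content as a string.

    The element names are assumed from the 7.0.4x app_config.xml layout;
    app_name is a placeholder for the project's real application name.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    # Ask for more CPUs than the host has (10 on a quad-core in the example).
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


config_text = build_app_config("example_app", gpu_usage=1, cpu_usage=10)
print(config_text)
```

The printed text would be saved as app_config.xml in the directory of the project concerned.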
2) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them (Message 47414)
Posted 20 Jan 2013 by Joe Bloggs
Post:
Turns out the hard disk my OS and BOINC were on was dying, dying, dead as of yesterday. Guess we can really close the case on this one. "no finished file"... indeed :rolleyes:
3) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them (Message 47382)
Posted 18 Jan 2013 by Joe Bloggs
Post:
... and right after I posted that, I witnessed another of these events. So I guess that malware-turned-malware-sweeper wasn't the culprit either. This time I saw CPU utilization drop to zero and Firefox labelled unresponsive in Resource Monitor. Closing Firefox for the night to see if that fixes this.
4) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them (Message 47381)
Posted 18 Jan 2013 by Joe Bloggs
Post:
Turned out I had a persistent bugger of an anti-malware program running. I thought I'd disabled it but it's borderline malware itself. Just uninstalled it, we'll see what happens tonight.
5) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them (Message 47371)
Posted 18 Jan 2013 by Joe Bloggs
Post:
What I meant was that I am still getting these errors (more of them today in fact) but these short apps all make their way to completion regardless. And if I could understand the code I would be coding boinc instead of just running it ;)
6) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them (Message 47366)
Posted 18 Jan 2013 by Joe Bloggs
Post:

Why do you torture yourself? YOU have it running there, not US. Try different settings, (10, 20, 30...99). Just do it and never mind asking us for permission or recommendation or whatever it is you seem to think you need. It won't blow up and destroy your house, kill your children or anything else bad. At worst it will not crunch, not a big deal. Too much thinking/talking beating around the bush; too little doing, testing and verifying.


I am trying things... one at a time. The last things I tried were Bil's clock tracking script and the processor scheduling thing I mentioned. Neither made any difference. If I put every proposed change in the stew at the same time, then supposing it actually worked, I'd still not know the root cause in my case, right?

It seems the sky's the limit as far as the number of potential things to try to remedy these mysterious zero errors so I do want some expert input on prioritizing which Chinese herbal treatments to try next, yeah. ;) (just kidding)

Setting the processor load threshold back to 25% for now. These errors don't seem to be impacting production with the mix of tasks I have running atm (mostly 10-20min WUs, which seem to continue running just fine after one of these errors), which is why I'm taking the one-at-a-time troubleshooting approach. If the errors keep up, I'll try exiting my audio app.
7) Message boards : Questions and problems : Multi-core Computing (Message 47363)
Posted 18 Jan 2013 by Joe Bloggs
Post:
Might you be able to specify the number of CPUs used using the new app_config.xml feature of 7.0.4x?
8) Message boards : BOINC Manager : BOINC Manager hogging memory (7.0.42) (Message 47362)
Posted 18 Jan 2013 by Joe Bloggs
Post:
I noticed my computer running low on memory, and found that exiting BOINC freed me 3 gigs or so. I opened task manager and found that boincmgr.exe (which had been running about 18 hours) was taking over 2GB of memory before I shut it down--but when I restart BOINC, it only takes about 23MB.

Repeat: it's not the apps nor the boinc client, but boinc manager (boincmgr.exe) taking up all the memory.

Here's the config I've been using:
18/1/2013 9:49:18 | | Starting BOINC client version 7.0.42 for windows_x86_64
18/1/2013 9:49:18 | | log flags: file_xfer, sched_ops, task, cpu_sched, cpu_sched_debug, rr_simulation
18/1/2013 9:49:18 | | log flags: sched_op_debug, task_debug, work_fetch_debug
18/1/2013 9:49:18 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
18/1/2013 9:49:18 | | Data directory: D:\ProgramData\BOINC
18/1/2013 9:49:18 | | Running under account User
18/1/2013 9:49:18 | | Processor: 6 AuthenticAMD AMD Phenom(tm) II X6 1090T Processor [Family 16 Model 10 Stepping 0]
18/1/2013 9:49:18 | | Processor: 512.00 KB cache
18/1/2013 9:49:18 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow
18/1/2013 9:49:18 | | OS: Microsoft Windows 7: Ultimate x64 Edition, Service Pack 1, (06.01.7601.00)
18/1/2013 9:49:18 | | Memory: 7.50 GB physical, 14.99 GB virtual
18/1/2013 9:49:18 | | Disk: 746.35 GB total, 278.53 GB free
18/1/2013 9:49:18 | | Local time is UTC +8 hours
18/1/2013 9:49:18 | | CAL: ATI GPU 0: ATI Radeon HD 5x00 series (Redwood) (CAL version 1.4.1734, 512MB, 479MB available, 1120 GFLOPS peak)
18/1/2013 9:49:18 | | OpenCL: ATI GPU 0: ATI Radeon HD 5x00 series (Redwood) (driver version 1016.4 (VM), device version OpenCL 1.2 AMD-APP (1016.4), 512MB, 479MB available, 1120 GFLOPS peak)
18/1/2013 9:49:18 | Poem@Home | Found app_config.xml
18/1/2013 9:49:18 | Collatz Conjecture | Found app_config.xml
18/1/2013 9:49:18 | SETI@home | Found app_config.xml
18/1/2013 9:49:18 | World Community Grid | Found app_config.xml
18/1/2013 9:49:18 | Poem@Home | URL http://boinc.fzk.de/poem/; Computer ID 153066; resource share 100
18/1/2013 9:49:18 | Collatz Conjecture | URL http://boinc.thesonntags.com/collatz/; Computer ID 119253; resource share 20
18/1/2013 9:49:18 | fightmalaria@home | URL http://boinc.ucd.ie/fmah/; Computer ID 12544; resource share 100
18/1/2013 9:49:18 | climateprediction.net | URL http://climateprediction.net/; Computer ID 1261116; resource share 200
18/1/2013 9:49:18 | MindModeling@Beta | URL http://mindmodeling.org/; Computer ID 31472; resource share 100
18/1/2013 9:49:18 | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 6873848; resource share 100
18/1/2013 9:49:18 | Cosmology@Home | URL http://www.cosmologyathome.org/; Computer ID 181717; resource share 100
18/1/2013 9:49:18 | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 2265514; resource share 100
18/1/2013 9:49:18 | World Community Grid | General prefs: from World Community Grid (last modified 17-Jan-2013 23:30:17)
18/1/2013 9:49:18 | World Community Grid | Host location: none
18/1/2013 9:49:18 | World Community Grid | General prefs: using your defaults
18/1/2013 9:49:18 | | Reading preferences override file
18/1/2013 9:49:18 | | Preferences:
18/1/2013 9:49:18 | | max memory usage when active: 3838.26MB
18/1/2013 9:49:18 | | max memory usage when idle: 6908.87MB
18/1/2013 9:49:18 | | max disk usage: 100.00GB
18/1/2013 9:49:18 | | don't use GPU while active
18/1/2013 9:49:18 | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
9) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them (Message 47361)
Posted 18 Jan 2013 by Joe Bloggs
Post:
HDD load: I don't think so, but then I wasn't paying attention to that.
Antivirus: Avast! (with exceptions for boinc program and data directory set for the realtime scan)
Firewall: windows' bundled (with exceptions set for the wrong directory until this morning :oops: )
10) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them (Message 47355)
Posted 17 Jan 2013 by Joe Bloggs
Post:
Just witnessed another of these events happening right in front of my eyes.

Bil's script didn't log any system clock misbehaviour.

What I saw was that the "time elapsed" for running apps froze up, although BOINC Manager itself continued to function.

My hunch is that BOINC.EXE, but not boincmgr.exe, was being starved of CPU cycles for some reason.

Although I said I'd raise boinc.exe to realtime priority yesterday, I didn't, because I wanted to see whether the other changes I made made the difference first.

So now there are two things I think I can try: one, set boinc.exe to realtime priority; two, set processor scheduling to "background services" instead of "programs".

One other significant thing I haven't tried yet is the suggestion to pause BOINC depending on "non-BOINC" processor load. A while back I found BOINC see-sawing between running and pausing every 10 seconds or so with a 25% non-BOINC load threshold for suspending. Setting it to 0 (no restriction) didn't seem to impact system performance at the time, so that's where I've set it. Am I way off base here?

Will report back with findings...
11) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them (Message 47342)
Posted 17 Jan 2013 by Joe Bloggs
Post:
Thanks. It's not exactly running with millisecond precision but correctly logged the time skip when I moved the time forward by 1 minute. I'll leave this running and see what it says the next time one of the exit with zeroes happens.
12) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them (Message 47339)
Posted 17 Jan 2013 by Joe Bloggs
Post:
I thought it's the clock going forward that does that?

I'd be happy to run such a script if anybody can supply one :S

Anyway, if there are methods to track time independently of the system clock (and of course there are--the High Precision Event Timer is just the latest in a whole series of kernel timers, old and new, but all with millisecond precision or much better) wouldn't it make more sense to track time using one of these instead of the system clock for something as critical as the continued survival of a science app?
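
To show what I mean by tracking time independently of the system clock, here is a rough Python sketch. It compares the wall clock (time.time) against a monotonic clock (time.monotonic), which is not affected by system clock updates, and reports when the two disagree. The function names and the 1-second tolerance are my own assumptions; this is not how BOINC itself keeps time.

```python
import time


def clock_jump(prev_wall, prev_mono, now_wall, now_mono, tolerance=1.0):
    """Return how far the wall clock jumped (in seconds) between two samples.

    Elapsed wall-clock time is compared with elapsed monotonic time;
    differences within `tolerance` seconds are treated as noise (0.0).
    """
    drift = (now_wall - prev_wall) - (now_mono - prev_mono)
    return drift if abs(drift) > tolerance else 0.0


def watch(interval=1.0, samples=60):
    """Sample both clocks every `interval` seconds and print any jump."""
    prev_wall, prev_mono = time.time(), time.monotonic()
    for _ in range(samples):
        time.sleep(interval)
        now_wall, now_mono = time.time(), time.monotonic()
        jump = clock_jump(prev_wall, prev_mono, now_wall, now_mono)
        if jump:
            print(f"{time.ctime(now_wall)}: system clock jumped {jump:+.1f} s")
        prev_wall, prev_mono = now_wall, now_mono
```

Calling watch() and then moving the system time forward by a minute should print a jump of roughly +60 s, while a run of 30 real seconds without any clock change prints nothing.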

And if these exit with zero errors occur at the drop of a hat, shouldn't there be a way to ignore them completely, or at least remove the 100 error limit for long-running apps?
13) Message boards : Questions and problems : Multi-core Computing (Message 47338)
Posted 17 Jan 2013 by Joe Bloggs
Post:
The WUs ARE multithreaded though?

I open process explorer and each WU process under BOINC (e.g. wcg_hcc1 for a WCG help conquer cancer WU) has at least 3 threads, most of them 4?
14) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them (Message 47334)
Posted 17 Jan 2013 by Joe Bloggs
Post:
And thanks for the detailed explanation Ageless.
15) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them (Message 47332)
Posted 17 Jan 2013 by Joe Bloggs
Post:
WTF?


+1


Where's the repository for the stderr of an ongoing task? I want to inspect one to see if this 30s discrepancy between the time BOINC calls a task dead and the time a task calls BOINC dead exists on other tasks.

Anyway I'm filing this in the alpha mailing list if nobody here finds this normal.
16) Message boards : Questions and problems : I assume this is a windows-7 "Bug" ... (Message 47331)
Posted 17 Jan 2013 by Joe Bloggs
Post:
Has anyone posted a wish for the "count" parameter of gpus to be user-editable--so that a fast gpu can count as two to receive more jobs while a slow one can count as 0.5 to receive no jobs other than those that the user hand-edits to run on 0.5 gpus?
17) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them (Message 47316)
Posted 16 Jan 2013 by Joe Bloggs
Post:
From the log:

16-Jan-2013 20:19:58 [MindModeling@Beta] [task] Process for MindModeling-279-50f690bd5c5e7_0 exited, exit code 0, task state 1
16-Jan-2013 20:19:58 [MindModeling@Beta] Task MindModeling-279-50f690bd5c5e7_0 exited with zero status but no 'finished' file

From the corresponding stderr of the (thankfully finished) WU:
Beginning run at 20:13:09
Starting model with IV vector: 5.0 2.3500 0.89375 0.06400 0.04020:20:28 (5676): No heartbeat from core client for 30 sec - exiting
wrapper: starting
20:21:12 (6164): wrapper: running ../../projects/mindmodeling.org/1.88_windows_intelx86_ccl.exe (-I ccl.image -l letf.lisp -b -- windows_intelx86 mm_config_file.txt in.txt)

Note that the time 20:20:28 is 30 seconds AFTER the log declares that the WU had exited.
WTF?
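
The arithmetic behind that, as a quick Python check. The stderr line only carries a time of day, so I'm assuming it falls on the same date as the log entry. The month abbreviation also assumes an English locale.

```python
from datetime import datetime

LOG_FORMAT = "%d-%b-%Y %H:%M:%S"

# When the client log says the process exited.
client_exit = datetime.strptime("16-Jan-2013 20:19:58", LOG_FORMAT)
# When the task's stderr says it gave up waiting for a heartbeat
# (date assumed to match the log entry).
task_exit = datetime.strptime("16-Jan-2013 20:20:28", LOG_FORMAT)

gap_seconds = (task_exit - client_exit).total_seconds()
print(gap_seconds)
```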
18) Message boards : Questions and problems : Task exited with zero status but no 'finished' file... (almost) all of them (Message 47314)
Posted 16 Jan 2013 by Joe Bloggs
Post:
Well, today was defrag day, and defrag hour passed by without incident. But later in the day I started getting these zero exit errors again.

Another few things I'm wondering about:
1. I had started running the ASIO audio application I talked about in my last post. This had to be run as a Realtime priority application in order to get any reliability at all. Maybe that's the problem? It uses very little CPU time though and I did not notice any impact on system response using it before installing BOINC. I'm an audiophile before I'm a BOINCer and I have to run this app just to get any sound out of my computer so...

I'd closed and restarted BOINC to do some of my own troubleshooting and now I can't see the old log entries. I know they're kept in a file somewhere but what's it called?

2. One of the lines in the stderr for the failed CPDN task mentioned not being able to get a lock on a file. I went to my ProgramData directory and noted in its Properties that Read-Only was marked with a square, which if I understand correctly means that some items inside had been marked read-only... so I cleared the checkbox, told windows to make the change (to writeable) to all files in the directory and subdirectories... but when I opened its properties again, the checkbox was marked with a square again...

I opened a command line and did "dir *.* /s /ar" to look for any files marked read-only and came up blank though.
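
A rough Python equivalent of that search, in case anyone wants to repeat it. It walks a directory tree and lists files whose write bit is cleared, assuming Python reports the Windows read-only attribute that way, which is how os.stat behaves as far as I know. The function name is my own.

```python
import os
import stat


def find_read_only(root):
    """Yield paths of files under `root` whose write bit is cleared."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.stat(path).st_mode
            except OSError:
                # File vanished or is inaccessible; skip it.
                continue
            if not mode & stat.S_IWRITE:
                yield path
```

For example, list(find_read_only(r"D:\ProgramData\BOINC")) would return an empty list if nothing in the data directory is marked read-only.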

While I'm at it, let's mention a little fiasco I had a while ago: a few days ago, before I knew about 7.0.42, I was experimenting with GPU allocation and wanted my video card to be recognized as having a count of 2 to get more WUs running on it at the same time. I edited the relevant count entry in client_state.xml. This of course didn't do anything while BOINC was running, and when I closed and restarted BOINC the proper count of 1 of course reasserted itself. In a bout of frustration (what did I have to lose, right? I was just horsing around with a new hobby) I set client_state.xml to read-only. Immediately all kinds of sh!t started going down. Connecting to the BOINC client took forever. The event log complained of not being able to update the client state every time anything happened. I closed BOINC and unchecked the read-only attribute but for some reason it wouldn't stick. BOINC kept making the same complaints. I messed with the security settings trying to grant everyone full control and that wouldn't stick either. Finally, in a panic, I deleted client_state.xml and client_state_prev.xml. That didn't go well, of course: all the job data was lost and BOINC started downloading all-new tasks.

Finally I restored the xml files, and somehow when I selected both files at the same time and unchecked the read-only property in the property box for the group selection, the change stuck and BOINC recovered.

Now, I don't know what lasting damage might have resulted from my shenanigans... :S

3. Speaking of permissions, I was surprised to find, in the properties for the folder ProgramData\BOINC, that there were a bunch of underprivileged users. Only "boinc_admins" had anything near full control (which was all the permissions checked: modify, read & execute, list folder contents, read, write) except for "full control" and "special permissions". Neither boinc_users nor boinc_projects had modify or write permissions, and one of them (I don't remember which, I think it was boinc_projects) had NO permissions at all. I've since edited all these boinc users to get all the permissions I could grant (which was every tick, including "full control", except "special permissions".)

Was what I saw normal? Or do they need these permissions I just granted them? Or are those non-permissions normal safeguards that I should restore?


I've also set BOINC and boincmgr to run as administrator, and, in a random punt at the "no heartbeat for 30 seconds" problem, given boinc.exe Realtime thread priority.

I'll leave BOINC and my audio program running overnight and see how it goes. :S :S
19) Message boards : Questions and problems : Automatic Temperature regulation (Message 47305)
Posted 16 Jan 2013 by Joe Bloggs
Post:
TThrottle (and BOINCTasks) were specifically developed to work with BOINC, so your supposition is right. ;>)

If you have BOINCTasks also running and allow TThrottle to interact with BOINCTasks and vice versa, you can even see the host's CPU temps in there and efficiency %, translating average throttle with which tasks are allowed to run.


The temp sensor for the CPU doesn't seem to work, but I bought a buff 12" heatsink+fan along with my new toy (AMD Phenom X6 1090T 3.2GHz) so I'm not worried there ;)

It should be noted that my old clunky Radeon ATI HD5670 gfx card smokes the CPU for points generated though, even when underclocked and throttled :S
20) Message boards : Questions and problems : I assume this is a windows-7 "Bug" ... (Message 47304)
Posted 16 Jan 2013 by Joe Bloggs
Post:
Just curious, how are you stipulating that two WUs run on the fast card and one on the slow card? All your config is way over my head but this is something I may want to do if I ever get a new card...



Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.