BOINC 6.10.43/6.10.44 no longer released for public

Message boards : Questions and problems : BOINC 6.10.43/6.10.44 no longer released for public

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 31968 - Posted: 5 Apr 2010, 20:35:07 UTC
Last modified: 30 Apr 2010, 15:51:01 UTC

Rom Walton wrote:
Earlier today we pulled the last round of stable clients and rolled back to the stable clients that were available in early December.

A bug was introduced in 6.10.25 where the core client would continuously download new work from projects when the total GPU RAM was enough to run the GPU app but not enough was available at run time to actually run the application without crashing. This bug was fixed in 6.10.46.

As a result of having to pull the previous stable build, we are moving forward with the 6.10.50 build as a potential release candidate. I have adjusted the test grouping to enable all of them now.

We really need to get a new stable version of the Mac client out the door; CUDA support for the Mac is not in the current stable Mac client.

Please report your results, good or bad, as quickly as possible.

----- Rom



For Windows: 6.10.43
For Linux: 6.10.44
For Macintosh: 6.10.43

Available from http://boinc.berkeley.edu/download.php

Some of the changes for this release are:

* New: Updated localization files

* New: Added accessibility support for advanced view (Mac)

* New: Added support for parsing username and password information from environment variables

* New: Added the ability to ignore specific GPU cards through a configuration file option

* New: Added the ability to suspend BOINC's use of GPUs while certain applications are running, through a configuration file option

* New: Allow new screen saver options to be configured via the OS specific configuration screen

* New: Snooze GPU option in the system tray menu

* New: Adjust GPU activity settings via the advanced view activity menu

* New: Added the ability for a project to be specified as a "Backup Project" by setting its resource share to 0

* New: Suspend computation of BOINC applications if CPU usage from non-BOINC applications exceeds a volunteer-defined value (defaults to 25%)

* New: Support for detecting SSE2, SSE3 and other advanced instruction sets on older Windows machines (Windows)

* Fix: Numerous CPU/GPU scheduling fixes

* Fix: Numerous work-fetch fixes

* Fix: Mac Installer for non-admin installs

* Fix: Recover from crashes in Nvidia and ATI GPU functions (Linux, Mac)

* Fix: Reap child processes spawned from a wrapper on POSIX systems (Linux, Mac)

* Fix: If a project supports ATI or Nvidia GPUs, display their icons in the project list

* Fix: Show the most commonly used preferences in the messages at startup.
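The release note does not say exactly how the new non-BOINC CPU check works. As a minimal sketch of the rule as worded, assuming CPU usage is expressed as a percentage of the whole machine (all cores together) and that a value of 0 switches the check off, the decision could look like this. The function and parameter names are hypothetical, and how often the real client samples CPU load is not modelled.

```python
def should_suspend(total_cpu_pct: float, boinc_cpu_pct: float,
                   limit_pct: float = 25.0) -> bool:
    """Return True if non-BOINC CPU load is above the volunteer's limit.

    Assumption: loads are percentages of the whole machine, so on a
    4-core CPU one fully busy core counts as 25%.
    A limit of 0 (or less) disables the check entirely.
    """
    if limit_pct <= 0:
        return False  # 0 means "no restriction"
    # Whatever is not used by BOINC is attributed to other programs.
    non_boinc = max(0.0, total_cpu_pct - boinc_cpu_pct)
    return non_boinc > limit_pct  # "exceeds": strictly greater than


if __name__ == "__main__":
    # One core of four fully busy with a browser: exactly 25%, not above it.
    print(should_suspend(total_cpu_pct=100.0, boinc_cpu_pct=75.0))  # False
    # Two cores busy with non-BOINC work: 50% is above 25%.
    print(should_suspend(total_cpu_pct=100.0, boinc_cpu_pct=50.0))  # True
    # Limit set to 0: never suspend.
    print(should_suspend(100.0, 0.0, limit_pct=0.0))  # False
```

Under this reading, a program keeping one core of a quad-core busy sits exactly at the default limit and does not trigger a suspend.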
Theadalus
Joined: 6 Apr 10
Posts: 12
Netherlands
Message 31975 - Posted: 6 Apr 2010, 2:16:20 UTC - in response to Message 31968.  

* New: Suspend computation of BOINC applications if CPU usage from non-BOINC applications exceeds a volunteer-defined value (defaults to 25%)

Hi,

Where can I change this setting? I just installed v6.10.44 on Ubuntu Server (no graphical interface) and am running Einstein@Home, but I cannot find the setting on the account pages.

Thnx.
Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1654
Australia
Message 31976 - Posted: 6 Apr 2010, 3:54:17 UTC - in response to Message 31975.  

In the BOINC Manager's menu: Advanced -> Preferences -> processor usage
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 31978 - Posted: 6 Apr 2010, 8:23:26 UTC - in response to Message 31976.  

In the BOINC Manager's menu: Advanced -> Preferences -> processor usage

Les, he said no GUI!

It's not in boinccmd either. The only solution I can come up with is a global_prefs_override.xml file with:

<global_preferences>
   <cpu_usage_limit>100.000000</cpu_usage_limit>
</global_preferences>
Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1654
Australia
Message 31979 - Posted: 6 Apr 2010, 8:51:57 UTC - in response to Message 31978.  
Last modified: 6 Apr 2010, 9:01:49 UTC

Ooops. Yes. Still, at least he now knows not to look on the project pages.
I don't know how people cope without the GUI. :(

edit
I wonder whether the devs considered this difficulty for those without GUI access to these options, and whether anything was implemented or planned in place of the GUI?
Theadalus
Joined: 6 Apr 10
Posts: 12
Netherlands
Message 31980 - Posted: 6 Apr 2010, 9:24:46 UTC

I think the 'global_prefs_override.xml' method will work, but I don't think 'cpu_usage_limit' is the correct parameter... :(

I believe this parameter defines the maximum CPU % that can be used?

So if the devs can give the correct parameter name, it will make my day ;)
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 31981 - Posted: 6 Apr 2010, 9:33:19 UTC - in response to Message 31980.  

I beg your pardon: I did copy it from a working installation, but too many of mine are set to 100%...

To put in a restriction:

   <suspend_cpu_usage>97.000000</suspend_cpu_usage>

or no restriction at all

   <suspend_cpu_usage>0.000000</suspend_cpu_usage>
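On a machine without the Manager, the tag still has to sit inside a <global_preferences> element. A minimal Python sketch (the function name is hypothetical) that sets <suspend_cpu_usage> in the contents of a global_prefs_override.xml while keeping any other overrides already present, such as a <cpu_usage_limit> line:

```python
import xml.etree.ElementTree as ET
from typing import Optional


def update_override(xml_text: Optional[str], percent: float) -> str:
    """Return global_prefs_override.xml content with <suspend_cpu_usage> set.

    Other overrides already in xml_text are kept unchanged.
    A percent of 0 means "no restriction".
    """
    if xml_text:
        root = ET.fromstring(xml_text)
    else:
        root = ET.Element("global_preferences")
    elem = root.find("suspend_cpu_usage")
    if elem is None:
        elem = ET.SubElement(root, "suspend_cpu_usage")
    elem.text = f"{percent:.6f}"  # same six-decimal style as the snippets in this thread
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(update_override(None, 0))
    # <global_preferences><suspend_cpu_usage>0.000000</suspend_cpu_usage></global_preferences>
```

Write the returned text to global_prefs_override.xml in the BOINC data directory (its location depends on how BOINC was installed), then run `boinccmd --read_global_prefs_override` or restart the client so it picks up the file.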
Theadalus
Joined: 6 Apr 10
Posts: 12
Netherlands
Message 31982 - Posted: 6 Apr 2010, 9:49:50 UTC

OK, seems to work :)

The line "suspend work if non-BOINC CPU load exceeds 25 %" no longer appears in the startup messages.


Thnx peeps!
Bryan Price
Joined: 28 Mar 10
Posts: 1
United States
Message 32014 - Posted: 7 Apr 2010, 14:17:32 UTC

Thanks for telling me where I needed to make changes for the suspend computation, since that part seems to be awfully twitchy. I'm not doing anything (i.e., I'm asleep), and it flips between working and not working. And then when it IS working, I get Firefox doing something at 25% (100% of one of my cores), and it doesn't hiccup. So I just changed that value to 0. Now it works just like it used to. :)
rvp_lan
Joined: 30 Dec 08
Posts: 24
France
Message 32102 - Posted: 11 Apr 2010, 16:28:59 UTC - in response to Message 32014.  

Hi,
Thanks for telling me where I needed to make changes for the suspend computation, since that part seems to be awfully twitchy.
Same here... At first, I thought it could be an interesting tweak, but on multi-core machines, when doing something "usual" with a browser and boring non-optimized Flash, just ONE core is heavily used and the others aren't. So there are plenty of other cores left for BOINC to play with.

So I just changed that value to 0. Now it works just like it used to.
Same for me!

I don't get it anyway. Since BOINC only uses idle CPU time, why is this extra parameter needed? Multi-core or not, if my system needs CPU, BOINC just backs off naturally and progressively, following the idle time available.

So I would rather have had this parameter for the GPU. My kids complain about GPU WUs interfering with their GPU-hungry games! But most of the time the CPU cores aren't very busy, so on modern multi-core machines BOINC always has a bit of resources available to crunch.

I'm curious to see how all of this will evolve once everything is OpenCL compliant!!! I hope you won't turn us end users into parameter-tweaking gurus...
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 32108 - Posted: 11 Apr 2010, 17:32:07 UTC - in response to Message 32102.  
Last modified: 19 Apr 2010, 10:50:19 UTC

I don't get it anyway.

As I wrote here:

Please think outside your own box. This feature isn't meant for people who have been using BOINC for ages and run it 24/7 without looking at it much.

It's built in for complete newcomers, people who complained that, despite BOINC's applications running at the lowest possible priority, it was taking up CPU cycles that slowed down their computer.

These people would complain about that, uninstall, leave and tell other potential crunchers negative things about BOINC. It was added to help those people, so that if they come back, they see that even they are listened to.

You can easily disable the function by setting its value to zero.



So I would rather have had this parameter for the GPU.

All GPU applications still run on the CPU as well. There are no applications that run on the GPU only, as no operating system knows how to do that. Every science application geared towards GPUs still needs the CPU to execute the application itself, to translate whatever task is to be done on the GPU from binary data into kernels the GPU understands, to transfer that data to the GPU's memory and, when the GPU is done with it, to transfer the results back and store them on disk. None of that can be done by the GPU itself.

You can use BOINC's <exclusive_app> and <exclusive_gpu_app> options to suspend BOINC when it detects any of the games running in Windows. See my GPU FAQ for more on that.
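Both options go inside the <options> section of cc_config.xml. To show how they differ, a minimal Python sketch of the decision, assuming that <exclusive_app> suspends all BOINC computing while the listed program runs and <exclusive_gpu_app> suspends only GPU work. The case-insensitive name matching, the function names and the game executables are hypothetical simplifications, not the client's actual code.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Set, Tuple

# Hypothetical cc_config.xml; the executable names are placeholders.
EXAMPLE_CC_CONFIG = """
<cc_config>
  <options>
    <exclusive_app>game1.exe</exclusive_app>
    <exclusive_gpu_app>game2.exe</exclusive_gpu_app>
  </options>
</cc_config>
"""


def exclusive_lists(cc_config_xml: str) -> Tuple[Set[str], Set[str]]:
    """Return (exclusive_app names, exclusive_gpu_app names), lower-cased."""
    options = ET.fromstring(cc_config_xml.strip()).find("options")
    if options is None:
        return set(), set()

    def names(tag: str) -> Set[str]:
        return {e.text.strip().lower() for e in options.findall(tag) if e.text}

    return names("exclusive_app"), names("exclusive_gpu_app")


def what_to_suspend(running: Iterable[str], cc_config_xml: str) -> Tuple[bool, bool]:
    """Return (suspend_all_computing, suspend_gpu_work) for the running programs."""
    all_apps, gpu_apps = exclusive_lists(cc_config_xml)
    procs = {p.lower() for p in running}
    suspend_all = bool(procs & all_apps)
    # Suspending everything implies the GPU work is suspended too.
    return suspend_all, suspend_all or bool(procs & gpu_apps)


if __name__ == "__main__":
    print(what_to_suspend(["explorer.exe", "Game2.exe"], EXAMPLE_CC_CONFIG))  # (False, True)
```

With a game listed under <exclusive_gpu_app>, CPU tasks keep crunching while the game has the graphics card to itself.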
rvp_lan
Joined: 30 Dec 08
Posts: 24
France
Message 32146 - Posted: 12 Apr 2010, 22:34:36 UTC - in response to Message 32108.  
Last modified: 12 Apr 2010, 22:36:07 UTC

Hi Ageless, as always: valuable and pertinent arguments delivered.

Please think outside your own box. It's built in for ... people who complained that, despite BOINC's applications running at the lowest possible priority, it was taking up CPU cycles that slowed down their computer.
Indeed, this is the objection some people raised when I tried (years ago) to deploy CNET across our whole computer fleet! And now, outside of my box, I get the point.

Now that I see the potential of this parameter, I still think it needs refinement. I insist: when just ONE core is heavily used, the others aren't necessarily busy too.

BOINC could stop cores progressively by monitoring whether heavy CPU usage persists over a given time. When the CPU stays at 25% for more than 2 minutes on a multi-core machine, it could be 'nice' (ha) to stop BOINC on one core, but not on ALL cores at the same time. Otherwise we lose valuable cycles available on the other cores. This is particularly true under Unix kernels, where the load is finely distributed across cores.
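To make the proposal concrete, a hypothetical Python sketch of the per-core back-off described above; the 6.10 client does not work this way. It assumes a non-BOINC load reading every 10 seconds (as a percentage of the whole machine), the 2-minute window from the post, and that BOINC gives up one core for each full core's worth of load that persisted through the whole window.

```python
import math
from typing import List


def boinc_cores_allowed(samples: List[float], n_cores: int,
                        sample_interval_s: float = 10.0,
                        window_s: float = 120.0) -> int:
    """How many cores BOINC keeps under this hypothetical per-core policy.

    samples: non-BOINC CPU load readings, oldest first, each a percentage
    of the whole machine (100% = every core busy).
    """
    needed = max(1, math.ceil(window_s / sample_interval_s))
    if len(samples) < needed:
        return n_cores  # load not yet observed for the whole window
    # Only load present throughout the window counts as "sustained".
    sustained_pct = min(samples[-needed:])
    # Whole cores' worth of sustained load; the epsilon guards against
    # float rounding such as 2.9999999 when the load is exactly 3 cores.
    busy_cores = math.floor(sustained_pct / 100.0 * n_cores + 1e-9)
    return max(0, n_cores - busy_cores)


if __name__ == "__main__":
    # A browser keeps one of four cores busy (25%) for two minutes.
    print(boinc_cores_allowed([25.0] * 12, n_cores=4))  # 3
```

Using the minimum over the window means a brief spike never costs BOINC a core; only load that stays for the full period does.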

You can use BOINC's <exclusive_app> and <exclusive_gpu_app> options to suspend BOINC when it detects any of the games running in Windows.
More interesting parameters that I didn't know about. But my aim has always been to remain a "simple" BOINC user! By not becoming a tuning "guru", I try to keep a "simple" view of what BOINC should do: perform quietly without disturbing the user.

Regards
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 32148 - Posted: 12 Apr 2010, 22:56:55 UTC - in response to Message 32146.  
Last modified: 19 Apr 2010, 10:50:02 UTC

I insist: when just ONE core is heavily used, the others aren't necessarily busy too.


At the moment BOINC doesn't know which processor/core is doing what. Say you have a 4-core CPU running CPDN, Einstein, SETI and Leiden at the same time, and a non-BOINC program takes up (part of) one of the cores. You want only that core to suspend its work while work continues on the other cores? What if the non-BOINC program takes up more than one core? (Windows Update will do that!)

Again, that's not what this preference is for. It's really not for people who have been using the program for a while already; it's for those who are new, who find that their system is slow when they run BOINC alongside other (heavy) programs, and who want to know about the <exclusive_app> tags even less than you do.

Besides, those who have been running the program for some time would then complain that the one core doing their very important work was suspended. It's always very important work, isn't it? ;-)

You know how to disable it. And seeing how something like a scanning AV program will break this function, it'll go through some changes in the future. :-)
avidday
Joined: 19 Apr 10
Posts: 4
Message 32271 - Posted: 19 Apr 2010, 9:33:37 UTC

I am trying to understand the scheduling behaviour of the Linux 6.10.44 release with GPUs, because I am seeing some strange things I can't explain. From time to time my client sits with a full task queue, not running any task (this is with the Milkyway@home CUDA application), for up to 45 minutes at a time. In other cases a task sits in the task queue for days and never starts: when the project went down for maintenance for 24 hours and every other task was completed, a five-day-old task was still stuck in the queue and had never started.

My machine looks like this (Ubuntu 9.04 with the 195.36.15 release drivers):

Fri 16 Apr 2010 09:01:44 PM EEST		Starting BOINC client version 6.10.44 for x86_64-pc-linux-gnu
Fri 16 Apr 2010 09:01:44 PM EEST		log flags: file_xfer, sched_ops, task
Fri 16 Apr 2010 09:01:44 PM EEST		Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3.3 c-ares/1.5.1
Fri 16 Apr 2010 09:01:44 PM EEST		Data directory: /home/david/BOINC
Fri 16 Apr 2010 09:01:44 PM EEST		Processor: 4 AuthenticAMD AMD Phenom(tm) II X4 945 Processor [Family 16 Model 4 Stepping 2]
Fri 16 Apr 2010 09:01:44 PM EEST		Processor: 512.00 KB cache
Fri 16 Apr 2010 09:01:44 PM EEST		Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni monitor cx16 lahf_lm cmp_legacy svm extapic cr8_
Fri 16 Apr 2010 09:01:44 PM EEST		OS: Linux: 2.6.28-18-generic
Fri 16 Apr 2010 09:01:44 PM EEST		Memory: 7.70 GB physical, 7.45 GB virtual
Fri 16 Apr 2010 09:01:44 PM EEST		Disk: 891.22 GB total, 757.31 GB free
Fri 16 Apr 2010 09:01:44 PM EEST		Local time is UTC +3 hours
Fri 16 Apr 2010 09:01:44 PM EEST		NVIDIA GPU 0: GeForce GTX 275 (driver version unknown, CUDA version 3000, compute capability 1.3, 895MB, 701 GFLOPS peak)
Fri 16 Apr 2010 09:01:44 PM EEST		NVIDIA GPU 1: GeForce GTX 275 (driver version unknown, CUDA version 3000, compute capability 1.3, 896MB, 701 GFLOPS peak)
Fri 16 Apr 2010 09:01:44 PM EEST	Milkyway@home	URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 167065; resource share 100
Fri 16 Apr 2010 09:01:44 PM EEST	Milkyway@home	General prefs: from Milkyway@home (last modified 08-Apr-2010 17:50:24)
Fri 16 Apr 2010 09:01:44 PM EEST	Milkyway@home	Host location: none
Fri 16 Apr 2010 09:01:44 PM EEST	Milkyway@home	General prefs: using your defaults
Fri 16 Apr 2010 09:01:44 PM EEST		Reading preferences override file
Fri 16 Apr 2010 09:01:44 PM EEST		Preferences:
Fri 16 Apr 2010 09:01:44 PM EEST		   max memory usage when active: 3943.92MB
Fri 16 Apr 2010 09:01:44 PM EEST		   max memory usage when idle: 7099.06MB
Fri 16 Apr 2010 09:01:44 PM EEST		   max disk usage: 10.00GB
Fri 16 Apr 2010 09:01:44 PM EEST		   max CPUs used: 1
Fri 16 Apr 2010 09:01:44 PM EEST		   (to change, visit the web site of an attached project,
Fri 16 Apr 2010 09:01:44 PM EEST		   or click on Preferences)
Fri 16 Apr 2010 09:01:44 PM EEST		Not using a proxy


with one GPU set to compute-prohibited mode and the other to compute-exclusive mode. I have "Compute while computer is in use" and "Use GPU while computer is in use" selected in the manager, and most of the time it works fine. A typical example of the problem looks something like this:

Mon 19 Apr 2010 11:45:37 AM EEST	Milkyway@home	Computation for task de_new_test2_29033_1271663028_0 finished
Mon 19 Apr 2010 11:45:39 AM EEST	Milkyway@home	Started upload of de_new_test2_29033_1271663028_0_0
Mon 19 Apr 2010 11:45:42 AM EEST	Milkyway@home	Finished upload of de_new_test2_29033_1271663028_0_0
Mon 19 Apr 2010 11:46:17 AM EEST	Milkyway@home	Sending scheduler request: To fetch work.
Mon 19 Apr 2010 11:46:17 AM EEST	Milkyway@home	Reporting 1 completed tasks, requesting new tasks for GPU
Mon 19 Apr 2010 11:46:22 AM EEST	Milkyway@home	Scheduler request completed: got 1 new tasks
Mon 19 Apr 2010 11:46:24 AM EEST	Milkyway@home	Started download of de_new_test2_46499_1271666646_search_parameters
Mon 19 Apr 2010 11:46:27 AM EEST	Milkyway@home	Finished download of de_new_test2_46499_1271666646_search_parameters
Mon 19 Apr 2010 11:47:28 AM EEST	Milkyway@home	Sending scheduler request: To fetch work.
Mon 19 Apr 2010 11:47:28 AM EEST	Milkyway@home	Requesting new tasks for GPU
Mon 19 Apr 2010 11:47:33 AM EEST	Milkyway@home	Scheduler request completed: got 0 new tasks
Mon 19 Apr 2010 11:47:33 AM EEST	Milkyway@home	Message from server: No work sent
Mon 19 Apr 2010 11:47:33 AM EEST	Milkyway@home	Message from server: (reached limit of 6 tasks in progress)
Mon 19 Apr 2010 11:48:38 AM EEST	Milkyway@home	Sending scheduler request: To fetch work.
Mon 19 Apr 2010 11:49:48 AM EEST	Milkyway@home	Sending scheduler request: To fetch work.
Mon 19 Apr 2010 11:49:48 AM EEST	Milkyway@home	Requesting new tasks for GPU
Mon 19 Apr 2010 11:49:53 AM EEST	Milkyway@home	Scheduler request completed: got 0 new tasks
Mon 19 Apr 2010 11:49:53 AM EEST	Milkyway@home	Message from server: No work sent
Mon 19 Apr 2010 11:49:53 AM EEST	Milkyway@home	Message from server: (reached limit of 6 tasks in progress)
Mon 19 Apr 2010 11:50:58 AM EEST	Milkyway@home	Sending scheduler request: To fetch work.
Mon 19 Apr 2010 11:50:58 AM EEST	Milkyway@home	Requesting new tasks for GPU
Mon 19 Apr 2010 11:51:03 AM EEST	Milkyway@home	Scheduler request completed: got 0 new tasks
Mon 19 Apr 2010 11:51:03 AM EEST	Milkyway@home	Message from server: No work sent
Mon 19 Apr 2010 11:51:03 AM EEST	Milkyway@home	Message from server: (reached limit of 6 tasks in progress)
Mon 19 Apr 2010 11:52:08 AM EEST	Milkyway@home	Sending scheduler request: To fetch work.
Mon 19 Apr 2010 11:52:08 AM EEST	Milkyway@home	Requesting new tasks for GPU
Mon 19 Apr 2010 11:52:13 AM EEST	Milkyway@home	Scheduler request completed: got 0 new tasks
Mon 19 Apr 2010 11:52:13 AM EEST	Milkyway@home	Message from server: No work sent
Mon 19 Apr 2010 11:52:13 AM EEST	Milkyway@home	Message from server: (reached limit of 6 tasks in progress)
Mon 19 Apr 2010 11:52:22 AM EEST	Milkyway@home	Starting de_new_test2_18284_1271660211_2
Mon 19 Apr 2010 11:52:22 AM EEST	Milkyway@home	Starting task de_new_test2_18284_1271660211_2 using milkyway version 24


Here a task finishes and is reported, with 5 other tasks in the queue. A new task is requested and downloaded from the scheduler, so the task queue is full at 6 tasks, and then nothing happens. The machine sits idle for several minutes, periodically polling for new work (and getting nothing because its task queue is full), but no task starts. Then, finally, one does.

This is a small example, but in a couple of cases I have seen these "fallow" periods last 45 minutes. The question is: why?
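To measure these "fallow" periods over a whole log instead of by eye, a small Python sketch that reads message lines in the format shown above and reports the time from each line saying computation for a task finished to the next "Starting task" line. It assumes one task runs at a time (as on this host, where only one GPU is usable) and English day and month names; the function name is hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Timestamp at the start of a message line, e.g. "Mon 19 Apr 2010 11:45:37 AM EEST";
# the time-zone name is matched but ignored.
STAMP = re.compile(r"^(\w{3} \d{1,2} \w{3} \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) \S+")


def idle_gaps(lines: Iterable[str]) -> List[Tuple[datetime, datetime, float]]:
    """Return (finished_at, next_start_at, seconds) for each finish -> next start."""
    gaps = []
    finished_at = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%a %d %b %Y %I:%M:%S %p")
        if "Computation for task" in line and line.rstrip().endswith("finished"):
            finished_at = when
        elif "Starting task" in line and finished_at is not None:
            gaps.append((finished_at, when, (when - finished_at).total_seconds()))
            finished_at = None
    return gaps


if __name__ == "__main__":
    excerpt = [
        "Mon 19 Apr 2010 11:45:37 AM EEST\tMilkyway@home\tComputation for task de_new_test2_29033_1271663028_0 finished",
        "Mon 19 Apr 2010 11:52:22 AM EEST\tMilkyway@home\tStarting task de_new_test2_18284_1271660211_2 using milkyway version 24",
    ]
    for done, start, secs in idle_gaps(excerpt):
        print(f"{done:%H:%M:%S} -> {start:%H:%M:%S}: {secs:.0f} s idle")
```

For the excerpt above this reports one gap of 405 seconds. Lines in stdoutdae.txt use a different timestamp format (such as 19-Apr-2010 14:40:36), which would need a different pattern.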

Does your project have a source repository somewhere I could browse? I have a suspicion about what might be happening [it might be that the client is mishandling or misinterpreting the Linux driver compute settings], but looking at your CUDA interface code would certainly be helpful.

Thanks in advance.
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 32272 - Posted: 19 Apr 2010, 9:53:49 UTC - in response to Message 32271.  
Last modified: 19 Apr 2010, 10:50:32 UTC

Does your project have a source repository somewhere I could browse? I have a suspicion about what might be happening [it might be that the client is mishandling or misinterpreting the Linux driver compute settings], but looking at your CUDA interface code would certainly be helpful.

Thanks in advance.

BOINC isn't a project, and why the Milkyway scheduler does or doesn't give you work is something you have to take up with them. It's their server that says no work is sent, and it gives the reason (their limit of 6 tasks in progress).

But if you want to look at the BOINC source code, that's possible. Check http://boinc.berkeley.edu/trac/browser/branches/boinc_core_release_6_10 for the 6.10 code.
avidday
Joined: 19 Apr 10
Posts: 4
Message 32274 - Posted: 19 Apr 2010, 10:30:06 UTC - in response to Message 32272.  
Last modified: 19 Apr 2010, 10:30:43 UTC



BOINC isn't a project, and why the Milkyway scheduler does or doesn't give you work is something you have to take up with them. It's their server that says no work is sent, and it gives the reason (their limit of 6 tasks in progress).


I understand that, but my question is why, when the client has work, it doesn't run it? The task start/stop/report logic is in the client, not the project server, isn't it?

I am working on the assumption that as long as the client's own internal settings permit it, it will just start and run tasks until the task queue is empty. I am seeing long pauses between the client starting tasks which I am assuming should not occur.

But if you want to look at the BOINC source code, that's possible. Check http://boinc.berkeley.edu/trac/browser/branches/boinc_core_release_6_10 for the 6.10 code.


Thank you for the link
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 32277 - Posted: 19 Apr 2010, 10:49:51 UTC - in response to Message 32274.  
Last modified: 19 Apr 2010, 12:52:32 UTC

..but my question is why, when the client has work, it doesn't run it?

Set up a cc_config.xml file and add these flags to it:

<cc_config>
  <log_flags>
    <cpu_sched>1</cpu_sched>
    <cpu_sched_debug>1</cpu_sched_debug>
    <coproc_debug>1</coproc_debug>
    <rr_simulation>1</rr_simulation>
    <task_debug>1</task_debug>
  </log_flags>
  <options>
    <max_stdout_file_size>18874368</max_stdout_file_size>
  </options>
</cc_config>

Using these flags will fill up your stdoutdae.txt log quite quickly, so it may be prudent to increase the size it may grow to. I have therefore set the limit for your stdoutdae.txt file to 18MB. If you want to change it, the value must be in bytes, where 1MB = 1024 * 1024 bytes.
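Two quick checks before restarting the client, as a small Python sketch with hypothetical helper names: converting the intended log size from MB to the byte count that <max_stdout_file_size> expects, and confirming the file is well-formed XML, since an unclosed tag is an easy slip to make. For example, 18MB works out to 18874368 bytes.

```python
import xml.etree.ElementTree as ET

MB = 1024 * 1024  # 1MB = 1024 * 1024 bytes


def stdout_limit_bytes(megabytes: int) -> int:
    """Convert a size in MB to the byte value used in <max_stdout_file_size>."""
    return megabytes * MB


def is_well_formed(xml_text: str) -> bool:
    """True if xml_text parses as XML (every <tag> has its matching </tag>)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(stdout_limit_bytes(18))  # 18874368
    print(is_well_formed("<cc_config><options/></cc_config>"))  # True
    # Closing tag typed as a second opening tag:
    print(is_well_formed("<cc_config><options/><cc_config>"))   # False
```

To check a real file, pass it the text of cc_config.xml, e.g. read with `open(path).read()`.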

Forgot something... :-)
Since that log will be quite extensive, please do not post it in the forums, or at least not in this thread. Please email it to me instead; I'll send you my email address in a private message.
avidday
Joined: 19 Apr 10
Posts: 4
Message 32282 - Posted: 19 Apr 2010, 12:04:55 UTC - in response to Message 32277.  
Last modified: 19 Apr 2010, 12:12:46 UTC


Forgot something... :-)
Since that log will be quite extensive, please do not post it in the forums, or at least not in this thread. Please email it to me instead; I'll send you my email address in a private message.


Indeed it is very verbose (your XML was a bit broken btw, but the schema is pretty straightforward). I have a little snippet that explains both problems I see.

A task finishes and the machine is idle. The scheduler runs:

19-Apr-2010 14:40:36 [---] [rr_sim] rr_sim start: work_buf_total 30240.00 on_frac 0.961 active_frac 0.993
19-Apr-2010 14:41:26 [---] [cpu_sched_debug] enforce_schedule(): start
19-Apr-2010 14:41:26 [---] [cpu_sched_debug] preliminary job list:
19-Apr-2010 14:41:26 [---] [cpu_sched_debug] final job list:
19-Apr-2010 14:41:26 [---] [cpu_sched_debug] using 0.00 out of 1 CPUs
19-Apr-2010 14:41:26 [---] [cpu_sched_debug] enforce_schedule: end
19-Apr-2010 14:41:26 [---] [rr_sim] rr_sim start: work_buf_total 30240.00 on_frac 0.961 active_frac 0.993
19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 0.00: starting de_new_test2_81566_1271673925_0 (0.05 CPU + 1.00 NV)
19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 0.00: starting de_new_test2_44636_1271666330_2 (0.05 CPU + 1.00 NV)
19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 0.00: de_new_test2_81566_1271673925_0 finishes after 720.80 (97353.46G/135.06G)
19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 720.80: starting de_new_test2_76466_1271672823_1 (0.05 CPU + 1.00 NV)
19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 720.80: de_new_test2_44636_1271666330_2 finishes after 0.00 (0.00G/135.06G)
19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 720.80: starting de_new_test2_80758_1271673719_1 (0.05 CPU + 1.00 NV)
19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 720.80: de_new_test2_76466_1271672823_1 finishes after 720.80 (97353.46G/135.06G)
19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 1441.61: starting de_new_test2_59278_1271669297_1 (0.05 CPU + 1.00 NV)
19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 1441.61: de_new_test2_80758_1271673719_1 finishes after 0.00 (0.00G/135.06G)
19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 1441.61: de_new_test2_59278_1271669297_1 finishes after 720.80 (97353.46G/135.06G)


and nothing happens. It reruns the same way at 30-second intervals for 6 minutes, with the machine idle, and then:

19-Apr-2010 14:47:30 [---] [cpu_sched_debug] Request CPU reschedule: Idle state change
19-Apr-2010 14:47:30 [---] [cpu_sched_debug] schedule_cpus(): start
19-Apr-2010 14:47:30 [---] [rr_sim] rr_sim start: work_buf_total 30240.00 on_frac 0.961 active_frac 0.993
19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 0.00: starting de_new_test2_81566_1271673925_0 (0.05 CPU + 1.00 NV)
19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 0.00: starting de_new_test2_44636_1271666330_2 (0.05 CPU + 1.00 NV)
19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 0.00: de_new_test2_81566_1271673925_0 finishes after 720.79 (97353.46G/135.06G)
19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 720.79: starting de_new_test2_76466_1271672823_1 (0.05 CPU + 1.00 NV)
19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 720.79: de_new_test2_44636_1271666330_2 finishes after 0.00 (0.00G/135.06G)
19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 720.79: starting de_new_test2_80758_1271673719_1 (0.05 CPU + 1.00 NV)
19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 720.79: de_new_test2_76466_1271672823_1 finishes after 720.79 (97353.46G/135.06G)
19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 1441.58: starting de_new_test2_59278_1271669297_1 (0.05 CPU + 1.00 NV)
19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 1441.58: de_new_test2_80758_1271673719_1 finishes after 0.00 (0.00G/135.06G)
19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 1441.58: starting de_new_test2_73287_1271672226_2 (0.05 CPU + 1.00 NV)
19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 1441.58: de_new_test2_59278_1271669297_1 finishes after 720.79 (97353.46G/135.06G)
19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 2162.37: de_new_test2_73287_1271672226_2 finishes after 0.00 (0.00G/135.06G)
19-Apr-2010 14:47:30 [Milkyway@home] [cpu_sched_debug] scheduling de_new_test2_81566_1271673925_0 (coprocessor job, FIFO)
19-Apr-2010 14:47:30 [---] [cpu_sched_debug] reserving 1.000000 of coproc CUDA
19-Apr-2010 14:47:30 [Milkyway@home] [cpu_sched_debug] scheduling de_new_test2_44636_1271666330_2 (coprocessor job, FIFO)
19-Apr-2010 14:47:30 [---] [cpu_sched_debug] reserving 1.000000 of coproc CUDA
19-Apr-2010 14:47:30 [---] [cpu_sched_debug] Request enforce CPU schedule: schedule_cpus
19-Apr-2010 14:47:30 [---] [cpu_sched_debug] enforce_schedule(): start
19-Apr-2010 14:47:30 [---] [cpu_sched_debug] preliminary job list:
19-Apr-2010 14:47:30 [Milkyway@home] [cpu_sched_debug] 0: de_new_test2_81566_1271673925_0 (MD: no; UTS: no)
19-Apr-2010 14:47:30 [Milkyway@home] [cpu_sched_debug] 1: de_new_test2_44636_1271666330_2 (MD: no; UTS: no)
19-Apr-2010 14:47:30 [---] [cpu_sched_debug] final job list:
19-Apr-2010 14:47:30 [Milkyway@home] [cpu_sched_debug] 0: de_new_test2_81566_1271673925_0 (MD: no; UTS: no)
19-Apr-2010 14:47:30 [Milkyway@home] [cpu_sched_debug] 1: de_new_test2_44636_1271666330_2 (MD: no; UTS: no)
19-Apr-2010 14:47:30 [Milkyway@home] [coproc_debug] Assigning CUDA instance 0 to de_new_test2_81566_1271673925_0
19-Apr-2010 14:47:30 [Milkyway@home] [coproc_debug] Assigning CUDA instance 1 to de_new_test2_44636_1271666330_2
19-Apr-2010 14:47:30 [Milkyway@home] Can't get available GPU RAM: 999
19-Apr-2010 14:47:30 [---] [cpu_sched_debug] Request CPU reschedule: insufficient GPU RAM


So it is something in the machine-idle logic that is stopping the jobs from being launched, and after that, as I suspected, the compute mode settings are the problem (I do a lot of CUDA development, and I can help you fix that if you want). The first GPU is marked compute-prohibited by the driver, but the BOINC scheduler tries to use it anyway. The job it tries to schedule on the compute-prohibited device then gets stuck in the job queue, even though it was never started.

We can continue this by email/PM if you like.
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 32283 - Posted: 19 Apr 2010, 12:58:51 UTC - in response to Message 32282.  

Indeed it is very verbose (your XML was a bit broken btw, but the schema is pretty straightforward).

I noticed. Fixed that for future use. I shouldn't be doing 3 things at the same time. :-)

19-Apr-2010 14:47:30 [Milkyway@home] Can't get available GPU RAM: 999
19-Apr-2010 14:47:30 [---] [cpu_sched_debug] Request CPU reschedule: insufficient GPU RAM


There is a fix for this already, but it won't arrive until the next BOINC version. It also consists of checking available memory on the GPUs before they get work, rather than, as now, giving them work and then checking memory.
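To illustrate the ordering described here (check first, then assign), a hypothetical Python sketch, not the actual BOINC code: a job is only assigned to a GPU whose free memory could be read and is large enough right now, and, following avidday's point about driver compute modes, devices marked compute-prohibited are skipped. It assumes each job takes a whole GPU, as the "1.00 NV" in the log suggests; the class names and memory figures are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Gpu:
    index: int
    free_mb: Optional[float]   # None if the free-memory query failed
    prohibited: bool = False   # e.g. driver compute mode set to "prohibited"


@dataclass
class Job:
    name: str
    needs_mb: float


def assign_jobs(jobs: List[Job], gpus: List[Gpu]) -> Dict[str, int]:
    """Map job name -> GPU index, checking each GPU before giving it work."""
    # Only devices we may use and whose free memory is actually known.
    usable = [g for g in gpus if not g.prohibited and g.free_mb is not None]
    assignment: Dict[str, int] = {}
    for job in jobs:
        for gpu in usable:
            if gpu.free_mb >= job.needs_mb:
                assignment[job.name] = gpu.index
                usable.remove(gpu)  # one job per GPU; safe because we break next
                break
    return assignment


if __name__ == "__main__":
    gpus = [Gpu(0, free_mb=None, prohibited=True), Gpu(1, free_mb=800.0)]
    jobs = [Job("task_a", 300.0), Job("task_b", 300.0)]
    print(assign_jobs(jobs, gpus))  # {'task_a': 1}; task_b stays queued
```

Jobs that are not assigned simply stay queued until a later pass finds a suitable device, instead of being tied to a GPU they cannot start on.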

Send me the full log anyway and I'll forward it to the developers, just in case there's something else going on as well. Do know, I am not a developer, just a volunteer like you. I just have close contacts with the developers. :-)
avidday
Joined: 19 Apr 10
Posts: 4
Message 32285 - Posted: 19 Apr 2010, 13:13:42 UTC - in response to Message 32283.  

There is a fix for this already, but it won't arrive until the next BOINC version. It also consists of checking available memory on the GPUs before they get work, rather than, as now, giving them work and then checking memory.


That will fix the symptom, so that jobs won't wrongly be put into an infinite "check every 5 minutes for enough free memory" loop, but not the root cause of the problem, which is actually the act of checking the free memory itself.

I will email you the log and some other information that the developers should probably look at.

Thanks for your help.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.