Posts by cjreyn

1) Message boards : BOINC client : GPU suspension (Message 36023)
Posted 10 Dec 2010 by cjreyn
Post:
Hi all,
Just wondering how cuda or opencl kernels are suspended when running through Boinc?

Chris
2) Message boards : Server programs : GPU scheduling (Message 33212)
Posted 2 Jun 2010 by cjreyn
Post:
Ok, one final question... A job for a GPU implies GPU execution only, but in practice this may not always be the case.

For example, a more complex application may mostly use the CPU if most of the program does not adhere to the SIMD model, whilst employing the GPU to speed up loops/code fragments that do adhere to the SIMD model. In effect, some ratio of the GPU/CPU can be utilized throughout the program's execution.

How does Boinc deal with this? Presumably there's some accurate measure of the CPU and GPU utilization for credit reporting?
3) Message boards : Server programs : GPU scheduling (Message 33177)
Posted 1 Jun 2010 by cjreyn
Post:
Ok, so presumably the feeder then just serves up the binary and libs matching the platform/OS etc combination as specified by the client's request.

Thank you so much for the clarification, extremely helpful!

Cheers

Chris
4) Message boards : Server programs : GPU scheduling (Message 33072)
Posted 27 May 2010 by cjreyn
Post:
The job is associated with an application though, which has strict requirements upon the client's capabilities as defined by the application plan. Reading the application planning page, I can see that an application, and all subsequent jobs associated with it will be tied to its definition: NAME_VERSION_PLATFORM[__PLAN-CLASS]. So how can an application, and a job created for this application, have:

more than one compatibility label on them

Surely it either requires a GPU or it doesn't?
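
To make the naming convention concrete, here is a rough sketch of how a label of the form NAME_VERSION_PLATFORM[__PLAN-CLASS] could be split into its parts. This is only an illustration, not BOINC's own parsing code: the function name, the regular expression and the example labels are hypothetical, and it assumes the version is dotted digits.

```python
import re

# Hypothetical pattern for NAME_VERSION_PLATFORM[__PLAN-CLASS].
# Assumption: the version is dotted digits (e.g. 1.0) and the plan class,
# when present, follows a double underscore.
APP_VERSION_RE = re.compile(
    r"(?P<name>.+?)_(?P<version>\d+(?:\.\d+)*)_(?P<platform>.+?)"
    r"(?:__(?P<plan_class>.+))?"
)

def parse_app_version(label):
    """Split an app version label into name, version, platform and plan class."""
    match = APP_VERSION_RE.fullmatch(label)
    if match is None:
        raise ValueError(f"not a NAME_VERSION_PLATFORM[__PLAN-CLASS] label: {label!r}")
    return match.groupdict()

# Made-up labels: one with a plan class, one without.
print(parse_app_version("example_1.0_windows_intelx86__cuda"))
print(parse_app_version("example_1.0_windows_intelx86"))
```

Seen this way, each label carries at most one plan class, which is why a single application needing "more than one compatibility label" looks odd.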
5) Message boards : Server programs : GPU scheduling (Message 33070)
Posted 27 May 2010 by cjreyn
Post:
Ok, this is much more clear for me now... One final question!

So a job from an application can be both CPU and GPU compatible; I'm assuming that the application developer has some CPU based code fragments to fall back on should the Cuda cuCtxGetDevice() return no device and hence the Cuda code cannot be executed?

Would it not be more efficient for a machine requesting work for its CPU to instead be served an explicitly CPU only job first, perhaps leaving jobs that can be executed on the GPU OR CPU to another client requesting a GPU job?
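
As a sketch of the policy I have in mind (purely hypothetical, not how the BOINC scheduler is actually written), the request could be matched against the most specific job available, so that jobs able to run on either resource are held back for whoever needs the flexibility. The Job class and pick_job function are invented names.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    runs_on: frozenset  # hypothetical: subset of {"cpu", "gpu"}

def pick_job(queue, resource):
    """Pick the most specific queued job that can run on the requested resource."""
    candidates = [job for job in queue if resource in job.runs_on]
    if not candidates:
        return None
    # Fewer supported resources means more specific; min() returns the
    # earliest such job, so queue order breaks ties.
    return min(candidates, key=lambda job: len(job.runs_on))

queue = [
    Job("either", frozenset({"cpu", "gpu"})),
    Job("cpu_only", frozenset({"cpu"})),
]
print(pick_job(queue, "cpu").name)  # the CPU-only job is preferred
print(pick_job(queue, "gpu").name)  # only the flexible job fits
```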

Thanks for the prompt replies!

Chris
6) Message boards : Server programs : GPU scheduling (Message 33066)
Posted 27 May 2010 by cjreyn
Post:
I was under the impression that the Feeder pre-fetches a small number of database jobs and loads them into shared-memory (the "ready to send queue"), rather than in response to client initiated requests (can you confirm this)? Then the scheduler matches client requests to the "ready to send queue" listed in the shared-memory segment. In this case, this queue can contain jobs from several different applications, some requiring Cuda/GPU capable clients, and others not.

Here's the bit I don't understand but really need to (I've read a few Boinc publications and it's not clear in any)....

Consider then a client requesting work. Will (as you imply in the previous post) the client request contain a request for work explicitly for a given application? If this is true, I'm guessing the scheduler would match the request to an entry in the "ready to send queue", and hence if the client is Cuda/GPU capable, and a job from a Cuda based application exists, it will be served to the client?
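
To pin down what I am asking, this is a toy model of my current understanding, with made-up names and data; it is not taken from the BOINC source. The feeder tops up a fixed number of slots regardless of any request, and the scheduler hands out the first slot whose requirements the requesting client meets.

```python
from collections import deque

def feeder_fill(db_jobs, slots, capacity):
    """Top up the ready-to-send queue from the DB, independent of client requests."""
    while len(slots) < capacity and db_jobs:
        slots.append(db_jobs.popleft())

def scheduler_dispatch(slots, client_caps):
    """Return (and remove) the first queued job the client is capable of running."""
    for index, job in enumerate(slots):
        if job["requires"] <= client_caps:  # every requirement is satisfied
            return slots.pop(index)
    return None

# Hypothetical jobs waiting in the database.
db_jobs = deque([
    {"app": "sim_cuda", "requires": {"cuda"}},
    {"app": "sim_cpu", "requires": set()},
    {"app": "sim_cuda", "requires": {"cuda"}},
])
slots = []
feeder_fill(db_jobs, slots, capacity=2)

print(scheduler_dispatch(slots, client_caps=set())["app"])     # CPU-only client
feeder_fill(db_jobs, slots, capacity=2)
print(scheduler_dispatch(slots, client_caps={"cuda"})["app"])  # Cuda capable client
```

Note that under this first-fit model a Cuda capable client could just as easily be handed a CPU-only job if one happens to sit earlier in the queue, which is exactly the behaviour I would like clarified.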

Cheers

Chris
7) Message boards : Server programs : GPU scheduling (Message 33064)
Posted 27 May 2010 by cjreyn
Post:
Ok, so the feeder pulls jobs from the DB in some pre-defined order, and places them in the "ready to send queue" for the scheduler to dispatch. So let me re-phrase the question...

How are jobs dispatched by the scheduler from the "ready to send queue"? If for example, a GPU capable node sends a request for work, will the scheduler always assign a Cuda job from the "ready to send queue" or is it possible it will assign a job from an application that does not use Cuda, i.e. a CPU only job?

Cheers

Chris
8) Message boards : Server programs : GPU scheduling (Message 33047)
Posted 26 May 2010 by cjreyn
Post:
Hi all,
I have a quick question about the scheduling functionality for jobs that use the GPU.

Consider I have two science applications, one which utilizes the GPU say using CUDA, the other that is CPU only. When jobs are scheduled for both applications, and a CUDA capable client connects, will it always be served a CUDA job from the queue? It seems nonsensical to have clients that have GPUs pulling CPU-only jobs, i.e. jobs that could instead go to other non GPU capable nodes.

Cheers

Chris
9) Message boards : API : Multi-threaded app as single-threaded? (Message 32462)
Posted 29 Apr 2010 by cjreyn
Post:
Ok David, here goes!....

We have a 1600 Node Windows based DG here at Univ of Westminster (UOW), which runs applications for EDGeS at home. Some of the applications (not all, hence the per project comment of my earlier post) are Linux specific, and hence we're looking at a solution using virtual machines.

Much literature exists on the suitability of different VM technologies for solving this problem. I have experimented with VMWare, VirtualBox and QEMU/KQEMU, and have come to the conclusion that VirtualBox is a preferable solution for our DG, and probably for further Boinc integration for several reasons.

Firstly, VMWare cannot be run headless; we have a configuration here where WU's are suspended when a student interacts with a DG node, and they should not see a virtual machine (VM) display of any kind. The KQEMU driver for Windows is now obsolete, and hence for any kind of performance on an x86 architecture QEMU is not suitable. Which leaves VirtualBox....

VirtualBox has some nice features, namely a VM control interface (VBoxManage.exe which is a wrapper around a COM interface), and the capacity to share directories between the Guest and Host machine.

I have constructed a Boinc C based application which uses win32api CreateProcess functions (like the wrapper) to launch, control (suspend/resume) and poll the VM, as well as transferring the job's outputs (through the aforementioned shared directory) when the WU is finished. Now comes the reason for limits on per application simultaneous WUs....

When we have multiple VM's running simultaneously, i.e. 1 per CPU core, things get tricky due to the way VirtualBox organizes the VM configuration in xml files. It has one xml file shared amongst all VMs, and hence running simultaneous WU's would require WU's having transactional access to this file. If I can restrict the client to pulling one WU at a time this tricky constraint no longer applies. The VM can instead be instantiated with as many virtual CPU's as there are physical cores, so the node is still fully utilized.
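
For comparison, this is roughly what serialized access to the shared file would involve if several WUs did run at once: a cross-process lock taken before touching the xml. It is a generic lock-file sketch in Python with hypothetical names and paths, not anything VirtualBox or Boinc provides.

```python
import os
import time
from contextlib import contextmanager

@contextmanager
def exclusive_lock(lock_path, timeout=10.0, poll=0.05):
    """Hold a lock file for the duration of the with-block (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process can create the lock file at a time.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)

# Usage (path is illustrative only):
# with exclusive_lock("vbox_config.lock"):
#     ...read, modify and rewrite the shared xml file...
```

A WU that is killed while holding the lock leaves a stale lock file behind, which is part of why one WU per machine is the more attractive route.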

Thoughts!!?

Chris
10) Message boards : BOINC client : Abort time period (Message 32415)
Posted 27 Apr 2010 by cjreyn
Post:
Mmm, restarting the client is not really an option... It's running as a service deployed using ZenWorks (tricky to script), and there may be other WU's running which will be cancelled if the client exits. Surely the timeout is a const variable in the source? So if I recompile the client I can change this?
11) Message boards : BOINC client : Abort time period (Message 32413)
Posted 27 Apr 2010 by cjreyn
Post:
The problem here is for running WU's. Boinc will send an abort message before it kills a WU, e.g. due to it exceeding resource reservation limits. I have approx 2 secs to ensure that any processes I launch (I'm using some win32 CreateProcess functions) die. If I don't kill the processes in time, Boinc fails to clean the slot dir, since the files it tries to delete are in use by an active process.
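
The shape of the problem, sketched in Python rather than the win32 calls the app actually uses: ask the child to exit, wait out the grace period, and force-kill it if it is still alive. The function name and the 1.5 second figure are illustrative assumptions only.

```python
import subprocess
import sys

def stop_child(proc, grace_seconds):
    """Ask a child process to exit, force-killing it if the grace period expires."""
    proc.terminate()  # polite request on POSIX; already a hard kill on Windows
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()   # grace period used up, stop it unconditionally
        proc.wait()   # reap the process so its file handles are released
    return proc.returncode

# A long-running child standing in for a process launched by the WU.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
stop_child(child, grace_seconds=1.5)  # stay inside the roughly 2 sec window
print(child.poll() is not None)
```

Only once the child has really exited are its open files released, which is what the slot dir cleanup depends on.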

This 2 sec timeout is too tight, and I was wondering how to change it?
12) Message boards : BOINC client : Abort time period (Message 32408)
Posted 26 Apr 2010 by cjreyn
Post:
Hi guys, just a quick one. When Boinc sends an abort request, I have a finite time to handle this in my application. How/where is this value defined, and can I change it on a per application basis?

Cheers

Chris
13) Message boards : API : Multi-threaded app as single-threaded? (Message 32407)
Posted 26 Apr 2010 by cjreyn
Post:
I have indeed read through that section. The problem here is that setting <max_wus_in_progress> in config.xml is project specific. I need to be able to define this on a per application basis.
14) Message boards : API : Multi-threaded app as single-threaded? (Message 32319)
Posted 21 Apr 2010 by cjreyn
Post:
Bump. I need to figure out how to force the client to pick up only 1 WU per machine, not per CPU core. Maybe application planning will be helpful here?
15) Message boards : BOINC client : Spurious Suspends (Message 31540)
Posted 11 Mar 2010 by cjreyn
Post:
Well, I never explicitly allocate heap memory using malloc, but that's not to say the compiler won't allocate it. The app is developed using VC++ 2008 Express so sure I could get it to dump as soon as a suspend flag is found.

However, the CPU throttling is more likely to be the problem, as other processes could well be running on the node (Novell based updates, virus scans etc). This could well be causing the seemingly random suspend (and also resume) messages. This would be strange however, as it's the child processes launched by my app (via win32 CreateProcess() calls) that are consuming cpu. Will the Core Client recognise these as Boinc related, and call suspend/resume to throttle back the CPU?

Also, why is suspend/resume used, and not Windows or Unix thread/process priority nicing instead?
16) Message boards : BOINC client : Spurious Suspends (Message 31515)
Posted 10 Mar 2010 by cjreyn
Post:
Hi guys,
I have an application that is deployed across a large private DG (1600 nodes), running an old 5.10.45 version of the client.

The main worker thread of my application must always run, as it's launching other processes (similarly to the wrapper) and needs to control them accordingly. Hence I've disabled the worker thread suspend function, and instead poll for suspend flags via a call to boinc_get_status() from my worker thread, and handle suspend requests accordingly.
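
The polling logic amounts to something like the following, modelled in Python so it can be run standalone. The Status class is only a stand-in for the flags reported by boinc_get_status() (I believe the real structure carries suspended, quit_request and abort_request among others); the function names and action strings are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Status:
    # Stand-in for the status flags polled from the core client; not the real API.
    suspended: bool = False
    quit_request: bool = False
    abort_request: bool = False

def next_action(status, child_running):
    """Decide what to do with the child processes for one poll of the flags."""
    if status.abort_request or status.quit_request:
        return "kill"
    if status.suspended and child_running:
        return "suspend"
    if not status.suspended and not child_running:
        return "resume"
    return "none"

def run_poll_loop(statuses):
    """Replay a sequence of polled statuses and record the actions taken."""
    child_running = True
    actions = []
    for status in statuses:
        action = next_action(status, child_running)
        actions.append(action)
        if action == "kill":
            break
        if action == "suspend":
            child_running = False
        elif action == "resume":
            child_running = True
    return actions

polled = [Status(), Status(suspended=True), Status(suspended=True),
          Status(), Status(quit_request=True)]
print(run_poll_loop(polled))
```

With this structure, every suspend flag set by the core client turns into a suspend of the children, so any spurious flag is immediately visible as a stalled job.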

The problem is that on most nodes, including my own test node (running the same core client version), I'm seeing random setting of the suspend flags as signalled by the core client. These occur, even without explicit gui based suspend requests.

I heard somewhere that Boinc is using suspend to "nice" applications, and was wondering if this is the case on all client versions?

Cheers

Chris
17) Message boards : BOINC client : Sending Deadline Info to Core Client (Message 26664)
Posted 18 Aug 2009 by cjreyn
Post:
This strikes me as an attempt to achieve some balance in retaining efficiency: either retaining users by allowing them to receive credit for uploading complete but useless results, at a cost of time lost in processing redundant tasks, or cancelling overdue/redundant tasks to reduce said lost time, but with a potentially detrimental effect on the size of the host pool. In which case, 6.10 favours the latter.

Thank you all for the comments, very helpful!
18) Message boards : BOINC client : Sending Deadline Info to Core Client (Message 26657)
Posted 18 Aug 2009 by cjreyn
Post:
The CC will post a message in the logs about tasks which are over the deadline, advising that you may not get credit for it and might want to abort them. However you have to do that manually by design


How does it notify (a pop up box or a log print), and why is a user input required to cancel/abort jobs over the deadline? Surely if the deadline has passed, the job should be automatically cancelled as it cannot be used/processed by the server?

The redundant task abort makes perfect sense, although this would not address tasks currently being processed that are over the deadline.

Is there some scope for developing this feature here?
19) Message boards : BOINC client : Sending Deadline Info to Core Client (Message 26639)
Posted 17 Aug 2009 by cjreyn
Post:
Hi all,
I'm relatively new to PRC, so please forgive any naive observations/assumptions! Also, I hope this is the correct place for the post?

I was wondering how BOINC currently handles resuming a suspended task/wu, either through user input or via detection of keyboard or mouse I/O.

I can envisage a scenario where a wu is sent to a client, and suspended at some point. Does the client, upon resuming processing of this wu, check that the deadline has not passed, either through a scheduler contact (server determines the wu should be cancelled), or analysis by the core client of the suspend time vs start-time & deadline?

Enabling client side deadline awareness would be a great way of ensuring that clients are not wasting time on wu's that are "dead" i.e. having been suspended for a long enough period of time to be considered un-returnable.
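
The check I have in mind is very small. A minimal sketch, assuming the client knows the wu's deadline and has some estimate of the remaining processing time (the function and parameter names are hypothetical, not taken from the code base):

```python
from datetime import datetime, timedelta

def should_resume(now, deadline, est_remaining):
    """Return True if the wu can still plausibly finish before its deadline."""
    return now + est_remaining <= deadline

deadline = datetime(2009, 8, 20, 12, 0)
remaining = timedelta(hours=2)

# Resumed with three hours to go: worth continuing.
print(should_resume(datetime(2009, 8, 20, 9, 0), deadline, remaining))
# Resumed with one hour to go: the wu is effectively "dead".
print(should_resume(datetime(2009, 8, 20, 11, 0), deadline, remaining))
```

The same test could run on the server side at scheduler contact instead, using the server's own view of the deadline.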

Could someone who is familiar with the code base where this may be implemented let me know whether this functionality already exists? If it doesn't, I could potentially work on a branch to include it.

Regards

Chris




Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.