WorkShop13/HackfestNotes – BOINC

Welcome to the BOINC'13 hackfest notepad

Misc

Tools for hackfest communication all year long

Two days of hackfest were probably more productive. How could we keep doing this for the whole year? Matt suggested using Jira, and Tristan gave a demo/presentation of it:

https://www.atlassian.com/software/jira

Jira is maintained by Atlassian, the company behind Bitbucket. Maybe the same kind of thing could be done with GitHub. The community behind the tool is also important to take into account.

Suggestions and requests

Christian: deadline extension for workunits. This requires updating the client.

Next BOINC workshop

Hawai`i? More seriously, Budapest may be an option.

BOINC on Android, making Android apps
Joachim, Matt, Keith, Uwe Expected

Making an Android app is not a group activity, so maybe not for today, although Uwe is interested in learning this. There is a wiki page about this that could be tested.

What we should discuss instead:

Discussing the UI: what could be improved? An outside eye would help.
Helping with the registration: Google account setup or OpenID.

What was done

Uwe tried the tool chain using the instructions on the BOINC wiki page. It mostly worked, and he improved the instructions.
Regarding Google/OpenID registration: they completed a proof of concept that included testing website integration and Android application integration, and reviewed the requirements for making it work with wxWidgets in the BOINC client. I could not follow all the technical work that remains for it to work, but David and Kevin did, so they should remember it.

Multi-user projects
Arnaud, Lionel, Wenjing, Kevin. Points to take into account:

* Batch prioritization

multi-user
multi-app/project
better user contribution in terms of volunteer machines may mean a better machine share (coalitions?)

* Batch completion

task granularity matching (GPU/CPU/Android)
machine availability and speed to accelerate batch completion
homogeneous redundancy
how to estimate the expected runtime of a batch B given the project's entire resources

A basic mechanism could be that batches are prioritized according to some fairness mechanism and, whenever a volunteer requests work, we try to find a job suitable for that volunteer but with a stricter "SLA". We would hand out a job only if we have good confidence that the volunteer will be able to process it in time, i.e. if there is a chance that the job completes before the expected deadline of the batch.

Issue: a wrong estimation of batch completion will lead to starving jobs...

Possible solution: keep a separate queue of high priority jobs with higher replication / which are sent to reliable fast machines

There is a mechanism in the server that "forces" the client to contact the server every so often, to make sure the client reports a job as soon as it's completed instead of waiting for the deadline.

BOINC Brainstorming

This is a follow-up to the discussions with David.

* Wiki design documents

http://boinc.berkeley.edu/trac/wiki/PortalFeatures
http://boinc.berkeley.edu/trac/wiki/MultiUser
http://boinc.berkeley.edu/trac/wiki/MultiUserPriority
http://boinc.berkeley.edu/trac/wiki/JobPrioritization

* Policy in six steps

First step: compute batch priorities

* Goal

This first step serves two goals:

obtain priorities to order batches
give a fair share of resources (without necessarily taking into account the preferences of the volunteers)

* Solution

As a simplification, we don't take volunteers' preferences into account in the sharing of the platform, so we use the whole aggregated power of the platform to compute the estimated runtime of the batches.

Based on user shares and runtime estimates of batches, we compute the Logical Start Time (LST) of users and batches and the Logical End Time (LET) of batches. Then we prioritize batches by increasing Logical Start Time.

This is described here: http://boinc.berkeley.edu/trac/wiki/PortalFeatures#Prioritizingbatches
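As an illustration only, the bookkeeping could look like the sketch below. It is not taken from the BOINC server code: the update rules (a batch starts at the later of "now" and its owner's LST, and ends after its fluid runtime divided by the owner's share), the class names, and the units are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    share: float      # fraction of the platform the user is entitled to (assumed in (0, 1])
    lst: float = 0.0  # logical start time of the user's next batch


@dataclass
class Batch:
    name: str
    owner: User
    cost: float       # estimated computational cost, e.g. in FLOPs
    lst: float = 0.0  # logical start time
    let: float = 0.0  # logical end time


def submit(batch, now, total_power):
    """Assign logical start/end times to a batch at submission (assumed rules)."""
    user = batch.owner
    runtime = batch.cost / total_power            # fluid estimate on the whole platform
    batch.lst = max(now, user.lst)                # an idle user restarts from "now"
    batch.let = batch.lst + runtime / user.share  # a smaller share stretches the batch
    user.lst = batch.let                          # the user's next batch queues behind this one
    return batch


def by_priority(batches):
    """Order batches by increasing Logical Start Time."""
    return sorted(batches, key=lambda b: b.lst)
```

With two users holding equal shares, a second batch from the same user queues behind that user's first batch, while a batch from the other user slots in between.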

This is done at batch submission. Unfortunately, there is a problem with this approach, which we discuss now.

* What if the LST of some users becomes incredibly large?

That happens if one or two users are inactive for a long time. Is it a problem ?

Discussion about how much "burst" we could allow. It would be very nice to be able to say that a user does not gain more "advantage" by staying inactive for more than some time (say 1 week). One way of doing this is to compute the virtual schedule by assigning shares based on active users instead of all users (as done in http://rr.liglab.fr/research_report/RR-LIG-033_orig.pdf), and to constrain Logical Start Times to be at least "now - 1 week".
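A minimal sketch of these two ideas, assuming times are expressed in seconds and shares are plain weights; the function names are hypothetical and nothing here comes from the BOINC code base.

```python
WEEK = 7 * 24 * 3600  # seconds


def clamp_lst(user_lst, now, max_burst=WEEK):
    """Bound the advantage of an inactive user: LST is at least now - max_burst."""
    return max(user_lst, now - max_burst)


def active_shares(shares, active):
    """Renormalize shares over the users that currently have queued work."""
    total = sum(shares[u] for u in active)
    return {u: shares[u] / total for u in active}
```

For example, a user whose LST is ten weeks old is treated as if it were only one week old, and two active users with weights 0.5 and 0.25 receive 2/3 and 1/3 of the platform.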

We need to take into account the cost of recomputing everything each time a user's queue switches between empty and non-empty (optimization concern here).

* What if LET is badly estimated?

Second step: compute batch deadlines

* Goal

The objective of this step is to provide an estimate that makes "resource selection" possible: these "batch deadlines" will be used to prevent slow machines from executing jobs from this batch. Too tight deadlines may cause starvation, so we will add a deadline extension mechanism. But even with this mechanism, too tight deadlines will slow the progress of the batch by excluding machines that are not that slow.

On the other hand, too loose deadlines will slow the progress of the batch by accepting slower machines that could/should have contributed to a lower priority batch. This may seem strange but there could be a very good reason for doing this. Systematically excluding slow workers requires knowing about jobs a long time in advance, which may be problematic. For example, in WCG, they only load batches one day in advance and cannot load more than this. So this kind of project should have much looser deadlines to make sure every volunteer gets a batch it can work on.

* How to do it

The server computes an "estimated" schedule (still based on a fluid view of the platform).

*Recheck whether this makes sense or not* There are at least 2 ways of doing this:

Arnaud: Very optimistic one. If we exclude slow hosts, perform a little bit of anticipated replication near the end of the batch, and if batches are large enough, estimated execution times are actually quite close to reality. I think this is somehow illustrated here: http://www.cs.technion.ac.il/~dang/conf_papers/SilbersteinSGS09.pdf But it is also necessary to take the number of jobs in the batch into account (if it has too few jobs, then estimating its finishing time based on the power of the whole platform is overly optimistic, even if the slow machines have actually started working on this batch before).

Crude proposition: for batch $i$ (batches sorted by priority),

$T_i = \left(\sum_{j=0}^{i} C_j\right) / (\text{total platform power})$, where $C_j$ is the estimated computational cost of batch $j$;

$T'_i$ = expected turnaround time of the $\left(\sum_{j=0}^{i} N_j\right)$ fastest machines, where $N_j$ is the number of jobs in batch $j$.

Estimated finish time of batch $i$ = $\max(T_i, T'_i)$
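The crude proposition can be sketched as follows. This is only one possible reading: the "expected turnaround time of the K fastest machines" is taken here to be the turnaround time of the slowest machine among the K fastest, and the function and parameter names are hypothetical.

```python
from itertools import accumulate


def crude_finish_times(costs, job_counts, host_turnarounds, total_power):
    """Estimated finish time of each batch, batches given in priority order.

    costs[i]          estimated computational cost of batch i
    job_counts[i]     number of jobs in batch i
    host_turnarounds  expected turnaround time of each host (smaller = faster)
    total_power       aggregated power of the platform (cost units per time unit)
    """
    fastest_first = sorted(host_turnarounds)
    estimates = []
    for cum_cost, cum_jobs in zip(accumulate(costs), accumulate(job_counts)):
        t = cum_cost / total_power                        # fluid estimate T_i
        k = max(1, min(cum_jobs, len(fastest_first)))     # never more hosts than we have
        t_prime = fastest_first[k - 1]                    # assumed reading of T'_i
        estimates.append(max(t, t_prime))
    return estimates
```

For a small first batch the turnaround term dominates, whereas for a large batch the fluid term does.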

Note that this may be quite inaccurate when the system is not in steady state because the project ran out of work. The estimation for the first jobs is going to be completely off. We will have the same kind of trouble if the batches are relatively small, hence the next proposals.

Lionel: *supposedly more accurate*. In order of priority, you add the power of the fastest machines until you have enough to complete the whole batch and, based on the distribution of turnaround times of machines, compute a deadline that would allow sufficiently many machines to finish all jobs. I'm not really sure how to do this without keeping track of the queue of each machine, which is something we do NOT want to do. Maybe by splitting the machines into a small number of groups with similar performance?

Kevin: *maybe smarter but complex*. Whenever the system gets empty, Arnaud's estimation is way too optimistic. So we could try to estimate the aggregate power of the system over time before it enters steady state, and integrate this curve to evaluate the estimated completion time of the batch.
Kevin 2: divide the batch cost by the aggregated power of, say, the 50% fastest machines. Somehow, it's the same as option 1 but with some slack; maybe just another way of thinking about it.
*Kevin 3* (maybe the preferred one?): if batches are much smaller than the number of available hosts, use the median speed of machines and compute how much time it would take to complete 1 job on such a machine. Within such a time bound, we would expect "90-95%" of the jobs to be completed. Some others will have to be resubmitted, but only on the fastest machines. So recompute the same value but with the 90%-fastest machine.
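To make the Kevin 2 and Kevin 3 options concrete, here is a small sketch. It assumes host speeds and costs use compatible units (e.g. FLOPS and FLOPs); the function names are hypothetical, and only the first bound of Kevin 3 (one job on the median machine) is shown.

```python
from statistics import median


def kevin2_estimate(batch_cost, host_speeds, fraction=0.5):
    """Batch cost divided by the aggregated power of the fastest `fraction` of hosts."""
    ranked = sorted(host_speeds, reverse=True)
    n = max(1, int(len(ranked) * fraction))  # keep at least one host
    return batch_cost / sum(ranked[:n])


def kevin3_first_bound(job_cost, host_speeds):
    """Time to complete one job on a machine of median speed."""
    return job_cost / median(host_speeds)
```

With host speeds 1, 2, 3 and 4, a batch of cost 70 is estimated at 70 / (4 + 3) = 10 time units under Kevin 2, and a job of cost 5 takes 5 / 2.5 = 2 time units on the median machine.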

In any case, it is important to set a "minimum value" for deadlines to avoid starvation of batches.

But apparently it is not necessary to give a very high slack at this step. Furthermore, these batch deadlines are likely to have to be extended over time if we realize they are too tight and exclude too many people. This could be done with thresholds but we don't have a nice idea of a general solution.

Yet, after discussing with Kevin, we realized some "bad" situations may happen, and that it may call for some kind of advanced control loop between the batch deadline estimation and the regulator. Suppose we have several ongoing high priority batches for which our deadline estimate was too optimistic. Then all of them should see their deadline set to a minimum value that selects only the 10% fastest machines. As these batches do not complete and require replicas only on the 10% fastest machines, the regulator will see the UNSENT job set filled with urgent jobs, which means the deadlines are too tight and need to be re-extended. So the regulator may actually be the right component to take care of batch deadline management.

Third step: populate the shared memory segment (feeder+regulator)

* Goal

When a host requests work, it will pick from the feeding array (shared memory segment) the jobs with higher priority that it can finish by the deadline.

It is thus important that *the feeding array contains enough diversity* for the job/volunteer matchmaking to work: filling it in order of priority may result in "slow" machines not getting any work, because all jobs in the feeding array would have too tight deadlines.

* How to do it

To do this, one way is to analyze the "expected performance" of machines (i.e., their speed modified by their average availability), divide it into quantiles, and make sure that there is a similar amount of work for each of these quantiles. Or (and it may be easier) respect minimal amounts for each quantile, and fill the rest with high priority jobs (e.g., replicated jobs that need to be resent because of an error on a client). This is a generalization of the "size matching" mechanism already in place in the regulator, and could be implemented in (an alternate version of) the regulator as well.
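The "minimal amount per quantile, then fill by priority" variant could be sketched like this. The tuple layout, the function name, and the convention that a lower priority value means a more urgent job are assumptions for the example; this is not the actual feeder or regulator code.

```python
def fill_feeder(unsent, n_slots, n_quantiles, min_per_quantile):
    """Choose which UNSENT jobs go into the feeder array.

    unsent: list of (job_id, priority, quantile) tuples, where `quantile` is the
    performance class of hosts the job is sized for and a lower priority value
    means a more urgent job.
    """
    ordered = sorted(unsent, key=lambda job: job[1])
    selected, taken = [], set()

    # First pass: guarantee a minimal amount of work for each quantile.
    for q in range(n_quantiles):
        picked = 0
        for job in ordered:
            if len(selected) == n_slots or picked == min_per_quantile:
                break
            if job[2] == q:
                selected.append(job)
                taken.add(job[0])
                picked += 1

    # Second pass: fill the remaining slots in pure priority order.
    for job in ordered:
        if len(selected) == n_slots:
            break
        if job[0] not in taken:
            selected.append(job)
            taken.add(job[0])
    return selected
```

Even if the three most urgent jobs all target the fastest quantile, the slower quantiles still get one slot each before the remaining slots go to the most urgent leftovers.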

We also need to ensure that there are some jobs available for each user/project, because some volunteers accept only a limited subset of users. This would be the job of the regulator, based on the current set of "UNSENT" jobs.

In this case, probably the best implementation for the feeder would be to pick at random from the "UNSENT" jobs. This way, the feeding array should reflect the diversity of the UNSENT job set, unless there is an incredibly high diversity and a large difference between the size of the UNSENT job set and the size of the shared memory array.

Fourth step: job/volunteer matchmaking

* Goal

Here, we want to enforce the priority values computed in the first step.

* How to do it

We scan the array and select jobs such that:

the expected completion on this particular volunteer is earlier than the batch deadline;
the job fits the size constraints (i.e., not too large and not too small);
the job respects the volunteer's project preferences;
the deadline the task is going to get does not leave an unacceptable slack (we may want to use volunteer-provided information on how often they reconnect as an estimation of what cannot be accepted), as it may make some volunteers unhappy. Ideally, there would be a volunteer-provided value regarding minimum slack.

Among these jobs, we select the ones with the highest batch priority until we get the desired amount of work, as long as the jobs are still feasible.
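A possible shape for this scan is sketched below. The field names are hypothetical, a lower batch_priority value is assumed to mean "serve first", and the slack constraint is read as a minimum slack requested by the volunteer, which is an interpretation of the notes rather than an agreed rule. Feasibility is rechecked against the work already picked for the host.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    project: str
    cost: float            # estimated work, e.g. FLOPs
    batch_priority: float  # lower value = served first (assumption)
    batch_deadline: float  # absolute time
    job_deadline: float    # deadline the task would get if sent now


@dataclass
class Host:
    speed: float           # effective speed (speed scaled by availability)
    min_job_cost: float
    max_job_cost: float
    projects: set          # projects the volunteer accepts
    min_slack: float       # minimum slack the volunteer asks for (assumption)


def pick_jobs(feeder, host, now, work_requested):
    """Select feasible jobs by batch priority until enough work is gathered."""
    chosen, queued = [], 0.0  # queued = seconds of work already picked
    for job in sorted(feeder, key=lambda j: j.batch_priority):
        if queued >= work_requested:
            break
        runtime = job.cost / host.speed
        finish = now + queued + runtime
        if (finish < job.batch_deadline
                and host.min_job_cost <= job.cost <= host.max_job_cost
                and job.project in host.projects
                and job.job_deadline - finish >= host.min_slack):
            chosen.append(job)
            queued += runtime
    return chosen
```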

Issues:

Strictly smaller than batch deadline may create starvation of the batch. So we need a mechanism to re-extend batch deadlines, but this was discussed earlier.

Fifth step: assign job deadlines

Left open, we do not know yet how to do this.

A lot of discussions but here is what we finally proposed:

Let BD be the batch deadline. Let T_90 be the time a job of the batch would take on the 90th-percentile machine. Say we aim for at most 3 rounds of resubmission. Then we initially set the job deadlines to BD - 3 T_90. Whenever this deadline is passed, we enter the resubmission mechanism and we only replicate with a tight deadline of T_90. This resubmission means you create an additional replica of the job because the deadline was passed.
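The arithmetic of this proposal fits in a few lines. The sketch below takes BD and T_90 as given inputs in the same time unit; the function names are hypothetical.

```python
def initial_job_deadline(batch_deadline, t90, max_rounds=3):
    """Initial deadline of every job in the batch: BD - max_rounds * T_90."""
    return batch_deadline - max_rounds * t90


def resubmission_deadline(now, t90):
    """Deadline of a replica created after a missed deadline: T_90 from now."""
    return now + t90
```

For example, with BD = 100 and T_90 = 10, jobs first get a deadline of 70; replicas created at 70, 80 and 90 are due at 80, 90 and 100, so the third round ends exactly at the batch deadline.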

Sixth step: trigger replication

Instead of having only resubmission, we may want to have speculative job replication. This means that when deadline BD-3T_90 is passed, we resubmit not just one missed job, but several copies to decrease the failure probability and hence the potential re-resubmission.
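One way to size the number of speculative copies is sketched below, under a strong assumption that is not in the notes: each copy misses its deadline independently with the same probability p_miss. The function name and the target failure probability are hypothetical.

```python
import math


def replicas_needed(p_miss, target):
    """Smallest k such that p_miss ** k <= target, assuming independent misses."""
    if not (0 < p_miss < 1 and 0 < target < 1):
        raise ValueError("p_miss and target must be strictly between 0 and 1")
    return max(1, math.ceil(math.log(target) / math.log(p_miss)))
```

For instance, if a replica misses its deadline half of the time and we want at most a 10% chance that all copies miss, four copies are needed (0.5 ** 4 = 0.0625).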

Possible Client Mechanism Evolution

add a job state that says "report asap"
add a mechanism that says "run immediately"?
allow volunteers to specify a minimum slack

* Other Concerns

Too tight job/batch deadlines?

This requires tuning I guess

How to do aggressive replication for the straggler jobs ?

It depends on what we mean by aggressive. We can either use automatic resubmission or speculative replication or a combination.

Does there exist "processor affinity" ?

I mean jobs which are really more efficient on GPU, and jobs which are more efficient on CPU. The answer is yes, and we may want to consider this kind of optimization.

Turnaround time estimation vs. Throughput estimation

We're currently using effective turnaround time and not potential turnaround time. In WCG, some very fast workers have a huge cache and take a lot of time to complete jobs, hence a poor effective turnaround time even though they have a huge throughput. Actually, we do not keep track of this throughput but we keep track of the credit history, which may be a more accurate measure, as job duration may vary from one batch to another whereas credit is tied to the amount of work to be done.

Why do we have a notion of batch LET and batch deadline?

The LET notion is used to ensure there is a "fair" sharing of resources. The batch deadline is here to exclude slow volunteers. So although the two notions have quite different usages, they are closely related. Kevin provided a good reason for having two separate notions computed in different ways. If there is ever a batch B1 with a large number of small jobs and a batch B2 with a small number of large jobs, they will both incur the same resource usage. However, the time they will actually take to complete will be quite different.

Note that this means that if we ever compute batch deadlines using the Kevin 3 option, B1 will get a rather tight deadline. It is still not clear to me whether this is a good or a bad thing. Somehow volunteers expect reasonable slacks. One option would be to enforce a minimum slack for volunteers, so we added this constraint in the matchmaking.

Can we "predict" availability based on current uptime ?

This is an interesting question but this seems very difficult. Maybe I could discuss with Jean-Marc and he would convince me it's just unfeasible. :)

Issues raised by Uwe

On a project where the number of batches/jobs that are ready to run is very large relative to the size of the infrastructure, preloading all workunits into BOINC can cause a significant degradation of performance. There may need to be a way to load a batch so that it can be prioritized and planned, but without creating the workunit/result records or copying files to download until the system knows that those are required. A mechanism would need to exist to assess this condition and trigger that creation.

Application specific search space division

Wenjie, Gerdus

More or less the same scheme can be applied, maybe with some problem-specific tweaking, but not much. No BOINC server-side change is needed; a few simple scripts can already implement the mechanism quite well. Maybe some kind of priority system can be implemented in the recycling. Splitting leftovers is also a way to deal with long-lasting workunits, but it may be application-specific.

Important to keep things simple, even at the cost of accuracy! This is an engineering problem.

TODO Integration with hubs, clouds, grids, and desktop grids

Joszef and Adam talked about it but we don't know much. We should ask them.

Make project web sites translatable

We should be careful about words with context-sensitive meaning. Having the context would be useful when translating. Maybe, SETI@home could be a good test-case.

Creating and deploying VM-based app versions

Carlos, Christian, Uwe, Carlos Val., Wenjing, Francisco, Dario

The smallest VM size Christian achieved was ~680 MB (uncompressed) and 180 MB (gzip compressed). Getting lower would require specific kernels, and then there is an increased risk of bugs. It would be great if we could get it under the 100 MB range. At Sztaki, they simply compress the image and it shrinks from 700 to 100 MB, which is sufficient.

Christian explained to Dario and Carlos how to create VM-based apps and how he deals with this in RNA World.

VM image: http://www.rnaworld.de/rnaworld/download/rnaWorld2GB.vdi.gz

Extract the VDI to your hard disk, create a new virtual machine using this VDI, create a shared folder called 'shared' and place an executable/script named 'boinc_app' inside. This will be executed by the startup script of the VM. You can do for example:

#!/bin/bash
sleep 10
echo "Hello World" > ../shared/hello.txt
sleep 10

You can also interrupt the startup script by Ctrl-C in the VM and explore the inside.

We identified the problem of getting serious errors from inside the VM in a generic way. One way would be to redirect the console output via a serial port to the vboxwrapper and look for kernel panic or similar messages. One approach is the socat pipe (http://wiki.illumos.org/display/illumos/Serial+Console+in+VirtualBox).

To solve these problems, Christian needs new features in the VirtualBox wrapper, which could be based on socat.
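The "look for kernel panic or similar messages" part could be prototyped along these lines. This is not vboxwrapper code: it assumes the serial console output has already been captured as lines of text, and the pattern list is only illustrative and would need tuning per guest OS.

```python
import re

# Illustrative patterns only; a real deployment would tune these per guest OS.
FATAL_PATTERNS = [
    re.compile(r"kernel panic", re.IGNORECASE),
    re.compile(r"out of memory", re.IGNORECASE),
]


def first_fatal_line(console_lines, patterns=FATAL_PATTERNS):
    """Return the first console line matching a fatal pattern, or None."""
    for line in console_lines:
        if any(p.search(line) for p in patterns):
            return line.rstrip("\n")
    return None
```

A wrapper could call such a function periodically on the captured console output and abort the task with a meaningful error message when it returns a line.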

Nothing much was done on the small VM size side.

Right now, although we have an installer for both BOINC and VirtualBox, the BOINC page mainly advertises the BOINC-only installer, as we don't want people to install useless things. David would like to change this to a big "download BOINC+VirtualBox" button and a small "download BOINC" button, with an explanation that the latter may restrict project participation. Christian raised the fact that to run 64-bit apps in VirtualBox, the VT-x feature must be enabled in the BIOS, and there is no way to check whether it is enabled without trying to execute workunits. Ideally, in such a case, the client should post a notification to the volunteer explaining what to do to improve things.

An alternative suggestion was to enable BOINC to initiate the download and installation of VirtualBox at the time the user attempts to connect to a project that requires VirtualBox.

PHP pages with Twitter Bootstrap

Francisco, Dario, David, Christian

This only works with Bootstrap 2.3.2 and can be integrated into default BOINC with some effort. I also think it's possible to make it optional, to give projects an easy upgrade path. It will soon be put into BOINC.

How to automate the end-to-end testing of BOINC?

Dario, Kevin, Joachim, Augustin, Adam

automate the database deployment, compiling, submission of a client

Nothing was done here.

Remote job submission: unification

Wenjin, Christian, David

API-doc: http://boinc.berkeley.edu/trac/wiki/RemoteJobs

It would be nice to have a disk quota for each submitter to limit disk usage on the server.
A hook on the server side to allow application-specific preparation of batches. Possible implementation: place a file appname.inc in html/project.inc and submit_rpc_handler.php will look for this and call a specific function (prepare_job or prepare_batch) that returns an XML structure that gets passed back to the submitter. Attention, this preparation could take some time!
Another hook should make it possible to prepare the output data of a batch. A project might want to zip output files for each submitter on a daily basis before the whole batch is finished, so the submitter can download these partial files.

Brainstormed on how file names should be managed. There is the beginning of a plan but it's not quite there yet. Maybe it will become more concrete next week.

Drupal/BOINC tutorial

Tristan talked with whoever was interested (ClimatePrediction, LHC, CAS@home). He's going to write some documentation explaining how to do this, which should make it easy for everyone.

Abandoned tasks

Sub-second CPU throttling
Further simplifying the BOINC install process and GUI
Prototype a BOINC GUI using HTML5
Improving the BOINC server documentation
GPU, multi-thread, VM-based, and Android applications
Data-intensive applications

Last modified on Jan 20, 2014, 1:53:16 PM
