A summary of what I've been doing and thinking about recently.
After 20 years, volunteer computing has had successes,
but has not approached its potential.
VC was supposed to enable ground-breaking research
by providing more computing power than was available or affordable otherwise.
This has happened but only to a small extent.
The set of VC projects has been small and essentially static for 10 years.
Of the scientists who use high-throughput computing and could benefit from VC,
only a tiny fraction actually do.
VC was supposed to greatly increase global public interest in science;
this has happened but only to a small extent.
The volunteer population is almost entirely from a single demographic
(older, IT-savvy males)
and has been gradually shrinking for ~10 years.
These problems can be traced to BOINC's original structural model:
the "project ecosystem" model.
In this model, there's a dynamic ecosystem of competing projects,
the public learns about them and makes informed choices,
the best projects get the most computing power,
and the public learns and gets excited about science.
BOINC is designed to encourage this model (e.g. cross-project IDs and credit).
The model was based on several assumptions:
It's sufficiently easy to create and operate a BOINC project
that almost any computational science research group can do it.
Other than providing the software and a list of projects,
BOINC should have no centralized functions or control;
projects are autonomous.
Volunteers will evaluate the projects (by reading their web sites)
and make rational decisions about which ones to support.
Furthermore, they will do this repeatedly as new projects arise.
Projects will compete for volunteers by making compelling
web sites that explain and promote their research.
The model didn't work as envisioned, for a number of reasons:
Creating and operating a VC project is harder than we realized:
it requires a combination of resources and skills (Win/Mac programming,
sysadmin, DB admin, web design, PR/outreach)
that few academic research groups have.
For a research group, trying to use VC is a risk.
There's a substantial investment, with no guarantee of any return,
since no one may volunteer.
Adding a VC component to a grant proposal adds uncertainty and weakens the proposal.
The computing needs of many research groups are sporadic -
e.g. they need a big chunk of throughput every now and then.
For such groups, buying computing time on a commercial cloud may be cheaper than using VC.
Attracting volunteers is a marketing exercise.
It's difficult to do effective marketing when there are
dozens of competing brands (i.e. project names).
Most volunteers aren't willing to survey and assess
a large set of projects once, much less repeatedly.
We made little effort to interface, technically or politically,
with the mainstream HPC/HTC world (Grid, Supercomputing, Condor, etc.).
They came to view VC in negative ways: as a threat, a gimmick, etc.
Around 2006 there was a brief, modest interest in VC in the academic computing world.
Since then, nothing: no conferences on distributed computing list VC as a topic of interest.
This has been damaging to VC; e.g. no one is working on solving the hard
problems that arise in VC (such as how to grant credit).
A new model
I think we need to take what we've learned from the project ecosystem model
and make a new and better model.
I brought up this idea at the 2013 BOINC workshop,
and proposed a model consisting of two related parts:
Partner with existing HTC providers
such as supercomputing centers and science portals to add BOINC-based back ends.
These projects would be operated by the provider's staff.
Tens of thousands of scientists use such computing providers.
These scientists would benefit from lower queueing delays, higher throughput and lower cost.
But they wouldn't need to do anything; they wouldn't even need to know that VC is being used.
Create an account manager (let's call it "TBD" for now) acting as the primary volunteer interface.
TBD lets volunteers express their preferences in terms of
keywords (scientific areas and locations)
rather than selecting specific projects.
Based on these preferences, and corresponding keywords of projects and applications,
TBD dynamically assigns computers to a set of vetted projects,
which would include both existing (single-group) projects,
as well as the new computing-center projects.
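To make the keyword idea concrete, here is a minimal Python sketch of how an account manager might turn a volunteer's keyword preferences into a project list. Everything in it is an assumption for illustration: the `Project` and `Volunteer` classes, the `assign_projects` function, the keyword strings, and the overlap-count scoring are hypothetical, not TBD's actual design.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """A project as the account manager might see it (hypothetical fields)."""
    name: str
    keywords: set       # science areas and locations, e.g. {"biomed", "europe"}
    vetted: bool = True


@dataclass
class Volunteer:
    """A volunteer's preferences, expressed as keywords rather than projects."""
    wanted: set


def assign_projects(volunteer, projects):
    """Return names of vetted projects sharing at least one keyword with the
    volunteer, best match first. Scoring by overlap count is an assumption."""
    scored = []
    for p in projects:
        if not p.vetted:
            continue  # only vetted projects are eligible
        score = len(p.keywords & volunteer.wanted)
        if score > 0:
            scored.append((score, p.name))
    # Highest overlap first; ties broken by name so the order is deterministic.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in scored]


projects = [
    Project("A", {"biomed", "europe"}),
    Project("B", {"physics"}),
    Project("C", {"biomed"}, vetted=False),
    Project("D", {"biomed", "physics"}),
]
volunteer = Volunteer(wanted={"biomed", "europe"})
result = assign_projects(volunteer, projects)
print(result)
```

The point of the sketch is that the volunteer never names a project: adding a new vetted project with matching keywords changes the assignment without the volunteer doing anything.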
On a technical level, the new model is enabled by our ability
(thanks to Rom Walton and various people from CERN)
to run jobs in virtual machines,
and recent refinements by Marius Milea to support Docker on top of this.
This makes it possible for HTC providers like TACC and nanoHUB
(which already use Docker for app deployment)
to run hundreds of existing applications with no porting or other per-app work.
This model addresses most of the problems with the previous one.
Notes about the new model:
It doesn't interfere with or preclude existing BOINC activities.
Current projects continue as they are.
Scientists can create new single-group projects if they want.
Volunteers can attach individual projects as they currently do,
or use existing account managers like BAM! and GridRepublic.
TBD will act as an allocator of computing power.
This will be based in part on user preferences,
but there will of necessity also be a higher-level allocation policy,
decided on by an organization.
The decision process should include merit and need;
it may include politics and money as well.
NSF has an organization - XSEDE - that does this for NSF-funded computing resources.
I'm in contact with XSEDE, and hope to include them in TBD.
Involving NSF in the process is important, but this project needs to be international.
This part of the model needs to be worked out at a high level.
The model focuses on large HPC-provider-level projects,
but it actually encourages single-group projects,
since they can apply for an allocation from TBD and be assured
of computing power prior to making any investment.
TBD can serve as a brand for VC marketing purposes.
It will also provide a basis for corporate partnerships;
if technology or game companies want to support VC,
they can support TBD rather than having to select individual projects.
In 2014 I started thinking seriously about the new model,
and I teamed up with two mainstream computing providers as test cases:
nanoHUB, at Purdue University,
which is a nanoscience portal.
It provides web interfaces to computational tools, used by thousands of scientists,
many of which create HTC workloads well-suited to VC.
Our goal is to create success stories that inspire all HTC providers to add VC back ends.
In 2014 we sent a proposal to NSF; it got good reviews but was rejected.
We revised and resubmitted the next year,
and in 2016 we were given 1 year of funding
and encouraged to re-apply.
We did, and recently learned that our latest proposal was funded for 3 years,
starting this month.
The proposal text is here,
and the 1-page summary is here.
We didn't get all the money we asked for, which is par for the course.
We got enough to pay my salary, and 50% salary for my collaborators at Purdue and TACC.
I had hoped to be able to hire a web designer here at UCB;
maybe I can find other sources of money to do this.
Relationship to BOINC
In 2016 BOINC became a community-run project; I don't control it.
The new project, TBD, will be separate from BOINC.
I hope that the BOINC community likes and supports TBD,
but some people might not, and I don't want to step on their toes.
Of course, I'm interested in hearing comments and criticisms about TBD,
and in discussing it.
I've been mostly MIA from BOINC for the last couple of years,
because I've been working full-time on other projects.
I apologize for this.
With this new funding, I'll be able to devote a good chunk of my time to
managing and contributing to BOINC,
e.g. setting up a functional release management process.
I suspect that relatively few current volunteers will use TBD;
it's more for new users with wider demographics.
So current projects won't lose computing power,
and they should get additional power from TBD.
Long term, I think something like TBD is our only hope for going from a few hundred thousand volunteers
to millions or tens of millions.
And such a rising tide will float all of our ships.
To implement TBD, I'll need to add some features to BOINC, e.g.:
The client will pass credit estimate information to account managers.
Account managers can send clients opaque data to be passed in scheduler requests
(preference keywords in this case).
The scheduler will have a "keyword matching" option that takes user and job
keywords into account.
E.g. it will preferentially send biomed jobs to volunteers who want to support biomed.
These features will have no impact on existing projects.
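The "keyword matching" option could work roughly like the following Python sketch. This is an illustration under stated assumptions, not the BOINC scheduler's code: the `order_jobs` function, the `(job_id, keywords)` tuple layout, and the overlap-count ranking are all hypothetical.

```python
def order_jobs(jobs, user_keywords):
    """Order queued jobs so those matching the user's keywords come first.

    jobs: list of (job_id, keywords) tuples in queue order (hypothetical layout).
    user_keywords: keywords passed along from the account manager.
    Non-matching jobs are kept, just ranked later: the preference is soft.
    """
    user_keywords = set(user_keywords)

    def overlap(job):
        _, keywords = job
        return len(set(keywords) & user_keywords)

    # sorted() is stable, so jobs with equal overlap stay in queue order.
    ranked = sorted(jobs, key=lambda job: -overlap(job))
    return [job_id for job_id, _ in ranked]


queue = [
    ("j1", {"physics"}),
    ("j2", {"biomed"}),
    ("j3", {"biomed", "genetics"}),
    ("j4", set()),
]
ordered = order_jobs(queue, {"biomed"})
print(ordered)
```

With no user keywords every job scores zero and the queue order is unchanged, which is one way such an option could leave projects that don't use keywords unaffected.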
The BOINC web site will link to TBD as well as BAM! and GridRepublic.
The TBD source code will be released under LGPLv3, and will be stored on GitHub.
We'll welcome code contributions.
I've been through a few names for TBD.
The latest proposal calls it "Science United".
This is OK, but it's a bit long and uninteresting.
Also it conjures the ill-fated "United Devices", an early attempt at commercializing VC.
I thought about names starting with "Sci" and came up with:
"Sciborg": volunteers are assimilated into a collective intelligence. Too ominous.
"Sciphon": like we're siphoning off computing power.
Has connotations of stealing gasoline.
"SciOn": where the "O" is the power-button icon.
Power up Science!
I like this one, though Scion is also a former car brand.
The bottom line: computer nerds shouldn't invent brand names.
Hopefully I can get help from marketing/branding experts from the business world.
The UCB business school teaches classes in this sort of thing.