Version 10 (modified by davea, 16 years ago)

Bossa overview

Bossa is a software framework for "distributed thinking" - the use of volunteers on the Internet to perform tasks that require human intelligence, knowledge, or cognitive skills. Examples of such projects include Stardust@home and GalaxyZoo. Bossa simplifies the task of creating projects like these. It serves roughly the same function as Amazon's Mechanical Turk, but is simpler and more powerful, does not involve payment, and is open source.

Volunteers have different skill levels; they may do tasks well or poorly, and a few of them may intentionally do tasks incorrectly. One can achieve an overall level of accuracy that is higher than the population average using replication - having multiple volunteers do each task, and comparing the results. Replication is also useful for tasks that do not have a unique correct answer, and for which you want to collect alternatives.
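The accuracy gain from replication can be illustrated with a small calculation. The following is a minimal sketch, not part of Bossa: it assumes a task with two possible answers, volunteers who answer independently, and a single shared accuracy p. The function names are hypothetical.

```python
from collections import Counter
from math import comb


def majority_vote(responses):
    """Return the most common response, or None if there is no clear winner."""
    ranked = Counter(responses).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # the top two responses are tied
    return ranked[0][0]


def majority_accuracy(p, n):
    """Probability that a strict majority of n volunteers is correct.

    Assumes a two-answer task, independent volunteers, and the same
    per-volunteer accuracy p for everyone (a simplification).
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of replicas to avoid ties")
    needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(needed, n + 1))
```

Under these assumptions, volunteers who are each right 80% of the time give a majority answer that is right 89.6% of the time with three replicas, and about 94.2% of the time with five.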

The diverse requirements of distributed thinking

Distributed thinking projects may have widely varying properties and requirements.

First, they may require jobs to be distributed in different orders. Consider two cases:

  • Project A has a limited set of jobs and lots of volunteers.
  • Project B has an unbounded stream of jobs and a limited set of volunteers.

Project A will want to have each job performed about the same number of times. It will want to issue all jobs once, then issue them all a second time, and so on. In contrast, project B will want to issue the first job to a quorum of volunteers, then issue the second job to a quorum, and so on.
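The two orders can be sketched as generators. This is an illustration only, with hypothetical function names. It ignores the requirement that the same volunteer should not receive the same job twice.

```python
from itertools import count, islice


def breadth_first(jobs, replicas):
    """Project A: issue every job once, then every job a second time, etc."""
    for round_no in range(replicas):
        for job in jobs:
            yield job, round_no


def quorum_first(job_stream, quorum):
    """Project B: issue one job to a full quorum before moving to the next.

    job_stream may be unbounded, so this generator is lazy.
    """
    for job in job_stream:
        for replica_no in range(quorum):
            yield job, replica_no


# Project A: two jobs, two rounds.
order_a = list(breadth_first(["a", "b"], 2))
# Project B: an endless stream of job numbers, quorum of 2, first 4 issues.
order_b = list(islice(quorum_first(count(), 2), 4))
```

Here `order_a` is a, b, a, b, while `order_b` is job 0 twice followed by job 1 twice.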

Second, projects may want to assess the ability of each volunteer, and use this to determine how many replicas of each job to perform. Several factors might contribute to the ability estimate:

  • The volunteer's performance on a training course.
  • The volunteer's performance on a stream of "calibration jobs" (with known answers) intermixed with the job stream.
  • The fraction of time a volunteer's response to a job agrees with the "correct" response, as determined by replication.
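As an illustration of the third factor, the sketch below computes, for each volunteer, the fraction of jobs on which their response matched the majority response. The data layout and the function name are assumptions made for this example. Jobs with a tied vote are skipped because they have no "correct" response to compare against.

```python
from collections import Counter, defaultdict


def agreement_rates(responses):
    """responses maps job id -> {volunteer id: answer}.

    Returns volunteer id -> fraction of their jobs on which they agreed
    with the majority answer. Tied jobs are ignored.
    """
    agreed = defaultdict(int)
    total = defaultdict(int)
    for answers in responses.values():
        ranked = Counter(answers.values()).most_common()
        if not ranked:
            continue  # nobody has answered this job yet
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            continue  # tie: no majority answer for this job
        consensus = ranked[0][0]
        for volunteer, answer in answers.items():
            total[volunteer] += 1
            if answer == consensus:
                agreed[volunteer] += 1
    return {v: agreed[v] / total[v] for v in total}


rates = agreement_rates({
    1: {"ann": "x", "bob": "x", "cy": "y"},
    2: {"ann": "y", "bob": "y", "cy": "y"},
    3: {"ann": "x", "bob": "y"},  # tie, skipped
})
```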

In addition, the way in which ability is described may vary. In simple cases it might be a single number, e.g. an error rate. For tasks that involve feature detection we might want to track the rates of false positives and false negatives separately. For more complex tasks there could be arbitrarily many dimensions.
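For the feature-detection case, a two-dimensional ability record might look like the following sketch. The class and field names are hypothetical. It assumes the true answer for each job is known, for example because the job is a calibration job.

```python
from dataclasses import dataclass


@dataclass
class DetectionAbility:
    """Per-volunteer counts for a yes/no feature-detection task."""
    positives_seen: int = 0   # jobs where the feature was really present
    negatives_seen: int = 0   # jobs where the feature was really absent
    false_negatives: int = 0  # feature present, volunteer said "absent"
    false_positives: int = 0  # feature absent, volunteer said "present"

    def record(self, truth, response):
        """Update the counts with one job's true answer and response."""
        if truth:
            self.positives_seen += 1
            if not response:
                self.false_negatives += 1
        else:
            self.negatives_seen += 1
            if response:
                self.false_positives += 1

    def false_positive_rate(self):
        """Fraction of feature-free jobs marked as containing the feature."""
        if self.negatives_seen == 0:
            return None  # no evidence yet
        return self.false_positives / self.negatives_seen

    def false_negative_rate(self):
        """Fraction of real features that the volunteer missed."""
        if self.positives_seen == 0:
            return None  # no evidence yet
        return self.false_negatives / self.positives_seen
```

Keeping the two rates separate lets a project treat a volunteer who misses features differently from one who reports features that are not there.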

Third, there are various ways in which "experts" might be used. Two general possibilities:

  • Experts do the same job, only better. For example, experts might be used to resolve cases in which non-experts reach no consensus, or to verify rare features found by non-experts.
  • Experts do more sophisticated jobs. For example, non-experts might look for features, while experts classify them.
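The first possibility, in which an expert steps in when non-experts do not agree, might be sketched as follows. The function name, the quorum rule, and the `ask_expert` callback are all assumptions made for this example.

```python
from collections import Counter


def resolve(non_expert_answers, quorum, ask_expert):
    """Return (answer, source) for one job.

    If some answer was given by at least `quorum` non-experts, accept it.
    Otherwise call `ask_expert`, a hypothetical callback that returns an
    expert's answer for the job.
    """
    counts = Counter(non_expert_answers)
    if counts:
        answer, votes = counts.most_common(1)[0]
        if votes >= quorum:
            return answer, "consensus"
    return ask_expert(), "expert"


agreed = resolve(["crater", "crater", "track"], 2, lambda: "track")
escalated = resolve(["crater", "track"], 2, lambda: "track")
```

The expert is consulted only for the second job, so expert time is spent where non-experts disagree.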

How Bossa addresses these requirements

Bossa's design philosophy is to provide mechanisms, and leave it up to the projects to define the policies. This is done using the following techniques:

  • Project-specific information about jobs and volunteers is stored in project data fields. These are PHP objects whose structure is determined by the project. Bossa stores them in serialized form in the database.
  • Project-specific policies are encoded in PHP project policy functions that are invoked at various points (e.g., when jobs are issued, when they complete, and when they time out).
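The division of labor between the framework and the project can be sketched as below. This is not Bossa's API: Bossa's data fields and policy functions are written in PHP, and every name here is hypothetical. JSON stands in for PHP serialization, and a dictionary stands in for a database row.

```python
import json


class ProjectPolicy:
    """Hypothetical set of hooks that a framework would call.

    A project overrides only the hooks it needs.
    """

    def job_issued(self, job, user):
        pass

    def job_finished(self, job, user, response):
        pass

    def job_timed_out(self, job, user):
        pass


class CountingPolicy(ProjectPolicy):
    """Example project policy: track issued and finished counts per job."""

    def _update(self, job, key):
        # The framework treats job["info"] as an opaque string; only the
        # project knows what is inside it.
        info = json.loads(job["info"])
        info[key] = info.get(key, 0) + 1
        job["info"] = json.dumps(info)

    def job_issued(self, job, user):
        self._update(job, "issued")

    def job_finished(self, job, user, response):
        self._update(job, "finished")


job = {"id": 1, "info": json.dumps({})}
policy = CountingPolicy()
policy.job_issued(job, user="ann")
policy.job_issued(job, user="bob")
policy.job_finished(job, user="ann", response="x")
policy.job_timed_out(job, user="bob")  # default hook: does nothing
```

Because the framework never inspects the serialized field, a project can change what it stores there without any change to the framework.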

The mechanisms provided by Bossa include:

  • Jobs can be designated as calibration jobs, and Bossa can be instructed to randomly mix a given rate of calibration jobs into the job stream.
  • The order in which jobs are issued is determined by a floating-point priority. The manipulation of these priorities is up to the project.
  • Users can be assigned integer levels, in which case each job has a vector of priorities, one per level.
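A minimal sketch of how these three mechanisms could combine when choosing the next job for a user follows. The names and data layout are hypothetical. The sketch assumes that a larger priority means "issue sooner", which the text above does not specify.

```python
import random


def pick_job(jobs, calibration_jobs, user_level, calibration_rate, rng):
    """Choose the next job for a user at the given integer level."""
    # With probability calibration_rate, hand out a calibration job.
    if calibration_jobs and rng.random() < calibration_rate:
        return rng.choice(calibration_jobs)
    # Otherwise take the job whose priority for this level is highest.
    # ASSUMPTION: higher priority is issued first.
    return max(jobs, key=lambda job: job["priorities"][user_level])


def after_issue(job, user_level):
    """Example project policy: lower a job's priority once it is issued."""
    job["priorities"][user_level] -= 1.0


rng = random.Random(0)
jobs = [
    {"id": "j1", "priorities": [0.0, 5.0]},  # one priority per user level
    {"id": "j2", "priorities": [0.0, 9.0]},
]
issued = []
for _ in range(4):
    job = pick_job(jobs, [], user_level=0, calibration_rate=0.0, rng=rng)
    issued.append(job["id"])
    after_issue(job, 0)
```

With this policy, level-0 users receive j1, j2, j1, j2, which is the order wanted by project A above. A policy that leaves the priority unchanged until a quorum is reached would give project B's order instead.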

This framework enables all of the examples listed above (and others not yet conceived) to be implemented with a small amount of PHP programming.
