= Volunteer data archival =

'''Volunteer data archival''' means using disk space on volunteered home computers
to store large data files.
This document describes the design of a system to
provide volunteer data archival on BOINC.
We assume the goals include:
 * Storing large (e.g. petabyte) files.
   Files may be thousands of times larger than the
   amount of space available on individual computers.
 * Storing files for long periods.
 * Being able to reduce the probability of data loss
   to arbitrarily small levels.
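
To make "arbitrarily small" concrete, here is a rough sketch. It assumes that each stored piece of data (a full copy of a file, or one coded packet) is lost independently with probability p over some period. This is a simplification, since real host failures can be correlated. Storing R copies loses a file with probability p^R. With the N+K coding scheme described below, the file is lost only if more than K of the N+K packets are lost. The function names and the value of p are hypothetical.

```python
from math import comb


def replication_loss(p: float, copies: int) -> float:
    """Probability that every one of `copies` replicas is lost.

    Simplifying assumption: each copy is lost independently with probability p.
    """
    return p ** copies


def coded_loss(p: float, n: int, k: int) -> float:
    """Probability that an (n + k)-packet coded file cannot be reconstructed.

    Reconstruction needs any n packets, so the file is lost when more than
    k packets are lost. Assumes each packet is lost independently with probability p.
    """
    total = n + k
    return sum(
        comb(total, lost) * p ** lost * (1 - p) ** (total - lost)
        for lost in range(k + 1, total + 1)
    )


if __name__ == "__main__":
    p = 0.1  # hypothetical per-piece loss probability
    for copies in (2, 3, 4):  # storage overhead: copies x file size
        print(f"{copies} replicas: loss {replication_loss(p, copies):.1e}")
    for k in (2, 4, 6):  # storage overhead: (10 + k) / 10 x file size
        print(f"N=10, K={k}: loss {coded_loss(p, 10, k):.1e}")
```

With N = 1 the coded scheme reduces to K + 1 replicas, so coded_loss(p, 1, k) equals replication_loss(p, k + 1). Under this model, adding redundancy drives the loss probability down geometrically. Coding typically reaches a given loss probability with much less storage overhead than replication.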

Properties of the volunteer host population include:

 * A host may be sporadically unavailable because
   it is turned off or because the user has suspended network activity.
   Unavailable periods may range from minutes to several days.
 * The upload and download speeds of hosts vary widely,
   and can be fairly low (e.g. 1 Mbps) in some cases.
 * The amount of disk space available to a project on a given host
   may fluctuate over time, because of the user's own disk usage
   or disk usage by other BOINC projects to which the host is attached.
 * The population is dynamic: hosts are constantly arriving and leaving.
   The mean lifetime of a host may be fairly short
   (on the order of 100 days).
 * Many hosts are behind firewalls.
   We assume that all communication is initiated by the BOINC client,
   and involves HTTP requests to trusted project servers.
   We don't consider direct client-to-client communication.
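
Several of these properties translate directly into numbers that a design must accommodate. The following back-of-the-envelope sketch is not part of BOINC. It computes how long a transfer takes at a given link speed. It also computes the probability that a host leaves within a given number of days. That second calculation rests on an assumption of ours, not a measured fact: host lifetimes are exponentially distributed with a 100-day mean. The function names are hypothetical.

```python
import math


def transfer_hours(size_bytes: float, mbps: float) -> float:
    """Hours needed to move size_bytes over a link of `mbps` megabits/second.

    Ignores protocol overhead and the host's unavailable periods.
    """
    return size_bytes * 8 / (mbps * 1e6) / 3600


def departure_probability(days: float, mean_lifetime_days: float = 100.0) -> float:
    """Probability that a host leaves the project within `days`.

    ASSUMPTION: host lifetimes are exponentially distributed with the given
    mean, so P(leave within t) = 1 - exp(-t / mean).
    """
    return 1.0 - math.exp(-days / mean_lifetime_days)


if __name__ == "__main__":
    print(f"1 GB at 1 Mbps: {transfer_hours(1e9, 1):.1f} h")
    print(f"P(host leaves within 30 days): {departure_probability(30):.2f}")
```

For example, moving 1 GB at 1 Mbps takes about 2.2 hours. departure_probability(30) is about 0.26, so under this model roughly a quarter of the hosts holding data would be gone within a month.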

There are two basic techniques for achieving reliable storage using
unreliable resources:

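 * '''Replication''': a file is stored as several identical copies, each on a different host;
   the data survives as long as at least one copy survives.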

 * '''Coding''': with Reed-Solomon coding, a file is divided into N 'packets',
   and K additional checksum packets are generated.
   The original data can be reconstructed from any N of these N+K packets.
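
The following is a minimal, illustrative Reed-Solomon-style erasure code in Python. It uses polynomial evaluation and Lagrange interpolation over the prime field GF(257). It is a teaching sketch, not BOINC code. Production Reed-Solomon implementations normally work in GF(2^8) so that every symbol fits in a byte, whereas here parity symbols can take the value 256. For each byte position, the N data bytes are treated as the values at x = 0..N-1 of a polynomial of degree less than N. Parity packet x (for x = N..N+K-1) stores that polynomial's value at x. Any N packets determine the polynomial, so any N of the N+K packets recover the data. The names P, encode and decode are hypothetical.

```python
P = 257  # prime field size; every byte value 0..255 is a field element


def _interpolate(points, x):
    """Value at x of the unique polynomial of degree < len(points) that
    passes through `points`, a list of (xi, yi) pairs (Lagrange form, mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is den's multiplicative inverse (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data, k):
    """Systematic encoding of n equal-length data packets into n + k packets.

    Packets 0..n-1 are the data packets themselves; packets n..n+k-1 are
    parity packets whose symbols may be 0..256.
    """
    n = len(data)
    if n + k > P:
        raise ValueError("n + k must not exceed the field size")
    packets = [list(pkt) for pkt in data]
    for x in range(n, n + k):
        packets.append([
            _interpolate([(i, data[i][s]) for i in range(n)], x)
            for s in range(len(data[0]))
        ])
    return packets


def decode(received, n):
    """Recover the n data packets from any n packets.

    `received` maps packet index (0..n+k-1) to packet contents.
    """
    if len(received) < n:
        raise ValueError("need at least n packets to reconstruct")
    chosen = sorted(received.items())[:n]
    length = len(chosen[0][1])
    return [
        [_interpolate([(x, pkt[s]) for x, pkt in chosen], i) for s in range(length)]
        for i in range(n)
    ]


if __name__ == "__main__":
    data = [list(b"volu"), list(b"ntee"), list(b"r da")]  # N = 3 data packets
    packets = encode(data, k=2)                          # N + K = 5 packets
    # Suppose packets 0 and 2 were lost: any 3 of the 5 are enough.
    survivors = {1: packets[1], 3: packets[3], 4: packets[4]}
    print(b"".join(bytes(p) for p in decode(survivors, n=3)))  # prints b'volunteer da'
```

The encoding is systematic: the first N packets are the data itself. While no data packets are lost, a reader needs no decoding at all. Decoding here costs O(N^2) operations per output symbol, which is fine for a sketch but not for petabyte-scale files.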