= Volunteer data archival =

'''Volunteer data archival''' means using disk space on volunteered home computers to store large data files. This document describes the design of a system to provide volunteer data archival on BOINC.

We assume the goals include:

 * Storing large (e.g. petabyte) files. Files may be thousands of times larger than the amount of space available on individual computers.
 * Storing files for long periods.
 * Being able to reduce the probability of data loss to arbitrarily small levels.

Properties of the volunteer host population include:

 * A host may be sporadically available because it is turned off, or because the user has suspended network activity. Unavailable periods may range from minutes to several days.
 * The upload and download speeds of hosts vary widely, and can be fairly low (e.g. 1 Mbps) in some cases.
 * The amount of disk space available to a project on a given host may fluctuate over time, because of the user's own disk usage or disk usage by other BOINC projects to which the host is attached.
 * The population is dynamic: hosts are constantly arriving and leaving. The mean lifetime of a host may be fairly short (on the order of 100 days).
 * Many hosts are behind firewalls.

We assume that all communication is initiated by the BOINC client, and involves HTTP requests to trusted project servers. We don't consider direct client-to-client communication.

There are two basic techniques for achieving reliable storage using unreliable resources:

 * '''Replication''': a file
 * '''Coding''': with Reed-Solomon coding, a file is divided into N 'packets', and an additional K checksum packets are generated. The original data can be reconstructed from any N of these N+K packets.
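
As a rough illustration of the "any N of N+K" property of coding, the following Python sketch implements a toy Reed-Solomon-style erasure code. It treats the N data symbols as values of a polynomial at x = 0..N-1 and produces K extra values at x = N..N+K-1, so any N surviving values determine the polynomial and hence the data. The prime field GF(257), the function names `encode` and `decode`, and the one-symbol-per-packet representation are assumptions made for illustration and are not part of the BOINC design. A real system would more likely use an optimized library over GF(2^8) and operate on large packets.

```python
from itertools import combinations

P = 257  # assumed field size: smallest prime above 255, so any byte is a field element


def _interpolate(points, x):
    """Evaluate at x (mod P) the unique polynomial of degree < len(points)
    that passes through the given (index, value) points (Lagrange form)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, k):
    """Return N data packets followed by K checksum packets, each as (index, value).
    Checksum values may be 256, so they are kept as ints rather than bytes."""
    n = len(data)
    if n + k > P:
        raise ValueError("N + K must not exceed the field size")
    points = list(enumerate(data))
    checksums = [(x, _interpolate(points, x)) for x in range(n, n + k)]
    return points + checksums


def decode(packets, n):
    """Recover the N original symbols from any N surviving (index, value) packets."""
    if len(packets) < n:
        raise ValueError("need at least N packets to reconstruct")
    chosen = list(packets)[:n]
    return [_interpolate(chosen, x) for x in range(n)]


# Usage: N = 5 data symbols, K = 3 checksum symbols; any 5 of the 8 suffice.
data = list(b"BOINC")
packets = encode(data, k=3)
for survivors in combinations(packets, len(data)):
    assert decode(survivors, n=len(data)) == data
```

In this sketch each packet is a single symbol. For a large file, the same arithmetic would be applied position by position across N equally sized packets.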