Changes between Initial Version and Version 1 of VolunteerDataArchival

Timestamp: Nov 22, 2011, 3:04:00 PM (12 years ago)
Author: davea
Comment: --

VolunteerDataArchival

                       v1
+= Volunteer data archival =
+'''Volunteer data archival''' means using disk space on volunteered home computers
+to store large data files.
+This document describes the design of a system to
+provide volunteer data archival on BOINC.
+We assume the goals include:
+ * Storing large (e.g. petabyte) files.
+   Files may be thousands of times larger than the
+   amount of space available on individual computers.
+ * Storing files for long periods.
+ * Reducing the probability of data loss
+   to arbitrarily small levels.
+Properties of the volunteer host population include:
+ * A host may be sporadically available because
+   it is turned off, or because the user has suspended network activity.
+   Unavailable periods may range from minutes to several days.
+ * The upload and download speeds of hosts vary widely,
+   and can be fairly low (e.g. 1 Mbps) in some cases.
+ * The amount of disk space available to a project on a given host
+   may fluctuate over time, because of the user's own disk usage
+   or disk usage by other BOINC projects to which the host is attached.
+ * The population is dynamic: hosts are constantly arriving and leaving.
+   The mean lifetime of a host may be fairly short
+   (on the order of 100 days).
+ * Many hosts are behind firewalls.
+   We assume that all communication is initiated by the BOINC client,
+   and involves HTTP requests to trusted project servers.
+   We don't consider direct client-to-client communication.
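To give a feel for the figures quoted in the list above, here is a rough back-of-the-envelope sketch. It assumes a 1 GB chunk size (not stated in this document) and, purely for illustration, exponentially distributed host lifetimes; the function names are hypothetical and nothing here is part of BOINC.

```python
import math

def transfer_seconds(size_bytes: float, mbps: float) -> float:
    """Seconds needed to move size_bytes over a link of `mbps` megabits/second."""
    return size_bytes * 8 / (mbps * 1e6)

def survival_probability(days: float, mean_lifetime_days: float = 100.0) -> float:
    """Probability a host is still attached after `days`.

    ASSUMPTION: lifetimes are exponentially distributed with the given mean.
    The document only says the mean is on the order of 100 days.
    """
    return math.exp(-days / mean_lifetime_days)

if __name__ == "__main__":
    one_gb = 1e9  # ASSUMED chunk size, for illustration only
    hours = transfer_seconds(one_gb, 1.0) / 3600
    print(f"1 GB at 1 Mbps takes about {hours:.1f} hours")
    print(f"P(host still present after 30 days) = {survival_probability(30):.2f}")
```

Under these assumptions a single 1 GB transfer to a slow host takes about 2.2 hours, and roughly a quarter of hosts would be gone within 30 days, which is why both transfer cost and host churn matter to the design.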
+There are two basic techniques for achieving reliable storage using
+unreliable resources:
+ * '''Replication''': a file
+ * '''Coding''': with Reed-Solomon coding, a file is divided into N 'packets',
+   and an additional K checksum packets are generated.
+   The original data can be reconstructed from any N of these N+K packets.
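The "any N of N+K" property can be illustrated with a minimal sketch. This is a toy, not a production Reed-Solomon codec: each "packet" is a single integer symbol, and for simplicity it assumes arithmetic over the prime field GF(257) rather than the GF(2^8) arithmetic that practical implementations typically use. The names `encode` and `decode` are hypothetical.

```python
P = 257  # ASSUMPTION: small prime field chosen for readability; requires N+K <= 257

def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points, with arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, k):
    """Return N+K symbols: the N data symbols followed by K extra symbols.

    The data symbols are treated as values of a polynomial at x = 0..N-1;
    the extra symbols are its values at x = N..N+K-1.
    """
    n = len(data)
    points = list(enumerate(data))
    return list(data) + [_interpolate_at(points, x) for x in range(n, n + k)]

def decode(available, n):
    """Recover the N data symbols from any N surviving (index, symbol) pairs."""
    points = list(available)[:n]
    if len(points) < n:
        raise ValueError("need at least N packets to reconstruct")
    return [_interpolate_at(points, x) for x in range(n)]

if __name__ == "__main__":
    data = [72, 101, 108, 108, 111]      # N = 5 symbols, each in 0..256
    coded = encode(data, k=3)            # N + K = 8 symbols
    # Pretend packets 0, 2 and 4 were lost; any 5 survivors suffice.
    survivors = [(i, coded[i]) for i in (1, 3, 5, 6, 7)]
    print(decode(survivors, n=5) == data)
```

With N = 5 and K = 3 the storage overhead is (N+K)/N = 1.6 times the original size while tolerating the loss of any 3 packets; tolerating 3 losses with plain replication would require 4 full copies.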