Changes between Version 3 and Version 4 of VolunteerDataArchival


Timestamp:
Nov 23, 2011, 3:06:22 PM (12 years ago)
Author:
davea
Comment:

--

  • VolunteerDataArchival

    v3 v4  
    7070unreliable resources:
    7171
    72 '''Replication''': a file is divided into N pieces,
    73 and each piece is stored on M hosts.
     72=== Replication ===
     73
     74With this technique, a file is divided into N chunks,
     75and each chunk is stored on M hosts.
    7476If a replica is lost, and there is another replica,
    7577that replica is uploaded to the server, then downloaded to another host.
     78By increasing M, reliability can be made arbitrarily high.
    7679
    77 '''Coding''': with Reed-Solomon coding, a file is divided into N 'packets',
     80Replication has advantages:
     81
     82 * Recovery from a failure is fast, since only one upload and download is done.
     83  This minimizes the chances of another failure occurring during recovery.
     84 * By making N large, the server storage needed for a recovery
     85  can be made arbitrarily small.
     86
     87and disadvantages:
     88
     89 * It has an extremely high space overhead,
     90  since M in general must be made large to provide reliability.
     91 * Even if individual chunks are made reliable,
     92  the probability that the file as a whole survives decreases exponentially with N.
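
How whole-file reliability under replication depends on N and M can be sketched numerically. This is only an illustrative model, not part of the wiki page: it assumes each host fails independently with probability `p_host_fail` over some fixed period and that no recovery happens in that period. The function names are hypothetical.

```python
def chunk_loss_probability(p_host_fail: float, m: int) -> float:
    """Probability that all M replicas of one chunk are lost.

    Assumes independent host failures and no recovery in the period.
    """
    return p_host_fail ** m


def file_loss_probability(p_host_fail: float, n: int, m: int) -> float:
    """Probability that at least one of the N chunks loses all M replicas."""
    chunk_survives = 1.0 - chunk_loss_probability(p_host_fail, m)
    # The file survives only if every chunk survives, so the survival
    # probability is chunk_survives ** n, which shrinks geometrically with n.
    return 1.0 - chunk_survives ** n


# Illustrative values only: 10% host failure probability, 3 replicas.
for n in (1, 10, 100, 1000):
    print(n, file_loss_probability(0.1, n, m=3))
```

Under these assumptions, raising M lowers the per-chunk loss probability, while raising N makes it more likely that some chunk is lost.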
     93
     94=== Coding ===
     95
     96With Reed-Solomon coding, a file is divided into N 'packets',
    7897and an additional K checksum packets are generated.
    7998The original data can be reconstructed from any N of these N+K packets.
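
The "any N of N+K" property can be illustrated with a toy erasure code built on polynomial interpolation over a prime field, which is the idea underlying Reed-Solomon coding. This is a minimal sketch under stated assumptions, not the scheme used by the project: the field size 257, the one-symbol-per-packet layout, and the names `encode` and `decode` are all chosen here for illustration, and production codes normally work over GF(2^8) with far more efficient algorithms. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
P = 257  # prime modulus (assumption); every byte value 0..255 is a field element


def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) points, working mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse of den (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, k):
    """Return N+K packets as (x, y) pairs: the N data symbols at x = 0..N-1
    followed by K checksum symbols at x = N..N+K-1. Requires N+K <= P.
    Checksum values lie in 0..256, so they may not fit in a single byte."""
    n = len(data)
    points = list(enumerate(data))
    checks = [(x, _interpolate_at(points, x)) for x in range(n, n + k)]
    return points + checks


def decode(packets, n):
    """Rebuild the N data symbols from any N surviving packets."""
    if len(packets) < n:
        raise ValueError("need at least N packets")
    subset = packets[:n]
    return [_interpolate_at(subset, x) for x in range(n)]


data = [72, 101, 108, 108, 111]   # N = 5 data symbols
packets = encode(data, 3)         # K = 3 checksum symbols
survivors = [packets[1], packets[3], packets[5], packets[6], packets[7]]
print(decode(survivors, 5) == data)
```

In this sketch three of the five data packets are lost, and the remaining two data packets plus the three checksum packets are enough to rebuild the original symbols.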
    8099
    81 In
     100Coding has advantages:
     101
     102 * It can provide high reliability without high space overhead.
     103   For example, if N=40 and K=20, we can tolerate 20 simultaneous host failures
     104   with a space overhead of only 50%;
     105   with replication the overhead would be 2000%.
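
The overhead figures in the example above follow from simple arithmetic, sketched here with hypothetical helper names. It assumes overhead means extra storage relative to the original file size, and that tolerating f simultaneous host failures with replication requires f+1 copies of each chunk, since all f failures could hit hosts holding the same chunk.

```python
def coding_overhead(n: int, k: int) -> float:
    """Extra storage as a fraction of file size: K checksum packets per N data packets."""
    return k / n


def replication_overhead(failures_tolerated: int) -> float:
    """Extra storage as a fraction of file size when every chunk must survive
    the given number of simultaneous host failures: f+1 copies, i.e. f extra."""
    return float(failures_tolerated)


print(f"coding (N=40, K=20): {coding_overhead(40, 20):.0%}")
print(f"replication (20 failures): {replication_overhead(20):.0%}")
```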
     106
     107and disadvantages:
     108
     109 * Regenerating a chunk requires reassembling the entire file on the server,
     110  defeating the purpose of distributed storage.
     111
     112== Hybrid reliability mechanisms ==
     113
     114Because of the above disadvantages,
     115neither replication nor coding alone is sufficient for volunteer data archival.
     116However, we can combine them in various ways that reduce the disadvantages.
     117
     118=== Multi-level coding ===
     119
     120=== Coding plus replication ===
     121
     122== The VDAB simulator ==