Changes between Version 19 and Version 20 of FileCompression


Timestamp:
Jan 16, 2012, 11:52:57 AM
Author:
davea
Comment:

--

  • FileCompression

    v19 v20  
    44== BOINC-supplied compression ==
    55=== Compression of input files === #compress-input
    6 Starting with version 5.4, the BOINC client is able to handle HTTP `Content-Encoding` types 'deflate' (zlib algorithm) and 'gzip' (gzip algorithm). The client decompresses these files 'on the fly' and stores them on disk in uncompressed form. This can be used in the following two ways.
    7 
    8 Both methods store files uncompressed on the client. If you need compression on the client, you must do it at the application level (see below).
     6Starting with version 5.4, the BOINC client is able to handle HTTP `Content-Encoding` types 'deflate'
     7(zlib algorithm) and 'gzip' (gzip algorithm).
     8The client decompresses these files 'on the fly' and stores them on disk in uncompressed form.
     9This can be used in the following two ways.
     10
     11Both methods store files uncompressed on the client.
     12If you need compression on the client, you must do it at the application level (see below).
    913
    1014==== gzip encoding ====
    11 To use this method, gzip your downloadable files, giving them a filename suffix such as '.gz'. (The name used in your `<file_info>` elements, however, is the original filename without '.gz').
    12 
    13 This method has the advantage of reducing server disk usage and server CPU load, but it will only work with 5.4+ clients. BOINC clients older than 5.4 won't be able to download files. Use the 'min_core_client_version' entry in config.xml to enforce this.
     15To use this method, gzip your downloadable files, giving them a filename suffix such as '.gz'.
     16(The name used in your `<file_info>` elements, however, is the original filename without '.gz').
     17
     18This method has the advantage of reducing server disk usage and server CPU load,
     19but it will only work with 5.4+ clients.
     20BOINC clients older than 5.4 won't be able to download files.
     21Use the 'min_core_client_version' entry in config.xml to enforce this.
    1422
    1523==== Apache mod_deflate ====
    16 You can use the Apache 2.0 mod_deflate module to automatically compress files on the fly. See http://httpd.apache.org/docs/2.0/mod/mod_deflate.html. This method will work with all BOINC clients, but it will do compression only for 5.4+ clients.
    17 
    18 You can use this in conjunction with gzip encoding because the mod_deflate module allows you to exempt certain filetypes from on-the-fly compression.
     24You can use the Apache 2.0 mod_deflate module to automatically compress files on the fly.
     25See http://httpd.apache.org/docs/2.0/mod/mod_deflate.html.
     26This method will work with all BOINC clients, but it will do compression only for 5.4+ clients.
     27
     28You can use this in conjunction with gzip encoding because the mod_deflate module
     29allows you to exempt certain filetypes from on-the-fly compression.
    1930
    2031This method increases CPU load on the web server, but this is typically not significant.
     
    6677</Directory>
    6778}}}
    68 This configuration tells Apache to redirect to the statically compressed files if the extension is vmdk, exe, dll, or pdb. All other files are compressed on-the-fly from the download direction except for files that end with `gz`,`gif`,`jpg`,`jpeg` and `png`.
     79This configuration tells Apache to redirect to the statically compressed files
     80if the extension is vmdk, exe, dll, or pdb.
     81All other files are compressed on-the-fly from the download direction except for files
     82that end with `gz`,`gif`,`jpg`,`jpeg` and `png`.
    6983
    7084An alternate way to specify the files is the following:
     
    7690</Directory>
    7791}}}
    78 This configuration tells Apache to compress only the file types `.faa` and `.mask` served from the download directory.
     92This configuration tells Apache to compress only the file types `.faa` and `.mask`
     93served from the download directory.
    7994
    8095=== Compression of output files === #compress-output
    81 If you include the `<gzip_when_done>` tag in an [wiki:XmlFormat#Files output file description], the file will be gzip-compressed after it has been generated.
    82 
    83 The gzip_when_done is only supported in client version 5.8+. If you receive files from clients that do not support the gzip_when_done flag, then you should open the files with a function similar to this to your validator/assimilator:
     96If you include the `<gzip_when_done>` tag in an [wiki:XmlFormat#Files output file description],
     97the file will be gzip-compressed after it has been generated.
     98
     99The gzip_when_done is only supported in client version 5.8+.
     100If you receive files from clients that do not support the gzip_when_done flag,
     101then you should open the files with a function similar to this to your validator/assimilator:
    84102
    85103{{{
     
    108126}
    109127}}}
    110 This will uncompress the file if it is compressed or will read it without modification if it is not compressed.
     128This will uncompress the file if it is compressed or will read it without modification
     129if it is not compressed.
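The function referred to above is elided in this view, so the following is only a minimal sketch of the detection step, not the wiki's code. It assumes that a result file can be classified by its first two bytes: gzip streams start with the magic number 0x1f 0x8b. The names `looks_gzipped`, `write_bytes` and `looks_gzipped_self_check` are hypothetical and are not part of BOINC.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of BOINC): returns true if the file at
// `path` starts with the two-byte gzip magic number 0x1f 0x8b.
// Returns false if the file cannot be opened or has fewer than two bytes.
bool looks_gzipped(const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return false;
    unsigned char magic[2] = {0, 0};
    std::size_t n = std::fread(magic, 1, sizeof(magic), f);
    std::fclose(f);
    return n == 2 && magic[0] == 0x1f && magic[1] == 0x8b;
}

// Writes `len` bytes to `path`; returns true on success.
// Only used by the self-check below.
static bool write_bytes(const std::string& path,
                        const unsigned char* data, std::size_t len) {
    std::FILE* f = std::fopen(path.c_str(), "wb");
    if (!f) return false;
    std::size_t n = std::fwrite(data, 1, len, f);
    std::fclose(f);
    return n == len;
}

// Usage example: creates two scratch files in the current directory,
// checks that only the one with the gzip header is detected, then
// deletes both files.
void looks_gzipped_self_check() {
    const unsigned char gz[] = {0x1f, 0x8b, 0x08, 0x00};
    const unsigned char txt[] = {'d', 'a', 't', 'a'};
    bool ok_gz = write_bytes("selfcheck_gz.tmp", gz, sizeof(gz));
    bool ok_txt = write_bytes("selfcheck_txt.tmp", txt, sizeof(txt));
    assert(ok_gz && ok_txt);
    (void)ok_gz;
    (void)ok_txt;
    assert(looks_gzipped("selfcheck_gz.tmp"));
    assert(!looks_gzipped("selfcheck_txt.tmp"));
    std::remove("selfcheck_gz.tmp");
    std::remove("selfcheck_txt.tmp");
}
```

A validator or assimilator could use a check like this to decide whether to decompress a result file before parsing it. An uncompressed file that happens to start with those two bytes would be misclassified, so treat this as a heuristic.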
    111130
    112131== Application-level compression ==
    113132=== Using boinc_zip === #boinc-zip
    114 You can also do compression in your application. To assist this, BOINC provides a library boinc_zip, based on the [http://www.info-zip.org Info-Zip] libraries, but combines both zip & unzip functionality in one library. Any questions/comments please email Carl Christensen  (carlgt1 at yahoo dot com)
     133You can also do compression in your application.
     134To assist this, BOINC provides a library boinc_zip,
     135based on the [http://www.info-zip.org Info-Zip] libraries,
     136but combines both zip & unzip functionality in one library.
     137Any questions/comments please email Carl Christensen (carlgt1 at yahoo dot com)
    115138
    116139This library can "co-exist" with zlib (libz) in case you need that too.
    117140
    118 Basically, it will allow you to build a library that you can link  against to provide basic zip/unzip compression functionality.  It  should only add a few hundred KB to your app (basically like  distributing `zip` & `unzip` executable binaries for different platforms).
     141Basically, it will allow you to build a library that you can link
     142against to provide basic zip/unzip compression functionality.
     143It should only add a few hundred KB to your app
     144(basically like distributing `zip` & `unzip` executable binaries for different platforms).
    119145
    120146==== Limitations ==== #boinc-zip-limitations
    121 The "unzip" functionality is there, that is you can unzip a file and it will create all directories & files in the zip file.   The "zip" functionality has some limitations due to the cross-platform nature:  mainly it doesn't provide zipping recursively (i.e.  subdirectories); and wildcard handling is done using the "boinc_filelist"  function which will be explained below.
     147The "unzip" functionality is there,
     148that is you can unzip a file and it will create all directories & files in the zip file.
     149The "zip" functionality has some limitations due to the cross-platform nature:
     150mainly it doesn't provide zipping recursively (i.e. subdirectories);
     151and wildcard handling is done using the "boinc_filelist" function which will be explained below.
    122152
    123153==== Building ==== #boinc-zip-building
    124 For Windows, you can just add the project "boinc_zip" to your  Visual Studio "Solution" or "Workspace."  Basically just "Insert Existing  Project" from the Visual Studio IDE, navigate over to the boinc/zip  directory, and it should load the appropriate files.  You can then build  "Debug" and "Release" versions of the library.  Then just add the  appropriate reference to "boinc_zip.lib" (Release build) or "boinc_zipd.lib" (Debug build) in your app.
    125 
    126 For Linux & Mac, you should be able to run "./configure" and then do a "make" to build the "libboinc_zip.a" lib that you will link against.  In extreme cases, you may need to do an "aclocal && autoconf && automake" first,  to build properly for your platform.
    127 
    128 Also, please note that boinc_zip relies on some BOINC functions that you will need (and will most likely be in your app already since they are handy) -- namely `boinc/lib/filesys.C` and `boinc/lib/util.C`.
     154For Windows, you can just add the project "boinc_zip" to your Visual Studio "Solution" or "Workspace."
     155Basically just "Insert Existing Project" from the Visual Studio IDE,
     156navigate over to the boinc/zip directory, and it should load the appropriate files.
     157You can then build "Debug" and "Release" versions of the library.
     158Then just add the appropriate reference to "boinc_zip.lib" (Release build)
     159or "boinc_zipd.lib" (Debug build) in your app.
     160
     161For Linux & Mac, you should be able to run "./configure" and then do a "make"
     162to build the "libboinc_zip.a" lib that you will link against.
     163In extreme cases, you may need to do an "aclocal && autoconf && automake" first,
     164to build properly for your platform.
     165
     166Also, please note that boinc_zip relies on some BOINC functions that you will need
     167(and will most likely be in your app already since they are handy) --
     168namely `boinc/lib/filesys.C` and `boinc/lib/util.C`.
    129169
    130170==== Using ==== #boinc-zip-using
    131 Basically, you will need to `#include "boinc_zip.h"` in your app (of course  your compiler will need to know where it is, i.e. -I../boinc/zip).
    132 
    133 Then you can just call the function `boinc_zip` with the appropriate arguments to zip or unzip.  There are three overloaded boinc_zip's provided:
     171Basically, you will need to `#include "boinc_zip.h"` in your app
     172(of course your compiler will need to know where it is, i.e. -I../boinc/zip).
     173
     174Then you can just call the function `boinc_zip` with the appropriate arguments to zip or unzip.
     175There are three overloaded boinc_zip's provided:
    134176
    135177{{{
     
    141183`bZipType` is `ZIP_IT` or `UNZIP_IT` (self-explanatory)
    142184
    143 `szFileZip` is the name of the zip file to create or extract (I assume the user will provide it with the .zip extension)
    144 
    145 The main differences are in the file parameter.  The zip library used was  exhibiting odd behavior when "coexisting" with unzip, particularly in the  wildcard handling.  So a function was made that creates a `ZipFileList` class,  which is basically a vector of filenames.  If you are just compressing a  single file, you can use either the `std::string` or `const char* szFileIn` overrides.
     185`szFileZip` is the name of the zip file to create or extract
     186(I assume the user will provide it with the .zip extension)
     187
     188The main differences are in the file parameter.
     189The zip library used was exhibiting odd behavior when "coexisting" with unzip,
     190particularly in the wildcard handling.
     191So a function was made that creates a `ZipFileList` class,
     192which is basically a vector of filenames.
     193If you are just compressing a single file,
     194you can use either the `std::string` or `const char* szFileIn` overrides.
    146195
    147196You can also just pass in a `*` or a `*.*` to zip up all files in a directory.
    148197
    149 To zip multiple files in a "mix & match" fashion, you can use the `boinc_filelist` function provided.  Basically, it's a crude pattern matching of files in a directory, but it has been useful for us on the CPDN project.  Just create a  `ZipFileList` instance, and then pass this into `boinc_filelist` as follows:
     198To zip multiple files in a "mix & match" fashion,
     199you can use the `boinc_filelist` function provided.
     200Basically, it's a crude pattern matching of files in a directory,
     201but it has been useful for us on the CPDN project.
     202Just create a `ZipFileList` instance, and then pass this into `boinc_filelist` as follows:
    150203
    151204{{{
     
    154207    const std::string directory,
    155208    const std::string pattern,
    156     ZipFileList* pList, 
     209    ZipFileList* pList,
    157210    const unsigned char ucSort = SORT_NAME | SORT_DESCENDING,
    158211    const bool bClear = true
    159212);
    160213}}}
    161 If you want to zip up all text (.txt) files in a directory, just pass in: the directory as a `std::string`, the pattern, i.e. ".txt", `&yourZipList`
    162 
    163 The last two flags are the sort order of the file list (CPDN files need to be in a certain order -- descending filenames, which is why that's the default). The default is to "clear" your list, you can set that to `false` to keep adding files to your `ZipFileList`.
    164 
    165 When you have created your `ZipFileList` just pass that pointer to `boinc_zip`. You will be able to add files in other directories this way.
    166 
    167 There is a `ziptest` Project for Windows provided to experiment, which can  also be run (the "ziptest.cpp") on Unix & Mac to experiment  with how `boinc_zip` works (just g++ with the `boinc/lib/filesys.C` & `util.C` as described above).
    168 
    169 ==== Getting boinc_zip ==== #boinc-zip-getting
    170 boinc_zip is no longer in the main boinc subversion "trunk" but resides in this "depends" branch:
    171 
    172 svn co http://boinc.berkeley.edu/svn/trunk/depends_projects/zip
    173 
    174 Note for Linux/Mac:  To build along with the other boinc libraries, you will need to add the following lines to the bottom of the '''configure.ac''' file (where the various Makefiles are listed):
    175 
    176 {{{
    177      zip/Makefile
    178      zip/zip/Makefile
    179      zip/unzip/Makefile
    180 }}}
    181 Similarly for the '''Makefile.am''' file -- add zip, zip/zip and zip/unzip to the library subdirs:
    182 
    183 {{{
    184 if ENABLE_LIBRARIES
    185    API_SUBDIRS = api lib zip zip/zip zip/unzip
    186 endif
    187 }}}
     214If you want to zip up all text (.txt) files in a directory,
     215just pass in: the directory as a `std::string`, the pattern, i.e. ".txt", `&yourZipList`
     216
     217The last two flags are the sort order of the file list
     218(CPDN files need to be in a certain order -- descending filenames, which is why that's the default).
     219The default is to "clear" your list,
     220you can set that to `false` to keep adding files to your `ZipFileList`.
     221
     222When you have created your `ZipFileList` just pass that pointer to `boinc_zip`.
     223You will be able to add files in other directories this way.
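The selection behaviour described above (pattern match, descending sort by name, optional clearing of the list) can be sketched without the library. This is an illustration under stated assumptions, not the boinc_zip implementation: `FileList` stands in for `ZipFileList`, `select_files` is a hypothetical name, the input is a list of names rather than a directory scan, and "pattern matching" is assumed to mean a plain substring test.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Stand-in for ZipFileList, which the text describes as
// "basically a vector of filenames".
using FileList = std::vector<std::string>;

// Hypothetical illustration (not the boinc_zip code): appends every name
// that contains `pattern` to `*out`, then sorts `*out` by name in
// descending order. If `clear` is true, `*out` is emptied first, which
// mirrors the default described for bClear.
void select_files(const FileList& names, const std::string& pattern,
                  FileList* out, bool clear = true) {
    assert(out != nullptr);
    if (clear) out->clear();
    for (const std::string& name : names) {
        if (name.find(pattern) != std::string::npos) {
            out->push_back(name);
        }
    }
    std::sort(out->begin(), out->end(), std::greater<std::string>());
}
```

Calling `select_files` a second time with `clear` set to `false` keeps the earlier matches and adds the new ones, which corresponds to the "mix & match" use described above.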
     224
     225There is a `ziptest` Project for Windows provided to experiment,
     226which can also be run (the "ziptest.cpp") on Unix & Mac to experiment
     227with how `boinc_zip` works (just g++ with the `boinc/lib/filesys.C` & `util.C` as described above).
     228
    188229=== Using gzip (zlib) === #gzip
    189 These basic routines may be useful if you want to compress/decompress a file using the zlib library (usually called "libz.a" and available for most platforms). Include the header file below (qcn_gzip.h) in your program, and link against libz, and you will gain two simple to use functions for gzip'ing or gunzip'ing a file. This is for simple single file or file-by-file compression or decompression (i.e. one file that is to be compressed into a .gz or decompressed back to its original uncompressed state). You can check for boinc client status if you want the ability to quit inside an operation etc.
     230These basic routines may be useful if you want to compress/decompress a file using the zlib library
     231(usually called "libz.a" and available for most platforms).
     232Include the header file below (qcn_gzip.h) in your program,
     233and link against libz, and you will gain two simple to use functions for gzip'ing or gunzip'ing a file.
     234This is for simple single file or file-by-file compression or decompression
     235(i.e. one file that is to be compressed into a .gz or decompressed back to its original uncompressed state).
     236You can check for boinc client status if you want the ability to quit inside an operation etc.
    190237
    191238qcn_gzip.h: