Changes between Initial Version and Version 1 of FileCompression


Timestamp:
Apr 24, 2007, 12:51:35 PM (17 years ago)
Author:
davea

     1[[PageOutline]]
     2= File compression =
     3== Compression of output files ==
     4
     5If you include the <gzip_when_done> tag in an [http://boinc.berkeley.edu/xml.php#file output file description], the file will be gzip-compressed after it has been generated.
     6
     7== Compression of input files ==
     8
     9Starting with version 5.4, the BOINC client is able to handle HTTP Content-Encoding types 'deflate' (zlib algorithm) and 'gzip' (gzip algorithm). The client decompresses these files 'on the fly' and stores them on disk in uncompressed form.
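As an illustration of the header handling described above, here is a minimal sketch of how a client might map a Content-Encoding value to a decoder. It is not taken from the BOINC client source: the `Decoder` enum and `decoder_for` function are hypothetical names, and the sketch assumes the header value has already been trimmed of surrounding whitespace.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Decoders needed for the two encodings named above (illustrative only).
enum class Decoder { None, Zlib, Gzip, Unsupported };

// Hypothetical helper: choose a decoder from a Content-Encoding value.
Decoder decoder_for(std::string encoding) {
    // Content-coding names are case-insensitive, so normalise to lower case.
    std::transform(encoding.begin(), encoding.end(), encoding.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (encoding.empty() || encoding == "identity") return Decoder::None;
    if (encoding == "deflate") return Decoder::Zlib;
    // "x-gzip" is the legacy alias for "gzip".
    if (encoding == "gzip" || encoding == "x-gzip") return Decoder::Gzip;
    return Decoder::Unsupported;
}
```

Treating 'x-gzip' as equivalent to 'gzip' matters here because the `AddEncoding x-gzip .gz` directive used on this page sends that older name.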
     10
     11You can use this in two ways:
     12
     13    * Use the Apache 2.0 mod_deflate module to compress files automatically on the fly. This method works with all BOINC clients, but files are compressed only for clients of version 5.4 or later. Configuration details are given under "Using mod_deflate" below.
     14    * Compress the files yourself and give them a filename suffix such as '.gz'. The name used in your <file_info> elements, however, is the original filename without '.gz'.
     15
     16      Include the following line in httpd.conf:
     17
     18{{{
     19AddEncoding x-gzip .gz
     20}}}
     21
     22
     23      This adds the content encoding to the response header so that the client decompresses the file automatically. This method has the advantage of reducing server disk usage and server CPU load, but it works only with 5.4+ clients. Use the 'min_core_version' field of the app_version table to enforce this. You can use both methods together, because mod_deflate allows you to exempt certain file types (such as '.gz') from on-the-fly compression.
     24
     25Both methods store files uncompressed on the client. If you need compression on the client, you must do it at the application level. The BOINC source distribution includes a version of the zip library designed for use by BOINC applications on any platform (see below).
     26
     27
     28== Using mod_deflate ==
     29
     30Apache 2.0 includes a module called mod_deflate.
     31You can read about it here:
     32http://httpd.apache.org/docs/2.0/mod/mod_deflate.html
     33
     34This module allows you to specify that certain files will be
     35compressed dynamically when they are sent to clients that
     36indicate they can handle it.
     37BOINC client versions 5.4 and higher can
     38decompress compressed files as they are downloaded.
     39If a BOINC client of version 5.2 or earlier requests a file,
     40the server simply does not compress it, so that
     41the client can still handle the file.
     42We originally expected to compress only a few key files because of
     43the anticipated load on the server.
     44However, the load on the server turned out to be
     45quite small, so we are compressing most of the files
     46downloaded from our servers.
     47On average, a compressed file is about 60% of its original size.
     48Compressing on the fly added only about 5%
     49to the system CPU utilization (this will vary
     50with the power of your servers).
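The negotiation described above (the server compresses only for clients that say they can handle it) is driven by the request's Accept-Encoding header. The following is a rough sketch of that check, not mod_deflate's actual code: `client_accepts` and `normalise` are hypothetical names, the `coding` argument is assumed to be lower case, and the `*` wildcard is deliberately not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>

// Lower-case a header value and drop all whitespace.
static std::string normalise(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        if (!std::isspace(c)) out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

// Hypothetical helper: true if the Accept-Encoding value lists `coding`
// with a non-zero quality value.
bool client_accepts(const std::string& accept_encoding, const std::string& coding) {
    std::istringstream items(normalise(accept_encoding));
    std::string item;
    while (std::getline(items, item, ',')) {
        const std::size_t semi = item.find(';');
        if (item.substr(0, semi) != coding) continue;
        if (semi == std::string::npos) return true;    // no q-value means q=1
        const std::string param = item.substr(semi + 1);
        if (param.rfind("q=", 0) != 0) return true;    // unknown parameter: accept
        // strtod yields 0 for an unparsable value, which is treated as "not accepted".
        return std::strtod(param.c_str() + 2, nullptr) > 0.0;
    }
    return false;  // coding not listed at all
}
```

A client that sends no Accept-Encoding header, or one that lists neither 'gzip' nor 'deflate', gets `false` here, which corresponds to the server sending the file uncompressed.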
     51
     52You should read the Apache 2.0 documentation for this
     53module to make sure you understand it.
     54For reference, our httpd.conf file includes the following for these changes:
     55{{{
     56# Enable module
     57LoadModule deflate_module modules/mod_deflate.so
     58
     59# Log file compression
     60DeflateFilterNote Input instream
     61DeflateFilterNote Output outstream
     62DeflateFilterNote Ratio ratio
     63
     64LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' deflate
     65CustomLog logs/deflate_log deflate
     66
     67# Use low settings for compression to make sure impact on server is low
     68DeflateMemLevel 2
     69DeflateCompressionLevel 2
     70
     71Alias /boinc/download /path/to/files/download
     72
     73<Directory /path/to/files/download>
     74SetOutputFilter DEFLATE
     75SetEnvIfNoCase Request_URI \.(?:gz|gif|jpg|jpeg|png)$ no-gzip dont-vary
     76</Directory>
     77}}}
     78
     79This configuration tells Apache to compress all files served from
     80the download directory except for files that end with gz, gif, jpg,
     81jpeg and png.
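To check which request paths the SetEnvIfNoCase pattern would exempt, the same expression can be tried out in a small program. This is only a sketch: `exempt_from_deflate` is a hypothetical name, and it uses std::regex (ECMAScript grammar) rather than the regex engine Apache uses, on the assumption that the two agree for a pattern this simple.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the SetEnvIfNoCase pattern from the
// configuration would match this request path (and so set no-gzip).
bool exempt_from_deflate(const std::string& request_uri) {
    // Same pattern as in the config; icase mirrors SetEnvIfNoCase.
    static const std::regex pattern(R"(\.(?:gz|gif|jpg|jpeg|png)$)",
                                    std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(request_uri, pattern);
}
```

For example, `exempt_from_deflate("/boinc/download/input.gz")` is true, while a path ending in `.faa` is not matched and would therefore be compressed.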
     82An alternate way to specify the files is the following:
     83{{{
     84Alias /boinc/download /path/to/files/download
     85
     86<Directory /path/to/files/download>
     87AddOutputFilter DEFLATE .faa .mask
     88</Directory>
     89}}}
     90This configuration tells Apache to compress only the file types
     91.faa and .mask served from the download directory.
     92
     93== Using boinc_zip ==
     94
     95You can also do compression in your application.
     96To assist with this, BOINC provides a library,
     97boinc_zip, which is based on the "Info-Zip" libraries (http://www.info-zip.org)
     98but combines both zip and unzip functionality in one library.
     99Please email any questions or comments to carlc@comlab.ox.ac.uk
     100
     101This library can "co-exist" with zlib (libz) in case you need that too.
     102
     103It allows you to build a library that you can link
     104against to provide basic zip/unzip compression functionality.  It
     105should add only a few hundred KB to your app (much like
     106distributing zip and unzip executables for different platforms).
     107
     108Limitations:  the "unzip" functionality is fully available; that is, you can unzip
     109a file and it will create all directories and files in the zip file.
     110The "zip" functionality has some limitations due to its cross-platform
     111nature:  mainly, it doesn't zip recursively (i.e. it does not descend into
     112subdirectories), and wildcard handling is done using the "boinc_filelist"
     113function, which is explained below.
     114
     115Building:  For Windows, you can just add the project "boinc_zip" to your
     116Visual Studio "Solution" or "Workspace."  Basically just "Insert Existing
     117Project" from the Visual Studio IDE, navigate over to the boinc/zip
     118directory, and it should load the appropriate files.  You can then build
     119"Debug" and "Release" versions of the library.  Then just add the
     120appropriate reference to "boinc_zip.lib" (Release build) or "boinc_zipd.lib"
     121(Debug build) in your app.
     122
     123For Linux & Mac, you should be able to run "./configure" and then do a "make"
     124to build the "libboinc_zip.a" lib that you will link against.  In extreme
     125cases, you may need to do an "aclocal && autoconf && automake" first,
     126to build properly for your platform.
     127
     128Also, please note that boinc_zip relies on some BOINC functions that you
     129will need (and will most likely be in your app already since they are handy)
     130 -- namely boinc/lib/filesys.C and boinc/lib/util.C
     131
     132Using:
     133Basically, you will need to #include "boinc_zip.h" in your app (of course
     134your compiler will need to know where it is, i.e. -I../boinc/zip).
     135
     136Then you can just call the function "boinc_zip" with the appropriate arguments
     137to zip or unzip.  Three overloads of boinc_zip are provided:
     138{{{
     139int boinc_zip(int bZipType, const std::string szFileZip,
     140     const ZipFileList* pvectszFileIn);
     141int boinc_zip(int bZipType, const std::string szFileZip,
     142     const std::string szFileIn);
     143int boinc_zip(int bZipType, const char* szFileZip, const char* szFileIn);
     144}}}
     145bZipType is ZIP_IT or UNZIP_IT (self-explanatory).
     146
     147szFileZip is the name of the zip file to create or extract
     148(the caller is expected to supply the .zip extension).
     149
     150The main differences are in the file parameter.  The zip library used was
     151exhibiting odd behavior when "coexisting" with unzip, particularly in the
     152wildcard handling.  So a function was written that fills a "ZipFileList" object,
     153which is basically a vector of filenames.  If you are just compressing a
     154single file, you can use either the std::string or const char* szFileIn overload.
     155
     156You can also just pass in a "*" or a "*.*" to zip up all files in a directory.
     157
     158To zip multiple files in a "mix & match" fashion, you can use the boinc_filelist
     159function provided.  Basically, it's a crude pattern matching of files in a
     160directory, but it has been useful for us on the CPDN project.  Just create a
     161ZipFileList instance, and then pass this into boinc_filelist as follows:
     162{{{
     163bool boinc_filelist(const std::string directory,
     164                  const std::string pattern,
     165                  ZipFileList* pList,
     166                  const unsigned char ucSort = SORT_NAME | SORT_DESCENDING,
     167                  const bool bClear = true);
     168}}}
     169If you want to zip up all text (.txt) files in a directory, just pass in
     170the directory as a std::string, the pattern (e.g. ".txt"), and &yourZipList.
     171
     172The ucSort parameter sets the sort order of the file list (CPDN files need to be
     173in a certain order, descending filenames, which is why that's the default).
     174By default bClear is true, which clears your list first; set it to "false" to keep adding
     175files to your "ZipFileList".
     176
     177When you have created your "ZipFileList" just pass that pointer to boinc_zip.
     178You will be able to add files in other directories this way.
     179
     180A "ziptest" project for Windows is provided for experimenting; its source
     181("ziptest.cpp") can also be built and run on Unix and Mac to experiment
     182with how boinc_zip works (just compile it with g++ along with boinc/lib/filesys.C and util.C, as
     183described above).