[[PageOutline]]

= File compression =

== Compression of output files ==

If you include the `` tag in an [XmlFormat#Files output file description], the file will be gzip-compressed after it has been generated.

== Compression of input files ==

Starting with version 5.4, the BOINC client can handle the HTTP `Content-Encoding` types 'deflate' (zlib algorithm) and 'gzip' (gzip algorithm). The client decompresses these files on the fly and stores them on disk in uncompressed form.

You can use this in two ways:

 * Use the Apache 2.0 mod_deflate module to compress files automatically on the fly. This method works with all BOINC clients, but files are compressed only for 5.4+ clients. Configuration details are given below.
 * Compress the files yourself and give them a filename suffix such as '.gz'. The name used in your `` elements, however, is the original filename without '.gz'. Include the following line in `httpd.conf`:
{{{
AddEncoding x-gzip .gz
}}}
 This adds the content encoding to the response header so that the client decompresses the file automatically. This method reduces server disk usage and server CPU load, but it works only with 5.4+ clients. Use the 'min_core_version' field of the app_version table to enforce this.

You can combine the two methods, because mod_deflate lets you exempt certain file types from on-the-fly compression.

Both methods store files uncompressed on the client. If you need compression on the client, you must do it at the application level. The BOINC source distribution includes a version of the zip library designed for use by BOINC applications on any platform (see below).

== Using mod_deflate ==

Apache 2.0 includes a module called mod_deflate. You can read about it here: http://httpd.apache.org/docs/2.0/mod/mod_deflate.html

This module lets you specify that certain files are compressed dynamically when they are sent to clients that indicate they can handle it.
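
A client indicates which encodings it can handle through the `Accept-Encoding` header of its HTTP request. As a rough illustration of that kind of check (this is not mod_deflate's actual code), the sketch below tests whether a header value lists a given coding with a non-zero quality value. The names `client_accepts` and `trim_lower` are hypothetical, and the `*` wildcard is deliberately ignored to keep the example short.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: strip surrounding whitespace and lower-case a token.
static std::string trim_lower(const std::string& s) {
    size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    std::string out = s.substr(b, e - b);
    std::transform(out.begin(), out.end(), out.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return out;
}

// Hypothetical helper (not part of Apache or BOINC): returns true if the
// Accept-Encoding header value lists `coding` with a q-value above zero.
// Simplification: the "*" wildcard is not handled.
bool client_accepts(const std::string& header, const std::string& coding) {
    std::istringstream items(header);
    std::string item;
    while (std::getline(items, item, ',')) {
        std::string name = item;
        double q = 1.0;  // a coding without an explicit q-value is acceptable
        size_t semi = item.find(';');
        if (semi != std::string::npos) {
            name = item.substr(0, semi);
            std::string param = trim_lower(item.substr(semi + 1));
            if (param.compare(0, 2, "q=") == 0) {
                q = std::atof(param.c_str() + 2);
            }
        }
        if (trim_lower(name) == trim_lower(coding) && q > 0.0) {
            return true;
        }
    }
    return false;
}
```

For example, `client_accepts("deflate, gzip", "gzip")` is true, while a request that sends no `Accept-Encoding` value, or one that lists `gzip;q=0`, is treated as unable to accept gzip and would be served the file uncompressed.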

BOINC client 5.4 and higher can decompress compressed files as they are downloaded. If a BOINC client of version 5.2 or earlier requests a file, the server simply does not compress it, so that the client can still handle the file.

We expected to compress only a few key files because of the anticipated load on the server. However, the load turned out to be quite small, so we now compress most of the files downloaded from our servers. On average, a compressed file is about 60% of its original size. Compressing on the fly added only about 5% to the system CPU utilization (this will vary with the power of your servers).

Read the Apache 2.0 documentation for this module to make sure you understand it. Our `httpd.conf` file includes the following for these changes:
{{{
# Enable module
LoadModule deflate_module modules/mod_deflate.so

# Log file compression
DeflateFilterNote Input instream
DeflateFilterNote Output outstream
DeflateFilterNote Ratio ratio
LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' deflate
CustomLog logs/deflate_log deflate

# Use low settings for compression to make sure impact on server is low
DeflateMemLevel 2
DeflateCompressionLevel 2

Alias /boinc/download /path/to/files/download
SetOutputFilter DEFLATE
SetEnvIfNoCase Request_URI \.(?:gz|gif|jpg|jpeg|png)$ no-gzip dont-vary
}}}

This configuration tells Apache to compress all files served from the download directory except for files whose names end with `gz`, `gif`, `jpg`, `jpeg` or `png`. To limit these directives to the download directory, enclose them in a `<Directory>` section for that path.

An alternate way to specify the files is the following:
{{{
Alias /boinc/download /path/to/files/download
AddOutputFilter DEFLATE .faa .mask
}}}

This configuration tells Apache to compress only files of type `.faa` and `.mask` served from the download directory.

== Using boinc_zip ==

You can also do compression in your application.
To help with this, BOINC provides a library, boinc_zip, which is based on the "Info-Zip" libraries (http://www.info-zip.org) but combines zip and unzip functionality in one library. For questions or comments, please email carlc@comlab.ox.ac.uk.

This library can coexist with zlib (libz) in case you need that too. It lets you build a library that you can link against to get basic zip/unzip compression functionality. It should add only a few hundred KB to your app (roughly like distributing the `zip` and `unzip` executables for each platform).

=== Limitations ===

The "unzip" functionality is complete: you can unzip a file, and it will create all the directories and files in the zip file.

The "zip" functionality has some limitations due to its cross-platform nature. Mainly, it doesn't zip recursively (i.e. it does not descend into subdirectories), and wildcard handling is done with the `boinc_filelist` function, which is explained below.

=== Building ===

For Windows, add the project "boinc_zip" to your Visual Studio "Solution" or "Workspace": choose "Insert Existing Project" in the Visual Studio IDE, navigate to the boinc/zip directory, and it should load the appropriate files. You can then build "Debug" and "Release" versions of the library. Then add the appropriate reference to "boinc_zip.lib" (Release build) or "boinc_zipd.lib" (Debug build) in your app.

For Linux and Mac, you should be able to run "./configure" and then "make" to build the "libboinc_zip.a" library that you will link against. In extreme cases, you may need to run "aclocal && autoconf && automake" first to build properly for your platform.

Also note that boinc_zip relies on some BOINC functions that you will need to link in (they are most likely in your app already, since they are handy), namely `boinc/lib/filesys.C` and `boinc/lib/util.C`.

=== Using ===

You will need to `#include "boinc_zip.h"` in your app (your compiler must be told where to find it, e.g. with -I../boinc/zip). Then call the function `boinc_zip` with the appropriate arguments to zip or unzip. Three overloads of `boinc_zip` are provided:
{{{
int boinc_zip(int bZipType, const std::string szFileZip,
              const ZipFileList* pvectszFileIn);
int boinc_zip(int bZipType, const std::string szFileZip,
              const std::string szFileIn);
int boinc_zip(int bZipType, const char* szFileZip,
              const char* szFileIn);
}}}

 * `bZipType` is `ZIP_IT` or `UNZIP_IT`.
 * `szFileZip` is the name of the zip file to create or extract (the caller is expected to supply the `.zip` extension).

The overloads differ mainly in the file parameter. The zip library exhibited odd behavior when coexisting with unzip, particularly in wildcard handling, so a function was written that fills a `ZipFileList` object, which is basically a vector of filenames. If you are compressing a single file, you can use either the `std::string` or the `const char* szFileIn` overload. You can also pass in `*` or `*.*` to zip up all files in a directory.

To zip multiple files in a "mix & match" fashion, use the `boinc_filelist` function. It does only crude pattern matching of files in a directory, but it has been useful for us on the CPDN project. Create a `ZipFileList` instance and pass it to `boinc_filelist`, which is declared as follows:
{{{
bool boinc_filelist(const std::string directory,
                    const std::string pattern,
                    ZipFileList* pList,
                    const unsigned char ucSort = SORT_NAME | SORT_DESCENDING,
                    const bool bClear = true);
}}}

For example, to zip up all text (.txt) files in a directory, pass in the directory as a `std::string`, the pattern (i.e. ".txt"), and `&yourZipList`.

The last two parameters are optional. `ucSort` sets the sort order of the file list (CPDN files need to be in a certain order, descending filenames, which is why that is the default). `bClear` defaults to clearing your list first; set it to `false` to keep adding files to your `ZipFileList`, which also lets you add files from other directories.

When you have built your `ZipFileList`, pass a pointer to it to `boinc_zip`.

A `ziptest` project for Windows is provided for experimenting. Its source, "ziptest.cpp", can also be built on Unix and Mac to see how `boinc_zip` works (compile it with g++ together with `boinc/lib/filesys.C` and `util.C`, as described above).