Thread 'Great Idea! Save ~50% of bandwidth for free !'

Message boards : BOINC client : Great Idea! Save ~50% of bandwidth for free !
CobraPL

Joined: 1 Oct 06
Posts: 6
Poland
Message 8551 - Posted: 5 Mar 2007, 15:21:04 UTC
Last modified: 5 Mar 2007, 15:24:28 UTC

I just realized that BOINC uses gzip compression. I tried algorithms other than deflate (zip and gzip both use deflate). The best one is… PPMd !!! It even beats LZMA on project files (slightly on compression ratio, 2x-3x on compression speed). LZMA is the default method in 7-zip.

Just download the popular, free 7-zip. Unpack the *.gz files downloaded by BOINC and recompress them with 7-zip as a 7z archive using the PPMd method (a 16 MB dictionary is fine). You will see a big difference.

7-zip is open-source. The BOINC authors can contact Igor Pavlov* for (permission to use) the source code. He uses a modified PPMd, which is probably better than the original. Also, anyone can google for the source code of a PPMd implementation.
* - author of 7-zip.

So, let's check it out, there are terabytes to save !!!

Peter Skrodzewicz
Contact: PUT_MY_LAST_NAME_SHOWN_ABOVE@gmail.com
ID: 8551
Nicolas

Joined: 19 Jan 07
Posts: 1179
Argentina
Message 8599 - Posted: 7 Mar 2007, 18:40:06 UTC

BOINC doesn't use any compression. Lazy projects can make BOINC use gzip automatically, but still, most of the time projects do compression from within the application, and thus can use any compression they want.

I use LZMA on Renderfarm@Home.
ID: 8599
CobraPL

Joined: 1 Oct 06
Posts: 6
Poland
Message 8861 - Posted: 18 Mar 2007, 21:08:02 UTC
Last modified: 18 Mar 2007, 21:10:27 UTC

Be advised: it looks like 7-zip LZMA beats WinRAR and PPMd:
http://boinc.bakerlab.org/rosetta/forum_thread.php?id=2992

bbdep02.May.sortlib.gz - 6665378 bytes
bbdep02.May.sortlib - 50417532 bytes (uncompressed)
bbdep02.May.7z - 3599799 bytes (PPMd, 16 MB dictionary; 3.5 MB/s compression speed on a 2 GHz Core Duo)
bbdep02.May.rar - 3819968 bytes
bbdep02.May.sortlib.bz2 - 3688826 bytes, but bzip2 is slow...
bbdep02.May.sortlib.7z - 2949796 bytes !!! (7-zip LZMA, 16 MB dictionary)
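
To put the listing above into ratios, here is a minimal Python sketch. The byte counts are copied from the listing; the labels and the helper name `percent_of` are just illustrative choices, not anything from BOINC or 7-zip.

```python
# Sizes in bytes, copied from the listing above.
ORIGINAL = 50_417_532  # bbdep02.May.sortlib, uncompressed

compressed = {
    "gzip (deflate)": 6_665_378,
    "7z PPMd 16 MB": 3_599_799,
    "rar": 3_819_968,
    "bzip2": 3_688_826,
    "7z LZMA 16 MB": 2_949_796,
}


def percent_of(size: int, reference: int) -> float:
    """Return size as a percentage of reference."""
    return 100.0 * size / reference


gzip_size = compressed["gzip (deflate)"]

# Print the archives from smallest to largest.
for name, size in sorted(compressed.items(), key=lambda item: item[1]):
    print(
        f"{name:15s} {size:>10,d} bytes  "
        f"{percent_of(size, ORIGINAL):5.1f}% of original  "
        f"{percent_of(size, gzip_size):5.1f}% of the .gz"
    )
```

For this one file the LZMA archive comes out at roughly 44% of the size of the .gz, which is where a figure like "~50% of bandwidth" comes from. Other files may behave differently.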

So, all project maintainers: please use LZMA for data files. Also try the UPX exe packer with the "--lzma --best" options for EXE files.

About UPX case:
http://boinc.bakerlab.org/rosetta/forum_thread.php?id=2852

Project maintainers can save gigabytes of bandwidth !!! Really.
ID: 8861
Nicolas

Joined: 19 Jan 07
Posts: 1179
Argentina
Message 8865 - Posted: 19 Mar 2007, 3:08:47 UTC

Heh, I once had an idea for a BOINC project to compress files by trying out an insane number of different algorithms, settings, and pre-filters. For example, for one particular text file I had, converting from Unix line endings to DOS line endings obviously made the file a bit bigger, but it also made LZMA (PPMd) get better results on it! Any such "filter" can be tried as long as it is reversible; it may give better or worse results. The idea is to try lots of combinations of filters and compression algorithms and keep the smallest :)
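
A minimal sketch of that "try every combination and keep the smallest" idea, using only Python's standard library. The filter list, the compressor list and the function name `smallest_encoding` are assumptions made for illustration. PPMd is not in the standard library, so deflate, bzip2 and LZMA stand in for the candidates.

```python
import bz2
import lzma
import zlib

# Candidate pre-filters as (name, forward, inverse).
# Every candidate is verified by a round trip below, so a filter that
# does not restore a given input exactly is skipped automatically.
FILTERS = [
    ("identity", lambda d: d, lambda d: d),
    (
        "lf-to-crlf",
        lambda d: d.replace(b"\n", b"\r\n"),
        lambda d: d.replace(b"\r\n", b"\n"),
    ),
]

# Candidate compressors as (name, compress, decompress).
COMPRESSORS = [
    ("deflate", lambda d: zlib.compress(d, 9), zlib.decompress),
    ("bzip2", lambda d: bz2.compress(d, 9), bz2.decompress),
    ("lzma", lambda d: lzma.compress(d, preset=9), lzma.decompress),
]


def smallest_encoding(data: bytes):
    """Try each filter/compressor pair and return the smallest result.

    Returns (filter_name, compressor_name, packed_bytes).
    """
    best = None
    for filter_name, forward, inverse in FILTERS:
        filtered = forward(data)
        for comp_name, compress, decompress in COMPRESSORS:
            packed = compress(filtered)
            # Accept a candidate only if it restores the input exactly.
            if inverse(decompress(packed)) != data:
                continue
            if best is None or len(packed) < len(best[2]):
                best = (filter_name, comp_name, packed)
    return best


if __name__ == "__main__":
    sample = b"ATOM      1  N   MET A   1\n" * 2000
    filter_name, comp_name, packed = smallest_encoding(sample)
    print(filter_name, comp_name, len(sample), "->", len(packed))
```

The receiver has to know which pair won, so a real format would store the two names (or an index) next to the packed bytes.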

Compressing multi-GB files wouldn't be easy though, at least not if the project is public. The full uncompressed file (or one compressed with a known-but-maybe-not-optimal setting) would need to be sent to all clients... Anyway, it was just an idea, I never wrote any real code for it.
ID: 8865
MikeMarsUK

Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 8872 - Posted: 19 Mar 2007, 8:47:38 UTC


There are also CPU costs involved with better compression algorithms. If it saves 10% of download bandwidth over a lesser algorithm, but increases decompression time, then for some people it's a good thing (dial-up users), but for others it's a bad thing (uncapped ADSL users). Similarly the project servers may have a similar balancing act - is it CPU or bandwidth which is the bottleneck in a particular situation?

ID: 8872
CobraPL

Joined: 1 Oct 06
Posts: 6
Poland
Message 8885 - Posted: 19 Mar 2007, 15:00:46 UTC - in response to Message 8872.  
Last modified: 19 Mar 2007, 15:02:48 UTC


There are also CPU costs involved with better compression algorithms. If it saves 10% of download bandwidth over a lesser algorithm, but increases decompression time, then for some people it's a good thing (dial-up users), but for others it's a bad thing (uncapped ADSL users). Similarly the project servers may have a similar balancing act - is it CPU or bandwidth which is the bottleneck in a particular situation?

LZMA features:

* Compressing speed: 500 KB/s on 1 GHz CPU
* Decompressing speed:
  - 8-12 MB/s on 1 GHz Intel Pentium 3 or AMD Athlon.
  - 500-1000 KB/s on 100 MHz ARM, MIPS, PowerPC or other simple RISC CPU.
* Small memory requirements for decompressing: 8-32 KB + DictionarySize
* Small code size for decompressing: 2-8 KB (depending on speed optimizations)

The LZMA decoder uses only integer operations and can be implemented on any modern 32-bit CPU (or on a 16-bit CPU with some conditions).


And the gain is not 10%, but ~50%. Big projects use gigabytes or terabytes of bandwidth monthly. IMO that is quite expensive...
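
The CPU-versus-bandwidth question can be roughed out with the numbers already in this thread. The sketch below takes the bbdep02 file sizes posted earlier and the lower 8 MB/s decompression figure from the list above. It assumes that figure refers to uncompressed output, that download and decompression do not overlap, and that gzip decompression costs nothing (which is unfair to LZMA). The function names are made up for illustration.

```python
GZ_SIZE = 6_665_378      # bbdep02.May.sortlib.gz, bytes
LZMA_SIZE = 2_949_796    # bbdep02.May.sortlib.7z (LZMA), bytes
RAW_SIZE = 50_417_532    # uncompressed, bytes

# Assumption: "8 MB/s on a 1 GHz CPU" means 8 MiB of output per second.
LZMA_DECODE_BYTES_PER_S = 8 * 1024 * 1024


def lzma_decode_seconds(raw_size: int = RAW_SIZE) -> float:
    """Estimated time to decompress the LZMA archive."""
    return raw_size / LZMA_DECODE_BYTES_PER_S


def download_seconds_saved(link_bytes_per_s: float) -> float:
    """Download time saved by fetching the LZMA file instead of the .gz."""
    return (GZ_SIZE - LZMA_SIZE) / link_bytes_per_s


def break_even_link_bytes_per_s() -> float:
    """Link speed at which the saved download time equals the decode time.

    On slower links LZMA wins overall, on faster links the extra
    CPU time outweighs the shorter download (under the assumptions above).
    """
    return (GZ_SIZE - LZMA_SIZE) / lzma_decode_seconds()


if __name__ == "__main__":
    dialup = 56_000 / 8  # 56 kbit/s modem, in bytes per second
    print(f"decode time: {lzma_decode_seconds():.1f} s")
    print(f"saved on dial-up: {download_seconds_saved(dialup):.0f} s")
    print(f"break-even link: {break_even_link_bytes_per_s() * 8 / 1e6:.1f} Mbit/s")
```

This only covers the client side. A project server would need its own estimate using the compression speed rather than the decompression speed.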
ID: 8885


Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.