[personal profile] fanf
I'm baffled. gzip does its thing by calling the zlib deflate() function, though it wraps the compressed stream differently than deflate() normally does. I have an application where I don't need gzip's extra metadata, so I wrote a small program called deflate. The two should produce output of the same size, apart from a small constant overhead for gzip, but they don't. As far as I can tell, the only differences are minor details of buffer handling.
$ jot 40000 | gzip -9c | wc -c
   87733
$ jot 40000 | ./deflate | wc -c
   86224
$ jot 20000 | sed 's/.*/print "hello, world &"/' | ./lua-5.1.2/src/luac -s -o - - | gzip -9c | wc -c
   81470
$ jot 20000 | sed 's/.*/print "hello, world &"/' | ./lua-5.1.2/src/luac -s -o - - | ./deflate | wc -c
   82687
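For comparison, here is a rough sketch of what "a small constant overhead" looks like when the same zlib deflate stream is wrapped three ways. It uses Python's zlib module rather than the deflate program from the post (whose source isn't shown), and the jot() helper is a hypothetical stand-in for `jot n`. If both tools really shared one deflate implementation, the gap should look like this rather than like the numbers above.

```python
import zlib


def jot(n):
    """Hypothetical stand-in for `jot n`: the integers 1..n, one per line."""
    return "".join(f"{i}\n" for i in range(1, n + 1)).encode("ascii")


def deflate_size(data, wbits, level=9):
    """Compress data with zlib's deflate and return the output length.

    wbits selects the wrapper: -15 is raw deflate, 15 is the zlib
    wrapper, 31 is the gzip wrapper (with a minimal header).
    """
    c = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return len(c.compress(data) + c.flush())


data = jot(40000)
sizes = {
    name: deflate_size(data, wbits)
    for name, wbits in [("raw", -15), ("zlib", 15), ("gzip", 31)]
}
for name, size in sizes.items():
    print(name, size)
```

The compressed payload is identical in all three cases; only the header and trailer differ, so the sizes differ by a fixed number of bytes regardless of input.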

Date: 2008-03-04 21:30 (UTC)
From: [identity profile] ewx.livejournal.com
At least in etch's gzip (1.3.5), there is a separate deflate.c; it doesn't use zlib. So I don't think it's necessarily surprising that the output differs.

Lempel-Ziv and jot

Date: 2008-03-05 00:27 (UTC)
From: [identity profile] alsuren.livejournal.com
For future reference: I don't think the output of jot 40000 is the best way to test the performance of an LZ-based compression library. In this special case, the repeated symbols are limited to about 5 chars in length, so it's impossible to build up a decent symbol table. It's interesting to see that (in this special case) the algorithms with smaller symbol tables can actually perform better.

for i in 1 2 3 4 5 6 7 8 9;
do
echo -n "${i} " ;
jot 40000 | gzip -${i}c | wc -c;
done
1 84206
2 69436
3 69850
4 87738
5 87578
6 87733
7 87733
8 87733
9 87733

Maybe cat /usr/share/dasher/training_english_GB.txt would be more representative.
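The level sweep above can also be approximated against zlib directly, which is one way to check whether the odd behaviour at levels 2 and 3 is specific to gzip's own deflate.c. This is only a sketch: it uses Python's zlib module, the jot() helper is a hypothetical stand-in for `jot n`, and the sizes it prints aren't expected to match the gzip figures exactly.

```python
import zlib


def jot(n):
    """Hypothetical stand-in for `jot n`: the integers 1..n, one per line."""
    return "".join(f"{i}\n" for i in range(1, n + 1)).encode("ascii")


def sweep(data):
    """Return {level: compressed size} for zlib levels 1 through 9."""
    return {level: len(zlib.compress(data, level)) for level in range(1, 10)}


results = sweep(jot(40000))
for level, size in results.items():
    print(level, size)
```

Passing a different byte string to sweep(), such as the contents of an English text file, gives the more representative comparison suggested above.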
