[Zlib-devel] Compressing volatile data?
jbowler at acm.org
Thu Sep 2 17:18:17 EDT 2010
From: Gary Cameron
>I was speculating on a possible cause - if parts of the data in the source
>buffer being compressed is still "active" and changing during the
>compression, could this cause a decompression failure?
Yes; deflate accesses the input buffer multiple times. You must copy it
first, or the compressed result may be inconsistent (most likely, as Greg
says, checksum failures; I don't think more serious problems would result).
Because the copy itself takes time, you will end up with a snapshot of the
changing bytes taken at a sequence of points in time rather than at a single
instant, but that's presumably OK.
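
To show what such a checksum failure looks like on the decompressing side,
here is a minimal Python sketch using the standard zlib module. It does not
reproduce the race itself: it simply damages the Adler-32 trailer of a valid
stream by hand (an assumption made purely for illustration), and the helper
name corrupt_trailer is hypothetical.

```python
import zlib

def corrupt_trailer(stream: bytes) -> bytes:
    """Flip one bit in the last byte of a zlib stream (part of its Adler-32)."""
    return stream[:-1] + bytes([stream[-1] ^ 0x01])

original = b"volatile data " * 100
good = zlib.compress(original)
bad = corrupt_trailer(good)

# The untouched stream round-trips.
assert zlib.decompress(good) == original

# The damaged stream carries the same deflate data, but the stored checksum
# no longer matches what inflate computes, so zlib reports an error.
try:
    zlib.decompress(bad)
    checksum_failed = False
except zlib.error:
    checksum_failed = True
```

A buffer that changed under deflate would be expected to surface the same
way: the stream decodes, but the final check rejects it.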
Copying the data is no big deal: copy it a piece at a time into a small
(typically stack) buffer of, say, 256 bytes and call zlib on each piece in
turn. In fact doing this may, somewhat counter-intuitively, speed up
deflate. If you care about speed, experiment with the buffer size: start
with 256 bytes and go up and down until you stop getting any speed gains
(you have to test smaller sizes as well as bigger ones).
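
The copy-then-compress loop can be sketched as follows, in Python rather
than C so that it runs as-is. This is only an illustration: the function
name compress_snapshot and the sample data are made up, and the 256-byte
default is just the starting point suggested above.

```python
import zlib

def compress_snapshot(source, chunk_size=256, level=6):
    """Compress `source` via private fixed-size copies, one chunk at a time."""
    view = memoryview(source)
    compressor = zlib.compressobj(level)
    pieces = []
    for start in range(0, len(view), chunk_size):
        # bytes(...) makes an immutable private copy of this chunk, so later
        # writes to `source` cannot affect what the compressor was given.
        chunk = bytes(view[start:start + chunk_size])
        pieces.append(compressor.compress(chunk))
    pieces.append(compressor.flush())
    return b"".join(pieces)

live = bytearray(b"sensor reading 0001\n" * 500)
packed = compress_snapshot(live)
live[0:6] = b"SENSOR"   # the source keeps changing after compression
restored = zlib.decompress(packed)
```

In C the same loop would memcpy each piece into the stack buffer and call
deflate() with next_in pointing at that buffer instead of at the live data.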
deflate repeatedly accesses the previous (up to) 32 Kbytes of input data
looking for repeated strings. It uses an internal hash table to speed up the
search, but the hash table only indexes the input data: deflate goes back to
the bytes the hash entry points at and compares them to try for a longer
match. That means that if the data in the buffer changes after deflate has
already compressed it to the output, a later reference to the buffer may get
a false match (against the newly altered bytes). Within inflate the buffer is
regenerated by decompressing the input stream and doesn't change, so those
later matches copy the original data (there is no record of the change), the
output differs from what deflate saw, and the checksums won't match.
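
The 32 Kbyte look-back can be observed from outside with a small Python
experiment. This is only a sketch of the window's effect, not of the failure
itself: the function name cost_of_repeat, the block size and the gap sizes
are arbitrary choices for illustration. It measures how many extra compressed
bytes a second copy of a random block costs when the first copy is near
(inside the window) versus far (outside it).

```python
import random
import zlib

def cost_of_repeat(gap: int, block_size: int = 4000) -> int:
    """Extra compressed bytes needed to append a second copy of a block
    that first appeared `gap` bytes of unrelated data earlier."""
    rng = random.Random(0)
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    filler = bytes(rng.getrandbits(8) for _ in range(gap))
    without_repeat = len(zlib.compress(block + filler))
    with_repeat = len(zlib.compress(block + filler + block))
    return with_repeat - without_repeat

near = cost_of_repeat(gap=1000)    # first copy is still inside the window
far = cost_of_repeat(gap=40000)    # first copy has fallen out of the window
```

The near repeat should cost far fewer bytes than the far one, because only
in the near case can deflate encode the second copy as references back to
the first.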
John Bowler <jbowler at acm.org>