[Zlib-devel] zlib ideas for 1.2.5 (fwd)

Wed Feb 24 01:44:17 EST 2010

Here are some ideas from Carsten Haitzler (author of the Enlightenment
window manager).

Vincent Torri

---------- Forwarded message ----------
Date: Wed, 24 Feb 2010 17:35:47 +1100
From: "Carsten Haitzler (The Rasterman)" <raster at rasterman.com>
To: vtorri at univ-evry.fr
Subject: zlib ideas

Ok. One thing I do use zlib "dumbly" for is image data. As such I don't massage
this data before I throw it at zlib. I don't want to bother with a "massaging"
stage (making a copy of it in a different format so that spatially close pixels
are close in linear space), as this just adds a stage and overhead to
compression/decompression, yes, even if the files end up bigger. So what might
be useful for these kinds of uses (and others) is to be able to hint to zlib
that the data comes in a format it could make use of when hunting for data to
put into its dictionaries.

1. The first thing would be a hint like: from byte A, for B bytes, the data
comes in rows of C bytes, and values are D bytes (1, 2, 4, etc.) each, i.e.:

........
A111
2222
3333
444B
......

That would imply that values in row 1 "line up" with values in rows 2, 3, etc.,
and that there is likely common data between rows (in images, spatially near
pixels tend to have similar values, sometimes the same value). zlib can just
ignore this info if it likes, or make use of it when compressing by reading
source data in a different order (e.g. read a value, then read the values to
the right, below, and diagonally to the bottom-right, i.e.:

[V1][V2][..........]<-row1
[V3][V4][..........]<-row2

etc.)
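A minimal C sketch of the hint and the 2x2 reading order above. zlib has no
such hint today: the layout_hint struct and reorder_2x2() are hypothetical
names made up for illustration, and this version only handles regions whose
row and column counts are even. It makes an explicit reordered copy, which is
exactly the extra stage the mail wants to avoid; inside a compressor the same
index arithmetic could instead drive the order in which input is read.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout hint; zlib has no such structure today. */
typedef struct {
    size_t offset;     /* A: first byte of the image region */
    size_t length;     /* B: number of bytes in the region */
    size_t row_bytes;  /* C: bytes per row */
    size_t value_size; /* D: bytes per value (1, 2, 4, ...) */
} layout_hint;

/* Copy the hinted region of src into dst in 2x2 block order
   (top-left, top-right, bottom-left, bottom-right for each block).
   dst must hold h->length bytes. Returns the number of bytes written,
   or 0 if the region does not split evenly into rows, values and
   2x2 blocks (partial blocks are left out of this sketch). */
size_t reorder_2x2(const unsigned char *src, unsigned char *dst,
                   const layout_hint *h)
{
    size_t d = h->value_size;
    size_t out = 0;

    if (d == 0 || h->row_bytes == 0 || h->row_bytes % d != 0 ||
        h->length % h->row_bytes != 0)
        return 0;

    size_t cols = h->row_bytes / d;
    size_t rows = h->length / h->row_bytes;
    const unsigned char *base = src + h->offset;

    if (cols % 2 != 0 || rows % 2 != 0)
        return 0;

    for (size_t r = 0; r < rows; r += 2) {
        for (size_t c = 0; c < cols; c += 2) {
            /* V1, V2 from the upper row, then V3, V4 from the row below */
            const size_t pos[4][2] = {
                {r, c}, {r, c + 1}, {r + 1, c}, {r + 1, c + 1}
            };
            for (int k = 0; k < 4; k++) {
                memcpy(dst + out,
                       base + pos[k][0] * h->row_bytes + pos[k][1] * d, d);
                out += d;
            }
        }
    }
    return out;
}
```

For a 4x2 image of 1-byte values {0,1,2,3, 4,5,6,7} the copy comes out as
{0,1,4,5, 2,3,6,7}: each vertical neighbour now sits two bytes away instead of
a full row away.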

Of course that's the simple case: grouping into blocks of 4 values. You can
extend this into:

[V1][V2][V5][Va]
[V3][V4][V6][Vb]
[V7][V8][V9][Vc]
[Vd][Ve][Vf][Vg]

for a block of 4x4 (linearising the compression order to V1, V2, V3, V4 ... V9,
Va, Vb, Vc ... Vf, Vg). Of course this only works cleanly when the column and
row counts are multiples of 4; the remainder that is not a multiple of 4 (or of
2, as above) can be chopped off and handled as a partial block.
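The 4x4 order above is a "growing square": for k = 0, 1, 2, 3, take the new
column k top to bottom (the cells above the diagonal), then the new row k left
to right. A sketch in C, using a hypothetical helper tile_order() that is not
part of zlib, computes that order as a list of row-major value indices for any
tile size n, clipping tiles at the right and bottom edges so the leftovers
become partial blocks. With n = 2 it reproduces the 2x2 order from the first
diagram.

```c
#include <stddef.h>

/* Hypothetical helper: fill order[] with row-major value indices
   (row * cols + col) in growing-square order, tile by tile.
   order must hold rows * cols entries. Returns the number of
   indices written (rows * cols when n > 0, else 0). */
size_t tile_order(size_t rows, size_t cols, size_t n, size_t *order)
{
    size_t out = 0;

    if (n == 0)
        return 0;

    for (size_t tr = 0; tr < rows; tr += n) {
        for (size_t tc = 0; tc < cols; tc += n) {
            /* tiles at the right/bottom edge are clipped to partial blocks */
            size_t h = rows - tr < n ? rows - tr : n;
            size_t w = cols - tc < n ? cols - tc : n;

            for (size_t k = 0; k < n; k++) {
                /* new column k, rows above the diagonal (V5,V6 / Va,Vb,Vc) */
                if (k < w)
                    for (size_t r = 0; r < k && r < h; r++)
                        order[out++] = (tr + r) * cols + tc + k;
                /* new row k, columns 0..k (V7,V8,V9 / Vd..Vg) */
                if (k < h)
                    for (size_t c = 0; c <= k && c < w; c++)
                        order[out++] = (tr + k) * cols + tc + c;
            }
        }
    }
    return out;
}
```

For a 4x4 grid with n = 4 the result is 0,1,4,5,2,6,8,9,10,3,7,11,12,13,14,15,
i.e. V1 through Vg. A caller would emit input value order[i] as output value i
(times the value size D) before compressing, and apply the inverse mapping
after decompressing.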

Anyway, maybe this is asking zlib to become more than it should be. My aim is
not maximum compression; it's good compression with absolute minimum overhead.
Having zlib look at its input data in a different way is one way of getting
there. :)

Another thing: might it help to be able to say that ranges of the input data
have limits on their values? E.g. in many cases, for a quartet of bytes
(v1, v2, v3, v4), v2 <= v1 && v3 <= v1 && v4 <= v1, ALWAYS. Would it help zlib
to know these things?
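Whether zlib could exploit such a constraint is the open question here, but a
compressor handed this kind of hint would at least want a cheap way to confirm
it holds before relying on it. One familiar case where it does hold is 8-bit
premultiplied-alpha pixels stored with alpha first, since no colour channel can
exceed alpha. A minimal C check, with quartets_bounded_by_first() as a
hypothetical name:

```c
#include <stddef.h>

/* Hypothetical check: returns 1 if every 4-byte quartet (v1, v2, v3, v4)
   in buf satisfies v2 <= v1, v3 <= v1 and v4 <= v1, else 0.
   len must be a multiple of 4; a trailing partial quartet returns 0. */
int quartets_bounded_by_first(const unsigned char *buf, size_t len)
{
    if (len % 4 != 0)
        return 0;

    for (size_t i = 0; i < len; i += 4) {
        unsigned char v1 = buf[i];
        if (buf[i + 1] > v1 || buf[i + 2] > v1 || buf[i + 3] > v1)
            return 0;
    }
    return 1;
}
```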

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com