[Zlib-devel] Parallel zlib

Tue Feb 16 19:23:46 EST 2010

Looks to me like the first option is the best. What kind of block size are we talking about? 128KB? That would not be that bad memory-wise for an 8-thread CPU: 2MB instead of 1MB of buffers because of duplication, who cares! People with 4-core CPUs running with HT on will install their systems with at least 6GB of RAM anyway.

-devsk

----- Original Message ----
> From: Mark Adler <madler at alumni.caltech.edu>
> To: zlib-devel at zlib.net
> Sent: Mon, February 15, 2010 11:14:14 PM
> Subject: Re: [Zlib-devel] Parallel zlib
> 
> On Feb 15, 2010, at 10:24 PM, devsk wrote:
> > Can we at least start to discuss what a parallel API may look like?
> 
> Sure.
> 
> Off the top of my head, there are a few approaches depending on the control 
> desired.
> 
> 1.  Add deflateParallel(strm, nthreads, blocksize) to be called after 
> deflateInit().  When deflate() is provided input, it fires off up to nthreads 
> separate threads, one for each blocksize of input data.  When deflate() is 
> provided output space, it waits for the next thread in sequence to finish and 
> provides that compressed data.  Perhaps there could be a new flush mode for 
> input without waiting for output.  This approach has the downside of deflate 
> having to copy all of the input and buffer lots of output on its own, possibly 
> duplicating the application's input and output buffers and using lots of memory.
> 
> 2.  Add parallel wrapper routines on top of the existing deflate, without 
> changing the existing deflate routines.  (Also avoids linking threads library 
> when not using threads.)  Allow the application to fire off threads on its own, 
> wait for them to complete when it wants to wait, and manage its own buffer 
> space.  A routine would be provided to recombine the output and compute the 
> trailer information (crc or adler32).  Maybe something like:
> 
>     pool = deflatePool(nthreads, deflate parameters) -- initialize a pool of 
> threads for use by deflateDive, creating up to nthreads threads as needed and 
> reusing them
> 
>     id = deflateDive(pool, inbuf, inlen, dict, dictlen, last, outbuf, &outlen, 
> &lastbits, howcheck, &check) -- start a raw compression job, may use dictionary, 
> may compute check value
> 
>     deflateRescue(pool, id) -- wait for job id to finish, then can consume 
> output and check value and can reuse buffers
> 
>     gluestate = deflateGlue(gluestate, outlen, lastbits, glue, &gluelen, 
> howcheck, &check) -- append output by providing glue bytes (independent of pool), 
> first call with NULL gluestate initializes state
> 
>     deflateEmpty(pool) -- wait for all jobs to finish and release all thread 
> resources
> 
>     deflateDrown(pool) -- kill any running jobs and release all thread resources
> 
> This has the downside that the user has to keep track of the jobs, to not mess 
> with the buffers it provided until the jobs are done, and to correctly assemble 
> the output.  I can already imagine all the bug reports resulting from pilot 
> error ...
> 
> 3.  Do #2 above, and then add a wrapper around *that* to provide the simplified 
> functionality of #1.
> 
> Mark
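
The recombination step in option 2 (computing the trailer check value from pieces that were processed independently) can be illustrated for Adler-32 with a minimal sketch. The names adler32_simple, adler32_join and checksum_in_two_parts are hypothetical and are not part of zlib or of the API proposed above; zlib itself provides adler32_combine() and crc32_combine() for this job. The sketch assumes each piece's checksum was computed on its own, starting from the initial Adler-32 value of 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u  /* largest prime below 2^16, as used by Adler-32 */

/* Straightforward (unoptimized) Adler-32 of a buffer. */
uint32_t adler32_simple(const unsigned char *buf, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}

/* Join the checksums of two adjacent pieces without rereading the data.
 * If the first piece ends with sums (a1, b1) and the second piece, summed
 * on its own, gives (a2, b2) over second_len bytes, then for the whole:
 *   a = a1 + a2 - 1
 *   b = b1 + b2 + second_len * (a1 - 1)      (all modulo 65521)
 */
uint32_t adler32_join(uint32_t first, uint32_t second, size_t second_len)
{
    uint32_t a1 = first & 0xffff, b1 = first >> 16;
    uint32_t a2 = second & 0xffff, b2 = second >> 16;
    uint32_t rem = (uint32_t)(second_len % ADLER_MOD);
    uint32_t a1m1 = (a1 + ADLER_MOD - 1) % ADLER_MOD;   /* a1 - 1 mod M */
    uint32_t a = (a1 + a2 + ADLER_MOD - 1) % ADLER_MOD;
    uint64_t b = (uint64_t)b1 + b2 + (uint64_t)rem * a1m1;
    return ((uint32_t)(b % ADLER_MOD) << 16) | a;
}

/* Checksum buf as two independent pieces split at `split`, then join them.
 * In a threaded design each piece could be summed by a different job. */
uint32_t checksum_in_two_parts(const unsigned char *buf, size_t len,
                               size_t split)
{
    assert(split <= len);
    uint32_t first = adler32_simple(buf, split);
    uint32_t second = adler32_simple(buf + split, len - split);
    return adler32_join(first, second, len - split);
}
```

For any split point, checksum_in_two_parts(buf, len, split) should equal adler32_simple(buf, len), which is why the check value can be assembled after the fact from per-job results and piece lengths alone.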