[Zlib-devel] [PATCH 13/13] deflate: add new deflate_medium strategy

Thu Nov 28 14:48:16 EST 2013

To give some background on this new algorithm (and apologies upfront
for those already very familiar with how zlib works, there'll be a lot
of repetitive content for you)

Current zlib has two main strategies, deflate_fast and deflate_slow,
that are used at different compression levels (fast for the lower
levels, slow for the higher levels).

I'm going to make up an example to illustrate the workings of these as
well as deflate_medium; obviously the example is simplified to make it
easier to describe.

lets assume two possible sets of history:

history 1: abcdbcdefghijk
history 2: abcdbcdvwxyz

and the new, to compress, text of abcdefgh. Below is both the text and
position numbers to make it easier for me to refer to.

abcdefgh     <--- text
12345678    <--- position numbers

deflate_fast is a greedy algorithm, and what it does is basically
  at position 1, search for a match
  ---> a match of 4 is found for both hist 1 and hist 2 (abcd)
  emit a match of 4
  at position 5, search for a match  (the "5" is the greedy part)
  --> for hist 1, a match of 4 is found (efgh)
  --> for hist 2, no match is found
  for hist 1, a match of 4 is emitted
  for hist 2, a match of 1 is emitted (repeat four times)

hist1 output: 4,4  (for 2 searches)
hist2 output: 4,1,1,1,1 (for 5 searches)

deflate_slow is basically an exhaustive algorithm and does more or less
(history 1)
 at position 1, search for a match
  ---> a match of 4 is found (abcd)
 hold on to the thought of a match of 4
 at position 2, search for a match (this is the exhaustive part)
 --> a match of 7 is found (bcdefgh)
 a match of 7 beats the match of 4, so emit a length of 1, and now
remember the 7
 at position 3, search for a match
 --> a match of 6 is found (cdefgh)
 the match of 6 is not better than the 7, so emit the remembered 7
hist 1 output: 1,7 (for 3 searches)

(history 2)
 at position 1, search for a match
  ---> a match of 4 is found (abcd)
 hold on to the thought of a match of 4
 at position 2, search for a match (this is the exhaustive part)
 --> a match of 3 is found
 3 is not better than 4, so emit the remembered 4
 at position 5, search
  ->> no match found, emit a 1 (repeat 4 times)
hist 2 output: 4,1,1,1,1 (for 6 searches)

the advantage of deflate_slow in compression ratio comes from the 1,7
output being smaller than the 4,4 output.... at the expense of more
searches (and searches are expensive, they make up +/- 80% of the
execution time of zlib)

deflate_medium tries to strike a balance between the advantage of the
less-greedy deflate_slow in terms of compression ratio, while trying
to be close to the number of searches of deflate_fast.
so what deflate_medium does is this
(history 1)
  at position 1, search for a match
  ---> a match of 4 is found and we hold on to the thought of 4
  at position 5, search for a match  (position 5 is like deflate_fast,
deflate_slow looked at 2.. this is the speed gain)
  --> for hist 1, a match of 4 is found (efgh)
  now the magic part: for this match of 4 at position 5, we go look
backwards in the stream for this match, and only this match:
  if it had started at position 4, the match could have been 5 long
(one byte back matches!)
  if it had started at position 3, the match could have been 6 long
(second byte back matches!)
  if it had started at position 2, the match could have been 7 long
(third byte back matches!)
 at this point, the back search ends, and deflate_medium concludes
that emitting a match of 1 followed by a match of 7 is proper

so hist 1 output is  1,7 for 2 searches

this is the smaller (in bits) output of deflate_slow, but at the
search cost of deflate_fast!
(the back matching above is not the cost of a real search, a "search"
goes through the hash chains of all history, this is just comparing
single bytes that are already in the CPU cache by virtue of just
having done the match from position 1 onwards)

for hist 2, the algorithm flow for deflate_medium is the same as that
of deflate_fast, same output same cost (e.g. cheaper than deflate_slow
again for the same output size)

In these examples there is a moderate savings in searches.. real world
examples tend to show more savings.

In this example, there is a perfect storm of hitting the exact output
of deflate_slow at the cost of deflate_fast. The perfect storm is a
very common case, but is not always happening. There are some cases
where deflate_slow still compresses better:
Case 1: if deflate_fast would emit a match pattern of   3,1,1 while
deflate_slow would emit a match of 1,4... deflate_medium also does
3,1,1.
(this is because the minimum match length is 3, so a match of 2 is not
found as a match, hence the 1,1 and the _medium algorithm doesn't
backmatch)
Case 2: causality violations. Both deflate_slow and deflate_fast very
cleverly violate causality in some cases. Eg a stream of "aaaaaaa"
allows the search for the match to go into the future into it's own
match string (a match of 1 followed by a match of 6). I could not
convince myself that in the backsearching case said causality
violation can always be done perfectly correct, so if there is a
causality violation, the backmatch does not happen (falling back to
full greedy behavior)

On Mon, Nov 25, 2013 at 2:21 PM, Jim Kukunas
<james.t.kukunas at linux.intel.com> wrote:
> From: Arjan van de Ven <arjan at linux.intel.com>
>
> As the name suggests, the deflate_medium deflate strategy is designed
> to provide an intermediate strategy between deflate_fast and deflate_slow.
> After finding two adjacent matches, deflate_medium scans left from
> the second match in order to determine whether a better match can be
> formed.
> ---
>  configure        |    8 +-
>  deflate.c        |   13 ++
>  deflate_medium.c |  364 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 381 insertions(+), 4 deletions(-)
>  create mode 100644 deflate_medium.c
>
> diff --git a/configure b/configure
> index e9f2886..a90d0b8 100755
> --- a/configure
> +++ b/configure
> @@ -818,8 +818,8 @@ case "${ARCH}" in
>              CRC_FOLDING_lo=""
>          fi
>
> -        CFLAGS="${CFLAGS} -DUSE_QUICK"
> -        SFLAGS="${SFLAGS} -DUSE_QUICK"
> +        CFLAGS="${CFLAGS} -DUSE_QUICK -DUSE_MEDIUM"
> +        SFLAGS="${SFLAGS} -DUSE_QUICK -DUSE_MEDIUM"
>      ;;
>      i386 | i486 | i586 | i686)
>          OBJC="${OBJC} x86.o"
> @@ -858,8 +858,8 @@ case "${ARCH}" in
>              CRC_FOLDING_lo=""
>          fi
>
> -        CFLAGS="${CFLAGS} -DUSE_QUICK"
> -        SFLAGS="${SFLAGS} -DUSE_QUICK"
> +        CFLAGS="${CFLAGS} -DUSE_QUICK -DUSE_MEDIUM"
> +        SFLAGS="${SFLAGS} -DUSE_QUICK -DUSE_MEDIUM"
>      ;;
>  esac
>
> diff --git a/deflate.c b/deflate.c
> index 1a099d3..c0925ef 100644
> --- a/deflate.c
> +++ b/deflate.c
> @@ -82,6 +82,8 @@ local void fill_window    OF((deflate_state *s));
>  local block_state deflate_stored OF((deflate_state *s, int flush));
>  local block_state deflate_fast   OF((deflate_state *s, int flush));
>  local block_state deflate_quick  OF((deflate_state *s, int flush));
> +local block_state deflate_medium OF((deflate_state *s, int flush));
> +
>  #ifndef FASTEST
>  local block_state deflate_slow   OF((deflate_state *s, int flush));
>  #endif
> @@ -147,9 +149,16 @@ local const config configuration_table[10] = {
>  /* 3 */ {4,    6, 32,   32, deflate_fast},
>  #endif
>
> +#ifdef USE_MEDIUM
> +/* 4 */ {4,    4, 16,   16, deflate_medium},  /* lazy matches */
> +/* 5 */ {8,   16, 32,   32, deflate_medium},
> +/* 6 */ {8,   16, 128, 128, deflate_medium},
> +#else
>  /* 4 */ {4,    4, 16,   16, deflate_slow},  /* lazy matches */
>  /* 5 */ {8,   16, 32,   32, deflate_slow},
>  /* 6 */ {8,   16, 128, 128, deflate_slow},
> +#endif
> +
>  /* 7 */ {8,   32, 128, 256, deflate_slow},
>  /* 8 */ {32, 128, 258, 1024, deflate_slow},
>  /* 9 */ {32, 258, 258, 4096, deflate_slow}}; /* max compression */
> @@ -1617,6 +1626,10 @@ local block_state deflate_fast(s, flush)
>  #include "deflate_quick.c"
>  #endif
>
> +#ifdef USE_MEDIUM
> +#include "deflate_medium.c"
> +#endif
> +
>  #ifndef FASTEST
>  /* ===========================================================================
>   * Same as above, but achieves better compression. We use a lazy
> diff --git a/deflate_medium.c b/deflate_medium.c
> new file mode 100644
> index 0000000..5892843
> --- /dev/null
> +++ b/deflate_medium.c
> @@ -0,0 +1,364 @@
> +/*
> + * The deflate_medium deflate strategy
> + *
> + * Copyright (C) 2013 Intel Corporation. All rights reserved.
> + * Authors:
> + *  Arjan van de Ven    <arjan at linux.intel.com>
> + *
> + * For conditions of distribution and use, see copyright notice in zlib.h
> + */
> +#undef PRINT
> +#undef  PRINT_EMIT
> +
> +
> +#define likely(x)      __builtin_expect(!!(x), 1)
> +#define unlikely(x)    __builtin_expect(!!(x), 0)
> +
> +struct match {
> +    uInt       match_start;
> +    uInt       match_length;
> +    uInt       strstart;
> +    uInt       orgstart;
> +};
> +
> +
> +#define MAX_DIST2  ((1 << MAX_WBITS) - MIN_LOOKAHEAD)
> +static int tr_tally_dist(deflate_state *s, int distance, int length)
> +{
> +    return _tr_tally(s, distance, length);
> +}
> +
> +static int tr_tally_lit(deflate_state *s, int c)
> +{
> +    return  _tr_tally(s, 0, c);
> +}
> +
> +static void insert_check(int i)
> +{
> +#ifdef PRINT
> +    static int prev = - 1;
> +
> +    if (prev + 1 != i)
> +        printf("INSERT BUMP BY %i\n", i - prev - 1);
> +    prev = i;
> +#endif
> +}
> +
> +
> +static int emit_match(deflate_state *s, struct match match, IPos hash_head)
> +{
> +        int flush = 0;
> +
> +#ifdef PRINT_EMIT
> +        printf("Emitting match for pos %i, length %i ", match.strstart, match.match_length);
> +        if (match.match_length > 1)
> +            printf("from %i", match.match_start);
> +        printf("\n");
> +#endif
> +
> +        /* matches that are not long enough we need to emit as litterals */
> +        if (match.match_length < MIN_MATCH) {
> +            while (match.match_length) {
> +                    flush += tr_tally_lit (s, s->window[match.strstart]);
> +                    s->lookahead--;
> +                    match.strstart++;
> +                    match.match_length--;
> +            }
> +            return flush;
> +        }
> +
> +
> +
> +        check_match(s, match.strstart, match.match_start, match.match_length);
> +
> +        flush += tr_tally_dist(s, match.strstart - match.match_start,
> +                           match.match_length - MIN_MATCH);
> +
> +        s->lookahead -= match.match_length;
> +
> +        return flush;
> +}
> +
> +static void insert_match(deflate_state *s, struct match match)
> +{
> +        /* matches that are not long enough we need to emit as litterals */
> +        if (match.match_length < MIN_MATCH) {
> +            while (match.match_length) {
> +                    match.strstart++;
> +                    match.match_length--;
> +
> +                    if (match.match_length) {
> +                        if (match.strstart >= match.orgstart) {
> +#ifdef PRINT
> +                            printf("A\t\t\t\tInsert %i\n", match.strstart);
> +#endif
> +                            insert_string(s, match.strstart);
> +                            insert_check(match.strstart);
> +                        }
> +                    }
> +            }
> +            return;
> +        }
> +
> +
> +
> +
> +        /* Insert new strings in the hash table only if the match length
> +         * is not too large. This saves time but degrades compression.
> +         */
> +        if (match.match_length <=  16* s->max_insert_length &&
> +                    s->lookahead >= MIN_MATCH) {
> +                match.match_length--; /* string at strstart already in table */
> +                do {
> +                        match.strstart++;
> +                        if (likely(match.strstart >= match.orgstart)) {
> +#ifdef PRINT
> +                            printf("B\t\t\t\tInsert %i\n", match.strstart);
> +#endif
> +                            insert_string(s, match.strstart);
> +                            insert_check(match.strstart);
> +                        }
> +                    /* strstart never exceeds WSIZE-MAX_MATCH, so there are
> +                     * always MIN_MATCH bytes ahead.
> +                     */
> +                } while (--match.match_length != 0);
> +
> +                match.strstart++;
> +        } else {
> +                match.strstart += match.match_length;
> +                match.match_length = 0;
> +                s->ins_h = s->window[match.strstart];
> +                if (match.strstart >= 1)
> +                    UPDATE_HASH(s, s->ins_h, match.strstart+2-MIN_MATCH);
> +#if MIN_MATCH != 3
> +#warning Call UPDATE_HASH() MIN_MATCH-3 more times
> +#endif
> +        /* If lookahead < MIN_MATCH, ins_h is garbage, but it does not
> +         * matter since it will be recomputed at next deflate call.
> +         */
> +        }
> +}
> +
> +
> +static void fizzle_matches(deflate_state *s, struct match *current, struct match *next)
> +{
> +    IPos limit;
> +    unsigned char *match, *orig;
> +    int changed = 0;
> +    struct match c,n;
> +    /* step zero: sanity checks */
> +
> +    if (current->match_length <= 1)
> +            return;
> +
> +    match = s->window - current->match_length + 1 + next->match_start ;
> +    orig  = s->window - current->match_length + 1 + next->strstart ;
> +
> +    /* quick exit check.. if this fails then don't bother with anything else */
> +    if (likely(*match != *orig))
> +        return;
> +
> +
> +    /*
> +     * check the overlap case and just give up. We can do better in theory,
> +     * but unlikely to be worth it
> +     */
> +    if (next->match_start + next->match_length >= current->strstart)
> +            return;
> +
> +
> +    c = *current;
> +    n = *next;
> +
> +    /* step one: try to move the "next" match to the left as much as possible */
> +    limit = next->strstart > MAX_DIST2 ? next->strstart - MAX_DIST2 : 0;
> +
> +    match = s->window + n.match_start - 1;
> +    orig = s->window + n.strstart - 1;
> +
> +    while (*match == *orig && n.strstart > limit && n.match_length < 256 && c.match_length >= 1) {
> +        n.strstart--;
> +        n.match_start--;
> +        n.match_length++;
> +        c.match_length--;
> +        match--;
> +        orig--;
> +        changed ++;
> +    }
> +    if (!changed)
> +        return;
> +
> +
> +
> +
> +    if ( (c.match_length <= 1) && n.match_length != 2) {
> +        n.orgstart++;
> +        *current = c;
> +        *next = n;
> +    } else return;
> +
> +
> +#ifdef PRINT
> +    printf("Changed matches: (%i)\n", changed);
> +    printf("match1:       %5i ---> %5i for %i bytes\n", current->strstart, current->match_start, current->match_length);
> +    printf("match2:(%4i) %5i ---> %5i for %i bytes\n", next->orgstart, next->strstart, next->match_start, next->match_length);
> +    printf("\n");
> +#endif
> +
> +}
> +
> +block_state deflate_medium(deflate_state *s, int flush)
> +{
> +
> +    struct match current_match, next_match;
> +
> +    memset(&current_match, 0, sizeof(struct match));
> +    memset(&next_match, 0, sizeof(struct match));
> +
> +//    printf("Deflate medium\n");
> +
> +    for (;;) {
> +        IPos hash_head;       /* head of the hash chain */
> +
> +        int bflush;           /* set if current block must be flushed */
> +
> +
> +        /* Make sure that we always have enough lookahead, except
> +         * at the end of the input file. We need MAX_MATCH bytes
> +         * for the next match, plus MIN_MATCH bytes to insert the
> +         * string following the next current_match.
> +         */
> +        if (s->lookahead < MIN_LOOKAHEAD) {
> +            fill_window(s);
> +            if (s->lookahead < MIN_LOOKAHEAD && flush == Z_NO_FLUSH) {
> +                return need_more;
> +            }
> +            if (s->lookahead == 0) break; /* flush the current block */
> +            next_match.match_length = 0;
> +        }
> +        s->prev_length = 2;
> +
> +
> +        /* Insert the string window[strstart .. strstart+2] in the
> +         * dictionary, and set hash_head to the head of the hash chain:
> +         */
> +
> +        /* If we already have a future match from a previous round, just use that */
> +        if (next_match.match_length > 0) {
> +            current_match = next_match;
> +            next_match.match_length = 0;
> +
> +        } else {
> +            hash_head = 0;
> +            if (s->lookahead >= MIN_MATCH) {
> +                hash_head = insert_string(s, s->strstart);
> +                insert_check(s->strstart);
> +#ifdef PRINT
> +                    printf("C\t\t\t\tInsert %i\n", s->strstart);
> +#endif
> +            }
> +
> +            /* set up the initial match to be a 1 byte literal */
> +            current_match.match_start = 0;
> +            current_match.match_length = 1;
> +            current_match.strstart = s->strstart;
> +            current_match.orgstart = current_match.strstart;
> +#ifdef      PRINT
> +            printf("\t\t\t\tHash head is %i\n", hash_head);
> +#endif
> +
> +            /* Find the longest match, discarding those <= prev_length.
> +             * At this point we have always match_length < MIN_MATCH
> +             */
> +
> +            if (hash_head != 0 && s->strstart - hash_head <= MAX_DIST2) {
> +                /* To simplify the code, we prevent matches with the string
> +                 * of window index 0 (in particular we have to avoid a match
> +                 * of the string with itself at the start of the input file).
> +                 */
> +               current_match.match_length = longest_match (s, hash_head);
> +                current_match.match_start = s->match_start;
> +#ifdef PRINT
> +                printf("                                                                    Offset %8i:  Match from %i (%i)  \n", current_match.strstart, current_match.match_start, current_match.match_length);
> +#endif
> +                if (current_match.match_length < MIN_MATCH)
> +                    current_match.match_length = 1;
> +                if (current_match.match_start >= current_match.strstart) { /* this can happen due to some restarts */
> +//                    printf("Got a match length %i at %i for %i\n", current_match.match_length, current_match.match_start, current_match.strstart);
> +                    current_match.match_length = 1;
> +                }
> +            }
> +        }
> +
> +        insert_match(s, current_match);
> +#if 1
> +        /* now, look ahead one */
> +        if (s->lookahead > MIN_LOOKAHEAD) {
> +            s->strstart = current_match.strstart + current_match.match_length;
> +
> +            hash_head = insert_string(s, s->strstart);
> +            insert_check(s->strstart);
> +#ifdef PRINT
> +            printf("D\t\t\t\tInsert %i\n", s->strstart);
> +#endif
> +
> +            /* set up the initial match to be a 1 byte literal */
> +            next_match.match_start = 0;
> +            next_match.match_length = 1;
> +            next_match.strstart = s->strstart;
> +            next_match.orgstart = next_match.strstart;
> +
> +#ifdef PRINT
> +            printf("\t\t\t\tHash head is %i\n", hash_head);
> +#endif
> +
> +            /* Find the longest match, discarding those <= prev_length.
> +             * At this point we have always match_length < MIN_MATCH
> +             */
> +            if (hash_head != 0 && s->strstart - hash_head <= MAX_DIST2) {
> +                /* To simplify the code, we prevent matches with the string
> +                 * of window index 0 (in particular we have to avoid a match
> +                 * of the string with itself at the start of the input file).
> +                 */
> +               next_match.match_length = longest_match (s, hash_head);
> +                next_match.match_start = s->match_start;
> +#ifdef PRINT
> +                printf("                                                                    Next Offset %8i:  Match from %i (%i)  \n", next_match.strstart, next_match.match_start, next_match.match_length);
> +#endif
> +                if (next_match.match_start >= next_match.strstart) /* this can happen due to some restarts */
> +                    next_match.match_length = 1;
> +                if (next_match.match_length < MIN_MATCH)
> +                    next_match.match_length = 1;
> +                else
> +                    fizzle_matches(s, &current_match, &next_match);
> +            }
> +
> +            /* short matches with a very long distance are rarely a good idea encoding wise */
> +            if (next_match.match_length == 3 && (next_match.strstart - next_match.match_start) > 12000)
> +                    next_match.match_length = 1;
> +            s->strstart = current_match.strstart;
> +
> +        } else {
> +            next_match.match_length = 0;
> +        }
> +
> +#endif
> +        /* now emit the current match */
> +        bflush = emit_match(s, current_match, hash_head);
> +
> +        /* move the "cursor" forward */
> +        s->strstart += current_match.match_length;
> +
> +        if (bflush)
> +            FLUSH_BLOCK(s, 0);
> +    }
> +    s->insert = s->strstart < MIN_MATCH-1 ? s->strstart : MIN_MATCH-1;
> +    if (flush == Z_FINISH) {
> +        FLUSH_BLOCK(s, 1);
> +        return finish_done;
> +    }
> +    if (s->last_lit)
> +        FLUSH_BLOCK(s, 0);
> +    return block_done;
> +}
> +
> --
> 1.7.1
>
>
> _______________________________________________
> Zlib-devel mailing list
> Zlib-devel at madler.net
> http://mail.madler.net/mailman/listinfo/zlib-devel_madler.net