Skip to content

komm.LempelZiv77Code

Lempel–Ziv 77 (LZ77 or LZ1) code. It is a lossless data compression algorithm which is asymptotically optimal for ergodic sources. Let $\mathcal{X}$ be the source alphabet, and $\mathcal{Y}$ be the target alphabet. The notation used here is the following: $W \geq 2$ is the size of the sliding window, $S \in [1 : W)$ is the size of the search buffer, and $L \in [1 : W)$ is the size of the lookahead buffer, with $S + L = W$. The token format follows the original LZ77 paper, namely $(p, \ell, x)$, where $p \in [0 : S)$ is the pointer for the match, $\ell \in [0 : L)$ is the length of the match, and $x \in \mathcal{X}$ is the source symbol following the match, but with both $p$ and $\ell$ being $0$-indexed instead of $1$-indexed. Also following the LZ77 original paper, a token is represented as a fixed-size word in $\mathcal{Y}^n$, where $$n = \log S + \log L + \log |\mathcal{X}|$$ and all logs are to base $|\mathcal{Y}|$. For more details, see Say06, Sec. 5.4.1 and CT06, Sec. 13.4.1.

Note

Here, for simplicity, we assume that the source alphabet is $\mathcal{X} = [0 : |\mathcal{X}|)$ and the target alphabet is $\mathcal{Y} = [0 : |\mathcal{Y}|)$, where $|\mathcal{X}| \geq 2$ and $|\mathcal{Y}| \geq 2$ are called the source cardinality and target cardinality, respectively.

Parameters:

  • window_size (int)

    The sliding window size $W$. Must satisfy $W \geq 2$.

  • lookahead_size (int)

    The lookahead buffer size $L$. Must satisfy $1 \leq L < W$.

  • source_cardinality (int)

    The source cardinality $|\mathcal{X}|$. Must satisfy $|\mathcal{X}| \geq 2$.

  • target_cardinality (int)

    The target cardinality $|\mathcal{Y}|$. Must satisfy $|\mathcal{Y}| \geq 2$. The default value is $2$ (binary).

  • search_buffer (ArrayLike | None)

    The initial state of the search buffer. Must be a 1D-array of length $S$ with elements in $\mathcal{X}$. The default value corresponds to $(0, \ldots, 0) \in \mathcal{X}^S$.

Examples:

>>> lz77 = komm.LempelZiv77Code(
...     window_size=2**13,
...     lookahead_size=16,
...     source_cardinality=256,
...     target_cardinality=2,
... )

search_size int property

The search buffer size $S$. It is given by $S = W - L$.

source_to_tokens()

Encodes a given sequence of source symbols to the corresponding list of tokens.

Examples:

>>> lz77 = komm.LempelZiv77Code(
...     window_size=18,
...     lookahead_size=9,
...     source_cardinality=3,
...     target_cardinality=3,
... )
>>> lz77.source_to_tokens([0, 0, 1, 0, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 2])
[(8, 2, 1), (7, 3, 2), (6, 7, 2)]

tokens_to_source()

Decodes a given list of tokens to the corresponding sequence of source symbols.

Examples:

>>> lz77 = komm.LempelZiv77Code(
...     window_size=18,
...     lookahead_size=9,
...     source_cardinality=3,
...     target_cardinality=3,
... )
>>> lz77.tokens_to_source([(8, 2, 1), (7, 3, 2), (6, 7, 2)])
array([0, 0, 1, 0, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 2])

tokens_to_target()

Returns the target alphabet representation corresponding to a given list of tokens.

Examples:

>>> lz77 = komm.LempelZiv77Code(
...     window_size=18,
...     lookahead_size=9,
...     source_cardinality=3,
...     target_cardinality=3,
... )
>>> lz77.tokens_to_target([(8, 2, 1), (7, 3, 2), (6, 7, 2)])
array([2, 2, 0, 2, 1, 2, 1, 1, 0, 2, 2, 0, 2, 1, 2])

target_to_tokens()

Returns the list of tokens corresponding to a given target alphabet representation.

Examples:

>>> lz77 = komm.LempelZiv77Code(
...     window_size=18,
...     lookahead_size=9,
...     source_cardinality=3,
...     target_cardinality=3,
... )
>>> lz77.target_to_tokens([2, 2, 0, 2, 1, 2, 1, 1, 0, 2, 2, 0, 2, 1, 2])
[(8, 2, 1), (7, 3, 2), (6, 7, 2)]

encode()

Encodes a sequence of source symbols to a sequence of target symbols.

Parameters:

  • input (ArrayLike)

    The sequence of source symbols to be encoded. Must be a 1D-array with elements in $\mathcal{X}$.

Returns:

  • output (NDArray[integer])

    The sequence of encoded target symbols. It is a 1D-array with elements in $\mathcal{Y}$.

Examples:

>>> lz77 = komm.LempelZiv77Code(
...     window_size=18,
...     lookahead_size=9,
...     source_cardinality=3,
...     target_cardinality=3,
... )
>>> lz77.encode([0, 0, 1, 0, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 2])
array([2, 2, 0, 2, 1, 2, 1, 1, 0, 2, 2, 0, 2, 1, 2])

decode()

Decodes a sequence of target symbols to a sequence of source symbols.

Parameters:

  • input (ArrayLike)

    The sequence of target symbols to be decoded. Must be a 1D-array with elements in $\mathcal{Y}$. Also, the sequence must be a valid output of the encode method.

Returns:

  • output (NDArray[integer])

    The sequence of decoded source symbols. It is a 1D-array with elements in $\mathcal{X}$.

Examples:

>>> lz77 = komm.LempelZiv77Code(
...     window_size=18,
...     lookahead_size=9,
...     source_cardinality=3,
...     target_cardinality=3,
... )
>>> lz77.decode([2, 2, 0, 2, 1, 2, 1, 1, 0, 2, 2, 0, 2, 1, 2])
array([0, 0, 1, 0, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 2])