komm.LempelZiv77Code

Lempel–Ziv 77 (LZ77 or LZ1) code. It is a lossless data compression algorithm which is asymptotically optimal for ergodic sources. Let $\mathcal{X}$ be the source alphabet, and $\mathcal{Y}$ be the target alphabet. The notation used here is the following: $S \geq 1$ is the size of the search buffer, $L \geq 1$ is the size of the lookahead buffer, and $W = S + L$ is the size of the sliding window. The token format follows the original LZ77 paper, namely $(p, \ell, x)$, where $p \in [0 : S)$ is the pointer for the match, $\ell \in [0 : L)$ is the length of the match, and $x \in \mathcal{X}$ is the source symbol following the match, but with both $p$ and $\ell$ being $0$-indexed instead of $1$-indexed. Also following the LZ77 original paper, a token is represented as a fixed-size word in $\mathcal{Y}^n$, where $$n = \log S + \log L + \log |\mathcal{X}|$$ and all logs are to base $|\mathcal{Y}|$. For more details, see Say06, Sec. 5.4.1 and CT06, Sec. 13.4.1.

Note

Here, for simplicity, we assume that the source alphabet is $\mathcal{X} = [0 : |\mathcal{X}|)$ and the target alphabet is $\mathcal{Y} = [0 : |\mathcal{Y}|)$, where $|\mathcal{X}| \geq 2$ and $|\mathcal{Y}| \geq 2$ are called the source cardinality and target cardinality, respectively.

Parameters:

search_size (int) –

The search buffer size $S$. Must satisfy $S \geq 1$.
lookahead_size (int) –

The lookahead buffer size $L$. Must satisfy $L \geq 1$.
source_cardinality (int) –

The source cardinality $|\mathcal{X}|$. Must satisfy $|\mathcal{X}| \geq 2$.
target_cardinality (int) –

The target cardinality $|\mathcal{Y}|$. Must satisfy $|\mathcal{Y}| \geq 2$. The default value is $2$ (binary).
search_buffer (ArrayLike | None) –

The initial state of the search buffer. Must be a 1D-array of length $S$ with elements in $\mathcal{X}$. The default value corresponds to $(0, \ldots, 0) \in \mathcal{X}^S$.

Examples:

>>> lz77 = komm.LempelZiv77Code(
...     search_size=2**13,
...     lookahead_size=16,
...     source_cardinality=256,
...     target_cardinality=2,
... )

`window_sizeint` `property`

The sliding window size $W$. It is given by $W = S + L$.

`source_to_tokens()`

Encodes a given sequence of source symbols to the corresponding list of tokens.

Examples:

>>> lz77 = komm.LempelZiv77Code(
...     search_size=9,
...     lookahead_size=9,
...     source_cardinality=3,
...     target_cardinality=3,
... )
>>> lz77.source_to_tokens([0, 0, 1, 0, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 2])
[(8, 2, 1), (7, 3, 2), (6, 7, 2)]

`tokens_to_source()`

Decodes a given list of tokens to the corresponding sequence of source symbols.

Examples:

>>> lz77 = komm.LempelZiv77Code(
...     search_size=9,
...     lookahead_size=9,
...     source_cardinality=3,
...     target_cardinality=3,
... )
>>> lz77.tokens_to_source([(8, 2, 1), (7, 3, 2), (6, 7, 2)])
array([0, 0, 1, 0, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 2])

`tokens_to_target()`

Returns the target alphabet representation corresponding to a given list of tokens.

Examples:

>>> lz77 = komm.LempelZiv77Code(
...     search_size=9,
...     lookahead_size=9,
...     source_cardinality=3,
...     target_cardinality=3,
... )
>>> lz77.tokens_to_target([(8, 2, 1), (7, 3, 2), (6, 7, 2)])
array([2, 2, 0, 2, 1, 2, 1, 1, 0, 2, 2, 0, 2, 1, 2])

`target_to_tokens()`

Returns the list of tokens corresponding to a given target alphabet representation.

Examples:

>>> lz77 = komm.LempelZiv77Code(
...     search_size=9,
...     lookahead_size=9,
...     source_cardinality=3,
...     target_cardinality=3,
... )
>>> lz77.target_to_tokens([2, 2, 0, 2, 1, 2, 1, 1, 0, 2, 2, 0, 2, 1, 2])
[(8, 2, 1), (7, 3, 2), (6, 7, 2)]

`encode()`

Encodes a sequence of source symbols to a sequence of target symbols.

Parameters:

input (ArrayLike) –

The sequence of source symbols to be encoded. Must be a 1D-array with elements in $\mathcal{X}$.

Returns:

output (NDArray[integer]) –

The sequence of encoded target symbols. It is a 1D-array with elements in $\mathcal{Y}$.

Examples:

>>> lz77 = komm.LempelZiv77Code(
...     search_size=9,
...     lookahead_size=9,
...     source_cardinality=3,
...     target_cardinality=3,
... )
>>> lz77.encode([0, 0, 1, 0, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 2])
array([2, 2, 0, 2, 1, 2, 1, 1, 0, 2, 2, 0, 2, 1, 2])

`decode()`

Decodes a sequence of target symbols to a sequence of source symbols.

Parameters:

input (ArrayLike) –

The sequence of target symbols to be decoded. Must be a 1D-array with elements in $\mathcal{Y}$. Also, the sequence must be a valid output of the encode method.

Returns:

output (NDArray[integer]) –

The sequence of decoded source symbols. It is a 1D-array with elements in $\mathcal{X}$.

Examples:

>>> lz77 = komm.LempelZiv77Code(
...     search_size=9,
...     lookahead_size=9,
...     source_cardinality=3,
...     target_cardinality=3,
... )
>>> lz77.decode([2, 2, 0, 2, 1, 2, 1, 1, 0, 2, 2, 0, 2, 1, 2])
array([0, 0, 1, 0, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 2])

komm.LempelZiv77Code

window_size int property

source_to_tokens()

tokens_to_source()

tokens_to_target()

target_to_tokens()

encode()

decode()

`window_sizeint` `property`

`source_to_tokens()`

`tokens_to_source()`

`tokens_to_target()`

`target_to_tokens()`

`encode()`

`decode()`