komm.LempelZiv77Code
Lempel–Ziv 77 (LZ77 or LZ1) code. It is a lossless data compression algorithm which is asymptotically optimal for ergodic sources. Let $\mathcal{X}$ be the source alphabet, and $\mathcal{Y}$ be the target alphabet. Also, let $W \geq 2$ be the size of the sliding window, $S \in [1 : W)$ be the size of the search buffer, and $L \in [1 : W)$ be the size of the lookahead buffer, with $S + L = W$. For more details, see Say06, Sec. 5.4.1 and CT06, Sec. 13.4.1.
Note
Here, for simplicity, we assume that the source alphabet is $\mathcal{X} = [0 : |\mathcal{X}|)$ and the target alphabet is $\mathcal{Y} = [0 : |\mathcal{Y}|)$, where $|\mathcal{X}| \geq 2$ and $|\mathcal{Y}| \geq 2$ are called the source cardinality and target cardinality, respectively.
The token format follows the original LZ77 paper, namely $(p, \ell, x)$, where $p \in [0 : S)$ is the pointer for the match, $\ell \in [0 : L)$ is the length of the match, and $x \in \mathcal{X}$ is the source symbol following the match, but with both $p$ and $\ell$ being $0$-indexed instead of $1$-indexed. Also following the LZ77 original paper, a token is represented as a fixed-size word in $\mathcal{Y}^n$, where $$n = \log S + \log L + \log |\mathcal{X}|$$ and all logs are to base $|\mathcal{Y}|$.
Parameters:
-
window_size
(int
) –The sliding window size $W$. Must satisfy $W \geq 2$.
-
lookahead_size
(int
) –The lookahead buffer size $L$. Must satisfy $1 \leq L < W$.
-
source_cardinality
(int
) –The source cardinality $|\mathcal{X}|$. Must satisfy $|\mathcal{X}| \geq 2$.
-
target_cardinality
(int
) –The target cardinality $|\mathcal{Y}|$. Must satisfy $|\mathcal{Y}| \geq 2$. The default value is $2$ (binary).
-
search_buffer
(ArrayLike | None
) –The initial state of the search buffer. Must be a 1D-array of length $S$ with elements in $\mathcal{X}$. The default value corresponds to $(0, \ldots, 0) \in \mathcal{X}^S$.
Examples:
>>> lz77 = komm.LempelZiv77Code(
... window_size=2**13,
... lookahead_size=16,
... source_cardinality=256,
... target_cardinality=2,
... )
search_size
int
property
The search buffer size $S$. It is given by $S = W - L$.
source_to_tokens()
Encodes a given sequence of source symbols to the corresponding list of tokens.
Examples:
>>> lz77 = komm.LempelZiv77Code(
... window_size=18,
... lookahead_size=9,
... source_cardinality=3,
... target_cardinality=3,
... )
>>> lz77.source_to_tokens([0, 0, 1, 0, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 2])
[(8, 2, 1), (7, 3, 2), (6, 7, 2)]
tokens_to_source()
Decodes a given list of tokens to the corresponding sequence of source symbols.
Examples:
>>> lz77 = komm.LempelZiv77Code(
... window_size=18,
... lookahead_size=9,
... source_cardinality=3,
... target_cardinality=3,
... )
>>> lz77.tokens_to_source([(8, 2, 1), (7, 3, 2), (6, 7, 2)])
array([0, 0, 1, 0, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 2])
tokens_to_target()
Returns the target alphabet representation corresponding to a given list of tokens.
Examples:
>>> lz77 = komm.LempelZiv77Code(
... window_size=18,
... lookahead_size=9,
... source_cardinality=3,
... target_cardinality=3,
... )
>>> lz77.tokens_to_target([(8, 2, 1), (7, 3, 2), (6, 7, 2)])
array([2, 2, 0, 2, 1, 2, 1, 1, 0, 2, 2, 0, 2, 1, 2])
target_to_tokens()
Returns the list of tokens corresponding to a given target alphabet representation.
Examples:
>>> lz77 = komm.LempelZiv77Code(
... window_size=18,
... lookahead_size=9,
... source_cardinality=3,
... target_cardinality=3,
... )
>>> lz77.target_to_tokens([2, 2, 0, 2, 1, 2, 1, 1, 0, 2, 2, 0, 2, 1, 2])
[(8, 2, 1), (7, 3, 2), (6, 7, 2)]
encode()
Encodes a sequence of source symbols to a sequence of target symbols.
Parameters:
-
input
(ArrayLike
) –The sequence of source symbols to be encoded. Must be a 1D-array with elements in $\mathcal{X}$.
Returns:
-
output
(NDArray[integer]
) –The sequence of encoded target symbols. It is a 1D-array with elements in $\mathcal{Y}$.
Examples:
>>> lz77 = komm.LempelZiv77Code(
... window_size=18,
... lookahead_size=9,
... source_cardinality=3,
... target_cardinality=3,
... )
>>> lz77.encode([0, 0, 1, 0, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 2])
array([2, 2, 0, 2, 1, 2, 1, 1, 0, 2, 2, 0, 2, 1, 2])
decode()
Decodes a sequence of target symbols to a sequence of source symbols.
Parameters:
-
input
(ArrayLike
) –The sequence of target symbols to be decoded. Must be a 1D-array with elements in $\mathcal{Y}$. Also, the sequence must be a valid output of the
encode
method.
Returns:
-
output
(NDArray[integer]
) –The sequence of decoded source symbols. It is a 1D-array with elements in $\mathcal{X}$.
Examples:
>>> lz77 = komm.LempelZiv77Code(
... window_size=18,
... lookahead_size=9,
... source_cardinality=3,
... target_cardinality=3,
... )
>>> lz77.decode([2, 2, 0, 2, 1, 2, 1, 1, 0, 2, 2, 0, 2, 1, 2])
array([0, 0, 1, 0, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 2])