komm.LempelZivSSCode
Lempel–Ziv–Storer–Szymanski (LZSS) code. It is a lossless data compression algorithm which is a variation of the Lempel–Ziv 77 algorithm. Let $\mathcal{X}$ be the source alphabet, $\mathcal{Y}$ be the target alphabet, $S \geq 1$ be the size of the search buffer, and $L \geq 1$ be the size of the lookahead buffer. The token format follows CT06, Sec. 13.4.1, where a token is either a literal $(0, x)$, where $x \in \mathcal{X}$ is a source symbol, or a reference $(1, p, \ell)$, where $p \in [1 : S]$ is the pointer (location of the beginning of the match, measured backward from the end of the search window), and $\ell \in [1 : L]$ is the length of the match. References are only encoded if they provide compression benefit (length exceeds the break-even point). For more details, see CT06, Sec. 13.4.1.
Note
Here, for simplicity, we assume that the source alphabet is $\mathcal{X} = [0 : |\mathcal{X}|)$ and the target alphabet is $\mathcal{Y} = [0 : |\mathcal{Y}|)$, where $|\mathcal{X}| \geq 2$ and $|\mathcal{Y}| \geq 2$ are called the source cardinality and target cardinality, respectively.
Parameters:
-
search_size(int) –The search buffer size $S$. Must satisfy $S \geq 1$.
-
lookahead_size(int) –The lookahead buffer size $L$. Must satisfy $L \geq 1$.
-
source_cardinality(int) –The source cardinality $|\mathcal{X}|$. Must satisfy $|\mathcal{X}| \geq 2$.
-
target_cardinality(int) –The target cardinality $|\mathcal{Y}|$. Must satisfy $|\mathcal{Y}| \geq 2$. The default value is $2$ (binary).
-
search_buffer(ArrayLike | None) –The initial state of the search buffer. Must be a 1D-array of length $S$ with elements in $\mathcal{X}$. The default value corresponds to $(0, \ldots, 0) \in \mathcal{X}^S$.
Examples:
>>> lzss = komm.LempelZivSSCode(
... search_size=2**13,
... lookahead_size=16,
... source_cardinality=256,
... target_cardinality=2,
... )
window_size int
property
The sliding window size $W$. It is given by $W = S + L$.
break_even int
property
The break-even point for encoding a reference. A match must be at least this long to be worth encoding as a reference instead of literals. It is given by $$ \left\lceil \frac{1 + \lceil \log S \rceil + \lceil \log L \rceil}{1+ \lceil \log |\mathcal{X}| \rceil} \right\rceil, $$ where all logs are to base $|\mathcal{Y}|$.
source_to_tokens()
Encodes a given sequence of source symbols to the corresponding list of tokens.
Examples:
>>> lzss = komm.LempelZivSSCode(
... search_size=8,
... lookahead_size=4,
... source_cardinality=4,
... )
>>> lzss.source_to_tokens([3, 0, 0, 0, 3, 2, 0, 3])
[(0, 3), (1, 4, 4), (0, 2), (1, 3, 2)]
tokens_to_source()
Decodes a given list of tokens to the corresponding sequence of source symbols.
Examples:
>>> lzss = komm.LempelZivSSCode(
... search_size=8,
... lookahead_size=4,
... source_cardinality=4,
... )
>>> lzss.tokens_to_source([(0, 3), (1, 4, 4), (0, 2), (1, 3, 2)])
array([3, 0, 0, 0, 3, 2, 0, 3])
tokens_to_target()
Returns the target alphabet representation corresponding to a given list of tokens.
Examples:
>>> lzss = komm.LempelZivSSCode(
... search_size=8,
... lookahead_size=4,
... source_cardinality=4,
... )
>>> lzss.tokens_to_target([(0, 3), (1, 4, 4), (0, 2), (1, 3, 2)])
array([0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1])
target_to_tokens()
Returns the list of tokens corresponding to a given target alphabet representation.
Examples:
>>> lzss = komm.LempelZivSSCode(
... search_size=8,
... lookahead_size=4,
... source_cardinality=4,
... )
>>> lzss.target_to_tokens([0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1])
[(0, 3), (1, 4, 4), (0, 2), (1, 3, 2)]
encode()
Encodes a sequence of source symbols to a sequence of target symbols.
Parameters:
-
input(ArrayLike) –The sequence of source symbols to be encoded. Must be a 1D-array with elements in $\mathcal{X}$.
Returns:
-
output(NDArray[integer]) –The sequence of encoded target symbols. It is a 1D-array with elements in $\mathcal{Y}$.
Examples:
>>> lzss = komm.LempelZivSSCode(
... search_size=8,
... lookahead_size=4,
... source_cardinality=4,
... )
>>> lzss.encode([3, 0, 0, 0, 3, 2, 0, 3])
array([0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1])
decode()
Decodes a sequence of target symbols to a sequence of source symbols.
Parameters:
-
input(ArrayLike) –The sequence of target symbols to be decoded. Must be a 1D-array with elements in $\mathcal{Y}$. Also, the sequence must be a valid output of the
encodemethod.
Returns:
-
output(NDArray[integer]) –The sequence of decoded source symbols. It is a 1D-array with elements in $\mathcal{X}$.
Examples:
>>> lzss = komm.LempelZivSSCode(
... search_size=8,
... lookahead_size=4,
... source_cardinality=4,
... )
>>> lzss.decode([0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1])
array([3, 0, 0, 0, 3, 2, 0, 3])