Skip to content

komm.LempelZivSSCode

Lempel–Ziv–Storer–Szymanski (LZSS) code. It is a lossless data compression algorithm which is a variation of the Lempel–Ziv 77 algorithm. Let $\mathcal{X}$ be the source alphabet, $\mathcal{Y}$ be the target alphabet, $S \geq 1$ be the size of the search buffer, and $L \geq 1$ be the size of the lookahead buffer. The token format follows CT06, Sec. 13.4.1, where a token is either a literal $(0, x)$, where $x \in \mathcal{X}$ is a source symbol, or a reference $(1, p, \ell)$, where $p \in [1 : S]$ is the pointer (location of the beginning of the match, measured backward from the end of the search window), and $\ell \in [1 : L]$ is the length of the match. References are only encoded if they provide compression benefit (length exceeds the break-even point). For more details, see CT06, Sec. 13.4.1.

Note

Here, for simplicity, we assume that the source alphabet is $\mathcal{X} = [0 : |\mathcal{X}|)$ and the target alphabet is $\mathcal{Y} = [0 : |\mathcal{Y}|)$, where $|\mathcal{X}| \geq 2$ and $|\mathcal{Y}| \geq 2$ are called the source cardinality and target cardinality, respectively.

Parameters:

  • search_size (int)

    The search buffer size $S$. Must satisfy $S \geq 1$.

  • lookahead_size (int)

    The lookahead buffer size $L$. Must satisfy $L \geq 1$.

  • source_cardinality (int)

    The source cardinality $|\mathcal{X}|$. Must satisfy $|\mathcal{X}| \geq 2$.

  • target_cardinality (int)

    The target cardinality $|\mathcal{Y}|$. Must satisfy $|\mathcal{Y}| \geq 2$. The default value is $2$ (binary).

  • search_buffer (ArrayLike | None)

    The initial state of the search buffer. Must be a 1D-array of length $S$ with elements in $\mathcal{X}$. The default value corresponds to $(0, \ldots, 0) \in \mathcal{X}^S$.

Examples:

>>> lzss = komm.LempelZivSSCode(
...     search_size=2**13,
...     lookahead_size=16,
...     source_cardinality=256,
...     target_cardinality=2,
... )

window_size int property

The sliding window size $W$. It is given by $W = S + L$.

break_even int property

The break-even point for encoding a reference. A match must be at least this long to be worth encoding as a reference instead of literals. It is given by $$ \left\lceil \frac{1 + \lceil \log S \rceil + \lceil \log L \rceil}{1+ \lceil \log |\mathcal{X}| \rceil} \right\rceil, $$ where all logs are to base $|\mathcal{Y}|$.

source_to_tokens()

Encodes a given sequence of source symbols to the corresponding list of tokens.

Examples:

>>> lzss = komm.LempelZivSSCode(
...     search_size=8,
...     lookahead_size=4,
...     source_cardinality=4,
... )
>>> lzss.source_to_tokens([3, 0, 0, 0, 3, 2, 0, 3])
[(0, 3), (1, 4, 4), (0, 2), (1, 3, 2)]

tokens_to_source()

Decodes a given list of tokens to the corresponding sequence of source symbols.

Examples:

>>> lzss = komm.LempelZivSSCode(
...     search_size=8,
...     lookahead_size=4,
...     source_cardinality=4,
... )
>>> lzss.tokens_to_source([(0, 3), (1, 4, 4), (0, 2), (1, 3, 2)])
array([3, 0, 0, 0, 3, 2, 0, 3])

tokens_to_target()

Returns the target alphabet representation corresponding to a given list of tokens.

Examples:

>>> lzss = komm.LempelZivSSCode(
...     search_size=8,
...     lookahead_size=4,
...     source_cardinality=4,
... )
>>> lzss.tokens_to_target([(0, 3), (1, 4, 4), (0, 2), (1, 3, 2)])
array([0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1])

target_to_tokens()

Returns the list of tokens corresponding to a given target alphabet representation.

Examples:

>>> lzss = komm.LempelZivSSCode(
...     search_size=8,
...     lookahead_size=4,
...     source_cardinality=4,
... )
>>> lzss.target_to_tokens([0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1])
[(0, 3), (1, 4, 4), (0, 2), (1, 3, 2)]

encode()

Encodes a sequence of source symbols to a sequence of target symbols.

Parameters:

  • input (ArrayLike)

    The sequence of source symbols to be encoded. Must be a 1D-array with elements in $\mathcal{X}$.

Returns:

  • output (NDArray[integer])

    The sequence of encoded target symbols. It is a 1D-array with elements in $\mathcal{Y}$.

Examples:

>>> lzss = komm.LempelZivSSCode(
...     search_size=8,
...     lookahead_size=4,
...     source_cardinality=4,
... )
>>> lzss.encode([3, 0, 0, 0, 3, 2, 0, 3])
array([0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1])

decode()

Decodes a sequence of target symbols to a sequence of source symbols.

Parameters:

  • input (ArrayLike)

    The sequence of target symbols to be decoded. Must be a 1D-array with elements in $\mathcal{Y}$. Also, the sequence must be a valid output of the encode method.

Returns:

  • output (NDArray[integer])

    The sequence of decoded source symbols. It is a 1D-array with elements in $\mathcal{X}$.

Examples:

>>> lzss = komm.LempelZivSSCode(
...     search_size=8,
...     lookahead_size=4,
...     source_cardinality=4,
... )
>>> lzss.decode([0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1])
array([3, 0, 0, 0, 3, 2, 0, 3])