komm.LempelZiv78Code
Lempel–Ziv 78 (LZ78 or LZ2) code. It is a lossless data compression algorithm which is asymptotically optimal for ergodic sources. Let $\mathcal{X}$ be the source alphabet, and $\mathcal{Y}$ be the target alphabet. The token format is $(p, x)$, where $p \in \mathbb{N}$ is the index of the corresponding dictionary entry, and $x \in \mathcal{X}$ is the source symbol following the match. The index $p$ is represented as a variable-size word in $\mathcal{Y}^k$, where $k = \log_{|\mathcal{Y}|}(i + 1)$, and $i$ is the size of the dictionary at the moment. For more details, see MacK03, Sec. 6.4, Say06, Sec. 5.4.2 and CT06, Sec. 13.4.2.
Note
Here, for simplicity, we assume that the source alphabet is $\mathcal{X} = [0 : |\mathcal{X}|)$ and the target alphabet is $\mathcal{Y} = [0 : |\mathcal{Y}|)$, where $|\mathcal{X}| \geq 2$ and $|\mathcal{Y}| \geq 2$ are called the source cardinality and target cardinality, respectively.
Parameters:
-
source_cardinality
(int
) –The source cardinality $|\mathcal{X}|$. Must satisfy $|\mathcal{X}| \geq 2$.
-
target_cardinality
(int
) –The target cardinality $|\mathcal{Y}|$. Must satisfy $|\mathcal{Y}| \geq 2$. The default value is $2$ (binary).
Examples:
>>> lz78 = komm.LempelZiv78Code(2) # Binary source, binary target
>>> lz78 = komm.LempelZiv78Code(3, 4) # Ternary source, quaternary target
source_to_tokens()
Encodes a given sequence of source symbols to the corresponding list of tokens.
Examples:
>>> lz78 = komm.LempelZiv78Code(2)
>>> lz78.source_to_tokens([1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0])
[(0, 1), (0, 0), (1, 1), (2, 1), (4, 0), (2, 0)]
tokens_to_source()
Decodes a given list of tokens to the corresponding sequence of source symbols.
Examples:
>>> lz78 = komm.LempelZiv78Code(2)
>>> lz78.tokens_to_source([(0, 1), (0, 0), (1, 1), (2, 1), (4, 0), (2, 0)])
array([1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0])
tokens_to_target()
Returns the target alphabet representation corresponding to a given list of tokens.
Examples:
>>> lz78 = komm.LempelZiv78Code(2)
>>> lz78.tokens_to_target([(0, 1), (0, 0), (1, 1), (2, 1), (4, 0), (2, 0)])
array([1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0])
target_to_tokens()
Returns the list of tokens corresponding to a given target alphabet representation.
Examples:
>>> lz78 = komm.LempelZiv78Code(2)
>>> lz78.target_to_tokens([1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0])
[(0, 1), (0, 0), (1, 1), (2, 1), (4, 0), (2, 0)]
encode()
Encodes a sequence of source symbols to a sequence of target symbols.
Parameters:
-
input
(ArrayLike
) –The sequence of source symbols to be encoded. Must be a 1D-array with elements in $\mathcal{X}$.
Returns:
-
output
(NDArray[integer]
) –The sequence of encoded target symbols. It is a 1D-array with elements in $\mathcal{Y}$.
Examples:
>>> lz78 = komm.LempelZiv78Code(2)
>>> lz78.encode([1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0])
array([1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0])
>>> lz78 = komm.LempelZiv78Code(2, 8)
>>> lz78.encode(np.zeros(15, dtype=int))
array([0, 1, 0, 2, 0, 3, 0, 4, 0])
decode()
Decodes a sequence of target symbols to a sequence of source symbols.
Parameters:
-
input
(ArrayLike
) –The sequence of target symbols to be decoded. Must be a 1D-array with elements in $\mathcal{Y}$. Also, the sequence must be a valid output of the
encode
method.
Returns:
-
output
(NDArray[integer]
) –The sequence of decoded source symbols. It is a 1D-array with elements in $\mathcal{X}$.
Examples:
>>> lz78 = komm.LempelZiv78Code(2)
>>> lz78.decode([1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0])
array([1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0])
>>> lz78 = komm.LempelZiv78Code(2, 8)
>>> lz78.decode([0, 1, 0, 2, 0, 3, 0, 4, 0])
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])