/******************************************************************************
* This file is part of The Unicode Tools Of Rexx (TUTOR) *
* See https://rexx.epbcn.com/tutor/ *
* and /~https://github.com/JosepMariaBlasco/TUTOR *
* Copyright © 2023-2025 Josep Maria Blasco <josep.maria.blasco@epbcn.com> *
* License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0) *
******************************************************************************/
This class, defined in the /components/utilities/MultiStageTable.cls
package, specializes in producing two-stage tables, three-stage tables, or, in general multi-stage tables.
Multi-stage tables are recommended in The Unicode Standard 15.0, section 5.1, Data Structures for Character Conversion, "Multistage Tables", pp. 196–7.
This is not a general implementation of multi-stage tables, but a custom, tailored one, specific to Unicode.
The indexes for these tables run from 0 to the size of the buffer. Negative values will raise a syntax error, and indexes greater than the size of the buffer will return width copies of "00"X.
The compress method compresses a buffer and returns two smaller, compressed, tables, "chunks" and "offsets" (see below). The chunk_size argument, if specified, should be a power of two and a divisor of the length of the buffer; chunk_size defaults to 256.
Buffer is a n-byte string, where n is a multiple of chunk_size.
The compression technique works as follows: the buffer source string is supposed to be compressible, i.e., it is supposed to contain different segments which are identical
(for example, because they are copies of "00"X
). The buffer string will be broken in a series of substrings of size chunk_size, and,
instead of storing the substring itself, we will store a reference to the substring. Thus, when two identical substrings of buffer are found,
the first copy is stored, only once, in an indexed "chunks" table, and its offset (divided by chunk_size) is stored
in a separate "offsets" table. A reference is supposed to be much smaller than the subarray itself, and this makes for the desired compression.
Note: To allow for maximum compression, we are supposing that the quantity of different sub-arrays does not exceed 256. This allows to store the references to the sub-arrays in a single byte.
Returns the n-th element of the multi-stage table, when 0 < n <= table_size, or a string containing width copies of "00"X
, when n > table_size.
Negative or non-numeric values of n will raise a Syntax error.
Creates a new multi-stage table. This will be a two-stage table when big_values is not specified or has the .Nil
value (the default),
and a three-stage table when big_values is specified as a non-nil value.
The offset and chunks tables should have been created by the compress class method; chunk_size, when specified, should be the same value used in the compress call, and a power of two. Table_size should be the size of the buffer used in the compress call; it defaults to 217.
Width and big_values are optional. When width is specified for a two-stage table, a string of width bytes will be returned by the "[]" call; when width is specified for a three-stage table, the 1-byte value obtained from offset and chunks is multiplied by width and used as an index into big_values, to return a substring of width bytes.