jBase Query Language Conversion Codes

Conversion Processing

A	Algebraic functions.
B	Subroutine call.
C	Concatenation.
D	Internal and external dates.
D1 and D2	Associates controlling and dependent fields.
F	Mathematical functions.
G	Group extract.
L	Length.
MC	Mask character.
MD	Mask decimal.
MK	Mask metric.
ML and MR	Mask with justification.
MP	Mask packed decimal.
MS	Mask sequence.
MT	Mask time.
P	Pattern match.
R	Range check.
S	Substitution.
T	Text extraction.
TFILE	File translation.
U	User exit.
W	Timestamps.
JBCUserConversions	How to create user-defined conversion codes.

jQL Dictionary Conversions and Correlatives

For dates and times, the standard conversions D and MTS use simple date format functions that honor the configured locale. Number formatting via MR, ML and MD uses the locale's thousands separator, decimal point and currency notation.
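As a rough illustration of what locale-driven number formatting means, the following Python sketch applies a scaled-decimal mask with configurable separators. The function name `mask_decimal` and its parameters are hypothetical, and this is not how jBASE implements MD. It only mimics the general idea of a mask with a fixed number of decimal places, with the separators standing in for values a locale would supply.

```python
def mask_decimal(internal: int, places: int,
                 thousands: str = ",", decimal_point: str = ".") -> str:
    """Hypothetical helper: format a scaled integer with the given separators."""
    sign = "-" if internal < 0 else ""
    # Pad so there is at least one digit before the decimal point.
    digits = str(abs(internal)).rjust(places + 1, "0")
    if places:
        whole, fraction = digits[:-places], digits[-places:]
    else:
        whole, fraction = digits, ""
    # Split the whole part into groups of three digits, right to left.
    groups = []
    while len(whole) > 3:
        groups.insert(0, whole[-3:])
        whole = whole[:-3]
    groups.insert(0, whole)
    result = sign + thousands.join(groups)
    if places:
        result += decimal_point + fraction
    return result


# Same stored value, two assumed locale conventions.
print(mask_decimal(123456, 2))                                     # 1,234.56
print(mask_decimal(123456, 2, thousands=".", decimal_point=","))   # 1.234,56
```

The stored value never changes; only the separators chosen at output time differ.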

TimeStamp "W{Dx}{Tx}"

In addition, a suite of conversions, including A, F and I types, provides timestamp functionality. These conversions generate a timestamp that is displayed as a date and/or time in short, medium, long or full format. They also support non-Gregorian locales. The components of the conversion code have the following meanings:
W => A new conversion code, chosen so as not to clash with existing conversions.
D => Date
T => Time
x => Format option: S = Short, M = Medium, L = Long, F = Full

"WDS" or "WTS"	SHORT is completely numeric: 12/13/52 or 3:30pm
"WDM"	MEDIUM is longer: Jan 12, 1952
"WDL" or "WTL"	LONG is longer still: January 12, 1952 or 3:30:32pm
"WDF" or "WTF"	FULL is specified completely.

Data Conversion

When programs execute in international mode, the system processes all variable contents as UTF-8 encoded sequences, so all data must be held as UTF-8 encoded byte sequences. Data imported into an account configured to operate in international mode must therefore be converted from its current code page to UTF-8. If ALL the data are eight-bit bytes in the range 0x00-0x7f (ASCII), no conversion is necessary, as these values are already valid UTF-8. Values outside the 0x00-0x7f range, however, must be converted to UTF-8 proper so that there can be no ambiguity between code page values.
For instance, the character represented by the hex value 0xE0 in the Latin2 code page (ISO-8859-2) is described as "LATIN SMALL LETTER R WITH ACUTE". The same hex value in the Latin1 code page (ISO-8859-1), however, represents the character "LATIN SMALL LETTER A WITH GRAVE".
To avoid this clash of code pages, the Unicode specification provides a unique value for each of these characters within the specification's 32-bit value space.

Example

Unicode value 0x00E0 used to represent LATIN SMALL LETTER A WITH GRAVE
Unicode value 0x0155 used to represent LATIN SMALL LETTER R WITH ACUTE
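The clash can be reproduced outside jBASE. The Python sketch below is an illustration only: the helper name `describe` is hypothetical, and it relies on Python's standard codecs rather than any jBASE facility. It decodes the single byte 0xE0 under each code page and reports the resulting Unicode value, character name and UTF-8 byte sequence.

```python
import unicodedata


def describe(raw: bytes, code_page: str):
    """Decode one byte under code_page; return (code point, name, UTF-8 hex)."""
    char = raw.decode(code_page)
    return (f"U+{ord(char):04X}",
            unicodedata.name(char),
            char.encode("utf-8").hex())


for code_page in ("iso-8859-1", "iso-8859-2"):
    print(code_page, describe(b"\xe0", code_page))
# iso-8859-1 ('U+00E0', 'LATIN SMALL LETTER A WITH GRAVE', 'c3a0')
# iso-8859-2 ('U+0155', 'LATIN SMALL LETTER R WITH ACUTE', 'c595')
```

Once encoded as UTF-8, the two characters occupy distinct byte sequences (C3 A0 and C5 95), so the original code page no longer needs to be known to tell them apart.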
NOTE: UTF-8 is an encoding of 32-bit Unicode values that also has special properties (as described earlier), which make it effective to use on Unix and Windows platforms.
Another good reason for converting completely from the original code page to UTF-8 is that it removes the need for conversions when reading from and writing to files. Such conversions would add massive and unnecessary overhead to ALL application processing, whereas converting from the original code page to UTF-8 is a one-off cost.
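The one-off conversion can be pictured with a minimal Python sketch. It assumes the source code page of the imported data is known in advance; the function name `to_utf8` is hypothetical and does not correspond to any jBASE import tool.

```python
def to_utf8(raw: bytes, source_code_page: str) -> bytes:
    """Convert bytes held in source_code_page to UTF-8 (hypothetical helper)."""
    # Pure 7-bit ASCII (0x00-0x7f) is already valid UTF-8: nothing to do.
    if raw.isascii():
        return raw
    # Otherwise interpret the bytes in their original code page,
    # then re-encode the resulting characters as UTF-8.
    return raw.decode(source_code_page).encode("utf-8")


print(to_utf8(b"PLAIN ASCII", "iso-8859-1"))   # b'PLAIN ASCII'
print(to_utf8(b"caf\xe9", "iso-8859-1"))       # b'caf\xc3\xa9'
```

The conversion should be applied exactly once: running it again over data that is already UTF-8, while still naming the old code page, would reinterpret the multi-byte sequences as separate characters and corrupt them.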

Saturday, December 18, 2010