The Unicode Consortium
Industry: Computer; Software
Number of terms: 11048
Company Profile:
The Unicode Consortium or Unicode Inc. is a not-for-profit organization that coordinates the development of the Unicode standard. Its stated goal is to eventually enable computers to operate in all languages from around the world. The consortium develops and publishes a list of freely-available ...
A step in the algorithm for Unicode Normalization Forms, during which decomposed sequences are replaced by primary composites, where possible.
Industry:Computer; Software
Starting from the second character in the coded character sequence (of a Canonical Decomposition or Compatibility Decomposition) and proceeding sequentially to the final character, perform the following steps:
R1 Seek back (left) in the coded character sequence from the character C to find the last Starter L preceding C in the character sequence.
R2 If there is such an L, and C is not blocked from L, and there exists a Primary Composite P which is canonically equivalent to the sequence <L, C>, then replace L by P in the sequence and delete C from the sequence.
* When the algorithm completes, all Non-blocked Pairs canonically equivalent to a Primary Composite will have been systematically replaced by those Primary Composites.
* The replacement of the Starter L in R2 requires continuing to check the succeeding characters until the character at that position is no longer part of any Non-blocked Pair that can be replaced by a Primary Composite. For example, consider the following hypothetical coded character sequence: <U+007A z, U+0335 short stroke overlay, U+0327 cedilla, U+0324 diaeresis below, U+0301 acute>. None of the first three combining marks forms a Primary Composite with the letter z. However, the fourth combining mark in the sequence, acute, does form a Primary Composite with z, and it is not Blocked from the z. Therefore, R2 mandates the replacement of the sequence <U+007A z, ... U+0301 acute> with <U+017A z-acute, ...>, even though there are three other combining marks intervening in the sequence.
* The character C in R1 is not necessarily a non-starter. It is necessary to check all characters in the sequence, because there are sequences <L, C> where both L and C are Starters, yet there is a Primary Composite P which is canonically equivalent to that sequence. For example, Indic two-part vowels often have canonical decompositions into sequences of two spacing vowel signs, each of which has Canonical_Combining_Class=0 and which is thus a Starter by definition. Nevertheless, such a decomposed sequence has an equivalent Primary Composite.
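Steps R1 and R2 can be sketched in Python roughly as follows. This is an illustrative sketch, not a reference implementation. The function names primary_composite and compose are hypothetical, and the lookup of a Primary Composite for a pair <L, C> is delegated to the standard library's unicodedata.normalize as a shortcut, on the assumption that the input is already a canonical decomposition. A real implementation would consult a composition table instead.

```python
import unicodedata


def primary_composite(starter, c):
    """Return the Primary Composite for <starter, c>, or None.

    Shortcut (an assumption of this sketch): let the stdlib compose the
    two-character pair and see whether a single character comes back.
    """
    composed = unicodedata.normalize("NFC", starter + c)
    return composed if len(composed) == 1 else None


def compose(decomposed):
    """Apply R1/R2 to an already decomposed, canonically ordered string."""
    chars = list(decomposed)
    i = 1  # start from the second character
    while i < len(chars):
        c = chars[i]
        ccc_c = unicodedata.combining(c)
        # R1: seek back (left) from C to the last Starter L.
        j = i - 1
        blocked = False
        while j >= 0 and unicodedata.combining(chars[j]) != 0:
            # An intervening mark with ccc >= ccc(C) blocks C from L.
            if unicodedata.combining(chars[j]) >= ccc_c:
                blocked = True
            j -= 1
        # R2: replace L by P and delete C when the pair can be composed.
        p = primary_composite(chars[j], c) if j >= 0 and not blocked else None
        if p is not None:
            chars[j] = p
            del chars[i]  # do not advance: the next character now sits at i
        else:
            i += 1
    return "".join(chars)


# The hypothetical sequence from the text: acute composes with z across
# three intervening combining marks.
result = compose("z\u0335\u0327\u0324\u0301")
print([f"U+{ord(ch):04X}" for ch in result])
```

Because a Starter C always has a combining class of 0, any intervening non-starter blocks it, so two Starters compose only when they are adjacent, as in the Indic two-part vowel case.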
Industry:Computer; Software
A character that is not identical to its canonical decomposition. It may also be known as a canonical precomposed character or a canonical composite character. * For example, U+00E0 Latin small letter a with grave is a canonical decomposable character because its canonical decomposition is to the sequence <U+0061 Latin small letter a, U+0300 combining grave accent>. U+212A Kelvin sign is a canonical decomposable character because its canonical decomposition is to U+004B Latin capital letter K.
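As a minimal sketch, the definition can be checked with Python's standard unicodedata module by comparing a character with its NFD form. The helper name is_canonical_decomposable is hypothetical and exists only for this illustration.

```python
import unicodedata


def is_canonical_decomposable(ch):
    """True if ch is not identical to its full canonical decomposition."""
    return unicodedata.normalize("NFD", ch) != ch


# U+00E0 and U+212A are the examples from the definition; "a" and the
# compatibility-only ligature U+FB01 are included for contrast.
for ch in ("\u00e0", "\u212a", "a", "\ufb01"):
    nfd = unicodedata.normalize("NFD", ch)
    print(
        f"U+{ord(ch):04X}",
        is_canonical_decomposable(ch),
        [f"U+{ord(c):04X}" for c in nfd],
    )
```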
Industry:Computer; Software
Mapping to an inherently equivalent sequence—for example, mapping ä to a + combining umlaut. The decomposition of a character or character sequence that results from recursively applying the canonical mappings found in the Unicode Character Database, until no characters can be further decomposed, and then reordering nonspacing marks. * A canonical decomposition does not remove formatting information.
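The recursive procedure described above might be sketched as follows, using the canonical mappings exposed by Python's unicodedata.decomposition. The function names are hypothetical. The sketch assumes that mappings prefixed with a tag such as <compat> are compatibility mappings and must be skipped, and it does not handle Hangul syllables, whose decomposition is algorithmic and not returned by unicodedata.decomposition.

```python
import unicodedata


def canonical_mapping(ch):
    """Return the canonical mapping of ch as a string, or None."""
    mapping = unicodedata.decomposition(ch)
    if not mapping or mapping.startswith("<"):
        return None  # no mapping, or a compatibility mapping
    return "".join(chr(int(cp, 16)) for cp in mapping.split())


def expand(ch):
    """Recursively apply canonical mappings until nothing decomposes."""
    mapping = canonical_mapping(ch)
    if mapping is None:
        return ch
    return "".join(expand(c) for c in mapping)


def full_canonical_decomposition(text):
    expanded = "".join(expand(ch) for ch in text)
    out, run = [], []
    for ch in expanded:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run of combining marks; a stable
            # sort by combining class puts that run in canonical order.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


print([f"U+{ord(c):04X}" for c in full_canonical_decomposition("\u00e4")])
```

The ligature U+FB01 has only a compatibility mapping, so this function leaves it unchanged, which is one way to see that a canonical decomposition does not remove formatting information.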
Industry:Computer; Software
Two character sequences are said to be canonical equivalents if their full canonical decompositions are identical. * For example, the sequences <o, combining-diaeresis> and <ö> are canonical equivalents. Canonical equivalence is a Unicode property. It should not be confused with language-specific collation or matching, which may add other equivalencies. For example, in Swedish, ö is treated as a completely different letter from o and is collated after z. In German, ö is weakly equivalent to oe and is collated with oe. In English, ö is just an o with a diacritic that indicates that it is pronounced separately from the previous letter (as in coöperate) and is collated with o. * By definition, all canonical-equivalent sequences are also compatibility-equivalent sequences.
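A simple way to test this definition, sketched here with Python's standard unicodedata module, is to compare the NFD forms of the two sequences. The function name canonically_equivalent is hypothetical.

```python
import unicodedata


def canonically_equivalent(a, b):
    """True if a and b have identical full canonical decompositions."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(canonically_equivalent("o\u0308", "\u00f6"))  # <o, combining diaeresis> vs <ö>
print(canonically_equivalent("\u00f6", "oe"))       # German-style matching is not canonical
```

The second comparison is false because treating ö like oe is a language-specific matching rule, not a Unicode canonical equivalence.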
Industry:Computer; Software
In a decomposed character sequence D, exchange the positions of the characters in each Reorderable Pair until the sequence contains no more Reorderable Pairs. * In effect, the Canonical Ordering Algorithm is a local bubble sort that guarantees that a Canonical Decomposition or a Compatibility Decomposition will contain no subsequences in which a combining mark is followed directly by another combining mark that has a lower, non-zero combining class. * Canonical ordering is defined in terms of application of the Canonical Ordering Algorithm to an entire decomposed sequence. For example, canonical decomposition of the sequence <U+1E0B latin small letter d with dot above, U+0323 combining dot below> would result in the sequence <U+0064 latin small letter d, U+0307 combining dot above, U+0323 combining dot below>, a sequence which is not yet in canonical order. Most decompositions for Unicode strings are already in canonical order.
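The exchange of Reorderable Pairs can be sketched as a plain bubble sort over combining classes. The function names below are hypothetical, and the sketch assumes that a Reorderable Pair is an adjacent pair <A, B> with ccc(A) > ccc(B) > 0, with combining classes read from Python's unicodedata.combining.

```python
import unicodedata


def is_reorderable_pair(a, b):
    """Adjacent pair <a, b> with ccc(a) > ccc(b) > 0."""
    return unicodedata.combining(a) > unicodedata.combining(b) > 0


def canonical_order(decomposed):
    """Exchange Reorderable Pairs until none remain."""
    chars = list(decomposed)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(chars) - 1):
            if is_reorderable_pair(chars[i], chars[i + 1]):
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                swapped = True
    return "".join(chars)


# <d, dot above (ccc 230), dot below (ccc 220)> is not yet in canonical order.
ordered = canonical_order("d\u0307\u0323")
print([f"U+{ord(c):04X}" for c in ordered])
```

Starters are never moved, because a pair involving a character with combining class 0 is never reorderable, which keeps the sort local to each run of combining marks.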
Industry:Computer; Software
A mark that is used to indicate how a text is to be chanted or sung.
Industry:Computer; Software
Synonym for uppercase letter.
Industry:Computer; Software
The association of the uppercase, lowercase, and titlecase forms of a letter.
Industry:Computer; Software
A character C is defined to be case-ignorable if C has the value MidLetter or the value MidNumLet for the Word_Break property or its General_Category is one of Nonspacing_Mark (Mn), Enclosing_Mark (Me), Format (Cf), Modifier_Letter (Lm), or Modifier_Symbol (Sk).
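A rough sketch of this test in Python follows. The standard library exposes General_Category through unicodedata.category but does not expose the Word_Break property, so the two small sets below are a hand-picked, incomplete sample used only for illustration. A real implementation would load the full Word_Break data. The names SAMPLE_MID_LETTER, SAMPLE_MID_NUM_LET and is_case_ignorable are hypothetical.

```python
import unicodedata

# Assumption: incomplete samples of Word_Break=MidLetter and
# Word_Break=MidNumLet code points, standing in for the real property data.
SAMPLE_MID_LETTER = {"\u003a", "\u00b7"}    # colon, middle dot
SAMPLE_MID_NUM_LET = {"\u002e", "\u2019"}   # full stop, right single quotation mark

CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}


def is_case_ignorable(ch):
    """Approximate the definition above using the sample Word_Break sets."""
    if ch in SAMPLE_MID_LETTER or ch in SAMPLE_MID_NUM_LET:
        return True
    return unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES


for ch in ("\u0301", "\u00ad", "\u02b0", "^", ".", "a", "1"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_case_ignorable(ch))
```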
Industry:Computer; Software