Standard ECMA-43
3rd Edition – December 1991
Reprinted in electronic form in January 1999

Standardizing Information and Communication Systems

8-Bit Coded Character Set Structure and Rules

Phone: +41 22 849.60.00 - Fax: +41 22 849.60.01 - URL: http://www.ecma.ch - Internet: helpdesk@ecma.ch

Brief History

ECMA published the first edition of this Standard ECMA-43 for an 8-bit coded character set in December 1974. It was a very general standard based on the facilities offered by the code extension techniques of Standard ECMA-35.

Since 1974 these techniques have evolved considerably and, as a consequence, a 4th edition of Standard ECMA-35 was published in March 1985. It was then decided to revise Standard ECMA-43 so as to take advantage of the additional facilities provided by Standard ECMA-35 and at the same time to specify a definite structure and precise rules for the definition of an 8-bit coded character set. The 2nd edition of Standard ECMA-43 was technically identical with the 2nd edition of ISO 4873.

Further developments of ISO 4873 led to the introduction of a new feature, the "G Set hierarchy", which allows the presence of a coded character in more than one G set. Moreover the G0 set is now fully specified. It corresponds to the graphic part of the International Reference Version (IRV) of Standard ECMA-6 (sixth edition of December 1991).

Adopted by the General Assembly of ECMA in December 1991.

Table of contents

1 Scope
2 Conformance and implementation
  2.1 Conformance
    2.1.1 Conformance of information interchange
    2.1.2 Conformance of devices
  2.2 Implementation
3 Normative references
4 Definitions
  4.1 Active position
  4.2 Bit combination
  4.3 Byte
  4.4 Character
  4.5 Character position
  4.6 Coded-character-data-element (CC-data-element)
  4.7 Coded character set; code
  4.8 Code extension
  4.9 Code table
  4.10 Control character
  4.11 Control function
  4.12 Device
  4.13 Escape sequence
  4.14 Final byte
  4.15 Graphic character
  4.16 Graphic symbol
  4.17 Repertoire
  4.18 User
5 Notation, code table and names
  5.1 Notation
  5.2 Code table
  5.3 Names
6 Structure of the 8-bit code
  6.1 Elements of the 8-bit code
  6.2 Identification of the elements of the 8-bit code
  6.3 Invocation
    6.3.1 C0 set
    6.3.2 Character SPACE
    6.3.3 G0 set
    6.3.4 Character DELETE
    6.3.5 C1 set
    6.3.6 G1 set
    6.3.7 G2 set
    6.3.8 G3 set
7 Specification of the characters of the 8-bit code
  7.1 C0 set
  7.2 Character ESCAPE
  7.3 Character SPACE
  7.4 G0 set
  7.5 Character DELETE
  7.6 C1 set
  7.7 G1 set
  7.8 G2 set
  7.9 G3 set
  7.10 Summary of the specification of the 8-bit code
8 Levels
  8.1 Level 1
  8.2 Level 2
  8.3 Level 3
9 Version of the 8-bit code
  9.1 Contents of a version
  9.2 Unique coding of characters
10 Identification of version and level
  10.1 Purpose and context of identification
  10.2 Identification of level
  10.3 Identification of a version
  10.4 Switching from one version to
another
  10.5 Switching from one level to another
Annex A - Restrictions applicable to the C0 and C1 sets
Annex B - Shift functions
Annex C - Composite graphic characters
Annex D - Use of bit combinations 00/14 and 00/15
Annex E - Main differences between the 2nd edition (1985) and the present (third) edition

1 Scope

This ECMA Standard specifies an 8-bit code derived from, and compatible with, the 7-bit coded character set specified in ECMA-6. The characteristics of this code are also in conformance with the code extension techniques specified in ECMA-35.

This ECMA Standard specifies an 8-bit code with a number of options. It also provides guidance on how to exercise the options to define specific versions.

This code is primarily intended for general information interchange within an 8-bit environment among data processing systems and associated equipment, and within data communication systems. The need for graphic characters and control functions in data processing has also been taken into account.

The code includes the ten digits as well as the 52 small and capital letters of the basic Latin alphabet and may include accented letters, special Latin letters and/or the letters of one or several non-Latin alphabet(s).

2 Conformance and implementation

2.1 Conformance

2.1.1 Conformance of information interchange

A coded-character-data-element (CC-data-element) within coded information for interchange is in conformance with a version of this ECMA Standard if all the coded representations of characters within that CC-data-element conform to the requirements of clause 9.

A claim of conformance shall identify the version adopted.

2.1.2 Conformance of devices

A device is in conformance with this ECMA Standard if it conforms to the requirements of 2.1.2.1, and either or both of 2.1.2.2 and 2.1.2.3 below. A claim of conformance shall identify the document which contains the description specified in 2.1.2.1, and shall identify the version adopted.

2.1.2.1 Device description

A device that conforms to this ECMA Standard shall be the subject of a description that identifies the means by which the user may supply characters to the device, or may recognize them when they are made available to him, as specified respectively in 2.1.2.2 and 2.1.2.3 below.

2.1.2.2 Originating devices

An originating device shall allow its user to supply any sequence of characters from the version adopted, and shall be capable of transmitting their coded representations within a CC-data-element.

2.1.2.3 Receiving devices

A receiving device shall be capable of receiving and interpreting any coded representations of characters that are within a CC-data-element, and that conform to 2.1.1 of this ECMA Standard, and shall make the corresponding characters available to its user in such a way that the user can identify them from among those of the version adopted, and can distinguish them from each other.

2.2 Implementation

The use of this code requires definitions of its implementation in various media. For example, these could include punched tapes, punched cards, magnetic and optical media and transmission channels, thus permitting interchange of data to take place either indirectly by means of an intermediate recording in a physical medium, or by local connection of various units (such as input and output devices and computers) or by means of data transmission equipment.

The implementation of this code in physical media and for transmission, taking into account the need for error checking, is the subject of other international standards.

3 Normative references

ECMA-6   7-bit coded character set for information interchange (1991).
ECMA-35  Code extension techniques (1985).
ECMA-48  Control functions for 7-bit and 8-bit coded character sets (1991).
ISO      International Register of Coded Character Sets to be Used with Escape Sequences (ISO 2375).

4 Definitions

For the purpose of this ECMA Standard the following definitions apply.

4.1 Active position
The character position which is to image the graphic symbol representing the next graphic character or relative to which the next control function is to be executed.

NOTE 1
In general, the active position is indicated in a display by a cursor.

4.2 Bit combination
An ordered set of bits used for the representation of characters.

4.3 Byte
A bit string that is operated upon as a unit.

4.4 Character
A member of a set of elements used for the organization, control or representation of data.

4.5 Character position
The portion of a display that is imaging or is capable of imaging a graphic symbol.

4.6 Coded-character-data-element (CC-data-element)
An element of interchanged information that is specified to consist of a sequence of coded representations of characters, in accordance with one or more identified standards for coded character sets.

NOTE 2
In a communication environment according to the Reference Model for Open Systems Interconnection of ISO 7498, a CC-data-element will form all or part of the information that corresponds to the Presentation-Protocol-Data-Unit (PPDU) defined in that International Standard.

NOTE 3
When information interchange is accomplished by means of interchangeable media, a CC-data-element will form all or part of the information that corresponds to the user data, and not that recorded during formatting and initialization.

4.7 Coded character set; code
A set of unambiguous rules that establishes a character set and the one-to-one relationship between the characters of the set and their bit combinations.

4.8 Code extension
The techniques for the encoding of characters that are not included in the character set of a given code.

4.9 Code table
A table showing the character allocated to each bit combination in a code.

4.10 Control character
A control function the coded representation of which consists of a single bit combination.

4.11 Control function
An action that affects the recording, processing, transmission, or interpretation of data, and that has a coded representation consisting of one or more bit combinations.

4.12 Device
A component of information processing equipment which can transmit, and/or receive, coded information within CC-data-elements.

NOTE 4
It may be an input/output device in the conventional sense, or a process such as an application program or gateway function.

4.13 Escape sequence
A string of bit combinations that is used for control purposes in code extension procedures. The first of these bit combinations represents the control function ESCAPE.

4.14 Final byte
The bit combination that terminates an escape sequence or a control sequence.

4.15 Graphic character
A character, other than a control function, that has a visual representation normally handwritten, printed or displayed, and that has a coded representation consisting of one or more bit combinations.

4.16 Graphic symbol
A visual representation of a graphic character or of a control function.

4.17 Repertoire
A specified set of characters that are represented by means of one or more bit combinations of a coded character set.

4.18 User
A person or other entity that invokes the services provided by a device.

NOTE 5
This entity may be a process such as an application program if the "device" is a code convertor or a gateway function, for example.

NOTE 6
The characters, as supplied by the user or made available to him, may be in the form of codes local to the device, or of non-conventional visible representations, provided that 2.1.2 above is satisfied.

5 Notation, code table and names

5.1 Notation

The bits of the bit combinations of the 8-bit code are identified by b8, b7, b6, b5, b4, b3, b2 and b1, where b8 is the highest-order, or most-significant, bit and b1 is the lowest-order, or least-significant, bit.

The bit combinations may be interpreted to represent integers in the range 0 to 255 in binary notation by attributing the following weights to the individual bits:

Bit     b8   b7   b6   b5   b4   b3   b2   b1
Weight  128  64   32   16   8    4    2    1

In this ECMA Standard, the bit combinations are identified by notations of the form xx/yy, where xx and yy are numbers in the range 00 to 15. The correspondence between the notations of the form xx/yy and the bit combinations consisting of the bits b8 to b1 is as follows:

− xx is the number represented by b8, b7, b6 and b5 where these bits are given the weights 8, 4, 2 and 1 respectively;
− yy is the number represented by b4, b3, b2 and b1 where these bits are given the weights 8, 4, 2 and 1 respectively.

The notations of the form xx/yy are the same as the ones used to identify code table positions, where xx is the column number and yy is the row number (see 5.2).

5.2 Code table

An 8-bit code table consists of 256 positions arranged in 16 columns and 16 rows. The columns and rows are numbered 00 to 15.

The code table positions are identified by notations of the form xx/yy, where xx is the column number and yy is the row number.

The positions of the code table are in one-to-one correspondence with the bit combinations of the code. The notation of a code table position, of the form xx/yy, is the same as that of the corresponding bit combination.

5.3 Names

This ECMA Standard assigns a unique name to each character. In addition, it specifies an acronym for control characters and for the characters SPACE and DELETE, and a graphic symbol for each graphic character. By convention, only capital letters, space and hyphen are used for writing the names of the characters. For acronyms, only capital letters and digits are used. It is intended that the acronyms and this convention be retained in all translations of the text.

The names chosen to denote graphic characters are intended to reflect their customary meaning.
However, this ECMA Standard does not define and does not restrict the meanings of graphic characters. Neither does it specify a particular style or font design for the graphic characters when imaged.

6 Structure of the 8-bit code

6.1 Elements of the 8-bit code

The 8-bit code consists of the following parts (see figure 1).

a) A C0 set
   A set of up to 30 control characters represented by bit combinations 00/00 to 01/15, except 00/14 and 00/15 which shall be unused.
b) The character SPACE
   A graphic character represented by bit combination 02/00.
c) A G0 set
   A set of 94 graphic characters represented by bit combinations 02/01 to 07/14.
d) The character DELETE
   A character represented by bit combination 07/15.
e) A C1 set
   A set of up to 32 control characters represented by bit combinations 08/00 to 09/15.
f) A G1 set
   A set of up to 96 graphic characters represented by bit combinations 10/00 to 15/15.
g) A G2 set
   A set of up to 96 graphic characters.
h) A G3 set
   A set of up to 96 graphic characters.

6.2 Identification of the elements of the 8-bit code

The method of identification of the code elements listed in 6.1 is specified in clause 10.

6.3 Invocation

6.3.1 C0 set
The identification of the C0 set also invokes that set.

6.3.2 Character SPACE
The character SPACE shall be represented by bit combination 02/00. It is not explicitly invoked.

6.3.3 G0 set
The G0 set shall be as specified in 7.4. It is not explicitly invoked.

6.3.4 Character DELETE
The character DELETE shall be represented by bit combination 07/15. It is not explicitly invoked.

6.3.5 C1 set
The identification of the C1 set also invokes that set.

6.3.6 G1 set
The identification of the G1 set also invokes that set. The locking-shift function LS1R shall also invoke the G1 set.

6.3.7 G2 set
Either the set as a whole shall be invoked by the locking-shift function LS2R (see annex B) into columns 10 to 15, or individual characters of it shall be invoked by means of the single-shift function SS2 (see 7.6).
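
As an illustration of the xx/yy notation of 5.1 and the fixed layout of 6.1, the following Python sketch converts between byte values and column/row notation and names the element a bit combination falls into. The function names are hypothetical and not part of this Standard. The sketch also assumes that no locking shift is in effect, so columns 10 to 15 are reported as the G1 set.

```python
def to_notation(byte: int) -> str:
    """Format a byte value as xx/yy (column/row), following 5.1."""
    if not 0 <= byte <= 255:
        raise ValueError("a bit combination must be in the range 0 to 255")
    # b8..b5 give the column number, b4..b1 give the row number.
    return f"{byte >> 4:02d}/{byte & 0x0F:02d}"


def from_notation(notation: str) -> int:
    """Parse an xx/yy notation back into a byte value."""
    column, row = (int(part) for part in notation.split("/"))
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must be in the range 00 to 15")
    return column * 16 + row


def classify(byte: int) -> str:
    """Name the element of 6.1 that a bit combination belongs to.

    Assumption: no locking shift is active, so columns 10 to 15 hold G1.
    """
    if not 0 <= byte <= 255:
        raise ValueError("a bit combination must be in the range 0 to 255")
    if byte in (0x0E, 0x0F):      # 00/14 and 00/15 shall be unused
        return "unused"
    if byte <= 0x1F:              # 00/00 to 01/15
        return "C0"
    if byte == 0x20:              # 02/00
        return "SPACE"
    if byte <= 0x7E:              # 02/01 to 07/14
        return "G0"
    if byte == 0x7F:              # 07/15
        return "DELETE"
    if byte <= 0x9F:              # 08/00 to 09/15
        return "C1"
    return "G1"                   # 10/00 to 15/15
```

For example, `to_notation(0x1B)` returns `"01/11"`, the position of ESCAPE, and `classify(from_notation("07/15"))` returns `"DELETE"`.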

6.3.8 G3 set
Either the set as a whole shall be invoked by the locking-shift function LS3R (see annex B) into columns 10 to 15, or individual characters of the set shall be invoked by means of the single-shift function SS3 (see 7.6).

7 Specification of the characters of the 8-bit code

The use of control functions such as BACKSPACE or CARRIAGE RETURN, for the coded representation of composite characters is prohibited by this ECMA Standard (see annex C).

7.1 C0 set

The requirements for the C0 set are:

− bit combinations 00/14 and 00/15 shall not be used (see annex D);
− the control character ESCAPE shall be represented by bit combination 01/11;
− any control characters can be allocated to the other bit combinations subject to the restrictions specified in annex A.

NOTE 7
A C0 set comprising only ESCAPE represented by bit combination 01/11 has been registered (Registration ISO-IR No. 104), and is identified by ESC 02/01 04/07.

7.2 Character ESCAPE

ESCAPE is a control character used to form escape sequences. In this ECMA Standard the use of escape sequences is specified in clause 10.

Table 1 - ESCAPE

Acronym  Name    Coded representation
ESC      ESCAPE  01/11

7.3 Character SPACE

A graphic character having a visual representation consisting of the absence of a graphic symbol. It causes the active position to be advanced by one character position.

Table 2 - SPACE

Acronym  Name   Coded representation
SP       SPACE  02/00

7.4 G0 set

The 94 bit combinations 02/01 to 07/14 are used to represent graphic characters.

All graphic characters allocated to bit combinations in the range 02/01 to 07/14 are spacing characters, that is they cause the active position to advance by one character position.

The graphic characters allocated by this ECMA Standard to these 94 bit combinations are specified in table 3.

Table 3 - Graphic characters of the G0 set

Graphic symbol  Name                    Coded representation
!               EXCLAMATION MARK        02/01
"               QUOTATION MARK          02/02
#               NUMBER SIGN             02/03
$               DOLLAR SIGN             02/04
%               PERCENT SIGN            02/05
&               AMPERSAND               02/06
'               APOSTROPHE              02/07
(               LEFT PARENTHESIS        02/08
)               RIGHT PARENTHESIS       02/09
*               ASTERISK                02/10
+               PLUS SIGN               02/11
,               COMMA                   02/12
-               HYPHEN-MINUS            02/13
.               FULL STOP               02/14
/               SOLIDUS                 02/15
0               DIGIT ZERO              03/00
1               DIGIT ONE               03/01
2               DIGIT TWO               03/02
3               DIGIT THREE             03/03
4               DIGIT FOUR              03/04
5               DIGIT FIVE              03/05
6               DIGIT SIX               03/06
7               DIGIT SEVEN             03/07
8               DIGIT EIGHT             03/08
9               DIGIT NINE              03/09
:               COLON                   03/10
;               SEMICOLON               03/11
<               LESS-THAN SIGN          03/12
=               EQUALS SIGN             03/13
>               GREATER-THAN SIGN       03/14
?               QUESTION MARK           03/15
@               COMMERCIAL AT           04/00
A               LATIN CAPITAL LETTER A  04/01
B               LATIN CAPITAL LETTER B  04/02
C               LATIN CAPITAL LETTER C  04/03
D               LATIN CAPITAL LETTER D  04/04
E               LATIN CAPITAL LETTER E  04/05
F               LATIN CAPITAL LETTER F  04/06
G               LATIN CAPITAL LETTER G  04/07
H               LATIN CAPITAL LETTER H  04/08
I               LATIN CAPITAL LETTER I  04/09
J               LATIN CAPITAL LETTER J  04/10
K               LATIN CAPITAL LETTER K  04/11
L               LATIN CAPITAL LETTER L  04/12
M               LATIN CAPITAL LETTER M  04/13
N               LATIN CAPITAL LETTER N  04/14
O               LATIN CAPITAL LETTER O  04/15
P               LATIN CAPITAL LETTER P  05/00
Q               LATIN CAPITAL LETTER Q  05/01
R               LATIN CAPITAL LETTER R  05/02
S               LATIN CAPITAL LETTER S  05/03
T               LATIN CAPITAL LETTER T  05/04
U               LATIN CAPITAL LETTER U  05/05
V               LATIN CAPITAL LETTER V  05/06
W               LATIN CAPITAL LETTER W  05/07
X               LATIN CAPITAL LETTER X  05/08
Y               LATIN CAPITAL LETTER Y  05/09
Z               LATIN CAPITAL LETTER Z  05/10
[               LEFT SQUARE BRACKET     05/11
\               REVERSE SOLIDUS         05/12
]               RIGHT SQUARE BRACKET    05/13
^               CIRCUMFLEX ACCENT       05/14
_               LOW LINE                05/15
`               GRAVE ACCENT            06/00
a               LATIN SMALL LETTER A    06/01
b               LATIN SMALL LETTER B    06/02
c               LATIN SMALL LETTER C    06/03
d               LATIN SMALL LETTER D    06/04
e               LATIN SMALL LETTER E    06/05
f               LATIN SMALL LETTER F    06/06
g               LATIN SMALL LETTER G    06/07
h               LATIN SMALL LETTER H    06/08
i               LATIN SMALL LETTER I    06/09
j               LATIN SMALL LETTER J    06/10
k               LATIN SMALL LETTER K    06/11
l               LATIN SMALL LETTER L    06/12
m               LATIN SMALL LETTER M    06/13
n               LATIN SMALL LETTER N    06/14
o               LATIN SMALL LETTER O    06/15
p               LATIN SMALL LETTER P    07/00
q               LATIN SMALL LETTER Q    07/01
r               LATIN SMALL LETTER R    07/02
s               LATIN SMALL LETTER S    07/03
t               LATIN SMALL LETTER T    07/04
u               LATIN SMALL LETTER U    07/05
v               LATIN SMALL LETTER V    07/06
w               LATIN SMALL LETTER W    07/07
x               LATIN SMALL LETTER X    07/08
y               LATIN SMALL LETTER Y    07/09
z               LATIN SMALL LETTER Z    07/10
{               LEFT CURLY BRACKET      07/11
|               VERTICAL LINE           07/12
}               RIGHT CURLY BRACKET     07/13
~               TILDE                   07/14

7.5 Character DELETE

DEL was originally used to erase or obliterate an erroneous or unwanted character in punched tape. DEL may be used for media-fill or time-fill. DEL characters may be inserted into, or removed from, a data stream without affecting the information content of that stream, but such action may affect the information layout and/or the control of equipment.

Table 4 - DELETE

Acronym  Name    Coded representation
DEL      DELETE  07/15

7.6 C1 set

The C1 set is available for up to 32 control characters in addition to those provided by the C0 set. It shall not include any of the control characters of the C0 set of ISO 6429.

No specific control characters are allocated to bit combinations 08/00 to 08/13 and 09/00 to 09/15 by this ECMA Standard.

When the single-shift functions SS2 and SS3 are used, they shall be allocated to bit combinations 08/14 and 08/15, respectively, otherwise these bit combinations shall not be used.

NOTE 8
A C1 set comprising only SS2 and SS3 allocated to these bit combinations has been registered (Registration ISO-IR No. 105), and is identified by ESC 02/02 04/07.

7.7 G1 set

The G1 set shall be either a 94-character or a 96-character set of graphic characters. This set is available for graphic characters in addition to those provided by the G0 set.
Either a unique graphic character shall be allocated to each bit combination or the bit combination shall be declared unused.

The characters of the G1 set are represented by bit combinations 10/01 to 15/14 if the G1 is a 94-character set, or by bit combinations 10/00 to 15/15 if the G1 set is a 96-character set.

7.8 G2 set

The G2 set shall be either a 94-character or a 96-character set of graphic characters. This set is available for graphic characters in addition to those provided by the G0 and the G1 sets. Either a unique graphic character shall be allocated to each bit combination or the bit combination shall be declared unused.

If the G2 set is a 94-character set, then no characters shall be allocated to bit combinations 10/00 and 15/15.

The characters of the G2 set shall be invoked either by the single-shift function SS2 or by the locking-shift function LS2R.

− When invoked by SS2, each character is represented by the bit combination of SS2 followed by one of the bit combinations in the range 02/01 to 07/14 if the G2 set is a 94-character set, or 02/00 to 07/15 if the G2 set is a 96-character set.
− When invoked by LS2R, the characters of the G2 set are represented by bit combinations 10/01 to 15/14 if the G2 set is a 94-character set, or by bit combinations 10/00 to 15/15 if the G2 set is a 96-character set.

7.9 G3 set

The G3 set shall be a 94-character or a 96-character set of graphic characters. This set is available for graphic characters in addition to those provided by the G0, the G1 and the G2 sets. Either a unique graphic character shall be allocated to each bit combination or the bit combination shall be declared unused.

If the G3 set is a 94-character set, then no character shall be allocated to bit combinations 10/00 and 15/15.

The characters of the G3 set shall be invoked either by the single-shift function SS3 or by the locking-shift function LS3R.

− When invoked by SS3, each character is represented by the bit combination of SS3 followed by one of the bit combinations in the range 02/01 to 07/14 if the G3 set is a 94-character set, or 02/00 to 07/15 if the G3 set is a 96-character set.
− When invoked by LS3R, the characters of the G3 set are represented by bit combinations 10/01 to 15/14 if the G3 set is a 94-character set, or by bit combinations 10/00 to 15/15 if the G3 set is a 96-character set.

7.10 Summary of the specification of the 8-bit code

Figure 1 summarizes the specification of the elements of the 8-bit code.

8 Levels

This ECMA Standard specifies three nested levels of implementation.

8.1 Level 1

Level 1 (see figure 2) comprises the following facilities:

− a C0 set;
− the character SPACE represented by bit combination 02/00;
− the G0 set;
− the character DELETE represented by bit combination 07/15;
− a C1 set;
− a G1 set.

At Level 1 no shift functions shall be used and the G0 and G1 sets are assumed to be invoked permanently in columns 02 to 07 and 10 to 15, respectively.

At Level 1 the C1 set and/or the G1 set may be empty if there is no requirement for control characters in addition to those provided by the C0 set and/or for graphic characters in addition to those provided by the G0 set.

At Level 1 a version shall not include a G2 or a G3 set.

8.2 Level 2

Level 2 (see figure 3) comprises the facilities of Level 1, and in addition to them:

− a G2 set the characters of which shall be invoked individually by SS2;
− a G3 set the characters of which shall be invoked individually by SS3.

At Level 2 no other shift functions shall be used.

The G1 set shall not be empty; either the G2 or the G3 set may be empty but not both. The C1 set shall not be empty, it shall contain at least the single-shift functions SS2 and SS3.
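
The single-shift rules of 7.8, 7.9 and 8.2 can be sketched in Python as a small tokenizer for a Level 2 byte stream. This is a minimal illustration rather than a reference decoder. The names `tokenize_level2` and `shifted_range` are hypothetical, the G1 set is treated as occupying all of columns 10 to 15, and no attempt is made to map bit combinations to actual characters.

```python
from typing import Iterator, Tuple

SS2 = 0x8E  # bit combination 08/14 (see 7.6)
SS3 = 0x8F  # bit combination 08/15 (see 7.6)


def shifted_range(set_size: int) -> range:
    """Bit combinations allowed after a single shift, by G-set size."""
    if set_size == 94:
        return range(0x21, 0x7F)  # 02/01 to 07/14
    if set_size == 96:
        return range(0x20, 0x80)  # 02/00 to 07/15
    raise ValueError("a G set has either 94 or 96 characters")


def tokenize_level2(
    data: bytes, g2_size: int = 94, g3_size: int = 94
) -> Iterator[Tuple[str, int]]:
    """Yield (element name, bit combination) pairs for a Level 2 stream."""
    shifts = {
        SS2: ("G2", shifted_range(g2_size)),
        SS3: ("G3", shifted_range(g3_size)),
    }
    stream = iter(data)
    for byte in stream:
        if byte in shifts:
            name, valid = shifts[byte]
            # A single shift affects only the next bit combination.
            following = next(stream, None)
            if following is None or following not in valid:
                raise ValueError(f"invalid bit combination after the {name} single shift")
            yield name, following
        elif byte == 0x20:
            yield "SP", byte
        elif 0x21 <= byte <= 0x7E:
            yield "G0", byte
        elif byte == 0x7F:
            yield "DEL", byte
        elif byte >= 0xA0:
            yield "G1", byte
        else:
            yield "control", byte  # C0 or C1 position
```

For instance, `list(tokenize_level2(b"A\x8eA"))` gives `[("G0", 65), ("G2", 65)]`: the same bit combination 04/01 is read from G0 the first time and from G2 after SS2.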

8.3 Level 3

Level 3 (see figure 4) comprises all the facilities of Level 2, with the addition of the following three shift functions:

− LS1R
− LS2R
− LS3R

The G1 set shall not be empty; either the G2 or the G3 set may be empty, but not both. The G1, G2 and G3 sets can be invoked explicitly by LS1R, LS2R and LS3R, respectively. Individual characters of the G2 and G3 sets can be invoked by SS2 and SS3, respectively. The C1 set shall not be empty, it shall contain at least the single-shift functions SS2 and SS3.

9 Version of the 8-bit code

9.1 Contents of a version

A version of the 8-bit code is a coded character set in accordance with clauses 6, 7 and 8, in which a C0 set, a C1 and a G1 set and, optionally, a G2 and a G3 set have been uniquely identified. The level of implementation (clause 8) shall also be identified.

Such a version will generally be the subject of a specification document which states how the above options have been exercised. Such a specification is said to be in accordance with this ECMA Standard.

9.2 Unique coding of characters

In a version the same character may occur in more than one of the G0, G1, G2 and G3 sets. Such a character shall be regarded as the same character as a character in another of those sets if both characters have the same name within the specifications, or ISO International Register entries, that respectively define the two sets.

If the same character has been allocated to more than one of the G0, G1, G2 and G3 sets, either within the set itself or within the character repertoire associated with that set, then that character shall be represented by the coded representation taken from the lowest numbered set (in the sequence G0, G1, G2, G3) in which the character has been allocated.
A coded representation for such a character within one of the other, higher numbered sets shall not be used, even if the higher numbered set is already invoked and the lowest numbered set in which the character is allocated is not currently invoked.

10 Identification of version and level

10.1 Purpose and context of identification

CC-data-elements conforming to a version of this ECMA Standard are intended to form all or part of a composite unit of coded information that is interchanged between a sender and a recipient. The identification of the version of this ECMA Standard that has been adopted by the originator shall also be available to the recipient. The route by which such identification is communicated to the recipient is outside the scope of this ECMA Standard.

However, some standards for interchange of coded information may permit, or require, that the coded representation of the identification applicable to the CC-data-elements forms a part of the interchanged information. This clause specifies a coded representation for the identification of a version and a level of this ECMA Standard. Such coded representations form all or part of an identifying data element, which may be included in information interchange in accordance with the relevant standard.

10.2 Identification of level

A level of this ECMA Standard shall be identified by means of an announcer sequence from the following list.

ESC 02/00 04/12 shall identify Level 1.
ESC 02/00 04/13 shall identify Level 2.
ESC 02/00 04/14 shall identify Level 3.

10.3 Identification of a version

The identification of a version of this ECMA Standard shall comprise a set of identifications, one for each of the C sets and G sets that constitute the version. Each identification in the set shall consist of a designating escape sequence of the type shown below.

ESC 02/01 F shall identify the C0 set.
ESC 02/02 F shall identify the C1 set.
ESC 02/08 04/02 shall identify the G0 set (Registration ISO-IR No. 6).
ESC 02/09 F or ESC 02/13 F shall identify the G1 set.
ESC 02/10 F or ESC 02/14 F shall identify the G2 set.
ESC 02/11 F or ESC 02/15 F shall identify the G3 set.

The final byte F of these sequences shall be obtained from the International Register (ISO 2375). If any of the C sets or G sets is empty, the identification shall be the same escape sequence in which the final byte F is 07/14.

For a version of this ECMA Standard that conforms to Level 1, no identifications for the G2 and G3 sets shall be included.

NOTE 9
The designating escape sequence for the G0 set is indicated with the Final Byte which corresponds to the G0 set of this ECMA Standard. Whilst this designation is not necessary according to this edition of ECMA-43, it has been kept for reasons of compatibility with the 2nd edition of this ECMA Standard.

10.4 Switching from one version to another

In information interchange, the use of a different version requires that this version be identified. Thus, if any of the C sets or of the G sets is changed, an announcer sequence according to 10.2 and a set of escape sequences according to 10.3 shall be required.

10.5 Switching from one level to another

In information interchange, any change of level, whether or not the C sets and G sets are changed, requires an announcer sequence according to 10.2 and a set of escape sequences according to 10.3.
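
The announcer and designating sequences of 10.2 and 10.3 lend themselves to mechanical assembly, as in the Python sketch below. It rests on several assumptions that this clause does not state: the announcer is emitted first; the choice between the two intermediate bytes for a G set follows the ECMA-35 convention (the first for 94-character sets, the second for 96-character sets); and an empty G set is identified with the 94-character form. The function `identify` and its parameters are hypothetical, and final bytes must in practice be taken from the International Register.

```python
from typing import Optional, Tuple

ESC = 0x1B          # 01/11
EMPTY_FINAL = 0x7E  # 07/14, final byte for an empty set

LEVEL_FINAL = {1: 0x4C, 2: 0x4D, 3: 0x4E}  # 04/12, 04/13, 04/14

# Intermediate byte per G set, keyed by set size (assumed ECMA-35 usage).
G_INTERMEDIATE = {
    "G1": {94: 0x29, 96: 0x2D},  # 02/09, 02/13
    "G2": {94: 0x2A, 96: 0x2E},  # 02/10, 02/14
    "G3": {94: 0x2B, 96: 0x2F},  # 02/11, 02/15
}

# A G set is described as (size, final byte); None means the set is empty.
GSet = Optional[Tuple[int, int]]


def _designate_g(name: str, gset: GSet) -> bytes:
    if gset is None:
        # Assumption: use the 94-character form for an empty set.
        return bytes([ESC, G_INTERMEDIATE[name][94], EMPTY_FINAL])
    size, final = gset
    if size not in (94, 96):
        raise ValueError("a G set has either 94 or 96 characters")
    return bytes([ESC, G_INTERMEDIATE[name][size], final])


def identify(
    level: int,
    c0_final: int,
    c1_final: Optional[int] = None,
    g1: GSet = None,
    g2: GSet = None,
    g3: GSet = None,
) -> bytes:
    """Build the level announcer followed by the set identifications."""
    if level not in LEVEL_FINAL:
        raise ValueError("level must be 1, 2 or 3")
    if level == 1 and (g2 is not None or g3 is not None):
        raise ValueError("a Level 1 version has no G2 or G3 set")
    out = bytearray([ESC, 0x20, LEVEL_FINAL[level]])  # ESC 02/00 04/xx
    out += bytes([ESC, 0x21, c0_final])               # C0: ESC 02/01 F
    out += bytes([ESC, 0x22, EMPTY_FINAL if c1_final is None else c1_final])
    out += bytes([ESC, 0x28, 0x42])                   # G0: ESC 02/08 04/02
    out += _designate_g("G1", g1)
    if level > 1:
        out += _designate_g("G2", g2)
        out += _designate_g("G3", g3)
    return bytes(out)
```

A Level 1 version using the C0 set of NOTE 7 (final byte 04/07), an empty C1 set and a 96-character G1 set with an example final byte of 04/01 would be identified by `identify(1, 0x47, None, (96, 0x41))`. The G1 final byte here is only a placeholder value for illustration.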

[Figure 1 – Elements of the 8-bit code. A 16 x 16 code table, with columns 00 to 15 selected by b8 to b5 and rows 00 to 15 by b4 to b1, showing the C0, G0, C1, G1, G2 and G3 sets with SP, ESC and DEL marked. Note 1: see 7.1. Note 2: see 7.6.]

[Figure 2 – Level 1. Code table showing the C0, G0, C1 and G1 sets with SP and DEL marked.]

[Figure 3 – Level 2. Code table showing the C0, G0, C1 and G1 sets with SP, DEL, SS2 and SS3 marked, together with the G2 and G3 sets.]

[Figure 4 – Level 3. Code table showing the C0, G0 and C1 sets with SS2 and SS3 marked, the shift functions LS1R, LS2R and LS3R, and the G1, G2 and G3 sets.]

Annex A
(normative)

Restrictions applicable to the C0 and C1 sets

The definition of some control functions in this ECMA Standard assumes that data associated with them is to be
processed serially in a forward direction. When they are included in strings of data which are processed other than serially in a forward direction or when these control functions are included in data formatted for fixed-record processing they may have undesirable effects or may require additional special treatment to ensure that they result in their desired function.

Whilst this ECMA Standard specifies requirements for bit combinations 00/14, 00/15 and 01/11 of the C0 set, it places the following restrictions on the use of the remaining 29 bit combinations:

a) If control characters described in the C0 set of ECMA-48 are used, they shall have the coded representations and the definitions specified therein.

b) None of these control characters may be allocated to the C1 set.

c) Transmission control characters are intended to control or facilitate transmission of information over telecommunication networks. Procedures for the use of the transmission control characters on telecommunication networks are the subject of other International Standards, for example ISO 1745.

Annex B
(informative)
Shift functions

The use of the shift functions mentioned in this ECMA Standard is specified in Standard ECMA-35. They are defined in Standard ECMA-48 and registered in the International Register of Coded Character Sets to be Used with Escape Sequences. The coding of these shift functions is reproduced here for convenience only.

LS1R : ESC 07/14
LS2R : ESC 07/13
LS3R : ESC 07/12

Annex C
(informative)
Composite graphic characters

Generally, the specified graphic character set constituting the repertoire of a code is identical with the graphic characters allocated to the bit combinations of that code. In the case of ISO 6937 the graphic character repertoire is larger than the coded graphic character set. In any case, each graphic character of a graphic character repertoire is represented by a graphic symbol at a character position.
After imaging the graphic symbol the active position is advanced by one character position.

By means of the control function GRAPHIC CHARACTER COMBINATION (GCC), specified in ISO 6429, it is possible to represent the graphic symbols of more than one graphic character at one and the same character position. This presentation does not constitute a new graphic symbol but only a juxtaposition of the graphic symbols of the characters considered. This does not create a new coded graphic character nor does it add a graphic character to the specified repertoire. It only images the graphic symbols of characters of that repertoire in a particular manner.

The use of the control functions BACKSPACE (BS) or CARRIAGE RETURN (CR), as permitted by other International Standards, for example by ISO 646, allows the creation of coded characters in addition to those of the coded character set. Whilst the graphic symbol representing such a new coded character consists of a combination of two or more graphic symbols, each representing a character of the coded character set (or of the repertoire where it is larger), this combined graphic symbol must be regarded as a new graphic symbol in its own right and not as a juxtaposition of its components.

Using the latter method makes the graphic character repertoire undefined and requires agreement between the sender and the recipient of the data containing such additional characters. This requirement can often not be met. This International Standard therefore prohibits the use of BACKSPACE or CARRIAGE RETURN to create additional characters and requires each graphic character to be represented either by a single 8-bit byte or by the defined series of bit combinations as listed exhaustively in ISO 6937 where the allowed repertoire, which is larger than the coded character set, is fully specified.

Examples

It is not allowed to generate the character "not equal" by combining the characters EQUALS and SOLIDUS by means of the control function BACKSPACE.
= BS / → ≠

It is allowed to generate the presentation of the letters LATIN CAPITAL LETTER P, LATIN SMALL LETTER T and LATIN SMALL LETTER S at one character position by means of the control function GCC.

GCC P t s → Pts

In the first case a new character with a new graphic symbol has been added to the repertoire. In the second case three characters of the specified repertoire have been imaged in a particular way.

Annex D
(informative)
Use of bit combinations 00/14 and 00/15

In Standard ECMA-35, the shift functions LS1 and LS0 are represented by the bit combinations 00/14 and 00/15, respectively, in an 8-bit code. These two bit combinations are not used in Standard ECMA-43.

Bit combinations 00/14 and 00/15 are available to represent SO and SI, or LS1 and LS0 when transcoding between a 7-bit code and an 8-bit code. However, this use is outside the scope of this Standard.

Annex E
(informative)
Main differences between the second edition (1985) and the present (third) edition of this ECMA Standard

1. The G0 set is fully defined, i.e. specific graphic characters have been allocated to all 94 character positions.

2. A hierarchy of the G sets is specified which allows the assignment of a unique coded representation to a character when that character is present in more than one G set.

3. The description of the identification of the C and G sets is separated from the specification of the coding structure.

Printed copies can be ordered from:

ECMA
114 Rue du Rhône
CH-1204 Geneva
Switzerland

Fax: +41 22 849.60.01
Internet: documents@ecma.ch

Files can be downloaded from our FTP site, ftp.ecma.ch. This Standard is available from library ECMA-ST as a compacted, self-expanding file in MSWord 6.0 format (file E043-DOC.EXE) and as an Acrobat PDF file (file E043-PDF.PDF). File E043-EXP.TXT gives a short presentation of the Standard.

Our web site, http://www.ecma.ch, gives full information on ECMA, ECMA activities, ECMA Standards and Technical Reports.
This Standard ECMA-43 is available free of charge in printed form and as a file. See inside cover page for instructions.