A/53: ATSC Digital Television Standard, Parts 1 - 6, 2007
3 January 2007

Advanced Television Systems Committee, Inc.
1750 K Street, N.W., Suite 1200
Washington, D.C. 20006

Doc. A/53, Part 1:2007
3 January 2007

ATSC Digital Television Standard
Part 1 – Digital Television System
(A/53, Part 1:2007)

Advanced Television Systems Committee
1750 K Street, N.W.
Suite 1200
Washington, D.C. 20006
www.atsc.org

ATSC A/53, Part 1 (System) 3 January 2007

The Advanced Television Systems Committee, Inc., is an international, non-profit organization developing voluntary standards for digital television. The ATSC member organizations represent the broadcast, broadcast equipment, motion picture, consumer electronics, computer, cable, satellite, and semiconductor industries.

Specifically, ATSC is working to coordinate television standards among different communications media, focusing on digital television, interactive systems, and broadband multimedia communications. ATSC is also developing digital television implementation strategies and presenting educational seminars on the ATSC standards.

ATSC was formed in 1982 by the member organizations of the Joint Committee on InterSociety Coordination (JCIC): the Electronic Industries Association (EIA), the Institute of Electrical and Electronics Engineers (IEEE), the National Association of Broadcasters (NAB), the National Cable Television Association (NCTA), and the Society of Motion Picture and Television Engineers (SMPTE).
Currently, there are approximately 140 members representing the broadcast, broadcast equipment, motion picture, consumer electronics, computer, cable, satellite, and semiconductor industries.

ATSC Digital TV Standards include digital high definition television (HDTV), standard definition television (SDTV), data broadcasting, multichannel surround-sound audio, and satellite direct-to-home broadcasting.

Table of Contents

1. SCOPE AND DOCUMENTATION STRUCTURE
   1.1 Documentation Structure
2. REFERENCES
   2.1 Normative References
   2.2 Informative References
3. DEFINITIONS
   3.1 Compliance Notation
   3.2 Treatment of Syntactic Elements
   3.3 Terms Employed
   3.4 Symbols, Abbreviations, and Mathematical Operators
      3.4.1 Arithmetic Operators
      3.4.2 Logical Operators
      3.4.3 Relational Operators
      3.4.4 Bitwise Operators
      3.4.5 Assignment
      3.4.6 Mnemonics
      3.4.7 Constants
      3.4.8 Method of Describing Bit Stream Syntax
         3.4.8.1 Definition of bytealigned Function
         3.4.8.2 Definition of nextbits Function
         3.4.8.3 Definition of next_start_code Function
4. SPECIFICATION AND CONSTRAINTS (NORMATIVE)
   4.1 ATSC System Definition
   4.2 Digital Television Services
5. SYSTEM OVERVIEW (INFORMATIVE)
   5.1 System Block Diagram
ANNEX A: HISTORICAL BACKGROUND (INFORMATIVE)
   1. FOREWORD
   2. HISTORICAL BACKGROUND
      2.1 Advisory Committee on Advanced Television Service (ACATS)
      2.2 Digital HDTV Grand Alliance (Grand Alliance)
      2.3 Organization for Documenting the Digital Television Standard
      2.4 Principles for Documenting the Digital Television Standard

Index of Tables and Figures

Table 3.1 Next Start Code
Table 4.1 Service Types
Figure 5.1 ITU-R digital terrestrial television broadcasting model.

ATSC Digital Television Standard – Part 1: Digital Television System

1. SCOPE AND DOCUMENTATION STRUCTURE

The Digital Television Standard describes the system characteristics of the U.S. advanced television (ATV) system. The document and its normative Parts provide detailed specification of the parameters of the system including the video encoder input scanning formats and the pre-processing and compression parameters of the video encoder, the audio encoder input signal format and the pre-processing and compression parameters of the audio encoder, the service multiplex and transport layer characteristics and normative specifications, and the VSB RF/Transmission subsystem.

1.1 Documentation Structure

The documentation of the Digital Television Standard consists of this Part and several related Parts that provide a general system overview, a list of reference documents, and sections relating to the system as a whole.
The system is modular in concept, and the specifications for each of the modules are provided in the following Parts:

Part 1 – System (formerly the body, plus the former Annex F)
Part 2 – RF/Transmission System Characteristics (formerly Annex D)
Part 3 – Service Multiplex and Transport Subsystem Characteristics (formerly Annex C)
Part 4 – MPEG-2 Video System Characteristics (formerly Annex A)
Part 5 – AC-3 Audio System Characteristics (formerly Annex B)
Part 6 – High Efficiency Audio System Characteristics (formerly Annex G)

2. REFERENCES

At the time of publication, the editions indicated were valid. All standards are subject to revision and amendment, and parties to agreements based on this standard are encouraged to investigate the possibility of applying the most recent editions of the documents listed below.

2.1 Normative References

Specific normative references can be found in each Part of this Standard. Those Parts are listed below along with an additional normative reference applicable to all Parts:

[1] ATSC: “ATSC Digital Television Standard, Part 2 – RF/Transmission System Characteristics,” Doc. A/53, Part 2:2007, Advanced Television Systems Committee, Washington, D.C., 3 January 2007.
[2] ATSC: “ATSC Digital Television Standard, Part 3 – Service Multiplex and Transport Subsystem Characteristics,” Doc. A/53, Part 3:2007, Advanced Television Systems Committee, Washington, D.C., 3 January 2007.
[3] ATSC: “ATSC Digital Television Standard, Part 4 – MPEG-2 Video System Characteristics,” Doc. A/53, Part 4:2007, Advanced Television Systems Committee, Washington, D.C., 3 January 2007.
[4] ATSC: “ATSC Digital Television Standard, Part 5 – AC-3 Audio System Characteristics,” Doc. A/53, Part 5:2007, Advanced Television Systems Committee, Washington, D.C., 3 January 2007.
[5] ATSC: “ATSC Digital Television Standard, Part 6 – High Efficiency Audio System Characteristics,” Doc. A/53, Part 6:2007, Advanced Television Systems Committee, Washington, D.C., 3 January 2007.
[6] IEEE: “Use of the International System of Units (SI): The Modern Metric System,” Doc. IEEE/ASTM SI 10-2002, Institute of Electrical and Electronics Engineers, New York, N.Y., 2002.

2.2 Informative References

The Digital Television Standard is based on the ISO/IEC MPEG-2 Video Standard [15], the Digital Audio Compression (AC-3) Standard [7], and the ISO/IEC MPEG-2 Systems Standard [14]. Those references are listed here for the convenience of the reader. In addition, a guide to the use of the Digital Television Standard [8] is listed.

[7] ATSC: “Digital Audio Compression (AC-3),” Doc. A/52B, Advanced Television Systems Committee, Washington, D.C., 14 June 2005.
[8] ATSC: “Guide to the Use of the ATSC Digital Television Standard,” Doc. A/54A, Advanced Television Systems Committee, Washington, D.C., 4 December 2003.
[9] ATSC: “Program and System Information Protocol for Terrestrial Broadcast and Cable,” Doc. A/65C with Amendment No. 1, Advanced Television Systems Committee, Washington, D.C., 2 January 2006 (Amendment No. 1 dated 9 May 2006).
[10] ATSC: “Data Broadcast Standard,” Doc. A/90 with Amendment No. 1 and Corrigendum No. 1 and No. 2, Advanced Television Systems Committee, Washington, D.C., 26 July 2000 (Amendment No. 1 dated 14 May 2002, Corrigendum No. 1 and No. 2 dated 1 April 2002).
[11] ATSC: “Software Download Data Service,” Doc. A/97, Advanced Television Systems Committee, Washington, D.C., 16 November 2004.
[12] ATSC: “Synchronization Standard for Distributed Transmission, Revision A,” Doc. A/110A, Advanced Television Systems Committee, Washington, D.C., 19 July 2005.
[13] ATSC: “Code Point Registry,” Advanced Television Systems Committee, Washington, D.C., 2007, http://www.atsc.org/standards/cpr.html.
[14] ISO: “ISO/IEC IS 13818-1:2000 (E), International Standard, Information technology – Generic coding of moving pictures and associated audio information: systems.”
[15] ISO: “ISO/IEC IS 13818-2:2000 (E), International Standard, Information technology – Generic coding of moving pictures and associated audio information: video.”

3. DEFINITIONS

With respect to definition of terms, abbreviations, and units, the practice of the Institute of Electrical and Electronics Engineers (IEEE) as outlined in the Institute’s published standards shall be used [6]. Where an abbreviation is not covered by IEEE practice, or industry practice differs from IEEE practice, the abbreviation in question is described in Section 3.4 of this document. Many of the definitions included in this section are derived from definitions adopted by MPEG.

3.1 Compliance Notation

As used in this document, “shall” denotes a mandatory provision of the standard. “Should” denotes a provision that is recommended but not mandatory. “May” denotes a feature whose presence does not preclude compliance, which may or may not be present at the option of the implementor.

3.2 Treatment of Syntactic Elements

This document contains symbolic references to syntactic elements used in the audio, video, and transport coding subsystems. These references are typographically distinguished by the use of a different font (e.g., restricted), may contain the underscore character (e.g., sequence_end_code), and may consist of character strings that are not English words (e.g., dynrng).

3.3 Terms Employed

For the purposes of the Digital Television Standard, the following definitions of terms apply:

ACATS – Advisory Committee on Advanced Television Service.
access unit – A coded representation of a presentation unit. In the case of audio, an access unit is the coded representation of an audio frame.
In the case of video, an access unit includes all the coded data for a picture, and any stuffing that follows it, up to but not including the start of the next access unit. If a picture is not preceded by a group_start_code or a sequence_header_code, the access unit begins with a picture start code. If a picture is preceded by a group_start_code and/or a sequence_header_code, the access unit begins with the first byte of the first of these start codes. If it is the last picture preceding a sequence_end_code in the bit stream, all bytes between the last byte of the coded picture and the sequence_end_code (including the sequence_end_code) belong to the access unit.
A/D – Analog to digital converter.
AES – Audio Engineering Society.
anchor frame – A video frame that is used for prediction. I-frames and P-frames are generally used as anchor frames, but B-frames are never anchor frames.
ANSI – American National Standards Institute.
Asynchronous Transfer Mode (ATM) – A digital signal protocol for efficient transport of both constant-rate and bursty information in broadband digital networks. The ATM digital stream consists of fixed-length packets called “cells,” each containing 53 8-bit bytes: a 5-byte header and a 48-byte information payload.
ATM – See asynchronous transfer mode.
ATV – The U.S. advanced television system.
bidirectional pictures or B-pictures or B-frames – Pictures that use both future and past pictures as a reference. This technique is termed bidirectional prediction. B-pictures provide the most compression. B-pictures do not propagate coding errors, as they are never used as a reference.
bit rate – The rate at which the compressed bit stream is delivered from the channel to the input of a decoder.
block – A block is an 8-by-8 array of pel values or DCT coefficients representing luminance or chrominance information.
bps – Bits per second.
byte-aligned – A bit in a coded bit stream is byte-aligned if its position is a multiple of 8 bits from the first bit in the stream.
channel – A digital medium that stores or transports a digital television stream.
coded representation – A data element as represented in its encoded form.
compression – Reduction in the number of bits used to represent an item of data.
constant bit rate – Operation where the bit rate is constant from start to finish of the compressed bit stream.
CRC – Cyclic redundancy check, used to verify the correctness of the data.
data element – An item of data as represented before encoding and after decoding.
DCT – See discrete cosine transform.
decoded stream – The decoded reconstruction of a compressed bit stream.
decoder – An embodiment of a decoding process.
decoding (process) – The process defined in the Digital Television Standard that reads an input coded bit stream and outputs decoded pictures or audio samples.
decoding time-stamp (DTS) – A field that may be present in a PES packet header that indicates the time that an access unit is decoded in the system target decoder.
discrete cosine transform – A mathematical transform that can be perfectly undone and that is useful in image compression.
DTS – See decoding time-stamp.
editing – A process by which one or more compressed bit streams are manipulated to produce a new compressed bit stream. Conforming edited bit streams are understood to meet the requirements defined in the Digital Television Standard.
elementary stream (ES) – A generic term for one of the coded video, coded audio, or other coded bit streams. One elementary stream is carried in a sequence of PES packets with one and only one stream_id.
elementary stream clock reference (ESCR) – A time stamp in the PES stream from which decoders of PES streams may derive timing.
encoder – An embodiment of an encoding process.
encoding (process) – A process that reads a stream of input pictures or audio samples and produces a valid coded bit stream as defined in the Digital Television Standard.
entropy coding – Variable-length lossless coding of the digital representation of a signal to reduce redundancy.
entry point – Refers to a point in a coded bit stream after which a decoder can become properly initialized and commence syntactically correct decoding. The first transmitted picture after an entry point is either an I-picture or a P-picture. If the first transmitted picture is not an I-picture, the decoder may produce one or more pictures during acquisition.
ES – See elementary stream.
ESCR – See elementary stream clock reference.
event – An event is defined as a collection of elementary streams with a common time base, an associated start time, and an associated end time.
field – For an interlaced video signal, a “field” is the assembly of alternate lines of a frame. Therefore, an interlaced frame is composed of two fields, a top field and a bottom field.
forbidden – This term, when used in clauses defining the coded bit stream, indicates that the value shall never be used. This is usually to avoid emulation of start codes.
frame – A frame contains lines of spatial information of a video signal. For progressive video, these lines contain samples starting from one time instant and continuing through successive lines to the bottom of the frame. For interlaced video, a frame consists of two fields, a top field and a bottom field. One of these fields will commence one field later than the other.
GOP – See group of pictures.
group of pictures (GOP) – A group of pictures consists of one or more pictures in sequence.
HDTV – See high-definition television.
high-definition television (HDTV) – High-definition television has a resolution of approximately twice that of conventional television in both the horizontal (H) and vertical (V) dimensions and a picture aspect ratio (H × V) of 16:9. ITU-R Recommendation 1125 further defines “HDTV quality” as the delivery of a television picture that is subjectively identical with the interlaced HDTV studio standard.
high level – A range of allowed picture parameters defined by the MPEG-2 video coding specification that corresponds to high definition television.
Huffman coding – A type of source coding that uses codes of different lengths to represent symbols that have unequal likelihood of occurrence.
IEC – International Electrotechnical Commission.
intra-coded pictures or I-pictures or I-frames – Pictures that are coded using information present only in the picture itself and not depending on information from other pictures. I-pictures provide a mechanism for random access into the compressed video data. I-pictures employ transform coding of the pel blocks and provide only moderate compression.
ISO – International Organization for Standardization.
ITU – International Telecommunication Union.
layer – One of the levels in the data hierarchy of the video and system specification.
level – A range of allowed picture parameters and combinations of picture parameters.
macroblock – In the advanced television system, a macroblock consists of four blocks of luminance, one Cr block, and one Cb block.
main level – A range of allowed picture parameters defined by the MPEG-2 video coding specification with maximum resolution equivalent to ITU-R Recommendation 601.
main profile – A subset of the syntax of the MPEG-2 video coding specification that is expected to be supported over a large range of applications.
Mbps – 1,000,000 bits per second.
motion vector – A pair of numbers that represent the vertical and horizontal displacement of a region of a reference picture for prediction.
MP@HL – Main profile at high level.
MP@ML – Main profile at main level.
MPEG – Refers to standards developed by the ISO/IEC JTC1/SC29 WG11, Moving Picture Experts Group. MPEG may also refer to the Group.
MPEG-2 – Refers to ISO/IEC standards 13818-1 (systems), 13818-2 (video), 13818-3 (audio), and 13818-4 (compliance).
pack – A pack consists of a pack header followed by zero or more packets. It is a layer in the system coding syntax.
packet data – Contiguous bytes of data from an elementary data stream present in the packet.
packet identifier (PID) – A unique integer value used to associate elementary streams of a program in a single or multi-program transport stream.
packet – A packet consists of a header followed by a number of contiguous bytes from an elementary data stream. It is a layer in the system coding syntax.
padding – A method to adjust the average length of an audio frame in time to the duration of the corresponding PCM samples, by continuously adding a slot to the audio frame.
Part – A Part is an independently maintainable portion of an ATSC document. It shares a common root document number with other Parts of the document.
payload – Payload refers to the bytes that follow the header bytes in a packet. For example, the payload of a transport stream packet includes the PES_packet_header and its PES_packet_data_bytes, or pointer_field and PSI sections, or private data. A PES_packet_payload, however, consists only of PES_packet_data_bytes. The transport stream packet header and adaptation fields are not payload.
PCR – See program clock reference.
pel – See pixel.
PES packet header – The leading fields in a PES packet up to but not including the PES_packet_data_byte fields, where the stream is not a padding stream. In the case of a padding stream, the PES packet header is defined as the leading fields in a PES packet up to but not including the padding_byte fields.
PES packet – The data structure used to carry elementary stream data. It consists of a packet header followed by PES packet payload.
PES stream – A PES stream consists of PES packets, all of whose payloads consist of data from a single elementary stream, and all of which have the same stream_id.
PES – An abbreviation for packetized elementary stream.
picture – Source, coded, or reconstructed image data. A source or reconstructed picture consists of three rectangular matrices representing the luminance and two chrominance signals.
PID – See packet identifier.
pixel – “Picture element” or “pel.” A pixel is a digital sample of the color intensity values of a picture at a single point.
PMT – Program Map Table. The collection of all the TS_program_map_section()s.
predicted pictures or P-pictures or P-frames – Pictures that are coded with respect to the nearest previous I- or P-picture. This technique is termed forward prediction. P-pictures provide more compression than I-pictures and serve as a reference for future P-pictures or B-pictures. P-pictures can propagate coding errors when P-pictures (or B-pictures) are predicted from prior P-pictures where the prediction is flawed.
presentation time-stamp (PTS) – A field that may be present in a PES packet header that indicates the time that a presentation unit is presented in the system target decoder.
presentation unit (PU) – A decoded audio access unit or a decoded picture.
profile – A defined subset of the syntax specified in the MPEG-2 video coding specification.
program clock reference (PCR) – A time stamp in the transport stream from which decoder timing is derived.
program element – A generic term for one of the elementary streams or other data streams that may be included in the program.
program specific information (PSI) – PSI consists of normative data that is necessary for the demultiplexing of transport streams and the successful regeneration of programs.
program – A program is a collection of program elements. Program elements may be elementary streams. Program elements need not have any defined time base; those that do have a common time base and are intended for synchronized presentation.
PSI – See program specific information.
PTS – See presentation time-stamp.
PU – See presentation unit.
quantizer – A processing step that intentionally reduces the precision of DCT coefficients.
random access – The process of beginning to read and decode the coded bit stream at an arbitrary point.
reserved – This term, when used in clauses defining the coded bit stream, indicates that the value may be used in the future for Digital Television Standard extensions. Unless otherwise specified within this Standard, all reserved bits shall be set to “1”.
SCR – See system clock reference.
scrambling – The alteration of the characteristics of a video, audio, or coded data stream in order to prevent unauthorized reception of the information in a clear form. This alteration is a specified process under the control of a conditional access system.
SDTV – See standard-definition television.
slice – A series of consecutive macroblocks.
SMPTE – Society of Motion Picture and Television Engineers.
source stream – A single, non-multiplexed stream of samples before compression coding.
splicing – The concatenation, performed on the system level, of two different elementary streams. It is understood that the resulting stream must conform totally to the Digital Television Standard.
standard-definition television (SDTV) – This term is used to signify a digital television system in which the quality is approximately equivalent to that of NTSC. This equivalent quality may be achieved from pictures sourced at the 4:2:2 level of ITU-R Recommendation 601 and subjected to processing as part of the bit rate compression. The results should be such that, when judged across a representative sample of program material, subjective equivalence with NTSC is achieved. Also called standard digital television.
start codes – Unique 32-bit codes embedded in the coded bit stream. They are used for several purposes, including identifying some of the layers in the coding syntax. Start codes consist of a 24-bit prefix (0x000001) and an 8-bit stream_id.
STD input buffer – A first-in, first-out buffer at the input of a system target decoder for storage of compressed data from elementary streams before decoding.
STD – See system target decoder.
still picture – A coded still picture consists of a video sequence containing exactly one coded picture, which is intra-coded. This picture has an associated PTS, and the presentation time of succeeding pictures, if any, is later than that of the still picture by at least two picture periods.
system clock reference (SCR) – A time stamp in the program stream from which decoder timing is derived.
system header – The system header is a data structure that carries information summarizing the system characteristics of the Digital Television Standard multiplexed bit stream.
system target decoder (STD) – A hypothetical reference model of a decoding process used to describe the semantics of the Digital Television Standard multiplexed bit stream.
time-stamp – A term that indicates the time of a specific action, such as the arrival of a byte or the presentation of a presentation unit.
transport stream packet header – The leading fields in a transport stream packet up to and including the continuity_counter field.
variable bit rate – Operation where the bit rate varies with time during the decoding of a compressed bit stream.
VBV – See video buffering verifier.
video buffering verifier (VBV) – A hypothetical decoder that is conceptually connected to the output of an encoder.
Its purpose is to provide a constraint on the variability of the data rate that an encoder can produce.
video sequence – A video sequence is represented by a sequence header, one or more groups of pictures, and an end_of_sequence code in the data stream.
8 VSB – Vestigial sideband modulation with 8 discrete amplitude levels.
16 VSB – Vestigial sideband modulation with 16 discrete amplitude levels.

3.4 Symbols, Abbreviations, and Mathematical Operators

The symbols, abbreviations, and mathematical operators used to describe the Digital Television Standard are those adopted for use in describing MPEG-2 and are similar to those used in the “C” programming language. However, integer division with truncation and rounding are specifically defined. The bitwise operators are defined assuming two’s-complement representation of integers. Numbering and counting loops generally begin from 0.

3.4.1 Arithmetic Operators

+          Addition.
–          Subtraction (as a binary operator) or negation (as a unary operator).
++         Increment.
--         Decrement.
* or ×     Multiplication.
^          Power.
/          Integer division with truncation of the result toward 0. For example, 7/4 and –7/–4 are truncated to 1, and –7/4 and 7/–4 are truncated to –1.
//         Integer division with rounding to the nearest integer. Half-integer values are rounded away from 0 unless otherwise specified. For example, 3//2 is rounded to 2, and –3//2 is rounded to –2.
DIV        Integer division with truncation of the result toward –∞.
%          Modulus operator. Defined only for positive numbers.
Sign( )    Sign(x) =  1    if x > 0
                   =  0    if x == 0
                   = –1    if x < 0
NINT( )    Nearest integer operator. Returns the nearest integer value to the real-valued argument. Half-integer values are rounded away from 0.
sin        Sine.
cos        Cosine.
exp        Exponential.
√          Square root.
log10      Logarithm to base ten.
loge       Logarithm to base e.

3.4.2 Logical Operators

||         Logical OR.
&&         Logical AND.
!          Logical NOT.

3.4.3 Relational Operators

>          Greater than.
≥          Greater than or equal to.
<          Less than.
≤          Less than or equal to.
==         Equal to.
!=         Not equal to.
max [,...,]    The maximum value in the argument list.
min [,...,]    The minimum value in the argument list.

3.4.4 Bitwise Operators

&          AND.
|          OR.
>>         Shift right with sign extension.
<<         Shift left with 0 fill.

3.4.5 Assignment

=          Assignment operator.

3.4.6 Mnemonics

The following mnemonics are defined to describe the different data types used in the coded bit stream.

bslbf      Bit string, left bit first, where “left” is the order in which bit strings are written in the Standard. Bit strings are written as a string of 1s and 0s within single quote marks, e.g., ‘1000 0001’. Blanks within a bit string are for ease of reading and have no significance.
uimsbf     Unsigned integer, most significant bit first. The byte order of multi-byte words is most significant byte first.

3.4.7 Constants

π          3.14159265359...
e          2.71828182845...

3.4.8 Method of Describing Bit Stream Syntax

Each data item in the coded bit stream described below is in bold type. It is described by its name, its length in bits, and a mnemonic for its type and order of transmission.

The action caused by a decoded data element in a bit stream depends on the value of that data element and on data elements previously decoded. The decoding of the data elements and definition of the state variables used in their decoding are described in the clauses containing the semantic description of the syntax. The following constructs, which are in normal type, are used to express the conditions under which data elements are present. Note that this syntax uses the “C” code convention that a variable or expression evaluating to a non-zero value is equivalent to a condition that is true.

while ( condition ) {      If the condition is true, then the group of data elements
    data_element           occurs next in the data stream. This repeats until the
    ...                    condition is not true.
}
do {
    data_element           The data element always occurs at least once.
    ...
} while ( condition )      The data element is repeated until the condition is not true.
if ( condition ) {         If the condition is true, then the first group of data
    data_element           elements occurs next in the data stream.
    ...
}
else {                     If the condition is not true, then the second group of data
    data_element           elements occurs next in the data stream.
    ...
}
for (i = 0;i=312) s=mod(s, 312)+1; end if end for

ATSC A/53 Part 2 (Transmission) 3 January 2007

The function round() means “round up to the next integer value.” The function mod() represents the modulo operation. For example, for P = 6, the segment positions are s = (0, 26, 52, 78, 104, 130, 156, 182, 208, 234, 260, 286).

6.8.2 Packing of Enhanced Mode Data Within Packets

The Enhanced Reed-Solomon encoding block shall be 184 bytes long, of which 20 bytes are parity. Refer to Table 6.2. For the case of a one-half rate Enhanced code, the Enhanced coder outputs 2 bits for each input bit, and Enhanced mode data shall be packed as one Enhanced Reed-Solomon block per pair of data segments (1 bit per symbol). For the case of a one-fourth rate Enhanced code, the Enhanced coder outputs 4 bits for each input bit, and Enhanced mode data shall be packed as one Enhanced Reed-Solomon block for every 4 data segments (one-half bit per symbol). The packing of Enhanced mode Reed-Solomon blocks into data segments is shown in Table 6.6.

Note: The packing is shown conceptually, before interleaving; the interleaving process disperses the data in the transmitted output.
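The symbol and byte counts in Table 6.6 can be cross-checked with simple arithmetic: a field of N symbols at b bits per symbol carries N × b / 8 bytes, every data segment row in the table totals 828 symbols (12 + 736 + 80), and the E8-VSB data plus E8-VSB RS parity bytes collected over one packing group should add up to the 184-byte Enhanced Reed-Solomon block (164 payload bytes plus 20 parity bytes). The sketch below only re-derives the table's own numbers and, like the table, ignores interleaving; the tuple layout and the names `field_bytes` and `enhanced_block` are hypothetical, not part of the standard.

```python
from fractions import Fraction

# Symbols per data segment implied by every row of Table 6.6 (12 + 736 + 80).
SEGMENT_SYMBOLS = 828
HALF = Fraction(1, 2)


def field_bytes(symbols, bits_per_symbol):
    """Bytes carried by `symbols` symbols at `bits_per_symbol` (may be fractional)."""
    bits = symbols * Fraction(bits_per_symbol)
    if bits % 8:
        raise ValueError("field does not carry a whole number of bytes")
    return int(bits / 8)


def enhanced_block(segments):
    """Sum E8-VSB payload and parity bytes over one group of data segments.

    Each segment is a list of (field, symbols, bits_per_symbol) tuples copied
    from Table 6.6; every segment must account for exactly SEGMENT_SYMBOLS.
    """
    payload = parity = 0
    for seg in segments:
        if sum(sym for _, sym, _ in seg) != SEGMENT_SYMBOLS:
            raise ValueError("segment does not add up to 828 symbols")
        for field, sym, bps in seg:
            if field == "E8-VSB data":
                payload += field_bytes(sym, bps)
            elif field == "E8-VSB RS parity":
                parity += field_bytes(sym, bps)
    return payload, parity


# Fields sent at 2 bits/symbol in every segment of the table.
HDR = ("tx hdr", 12, 2)
MAIN = ("Main RS parity", 80, 2)

# 1/2 rate: one Enhanced RS block per pair of data segments (1 bit/symbol).
HALF_RATE = [
    [HDR, ("E8-VSB data", 736, 1), MAIN],
    [HDR, ("E8-VSB data", 576, 1), ("E8-VSB RS parity", 160, 1), MAIN],
]

# 1/4 rate: one Enhanced RS block per 4 data segments (0.5 bit/symbol).
QUARTER_RATE = [[HDR, ("E8-VSB data", 736, HALF), MAIN]] * 3 + [
    [HDR, ("E8-VSB data", 416, HALF), ("E8-VSB RS parity", 320, HALF), MAIN],
]

for name, segs in (("1/2 rate", HALF_RATE), ("1/4 rate", QUARTER_RATE)):
    payload, parity = enhanced_block(segs)
    print(f"{name}: {payload} payload + {parity} parity = {payload + parity} bytes")
```

Both rates print 164 payload + 20 parity = 184 bytes, matching the block size stated in Section 6.8.2. Using `Fraction` keeps the 0.5 bit/symbol case exact, so a row that did not divide into whole bytes would raise an error instead of being silently rounded.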
Table 6.6 Enhanced Data Encapsulation

For ½ Rate Outer Code
Segment | tx hdr               | E8-VSB data                    | E8-VSB RS parity       | Main RS parity
1       | 12 symbols (3 bytes) | 736 symbols (payload 92 bytes) | (none)                 | 80 symbols (20 bytes)
2       | 12 symbols (3 bytes) | 576 symbols (payload 72 bytes) | 160 symbols (20 bytes) | 80 symbols (20 bytes)
3       | 12 symbols (3 bytes) | 736 symbols (payload 92 bytes) | (none)                 | 80 symbols (20 bytes)
4       | 12 symbols (3 bytes) | 576 symbols (payload 72 bytes) | 160 symbols (20 bytes) | 80 symbols (20 bytes)
Symbol loading: tx hdr and Main RS parity 2 bits/symbol; E8-VSB data and E8-VSB RS parity 1 bit/symbol.

For ¼ Rate Outer Code
Segment | tx hdr               | E8-VSB data                    | E8-VSB RS parity       | Main RS parity
1       | 12 symbols (3 bytes) | 736 symbols (payload 46 bytes) | (none)                 | 80 symbols (20 bytes)
2       | 12 symbols (3 bytes) | 736 symbols (payload 46 bytes) | (none)                 | 80 symbols (20 bytes)
3       | 12 symbols (3 bytes) | 736 symbols (payload 46 bytes) | (none)                 | 80 symbols (20 bytes)
4       | 12 symbols (3 bytes) | 416 symbols (payload 26 bytes) | 320 symbols (20 bytes) | 80 symbols (20 bytes)
Symbol loading: tx hdr and Main RS parity 2 bits/symbol; E8-VSB data and E8-VSB RS parity 0.5 bit/symbol.

6.8.3 E-8VSB Enhancement Signaling
On odd data fields (positive PN63), the presence of E8-VSB shall be signaled by setting symbol 92 to level ‘+5’.

6.9 Modulation
6.9.1 Bit-to-Symbol Mapping
Figure 6.8 shows the mapping of the outputs of the trellis encoder to the nominal signal levels of (–7, –5, –3, –1, 1, 3, 5, 7). As shown in Figure 6.14, the nominal levels of Data Segment Sync and Data Field Sync are –5 and +5. The value of 1.25 is added to all these nominal levels after the bit-to-symbol mapping function for the purpose of creating a small pilot carrier.
6.9.2 Pilot Addition
A small in-phase pilot shall be added to the data signal. The frequency of the pilot shall be the same as the suppressed-carrier frequency as shown in Figure 6.4. This may be generated in the following manner.
A small (digital) DC level (1.25) shall be added to every symbol (data and sync) of the digital baseband data plus sync signal (±1, ±3, ±5, ±7). The power of the pilot shall be 11.3 dB below the average data signal power.

6.9.3 8 VSB Modulation Method
The VSB modulator receives the 10.76 Msymbols/s, 8-level trellis encoded composite data signal (pilot and sync added). The DTV system performance is based on a linear phase raised cosine Nyquist filter response in the concatenated transmitter and receiver, as shown in Figure 6.17. The system filter response is essentially flat across the entire band, except for the transition regions at each end of the band. Nominally, the roll-off in the transmitter shall have the response of a linear phase root raised cosine filter.

[Figure 6.17: normalized response (0, .5, 1.0) across the 6 MHz channel; labels 5.38 MHz and d = .31 MHz (transition regions at each band edge); R = .1152.]
Figure 6.17 Nominal VSB system channel response (linear phase raised cosine Nyquist filter).

7. TRANSMISSION CHARACTERISTICS FOR HIGH DATA RATE MODE
7.1 Overview
The high data rate mode trades off transmission robustness (28.3 dB signal-to-noise threshold) for payload data rate (38.57 Mbps). Most parts of the high data rate mode VSB system are identical or similar to the terrestrial system. A pilot, Data Segment Sync, and Data Field Sync are all used to provide enhanced operation. The pilot in the high data rate mode is also 11.3 dB below the data signal power. The symbol, segment, and field signals and rates are all the same, allowing either receiver to lock up on the other’s transmitted signal. Also, the data frame definitions are identical. The primary difference is the number of transmitted levels (8 versus 16) and the use of trellis coding and NTSC interference rejection filtering in the terrestrial system.
The RF spectrum of the high data rate modem transmitter looks identical to the terrestrial system, as illustrated in Figure 6.4.
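Several figures quoted in 6.9 and 7.1 follow from simple arithmetic. The Python sketch below is an illustration only, and its helper names are hypothetical. It checks three of them:

- The pilot-to-data power ratio. This assumes all data levels are equiprobable.
- The 16-VSB payload rate. This assumes the usual 313-segment data field, in which one segment is the Data Field Sync.
- R and d in Figure 6.17. This assumes the relation 5.38 MHz × (1 + R) = 6 MHz = 5.38 MHz + 2d.

```python
import math

# Sketch: sanity checks of figures quoted in Sections 6.9.2, 6.9.3 and 7.1.


def pilot_db_below_data(levels, pilot):
    """Average data power over pilot (DC) power, in dB, assuming equiprobable levels."""
    avg_power = sum(x * x for x in levels) / len(levels)
    return 10 * math.log10(avg_power / pilot ** 2)


LEVELS_8 = [-7, -5, -3, -1, 1, 3, 5, 7]
LEVELS_16 = list(range(-15, 16, 2))   # -15, -13, ..., +13, +15

pilot_8 = pilot_db_below_data(LEVELS_8, 1.25)    # 8-VSB pilot level 1.25
pilot_16 = pilot_db_below_data(LEVELS_16, 2.5)   # 16-VSB pilot level 2.5


def payload_rate_16vsb(symbol_rate=10.76e6):
    """16-VSB payload rate in bit/s, assuming 313 segments per field (one Field Sync)."""
    rate = symbol_rate * 4   # 16 levels carry 4 bits per symbol
    rate *= 828 / 832        # 4 Data Segment Sync symbols per 832-symbol segment
    rate *= 312 / 313        # one Data Field Sync segment per field (assumption)
    rate *= 187 / 207        # 20 Reed-Solomon parity bytes per 207-byte packet
    return rate


rate_16 = payload_rate_16vsb()

# Figure 6.17: Nyquist bandwidth is half the symbol rate; the roll-off R is the
# excess bandwidth needed to fill the 6 MHz channel, so 5.38 MHz + 2*d = 6 MHz.
nyquist_bw = 10.76e6 / 2                # 5.38 MHz
rolloff = 6e6 / nyquist_bw - 1          # R
transition = rolloff * nyquist_bw / 2   # d

print(f"pilot: {pilot_8:.2f} dB (8-VSB), {pilot_16:.2f} dB (16-VSB)")
print(f"16-VSB payload: {rate_16 / 1e6:.2f} Mbit/s")
print(f"R = {rolloff:.4f}, d = {transition / 1e6:.2f} MHz")
```

Both pilots come out close to 11.3 dB: 21/1.25² for 8-VSB, and 85/2.5² for 16-VSB. Doubling the DC offset keeps the ratio almost unchanged, even though the level spacing is unchanged while the outer levels double.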
Figure 7.1 illustrates a typical data segment, where the number of data levels is seen to be 16 due to the doubled data rate. Each portion of 828 data symbols represents 187 data bytes and 20 Reed-Solomon bytes followed by a second group of 187 data bytes and 20 Reed-Solomon bytes (before convolutional interleaving).

[Figure 7.1: one 832-symbol data segment: a 4-symbol Data Segment Sync, 828 symbols of data + FEC, then the next 4-symbol Data Segment Sync; data levels before pilot addition (pilot = 2.5) are +15, +13, ..., –13, –15.]
Figure 7.1 16-VSB data segment.

Figure 7.2 shows the block diagram of the transmitter. It is identical to the terrestrial VSB system except that the trellis coding shall be replaced with a mapper that converts data to multi-level symbols. See Figure 7.3.

[Figure 7.2: Data → Data Randomizer → Reed-Solomon Encoder → Data Interleaver → Mapper → MUX (Segment Sync and Field Sync inserted) → Pilot Insertion → VSB Modulator → RF Up-Converter.]
Figure 7.2 16-VSB transmitter.

[Figure 7.3: bytes from the interleaver pass through byte-to-symbol conversion and on to the MUX. Each byte is split into two nibbles: 1st nibble = bits 7 (MSB), 6, 5, 4 → Xa, Xb, Xc, Xd; 2nd nibble = bits 3, 2, 1, 0 (LSB) → Xa, Xb, Xc, Xd. Each nibble is mapped to an output level O:]
Xa Xb Xc Xd   O
1  1  1  1   +15
1  1  1  0   +13
1  1  0  1   +11
1  1  0  0   +9
1  0  1  1   +7
1  0  1  0   +5
1  0  0  1   +3
1  0  0  0   +1
0  1  1  1   –1
0  1  1  0   –3
0  1  0  1   –5
0  1  0  0   –7
0  0  1  1   –9
0  0  1  0   –11
0  0  0  1   –13
0  0  0  0   –15
Figure 7.3 16-VSB mapper.

7.2 Channel Error Protection and Synchronization
7.2.1 Data Randomizer
See Section 6.4.1.1.
7.2.2 Reed-Solomon Encoder
See Section 6.4.1.2.
7.2.3 Interleaving
The interleaver shall be a 26 data segment inter-segment convolutional byte interleaver. Interleaving is provided to a depth of about 1/12 of a data field (2 ms deep). Only data bytes shall be interleaved.
7.2.4 Data Segment Sync
See Section 6.5.1.
7.2.5 Data Field Sync
See Section 6.5.2.
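The mapper of Figure 7.3 is a fixed nibble-to-level table. Reading the table, the level is 2n − 15, where n is the nibble value with Xa as its most significant bit. The Python sketch below is an illustration, not a reference implementation, and its function names are hypothetical. It maps interleaved bytes to 16-VSB levels with the high nibble first. As an option, it adds the 2.5 pilot offset described in Section 7.3.

```python
# Sketch of the 16-VSB mapper of Figure 7.3: each interleaved byte becomes two
# symbols, most significant nibble first. The nibble Xa Xb Xc Xd (Xa = MSB)
# selects a level: 1111 -> +15, 1110 -> +13, ..., 0000 -> -15, i.e. 2*n - 15.

PILOT_16VSB = 2.5  # DC offset added to every symbol after mapping (Section 7.3)


def nibble_to_level(n: int) -> int:
    """Map a 4-bit value (Xa Xb Xc Xd) to its nominal 16-VSB level."""
    if not 0 <= n <= 15:
        raise ValueError("nibble out of range")
    return 2 * n - 15


def byte_to_symbols(b: int) -> tuple:
    """Split a byte into two 16-VSB levels: bits 7..4 first, then bits 3..0."""
    if not 0 <= b <= 255:
        raise ValueError("byte out of range")
    return nibble_to_level(b >> 4), nibble_to_level(b & 0x0F)


def map_bytes(data: bytes, add_pilot: bool = False) -> list:
    """Map a byte sequence to 16-VSB levels, optionally adding the pilot offset."""
    levels = []
    for b in data:  # iterating over bytes yields ints in Python 3
        levels.extend(byte_to_symbols(b))
    if add_pilot:
        levels = [x + PILOT_16VSB for x in levels]
    return levels
```

For example, `map_bytes(bytes([0xF0, 0x8E]))` gives `[15, -15, 1, 13]`. Because each byte yields exactly two symbols, the 414 bytes of two 207-byte packets fill the 828 data symbols of one segment.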
7.3 Modulation
7.3.1 Bit-to-Symbol Mapping
Figure 7.3 shows the mapping of the outputs of the interleaver to the nominal signal levels (–15, –13, –11, ..., 11, 13, 15). As shown in Figure 7.1, the nominal levels of Data Segment Sync and Data Field Sync are –9 and +9. The value of 2.5 is added to all these nominal levels after the bit-to-symbol mapping for the purpose of creating a small pilot carrier.
7.3.2 Pilot Addition
A small in-phase pilot shall be added to the data signal. The frequency of the pilot shall be the same as the suppressed-carrier frequency as shown in Figure 6.4. This may be generated in the following manner. A small (digital) DC level (2.5) shall be added to every symbol (data and sync) of the digital baseband data plus sync signal (±1, ±3, ±5, ±7, ±9, ±11, ±13, ±15). The power of the pilot shall be 11.3 dB below the average data signal power.
7.3.3 16 VSB Modulation Method
The modulation method shall be identical to that in Section 6.9.3, except that the number of transmitted levels shall be 16 instead of 8.

End of Part 2

Doc. A/53, Part 3:2007
3 January 2007
ATSC Digital Television Standard
Part 3 – Service Multiplex and Transport Subsystem Characteristics
(A/53, Part 3:2007)
Advanced Television Systems Committee
1750 K Street, N.W.
Suite 1200
Washington, D.C. 20006
www.atsc.org

ATSC A/53 Part 3 (Transport) 3 January 2007

The Advanced Television Systems Committee, Inc., is an international, non-profit organization developing voluntary standards for digital television. The ATSC member organizations represent the broadcast, broadcast equipment, motion picture, consumer electronics, computer, cable, satellite, and semiconductor industries. Specifically, ATSC is working to coordinate television standards among different communications media focusing on digital television, interactive systems, and broadband multimedia communications.
ATSC is also developing digital television implementation strategies and presenting educational seminars on the ATSC standards. ATSC was formed in 1982 by the member organizations of the Joint Committee on InterSociety Coordination (JCIC): the Electronic Industries Association (EIA), the Institute of Electrical and Electronic Engineers (IEEE), the National Association of Broadcasters (NAB), the National Cable Television Association (NCTA), and the Society of Motion Picture and Television Engineers (SMPTE). Currently, there are approximately 140 members representing the broadcast, broadcast equipment, motion picture, consumer electronics, computer, cable, satellite, and semiconductor industries. ATSC Digital TV Standards include digital high definition television (HDTV), standard definition television (SDTV), data broadcasting, multichannel surround-sound audio, and satellite direct-to-home broadcasting.

Table of Contents
1. SCOPE
2. REFERENCES
2.1 Normative References
2.2 Informative References
3. COMPLIANCE NOTATION
3.1 Treatment of Syntactic Elements
3.2 Symbols, Abbreviations, and Mathematical Operators
4. DEFINITION OF TERMS
5. SYSTEM OVERVIEW
6. SPECIFICATION
6.1 MPEG-2 Systems Standard
6.1.1 Video T-STD
6.1.2 Audio T-STD
6.1.2.1 Audio T-STD When TS-E is Not Present
6.1.2.2 Audio T-STD When TS-E is Present
6.1.3 Program Constraints
6.1.3.1 ATSC Digital Television (service_type 0x02)
6.1.3.2 ATSC Audio (service_type 0x03)
6.1.3.3 Unassociated/Small Screen Service (service_type 0x06)
6.2 Identification of MPEG-2 Private Ranges
6.2.1 MPEG-2 Registration Descriptor
6.2.1.1 Program Identifier
6.2.1.2 Audio Elementary Stream Identifier
6.2.1.3 Other Program Element Identifiers
6.3 Audio Constraints
6.4 Constraints on PSI
6.4.1 Constraints on Main Services’ Program Specific Information
6.4.2 PAT-E
6.4.3 PMT-E
6.4.4 Multiple PCR_PIDs per Program and a Common Time Base
6.4.5 Constraints on Optional Enhanced Services’ Program Specific Information
6.5 PES Constraints
6.5.1 MPEG-2 Video PES Constraints (for streams of stream_type 0x02)
6.5.2 AC-3 Audio PES Constraints (for Streams of stream_type 0x81)
6.5.3 Audio PES Constraints for Enhanced AC-3 (stream_type = 0x87)
6.6 Services and Features
6.6.1 System Information and Program Guide
6.6.1.1 SI base_PID for TS-M
6.6.1.2 SI base_PID for TS-E
6.6.1.3 System Information and Program Guide STD Model
6.6.2 Specification of ATSC Private Data
6.7 Assignment of Identifiers
6.7.1 AC-3 Audio Stream Type
6.7.2 MPEG-2 Video Stream Type
6.7.3 Enhanced AC-3 Audio Stream Type
6.8 Descriptors
6.8.1 AC-3 Audio Descriptor
6.8.2 Program Smoothing Buffer Descriptor
6.8.3 ISO-639 Language Descriptor
6.8.4 ATSC Private Information Descriptor
6.8.5 Enhanced Signaling Descriptor
6.9 PID Value Assignments
6.10 Extensions to the MPEG-2 Systems Specification
6.10.1 Scrambling Control
7. FEATURES OF 13818-1 NOT SUPPORTED BY THIS STANDARD
7.1 Program Streams
7.2 Still Pictures
8. TRANSPORT SUBSYSTEM INTERFACES AND BIT RATES
8.1 Transport Subsystem Input Characteristics
8.2 Transport Subsystem Output Characteristics
9. PACKET DELIVERY TO THE E-VSB AND VSB MODULATION SYSTEM
9.1 Head End Reference Model
9.2 Receiver Reference Model
9.3 Stream Delays
9.4 PCR Correction
9.5 Packet Ordering
9.6 Main Stream Packet Jitter Handling (Informative)

Index of Tables and Figures
Table 6.1 ATSC Private Information Descriptor
Table 6.2 Bit Stream Syntax for the Enhanced Signaling Descriptor
Table 6.3 Linkage Preference Values
Table 6.4 Transmission Method Values
Table 6.5 Transport Scrambling Control Field
Figure 5.1 Sample organization of functionality in a transmitter-receiver pair for a single program.
Figure 9.1 Head end reference model.
Figure 9.2 Reference receiver model.

ATSC Digital Television Standard – Part 3: Service Multiplex and Transport Subsystem Characteristics

1. SCOPE
This part of the ATSC Digital Television Standard constitutes the normative specification for the transport subsystem of the Digital Television Standard. The syntax and semantics of this specification conform to ISO/IEC 13818-1 [3], with additional constraints and conditions specified in this standard. Within this context, other ATSC standards may further constrain and/or supplement the transport subsystem specification.¹

2. REFERENCES
At the time of publication, the editions indicated were valid.
All standards are subject to revision and amendment, and parties to agreements based on this standard are encouraged to investigate the possibility of applying the most recent editions of the documents listed below.

2.1 Normative References
The following documents contain provisions that in whole or in part, through reference in this text, constitute provisions of this standard.
[1] ATSC: “Digital Audio Compression (AC-3, E-AC-3) Standard,” Doc. A/52B, Advanced Television Systems Committee, Washington, D.C., 14 June 2005.
[2] ATSC: “Program and System Information Protocol for Terrestrial Broadcast and Cable,” Doc. A/65C with Amendment No. 1, Advanced Television Systems Committee, Washington, D.C., 2 January 2006 (Amendment No. 1 dated 9 May 2006).
[3] ISO: “ISO/IEC IS 13818-1:2000 (E), International Standard, Information technology – Generic coding of moving pictures and associated audio information: systems.”

2.2 Informative References
[4] ATSC: “ATSC Digital Television Standard, Part 2 – RF/Transmission System Characteristics,” Doc. A/53, Part 2:2007, Advanced Television Systems Committee, Washington, D.C., 3 January 2007.
[5] ATSC: “ATSC Digital Television Standard, Part 5 – AC-3 Audio System Characteristics,” Doc. A/53, Part 5:2007, Advanced Television Systems Committee, Washington, D.C., 3 January 2007.
[6] ATSC: “Data Broadcast Standard,” Doc. A/90 with Amendment No. 1 and Corrigendum No. 1 and No. 2, Advanced Television Systems Committee, Washington, D.C., 26 July 2000 (Amendment 1 dated 14 May 2002; Corrigendum 1 and 2 dated 1 April 2002).
[7] ATSC: “Digital Television Standard, Part 1 – Digital Television System,” Doc. A/53-1:2007, Advanced Television Systems Committee, Washington, D.C., 3 January 2007.
¹ Note that there is a coordinated effort underway among ATSC, CEA, and SMPTE to revise and clarify standards related to delivering closed captions, AFD, and bar data so that each describes the aspects of the system for which they are primarily responsible without overlap. This effort is expected to result in revisions of those sections in the ATSC Standards.

3. COMPLIANCE NOTATION
As used in this document, “shall” denotes a mandatory provision of the standard. “Should” denotes a provision that is recommended but not mandatory. “May” denotes a feature whose presence does not preclude compliance that may or may not be present at the option of the implementer.
3.1 Treatment of Syntactic Elements
This document contains symbolic references to syntactic elements used in the audio, video, and transport coding subsystems. These references are typographically distinguished by the use of a different font (e.g., restricted), may contain the underscore character (e.g., sequence_end_code) and may consist of character strings that are not English words (e.g., dynrng).
3.2 Symbols, Abbreviations, and Mathematical Operators
The symbols, abbreviations, and mathematical operators used herein are as found in Section 3.4 of ATSC A/53 Part 1 [7].

4. DEFINITION OF TERMS
PAT-E – A table with the same syntax as the Program Association Table as defined by ISO/IEC 13818-1 [3], transmitted using an enhanced VSB mode defined in A/53-2 [4].
PMT-E – A table with the same syntax as the Program Map Table as defined by ISO/IEC 13818-1 [3], transmitted using an enhanced VSB mode defined in A/53-2 [4].
PMT-E_PID – A PID that identifies the Transport Stream packets that carry TS_program_map_section()s in a TS-E.
Program – Program shall mean the collection of all elements within the emission that have the same program number, independent of the methods used to propagate the program elements.
Linked – Alternative elements of a Program are ‘Linked’ when they have identical values in the linked_component_tag field of their Enhanced Signaling Descriptors.
TS-M – The portion of TS-R that contains all, and only, the Transport Stream packets transmitted by the main mode (see A/53-2 [4]) (see Figure 9.2).
TS-R – The recombined Transport Stream containing all Transport Stream packets delivered by all transmission modes (main, one-half rate and one-quarter rate) (see A/53-2 [4]) (see Figure 9.2).
TS-E – The portion of TS-R that contains all, and only, the Transport Stream packets transmitted by the one-half rate and/or one-quarter rate modes (see A/53-2 [4]) (see Figure 9.2).
TS-Ea – The portion of TS-E that contains all, and only, the Transport Stream packets transmitted by the one-half rate mode (see A/53-2 [4]) (see Figure 9.2).
TS-Eb – The portion of TS-E that contains all, and only, the Transport Stream packets transmitted by the one-quarter rate mode (see A/53-2 [4]) (see Figure 9.2).

5. SYSTEM OVERVIEW
The transport format and protocol for the Digital Television Standard is a compatible subset of the MPEG-2 Systems specification defined in ISO/IEC 13818-1 [3]. It is based on a fixed-length packet transport stream approach that has been defined and optimized for digital television delivery applications.
As illustrated in Figure 5.1, the transport subsystem resides between the application (e.g., audio or video) encoding and decoding functions and the transmission subsystem. The encoder’s transport subsystem is responsible for formatting the coded elementary streams and multiplexing the different components of the program for transmission. It is also responsible for delivering packets intended for transmission using coding methods defined in A/53-2 [4]. A receiver recovers the elementary streams for the individual application decoders and for the corresponding error signaling.
The transport subsystem also incorporates other higher protocol layer functionality related to properly timing the packets to enable receiver synchronization.

[Figure 5.1: Transmitter: sources for encoding (video, audio, data, etc.) → Application Encoders → elementary streams, private sections, or PES → Transport packetization and multiplexing → Transport Stream → Modulator → Transmission Format, with a common Clock. Receiver: Demodulator → Transport Stream with error signaling → Transport depacketization and demultiplexing → elementary streams, private sections, or PES with error signaling → Application Decoders → Presentation, with clock control recovering the receiver Clock.]
Figure 5.1 Sample organization of functionality in a transmitter-receiver pair for a single program.

One way to describe the system multiplexing approach is as a combination of multiplexing at two different layers. In the first layer, single program transport streams are formed by multiplexing Transport Stream (TS) packets from one or more Packetized Elementary Stream (PES) and/or private section (ISO/IEC 13818-1 [3] Table 2-30) sources. In the second layer, one or more single program transport streams are combined to form a service multiplex of programs (also known as a multi-program transport stream in the MPEG-2 Systems standard, and a Digital Television Standard multiplexed bit stream in this ATSC standard). Program Specific Information (PSI), carried within Transport Stream packets, relates to the identification of programs and the components of each program.
In specifying the characteristics of the signal processing required of a system based upon this standard, the functions to be performed may be described in terms of encoders and multiplexers at the transmitting end and in terms of a Reference Receiver at the receiving end.
A Reference Receiver is defined such that it performs the functions described in the way that they are described herein, which may or may not be the way in which actual receivers, built to this standard, are designed to operate. Some mandatory transmission requirements are stated herein in terms of the resulting transmission stream elements that are required to be produced, through use of the Reference Receiver functional definition.

6. SPECIFICATION
This section of the standard describes the coding constraints that apply to the use of the MPEG-2 Systems specification [3] in the digital television system, including mandatory main and optional enhanced services.
6.1 MPEG-2 Systems Standard
The transport subsystem shall comply with the transport stream definition of the MPEG-2 Systems standard as specified in ISO/IEC 13818-1 [3] and shall be further constrained as specified herein. Program shall mean the collection of all elements within the emission that have the same value of MPEG-2 program_number, independent of the methods used to propagate the program elements.
6.1.1 Video T-STD
Video streams in TS-R (Figure 9.2) shall conform to the T-STD as defined in Sections 2.4.2.2 and 2.4.2.3 of ISO/IEC 13818-1 [3] and shall follow the constraints for the level encoded in the video elementary stream. When there is a video stream of stream_type 0x02 in TS-R, the T-STD buffer Bn defined in ISO/IEC 13818-1 [3], Section 2.4.2 shall apply for such a stream.²
When a TS-E is present, video streams of stream_type 0x02 may need to be constructed with constraints on the buffer size of the T-STD as defined in Section 2.4.2 of ISO/IEC 13818-1 [3] so that the variable time delay caused by the E-VSB system does not result in a violation of the requirement that the TS-R conform per the preceding paragraph.
When there is a video stream of stream_type 0x02 in TS-E, the T-STD buffer Bn defined in ISO/IEC 13818-1 [3] Section 2.4.2 for the stream in TS-E may underflow when the low_delay flag in the video sequence header is set to ‘1’ (per ISO/IEC 13818-1 [3], Section 2.4.2.6). When the low_delay flag in the video sequence header is set to ‘1’ for this TS-E stream, [3] at Section 2.4.2.6 requires that the delay of data through the STD buffer, defined by td_n(j) – t(i), is ≤ 1 second, except for still picture video data, where td_n(j) – t(i) is ≤ 60 seconds, for all bytes contained in access unit j. The terms td_n(j), t(i), and j are those defined in ISO/IEC 13818-1 [3], Section 2.4.2. Any elementary stream containing Still Picture data shall include a video_stream_descriptor() in accordance with ISO/IEC 13818-1 [3] Section 2.6.2 and shall have the value of the field still_picture_flag set to ‘1’, and the interval between I frames shall not be greater than 60 seconds.

² Section 7.2 also constrains video streams of stream_type 0x02 in TS-M.

6.1.2 Audio T-STD
6.1.2.1 Audio T-STD When TS-E is Not Present
The audio T-STD shall comply with Section 3.6 of Annex A of ATSC Standard A/52 [1].
6.1.2.2 Audio T-STD When TS-E is Present
Audio streams of stream_type 0x81 in TS-R (Figure 9.2) shall conform to the T-STD as defined in Sections 2.4.2 and 2.4.2.3 of ISO/IEC 13818-1 [3], with the buffer size BSn = 2592 bytes. Audio streams of stream_type 0x87 (enhanced AC-3) shall conform to the T-STD as defined in Sections 2.4.2 and 2.4.2.3 of ISO/IEC 13818-1 [3], with the buffer size BSn = 5184 bytes.
6.1.3 Program Constraints
This section standardizes how to carry Programs³ in the ATSC system. Each Program shall be constrained to contain certain standardized elementary streams, which are dependent on the service_type⁴ associated with that Program and the use of TS-E.
Programs may also contain private elementary streams as defined herein and/or by other ATSC standards.
6.1.3.1 ATSC Digital Television (service_type 0x02)
For Programs of service_type 0x02, if there is a video program element in the TS-E, the stream_type for that video program element shall be 0x02 and the video Elementary Stream shall be based on the same video content⁵ as a video program Elementary Stream in TS-M. In addition, for any given Program of service_type 0x02:
• There shall be at most one video program element present in TS-M.
• There shall be at most one video program element present in TS-Ea.
• There shall be at most one video program element present in TS-Eb.
If there are one or more audio program elements in the TS-E, the stream_type for each audio Elementary Stream in TS-E shall be 0x81 or 0x87 and each program element shall be based upon the same audio source as an audio element of the Program in TS-M.
6.1.3.2 ATSC Audio (service_type 0x03)
For Programs of service_type 0x03 that include audio program elements in both TS-M and TS-E, the stream_type for each audio Elementary Stream in TS-E shall be 0x81 or 0x87 and each program element shall be based upon the same audio source as an audio element of the Program in TS-M.
6.1.3.3 Unassociated/Small Screen Service (service_type 0x06)
An additional type of television service is reserved for use in TS-E called “unassociated/small screen service,” which shall use service_type 0x06. An unassociated/small screen service carries programming whose contents are not based on the same schedule of programming as that carried in TS-M. Programs of service_type 0x06 shall be announced in TS-E and all program elements shall be carried only in TS-E. Further constraints on program elements for service_type 0x06 are to be defined at a future date.

³ Program is defined in Section 3.
⁴ This field is defined in A/65 [2], Section 6.3.1 and Table 6.7.
⁵ The video in the TS-E may be a different aspect ratio, frame rate, and/or bitrate than that in TS-M.

6.2 Identification of MPEG-2 Private Ranges
ATSC defines code points in the MPEG-2 user private range and may define code points private to ATSC users within this range.
6.2.1 MPEG-2 Registration Descriptor
Under circumstances as defined below, this standard uses the MPEG-2 Registration Descriptor described in Sections 2.6.8 and 2.6.9 of ISO/IEC 13818-1 [3] to identify the contents of programs and program elements to decoding equipment. No more than one MPEG-2 Registration Descriptor shall appear in any given descriptor loop. The presence of an MPEG-2 Registration Descriptor in any descriptor loop shall not affect the meaning of any other descriptor(s) in the same descriptor loop. The ATSC Private Information Descriptor (defined in Section 6.8.4) shall be the method to carry descriptor-based information associated with a private entity.
6.2.1.1 Program Identifier
Programs that conform to ATSC standards may be identified by use of an MPEG-2 Registration Descriptor (as defined in Sections 2.6.8 and 2.6.9 of ISO/IEC 13818-1 [3]). When present for this purpose, the MPEG-2 Registration Descriptor shall be placed in the descriptor loop immediately following the program_info_length field of the TS_program_map_section() describing this program, and the format_identifier field of this MPEG-2 Registration Descriptor shall have a value 0x4741 3934 (“GA94” in ASCII).
6.2.1.2 Audio Elementary Stream Identifier
The presence of audio elementary streams of stream_type 0x81 that conform to ATSC standards may be indicated by use of an MPEG-2 Registration Descriptor (as defined in Sections 2.6.8 and 2.6.9 of ISO/IEC 13818-1 [3]).
When present for this purpose, the MPEG-2 Registration Descriptor shall be placed in the descriptor loop immediately following the ES_info_length field in the TS_program_map_section() for each program element of stream_type 0x81 (AC-3 audio), and the format_identifier field of the MPEG-2 Registration Descriptor shall have a value 0x4143 2D33 (“AC-3” in ASCII).
6.2.1.3 Other Program Element Identifiers
Any program element carrying content not described by an approved ATSC standard shall be identified with an MPEG-2 Registration Descriptor (as defined in Sections 2.6.8 and 2.6.9 of ISO/IEC 13818-1 [3]). The format_identifier field of the MPEG-2 Registration Descriptor shall be registered with the SMPTE Registration Authority, LLC.⁶ The descriptor shall be placed in the descriptor loop immediately following the ES_info_length field in the TS_program_map_section() for each such non-standard program element.

⁶ The ISO/IEC-designated registration authority for the format_identifier is SMPTE Registration Authority, LLC. See (http://www.smpte-ra.org/mpegreg.html).

6.3 Audio Constraints
For each Program of service_type 0x02 delivered in TS-M, if audio is present, at least one audio component shall be a complete main audio service (CM).⁷ For each Program of service_type 0x03 delivered in TS-M, at least one audio component shall be a complete main audio service (CM). For each Program of service_type 0x02 completely delivered in TS-E, if audio is present, at least one audio component shall be a complete main audio service (CM). For each Program of service_type 0x03 completely delivered in TS-E, at least one audio component shall be a complete main audio service (CM).
Audio program elements within one Program with the same value of linked_component_tag (defined in Section 6.8.5) shall be of the same audio service type (for example CM, VI, or HI).⁸
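Section 6.3 contains two separate rules. The first is a per-delivery requirement for a complete main audio service (CM). The second is a per-Program consistency rule for elements that share a linked_component_tag. The Python sketch below encodes both rules as checks. The AudioElement record and both function names are hypothetical illustrations and are not structures defined by this standard. Audio service types are represented as plain strings such as "CM", "VI", and "HI".

```python
from dataclasses import dataclass
from typing import List, Optional

# Sketch: checks for the Section 6.3 audio constraints. The AudioElement record
# and both helper functions are hypothetical; "CM", "VI" and "HI" are audio
# service types from A/53-5, not the A/65 service_type field.


@dataclass
class AudioElement:
    audio_service_type: str                      # e.g. "CM", "VI", "HI"
    linked_component_tag: Optional[int] = None   # from the Enhanced Signaling Descriptor, if present


def cm_requirement_met(service_type: int, audio: List[AudioElement]) -> bool:
    """CM rule for one Program as delivered in TS-M (or completely in TS-E)."""
    if service_type == 0x03 or (service_type == 0x02 and len(audio) > 0):
        return any(a.audio_service_type == "CM" for a in audio)
    return True  # 0x02 without audio, or another service_type: no CM rule here


def mismatched_link_tags(audio: List[AudioElement]) -> List[int]:
    """linked_component_tag values whose elements disagree on audio service type."""
    types_by_tag = {}
    for a in audio:
        if a.linked_component_tag is not None:
            types_by_tag.setdefault(a.linked_component_tag, set()).add(a.audio_service_type)
    return sorted(tag for tag, types in types_by_tag.items() if len(types) > 1)
```

`cm_requirement_met` should be called once for each delivery, that is, for the TS-M elements or for a Program delivered completely in TS-E. By contrast, `mismatched_link_tags` should receive every audio element of the Program from both TS-M and TS-E, because linked alternatives may be carried by different transmission modes.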
6.4 Constraints on PSI
All Program elements in the Transport Stream shall be described in the PSI.
6.4.1 Constraints on Main Services’ Program Specific Information
There are the following constraints on the PSI information in TS-M:
• Transport Stream packets identified by a particular PMT_PID value shall be constrained to carry only one program definition, as described by a single TS_program_map_section(). For terrestrial broadcast applications, these Transport Stream packets shall be further constrained to carry no other kind of PSI table.
• The Transport Stream shall be constructed such that the time interval between the byte containing the last bit of the TS_program_map_section() containing television program information and successive occurrences of the same TS_program_map_section() shall be less than or equal to 400 milliseconds.
• The program numbers are associated with the corresponding PMT_PIDs in the Program Association Table (PAT). The Transport Stream shall be constructed such that the time interval between the byte containing the last bit of the program_association_section() and successive occurrences of the program_association_section() shall be less than or equal to 100 milliseconds. However, when program_association_section()s, CA_section()s, and TS_program_map_section()s are approaching their maximum allowed sizes, the potential exists to exceed the 80,000 bps rate specified in ISO/IEC 13818-1 [3] Section 2.4.2.3. In cases where the table section sizes are such that the 100 millisecond repetition rate of the program_association_section() would cause the 80,000 bps maximum rate to be exceeded, the time interval between the byte containing the last bit of the program_association_section() and successive occurrences of the program_association_section() may be increased, but in no event shall it exceed 140 milliseconds, so that under no circumstances is the limit of 80,000 bps exceeded.
• When an Elementary Stream of stream_type 0x02 (MPEG-2 video) is present in the Transport Stream, the data_stream_alignment_descriptor() (described in Section 2.6.10 of ISO/IEC 13818-1 [3]) shall be included in the descriptor loop immediately following the ES_info_length field in the TS_program_map_section() describing that Elementary Stream. The descriptor_tag value shall be set to 0x06, the descriptor_length value shall be set to 0x01, and the alignment_type value shall be set to 0x02 (video access unit).
• Adaptation headers shall not occur in Transport Stream packets identified by a program_map_PID value for any purpose other than using the discontinuity_indicator to signal that the version_number (Section 2.4.4.9 of ISO/IEC 13818-1 [3]) may be discontinuous.
• Adaptation headers shall not occur in Transport Stream packets identified by PID 0x0000 (the PAT PID) for any purpose other than using the discontinuity_indicator to signal that the version_number (Section 2.4.4.5 of ISO/IEC 13818-1 [3]) may be discontinuous.
• This standard does not define a Network Information Table (NIT) as specified in MPEG-2 Systems. The use of program_number 0x0000 should be avoided, as MPEG-2 Systems reserves this value for the network_PID, which in turn is used to identify the TS packets of a NIT.

6.4.2 PAT-E
Sections of the PAT-E shall be carried in one or more program_association_section()s within TS-E packets, shall be identified by the PID 0x1FF7, and shall only reference PMT_PID values for TS_program_map_section()s present in Transport Stream packets in TS-E. Sections of the PAT-E shall be present only in the TS-Eb when TS-Eb is present.

⁷ CM is defined in Section 6 of A/53-5 [5].
⁸ Audio service types are defined in Section 6 of A/53-5 [5] and should not be confused with the service_type field defined in A/65 [2].
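The data_stream_alignment_descriptor() requirement in Section 6.4.1 fixes all three of its bytes (0x06, 0x01, 0x02). A minimal Python sketch of checking for it is shown below, assuming the generic tag/length/payload descriptor layout of ISO/IEC 13818-1. The helper names are hypothetical, and the “ABCD” format_identifier in the example is a placeholder, not a registered value.

```python
from typing import Iterator, Tuple

DATA_STREAM_ALIGNMENT_TAG = 0x06
VIDEO_ACCESS_UNIT = 0x02  # alignment_type value required for stream_type 0x02


def iter_descriptors(loop: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (descriptor_tag, descriptor payload) pairs from a descriptor loop."""
    pos = 0
    while pos < len(loop):
        if pos + 2 > len(loop):
            raise ValueError("truncated descriptor header")
        tag, length = loop[pos], loop[pos + 1]
        end = pos + 2 + length
        if end > len(loop):
            raise ValueError("descriptor overruns the loop")
        yield tag, loop[pos + 2:end]
        pos = end


def has_video_alignment_descriptor(es_info: bytes) -> bool:
    """True if the loop holds data_stream_alignment_descriptor() coded 0x06 0x01 0x02."""
    return any(tag == DATA_STREAM_ALIGNMENT_TAG and payload == bytes([VIDEO_ACCESS_UNIT])
               for tag, payload in iter_descriptors(es_info))


# A registration_descriptor() (placeholder "ABCD") followed by the required descriptor
es_info = bytes([0x05, 0x04]) + b"ABCD" + bytes([0x06, 0x01, 0x02])
print(has_video_alignment_descriptor(es_info))                     # True
print(has_video_alignment_descriptor(bytes([0x06, 0x01, 0x01])))   # False
```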
6.4.3 PMT-E
Sections of PMT-E shall be carried in TS_program_map_section()s within TS-E packets identified by the PAT-E and shall only reference PID values for packets that are present in the TS-E. Sections of PMT-E shall be present only in the TS-Eb when TS-Eb is present.

6.4.4 Multiple PCR_PIDs per Program and a Common Time Base
For every Program that includes program elements in both TS-M and TS-E, there is a TS_program_map_section() carried in TS-M and a TS_program_map_section() carried in TS-E with a common value of program_number. For such a Program, each of the PCRs referenced by the PCR_PID fields in the two TS_program_map_section()s shall be samples from a common time clock. Note that a PCR_PID value of 0x1FFF has the same meaning as in ISO/IEC 13818-1 [3] Section 2.4.4.9.

6.4.5 Constraints on Optional Enhanced Services’ Program Specific Information
The following constraints apply to the PSI in TS-E:
• Only one Program shall be described in a transport bit stream for TS_program_map_section()s carried in packets with a particular PMT-E_PID value. Transport Stream packets containing a TS_program_map_section() and identified with a particular PID value shall not be used to transmit any other kind of PSI table (identified by a different table_id).
• TS-E should be constructed such that the time interval between the byte containing the last bit of the TS_program_map_section() containing television program information and successive occurrences of the same TS_program_map_section() is less than or equal to 1600 milliseconds.
• The program numbers referenced in the PAT-E shall be associated with the corresponding PMT-E_PIDs in the collection of program_association_section()s that make up the PAT-E. The sections that make up the PAT-E shall be carried in packets identified with the PID 0x1FF7.
The TS-E should be constructed such that the time interval between the byte containing the last bit of the program_association_section() and successive occurrences of the program_association_section() is less than or equal to 800 milliseconds.
• When the video elementary stream_type is equal to 0x02, the descriptor loop immediately following ES_info_length in the TS_program_map_section() shall contain the data_stream_alignment_descriptor() described in Section 2.6.10 of ISO/IEC 13818-1 [3], and the alignment_type field shown in Table 2-47 of ISO/IEC 13818-1 shall be 0x02.
• Adaptation headers shall not occur in Transport Stream packets of the PMT-E_PID for any purpose other than using the discontinuity_indicator to signal that the version_number (Section 2.4.4.9 of ISO/IEC 13818-1) may be discontinuous.

6.5 PES Constraints
Packetized Elementary Stream syntax and semantics shall be used to encapsulate the audio and video elementary stream information. The Packetized Elementary Stream syntax is used to convey the Presentation Time-Stamp (PTS) and Decoding Time-Stamp (DTS) information required for decoding audio and video information with synchronism. This section describes the coding constraints on this MPEG-2 Systems layer.
Within the PES packet header, the following restrictions shall apply:
• PES_scrambling_control shall be coded as ‘00’.
• ESCR_flag shall be coded as ‘0’.
• ES_rate_flag shall be coded as ‘0’.
• PES_CRC_flag shall be coded as ‘0’.
Within the PES packet extension, the following restrictions shall apply:
• PES_private_data_flag shall be coded as ‘0’.
• pack_header_field_flag shall be coded as ‘0’.
• program_packet_sequence_counter_flag shall be coded as ‘0’.
• P-STD_buffer_flag shall be coded as ‘0’.

6.5.1 MPEG-2 Video PES Constraints (for streams of stream_type 0x02)
Each PES packet shall begin with a video access unit, as defined in Section 2.1.1 of ISO/IEC 13818-1 [3], which shall be aligned with the PES packet header.
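The PES header restrictions of Section 6.5, together with the MPEG-2 video requirements of Section 6.5.1 (a PTS in every header, PES_packet_length coded as 0x0000, data_alignment_indicator coded as ‘1’), can be checked mechanically. The Python sketch below assumes the ISO/IEC 13818-1 PES header layout (three start-code bytes, stream_id, PES_packet_length, two flag bytes, then PES_header_data_length) and a stream_id whose packets carry that optional header. The function name is hypothetical, and the PES extension flags are left unchecked for brevity.

```python
from typing import List


def check_pes_header(pes: bytes, is_mpeg2_video: bool = False) -> List[str]:
    """Check the fixed PES header bytes against the Section 6.5 (and 6.5.1) rules."""
    if len(pes) < 9 or pes[0:3] != b"\x00\x00\x01":
        raise ValueError("not the start of a PES packet")
    pes_packet_length = (pes[4] << 8) | pes[5]
    flags1, flags2 = pes[6], pes[7]
    # flags1: '10' | PES_scrambling_control(2) | PES_priority | data_alignment_indicator
    #         | copyright | original_or_copy
    scrambling_control = (flags1 >> 4) & 0x03
    data_alignment_indicator = (flags1 >> 2) & 0x01
    # flags2: PTS_DTS_flags(2) | ESCR_flag | ES_rate_flag | DSM_trick_mode_flag
    #         | additional_copy_info_flag | PES_CRC_flag | PES_extension_flag
    pts_dts_flags = (flags2 >> 6) & 0x03
    escr_flag = (flags2 >> 5) & 0x01
    es_rate_flag = (flags2 >> 4) & 0x01
    pes_crc_flag = (flags2 >> 1) & 0x01

    problems = []
    if scrambling_control != 0:
        problems.append("PES_scrambling_control is not '00'")
    if escr_flag:
        problems.append("ESCR_flag is not '0'")
    if es_rate_flag:
        problems.append("ES_rate_flag is not '0'")
    if pes_crc_flag:
        problems.append("PES_CRC_flag is not '0'")
    if is_mpeg2_video:
        if pes_packet_length != 0:
            problems.append("PES_packet_length is not 0x0000")
        if data_alignment_indicator != 1:
            problems.append("data_alignment_indicator is not '1'")
        if pts_dts_flags not in (0b10, 0b11):
            problems.append("PES header carries no PTS")
    return problems


# Video PES header: length 0, aligned, PTS only, followed by 5 dummy PTS bytes
good = bytes.fromhex("000001e00000848005") + bytes(5)
print(check_pes_header(good, is_mpeg2_video=True))  # []
```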
The first byte of a PES packet payload shall be the first byte of a video access unit. Each PES header shall contain a PTS. Additionally, it shall contain a DTS as appropriate. For terrestrial broadcast, the PES packet shall not contain more than one coded video frame, and shall be void of video picture data only when transmitted in conjunction with the discontinuity_indicator to signal that the continuity_counter may be discontinuous.
Within the PES packet header, the following restrictions apply:
• The PES_packet_length shall be coded as 0x0000.
• data_alignment_indicator shall be coded as ‘1’.

6.5.2 AC-3 Audio PES Constraints (for Streams of stream_type 0x81)
The AC-3 audio decoder may be capable of simultaneously decoding more than one elementary stream containing different program elements, and then combining the program elements into a complete program. In this case, the audio decoder may sequentially decode audio frames (or audio blocks) from each elementary stream and do the combining (mixing together) on a frame (or block) basis. In order to have the audio from the two elementary streams reproduced in exact sample synchronism, it is necessary for the original audio elementary stream encoders to have encoded the two audio program elements frame synchronously; i.e., if audio program 1 has sample 0 of frame n at time t0, then audio program 2 should also have frame n beginning with its sample 0 at the identical time t0. If the encoding is done frame synchronously, then matching audio frames should have identical values of PTS. If PES packets from two AC-3 audio services that are to be decoded simultaneously contain identical values of PTS, then the corresponding encoded audio frames contained in the PES packets should be presented to the audio decoder for simultaneous synchronous decoding.
If the PTS values do not match (indicating that the audio encoding was not frame synchronous), then the audio frames that are closest in time may be presented to the audio decoder for simultaneous decoding. In this case, the two services may be reproduced out of sync by as much as one-half of a frame time (which is often satisfactory; e.g., a voice-over does not require precise timing).
The value of stream_id for AC-3 shall be ‘1011 1101’ (private_stream_1).

6.5.3 Audio PES Constraints for Enhanced AC-3 (stream_type = 0x87)
The value of stream_id for enhanced AC-3 shall be ‘1011 1101’ (private_stream_1).
If an audio service is delivered in TS-E and is Linked to an audio service delivered in TS-M, constraints are required on the content of the corresponding audio access units, and on the PTS values in the corresponding PES packets, of the two Linked audio services in order to enable a receiver to perform sample-time-synchronous switching between the two audio services (e.g., fallback audio). The constraints are:
1) The audio access units of AC-3 in TS-M and the corresponding audio access units of enhanced AC-3 in TS-E shall be constructed from sets of input audio samples that are time synchronous; and
2) The presentation times of the corresponding audio access units shall be identical.
Note that not every audio access unit has an explicit PTS value in a PES packet, because there is only one PTS per PES packet while there can be multiple audio access units per PES packet. However, there is a unique presentation time for each audio access unit, and it is the presentation times that must be identical to enable a receiver to switch between the two audio streams with exact sample synchronism.
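The note above distinguishes explicit PTS values from per-access-unit presentation times. The Python sketch below derives the latter. It assumes 48 kHz sampling, 1536 samples per audio access unit (one AC-3 syncframe; E-AC-3 frames can be shorter, so applying this value to the TS-E stream is an assumption), and that each PTS applies to the first access unit beginning in its PES packet. The function names are hypothetical. The example shows two differently packetized Linked streams that still have identical presentation times.

```python
from typing import Iterable, List, Tuple

PTS_MODULUS = 1 << 33     # PTS is a 33-bit count of 90 kHz clock ticks
SAMPLES_PER_FRAME = 1536  # AC-3 syncframe (assumed here for the E-AC-3 stream too)


def frame_duration_ticks(sample_rate_hz: int = 48000) -> int:
    """Duration of one audio access unit in 90 kHz ticks (2880 at 48 kHz)."""
    ticks, remainder = divmod(SAMPLES_PER_FRAME * 90000, sample_rate_hz)
    if remainder:
        raise ValueError("frame duration is not a whole number of 90 kHz ticks")
    return ticks


def presentation_times(pes_packets: Iterable[Tuple[int, int]],
                       sample_rate_hz: int = 48000) -> List[int]:
    """Expand (PTS, access units starting in that PES packet) pairs into one
    presentation time per access unit; the PTS applies to the first one."""
    step = frame_duration_ticks(sample_rate_hz)
    times = []
    for pts, count in pes_packets:
        times.extend((pts + k * step) % PTS_MODULUS for k in range(count))
    return times


# TS-M packs three frames into one PES packet; TS-E packetizes the same frames as 1 + 2.
main = presentation_times([(90000, 3)])
enhanced = presentation_times([(90000, 1), (92880, 2)])
print(main)              # [90000, 92880, 95760]
print(main == enhanced)  # True
```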
6.6 Services and Features

6.6.1 System Information and Program Guide
Transport Streams shall include system information and program guide data formatted according to the structure and syntax described in ATSC Standard A/65 “Program and System Information Protocol for Terrestrial Broadcast and Cable” [2]. System information and program guide data shall be conveyed in Transport Stream packets of PID 0x1FFB, which shall be reserved exclusively for this purpose. System information provides data necessary for navigation among digital service offerings. The program guide database allows a receiver to build an on-screen grid of program information for the various services that may be available.

6.6.1.1 SI base_PID for TS-M
System information and program guide data shall be conveyed in Transport Stream packets of PID 0x1FFB, which shall be reserved exclusively within TS-R (Figure 9.2) for this purpose.

6.6.1.2 SI base_PID for TS-E
System information and program guide data shall be conveyed in TS-E packets of PID 0x1FF9, which shall be reserved exclusively within TS-R (Figure 9.2) for this purpose.

6.6.1.3 System Information and Program Guide STD Model
The STD model for program guide and system information is specified in ATSC Standard A/65 [2].

6.6.2 Specification of ATSC Private Data
Within the ATSC set of standards, private data may be transported by various means:
1) Data services – Carriage of ATSC data services, including system information, shall be as documented in applicable ATSC standards. See, for example, the ATSC A/90 Data Broadcast Standard [6].
2) Private program elements – The stream_type codes in the range 0xC4 to 0xFF shall be available for stream types defined privately (not described by ATSC standards). Such privately defined program elements are associated with an MPEG-2 Registration Descriptor (see Section 6.2.1.3).
3) Adaptation fields – Private data may be transmitted within the adaptation field of Transport Stream packets (Sections 2.4.3.4 and 2.4.3.5 of ISO/IEC 13818-1 [3]). Program elements that include private data in the adaptation fields of their Transport Stream packets shall be associated with an MPEG-2 Registration Descriptor (see Section 6.2.1.3).

6.7 Assignment of Identifiers
This section summarizes those identifiers and codes that shall have a fixed value, including PES stream IDs and descriptors. Stream_type codes for program element types managed by the ATSC Code Points Registrar (currently assigned or available for future assignment) shall be in the range 0x80 to 0xC3. Stream_type code 0x81 has already been assigned within the Digital Television Standard (see Section 6.7.1). Descriptor_tag codes managed by the ATSC Code Points Registrar (currently assigned or available for future assignment) shall be in the range 0x40 to 0xEF.

6.7.1 AC-3 Audio Stream Type
The stream_type value for AC-3 audio program elements shall be 0x81.

6.7.2 MPEG-2 Video Stream Type
The stream_type value for MPEG-2 video program elements shall be as defined in ISO/IEC 13818-1 [3], which is 0x02.

6.7.3 Enhanced AC-3 Audio Stream Type
The stream_type value for the enhanced AC-3 audio program element defined in A/52 [1] Annex E shall be 0x87.

6.8 Descriptors
Unless explicitly stated to the contrary for a given descriptor, no more than one descriptor with a given value of descriptor_tag shall appear in any descriptor loop.

6.8.1 AC-3 Audio Descriptor
When an Elementary Stream of stream_type 0x81 (AC-3 audio) or stream_type 0x87 (E-AC-3) is present in the digital television Transport Stream, an AC-3 Audio Descriptor (AC-3_audio_stream_descriptor()) shall be included in the descriptor loop immediately following the ES_info_length field in the TS_program_map_section() describing that Elementary Stream.
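The code-point ranges of Sections 6.6.2 and 6.7 can be summarized as a small lookup. This Python sketch is illustrative only; the labels it returns are informal descriptions rather than normative names, and the helper names are hypothetical.

```python
def classify_stream_type(stream_type: int) -> str:
    """Informal classification of a stream_type value per Sections 6.6.2 and 6.7."""
    if not 0x00 <= stream_type <= 0xFF:
        raise ValueError("stream_type is an 8-bit field")
    if stream_type == 0x02:
        return "MPEG-2 video"
    if stream_type == 0x81:
        return "AC-3 audio"
    if stream_type == 0x87:
        return "E-AC-3 audio"
    if 0x80 <= stream_type <= 0xC3:
        return "ATSC Code Points Registrar range"
    if 0xC4 <= stream_type <= 0xFF:
        return "private (identify with an MPEG-2 Registration Descriptor)"
    return "other ISO/IEC 13818-1 value"


def needs_ac3_audio_descriptor(stream_type: int) -> bool:
    """AC-3_audio_stream_descriptor() is required for stream_type 0x81 and 0x87."""
    return stream_type in (0x81, 0x87)


def is_atsc_managed_descriptor_tag(tag: int) -> bool:
    """descriptor_tag values managed by the ATSC Code Points Registrar."""
    return 0x40 <= tag <= 0xEF


for st in (0x02, 0x81, 0x87, 0x90, 0xC4):
    print(hex(st), classify_stream_type(st), needs_ac3_audio_descriptor(st))
```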
The syntax shall be as given in Table A2 of Annex A of ATSC Standard A/52 [1]. The following constraints shall apply to the AC-3 Audio Descriptor:
1) The value of the descriptor_tag shall be 0x81.
2) The 6-bit field for Bit Rate Code shall have a value in the range ‘000000’ through ‘001111’ or ‘100000’ through ‘101111’; i.e., signaling a bit rate less than or equal to 448 kbps.
3) The num_channels field shall have a value in the range 1 to 13.
4) The langcod field is a reserved field. Audio language shall be indicated using an ISO-639 Language Descriptor (see Section 6.8.3).
5) The descriptor shall identify the type of the audio service in the bsmod field, which shall be the same as the bsmod field in the elementary stream associated with this descriptor.
6) The descriptor may optionally carry a 3-byte language code that is represented per ISO-639. If this language code is present in the AC-3_audio_stream_descriptor(), it shall match the language code carried in the ISO_639_language_descriptor(), if present.
Effective 1 March 2008, audio language shall be indicated by including the optional ISO-639 Language bytes within the AC-3_audio_stream_descriptor(), at which point the use of the ISO-639 Language Descriptor to indicate language shall be optional, but recommended to support legacy devices requiring the ISO_639_language_descriptor().
Informative note: Receiving devices are expected to use the bsmod (bit stream mode) field in the AC-3_audio_stream_descriptor() to determine the type of each AC-3 or E-AC-3 audio stream rather than the audio_type field in the ISO_639_language_descriptor().

6.8.2 Program Smoothing Buffer Descriptor
The TS_program_map_section() of each program shall contain a smoothing buffer descriptor pertaining to that program in accordance with Section 2.6.30 of ISO/IEC 13818-1. During the continuous existence of a program, the values of the fields of the smoothing buffer descriptor shall not change.
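The smoothing_buffer_descriptor() is short enough to decode by hand. The Python sketch below assumes the ISO/IEC 13818-1 layout (descriptor_tag 0x10, descriptor_length 6, then two fields each made of 2 reserved bits and a 22-bit value: sb_leak_rate in units of 400 bit/s and sb_size in bytes) and applies the field constraints that this section goes on to list. The caller supplies the maximum transport rate (for example, the 8-VSB Tr of about 19.39 Mbps); all function names are hypothetical.

```python
from typing import List, Tuple

SMOOTHING_BUFFER_TAG = 0x10  # smoothing_buffer_descriptor() in ISO/IEC 13818-1
LEAK_RATE_UNIT_BPS = 400     # sb_leak_rate is coded in units of 400 bit/s
MAX_SB_SIZE_BYTES = 2048     # ATSC limit on sb_size


def build_smoothing_buffer_descriptor(leak_code: int, size: int) -> bytes:
    """Serialize the descriptor with its reserved bits set to '1'."""
    return (bytes([SMOOTHING_BUFFER_TAG, 6])
            + (0xC00000 | leak_code).to_bytes(3, "big")
            + (0xC00000 | size).to_bytes(3, "big"))


def parse_smoothing_buffer_descriptor(desc: bytes) -> Tuple[int, int]:
    """Return (leak rate in bit/s, buffer size in bytes)."""
    if len(desc) != 8 or desc[0] != SMOOTHING_BUFFER_TAG or desc[1] != 6:
        raise ValueError("not a smoothing_buffer_descriptor()")
    # Each field is 2 reserved bits followed by a 22-bit value.
    sb_leak_rate = int.from_bytes(desc[2:5], "big") & 0x3FFFFF
    sb_size = int.from_bytes(desc[5:8], "big") & 0x3FFFFF
    return sb_leak_rate * LEAK_RATE_UNIT_BPS, sb_size


def check_smoothing_buffer(desc: bytes, max_transport_rate_bps: float) -> List[str]:
    """Return violations of the ATSC smoothing buffer field constraints."""
    leak_bps, size = parse_smoothing_buffer_descriptor(desc)
    problems = []
    if leak_bps > max_transport_rate_bps:
        problems.append("sb_leak_rate exceeds the maximum transport rate")
    if size > MAX_SB_SIZE_BYTES:
        problems.append("sb_size exceeds 2048 bytes")
    return problems


desc = build_smoothing_buffer_descriptor(2500, 2048)    # 1 Mbit/s, 2048 bytes
print(parse_smoothing_buffer_descriptor(desc))          # (1000000, 2048)
print(check_smoothing_buffer(desc, 19.39e6))            # []
```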
The fields of the smoothing buffer descriptor shall meet the following constraints:
• The field sb_leak_rate shall be allowed to range up to the maximum transport rates specified in Section 8.2.
• The field sb_size shall have a value less than or equal to 2048. The size of the smoothing buffer is thus ≤ 2048 bytes.

6.8.3 ISO-639 Language Descriptor
In the ATSC Digital Television System, the ISO_639_language_descriptor() defined in ISO/IEC 13818-1 [3] Section 2.6.18 shall be used to indicate the language of AC-3 audio (stream_type 0x81) or enhanced AC-3 (stream_type 0x87) Elementary Stream components. Effective 1 March 2008, audio language, when indicated, shall be indicated by including the ISO-639 Language bytes within the AC-3_audio_stream_descriptor(), at which point the use of the ISO-639 Language Descriptor to indicate language shall be optional, but recommended to support legacy devices requiring the ISO_639_language_descriptor().
When used, the ISO_639_language_descriptor() shall be included in the descriptor loop immediately following the ES_info_length field in the TS_program_map_section() for each Elementary Stream of stream_type 0x81 (AC-3 audio) or 0x87 (enhanced AC-3 audio). Its use is required when the number of AC-3 or enhanced AC-3 audio Elementary Streams in the TS_program_map_section() having the same value of bit stream mode (bsmod in the AC-3 Audio Descriptor) is two or more.
Informative note: As an example, consider an MPEG-2 program that includes two AC-3 audio ES components: an AC-3 Complete Main (CM) audio track (bsmod = 0) and a Visually Impaired (VI) audio track (bsmod = 2). Inclusion of the ISO_639_language_descriptor() is optional for this program. If a second CM track were to be added, however, it would then be necessary to include ISO_639_language_descriptor()s in the TS_program_map_section().
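The rule illustrated by the informative note reduces to counting bsmod values. The Python sketch below follows that reading: the ISO_639_language_descriptor() becomes necessary once two or more AC-3 or enhanced AC-3 elementary streams in one TS_program_map_section() share a bsmod value. The input format of (stream_type, bsmod) pairs is an assumption made for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

AC3_STREAM_TYPES = (0x81, 0x87)  # AC-3 and enhanced AC-3


def iso639_descriptor_required(es_entries: Iterable[Tuple[int, Optional[int]]]) -> bool:
    """es_entries: (stream_type, bsmod) per elementary stream in one
    TS_program_map_section(); bsmod is None for non-audio streams."""
    counts = Counter(bsmod for stream_type, bsmod in es_entries
                     if stream_type in AC3_STREAM_TYPES)
    return any(n >= 2 for n in counts.values())


program = [(0x02, None), (0x81, 0), (0x81, 2)]            # video, CM, VI
print(iso639_descriptor_required(program))                # False
print(iso639_descriptor_required(program + [(0x87, 0)]))  # True: two CM tracks
```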
The audio_type field in any ISO_639_language_descriptor() used in this standard shall be set to 0x00 (meaning “undefined”). An ISO_639_language_descriptor() may be present in the TS_program_map_section() in other positions as well, for example to indicate the language or languages of a textual data service program element.

6.8.4 ATSC Private Information Descriptor
The ATSC_private_information_descriptor() provides a method to carry and unambiguously label private information. More than one ATSC_private_information_descriptor() may appear within a single descriptor loop. Table 6.1 defines the bit-stream syntax of the ATSC_private_information_descriptor().

Table 6.1 ATSC Private Information Descriptor
Syntax                                      No. of Bits   Format
ATSC_private_information_descriptor() {
    descriptor_tag                          8             0xAD
    descriptor_length                       8             uimsbf
    format_identifier                       32            uimsbf
    for (i = 0; i < N; i++) {
        private_data_byte                   8             bslbf
    }
}

descriptor_tag – This 8-bit field is set to 0xAD.
descriptor_length – This 8-bit field specifies the number of bytes of the descriptor immediately following the descriptor_length field.
format_identifier – The format_identifier is a 32-bit field as defined in ISO/IEC 13818-1 [3], Section 2.6.9, for the registration_descriptor(). Only format_identifier values registered and recognized by the SMPTE Registration Authority, LLC shall be used (see http://www.smpte-ra.org/mpegreg.html)⁹. Its use in this descriptor shall scope and identify only the private information contained within this descriptor.
private_data_byte – The syntax and semantics of this field are defined by the assignee of the format_identifier value.
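Table 6.1 maps directly onto a byte layout. The Python sketch below serializes and parses the descriptor; the function names are hypothetical, and “EXMP” is a placeholder standing in for a format_identifier that, in real use, would have to be registered with the SMPTE Registration Authority.

```python
from typing import Tuple

ATSC_PRIVATE_INFORMATION_TAG = 0xAD


def build_private_information_descriptor(format_identifier: bytes,
                                         private_data: bytes) -> bytes:
    """Serialize ATSC_private_information_descriptor() per Table 6.1."""
    if len(format_identifier) != 4:
        raise ValueError("format_identifier is a 32-bit field")
    length = 4 + len(private_data)  # bytes following descriptor_length
    if length > 0xFF:
        raise ValueError("descriptor_length is an 8-bit field")
    return bytes([ATSC_PRIVATE_INFORMATION_TAG, length]) + format_identifier + private_data


def parse_private_information_descriptor(desc: bytes) -> Tuple[bytes, bytes]:
    """Return (format_identifier, private_data_bytes)."""
    if len(desc) < 6 or desc[0] != ATSC_PRIVATE_INFORMATION_TAG:
        raise ValueError("not an ATSC_private_information_descriptor()")
    length = desc[1]
    if length < 4 or len(desc) < 2 + length:
        raise ValueError("bad descriptor_length")
    return desc[2:6], desc[6:2 + length]


# "EXMP" is a placeholder, not a registered format_identifier.
d = build_private_information_descriptor(b"EXMP", b"\x01\x02")
print(d.hex())                                  # ad0645584d500102
print(parse_private_information_descriptor(d))  # (b'EXMP', b'\x01\x02')
```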
6.8.5 Enhanced Signaling Descriptor
The Enhanced Signaling Descriptor identifies the method of terrestrial broadcast transmission (main, one-half rate, one-quarter rate)¹⁰ of a program element and, when a program element is an alternative to another program element, identifies its Linked alternative and indicates the broadcaster’s preference. For example, an Enhanced Signaling Descriptor may indicate that although an audio stream is the same language and content as another, it is preferred for some reason.
The Enhanced Signaling Descriptor may be included in either descriptor loop in the TS_program_map_section() structure. When the Enhanced Signaling Descriptor is only in the descriptor loop in the TS_program_map_section() just following the program_info_length field, the linkage_preference field shall have the value ‘00’. In this case the Enhanced Signaling Descriptor shall indicate the transmission method (tx_method) for every program element described in the TS_program_map_section() that does not have a separate instance of the Enhanced Signaling Descriptor included in the descriptor loop immediately following the ES_info_length field (see precedence rule below).
The Enhanced Signaling Descriptor shall be present for each program element that is Linked (linkage_preference not equal to ‘00’) to another program element. Therefore the Enhanced Signaling Descriptor appears in TS_program_map_section()s in the TS-M, TS-Ea and TS-Eb to establish linkage.

⁹ SMPTE Registration Authority, LLC, 595 West Hartsdale Avenue, White Plains, NY 10607 USA.
¹⁰ See ATSC A/53-2 [4] for definitions.

When the Enhanced Signaling Descriptor is in the descriptor loop immediately following the ES_info_length field in the TS_program_map_section(), then:
• No more than one alternative program element shall be referenced for any given program element.
• If the program element has the linkage_preference value of ‘00’, then the program element is not Linked.
• If the program element has the linkage_preference value of ‘01’, then the Linked alternative program element shall have the linkage_preference value of ‘01’, and the two shall have identical values for program_number and identical values for linked_component_tag.
• If the program element has the linkage_preference value of ‘10’, then the Linked alternative program element shall have the linkage_preference value of ‘11’, and the two shall have identical values for program_number and identical values for linked_component_tag.
• If the program element has the linkage_preference value of ‘11’, then the Linked alternative program element shall have the linkage_preference value of ‘10’, and the two shall have identical values for program_number and identical values for linked_component_tag.
If the Enhanced Signaling Descriptor is in both descriptor loops, the Enhanced Signaling Descriptor in the descriptor loop immediately following the ES_info_length field shall take precedence for the particular program element described therein.
An Enhanced Signaling Descriptor shall be present for each program element transmitted using one-half rate or one-quarter rate coding, in either or both descriptor loops in the TS_program_map_section(), subject to the above precedence rule.
The bit stream syntax for the Enhanced Signaling Descriptor is shown in Table 6.2.

Table 6.2 Bit Stream Syntax for the Enhanced Signaling Descriptor¹¹
Syntax                                      No. of Bits   Format
enhanced_signaling_descriptor() {
    descriptor_tag                          8             0xB2
    descriptor_length                       8             uimsbf
    linkage_preference                      2             uimsbf
    tx_method                               2             uimsbf
    if (linkage_preference != ‘00’)
        linked_component_tag                4             uimsbf
    else
        reserved                            4             ‘1111’
}

¹¹ This descriptor may be extended in the future to provide additional capability by adding a byte loop after the last defined byte.
Proposals to do so are expected to be considered in the future.

descriptor_tag – This 8-bit unsigned integer shall have the value 0xB2, identifying this descriptor as an enhanced_signaling_descriptor.
descriptor_length – This 8-bit unsigned integer shall indicate the number of bytes immediately following this field up to the end of this descriptor.
linkage_preference – This 2-bit field indicates whether or not the associated program element is Linked to another program element in the transmission. If so, it identifies the broadcaster’s preference for the program element associated with this descriptor. The linkage_preference field shall be coded as defined in Table 6.3.
reserved – This 4-bit field shall contain all ‘1’s.

Table 6.3 Linkage Preference Values
linkage_preference   Meaning
‘00’                 not_linked – This element is not Linked to any other program element in the transmission
‘01’                 linked_no_preference – There is no preference between Linked program elements
‘10’                 linked_preferred – This program element is preferred over the Linked program element
‘11’                 linked_not_preferred – The Linked program element is preferred over this program element

tx_method – This 2-bit field shall identify the VSB transmission method used to transmit the associated program element. See Table 6.4.

Table 6.4 Transmission Method Values
tx_method   Meaning
‘00’        main – This program element is transmitted using main coding
‘01’        half_rate – This program element is transmitted using rate-one-half enhanced coding
‘10’        quarter_rate – This program element is transmitted using rate-one-quarter enhanced coding
‘11’        Reserved for signaling future transmission methods

linked_component_tag – A 4-bit unsigned integer that links the associated program element to an equivalent, preferred, or less preferred alternative.
The Linked program element is the program element labeled with the same value of linked_component_tag in a TS_program_map_section() labeled with the same value of program_number as the TS_program_map_section() that carries this descriptor. If the program element has the linkage_preference value equal to ‘00’, then the program element is not Linked and the linked_component_tag value shall be reserved. See Table 6.2.

6.9 PID Value Assignments
In order to avoid collisions with fixed PID values and ranges already established in this and other international standards, transport_packet() PID field values are restricted as follows:
• TS packets identified with PID values in the range 0x1FF0 – 0x1FFE shall only be used to transport data compliant with ATSC-recognized standards specifying fixed-value PID assignments in that range. (Informative note: One such use is A/65, which requires the use of 0x1FFB to identify packets containing certain tables defined in that standard.)
• PID values used to identify Transport Stream packets carrying TS_program_map_section() or program elements shall not be set below 0x0030. (Informative note: One such use is in ETS 300 468, which requires the use of 0x0011 to identify packets containing certain tables defined in that standard.)

6.10 Extensions to the MPEG-2 Systems Specification
This section covers extensions to the MPEG-2 Systems specification.

6.10.1 Scrambling Control
The scrambling control field within the packet header allows all states to exist in the digital television system as defined in Table 6.5.

Table 6.5 Transport Scrambling Control Field
transport_scrambling_control   Function
‘00’                           packet payload not scrambled
‘01’                           not scrambled; state may be used as a flag for private use defined by the service provider
‘10’                           packet payload scrambled with “even” key
‘11’                           packet payload scrambled with “odd” key

Elementary Streams for which the transport_scrambling_control field does not exclusively have the value ‘00’ for the duration of the program must carry a CA_descriptor in accordance with Section 2.6.16 of ISO/IEC 13818-1 [3]. The implementation of a digital television delivery system that employs conditional access will require the specification of additional data streams and system constraints.

7. FEATURES OF 13818-1 NOT SUPPORTED BY THIS STANDARD
The transport definition is based on the MPEG-2 Systems standard, ISO/IEC 13818-1; however, it does not implement all parts of that standard. This section describes those elements that are omitted from or constrained by this standard.

7.1 Program Streams
This part of the ATSC Digital Television Standard does not include those portions of ISO/IEC 13818-1 [3] and Annex A of ATSC Standard A/52 [1] that pertain exclusively to Program Stream specifications.

7.2 Still Pictures
This Part does not include those portions of the ISO/IEC 13818-1 Transport Stream specification that pertain to the Still Picture model for the main service (TS-M).

8. TRANSPORT SUBSYSTEM INTERFACES AND BIT RATES

8.1 Transport Subsystem Input Characteristics
The MPEG-2 Systems standard defines system coding at two hierarchical layers: the packetized elementary stream (PES) and the systems stream, in either Transport Stream or Program Stream format (the ATSC uses only the Transport Stream format). Under this standard and by common industry usage, private_section encapsulated data is a layer parallel to PES. Physical implementations may include the PES packetizer within a video, audio, or other data encoder, and a private_section encapsulator within a data encoder, rather than as part of the transport subsystem.
Therefore, the inputs to the transport subsystem may be elementary streams, PES packets, or private_section encapsulated data.

8.2 Transport Subsystem Output Characteristics
Conceptually, the output from the transport subsystem is a continuous MPEG-2 Transport Stream as defined in this document, at a constant rate of Tr Mbps when transmitted in an 8-VSB system and 2Tr Mbps when transmitted in a 16-VSB system, where

$$T_r = 2 \times \left(\frac{188}{208}\right)\left(\frac{312}{313}\right)\left(\frac{684}{286}\right) \times 4.5 = 19.39\ldots\ \text{Mbps}$$

The symbol rate Sr in Msymbols per second for the transmission subsystem (see Section 5 of ATSC A/53-2 [4]) is

$$S_r = \left(\frac{684}{286}\right) \times 4.5 = 10.76\ldots\ \text{Msymbols per second}$$

Tr and Sr shall be locked to each other in frequency.
Note: The signals in the source coding subsystems (see A/53-4, -5, and -6) and the signals in the transport/transmission subsystems (A/53-2 and -3) are not required to be frequency-locked to each other, and in many implementations they will operate asynchronously. In such systems, the frequency drift can necessitate the occasional insertion or deletion of a null packet so that the transport subsystem accommodates the frequency disparity and thereby remains locked to the transmission subsystem symbol rate.
All Transport Streams conforming to this standard shall conform to the ISO/IEC 13818-1 [3] T-STD (Transport System Target Decoder) model.

9. PACKET DELIVERY TO THE E-VSB AND VSB MODULATION SYSTEM
This section describes a reference model for interfacing with the enhanced transmission system model as described in ATSC A/53-2 [4], and a receiver reference model. These models are used in specifying system constraints and requirements.

9.1 Head End Reference Model
The E-VSB exciter defined in ATSC A/53-2 [4] can accept three separate Input Streams (IS), one each for Transport Stream packets destined to be sent via the one-half rate (IS-Ea), one-quarter rate (IS-Eb), and main rate (IS-N) modes.
As specified in ATSC A/53-2 [4], only one of the three inputs has a Transport Stream packet at any one time.

[Figure 9.1 Head end reference model: MPEG-2 data sources feed a Transport Stream multiplexer whose output (IS) passes through two decisions (“To be an enhanced packet?” and “To be at what rate?”) into the Rate ½, Rate ¼, and Normal mode fixed delay buffers, whose outputs (IS-Ea’, IS-Eb’, IS-N’) drive the ½, ¼, and Normal inputs of the E-VSB exciter model (Annex D).]

The above Head End Reference Model (Figure 9.1) models the data flow for Transport Stream packets from the final MPEG-2 multiplexer through logic and buffers and into the three exciter inputs (ATSC A/53-2 [4]). The decisions about whether a packet is enhanced or not and, if enhanced, whether one-quarter rate or one-half rate is to be used (shown by diamond boxes in Figure 9.1) are made based on the PID value in each Transport Stream packet. Special rules for certain streams (like PSIP-E data) may apply (see A/65 [2]).
The three fixed delay buffers shown in Figure 9.1 are used to delay each stream (IS-Ea, IS-Eb, IS-N) to compensate for the fixed difference in delivery time caused by the additional interleaving and error correction applied to streams in IS-E. The magnitude of the delay that must be applied to each stream depends on which MAP is being used.
The stream labeled “IS” in Figure 9.1 may be an MPEG-2 Transport Stream, but need not be. A Transport Stream has timing constraints and requirements that need not be met at IS, but shall be met at TS-R in the Receiver Reference Model. IS is, however, at least “similar” to an MPEG-2 Transport Stream.

9.2 Receiver Reference Model
The modeled Reference Receiver (as shown in Figure 9.2) has zero processing time, has a packet selector, and has additional de-interleaving for the enhanced data.
[Figure 9.2 Reference receiver model: the off-air signal enters the VSB demodulator; packets that are not enhanced form TS-M, while enhanced packets pass through the enhanced data deinterleaver and additional processing (TS-E) into rate ½ and rate ¼ TS-E packet reassembly buffers (TS-Ea, TS-Eb). A selector picks the next available TS-M, TS-E, or Null packet and emits each 188-byte packet immediately when available; its output, TS-R, is an MPEG-2 conformant Transport Stream delivered to the transport demultiplexer.]

The modeled Reference Receiver (as shown in Figure 9.2) has two 188-byte buffers that are filled by rate-one-quarter and rate-one-half data as they are recovered from the rate-one-quarter and rate-one-half E-VSB segments, respectively. The streams TS-Ea and TS-Eb contain the series of bytes from the respective TS-E reassembly buffers for each packet after it is assembled. Whenever a buffer has an entire Transport Stream packet (188 bytes) available for output, that buffer is selected by the switches and its packet is output immediately. The Reference Receiver does not reorder packets identified with a particular PID.

Note that, due to the transmission system and the instantaneous behavior modeled for the Reference Receiver, it is not possible for more than one data source (TS-Ea, TS-Eb, or TS-M) to be emitting data at the same time. However, because it is possible for no data to be available at a given packet time, the Reference Receiver then selects a Null packet for insertion, so that the output is at a constant packet rate (footnote 12) delivering 19.39… Mbps.

The emitted E-VSB stream shall be ordered and coded such that, when processed by the Reference Receiver, the output from the switch (TS-R) shall be an MPEG-2 Transport Stream compliant with Sections 1 through 7 of this Part of the ATSC Digital Television Standard. The PCRs shall be adjusted as described in Section 9.4. TS-E is the portion of the E-VSB transmission that results in TS-E in the Reference Receiver of Figure 9.2.
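The selector behavior described above can be illustrated with a short Python sketch. It is not part of the standard: it assumes that time is divided into fixed packet slots and that the caller already knows in which slot each of TS-M, TS-Ea, and TS-Eb completes a 188-byte packet, and the names reference_receiver, null_packet, transport_rate_bps, and packet_slot_rate are hypothetical. It also evaluates Tr from Section 8.2, which corresponds to about 12,894 packets per second at the constant TS-R packet rate.

```python
# Hypothetical sketch of the Reference Receiver output selector.
# The slot model and all names are assumptions, not taken from A/53.
from fractions import Fraction

TS_PACKET_SIZE = 188
NULL_PID = 0x1FFF


def null_packet() -> bytes:
    """A 188-byte MPEG-2 null packet: PID 0x1FFF, payload only, 0xFF stuffing."""
    header = bytes([0x47, NULL_PID >> 8, NULL_PID & 0xFF, 0x10])
    return header + b"\xff" * (TS_PACKET_SIZE - len(header))


def transport_rate_bps() -> Fraction:
    """Tr exactly as written in Section 8.2, with 4.5 MHz expressed in Hz."""
    return 2 * Fraction(188, 208) * Fraction(312, 313) * Fraction(684, 286) * 4_500_000


def packet_slot_rate() -> Fraction:
    """Constant TS-R packet rate implied by Tr: Tr / (188 bytes x 8 bits)."""
    return transport_rate_bps() / (TS_PACKET_SIZE * 8)


def reference_receiver(slots):
    """slots: one list per packet time holding the (source, packet) pairs whose
    188-byte packet became complete in that slot ('TS-M', 'TS-Ea' or 'TS-Eb').
    Returns TS-R as a list of (source, packet) tuples."""
    ts_r = []
    for ready in slots:
        if len(ready) > 1:
            # The modeled transmission never completes two packets at once.
            raise ValueError("more than one source ready in the same slot")
        if ready:
            source, packet = ready[0]
            if len(packet) != TS_PACKET_SIZE:
                raise ValueError("only complete 188-byte packets are emitted")
            ts_r.append((source, packet))         # emitted immediately, in order
        else:
            ts_r.append(("null", null_packet()))  # keeps the packet rate constant
    return ts_r


pkt = bytes([0x47, 0x00, 0x31, 0x10]) + bytes(184)
print([src for src, _ in reference_receiver([[("TS-M", pkt)], [], [("TS-Eb", pkt)]])])
# ['TS-M', 'null', 'TS-Eb']
print(round(float(transport_rate_bps()) / 1e6, 2))  # 19.39
```

Modeling empty slots explicitly keeps the output at a constant packet rate, which is the property footnote 12 notes a real receiver may not need.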
Footnote 12: This models a system that produces a constant-rate packet stream output; however, this may not be necessary in a real receiver.

9.3 Stream Delays

Some streams shall be delayed relative to other streams by a number of segment times to compensate for the additional processing of enhanced-mode streams. The actual values needed depend on the particular mix of the enhanced and main rates defined in ATSC A/53-2 [4] Section 5.6. Buffers of the following sizes are sufficient to accomplish this delay:

• One-half rate stream: 128 kB
• One-quarter rate stream: 128 kB
• Main stream: 3 MB

This delay compensates for the E-VSB interleaver delay and the 164/188 byte packing delay.

9.4 PCR Correction

As the main, one-half rate, and one-quarter rate Transport Stream packet streams are combined for transmission, the order in which the Reference Receiver receives Transport Stream packets from each of the packet streams will differ from the order of original generation (and therefore the packets arrive at different times relative to one another). The time difference depends on the packet cadence resulting from a particular rate selection per Table 5.3 in ATSC A/53-2 [4]. Any Transport Stream packets that are received by the Reference Receiver in an order different from original generation and that carry PCRs require adjustment due to this time displacement and the packet placement within each data frame.

The PCRs of all packets shall be adjusted by the emission system to compensate for the variations in the packet delivery time as compared to the time at which they were multiplexed. The adjustment is based on the timing of a virtual model stream containing the combined packets of all three streams after recovery in the receiver. (Note that when no E-VSB stream is present, PCR correction for absolute packet timing occurs prior to the exciter.)
This virtual stream represents the order in which the packets are provided to a reference decoder. The virtual stream is modeled as making each one-quarter rate and/or one-half rate 188-byte TS packet (recovered from 2 or 3 enhanced packets) available with no processing delay. The Reference Receiver has instantaneous processing. The PCRs shall be adjusted by the transmission subsystem such that, when processed by the Reference Receiver, the output from the switch (TS-R) has PCRs compliant with ISO/IEC 13818-1 [3].

9.5 Packet Ordering

Within each packet stream that is identified with a particular PID, the packets identified by that PID shall not be reordered by the transmission subsystem.

9.6 Main Stream Packet Jitter Handling (Informative)

The packet timing issues described in this section relate solely to the data packing structure (at the packet level).

The Transport Stream system target decoder buffer model described in ISO/IEC 13818-1 [3] §2.4.2 has two buffers of differing sizes and speeds (three for video). The first-order buffer, TBn, is the smallest and fastest. Care should be taken during the multiplexing process to ensure that the TBn and T-STD requirements are not violated for any stream at TS-R.

Adding E-VSB packets to the mix will cause each main stream Transport Stream packet to incur delay with respect to its original packet timing. The amount of delay for each main stream Transport Stream packet varies as a function of the MAP number and of the packet's position in the frame (indeed, this is true for one-half rate and one-quarter rate Transport Stream packets as well). As the ratio of E-VSB to main increases, the maximum delay that may be imposed on main stream packets generally increases, because the packet positions available to the main stream in the VSB frame become increasingly limited. At the physical layer, the main stream packets tend to get bunched together near the bottom of the data frame.
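The per-packet delays described above are what the PCR correction of Section 9.4 compensates for. Computing each packet's displacement requires the MAP and frame position (ATSC A/53-2 [4], Table 5.3) and is not modeled here; the Python sketch below shows only the arithmetic of applying a known displacement to a PCR. It uses the ISO/IEC 13818-1 [3] PCR field layout (33-bit program_clock_reference_base, 6 reserved bits, 9-bit program_clock_reference_extension, with PCR = base × 300 + extension in 27 MHz ticks). The function names are hypothetical, and the sign convention (a packet delivered later receives a larger PCR) is an assumption of this sketch.

```python
# Hypothetical sketch: re-stamping a PCR by a known delivery-time displacement.
# Function names and the sign convention are assumptions, not from A/53.
PCR_BASE_WRAP = 1 << 33           # program_clock_reference_base is 33 bits
PCR_WRAP = PCR_BASE_WRAP * 300    # full PCR range in 27 MHz ticks


def unpack_pcr(field: bytes) -> int:
    """Decode the 6-byte PCR field of an adaptation field into 27 MHz ticks."""
    if len(field) != 6:
        raise ValueError("the PCR field is 6 bytes long")
    value = int.from_bytes(field, "big")  # 33-bit base, 6 reserved, 9-bit extension
    base = value >> 15
    extension = value & 0x1FF
    return base * 300 + extension


def pack_pcr(ticks: int) -> bytes:
    """Encode 27 MHz ticks as a 6-byte PCR field, reserved bits set to '1'."""
    base, extension = divmod(ticks % PCR_WRAP, 300)  # wraps like the 33-bit base
    value = (base << 15) | (0x3F << 9) | extension
    return value.to_bytes(6, "big")


def adjust_pcr(field: bytes, displacement_ticks: int) -> bytes:
    """Shift a PCR by a packet's change in delivery time (27 MHz ticks).

    Positive displacement: the packet now reaches the Reference Receiver
    later than its position in the original multiplex implied.
    """
    return pack_pcr(unpack_pcr(field) + displacement_ticks)


def seconds_to_ticks(seconds: float) -> int:
    """Convert a displacement in seconds to 27 MHz ticks."""
    return round(seconds * 27_000_000)


original = pack_pcr(123_456_789)
shifted = adjust_pcr(original, seconds_to_ticks(0.001))  # packet 1 ms later
print(unpack_pcr(shifted) - unpack_pcr(original))  # 27000
```

Working in the combined 27 MHz tick count and splitting back into base and extension only when packing keeps the modulo-2^33 wraparound of the base correct.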
The E-VSB emission system adds variable delay (jitter) to Transport Stream packets. This delay can be known in real time, given sufficient integration between the MPEG encoding/multiplexing system and the emission system. Absent such integration, it may be possible to meet the T-STD requirement (and other timing/buffering-based requirements) by limiting the encoding system's use of the T-STD buffers, allowing for some additional jitter.

In particular, for video packets, the packet delay can possibly be accommodated by increasing the average amount of data in the video T-STD buffer, or by other means. However, audio is not forced to reduce its bit rate in step with the video; in fact, even when the available bit rate will not support MPEG-2 video, it can still carry full-rate audio. The audio decoder buffer was not designed to tolerate a significant added packet delay. Consequently, the audio buffer could underflow unless care is taken in the system's use of the audio buffer.

It may be necessary to constrain the video/audio encoders and/or the transmission system in order to meet the requirement to conform to the MPEG-2 T-STD model under various circumstances. In particular, ensuring that no overflow of TBn occurs may require particular implementation attention to the time between TS-M packets in a TS multiplex as compared to the time between TS-M packets in the transmitted frame. Under certain circumstances, the number of consecutive packets at TS-R for any particular stream of TS packets with the same PID must be limited in order not to overflow TBn. For example, when the chosen E-VSB mix provides only a very low rate main stream that is occupied entirely by an AC-3 audio program, it may not be possible to prevent TBn from overflowing after insertion of the E-VSB packets.
Avoiding this overflow condition may require changing the original allocation of packet rate to TS-M services or avoiding certain combinations in the multiplex.

End of Part 3

Doc. A/53, Part 4:2007
3 January 2007
Editorially modified 28 September 2007

ATSC Digital Television Standard
Part 4 – MPEG-2 Video System Characteristics
(A/53, Part 4:2007)

Advanced Television Systems Committee
1750 K Street, N.W.
Suite 1200
Washington, D.C. 20006
www.atsc.org

ATSC A/53 Part 4 (Video) 3 January 2007
Note: On 28 September 2007 this document was editorially modified to correct the following error. In Section 6.1.3, Table 6.3, the value for progressive_sequence is given as “see Table 3”. However, there is no Table 3 in the document. The correct reference is “Table 6.2.” The error has been corrected in this document.

Table of Contents

1. SCOPE
2. REFERENCES
  2.1 Normative References
  2.2 Informative References
3. COMPLIANCE NOTATION
  3.1 Treatment of Syntactic Elements
  3.2 Symbols, Abbreviations, and Mathematical Operators
4. SYSTEM OVERVIEW (INFORMATIVE)
5. POSSIBLE VIDEO INPUTS
6. SOURCE CODING SPECIFICATION
  6.1 Constraints with Respect to ISO/IEC 13818-2 Main Profile
    6.1.1 Sequence Header Constraints
    6.1.2 Compression Format Constraints
    6.1.3 Sequence Extension Constraints
    6.1.4 Sequence Display Extension Constraints
    6.1.5 Picture Header Constraints
    6.1.6 Picture Coding Constraints
  6.2 Bit Stream Specifications Beyond MPEG-2
    6.2.1 Picture Extension and User Data Syntax
    6.2.2 Picture User Data Syntax
    6.2.3 ATSC Picture User Data Semantics
      6.2.3.1 Captioning Data
      6.2.3.2 Bar Data
        6.2.3.2.1 Recommended Receiver Response to Bar Data
    6.2.4 Active Format Description Data
      6.2.4.1 AFD Syntax
      6.2.4.2 AFD Semantics
      6.2.4.3 Recommended Receiver Response to AFD
    6.2.5 Relationship Between Bar Data and AFD (Informative)

Index of Tables and Figures

Table 5.1 Standardized Video Input Formats
Table 6.1 Sequence Header Constraints
Table 6.2 Compression Format Constraints
Table 6.3 Sequence Extension Constraints
Table 6.4 Sequence Display Extension Constraints
Table 6.5 Picture Extension and User Data Syntax
Table 6.6 Picture User Data Syntax
Table 6.7 Captioning Data Syntax
Table 6.8 Bar Data Syntax
Table 6.9 Line Number Designation
Table 6.10 Active Format Description Syntax
Table 6.11 Active Format
Figure 4.1 ITU-R digital terrestrial television broadcasting model.
Figure 4.2 High level view of encoding equipment.

ATSC Digital Television Standard – Part 4: MPEG-2 Video System Characteristics

1. SCOPE

This Part describes the characteristics of the video subsystem of the Digital Television Standard. The input formats and bit stream characteristics are described in separate sections (footnote 1).

2. REFERENCES

At the time of publication, the editions indicated were valid.
All standards are subject to revision, and parties to agreement based on this standard are encouraged to investigate the possibility of applying the most recent editions of the documents listed below.

2.1 Normative References

The following documents contain provisions which, through reference in this text, constitute provisions of this standard.

[1] CEA: “Digital Television (DTV) Closed Captioning,” Doc. CEA-708-C, Consumer Electronics Association, Arlington, VA, 30 July 2006.
[2] ISO: “ISO/IEC IS 13818-1:2000 (E), International Standard, Information technology – Generic coding of moving pictures and associated audio information: systems.”
[3] ISO: “ISO/IEC IS 13818-2:2000 (E), International Standard, Information technology – Generic coding of moving pictures and associated audio information: video.”
[4] SMPTE: “Standard for Television—Component Video Signal 4:2:2, Bit-Parallel Digital Interface,” Doc. SMPTE 125M (1995), Society of Motion Picture and Television Engineers, White Plains, N.Y., 1995.
[5] SMPTE: “Standard for Television—Composite Analog Video Signal, NTSC for Studio Applications,” Doc. SMPTE 170M (2004), Society of Motion Picture and Television Engineers, White Plains, N.Y., 2004.
[6] SMPTE: “Standard for Television—Bit-Parallel Digital Interface, Component Video Signal 4:2:2 16 x 9 Aspect Ratio,” Doc. SMPTE 267M (1995), Society of Motion Picture and Television Engineers, White Plains, N.Y., 1995.
[7] SMPTE: “Standard for Television—1920 x 1080 Scanning and Analog and Parallel Digital Interfaces for Multiple Picture Rates,” Doc. SMPTE 274M (2005), Society of Motion Picture and Television Engineers, White Plains, N.Y., 2005.
[8] SMPTE: “Standard for Television—720 x 483 Active Line at 59.94-Hz Progressive Scan Production, Digital Representation,” Doc. SMPTE 293M (2003), Society of Motion Picture and Television Engineers, White Plains, N.Y., 2003.
Footnote 1: Note that there is a coordinated effort underway among ATSC, CEA, and SMPTE to revise and clarify standards related to delivering closed captions so that each describes the aspects of the system for which they are primarily responsible without overlap. This effort is expected to result in revisions of those sections in the ATSC Standards.

[9] SMPTE: “Standard for Television—1280 x 720 Progressive Image Sample Structure, Analog and Digital Representation and Analog Interface,” Doc. SMPTE 296M (2001), Society of Motion Picture and Television Engineers, White Plains, N.Y., 2001.
[10] ETSI: “Digital Video Broadcasting (DVB): Implementation Guidelines for the use of MPEG-2 Systems, Video and Audio in Satellite, Cable and Terrestrial Broadcasting Applications,” Doc. ETSI TS 101 154 V1.7.1, Annex B, June 2005.

2.2 Informative References

[11] Digital TV Group: “Digital Receiver Implementation Guidelines and Recommended Receiver Reaction to Aspect Ratio Signaling in Digital Video Broadcasting,” Issue 1.2.1, February 2001.
[12] ITU: “Encoding Parameters of Digital Television for Studios,” Doc. ITU-R BT.601-5 (1994).
[13] ITU: “Parameter values for the HDTV Standards for Production and International Programme Exchange,” Doc. ITU-R BT.709-5 (2002).
[14] SCTE: “Standard for Carriage of NTSC VBI Data in Cable Digital Transport Streams,” Doc. ANSI/SCTE 21 2001R2006, Society of Cable Telecommunications Engineers, Exton, PA, 2006.
[15] CEA: “Active Format Description (AFD) & Bar Data Recommended Practice,” Doc. CEA-CEB16, Consumer Electronics Association, Arlington, VA, 31 July 2006.
[16] SMPTE: [in development] “Standard for Television—Format for Active Format Description and Bar Data,” Doc. SMPTE 2016-1, Society of Motion Picture and Television Engineers, White Plains, N.Y.
[17] ATSC: “Digital Television Standard, Part 1 – Digital Television System,” Doc. A/53, Part 1:2007, Advanced Television Systems Committee, Washington, D.C., 3 January 2007.

3. COMPLIANCE NOTATION

As used in this document, “shall” denotes a mandatory provision of the standard. “Should” denotes a provision that is recommended but not mandatory. “May” denotes a feature whose presence does not preclude compliance, that may or may not be present at the option of the implementor.

3.1 Treatment of Syntactic Elements

This document contains symbolic references to syntactic elements used in the audio, video, and transport coding subsystems. These references are typographically distinguished by the use of a different font (e.g., restricted), may contain the underscore character (e.g., sequence_end_code), and may consist of character strings that are not English words (e.g., dynrng).

3.2 Symbols, Abbreviations, and Mathematical Operators

The symbols, abbreviations, and mathematical operators used herein are as found in Section 3.4 of ATSC A/53 Part 1 [17].

4. SYSTEM OVERVIEW (INFORMATIVE)

A basic block diagram representation of the system is shown in Figure 4.1. According to this model, the digital television system can be seen to consist of three subsystems:

• Source coding and compression
• Service multiplex and transport
• RF/transmission

[Figure 4.1 ITU-R digital terrestrial television broadcasting model: a video subsystem and an audio subsystem (each with source coding and compression), the service multiplex and transport (which also carries ancillary data and control data), and the RF/transmission system (channel coding and modulation), with receiver characteristics.]

Figure 4.2 illustrates a high level view of the encoding equipment. This view is not intended to be representative of actual implementations, but is used to illustrate the relationship of various clock frequencies within the encoder.
[Figure 4.2 High level view of encoding equipment. Labels: f27MHz clock and frequency divider network (fv, fa); program clock reference (33-bit program_clock_reference_base, 9-bit program_clock_reference_extension) inserted in the adaptation header; video and audio inputs through A/D converters to the video and audio encoders; transport encoder (fTP); FEC and sync insertion (fsym); VSB modulator; RF out.]

The source coding domain, represented schematically by the video, audio, and transport encoders, uses a family of frequencies that are based on a 27 MHz clock (f27MHz). This clock is used to generate a 42-bit sample of the clock, which is partitioned into two parts defined by the MPEG-2 specification: the 33-bit program_clock_reference_base and the 9-bit program_clock_reference_extension. The former is equivalent to a sample of a 90 kHz clock (27 MHz ÷ 300; the extension counts 27 MHz cycles from 0 to 299 between increments of the base) that is locked in frequency to the 27 MHz clock, and is used by the audio and video source encoders when encoding the presentation time stamp (PTS) and the decode time stamp (DTS).

The audio and video sampling clocks, fa and fv respectively, are frequency-locked to the 27 MHz clock. This can be expressed as the requirement that there exist two pairs of integers, (na, ma) and (nv, mv), such that

$$f_a = \frac{n_a}{m_a} \times 27\ \text{MHz} \quad \text{and} \quad f_v = \frac{n_v}{m_v} \times 27\ \text{MHz}$$

5. POSSIBLE VIDEO INPUTS

While not required by this standard, there are certain television production standards, shown in Table 5.1, that define video formats that relate to compression formats specified by this standard.

Table 5.1 Standardized Video Input Formats

Video Standard         Active Lines   Active Samples/Line
SMPTE 274M [7]         1080           1920
SMPTE 296M [9]         720            1280
ITU-R BT.601-5 [12]    483            720

The compression formats may be derived from one or more appropriate video input formats. Additional video production standards may be developed in the future that extend the number of possible input formats.

6. SOURCE CODING SPECIFICATION

The DTV video compression algorithm shall conform to the Main Profile syntax of ISO/IEC 13818-2 [3]. The allowable parameters shall be bounded by the upper limits specified for the Main Profile at High Level (footnote 2). Additionally, all bit streams shall meet the constraints and specifications described in Sections 6.1 and 6.2.

6.1 Constraints with Respect to ISO/IEC 13818-2 Main Profile

The following tables list the allowed values for each of the ISO/IEC 13818-2 [3] syntactic elements that are restricted beyond the limits imposed by MP@HL. In these tables, conventional numbers denote decimal values, numbers preceded by 0x are to be interpreted as hexadecimal values, and numbers within single quotes (e.g., ‘10010100’) are to be interpreted as a string of binary digits.

6.1.1 Sequence Header Constraints

Table 6.1 identifies parameters in the sequence header of a bit stream that shall be constrained by the video subsystem and lists the allowed values for each.

Table 6.1 Sequence Header Constraints

Sequence Header Syntactic Element   Allowed Value
horizontal_size_value               see Table 6.2
vertical_size_value                 see Table 6.2
aspect_ratio_information            see Table 6.2
frame_rate_code                     see Table 6.2
bit_rate_value (≤ 19.4 Mbps)        ≤ 48500
bit_rate_value (≤ 38.8 Mbps)        ≤ 97000
vbv_buffer_size_value               ≤ 488

The allowable values for the field bit_rate_value are application-dependent. In the primary application of terrestrial broadcast, this field shall correspond to a bit rate that is less than or equal to 19.4 Mbps. In the high data rate mode, the corresponding bit rate is less than or equal to 38.8 Mbps. (bit_rate_value is expressed in units of 400 bit/s, so 48500 × 400 bit/s = 19.4 Mbps and 97000 × 400 bit/s = 38.8 Mbps; vbv_buffer_size_value is expressed in units of 16 384 bits, so 488 corresponds to 7 995 392 bits.)

Footnote 2: See ISO/IEC 13818-2 [3], Section 8 for more information regarding profiles and levels.

6.1.2 Compression Format Constraints

Table 6.2 lists the allowed compression formats.
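As a reading aid, the allowed combinations in Table 6.2 can be written as a lookup, together with the coded-height rounding described in footnote 3. The Python sketch below is illustrative only; the names TABLE_6_2, is_allowed_format, and coded_height are hypothetical, and all values are the MPEG-2 coded values explained in the legend of Table 6.2.

```python
# Hypothetical lookup encoding the rows of Table 6.2 (MPEG-2 coded values).
# Key: (vertical_size_value, horizontal_size_value)
# Value: list of (aspect_ratio_information codes, frame_rate_code codes,
#                 progressive_sequence)
TABLE_6_2 = {
    (1080, 1920): [({1, 3}, {1, 2, 4, 5}, 1), ({1, 3}, {4, 5}, 0)],
    (720, 1280): [({1, 3}, {1, 2, 4, 5, 7, 8}, 1)],
    (480, 704): [({2, 3}, {1, 2, 4, 5, 7, 8}, 1), ({2, 3}, {4, 5}, 0)],
    (480, 640): [({1, 2}, {1, 2, 4, 5, 7, 8}, 1), ({1, 2}, {4, 5}, 0)],
}


def is_allowed_format(vertical: int, horizontal: int, aspect_code: int,
                      frame_rate_code: int, progressive_sequence: int) -> bool:
    """True if the sequence header/extension values match a row of Table 6.2."""
    for aspects, rates, progressive in TABLE_6_2.get((vertical, horizontal), []):
        if (aspect_code in aspects and frame_rate_code in rates
                and progressive_sequence == progressive):
            return True
    return False


def coded_height(vertical: int, progressive_sequence: int) -> int:
    """Coded vertical size: a multiple of 16 (progressive) or 32 (interlaced)."""
    unit = 16 if progressive_sequence else 32
    return -(-vertical // unit) * unit  # ceiling division


print(is_allowed_format(1080, 1920, 3, 4, 0))  # True: 1920x1080 interlaced, 16:9
print(is_allowed_format(1080, 1920, 3, 7, 1))  # False: frame_rate_code 7 not listed
print(coded_height(1080, 0))                   # 1088, as in footnote 3
```

Keeping the allowed codes as sets per row mirrors the table directly, so each row of Table 6.2 corresponds to one tuple.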
Table 6.2 Compression Format Constraints

vertical_size_value   horizontal_size_value   aspect_ratio_information   frame_rate_code   progressive_sequence
1080 (footnote 3)     1920                    1,3                        1,2,4,5           ‘1’
1080 (footnote 3)     1920                    1,3                        4,5               ‘0’
720                   1280                    1,3                        1,2,4,5,7,8       ‘1’
480                   704                     2,3                        1,2,4,5,7,8       ‘1’
480                   704                     2,3                        4,5               ‘0’
480                   640                     1,2                        1,2,4,5,7,8       ‘1’
480                   640                     1,2                        4,5               ‘0’

Legend for MPEG-2 coded values:
aspect_ratio_information: 1 = square samples, 2 = 4:3 display aspect ratio, 3 = 16:9 display aspect ratio
frame_rate_code: 1 = 23.976 Hz, 2 = 24 Hz, 4 = 29.97 Hz, 5 = 30 Hz, 7 = 59.94 Hz, 8 = 60 Hz
progressive_sequence: ‘0’ = interlaced scan, ‘1’ = progressive scan

Footnote 3: Note that 1088 lines are actually coded in order to satisfy the MPEG-2 requirement that the coded vertical size be a multiple of 16 (progressive scan) or 32 (interlaced scan). The bottom 8 lines are black, per MPEG rules.

6.1.3 Sequence Extension Constraints

Table 6.3 identifies parameters in the sequence extension part of a bit stream that shall be constrained by the video subsystem and lists the allowed values for each. A sequence_extension structure is required to be present after every sequence_header structure.

Table 6.3 Sequence Extension Constraints

Sequence Extension Syntactic Element   Allowed Values
progressive_sequence                   see Table 6.2
profile_and_level_indication           see Note
chroma_format                          ‘01’
horizontal_size_extension              ‘00’
vertical_size_extension                ‘00’
bit_rate_extension                     ‘0000 0000 0000’
vbv_buffer_size_extension              ‘0000 0000’
frame_rate_extension_n                 ‘00’
frame_rate_extension_d                 ‘0000 0’

Note: The profile_and_level_indication field shall indicate the lowest profile and level defined in ISO/IEC 13818-2 [3], Section 8, that is consistent with the parameters of the video elementary stream.

6.1.4 Sequence Display Extension Constraints

Table 6.4 identifies parameters in the sequence display extension part of a bit stream that shall be constrained by the video subsystem and lists the allowed values for each.
Table 6.4 Sequence Display Extension Constraints

Sequence Display Extension Syntactic Element   Allowed Values
video_format                                   ‘000’

The values for color_primaries, transfer_characteristics, and matrix_coefficients shall be explicitly indicated in the sequence_display_extension. While all values for color_primaries, transfer_characteristics, and matrix_coefficients defined in Tables 6-7, 6-8, and 6-9 of ISO/IEC 13818-2 [3] are allowed in the transmitted bit stream, it is noted that those of ITU-R BT.709 [13] and SMPTE 170M [5] are the most likely to be in common use.

Note: Some previously-encoded legacy material may not have the colorimetry (i.e., color_primaries, transfer_characteristics, and matrix_coefficients) explicitly indicated in the sequence_display_extension, in which case the colorimetry is most likely ITU-R BT.709 [13] for all formats except those formats with vertical_size_value = 480, which are most likely to have colorimetry according to SMPTE 170M [5].

6.1.5 Picture Header Constraints

In all cases other than when vbv_delay has the value 0xFFFF, the value of vbv_delay shall be constrained as follows:

vbv_delay ≤ 45000

(vbv_delay is expressed in periods of the 90 kHz system clock, so 45000 corresponds to 0.5 s.)

6.1.6 Picture Coding Constraints

The value of frame_pred_frame_dct shall be ‘1’ if progressive_frame is ‘1’.

6.2 Bit Stream Specifications Beyond MPEG-2

This section covers the extension and user data part of the video syntax. These data are inserted at the sequence, GOP, and picture level. The syntax used for the insertion of closed captioning (footnote 4) in picture user data is described.

Footnote 4: Implementers should note that CEA-708 [1] describes the semantics for closed captions.

6.2.1 Picture Extension and User Data Syntax

The picture user data shall be constructed per [3]. Table 6.5 is provided to show the syntax that is required for picture extension and user data.

Table 6.5 Picture Extension and User Data Syntax

Value                                              No. of Bits   Format
extension_and_user_data(2) {
  while ((nextbits() == extension_start_code) ||
         (nextbits() == user_data_start_code)) {
    if (nextbits() == extension_start_code)
      extension_data(2)
    if (nextbits() == user_data_start_code)
      user_data()
  }
}

6.2.2 Picture User Data Syntax

Table 6.6 describes the picture user data syntax that shall be used.

Table 6.6 Picture User Data Syntax (footnote 5)

Syntax                                             No. of Bits   Format
user_data() {
  user_data_start_code                             32            bslbf
  ATSC_identifier                                  32            bslbf
  user_data_type_code                              8             uimsbf
  if (user_data_type_code == ‘0x03’)
    cc_data()
  else if (user_data_type_code == ‘0x06’)
    bar_data()
  else {
    while (nextbits() != ‘0000 0000 0000 0000 0000 0001’)
      ATSC_reserved_user_data                      8
  }
  next_start_code()
}

Footnote 5: Shaded cells in this table indicate syntactic and semantic additions to the ISO/IEC 13818-2 [3] Standard.

In accordance with the bit stream syntax in Table 6.5, more than one picture user data construct may follow any given picture header. However, no more than one picture user data construct using the same user_data_type_code shall follow any given picture header.

Note that picture user data with a 32-bit field following user_data_start_code having a value other than ATSC_identifier may be present in an ATSC-compliant video bit stream. As an example, the afd_identifier (value 0x44544731) is defined for use in ATSC video Elementary Streams (see Section 6.2.4). Receiving devices are expected to process this field and use it to determine the syntax and semantics of the user data construct to follow.

Note: user_data_type_code values 0x04 and 0x05 are assigned in ANSI/SCTE 21 2001 [14].

Receiving devices are expected to silently discard any unrecognized video user data encountered in the video bit stream.
For example, if an unrecognized 32-bit identifier is seen following the user_data_start_code, or an unrecognized 8-bit user_data_type_code is seen following the ATSC_identifier, data should be discarded until another start code is seen.

6.2.3 ATSC Picture User Data Semantics

user_data_start_code – This is set to 0x0000 01B2.

ATSC_identifier – This is a 32-bit code that indicates that the video user data conforms to this specification. The value of ATSC_identifier shall be 0x4741 3934 (the ASCII characters “GA94”).

user_data_type_code – An 8-bit value that identifies the type of ATSC user data to follow. Value 0x03 indicates cc_data(), value 0x06 indicates bar_data(), and other values are either in use in other standards or are reserved for future use.

cc_data() – A data structure defined in Table 6.7.

bar_data() – A data structure defined in Table 6.8 indicating the sizes of letterbox or pillarbox areas within the coded video frame.

ATSC_reserved_user_data – Reserved for use by ATSC or used by other standards.

6.2.3.1 Captioning Data

Table 6.7 describes the syntax of captioning data.

Table 6.7 Captioning Data Syntax

Syntax                                             No. of Bits   Format
cc_data() {
  reserved                                         1             ‘1’
  process_cc_data_flag                             1             bslbf
  additional_data_flag                             1             bslbf
  cc_count                                         5             uimsbf
  reserved                                         8             ‘1111 1111’
  for (i = 0; i < cc_count; i++) {
    marker_bits                                    5             ‘1111 1’
    cc_valid                                       1             bslbf
    cc_type                                        2             bslbf
    cc_data_1                                      8             bslbf
    cc_data_2                                      8             bslbf
  }
  marker_bits                                      8             ‘1111 1111’
  if (additional_data_flag) {
    while (nextbits() != ‘0000 0000 0000 0000 0000 0001’) {
      additional_cc_data
    }
  }
}

process_cc_data_flag – This flag is set to indicate whether it is necessary to process the cc_data. If it is set to ‘1’, the cc_data has to be parsed and its meaning has to be processed. When it is set to ‘0’, the cc_data can be discarded.

additional_data_flag – This flag is set to ‘1’ to indicate the presence of additional user data.

cc_count – This 5-bit integer indicates the number of closed caption constructs following this field.
It can have values 0 through 31. The value of cc_count shall be set according to the frame rate and coded picture structure (field or frame) such that a fixed bandwidth of 9600 bits per second is maintained for the closed caption payload data. Sixteen (16) bits of closed caption payload data are carried in each pair of the fields cc_data_1 and cc_data_2. (9600 bit/s therefore corresponds to 600 pairs per second; for example, cc_count is 20 for frame pictures at a nominal 30 Hz.)

cc_valid – This flag is set to ‘1’ to indicate that the two closed caption data bytes that follow are valid. If set to ‘0’, the two data bytes are invalid, as defined in CEA-708 [1].

cc_type – Denotes the type of the two closed caption data bytes that follow, as defined in CEA-708 [1].

cc_data_1 – The first byte of a closed caption data pair as defined in CEA-708 [1].

cc_data_2 – The second byte of a closed caption data pair as defined in CEA-708 [1].

additional_cc_data – Reserved for future ATSC definition.

6.2.3.2 Bar Data

Table 6.8 describes the syntax of bar data. Bar data should be included in video user data whenever the rectangular picture area containing useful information does not extend to the full height or width of the coded frame (footnote 6) and AFD alone is insufficient to describe the extent of the image. See Section 6.2.4.

When present, bar data shall be carried in the data structure bar_data(), within the picture user data syntax as shown in Table 6.6. After any sequence_header(), such bar data shall appear before the next picture_data() within extension_and_user_data(2). After introduction, such bar data shall remain in effect until:
1) the next sequence_header(), or
2) extension_and_user_data(2) containing a bar_data() structure which contains new bar data, or
3) extension_and_user_data(2) containing AFD per Section 6.2.4.

After any sequence_header(), unless AFD data is present specifying otherwise, the absence of bar data shall indicate that the rectangular picture area containing useful information extends to the full height and width of the coded frame.
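The persistence rules above behave like a small state machine. The Python sketch below is one possible reading, with hypothetical class and method names. It assumes the decoder reports each picture's extension_and_user_data(2) once, and, when a single picture carries both bar_data() and AFD, it treats the bar_data() as the new bar data; the text above does not address that case explicitly, so this is an assumption of the sketch.

```python
# Hypothetical tracker for the bar data persistence rules of 6.2.3.2.
# Names and the "bar_data() and AFD in one picture" handling are assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BarData:
    """Values from one bar_data() structure; None where the flag was '0'."""
    top_end: Optional[int] = None       # line_number_end_of_top_bar
    bottom_start: Optional[int] = None  # line_number_start_of_bottom_bar
    left_end: Optional[int] = None      # pixel_number_end_of_left_bar
    right_start: Optional[int] = None   # pixel_number_start_of_right_bar


class BarDataState:
    """current is None when no bar data is in effect; the useful picture area
    is then the full coded frame unless AFD specifies otherwise."""

    def __init__(self) -> None:
        self.current: Optional[BarData] = None

    def on_sequence_header(self) -> None:
        self.current = None              # rule 1: ends at the next sequence_header()

    def on_picture(self, bar_data: Optional[BarData],
                   afd_present: bool) -> Optional[BarData]:
        """Apply one picture's extension_and_user_data(2)."""
        if bar_data is not None:
            self.current = bar_data      # rule 2: new bar data replaces the old
        elif afd_present:
            self.current = None          # rule 3: AFD ends the bar data
        return self.current


state = BarDataState()
letterbox = BarData(top_end=132, bottom_start=948)  # illustrative values only
state.on_picture(letterbox, afd_present=False)
print(state.on_picture(None, afd_present=False) == letterbox)  # True: persists
print(state.on_picture(None, afd_present=True))                # None: AFD ended it
```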
Bar data is constrained (below) to be signalled in pairs, either top and bottom bars or left and right bars, but not both pairs at once. Bars may be unequal in size. One bar of a pair may be zero width or height.

Footnote 6: In other words, the video is letterboxed (bars above and/or below video) or pillarboxed (bars left and/or right of video).

Table 6.8 Bar Data Syntax

Syntax                                             No. of Bits   Format
bar_data() {
  top_bar_flag                                     1             bslbf
  bottom_bar_flag                                  1             bslbf
  left_bar_flag                                    1             bslbf
  right_bar_flag                                   1             bslbf
  reserved                                         4             ‘1111’
  if (top_bar_flag == ‘1’) {
    marker_bits                                    2             ‘11’
    line_number_end_of_top_bar                     14            uimsbf
  }
  if (bottom_bar_flag == ‘1’) {
    marker_bits                                    2             ‘11’
    line_number_start_of_bottom_bar                14            uimsbf
  }
  if (left_bar_flag == ‘1’) {
    marker_bits                                    2             ‘11’
    pixel_number_end_of_left_bar                   14            uimsbf
  }
  if (right_bar_flag == ‘1’) {
    marker_bits                                    2             ‘11’
    pixel_number_start_of_right_bar                14            uimsbf
  }
  marker_bits                                      8             ‘1111 1111’
  while (nextbits() != ‘0000 0000 0000 0000 0000 0001’) {
    additional_bar_data
  }
}

Designation of line numbers for line_number_end_of_top_bar and line_number_start_of_bottom_bar is video format-dependent and shall conform to the applicable standard indicated in Table 6.9.

Note: The range of line numbers and pixels within the coded frame for each image format is specified in Table 2 of SMPTE 2016-1 [16].

Table 6.9 Line Number Designation

Video Format          Applicable Standard
480 Interlaced 4:3    SMPTE 125M [4]
480 Interlaced 16:9   SMPTE 267M [6]
480 Progressive       SMPTE 293M [8]
720 Progressive       SMPTE 296M [9]
1080 Interlaced       SMPTE 274M [7]
1080 Progressive      SMPTE 274M [7]

top_bar_flag – This flag shall indicate, when set to ‘1’, that the top bar data is present. If left_bar_flag is ‘1’, this flag shall be set to ‘0’.

bottom_bar_flag – This flag shall indicate, when set to ‘1’, that the bottom bar data is present. This flag shall have the same value as top_bar_flag.
left_bar_flag – This flag shall indicate, when set to ‘1’, that the left bar data is present. If top_bar_flag is ‘1’, this flag shall be set to ‘0’.

right_bar_flag – This flag shall indicate, when set to ‘1’, that the right bar data is present. This flag shall have the same value as left_bar_flag.

line_number_end_of_top_bar – A 14-bit unsigned integer value representing the last line of a horizontal letterbox bar area at the top of the reconstructed frame. Designation of line numbers shall be as defined per each applicable standard in Table 6.9.

line_number_start_of_bottom_bar – A 14-bit unsigned integer value representing the first line of a horizontal letterbox bar area at the bottom of the reconstructed frame. Designation of line numbers shall be as defined per each applicable standard in Table 6.9.

pixel_number_end_of_left_bar – A 14-bit unsigned integer value representing the last horizontal luminance sample of a vertical pillarbox bar area at the left side of the reconstructed frame. Pixels shall be numbered from zero, starting with the leftmost pixel.

pixel_number_start_of_right_bar – A 14-bit unsigned integer value representing the first horizontal luminance sample of a vertical pillarbox bar area at the right side of the reconstructed frame. Pixels shall be numbered from zero, starting with the leftmost pixel.

additional_bar_data – Reserved for future ATSC definition.

6.2.3.2.1 Recommended Receiver Response to Bar Data

Receiving device designers are strongly encouraged to study Consumer Electronics Association (CEA) bulletin CEB16 [15], which contains recommendations regarding the processing of bar data.

6.2.4 Active Format Description Data

Active Format Description (AFD) should be included in video user data whenever the rectangular picture area containing useful information does not extend to the full height or width of the coded frame.
AFD data may also be included in user data when the rectangular picture area containing useful information extends to the full height and width of the coded frame. When present, the AFD shall be carried using the syntax defined in [10], in extension_and_user_data(2) in the MPEG-2 video Elementary Stream. After any sequence_header(), the default aspect ratio of the area of interest shall be that signaled by the parameters in the sequence_header() and sequence_display_extension() structures. After any sequence_header(), the AFD, when present, shall appear before the next picture_data(). After introduction, such an AFD shall remain in effect until the next sequence_header() or until a new AFD is introduced.

Note: The AFD syntax as shown in Section 6.2.4.1 is identical to that specified in ETSI TS 101 154 V1.7.1 [10], and is reprinted here with permission. Semantics are documented in Section 6.2.4.2; some are intentionally different from those in ETSI TS 101 154.

6.2.4.1 AFD Syntax

Table 6.10 reproduces the syntax defined in [10] for the convenience of the reader.

Table 6.10 Active Format Description Syntax

Syntax                              No. of Bits   Format
user_data_start_code                32            bslbf
afd_identifier                      32            bslbf
zero                                1             ‘0’
active_format_flag                  1             bslbf
reserved                            6             ‘00 0001’
if (active_format_flag == ‘1’) {
    reserved                        4             ‘1111’
    active_format                   4             bslbf
}

6.2.4.2 AFD Semantics

afd_identifier – A 32-bit field that identifies that the syntax of the user data is Active Format Description. Its value is 0x44544731.

active_format_flag – A 1-bit flag. A value of ‘1’ indicates that an active format is described in this data structure.

active_format – A 4-bit field describing the “area of interest” in terms of its aspect ratio within the coded frame as defined in ISO/IEC 13818-2 [3].
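As a rough illustration of Table 6.10, the Python sketch below extracts active_format from a byte string assumed to begin at user_data_start_code (0x000001B2 in ISO/IEC 13818-2 video). The function name parse_afd is hypothetical. It checks the fixed and reserved bits strictly; a more tolerant receiver might choose to ignore the reserved bits instead.

```python
from typing import Optional

USER_DATA_START_CODE = 0x000001B2  # ISO/IEC 13818-2 user_data_start_code
AFD_IDENTIFIER = 0x44544731        # afd_identifier; ASCII "DTG1"


def parse_afd(data: bytes) -> Optional[int]:
    """Return active_format (0-15), or None when active_format_flag is '0'.

    `data` is assumed to begin at user_data_start_code, as in Table 6.10.
    """
    if len(data) < 9:
        raise ValueError("too short to hold AFD user data")
    if int.from_bytes(data[0:4], "big") != USER_DATA_START_CODE:
        raise ValueError("not a user_data_start_code")
    if int.from_bytes(data[4:8], "big") != AFD_IDENTIFIER:
        raise ValueError("afd_identifier mismatch")
    b = data[8]  # zero(1) | active_format_flag(1) | reserved(6)
    if b & 0x80:
        raise ValueError("'zero' bit must be 0")
    if (b & 0x3F) != 0b000001:
        raise ValueError("reserved bits must be '00 0001'")
    if not ((b >> 6) & 1):
        return None  # no active format described
    if len(data) < 10:
        raise ValueError("active_format byte missing")
    b2 = data[9]  # reserved(4) | active_format(4)
    if (b2 >> 4) != 0b1111:
        raise ValueError("reserved bits must be '1111'")
    return b2 & 0x0F
```

The returned 4-bit code is then interpreted together with the source aspect ratio, as the rest of this section describes.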
The active_format is used by the decoder in conjunction with the “source aspect ratio.” The source aspect ratio is derived from the “display aspect ratio” (DAR) signaled in the aspect_ratio_information, the horizontal_size, vertical_size, and display_horizontal_size and display_vertical_size if present (see ISO/IEC 13818-2 [3]):

• If sequence_display_extension() is not present, source aspect ratio = DAR
• If sequence_display_extension() is present:

$$\text{source aspect ratio} = \mathrm{DAR} \times \frac{\text{display\_horizontal\_size}}{\text{display\_vertical\_size}} \times \frac{\text{vertical\_size}}{\text{horizontal\_size}}$$

The combination of source aspect ratio and active_format allows the decoder to identify whether the “area of interest” is the whole of the frame (e.g., source aspect ratio 16:9, active_format 16:9 center), a letterbox within the frame (e.g., source aspect ratio 4:3, active_format 16:9 center), or a “pillarbox” within the frame (e.g., source aspect ratio 16:9, active_format 4:3 center). Table 6.11 defines the coding of the active_format field that shall be used.

Table 6.11 Active Format Description

active_format     4:3 coded frames                                 16:9 coded frames
‘0000’            undefined (see below)                            undefined (see below)
‘0001’            Reserved                                         Reserved
‘0010’ – ‘0011’   Not recommended                                  Not recommended
‘0100’            Aspect ratio greater than 16:9 (see below)       Aspect ratio greater than 16:9 (see below)
‘0101’ – ‘0111’   Reserved                                         Reserved
‘1000’            4:3 full frame image                             16:9 full frame image
‘1001’            4:3 full frame image                             4:3 pillarbox image
‘1010’            16:9 letterbox image                             16:9 full frame image
‘1011’            14:9 letterbox image                             14:9 pillarbox image
‘1100’            Reserved                                         Reserved
‘1101’            4:3 full frame image, alternative 14:9 center    4:3 pillarbox image, alternative 14:9 center
‘1110’            16:9 letterbox image, alternative 14:9 center    16:9 full frame image, alternative 14:9 center
‘1111’            16:9 letterbox image, alternative 4:3 center     16:9 full frame image, alternative 4:3 center

AFD ‘0000’ indicates that information is not available and is undefined.
Unless bar data is available, DTV receivers and video equipment should interpret the active image area as being the same as that of the coded frame. AFD ‘0000’, when accompanied by bar data, signals that the image’s aspect ratio is narrower than 16:9 but is neither 4:3 nor 14:9. The bar data should be used to determine the extent of the image. AFD ‘0100’, which should be accompanied by bar data, signals that the image’s aspect ratio is wider than 16:9, as is typically the case with widescreen features. The bar data should be used to determine the height of the image. Use of either ‘0010’ or ‘0011’ is not recommended in the ATSC television system. Values ‘0001’, ‘0101’ through ‘0111’, and ‘1100’ are reserved.

6.2.4.3 Recommended Receiver Response to AFD

Receiving device designers are strongly encouraged to study Consumer Electronics Association (CEA) bulletin CEB16 [15], which contains recommendations regarding the processing of AFD.

6.2.5 Relationship Between Bar Data and AFD (Informative)

Certain combinations of Active Format Description and bar data may be present in video user data (either, neither, or both). Note that AFD data may not always exactly match bar data, because AFD only deals with 4:3, 14:9, and 16:9 aspect ratios while bar data can represent nearly any aspect ratio. When AFD and bar data are present together, AFD should be used in preference to bar data, except in the cases of AFD ‘0000’ and ‘0100’, where bar data should be used in concert with AFD as described above.

End of Part 4

Doc. A/53, Part 5:2007
3 January 2007

ATSC Digital Television Standard
Part 5 – AC-3 Audio System Characteristics
(A/53, Part 5:2007)

Advanced Television Systems Committee
1750 K Street, N.W.
Suite 1200
Washington, D.C. 20006
www.atsc.org

ATSC A/53, Part 5 (AC-3) 3 January 2007

Table of Contents
1. SCOPE
2. REFERENCES
   2.1 Normative References
   2.2 Informative Reference
3. COMPLIANCE NOTATION
   3.1 Treatment of Syntactic Elements
   3.2 Symbols, Abbreviations, and Mathematical Operators
4. SYSTEM OVERVIEW (INFORMATIVE)
5. SPECIFICATION
   5.1 Constraints With Respect to ATSC Standard A/52
   5.2 Sampling Frequency
   5.3 Bit Rate
   5.4 Audio Coding Modes
   5.5 Dialogue Level
   5.6 Dynamic Range Compression
   5.7 STD Audio Buffer Size
6. MAIN AND ASSOCIATED SERVICES
   6.1 Summary of Service Types
   6.2 Complete Main Audio Service (CM)
   6.3 Main Audio Service, Music and Effects (ME)
   6.4 Visually Impaired (VI)
   6.5 Hearing Impaired (HI)
   6.6 Dialogue (D)
   6.7 Commentary (C)
   6.8 Emergency (E)
   6.9 Voice-Over (VO)
7. AUDIO ENCODER INTERFACES
   7.1 Audio Encoder Input Characteristics
   7.2 Audio Encoder Output Characteristics

Index of Tables and Figures
Table 5.1 Audio Constraints
Table 6.1 Audio Service Types
Figure 4.1 ITU-R digital terrestrial television broadcasting model.
Figure 4.2 High level view of encoding equipment.
Figure 4.3 Audio subsystem in the digital television system.

ATSC Digital Television Standard – Part 5: AC-3 Audio System Characteristics

1. SCOPE

This Part describes the audio system characteristics and normative specifications of the Digital Television Standard.

2. REFERENCES

At the time of publication, the editions indicated were valid. All standards are subject to revision and amendment, and parties to agreement based on this standard are encouraged to investigate the possibility of applying the most recent editions of the documents listed below.
2.1 Normative References

The following documents contain provisions which in whole or part, through reference in this text, constitute provisions of this standard.

[1] AES: “AES Recommended practice for digital audio engineering – Serial transmission format for two-channel linearly represented digital audio data,” Doc. AES3-2003, Audio Engineering Society, New York, N.Y., 2003. (This document is a revision of AES3-1992, including subsequent amendments.)
[2] ANSI: “Specification for Sound Level Meters,” Doc. ANSI S1.4-1983 (R 2001) with Amd.S1.4A-1995, American National Standards Institute, Washington, D.C.
[3] ATSC: “Digital Audio Compression (AC-3, E-AC-3),” Doc. A/52B, Advanced Television Systems Committee, Washington, D.C., 14 June 2005.

2.2 Informative Reference

[4] ATSC: “Digital Television Standard, Part 1 – Digital Television System,” Doc. A/53, Part 1:2007, Advanced Television Systems Committee, Washington, D.C., 3 January 2007.

3. COMPLIANCE NOTATION

As used in this document, “shall” denotes a mandatory provision of the standard. “Should” denotes a provision that is recommended but not mandatory. “May” denotes a feature whose presence does not preclude compliance, that may or may not be present at the option of the implementor.

3.1 Treatment of Syntactic Elements

This document contains symbolic references to syntactic elements used in the audio, video, and transport coding subsystems. These references are typographically distinguished by the use of a different font (e.g., restricted), may contain the underscore character (e.g., sequence_end_code) and may consist of character strings that are not English words (e.g., dynrng).

3.2 Symbols, Abbreviations, and Mathematical Operators

The symbols, abbreviations, and mathematical operators used herein are as found in Section 3.4 of ATSC A/53 Part 1 [4].

4. SYSTEM OVERVIEW (INFORMATIVE)

A basic block diagram representation of the system is shown in Figure 4.1.
According to this model, the digital television system can be seen to consist of three subsystems:
• Source coding and compression
• Service multiplex and transport
• RF/transmission

[Figure 4.1 ITU-R digital terrestrial television broadcasting model. Labeled blocks: Video Subsystem (video source coding and compression), Audio Subsystem (audio source coding and compression), Ancillary Data, Control Data, Service Multiplex and Transport, RF/Transmission System (channel coding, modulation), Receiver Characteristics.]

Figure 4.2 illustrates a high level view of encoding equipment. This view is not intended to be representative of actual implementations, but is used to illustrate the relationship of various clock frequencies within the encoder.

[Figure 4.2 High level view of encoding equipment. Labeled elements: 27 MHz clock (f27MHz), frequency divider network, program clock reference (33-bit program_clock_reference_base, 9-bit program_clock_reference_extension), video and audio A/D converters (sampling clocks fv and fa), video encoder, audio encoder, transport encoder (fTP, adaptation header), FEC and sync insertion, VSB modulator (fsym), RF out.]

The source coding domain, represented schematically by the video, audio, and transport encoders, uses a family of frequencies which are based on a 27 MHz clock (f27MHz). This clock is used to generate a 42-bit sample of the frequency, which is partitioned into two parts defined by the MPEG-2 specification: the 33-bit program_clock_reference_base and the 9-bit program_clock_reference_extension. The former is equivalent to a sample of a 90 kHz clock which is locked in frequency to the 27 MHz clock, and is used by the audio and video source encoders when encoding the presentation time stamp (PTS) and the decode time stamp (DTS). The audio and video sampling clocks, fa and fv respectively, are frequency-locked to the 27 MHz clock.
This can be expressed as the requirement that there exist two pairs of integers, (na, ma) and (nv, mv), such that

$$f_a = \frac{n_a}{m_a} \times 27\ \text{MHz} \quad \text{and} \quad f_v = \frac{n_v}{m_v} \times 27\ \text{MHz}$$

As illustrated in Figure 4.3, the audio subsystem comprises the audio encoding/decoding function and resides between the audio inputs/outputs and the transport subsystem. The audio encoder(s) is (are) responsible for generating the audio elementary stream(s) which are encoded representations of the baseband audio input signals. At the receiver, the audio subsystem is responsible for decoding the audio elementary stream(s) back into baseband audio.

[Figure 4.3 Audio subsystem in the digital television system. Transmit chain: audio source → audio encoder(s) → audio elementary stream(s) → transport subsystem → transport packets → transmission subsystem → VSB RF transmission → channel. Receive chain: VSB RF reception → transmission subsystem → transport packets → transport subsystem → audio elementary stream(s) → audio decoder(s) → reconstructed audio.]

5. SPECIFICATION

This Section forms the normative specification of the audio system. The audio compression system shall conform with the Digital Audio Compression (AC-3) Standard, subject to the constraints outlined in this Section.

5.1 Constraints With Respect to ATSC Standard A/52

The digital television audio coding system is based on the Digital Audio Compression (AC-3) Standard specified in the body of ATSC Doc. A/52 [3] (the non-normative annexes are not included). Constraints on the system are shown in Table 5.1, which shows permitted values of certain syntactical elements. These constraints are described in Sections 5.2 – 5.4. The receive audio buffer is specified in Section 5.7.
Table 5.1 Audio Constraints

AC-3 Syntactical Element   Comment                                                  Allowed Value
fscod                      Indicates sampling rate                                  ‘00’ (indicates 48 kHz)
frmsizecod                 Main audio service or associated audio service           ≤ ‘011110’ (indicates ≤ 448 kbps)
                           containing all necessary program elements
frmsizecod                 Single channel associated service containing a           ≤ ‘010000’ (indicates ≤ 128 kbps)
                           single program element
frmsizecod                 Two channel dialogue associated service                  ≤ ‘010100’ (indicates ≤ 192 kbps)
(frmsizecod)               Combined bit rate of a main and an associated service    (total ≤ 576 kbps)
                           intended to be simultaneously decoded
acmod                      Indicates number of channels                             ≥ ‘001’

5.2 Sampling Frequency

The system conveys digital audio sampled at a frequency of 48 kHz, locked to the 27 MHz system clock. The 48 kHz audio sampling clock is defined as:

48 kHz audio sample rate = (2 ÷ 1125) × (27 MHz system clock)

If analog signal inputs are employed, the A/D converters should sample at 48 kHz. If digital inputs are employed (see footnote 1), the input sampling rate shall be 48 kHz, or the audio encoder shall contain sampling rate converters which convert the sampling rate to 48 kHz.

5.3 Bit Rate

A main audio service, or an associated audio service which is a complete service (containing all necessary program elements), shall be encoded at a bit rate less than or equal to 448 kbps. A single channel associated service containing a single program element shall be encoded at a bit rate less than or equal to 128 kbps. A two channel associated service containing only dialogue shall be encoded at a bit rate less than or equal to 192 kbps. The combined bit rate of a main service and an associated service which are intended to be decoded simultaneously shall be less than or equal to 576 kbps.

5.4 Audio Coding Modes

Audio services shall be encoded using any of the audio coding modes specified in A/52 [3], with the exception of the 1+1 mode.
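The numeric constraints of Table 5.1 and Sections 5.2 through 5.4 can be expressed as a simple pre-flight check. The sketch below is a hypothetical helper, not part of A/52 or this Standard. Each service is described by its fscod and acmod codes and a bit rate in kbps, and the category labels ("complete", "single_element", "two_channel_dialogue") are invented for illustration. It also confirms the 48 kHz derivation from the 27 MHz clock with exact rational arithmetic.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional

SYSTEM_CLOCK_HZ = 27_000_000
# Section 5.2: 48 kHz = (2 / 1125) x 27 MHz, i.e. n_a = 2, m_a = 1125.
AUDIO_SAMPLE_RATE_HZ = Fraction(2, 1125) * SYSTEM_CLOCK_HZ

# Section 5.3 bit rate ceilings, keyed by hypothetical category labels.
MAX_KBPS = {
    "complete": 448,              # main service, or associated complete mix
    "single_element": 128,        # single channel, single program element
    "two_channel_dialogue": 192,  # two channel dialogue-only associated service
}
MAX_COMBINED_KBPS = 576  # main + associated decoded simultaneously


@dataclass
class Ac3Service:
    fscod: int          # 2-bit sampling rate code from the AC-3 bit stream
    acmod: int          # 3-bit audio coding mode
    bit_rate_kbps: int
    category: str       # one of the MAX_KBPS keys


def check_service(svc: Ac3Service) -> List[str]:
    """Return a list of constraint violations for one service."""
    problems = []
    if svc.fscod != 0b00:
        problems.append("fscod must be '00' (48 kHz)")
    if not 1 <= svc.acmod <= 7:
        problems.append("acmod must be 1-7 (0, the 1+1 mode, is prohibited)")
    limit = MAX_KBPS[svc.category]
    if svc.bit_rate_kbps > limit:
        problems.append(f"bit rate exceeds {limit} kbps for {svc.category}")
    return problems


def check_program(main: Ac3Service, assoc: Optional[Ac3Service] = None) -> List[str]:
    """Check a main service plus an optional simultaneously decoded associated one."""
    problems = check_service(main)
    if assoc is not None:
        problems += check_service(assoc)
        total = main.bit_rate_kbps + assoc.bit_rate_kbps
        if total > MAX_COMBINED_KBPS:
            problems.append(f"combined bit rate {total} kbps exceeds {MAX_COMBINED_KBPS} kbps")
    return problems
```

For example, a 448 kbps 5.1 main service paired with a 128 kbps single-channel associated service sits exactly at the 576 kbps combined ceiling and passes, while the same main service paired with a 192 kbps two-channel dialogue service (640 kbps total) is flagged.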
The value of acmod in the AC-3 bit stream shall have a value in the range of 1–7, with the value 0 prohibited.

5.5 Dialogue Level

The value of the dialnorm parameter in the AC-3 elementary bit stream shall indicate the level of average spoken dialogue within the encoded audio program. Dialogue level may be measured by means of an “A” weighted integrated measurement (LAeq) (ANSI S1.4) [2]. (Receivers use the value of dialnorm to adjust the reproduced audio level so as to normalize the dialogue level.)

Footnote 1: Either via AES3 [1] signals or embedded in the corresponding video.

5.6 Dynamic Range Compression

Each encoded audio block may contain a dynamic range control word (dynrng) that is used by decoders (by default) to alter the level of the reproduced audio. The control words allow the decoded signal level to be increased or decreased by up to 24 dB. In general, elementary streams may have dynamic range control words inserted or modified without affecting the encoded audio. When it is necessary to alter the dynamic range of audio programs which are broadcast, the dynamic range control word should be used.

5.7 STD Audio Buffer Size

The main audio buffer (BSn, see A/52 Annex A [3]) shall be 2592 bytes.

6. MAIN AND ASSOCIATED SERVICES

An AC-3 elementary stream contains the encoded representation of a single audio service. Multiple audio services are provided by multiple elementary streams. Each elementary stream is conveyed by the transport multiplex with a unique PID. There are a number of audio service types which may (individually) be coded into each elementary stream. Each AC-3 elementary stream is tagged as to its service type using the bsmod bit field. There are two types of main service and six types of associated service. Each associated service may be tagged (in the AC-3 audio descriptor in the transport PSI data) as being associated with one or more main audio services.
Each AC-3 elementary stream may also be tagged with a language code. Associated services may contain complete program mixes, or may contain only a single program element. Associated services which are complete mixes may be decoded and used as is. They are identified by the full_svc bit in the AC-3 descriptor (see A/52, Annex A [3]). Associated services which contain only a single program element are intended to be combined with the program elements from a main audio service.

This section specifies the meaning and use of each type of service. In general, a complete audio program (what is presented to the listener over the set of loudspeakers) may consist of a main audio service, an associated audio service that is a complete mix, or a main audio service combined with an associated audio service. The capability to simultaneously decode one main service and one associated service is required in order to form a complete audio program in certain service combinations described in this section. This capability may not exist in some receivers.

6.1 Summary of Service Types

The audio service types are listed in Table 6.1.

Table 6.1 Audio Service Types

bsmod       Type of Service
‘000’ (0)   Main audio service: complete main (CM)
‘001’ (1)   Main audio service: music and effects (ME)
‘010’ (2)   Associated service: visually impaired (VI)
‘011’ (3)   Associated service: hearing impaired (HI)
‘100’ (4)   Associated service: dialogue (D)
‘101’ (5)   Associated service: commentary (C)
‘110’ (6)   Associated service: emergency (E)
‘111’ (7)   Associated service: voice-over (VO)

6.2 Complete Main Audio Service (CM)

The CM type of main audio service contains a complete audio program (complete with dialogue, music, and effects). This is the type of audio service normally provided. The CM service may contain from 1 to 5.1 audio channels. The CM service may be further enhanced by means of the VI, HI, C, E, or VO associated services described below.
Audio in multiple languages may be provided by supplying multiple CM services, each in a different language.

6.3 Main Audio Service, Music and Effects (ME)

The ME type of main audio service contains the music and effects of an audio program, but not the dialogue for the program. The ME service may contain from 1 to 5.1 audio channels. The primary program dialogue is missing and (if any exists) is supplied by simultaneously encoding a D associated service. Multiple D associated services in different languages may be associated with a single ME service.

6.4 Visually Impaired (VI)

The VI associated service typically contains a narrative description of the visual program content. In this case, the VI service shall be a single audio channel. The simultaneous reproduction of both the VI associated service and the CM main audio service allows the visually impaired user to enjoy the main multi-channel audio program, as well as to follow (by ear) the on-screen activity.

The dynamic range control signal in this type of VI service is intended to be used by the audio decoder to modify the level of the main audio program. Thus the level of the main audio service will be under the control of the VI service provider, and the provider may signal the decoder (by altering the dynamic range control words embedded in the VI audio elementary stream) to reduce the level of the main audio service by up to 24 dB in order to assure that the narrative description is intelligible.

Besides providing the VI service as a single narrative channel, the VI service may be provided as a complete program mix containing music, effects, dialogue, and the narration. In this case, the service may be coded using any number of channels (up to 5.1), and the dynamic range control signal applies only to this service. The fact that the service is a complete mix shall be indicated in the AC-3 descriptor (see A/52, Annex A [3]).
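The level control described for the single-channel VI service amounts to attenuating the main program by up to 24 dB and adding the narration. The Python below is a minimal sketch under stated assumptions: the VI stream's dynamic range control words are assumed to be already decoded (per A/52) into a single attenuation value in dB for the block of samples at hand, and the caller chooses which main-program channel receives the narration, since this Part does not specify it. The helper names duck_gain and mix_narration are hypothetical, and no clipping protection is applied.

```python
from typing import List

MAX_DUCK_DB = 24.0  # Section 6.4: main level may be reduced by up to 24 dB


def duck_gain(attenuation_db: float) -> float:
    """Linear gain for a requested main-program attenuation, clamped to 0..24 dB."""
    clamped = min(max(attenuation_db, 0.0), MAX_DUCK_DB)
    return 10.0 ** (-clamped / 20.0)


def mix_narration(main: List[float], narration: List[float],
                  attenuation_db: float) -> List[float]:
    """Attenuate one main-program channel and add the single-channel narration."""
    if len(main) != len(narration):
        raise ValueError("channels must have the same number of samples")
    g = duck_gain(attenuation_db)
    return [g * m + n for m, n in zip(main, narration)]
```

A 20 dB reduction, for instance, corresponds to a linear gain of 0.1 on the main program; requests beyond 24 dB are clamped in this sketch.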
6.5 Hearing Impaired (HI)

The HI associated service typically contains only dialogue which is intended to be reproduced simultaneously with the CM service. In this case, the HI service shall be a single audio channel. This dialogue may have been processed for improved intelligibility by hearing impaired listeners. Simultaneous reproduction of both the CM and HI services allows the hearing impaired listener to hear a mix of the CM and HI services in order to emphasize the dialogue while still providing some music and effects.

Besides providing the HI service as a single dialogue channel, the HI service may be provided as a complete program mix containing music, effects, and dialogue with enhanced intelligibility. In this case, the service may be coded using any number of channels (up to 5.1). The fact that the service is a complete mix shall be indicated in the AC-3 descriptor (see A/52, Annex A [3]).

6.6 Dialogue (D)

The D associated service contains program dialogue intended for use with an ME main audio service. The language of the D service is indicated in the AC-3 bit stream, and in the audio descriptor. A complete audio program is formed by simultaneously decoding the D service and the ME service and mixing the D service into the center channel of the ME main service (with which it is associated).

If the ME main audio service contains more than two audio channels, the D service shall be monophonic (1/0 mode). If the main audio service contains two channels, the D service may also contain two channels (2/0 mode). In this case, a complete audio program is formed by simultaneously decoding the D service and the ME service, mixing the left channel of the ME service with the left channel of the D service, and mixing the right channel of the ME service with the right channel of the D service. The result will be a two channel stereo signal containing music, effects, and dialogue.
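The D + ME combination rules above can be sketched as follows. This is illustrative Python only: channels are modelled as a hypothetical dict mapping channel names ("L", "R", "C", and so on) to decoded PCM sample lists, a monophonic (1/0) D service is assumed to arrive as a single "C" channel, and cases the text does not cover (such as a mono D service with a 2/0 ME service) are rejected rather than guessed at. No gain scaling or clipping protection is applied.

```python
from typing import Dict, List

Channels = Dict[str, List[float]]  # hypothetical: channel name -> decoded PCM samples


def _add(a: List[float], b: List[float]) -> List[float]:
    if len(a) != len(b):
        raise ValueError("channels must have the same number of samples")
    return [x + y for x, y in zip(a, b)]


def mix_dialogue(me: Channels, d: Channels) -> Channels:
    """Combine an ME main service with its associated D service."""
    out = {name: list(samples) for name, samples in me.items()}  # copy ME
    if len(me) > 2:
        # More than two ME channels: D must be monophonic (1/0) and is
        # mixed into the ME center channel.
        if set(d) != {"C"} or "C" not in me:
            raise ValueError("D must be mono and ME must have a center channel")
        out["C"] = _add(out["C"], d["C"])
    elif set(me) == {"L", "R"} and set(d) == {"L", "R"}:
        # Two-channel ME with a 2/0 D service: left to left, right to right.
        out["L"] = _add(out["L"], d["L"])
        out["R"] = _add(out["R"], d["R"])
    else:
        raise ValueError("combination not covered by this sketch")
    return out
```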
Audio in multiple languages may be provided by supplying multiple D services (each in a different language) along with a single ME service. This is more efficient than providing multiple CM services, but, in the case of more than two audio channels in the ME service, requires that dialogue be restricted to the center channel. Some receivers may not have the capability to simultaneously decode an ME and a D service.

6.7 Commentary (C)

The commentary associated service is similar to the D service, except that instead of conveying essential program dialogue, the C service conveys optional program commentary. The C service may be a single audio channel containing only the commentary content. In this case, simultaneous reproduction of a C service and a CM service will allow the listener to hear the added program commentary.

The dynamic range control signal in the single channel C service is intended to be used by the audio decoder to modify the level of the main audio program. Thus the level of the main audio service will be under the control of the C service provider, and the provider may signal the decoder (by altering the dynamic range control words embedded in the C audio elementary stream) to reduce the level of the main audio service by up to 24 dB in order to assure that the commentary is intelligible.

Besides providing the C service as a single commentary channel, the C service may be provided as a complete program mix containing music, effects, dialogue, and the commentary. In this case the service may be provided using any number of channels (up to 5.1). The fact that the service is a complete mix shall be indicated in the AC-3 descriptor (see A/52, Annex A [3]).

6.8 Emergency (E)

The E associated service is intended to allow the insertion of emergency or high priority announcements. The E service is always a single audio channel. An E service is given priority in transport and in audio decoding.
Whenever the E service is present, it will be delivered to the audio decoder. Whenever the audio decoder receives an E type associated service, it will stop reproducing any main service being received and only reproduce the E service out of the center channel (or left and right channels if a center loudspeaker does not exist). The E service may also be used for non-emergency applications. It may be used whenever the broadcaster wishes to force all decoders to quit reproducing the main audio program and reproduce a higher priority single audio channel.

6.9 Voice-Over (VO)

The VO associated service is a single channel service intended to be reproduced along with the main audio service in the receiver. It allows typical voice-overs to be added to an already encoded audio elementary stream without requiring the audio to be decoded back to baseband and then re-encoded. It is always a single audio channel. It has second priority (only the E service has higher priority). It is intended to be simultaneously decoded and mixed into the center channel of the main audio service.

The dynamic range control signal in the VO service is intended to be used by the audio decoder to modify the level of the main audio program. Thus the level of the main audio service may be controlled by the broadcaster, and the broadcaster may signal the decoder (by altering the dynamic range control words embedded in the VO audio elementary stream) to reduce the level of the main audio service by up to 24 dB during the voice-over. Some receivers may not have the capability to simultaneously decode and reproduce a voice-over service along with a program audio service.

7. AUDIO ENCODER INTERFACES

7.1 Audio Encoder Input Characteristics

Audio signals which are input to the digital television system may be in analog or digital form. Audio signals should have any dc offset removed before being encoded.
If the audio encoder does not include a dc blocking high-pass filter, the audio signals should be high-pass-filtered before being applied to the encoder. In general, input signals should be quantized to at least 16- bit resolution. The audio compression system can convey audio signals with up to 24-bit resolution. Physical interfaces for the audio inputs to the encoder may be defined as voluntary industry standards by the AES, SMPTE, or other standards organizations. 13 ATSC A/53, Part 5 (AC-3) 3 January 2007 7.2 Audio Encoder Output Characteristics Conceptually, the output of the audio encoder is an elementary stream which is formed into PES packets within the transport subsystem. It is possible that systems will be implemented wherein the formation of audio PES packets takes place within the audio encoder. In this case, the output(s) of the audio encoder(s) would be PES packets. Physical interfaces for these outputs (elementary streams and/or PES packets) may be defined as voluntary industry standards by SMPTE or other standards organizations. End of Part 5 14 Doc. A/53, Part 6:2007 3 January 2007 ATSC Digital Television Standard Part 6 - Enhanced AC-3 Audio System Characteristics (A/53, Part 6:2007) Advanced Television Systems Committee 1750 K Street, N.W. Suite 1200 Washington, D.C. 20006 www.atsc.org ATSC A/53, Part 6 (Enhanced AC-3) 3 January 2007 The Advanced Television Systems Committee, Inc., is an international, non-profit organization developing voluntary standards for digital television. The ATSC member organizations represent the broadcast, broadcast equipment, motion picture, consumer electronics, computer, cable, satellite, and semiconductor industries. Specifically, ATSC is working to coordinate television standards among different communications media focusing on digital television, interactive systems, and broadband multimedia communications. 
Table of Contents

1. SCOPE
2. REFERENCES
   2.1 Normative References
   2.2 Informative References
3. COMPLIANCE NOTATION
   3.1 Treatment of Syntactic Elements
   3.2 Symbols, Abbreviations, and Mathematical Operators
4. SYSTEM OVERVIEW
5. SPECIFICATION
   5.1 Constraints With Respect to ATSC Standard A/52B Annex E
   5.2 Sampling Frequency
   5.3 Frame Size
   5.4 Audio Coding Modes
   5.5 Dialogue Level
   5.6 Dynamic Range Compression - Artistic
   5.7 Dynamic Range Compression - Heavy
6. MAIN AND ASSOCIATED SERVICES
   6.1 Summary of Service Types
   6.2 Complete Main Audio Service (CM)
   6.3 Visually Impaired (VI)
   6.4 Hearing Impaired (HI)
   6.5 Commentary (C)
7. AUDIO ENCODER INTERFACES

Index of Tables and Figures

Table 5.1 Audio Constraints
Table 6.1 Audio Service Types
Figure 4.1 Audio subsystem in the digital television system.

ATSC Digital Television Standard – Part 6:
Enhanced AC-3 Audio System Characteristics

1. SCOPE

This Part describes the robust mode audio system characteristics and normative specifications of the Digital Television Standard. Audio encoded per this Part may be transmitted over a TS-E (see A/53-3 [4]).

2. REFERENCES

All standards are subject to revision and amendment, and parties to agreement based on this standard are encouraged to investigate the possibility of applying the most recent editions of the documents listed below.

2.1 Normative References

The following documents contain provisions which in whole or part, through reference in this text, constitute provisions of this standard. At the time of publication, the editions indicated were valid.

[1] ATSC: “Digital Audio Compression (AC-3, E-AC-3) Standard,” Doc. A/52B, Advanced Television Systems Committee, Washington, D.C., 14 June 2005.
[2] AES: “AES Recommended Practice for digital audio engineering—Serial transmission format for two-channel linearly represented digital audio data,” Doc. AES3-2003, Audio Engineering Society, New York, N.Y., 2003. (This document is a revision of AES3-1992.)
[3] ANSI: “Specification for Sound Level Meters,” Doc. ANSI S1.4-1983 (R 2001) with Amd. S1.4A-1995, American National Standards Institute, Washington, D.C., 2001.

2.2 Informative References

[4] ATSC: “ATSC Digital Television Standard, Part 3 – Service Multiplex and Transport Subsystem Characteristics,” Doc. A/53, Part 3:2007, Advanced Television Systems Committee, Washington, D.C., 3 January 2007.
[5] ATSC: “ATSC Digital Television Standard, Part 5 – AC-3 Audio System Characteristics,” Doc. A/53, Part 5:2007, Advanced Television Systems Committee, Washington, D.C., 3 January 2007.
[6] ATSC: “ATSC Digital Television Standard, Part 1 – Digital Television System,” Doc. A/53, Part 1:2007, Advanced Television Systems Committee, Washington, D.C., 3 January 2007.

3. COMPLIANCE NOTATION

As used in this document, “shall” or “will” denotes a mandatory provision of the standard. “Should” denotes a provision that is recommended but not mandatory. “May” denotes a feature whose presence does not preclude compliance, and that may or may not be present at the option of the implementer.

3.1 Treatment of Syntactic Elements

This document contains symbolic references to syntactic elements used in the audio, video, and transport coding subsystems. These references are typographically distinguished by the use of a different font (e.g., restricted), may contain the underscore character (e.g., sequence_end_code) and may consist of character strings that are not English words (e.g., dynrng).

3.2 Symbols, Abbreviations, and Mathematical Operators

The symbols, abbreviations, and mathematical operators used herein are as found in Section 3.4 of ATSC A/53 Part 1 [6].

4. SYSTEM OVERVIEW

As illustrated in Figure 4.1, the audio subsystem comprises the audio encoding/decoding function and resides between the audio inputs/outputs and the transport subsystem. The audio encoder(s) is (are) responsible for generating the audio elementary stream(s) which are encoded representations of the baseband audio input signals. At the receiver, the audio subsystem is responsible for decoding the audio elementary stream(s) back into baseband audio.

[Figure 4.1 Audio subsystem in the digital television system. Transmitter: audio source → audio encoder(s) (the portion specified here) → audio elementary stream(s) → transport subsystem → transport packets → transmission subsystem → VSB RF transmission → channel. Receiver: VSB RF reception → transmission subsystem → transport packets → transport subsystem → audio elementary stream → audio decoder(s) → reconstructed audio.]

5. SPECIFICATION

This Section forms the normative specification for the robust mode audio system that may be transmitted as part of a TS-E (see A/53-3 [4]). The robust mode audio compression system conforms to Annex E of the A/52 [1] Digital Audio Compression (AC-3) Standard, subject to the constraints outlined in this Section.

5.1 Constraints With Respect to ATSC Standard A/52 Annex E

The robust mode digital television audio coding system shall use the Enhanced AC-3 Digital Audio Compression Standard specified in Annex E of ATSC Doc. A/52 [1], and as constrained by this Part. Audio bit streams encoded per that specification may be included in the TS-E that is delivered by E-VSB. Constraints on the robust mode audio system shall be as shown in Table 5.1, which shows permitted values of certain syntactical elements. These constraints are further described in Sections 5.2–5.4, and Section 6.
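To illustrate how the permitted values of Table 5.1 relate to the limits derived in Sections 5.2 and 5.3, the following sketch checks an already-parsed frame header. The `FrameFields` record and the `violations` function are hypothetical, bit-stream parsing is not shown, and the frame size is assumed to be available in bytes rather than as the raw frmsize code.

```python
from dataclasses import dataclass
from fractions import Fraction

# Section 5.2: 48 kHz = (2 / 1125) x 27 MHz system clock (exact arithmetic).
SYSTEM_CLOCK_HZ = 27_000_000
SAMPLE_RATE_HZ = Fraction(2, 1125) * SYSTEM_CLOCK_HZ

# Section 5.3: a six-block frame carries 1536 samples (32 ms at 48 kHz),
# so 448 kb/s corresponds to 448000 * 0.032 / 8 bytes per frame.
SAMPLES_PER_FRAME = 1536
MAX_BIT_RATE_BPS = 448_000
MAX_FRAME_BYTES = MAX_BIT_RATE_BPS * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ / 8

ALLOWED_BSMOD = {0, 2, 3, 5}  # CM, VI, HI, C


@dataclass
class FrameFields:
    """Hypothetical, already-parsed header values for one audio frame."""
    fscod: int        # sampling-rate code
    frame_bytes: int  # frame length in bytes (derived from frmsize; not shown)
    bstyp: int        # stream type; 0 = independent stream
    acmod: int        # audio coding mode
    bsmod: int        # service type


def violations(f: FrameFields) -> list[str]:
    """Return a description of every Table 5.1 constraint the frame breaks."""
    problems = []
    if f.fscod != 0b00:
        problems.append("fscod must be '00' (48 kHz)")
    if f.frame_bytes > MAX_FRAME_BYTES:
        problems.append(f"frame larger than {MAX_FRAME_BYTES} bytes")
    if f.bstyp != 0b00:
        problems.append("stream must be independent (bstyp '00')")
    if not 1 <= f.acmod <= 7:
        problems.append("acmod must be 1-7 (1+1 mode prohibited)")
    if f.bsmod not in ALLOWED_BSMOD:
        problems.append("bsmod must be 0, 2, 3, or 5")
    return problems
```

For example, `violations(FrameFields(fscod=0, frame_bytes=1792, bstyp=0, acmod=7, bsmod=0))` returns an empty list for a 3/2-channel complete main service at the maximum frame size, whereas acmod=0 would be reported as the prohibited 1+1 mode. Using `Fraction` keeps the clock and frame-size derivations exact, so 48000 and 1792 come out as integers rather than rounded floats.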
Table 5.1 Audio Constraints

AC-3 Syntactical Element | Comment                                           | Allowed Value
fscod                    | Indicates sampling rate                           | ‘00’ (indicates 48 kHz)
frmsize                  | Indicates the size of the audio frame             | ≤ ‘011 1000 0000’ (indicates a frame size corresponding to ≤ 448 kb/s for a six-block frame)
bstyp                    | Indicates an independent stream (no sub-streams)  | ‘00’
acmod                    | Indicates number of channels; prohibits 1+1 mode  | ≥ ‘001’
bsmod                    | Restricts audio service types to CM, VI, HI, C    | 0, 2, 3, or 5

5.2 Sampling Frequency

The system conveys digital audio sampled at a frequency of 48 kHz that shall be locked to the 27 MHz MPEG-2 system clock. The 48 kHz audio sampling clock is defined as:

48 kHz audio sample rate = (2 ÷ 1125) × (27 MHz MPEG-2 system clock)

If analog signal inputs are employed¹, the A/D converters shall sample at 48 kHz locked to the 27 MHz clock. If digital inputs are employed, the input sampling rate shall be 48 kHz locked to the system clock, or the audio encoder shall contain sampling rate converters which convert the sampling rate to 48 kHz locked to the system clock.

¹ Either via AES3 [2] signals or embedded in the corresponding video.

5.3 Frame Size

The audio frame size shall be less than or equal to 1792 bytes. This implies a bit-rate limitation of 448 kb/s for AC-3 frames of 1536 samples (32 ms at 48 kHz).

5.4 Audio Coding Modes

Audio services shall be encoded using any of the audio coding modes specified in A/52, with the exception of the 1+1 mode. The acmod field in the AC-3 bit stream shall have a value in the range 1–7; the value 0 (the 1+1 mode) is prohibited.

5.5 Dialogue Level

The value of the dialnorm parameter in the AC-3 elementary bit stream shall indicate the level of average spoken dialogue within the encoded audio program. Dialogue level may be measured by means of an “A” weighted integrated measurement (LAeq) [3]. (Receivers use the value of dialnorm to adjust the reproduced audio level so as to normalize the dialogue level.)
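This Part does not spell out the decoder arithmetic behind dialogue normalization. As a sketch, assuming the dialnorm semantics of A/52 [1] (codes 1 to 31 denote an average dialogue level of −1 to −31 dB relative to full scale, code 0 is reserved) and a decoder that normalizes dialogue to a −31 dB reference, the functions below show why correctly set dialnorm values reproduce every program at the same dialogue level, and how a mismatched value on a linked fallback service would cause a level shift on switching. The function names and the example levels are hypothetical.

```python
REFERENCE_DIALOGUE_DB = -31  # assumed normalization target (dB re full scale)


def dialnorm_gain_db(dialnorm: int) -> int:
    """Playback gain (dB) that moves dialogue at -dialnorm dB to the reference."""
    if not 1 <= dialnorm <= 31:
        raise ValueError("dialnorm code must be 1..31 (0 is reserved)")
    indicated_dialogue_db = -dialnorm
    return REFERENCE_DIALOGUE_DB - indicated_dialogue_db


def reproduced_dialogue_db(measured_dialogue_db: float, dialnorm: int) -> float:
    """Dialogue level after normalization, given the level actually measured."""
    return measured_dialogue_db + dialnorm_gain_db(dialnorm)


# Hypothetical 5.1 main service and a stereo fallback whose dialogue sits 4 dB higher.
main = reproduced_dialogue_db(-24.0, dialnorm=24)
fallback_ok = reproduced_dialogue_db(-20.0, dialnorm=20)   # dialnorm matches measurement
fallback_bad = reproduced_dialogue_db(-20.0, dialnorm=24)  # dialnorm copied from main
print(main, fallback_ok, fallback_bad)
```

With matching dialnorm values, both services reproduce dialogue at the reference level; if the fallback simply reused the main service's dialnorm, switching to it would raise dialogue by 4 dB, which is the kind of level shift the linked-service requirement is meant to prevent.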
In order to enable clean switching (i.e., without level shifts) between main and fallback audio services (that might have a different number of audio channels), linked audio services shall have values of dialnorm that result in matched dialogue levels when decoded by compliant decoders.

5.6 Dynamic Range Compression - Artistic

Each encoded audio block may contain a dynamic range control word (dynrng) that is used by decoders (by default) to alter the level of the reproduced audio. The control words allow the decoded signal level to be increased or decreased by up to 24 dB. In general, elementary streams may have dynamic range control words inserted or modified without affecting the encoded audio. When it is necessary to alter the dynamic range of audio programs that are broadcast, the dynamic range control word should be used.

In order to enable clean switching between main and fallback audio services (that might have a different number of audio channels), linked audio services shall have values of dynrng that result in matched audio levels when decoded by compliant decoders.

5.7 Dynamic Range Compression - Heavy

Each encoded audio frame may contain a dynamic range control word (compr) that may be optionally used by decoders to render the audio with a very narrow dynamic range. The control words allow the decoded signal level to be increased or decreased by up to 48 dB.

In order to enable clean switching between main and fallback audio services (that might have a different number of audio channels), linked audio services shall have values of compr that result in matched audio levels when decoded by compliant decoders.

6. MAIN AND ASSOCIATED SERVICES

An AC-3 elementary stream contains the encoded representation of a single audio service. Multiple audio services are provided by multiple elementary streams. Each elementary stream is conveyed by the transport multiplex with a unique PID.
There are a number of audio service types that may (individually) be coded into each elementary stream. Each AC-3 elementary stream is tagged as to its service type using the bsmod bit field. There is a complete main service and there are three types of associated services. Associated services delivered in a TS-E shall contain complete program mixes containing all audio program elements (dialog, music, effects, etc.) that are intended to be presented to a listener. This is indicated by the full_svc bit in the AC-3 descriptor being set to a value of ‘1’ (see A/53-3 [4] and A/52, Annex A [1]). This section specifies the meaning and use of each type of service.

6.1 Summary of Service Types

The audio service types shall be as listed in Table 6.1.

Table 6.1 Audio Service Types

bsmod       | Type of Service
‘000’ (0)   | Main audio service: complete main (CM)
‘010’ (2)   | Associated service: visually impaired (VI)
‘011’ (3)   | Associated service: hearing impaired (HI)
‘101’ (5)   | Associated service: commentary (C)

6.2 Complete Main Audio Service (CM)

The CM type of main audio service shall contain a complete audio program (complete with dialogue, music, and effects). This is the type of audio service normally provided. The CM service may contain from 1 to 5.1 audio channels. Audio in multiple languages may be provided by supplying multiple CM services, each in a different language.

6.3 Visually Impaired (VI)

The VI associated service is a complete program mix containing music, effects, dialogue, and additionally a narration that describes the picture content. The VI service may be coded using any number of channels (up to 5.1).

6.4 Hearing Impaired (HI)

The HI service is a complete program mix containing music, effects, and dialogue with enhanced intelligibility. The HI service may be coded using any number of channels (up to 5.1).
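A receiver or monitoring tool might map the bsmod values of Table 6.1 to service types and apply the TS-E rule that associated services be complete mixes, where full_svc is read from the AC-3 descriptor rather than from the elementary stream. The sketch below does this for already-extracted field values; the table name `SERVICE_TYPES` and the function `check_service` are hypothetical.

```python
# Table 6.1: permitted bsmod values in a TS-E and their service types.
SERVICE_TYPES = {
    0b000: ("CM", "Main audio service: complete main"),
    0b010: ("VI", "Associated service: visually impaired"),
    0b011: ("HI", "Associated service: hearing impaired"),
    0b101: ("C", "Associated service: commentary"),
}


def check_service(bsmod: int, full_svc: int) -> list[str]:
    """Return problems with a service's bsmod/full_svc pair (empty list if none)."""
    if bsmod not in SERVICE_TYPES:
        return [f"bsmod {bsmod:03b} is not a permitted service type"]
    abbreviation, _ = SERVICE_TYPES[bsmod]
    # Associated services (VI, HI, C) in a TS-E must carry a complete program mix.
    if abbreviation != "CM" and full_svc != 1:
        return [f"{abbreviation} service must be a complete mix (full_svc = 1)"]
    return []


print(check_service(0b011, 1))  # hearing impaired service flagged as a complete mix
print(check_service(0b001, 1))  # bsmod value not listed in Table 6.1
```

Returning a list of messages rather than raising keeps the check usable in a monitor that reports every problem in a multiplex instead of stopping at the first one.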
6.5 Commentary (C)

The commentary associated service is a complete program mix containing music, effects, dialogue, and additionally some special commentary. This service may be provided using any number of channels (up to 5.1).

7. AUDIO ENCODER INTERFACES

See A/53-5 [5], Section 7.

End of Part 6