Doc. A/54A
4 December 2003
Corrigendum No. 1 dated 20 December 2006

Recommended Practice: Guide to the Use of the ATSC Digital Television Standard, including Corrigendum No. 1

Advanced Television Systems Committee
1750 K Street, N.W.
Suite 1200
Washington, D.C. 20006
www.atsc.org

ATSC Guide to Use of the ATSC DTV Standard 4 December 2003

The Advanced Television Systems Committee, Inc., is an international, non-profit organization developing voluntary standards for digital television. The ATSC member organizations represent the broadcast, broadcast equipment, motion picture, consumer electronics, computer, cable, satellite, and semiconductor industries.

Specifically, ATSC is working to coordinate television standards among different communications media focusing on digital television, interactive systems, and broadband multimedia communications. ATSC is also developing digital television implementation strategies and presenting educational seminars on the ATSC standards.

ATSC was formed in 1982 by the member organizations of the Joint Committee on InterSociety Coordination (JCIC): the Electronic Industries Association (EIA), the Institute of Electrical and Electronics Engineers (IEEE), the National Association of Broadcasters (NAB), the National Cable and Telecommunications Association (NCTA), and the Society of Motion Picture and Television Engineers (SMPTE).
Currently, there are approximately 160 members representing the broadcast, broadcast equipment, motion picture, consumer electronics, computer, cable, satellite, and semiconductor industries. ATSC Digital TV Standards include digital high definition television (HDTV), standard definition television (SDTV), data broadcasting, multichannel surround-sound audio, and satellite direct-to-home broadcasting.

Table of Contents

1. SCOPE
2. REFERENCES
  2.1 Normative References
  2.2 Informative References
3. DEFINITIONS
  3.1 Treatment of Syntactic Elements
  3.2 Terms Employed
  3.3 Symbols, Abbreviations, and Mathematical Operators
    3.3.1 Arithmetic Operators
    3.3.2 Logical Operators
    3.3.3 Relational Operators
    3.3.4 Bitwise Operators
    3.3.5 Assignment
    3.3.6 Mnemonics
    3.3.7 Method of Describing Bit Stream Syntax
4. OVERVIEW OF THE ATSC DIGITAL TELEVISION SYSTEM
  4.1 System Block Diagram
    4.1.1 Application Encoders/Decoders
    4.1.2 Transport (de)Packetization and (de)Multiplexing
    4.1.3 RF Transmission
    4.1.4 Receiver
5. VIDEO SYSTEMS
  5.1 Overview of Video Compression and Decompression
    5.1.1 MPEG-2 Levels and Profiles
    5.1.2 Compatibility with MPEG-2
    5.1.3 Overview of Video Compression
  5.2 Video Preprocessing
    5.2.1 Video Compression Formats
    5.2.2 Precision of Samples
    5.2.3 Source-Adaptive Processing
    5.2.4 Film Mode
    5.2.5 Color Component Separation and Processing
    5.2.6 Anti-Alias Filtering
    5.2.7 Number of Lines Encoded
  5.3 Concatenated Sequences
  5.4 Guidelines for Refreshing
  5.5 Active Format Description (AFD)
    5.5.1 Active Area Signaling
    5.5.2 Existing Standards
    5.5.3 Treatment of Active Areas Greater than 16:9
    5.5.4 Active Format Description (AFD) and Bar Data
6. AUDIO SYSTEMS
  6.1 Audio System Overview
  6.2 Audio Encoder Interface
    6.2.1 Input Source Signal Specification
    6.2.2 Output Signal Specification
  6.3 AC-3 Digital Audio Compression
    6.3.1 Overview and Basics of Audio Compression
    6.3.2 Transform Filter Bank
    6.3.3 Coded Audio Representation
    6.3.4 Bit Allocation
    6.3.5 Rematrixing
    6.3.6 Coupling
  6.4 Bit Stream Syntax
    6.4.1 Sync Frame
    6.4.2 Splicing, Insertion
    6.4.3 Error Detection Codes
  6.5 Loudness and Dynamic Range
    6.5.1 Loudness Normalization
    6.5.2 Dynamic Range Compression
  6.6 Main, Associated, and Multi-Lingual Services
    6.6.1 Overview
    6.6.2 Summary of Service Types
    6.6.3 Multi-Lingual Services
    6.6.4 Detailed Description of Service Types
  6.7 Audio Bit Rates
    6.7.1 Typical Audio Bit Rates
    6.7.2 Audio Bit Rate Limitations
7. DTV TRANSPORT
  7.1 Introduction
  7.2 MPEG-2 Basics
    7.2.1 Standards Layering
  7.3 MPEG-2 Transport Stream Packet
    7.3.1 MPEG-2 TS Packet Structure
    7.3.2 MPEG-2 Transport Stream Packet Syntax
  7.4 MPEG-2 Transport Stream Data Structures
    7.4.1 Tables and Sections
    7.4.2 MPEG-2 Private Section
    7.4.3 MPEG-2 PSI
    7.4.4 MPEG-2 Packetized Elementary Stream (PES) Packet
  7.5 Multiplex Concepts
  7.6 MPEG-2 Timing and Buffer Model
    7.6.1 MPEG-2 System Timing
    7.6.2 Buffer Model
  7.7 Supplemental Information
    7.7.1 MPEG-2 Descriptors
    7.7.2 Code Point Conflict Avoidance
    7.7.3 Understanding MPEG Syntax Tables
8. RF TRANSMISSION
  8.1 System Overview
  8.2 Bit Rate Delivered to a Transport Decoder by the Transmission Subsystem
  8.3 Performance Characteristics of Terrestrial Broadcast Mode
  8.4 Transmitter Signal Processing
  8.5 Upconverter and RF Carrier Frequency Offsets
    8.5.1 Nominal DTV Pilot Carrier Frequency
    8.5.2 Requirements for Offsets
    8.5.3 Upper DTV Channel into Lower Analog Channel
    8.5.4 Other Offset Cases
    8.5.5 Summary: DTV Frequency
    8.5.6 Frequency Tolerances
    8.5.7 Hardware Options for Tight Frequency Control
    8.5.8 Additional Considerations
  8.6 Performance Characteristics of High Data Rate Mode
9. RECEIVER SYSTEMS
  9.1 General Issues Concerning DTV Reception
    9.1.1 Planning Factors Used by ACATS PS/WP3
    9.1.2 Noise Figure
    9.1.3 Co-Channel and Adjacent-Channel Rejection
    9.1.4 Unintentional Radiation
    9.1.5 Direct Pickup (DPU)
  9.2 Grand Alliance Receiver Design
    9.2.1 Tuner
    9.2.2 Channel Filtering and VSB Carrier Recovery
    9.2.3 Segment Sync and Symbol Clock Recovery
    9.2.4 Non-Coherent and Coherent AGC
    9.2.5 Data Field Synchronization
    9.2.6 Interference Rejection Filter
    9.2.7 Channel Equalizer
    9.2.8 Phase Tracker
    9.2.9 Trellis Decoder
    9.2.10 Data De-Interleaver
    9.2.11 Reed-Solomon Decoder
    9.2.12 Data De-Randomizer
    9.2.13 Receiver Loop Acquisition Sequencing
    9.2.14 High Data Rate Mode
  9.3 Receiver Equalization Issues
  9.4 Transport Stream Processing Issues in the Receiver
  9.5 Receiver Video Issues
    9.5.1 Multiple Video Programs
    9.5.2 Concatenation of Video Sequences
    9.5.3 D-Frames
    9.5.4 Adaptive Video Error Concealment Strategy
  9.6 Receiver Audio Issues
    9.6.1 Audio Coding
    9.6.2 Audio Channels and Services
    9.6.3 Loudness Normalization
    9.6.4 Dynamic Range Control
    9.6.5 Tracking of Audio Data Packets and Video Data Packets
CORRIGENDUM NO. 1

List of Figures and Tables

Figure 4.1 Block diagram of functionality in a transmitter/receiver pair.
Figure 5.1 Video coding in relation to the ATV system.
Figure 5.8 Coding and active area.
Figure 5.9 Example of active video area greater than 16:9 aspect ratio.
Figure 6.1 Audio subsystem within the digital television system.
Figure 6.2 Overview of audio compression system.
Figure 6.3 AC-3 synchronization frame.
Figure 7.1 MPEG-2 transport stream program multiplex.
Figure 7.2 MPEG-2 constant delay buffer model.
Figure 7.3 MPEG-2 system time clock.
Figure 7.4 The MPEG-2 PTS and marker_bits.
Figure 8.1 Segment error probability, 8-VSB with 4 state trellis decoding, RS (207,187).
Figure 8.2 Cumulative distribution function of 8-VSB peak-to-average power ratio (in ideal linear system).
Figure 8.3 16-VSB error probability.
Figure 8.4 Cumulative distribution function of 16-VSB peak-to-average power ratio.
Figure 9.1 Block diagram of Grand Alliance prototype VSB receiver.
Figure 9.2 Block diagram of the tuner in the prototype VSB receiver.
Figure 9.3 Tuner, IF amplifier, and FPLL in the prototype VSB receiver.
Figure 9.4 Data segment sync.
Figure 9.5 Segment sync and symbol clock recovery with AGC.
Figure 9.6 Data field sync recovery in the prototype VSB receiver.
Figure 9.7 Location of NTSC carriers - comb filtering.
Figure 9.8 NTSC interference rejection filter in prototype VSB receiver.
Figure 9.9 Equalizer in the prototype VSB receiver.
Figure 9.10 Phase-tracking loop portion of the phase-tracker.
Figure 9.11 Trellis code de-interleaver.
Figure 9.12 Segment sync removal in prototype 8 VSB receiver.
Figure 9.13 Trellis decoding with and without NTSC rejection filter.
Figure 9.14 Conceptual diagram of convolutional de-interleaver.
Table 3.1 Next Start Code
Table 5.1 Compression Formats
Table 5.2 Standardized Video Input Formats
Table 6.1 Table of Service Types
Table 6.2 Typical Audio Bit Rate
Table 7.1 Table Format
Table 7.2a IF Statement
Table 7.2b IF Statement
Table 7.3 For-Loop Example 1
Table 7.4 For-Loop Example 2
Table 7.5 General Descriptor Format
Table 7.6 For-Loop Example 3
Table 8.1 Parameters for VSB Transmission Modes
Table 8.2 DTV Pilot Carrier Frequencies for Two Stations (Normal offset above lower channel edge: 309.440559 kHz)
Table 9.1 Receiver Planning Factors Used by PS/WP3
Table 9.2 DTV Interference Criteria
Table 9.3 Digital Television Standard Video Formats

Recommended Practice A/54A: Guide to the Use of the ATSC Digital Television Standard

1. SCOPE

This guide provides tutorial information and an overview of the digital television system defined by ATSC Standard A/53, ATSC Digital Television Standard. In addition, recommendations are given for operating parameters for certain aspects of the DTV system.

2. REFERENCES

2.1 Normative References

There are no normative references.

2.2 Informative References

1. AES 3-1992 (ANSI S4.40-1992): “AES Recommended Practice for digital audio engineering - Serial transmission format for two-channel linearly represented digital audio data,” Audio Engineering Society, New York, N.Y.
2. ANSI S1.4-1983: “Specification for Sound Level Meters.”
3. ATSC IS-191 (2003): “DTV Lip Sync at Emission Encoder Input: ATSC IS Requirements for a Recommended Practice,” Advanced Television Systems Committee, Washington, D.C.
4. ATSC Standard A/52A (2001): “Digital Audio Compression (AC-3),” Advanced Television Systems Committee, Washington, D.C., August 20, 2001.
5. ATSC Standard A/53B (2001) with Amendment 1 (2002) and Amendment 2 (2003): “ATSC Digital Television Standard,” Advanced Television Systems Committee, Washington, D.C., carrying the cover date of August 7, 2001.
6. ATSC Standard A/65B (2003): “Program and System Information Protocol,” Advanced Television Systems Committee, Washington, D.C., March 18, 2003.
7. ATSC Standard A/70 (2000): “Conditional Access System for Terrestrial Broadcast with Amendment,” Advanced Television Systems Committee, Washington, D.C., May 31, 2000.
8. IEC 651 (1979): “Sound Level Meters.”
9. IEC 804 (1985), Amendment 1 (1989): “Integrating/Averaging Sound Level Meters.”
10. IEEE Standard 100-1992: The New IEEE Standard Dictionary of Electrical and Electronic Terms, Institute of Electrical and Electronics Engineers, New York, N.Y.
11. ISO/IEC 11172-1, “Information Technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 1: Systems.”
12. ISO/IEC 11172-2, “Information Technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 2: Video.”
13. ISO/IEC IS 13818-1:2000 (E), International Standard, Information technology - Generic coding of moving pictures and associated audio information: Systems.
14. ISO/IEC IS 13818-2, International Standard (1996), MPEG-2 Video.
15. ISO/IEC IS 13818-1:2000 (E), International Standard, Information technology - Generic coding of moving pictures and associated audio information: Systems.
16. ISO/IEC CD 13818-4, MPEG Committee Draft (1994): “MPEG-2 Compliance.”
17. ITU-R BT.601-4 (1994): “Encoding parameters of digital television for studios.”
18. ITU-R BT.601-5 (1995): “Encoding Parameters of Digital Television for Studios.”
19. SMPTE 125M (1995): “Standard for Television - Component Video Signal 4:2:2, Bit-Parallel Digital Interface,” Society of Motion Picture and Television Engineers, White Plains, N.Y.
20. SMPTE 170M (1999): “Standard for Television - Composite Analog Video Signal, NTSC for Studio Applications,” Society of Motion Picture and Television Engineers, White Plains, N.Y.
21. SMPTE 267M (1995): “Standard for Television - Bit-Parallel Digital Interface, Component Video Signal 4:2:2 16 × 9 Aspect Ratio,” Society of Motion Picture and Television Engineers, White Plains, N.Y.
22. SMPTE 274M (1998): “Standard for Television - 1920 × 1080 Scanning and Analog and Parallel Digital Interfaces for Multiple Picture Rates,” Society of Motion Picture and Television Engineers, White Plains, N.Y.
23. SMPTE 293M (2003): “Standard for Television - 720 × 483 Active Line at 59.94-Hz Progressive Scan Production, Digital Representation,” Society of Motion Picture and Television Engineers, White Plains, N.Y.
24. SMPTE 296M (2001): “Standard for Television - 1280 × 720 Progressive Image Sample Structure, Analog and Digital Representation and Analog Interface,” Society of Motion Picture and Television Engineers, White Plains, N.Y.
25. SMPTE/EBU: “Task Force for Harmonized Standards for the Exchange of Program Material as Bitstreams - Final Report: Analyses and Results,” Society of Motion Picture and Television Engineers, White Plains, N.Y., July 1998.
26. SMPTE Recommended Practice 202 (2002): “Video Alignment for MPEG Coding,” Society of Motion Picture and Television Engineers, White Plains, N.Y., 2002.
27. Digital TV Group: “Digital Receiver Implementation Guidelines and Recommended Receiver Reaction to Aspect Ratio Signaling in Digital Video Broadcasting,” Issue 1.2, August 2000.

3. DEFINITIONS

The following definitions are included here for reference but the precise meaning of each may vary slightly from standard to standard.
Where an abbreviation is not covered by IEEE practice, or industry practice differs from IEEE practice, then the abbreviation in question will be described in Section 3.3 of this document. Many of the definitions included therein are derived from definitions adopted by MPEG.

3.1 Treatment of Syntactic Elements

This document contains symbolic references to syntactic elements used in the audio, video, and transport coding subsystems. These references are typographically distinguished by the use of a different font (e.g., restricted), may contain the underscore character (e.g., sequence_end_code) and may consist of character strings that are not English words (e.g., dynrng).

3.2 Terms Employed

For the purposes of the Digital Television Standard, the following definitions apply:

ACATS: Advisory Committee on Advanced Television Service.
access unit: A coded representation of a presentation unit. In the case of audio, an access unit is the coded representation of an audio frame. In the case of video, an access unit includes all the coded data for a picture, and any stuffing that follows it, up to but not including the start of the next access unit. If a picture is not preceded by a group_start_code or a sequence_header_code, the access unit begins with a picture start code. If a picture is preceded by a group_start_code and/or a sequence_header_code, the access unit begins with the first byte of the first of these start codes. If it is the last picture preceding a sequence_end_code in the bit stream, all bytes between the last byte of the coded picture and the sequence_end_code (including the sequence_end_code) belong to the access unit.
A/D: Analog to digital converter.
AFD: Active format description.
AES: Audio Engineering Society.
anchor frame: A video frame that is used for prediction. I-frames and P-frames are generally used as anchor frames, but B-frames are never anchor frames.
ANSI: American National Standards Institute.
asynchronous transfer mode (ATM): A digital signal protocol for efficient transport of both constant-rate and bursty information in broadband digital networks. The ATM digital stream consists of fixed-length packets called “cells,” each containing 53 8-bit bytes: a 5-byte header and a 48-byte information payload.
ATM: See asynchronous transfer mode.
ATTC: Advanced Technology Test Center.
AWGN: Additive white Gaussian noise.
bidirectional pictures or B-pictures or B-frames: Pictures that use both future and past pictures as a reference. This technique is termed bidirectional prediction. B-pictures provide the most compression. B-pictures do not propagate coding errors as they are never used as a reference.
bit rate: The rate at which the compressed bit stream is delivered from the channel to the input of a decoder.
block: A block is an 8-by-8 array of pel values or DCT coefficients representing luminance or chrominance information.
bps: Bits per second.
byte-aligned: A bit in a coded bit stream is byte-aligned if its position is a multiple of 8 bits from the first bit in the stream.
channel: A digital medium that transports a digital television stream.
coded representation: A data element as represented in its encoded form.
compression: Reduction in the number of bits used to represent an item of data.
constant bit rate: Operation where the bit rate is constant from start to finish of the compressed bit stream.
conventional definition television (CDTV): This term is used to signify the analog NTSC television system as defined in ITU-R Recommendation 470. See also standard definition television and ITU-R Recommendation 1125.
CRC: The cyclic redundancy check used to verify the correctness of the data.
D-frame: A frame coded according to an MPEG-1 mode that uses dc coefficients only.
data element: An item of data as represented before encoding and after decoding.
DCT: See discrete cosine transform.
decoded stream: The decoded reconstruction of a compressed bit stream.
decoder: An embodiment of a decoding process.
decoding (process): The process defined in the Digital Television Standard that reads an input coded bit stream and outputs decoded pictures or audio samples.
decoding time-stamp (DTS): A field that may be present in a PES packet header which indicates the time that an access unit is decoded in the system target decoder.
DFS: Data field synchronization.
digital storage media (DSM): A digital storage or transmission device or system.
discrete cosine transform: A mathematical transform that can be perfectly undone and which is useful in image compression.
DSM-CC: Digital storage media command and control.
DSM: Digital storage media.
DSS: Data segment synchronization.
DTV: Digital television, the system described in the ATSC Digital Television Standard.
DTS: See decoding time-stamp.
D/U: Desired (signal) to undesired (signal) ratio.
DVCR: Digital video cassette recorder.
editing: A process by which one or more compressed bit streams are manipulated to produce a new compressed bit stream. Conforming edited bit streams are understood to meet the requirements defined in the Digital Television Standard.
elementary stream (ES): A generic term for one of the coded video, coded audio, or other coded bit streams. One elementary stream is carried in a sequence of PES packets with one and only one stream_id.
elementary stream clock reference (ESCR): A time stamp in the PES Stream from which decoders of PES streams may derive timing.
EMM: See entitlement management message.
encoder: An embodiment of an encoding process.
encoding (process): A process that reads a stream of input pictures or audio samples and produces a valid coded bit stream as defined in the Digital Television Standard.
entitlement control message (ECM): Entitlement control messages are private conditional access information that specify control words and possibly other stream-specific, scrambling, and/or control parameters.
entitlement management message (EMM): Entitlement management messages are private conditional access information that specify the authorization level or the services of specific decoders. They may be addressed to single decoders or groups of decoders.
entropy coding: Variable length lossless coding of the digital representation of a signal to reduce redundancy.
entry point: Refers to a point in a coded bit stream after which a decoder can become properly initialized and commence syntactically correct decoding. The first transmitted picture after an entry point is either an I-picture or a P-picture. If the first transmitted picture is not an I-picture, the decoder may produce one or more pictures during acquisition.
ES: See elementary stream.
essence: In its simplest form, essence = content – metadata. In this context, (video) essence is the image itself without any of the transport padding (H and V intervals, ancillary data, etc.).
event: An event is defined as a collection of elementary streams with a common time base, an associated start time, and an associated end time.
field: For an interlaced video signal, a “field” is the assembly of alternate lines of a frame. Therefore, an interlaced frame is composed of two fields, a top field and a bottom field.
FIR: Finite-impulse-response.
forbidden: This term, when used in clauses defining the coded bit stream, indicates that the value must never be used. This is usually to avoid emulation of start codes.
FPLL: Frequency and phase locked loop.
frame: A frame contains lines of spatial information of a video signal. For progressive video, these lines contain samples starting from one time instant and continuing through successive lines to the bottom of the frame. For interlaced video, a frame consists of two fields, a top field and a bottom field. One of these fields will commence one field later than the other.
GOP: See group of pictures.
group of pictures (GOP): A group of pictures consists of one or more pictures in sequence.
HDTV: See high-definition television.
high-definition television (HDTV): High-definition television provides significantly improved picture quality relative to conventional (analog NTSC) television and a wide screen format (16:9 aspect ratio). The ATSC Standard enables transmission of HDTV pictures at several frame rates and one of two picture formats; these are listed in the top two lines of Table 5.1. The ATSC Standard also enables the delivery of digital sound in various formats.
high level: A range of allowed picture parameters defined by the MPEG-2 video coding specification that corresponds to high-definition television.
Huffman coding: A type of source coding that uses codes of different lengths to represent symbols that have unequal likelihood of occurrence.
IEC: International Electrotechnical Commission.
intra coded pictures or I-pictures or I-frames: Pictures that are coded using information present only in the picture itself and not depending on information from other pictures. I-pictures provide a mechanism for random access into the compressed video data. I-pictures employ transform coding of the pel blocks and provide only moderate compression.
ISI: Intersymbol interference.
ISO: International Organization for Standardization.
ITU: International Telecommunication Union.
layer: One of the levels in the data hierarchy of the video and system specification.
level: A range of allowed picture parameters and combinations of picture parameters.
LMS: Least mean squares.
macroblock: In the DTV system a macroblock consists of four blocks of luminance and one block each of Cr and Cb.
main level: A range of allowed picture parameters defined by the MPEG-2 video coding specification with maximum resolution equivalent to ITU-R Recommendation 601.
main profile: A subset of the syntax of the MPEG-2 video coding specification.
Mbps: 1,000,000 bits per second.
motion vector: A pair of numbers that represent the vertical and horizontal displacement of a region of a reference picture for prediction.
MP@HL: Main profile at high level.
MP@ML: Main profile at main level.
MPEG: Refers to standards developed by the ISO/IEC JTC1/SC29 WG11, Moving Picture Experts Group. MPEG may also refer to the Group itself.
MPEG-1: Refers to ISO/IEC standards 11172-1 (Systems), 11172-2 (Video), 11172-3 (Audio), 11172-4 (Compliance Testing), and 11172-5 (Technical Report).
MPEG-2: Refers to ISO/IEC standards 13818-1 (Systems), 13818-2 (Video), 13818-3 (Audio), and 13818-4 (Compliance).
pack: A pack consists of a pack header followed by zero or more packets. It is a layer in the system coding syntax.
packet data: Contiguous bytes of data from an elementary data stream present in the packet.
packet identifier (PID): A unique integer value used to associate elementary streams of a program in a single or multi-program transport stream.
packet: A packet consists of a header followed by a number of contiguous bytes from an elementary data stream. It is a layer in the system coding syntax.
padding: A method to adjust the average length of an audio frame in time to the duration of the corresponding PCM samples, by continuously adding a slot to the audio frame.
payload: Payload refers to the bytes that follow the header bytes in a packet. For example, the payload of a transport stream packet includes the PES_packet_header and its PES_packet_data_bytes or pointer_field and PSI sections, or private data. A PES_packet_payload, however, consists only of PES_packet_data_bytes. The transport stream packet header and adaptation fields are not payload.
PCR: See program clock reference.
pel: See pixel.
PES packet header: The leading fields in a PES packet up to but not including the PES_packet_data_byte fields where the stream is not a padding stream. In the case of a padding stream, the PES packet header is defined as the leading fields in a PES packet up to but not including the padding_byte fields.
PES packet: The data structure used to carry elementary stream data. It consists of a packet header followed by PES packet payload.
PES stream: A PES stream consists of PES packets, all of whose payloads consist of data from a single elementary stream, and all of which have the same stream_id.
PES: Packetized elementary stream.
picture: Source, coded, or reconstructed image data. A source or reconstructed picture consists of three rectangular matrices representing the luminance and two chrominance signals.
PID: See packet identifier.
pixel: “Picture element” or “pel.” A pixel is a digital sample of the color intensity values of a picture at a single point.
predicted pictures or P-pictures or P-frames: Pictures that are coded with respect to the nearest previous I or P-picture. This technique is termed forward prediction. P-pictures provide more compression than I-pictures and serve as a reference for future P-pictures or B-pictures. P-pictures can propagate coding errors when P-pictures (or B-pictures) are predicted from prior P-pictures where the prediction is flawed.
presentation time-stamp (PTS): A field that may be present in a PES packet header that indicates the time that a presentation unit is presented in the system target decoder.
presentation unit (PU): A decoded audio access unit or a decoded picture.
profile: A defined subset of the syntax specified in the MPEG-2 video coding specification.
program clock reference (PCR): A time stamp in the transport stream from which decoder timing is derived.
program element: A generic term for one of the elementary streams or other data streams that may be included in the program.
program specific information (PSI): PSI consists of normative data that is necessary for the demultiplexing of transport streams and the successful regeneration of programs.
program: A program is a collection of program elements. Program elements may be elementary streams. Program elements need not have any defined time base; those that do have a common time base and are intended for synchronized presentation.
PSI: See program specific information.
PSIP: Program and System Information Protocol, as defined in ATSC A/65.
PTS: See presentation time-stamp.
quantizer: A processing step that intentionally reduces the precision of DCT coefficients.
random access: The process of beginning to read and decode the coded bit stream at an arbitrary point.
reserved: This term, when used in clauses defining the coded bit stream, indicates that the value may be used in the future for Digital Television Standard extensions. Unless otherwise specified, all reserved bits are set to “1”.
ROM: Read-only memory.
SAW filter: Surface-acoustic-wave filter.
SCR: See system clock reference.
scrambling: The alteration of the characteristics of a video, audio, or coded data stream in order to prevent unauthorized reception of the information in a clear form. This alteration is a specified process under the control of a conditional access system.
SDTV: See standard definition television.
slice: A series of consecutive macroblocks.
SMPTE: Society of Motion Picture and Television Engineers.
source stream: A single, non-multiplexed stream of samples before compression coding.
splicing: The concatenation performed on the system level of two different elementary streams. It is understood that the resulting stream must conform totally to the Digital Television Standard.
standard definition television (SDTV): This term is used to signify a digital television system in which the quality is approximately equivalent to that of NTSC. This equivalent quality may be achieved from pictures sourced at the 4:2:2 level of ITU-R Recommendation 601 and subjected to processing as part of bit rate compression. The results should be such that when judged across a representative sample of program material, subjective equivalence with NTSC is achieved. See also conventional definition television and ITU-R Recommendation 1125.
start codes: 32-bit codes embedded in the coded bit stream that are unique. They are used for several purposes including identifying some of the layers in the coding syntax. Start codes consist of a 24-bit prefix (0x000001) and an 8-bit stream_id.
STC: System time clock.
STD: See system target decoder.
STD input buffer: A first-in, first-out buffer at the input of a system target decoder for storage of compressed data from elementary streams before decoding.
still picture: A coded still picture consists of a video sequence containing exactly one coded picture that is intra-coded. This picture has an associated PTS and the presentation time of succeeding pictures, if any, is later than that of the still picture by at least two picture periods.
system clock reference (SCR): A time stamp in the program stream from which decoder timing is derived.
system header: The system header is a data structure that carries information summarizing the system characteristics of the Digital Television Standard multiplexed bit stream.
system target decoder (STD): A hypothetical reference model of a decoding process used to describe the semantics of the Digital Television Standard multiplexed bit stream.
time-stamp: A term that indicates the time of a specific action, such as the arrival of a byte or the presentation of a presentation unit.
TOV Threshold of visibility, defined as 2.5 data segment errors per second.
transport stream packet header The leading fields in a transport stream packet up to and including the continuity_counter field.
variable bit rate Operation where the bit rate varies with time during the decoding of a compressed bit stream.
VBV See video buffering verifier.
video buffering verifier (VBV) A hypothetical decoder that is conceptually connected to the output of an encoder. Its purpose is to provide a constraint on the variability of the data rate that an encoder can produce.
video sequence A video sequence is represented by a sequence header, one or more groups of pictures, and an end_of_sequence code in the data stream.
8 VSB Vestigial sideband modulation with 8 discrete amplitude levels.
16 VSB Vestigial sideband modulation with 16 discrete amplitude levels.

3.3 Symbols, Abbreviations, and Mathematical Operators
The symbols, abbreviations, and mathematical operators used to describe the Digital Television Standard are those adopted for use in describing MPEG-2 and are similar to those used in the “C” programming language. However, integer division with truncation and rounding are specifically defined. The bitwise operators are defined assuming two’s-complement representation of integers. Numbering and counting loops generally begin from 0.

3.3.1 Arithmetic Operators
+ Addition.
– Subtraction (as a binary operator) or negation (as a unary operator).
++ Increment.
-- Decrement.
* or × Multiplication.
^ Power.
/ Integer division with truncation of the result toward 0. For example, 7/4 and –7/–4 are truncated to 1 and –7/4 and 7/–4 are truncated to –1.
// Integer division with rounding to the nearest integer. Half-integer values are rounded away from 0 unless otherwise specified. For example 3//2 is rounded to 2, and –3//2 is rounded to –2.
DIV Integer division with truncation of the result towards –∞.
% Modulus operator. Defined only for positive numbers.
Sign( ) Sign(x) =  1   x > 0
                =  0   x == 0
                = –1   x < 0
NINT ( ) Nearest integer operator. Returns the nearest integer value to the real-valued argument. Half-integer values are rounded away from 0.
sin Sine.
cos Cosine.
exp Exponential.
√ Square root.
log10 Logarithm to base ten.
loge Logarithm to base e.

3.3.2 Logical Operators
|| Logical OR.
&& Logical AND.
! Logical NOT.

3.3.3 Relational Operators
> Greater than.
≥ Greater than or equal to.
< Less than.
≤ Less than or equal to.
== Equal to.
!= Not equal to.
max [,...,] The maximum value in the argument list.
min [,...,] The minimum value in the argument list.

3.3.4 Bitwise Operators
& AND.
| OR.
>> Shift right with sign extension.
<< Shift left with 0 fill.

3.3.5 Assignment
= Assignment operator.

3.3.6 Mnemonics
The following mnemonics are defined to describe the different data types used in the coded bit stream.
bslbf Bit string, left bit first, where “left” is the order in which bit strings are written in the Standard. Bit strings are written as a string of 1s and 0s within single quote marks, e.g. ‘1000 0001’. Blanks within a bit string are for ease of reading and have no significance.
uimsbf Unsigned integer, most significant bit first. The byte order of multi-byte words is most significant byte first.

3.3.7 Method of Describing Bit Stream Syntax
Each data item in the coded bit stream described below is in bold type. It is described by its name, its length in bits, and a mnemonic for its type and order of transmission.
The action caused by a decoded data element in a bit stream depends on the value of that data element and on data elements previously decoded. The decoding of the data elements and definition of the state variables used in their decoding are described in the clauses containing the semantic description of the syntax.
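The division and rounding operators of Section 3.3.1 differ from the built-in operators of most programming languages, so a small sketch may help. The Python function names below (div_trunc, div_round, div_floor, sign, nint) are illustrative only and do not appear in the Standard; the behavior is taken from the definitions and examples given in Section 3.3.1.

```python
import math


def div_trunc(a: int, b: int) -> int:
    """The '/' operator: integer division, result truncated toward 0."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def div_round(a: int, b: int) -> int:
    """The '//' operator: nearest integer, half-integers rounded away from 0."""
    q = (2 * abs(a) + abs(b)) // (2 * abs(b))
    return q if (a >= 0) == (b >= 0) else -q


def div_floor(a: int, b: int) -> int:
    """The DIV operator: result truncated toward minus infinity."""
    return a // b  # Python's own // is already floor division


def sign(x: float) -> int:
    """Sign(x): 1 for x > 0, 0 for x == 0, -1 for x < 0."""
    return (x > 0) - (x < 0)


def nint(x: float) -> int:
    """NINT(x): nearest integer, half-integers rounded away from 0."""
    return sign(x) * math.floor(abs(x) + 0.5)


print(div_trunc(7, 4), div_trunc(-7, 4))   # 1 -1, matching the examples in the text
print(div_round(3, 2), div_round(-3, 2))   # 2 -2, matching the examples in the text
print(div_floor(-7, 4))                    # -2, whereas div_trunc(-7, 4) gives -1
```

Note that Python's built-in // corresponds to DIV, not to the Standard's // operator, which is why the rounding version is written out explicitly.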
The following constructs are used to express the conditions when data elements are present, and are in normal type. Note this syntax uses the “C” code convention that a variable or expression evaluating to a non-zero value is equivalent to a condition that is true.

while ( condition ) {
    data_element
    ...
}
    If the condition is true, then the group of data elements occurs next in the data stream. This repeats until the condition is not true.

do {
    data_element
    ...
} while ( condition )
    The data element always occurs at least once. The data element is repeated until the condition is not true.

if ( condition ) {
    data_element
    ...
}
    If the condition is true, then the first group of data elements occurs next in the data stream.

else {
    data_element
    ...
}
    If the condition is not true, then the second group of data elements occurs next in the data stream.

for (i = 0;i

100 dB SPL in some home theatre setups). If such abuses occur, there may be a demand for regulatory enforcement of audio levels. Fortunately, bit streams that contain an incorrect value of dialnorm are easily corrected by simply changing the value of the 5-bit dialnorm field in the BSI header.
There are two primary methods that broadcast organizations may employ to ensure that the value of dialnorm is set correctly. The first method is to select a suitable dialogue level for use with all programming and conform all baseband audio programs to this level prior to AC-3 encoding. Then the value of dialnorm can be set to one common value for all programs that are encoded. Conforming all programs to a common dialogue level may mean that for some programs the audio level never approaches 100 percent digital level (since they have to be reduced in gain), while for other programs non-reversible (by the receiver) limiting must be engaged in order to prevent them from going over digital 100 percent (since they had to be increased in gain).
Pre-encoded programs can be included in broadcasts if they have had the value of dialnorm correctly set, and the receiver will then conform the level.
The second (and generally preferred) method is to let all programming enter the encoder at full level, and correct for differing levels by adjusting the encoded value of dialnorm to be correct for each program. In this case, the conforming to a common level is done at the receiver. This method will become more practical as computer remote control of the encoding equipment becomes commonplace. The database for each audio program to be encoded would include (along with items such as number of channels, language, etc.) the dialogue level. The master control computer would then communicate the value of dialogue level to the audio encoder, which would then place the appropriate value in the bit stream.
In the case where a complete audio program is formed from the combination of a main and an associated service, each of the two services being combined will have a value of dialnorm, and the values may not be identical. In this case, the value of dialnorm in each bit stream should be used to alter the level of the audio decoded from that bit stream, prior to the mixing process that combines the audio from the two bit streams to form the complete audio program.

6.5.2 Dynamic Range Compression
It is common practice for high quality programming to be produced with wide dynamic range audio, suitable for the highest quality audio reproduction environment. Broadcasters, serving a wide audience, typically process audio in order to reduce its dynamic range. The processed audio is more suitable for the majority of the audience that does not have an audio reproduction environment which matches that of the original audio production studio. In the case of NTSC, all viewers receive the same audio with the same dynamic range, and it is impossible for any viewer to enjoy the original wide dynamic range audio production.
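As an illustration of the dialnorm handling described in Section 6.5.1, where each service is level-adjusted by its own dialnorm value before a main and an associated service are mixed, the following Python sketch may be useful. It rests on an assumption taken from the AC-3 standard (A/52) rather than from this Guide: dialnorm values 1 to 31 indicate a dialogue level 1 to 31 dB below digital 100 percent, the value 0 is reserved and treated as 31, and the decoder attenuates by (31 minus dialnorm) dB. The function names are hypothetical.

```python
def dialnorm_attenuation_db(dialnorm: int) -> int:
    """Attenuation in dB that brings dialogue to a common -31 dB level.

    Assumption (from A/52, not stated in this Guide): values 1..31 mean the
    dialogue level is 1..31 dB below digital 100 percent; 0 is reserved and
    is treated as 31.
    """
    if not 0 <= dialnorm <= 31:
        raise ValueError("dialnorm is a 5-bit field")
    level = 31 if dialnorm == 0 else dialnorm
    return 31 - level


def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def mix_services(main, main_dialnorm, assoc, assoc_dialnorm):
    """Level-match each decoded service by its own dialnorm, then mix."""
    g_main = db_to_gain(-dialnorm_attenuation_db(main_dialnorm))
    g_assoc = db_to_gain(-dialnorm_attenuation_db(assoc_dialnorm))
    return [g_main * m + g_assoc * a for m, a in zip(main, assoc)]


# A main service with dialogue at -27 dB is attenuated by 4 dB; an associated
# service with dialogue already at -31 dB is left unchanged before mixing.
print(dialnorm_attenuation_db(27), dialnorm_attenuation_db(31))
```

The sample lists stand in for decoded PCM of a single channel; a real decoder would apply the same per-service gains to every channel before the mixing stage.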
The audio coding system provides an embedded dynamic range control system that allows a common encoded bit stream to deliver programming with a dynamic range appropriate for each individual listener. A dynamic range control value (dynrng) is provided in each audio block (every 5 ms). These values are used by the audio decoder in order to alter the level of the reproduced audio for each audio block. Level variations of up to ±24 dB may be indicated. The values of dynrng are generated in order to provide a subjectively pleasing but restricted dynamic range. The unaffected level is dialogue level. For sounds louder than dialogue, values of dynrng will indicate gain reduction. For sounds quieter than dialogue, values of dynrng will indicate a gain increase. The broadcaster is in control of the values of dynrng, and can supply values that generate the amount of compression which the broadcaster finds appropriate. The use of dialogue level as the unaffected level further improves loudness uniformity.
By default, the values of dynrng will be used by the audio decoder. The receiver will thus reproduce audio with a reduced dynamic range, as intended by the broadcaster. The receiver may also offer the viewer the option to scale the value of dynrng in order to reduce the effect of the dynamic range compression that was introduced by the broadcaster. In the limiting case, if the value of dynrng is scaled to zero, then the audio will be reproduced with its full original dynamic range. The optional scaling of dynrng can be done differently for values indicating gain reduction (which reduces the levels of loud sounds) and for values indicating gain increases (which makes quiet sounds louder). Thus the viewer may be given independent control of the amount of compression applied to loud and quiet sounds.
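The optional receiver-side scaling of dynrng described above can be sketched as follows. This is a minimal Python illustration, not decoder code: it assumes the dynrng word has already been converted to a gain in dB (negative for gain reduction, positive for gain increase), and the names scaled_gain_db, apply_to_block, cut_scale, and boost_scale are hypothetical.

```python
def scaled_gain_db(dynrng_db: float, cut_scale: float = 1.0,
                   boost_scale: float = 1.0) -> float:
    """Scale the broadcaster's gain word by a listener-chosen factor.

    cut_scale applies to gain reductions (loud sounds), boost_scale to gain
    increases (quiet sounds). 1.0 keeps the broadcaster's compression and
    0.0 restores the original dynamic range.
    """
    for s in (cut_scale, boost_scale):
        if not 0.0 <= s <= 1.0:
            raise ValueError("scale factors are assumed to lie in [0, 1]")
    scale = cut_scale if dynrng_db < 0 else boost_scale
    return dynrng_db * scale


def apply_to_block(samples, dynrng_db, cut_scale=1.0, boost_scale=1.0):
    """Apply the (possibly scaled) gain to one audio block of samples."""
    gain = 10.0 ** (scaled_gain_db(dynrng_db, cut_scale, boost_scale) / 20.0)
    return [gain * s for s in samples]


# Default behavior: the full 12 dB of gain reduction is applied.
print(scaled_gain_db(-12.0))
# Listener halves the compression of loud sounds only.
print(scaled_gain_db(-12.0, cut_scale=0.5))
```

Keeping the two scale factors separate mirrors the independent control over loud and quiet sounds mentioned in the text.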
Therefore, while the broadcaster may introduce dynamic range compression to suit the needs of most of the audience, individual listeners may have the option to choose to enjoy the audio program with more or all of its original dynamic range intact.
The dynamic range control words may be generated by the AC-3 encoder. They may also be generated by a processor located before or after the encoder. If the dynamic range processor is located prior to the encoder, there is a path to convey the dynamic range control words from the processor to the encoder, or to a bit stream processor, so that the control words may be inserted into the bit stream. If the dynamic range processor is located after the encoder, it can act upon an encoded stream and directly insert the control words without altering the encoded audio. In general, encoded bit streams may have dynamic range control words inserted or modified without affecting the encoded audio.
When it is necessary to alter subjectively the dynamic range of audio programs, the method built into the audio coding subsystem should be used. The system should provide a transparent pathway, from the audio program produced in the audio post production studio, into the home. Signal processing devices such as compressors or limiters that alter the audio signal should not be inserted into the audio signal chain. Use of the dynamic range control system embedded within the audio coding system allows the broadcaster or program provider to appropriately limit the delivered audio dynamic range without actually affecting the audio signal itself. The original audio is delivered intact and is accessible to those listeners who wish to enjoy it.
In the case where a complete audio program is formed from the combination of a main and an associated service, each of the two services being combined may have a dynamic range control signal.
In most cases, the dynamic range control signal contained in a particular bit stream applies to the audio channels coded in that bit stream. There are three exceptions:
• A single-channel visually impaired (VI) associated service containing only a narrative describing the picture content
• A single-channel commentary (C) service containing only the commentary channel
• A voice-over (VO) associated service
In these cases, the dynamic range control signal in the associated service elementary stream is used by the decoder to control the audio level of the main audio service. This allows the provider of the VI, C, or VO service the ability to alter the level of the main audio service in order to make the VI, C, or VO services intelligible. In these cases the main audio service level is controlled by both the control signal in the main service and the control signal in the associated service.

6.6 Main, Associated, and Multi-Lingual Services
6.6.1 Overview
An AC-3 elementary stream contains the encoded representation of a single audio service. Multiple audio services are provided by multiple elementary streams. Each elementary stream is conveyed by the transport multiplex with a unique PID. There are a number of audio service types that may (individually) be coded into each elementary stream. Each elementary stream is tagged as to its service type using the bsmod bit field. There are two types of main service and six types of associated service. Each associated service may be tagged (in the AC-3 audio descriptor in the transport PSI data) as being associated with one or more main audio services. Each AC-3 elementary stream may also be tagged with a language code.
Associated services may contain complete program mixes, or may contain only a single program element. Associated services that are complete mixes may be decoded and used as is.
They are identified by the full_svc bit in the AC-3 descriptor (see [4], Annex A). Associated services that contain only a single program element are intended to be combined with the program elements from a main audio service.
This section describes each type of service and gives usage guidelines. In general, a complete audio program (what is presented to the listener over the set of loudspeakers) may consist of a main audio service, an associated audio service that is a complete mix, or a main audio service combined with one associated audio service. The capability to simultaneously decode one main service and one associated service is required in order to form a complete audio program in certain service combinations described in this Section. This capability may not exist in some receivers.

6.6.2 Summary of Service Types
The service types that correspond to each value of bsmod are defined in the Digital Audio Compression (AC-3) Standard and in Annex B of the Digital Television Standard. The information is reproduced in Table 6.1 and the following paragraphs briefly describe the meaning of these service types.

Table 6.1 Table of Service Types
bsmod      Type of Service
000 (0)    Main audio service: complete main (CM)
001 (1)    Main audio service: music and effects (ME)
010 (2)    Associated service: visually impaired (VI)
011 (3)    Associated service: hearing impaired (HI)
100 (4)    Associated service: dialogue (D)
101 (5)    Associated service: commentary (C)
110 (6)    Associated service: emergency (E)
111 (7)    Associated service: voice-over (VO)

6.6.2.1 Complete Main Audio Service (CM)
This is the normal mode of operation. All elements of a complete audio program are present. The audio program may be any number of channels from 1 to 5.1 (see footnote 6).
Footnote 6: 5.1 channel sound refers to a system reproducing the following signals: right, center, left, right surround, left surround, and low-frequency enhancement (LFE).
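The bsmod values of Table 6.1 map naturally onto an enumeration. The Python sketch below is illustrative only: the class name ServiceType and the helper functions are hypothetical, and the mapping simply restates Table 6.1 together with the statement in Section 6.6.1 that there are two main service types and six associated service types.

```python
from enum import IntEnum


class ServiceType(IntEnum):
    """Service types by bsmod value, following Table 6.1."""
    CM = 0  # main audio service: complete main
    ME = 1  # main audio service: music and effects
    VI = 2  # associated service: visually impaired
    HI = 3  # associated service: hearing impaired
    D = 4   # associated service: dialogue
    C = 5   # associated service: commentary
    E = 6   # associated service: emergency
    VO = 7  # associated service: voice-over


def service_from_bsmod(bits: str) -> ServiceType:
    """Map a 3-bit bsmod string such as '010' to its service type."""
    if len(bits) != 3 or set(bits) - {"0", "1"}:
        raise ValueError("bsmod is written here as three binary digits")
    return ServiceType(int(bits, 2))


def is_main_service(service: ServiceType) -> bool:
    """CM and ME are the two main service types; the rest are associated."""
    return service in (ServiceType.CM, ServiceType.ME)


print(service_from_bsmod("010").name)                   # VI
print(is_main_service(service_from_bsmod("001")))       # True
```

The bit-string argument follows the bslbf notation of Section 3.3.6, with the left bit first.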
6.6.2.2 Main Audio Service, Music and Effects (ME)
All elements of an audio program are present except for dialogue. This audio program may contain from 1 to 5.1 channels. Dialogue may be provided by a D associated service (that may be simultaneously decoded and added to form a complete program).

6.6.2.3 Associated Service: Visually Impaired (VI)
This is typically a single-channel service, intended to convey a narrative description of the picture content for use by the visually impaired, and intended to be decoded along with the main audio service. The VI service also may be provided as a complete mix of all program elements, in which case it may use any number of channels (up to 5.1).

6.6.2.4 Associated Service: Hearing Impaired (HI)
This is typically a single-channel service, intended to convey dialogue that has been processed for increased intelligibility for the hearing impaired, and intended to be decoded along with the main audio service. The HI service also may be provided as a complete mix of all program elements, in which case it may use any number of channels (up to 5.1).

6.6.2.5 Associated Service: Dialogue (D)
This service conveys dialogue intended to be mixed into a main audio service (ME) that does not contain dialogue.

6.6.2.6 Associated Service: Commentary (C)
This service typically conveys a single-channel of commentary intended to be optionally decoded along with the main audio service. This commentary channel differs from a dialogue service, in that it contains optional instead of necessary program content. The C service also may be provided as a complete mix of all program elements, in which case it may use any number of channels (up to 5.1).

6.6.2.7 Associated Service: Emergency Message (E)
This is a single-channel service, which is given priority in reproduction. If this service type appears in the transport multiplex, it is routed to the audio decoder.
If the audio decoder receives this service type, it will decode and reproduce the E channel while muting the main service.

6.6.2.8 Associated Service: Voice-Over (VO)
This is a single-channel service intended to be decoded and added into the center loudspeaker channel.

6.6.3 Multi-Lingual Services
Each audio bit stream may be in any language. In order to provide audio services in multiple languages a number of main audio services may be provided, each in a different language. This is the (artistically) preferred method, because it allows unrestricted placement of dialogue along with the dialogue reverberation. The disadvantage of this method is that as much as 384 kbps is needed to provide a full 5.1-channel service for each language. One way to reduce the required bit-rate is to reduce the number of audio channels provided for languages with a limited audience. For instance, alternate language versions could be provided in 2-channel stereo with a bit-rate of 128 kbps. Or, a mono version can be supplied at a bit-rate of approximately 64–96 kbps.
Another way to offer service in multiple languages is to provide a main multi-channel audio service (ME) that does not contain dialogue. Multiple single-channel dialogue associated services (D) can then be provided, each at a bit-rate of approximately 64–96 kbps. Formation of a complete audio program requires that the appropriate language D service be simultaneously decoded and mixed into the ME service. This method allows a large number of languages to be efficiently provided, but at the expense of artistic limitations. The single-channel of dialogue would be mixed into the center reproduction channel, and could not be panned. Also, reverberation would be confined to the center channel, which is not optimum. Nevertheless, for some types of programming (sports, etc.) this method is very attractive due to the savings in bit rate it offers.
Some receivers may not have the capability to simultaneously decode an ME and a D service.
Stereo (two-channel) service without artistic limitation can be provided in multiple languages with added efficiency by transmitting a stereo ME main service along with stereo D services. The ME service and the appropriate language D service are simply combined in the receiver into a complete stereo program. Dialogue may be panned, and reverberation may be included in both channels. A stereo ME service can be sent with high quality at 192 kbps, while the stereo D services (voice only) can make use of lower bit-rates, such as 128 or 96 kbps per language. Some receivers may not have the capability to simultaneously decode an ME and a D service.
Note that during those times when dialogue is not present, the D services can be momentarily removed, and their data capacity used for other purposes.

6.6.4 Detailed Description of Service Types
6.6.4.1 CM—Complete Main Audio Service
The CM type of main audio service contains a complete audio program (complete with dialogue, music, and effects). This is the type of audio service normally provided. The CM service may contain from 1 to 5.1 audio channels. The CM service may be further enhanced by means of the VI, HI, C, E, or VO associated services described below. Audio in multiple languages may be provided by supplying multiple CM services, each in a different language.

6.6.4.2 ME—Main Audio Service, Music and Effects
The ME type of main audio service contains the music and effects of an audio program, but not the dialogue for the program. The ME service may contain from 1 to 5.1 audio channels. The primary program dialogue is missing and (if any exists) is supplied by providing a D associated service. Multiple D services in different languages may be associated with a single ME service.

6.6.4.3 VI—Visually Impaired
The VI associated service typically contains a narrative description of the visual program content.
In this case, the VI service is a single audio channel. Simultaneous reproduction of the VI service and the main audio service allows the visually impaired user to enjoy the main multi-channel audio program, as well as to follow the on-screen activity. This allows the VI service to be mixed into one of the main reproduction channels (the choice of channel may be left to the listener) or to be provided as a separate output (which, for instance, might be delivered to the VI user via open-air headphones).
The dynamic range control signal in this type of VI service is intended to be used by the audio decoder to modify the level of the main audio program. Thus the level of the main audio service will be under the control of the VI service provider, and the provider may signal the decoder (by altering the dynamic range control words embedded in the VI audio elementary stream) to reduce the level of the main audio service by up to 24 dB in order to assure that the narrative description is intelligible.
Besides providing the VI service as a single narrative channel, the VI service may be provided as a complete program mix containing music, effects, dialogue, and the narration. In this case, the service may be coded using any number of channels (up to 5.1), and the dynamic range control signal applies only to this service. The fact that the service is a complete mix is indicated in the AC-3 descriptor (see A/52, Annex A).

6.6.4.4 HI—Hearing Impaired
The HI associated service typically contains only a single-channel of dialogue and is intended for use by those whose hearing impairments make it difficult to understand the dialogue in the presence of music and sound effects. The dialogue may be processed for increased intelligibility by the hearing impaired. The hearing impaired listener may wish to listen to a mixture of the single-channel HI dialogue track and the main program audio.
Simultaneous reproduction of the HI service along with the CM service allows the HI listener to adjust the mixture to control the emphasis on dialogue over music and effects. The HI channel would typically be mixed into the center channel. An alternative would be to deliver the HI signal to a discrete output (which, for instance, might feed a set of open-air headphones worn only by the HI listener).
Besides providing the HI service as a single narrative channel, the HI service may be provided as a complete program mix containing music, effects, and dialogue with enhanced intelligibility. In this case, the service may be coded using any number of channels (up to 5.1). The fact that the service is a complete mix is indicated in the AC-3 descriptor (see [4], Annex A).

6.6.4.5 D—Dialogue
The dialogue associated service is employed when it is desired to most efficiently offer multi-channel audio in several languages simultaneously, and the program material is such that the restrictions (no panning, no multi-channel reverberation) of a single dialogue channel may be tolerated. When the D service is used, the main service is of type ME (music and effects). In the case that the D service contains a single-channel, simultaneously decoding the ME service along with the selected D service allows a complete audio program to be formed by mixing the D channel into the center channel of the ME service. Typically, when the main audio service is of type ME, there will be several different language D services available. The transport demultiplexer may be designed to select the appropriate D service to deliver to the audio decoder based on the listener’s language preference (which would typically be stored in memory in the receiver). Or, the listener may explicitly instruct the receiver to select a particular language track, overriding the default selection.
If the ME main audio service contains more than two audio channels, the D service will be monophonic (1/0 mode).
If the main audio service contains two channels, the D service may contain two channels (2/0 mode). In this case, a complete audio program is formed by simultaneously decoding the D service and the ME service, mixing the left channel of the ME service with the left channel of the D service, and mixing the right channel of the ME service with the right channel of the D service. The result will be a two-channel stereo signal containing music, effects, and dialogue.

6.6.4.6 C—Commentary
The commentary associated service is similar to the D service, except that instead of conveying primary program dialogue, the C service conveys optional program commentary. When C service(s) are provided, the receiver may notify the listener of their presence. The listener should be able to call up information (probably on-screen) about the various available C services, and optionally request one of them to be selected for decoding along with the main service. The C service may be added to any loudspeaker channel (the listener may be given this control). Typical uses for the C service might be optional added commentary during a sporting event, or different levels (novice, intermediate, advanced) of commentary available to accompany documentary or educational programming.
The C service may be a single audio channel containing only the commentary content. In this case, simultaneous reproduction of a C service and a CM service will allow the listener to hear the added program commentary.
The dynamic range control signal in the single-channel C service is intended to be used by the audio decoder to modify the level of the main audio program.
Thus the level of the main audio service will be under the control of the C service provider, and the provider may signal the decoder (by altering the dynamic range control words embedded in the C audio elementary stream) to reduce the level of the main audio service by up to 24 dB in order to assure that the commentary is intelligible.
Besides providing the C service as a single commentary channel, the C service may be provided as a complete program mix containing music, effects, dialogue, and the commentary. In this case the service may be provided using any number of channels (up to 5.1). The fact that the service is a complete mix is indicated in the AC-3 descriptor (see [4], Annex A).

6.6.4.7 E—Emergency
The E associated service is intended to allow the insertion of emergency announcements. The normal audio services do not necessarily have to be replaced in order for the emergency message to get through. The transport demultiplexer gives first priority to this type of audio service. Whenever an E service is present, it is delivered to the audio decoder by the transport subsystem. When the audio decoder receives an E type associated service, it stops reproducing any main service being received and only reproduces the E service. The E service may also be used for non-emergency applications. It may be used whenever the broadcaster wishes to force all decoders to quit reproducing the main audio program and substitute a higher priority single-channel service.

6.6.4.8 VO—Voice-Over
It is possible to use the E service for announcements, but the use of the E service leads to a complete substitution of the voice-over for the main program audio. The voice-over associated service is similar to the E service, except that it is intended to be reproduced along with the main service. The systems demultiplexer gives second priority to this type of associated service (second only to an E service).
The VO service is intended to be simultaneously decoded and mixed into the center channel of the main audio service that is being decoded. The dynamic range control signal in the VO service is intended to be used by the audio decoder to modify the level of the main audio program. Thus the level of the main audio service will be under the control of the broadcaster, and the broadcaster may signal the decoder (by altering the dynamic range control words embedded in the VO audio bit stream) to reduce the level of the main audio service by up to 24 dB during the voice-over. The VO service allows typical voice-overs to be added to an already encoded audio bit stream, without requiring the audio to be decoded back to baseband and then re-encoded. However, space must be available within the transport multiplex to make room for the insertion of the VO service.

6.7 Audio Bit Rates
6.7.1 Typical Audio Bit Rates
The information in Table 6.2 provides a general guideline as to the audio bit rates that are expected to be most useful. For main services, the use of the LFE channel is optional and will not affect the indicated data rates.

6.7.2 Audio Bit Rate Limitations
The audio decoder input buffer size (and thus part of the decoder cost) is determined by the maximum bit rate that must be decoded. The syntax of the AC-3 standard supports bit rates ranging from a minimum of 32 kbps up to a maximum of 640 kbps per individual elementary bit stream. The bit rate utilized in the digital television system is restricted to 448 kbps in order to reduce the size of the input buffer in the audio decoder, and thus the receiver cost. Receivers can be expected to support the decoding of a main audio service, or an associated audio service that is a complete service (containing all necessary program elements), at a bit rate up to and including 448 kbps.
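The bit rate limits of Section 6.7.2, including the transmission limits stated in the paragraphs that follow (448 kbps for a main or complete service, 128 kbps for a single-channel associated service, 192 kbps for a dual-channel dialogue service, and 576 kbps combined), can be sketched as a simple check. The Python below is a hypothetical helper, not part of any ATSC tool; the constant and function names are illustrative.

```python
MAX_SERVICE_KBPS = 448       # main service, or associated service that is a complete mix
MAX_ASSOC_1CH_KBPS = 128     # single-channel associated service decoded with a main service
MAX_ASSOC_2CH_D_KBPS = 192   # dual-channel dialogue (D) service decoded with a main service
MAX_COMBINED_KBPS = 576      # main plus associated service reproduced simultaneously


def check_simultaneous_pair(main_kbps: int, assoc_kbps: int,
                            assoc_channels: int) -> list:
    """Return a list of limit violations for a main + associated pair."""
    if assoc_channels == 1:
        assoc_limit = MAX_ASSOC_1CH_KBPS
    elif assoc_channels == 2:
        assoc_limit = MAX_ASSOC_2CH_D_KBPS
    else:
        raise ValueError("only 1- and 2-channel associated services are covered")

    problems = []
    if main_kbps > MAX_SERVICE_KBPS:
        problems.append(f"main service exceeds {MAX_SERVICE_KBPS} kbps")
    if assoc_kbps > assoc_limit:
        problems.append(f"associated service exceeds {assoc_limit} kbps")
    if main_kbps + assoc_kbps > MAX_COMBINED_KBPS:
        problems.append(f"combined rate exceeds {MAX_COMBINED_KBPS} kbps")
    return problems


print(check_simultaneous_pair(448, 128, 1))   # [] since 448 + 128 = 576 is allowed
print(check_simultaneous_pair(448, 192, 2))   # combined rate of 640 kbps is too high
```

The second call shows why the combined limit matters: each service is individually within its limit, yet the pair cannot be transmitted for simultaneous reproduction.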
Transmissions may contain main audio services, or associated audio services that are complete services (containing all necessary program elements), encoded at a bit rate up to and including 448 kbps.
Transmissions may contain single-channel associated audio services intended to be simultaneously decoded along with a main service encoded at a bit rate up to and including 128 kbps.
Transmissions may contain dual-channel dialogue associated services intended to be simultaneously decoded along with a main service encoded at a bit rate up to and including 192 kbps.
Transmissions have a further limitation that the combined bit rate of a main and an associated service that are intended to be simultaneously reproduced is less than or equal to 576 kbps.

Table 6.2 Typical Audio Bit Rate
Type of Service                                                                  Number of Channels   Typical Bit Rates
CM, ME, or associated audio service containing all necessary program elements   5                    384–448 kbps
CM, ME, or associated audio service containing all necessary program elements   4                    320–384 kbps
CM, ME, or associated audio service containing all necessary program elements   3                    192–320 kbps
CM, ME, or associated audio service containing all necessary program elements   2                    128–256 kbps
VI, narrative only                                                               1                    64–128 kbps
HI, narrative only                                                               1                    64–96 kbps
D                                                                                1                    64–128 kbps
D                                                                                2                    96–192 kbps
C, commentary only                                                               1                    64–128 kbps
E                                                                                1                    64–128 kbps
VO                                                                               1                    64–128 kbps

7. DTV TRANSPORT
7.1 Introduction
The ATSC DTV system described in core documents A/52 and A/53 provides the framework for conveying information to consumers. Built into this framework is a toolkit of features that can be used to extend the capabilities of the DTV system far beyond what the initial designers might have envisioned. This extensibility is, perhaps, the greatest benefit of digital technology, as well as the source of some confusion about what is required by the Digital Television Standard.
This section provides a tutorial description of the functionality and format of the transport subsystem employed in the ATSC DTV system. It is intended to aid the reader in understanding and applying the precise specification of the transport subsystem given in the underlying normative standards documents. The ATSC transport subsystem standard is based on the MPEG-2 Systems standard (ISO/IEC 13818-1) [13] and is further constrained and extended by Annex C of the Digital Television Standard (A/53) [5]. The MPEG-2 Standard was developed by the Moving Picture Experts Group, part of the International Organization for Standardization (ISO).

The transport subsystem employs the fixed-length transport stream packetization approach defined in ISO/IEC 13818-1, which is usually referred to as the MPEG-2 Systems Standard. This approach is well-suited to the needs of terrestrial broadcast and cable television transmission of digital television. The use of relatively short, fixed-length packets matches well with the needs and techniques for error protection in both terrestrial broadcast and cable television distribution environments.

The ATSC DTV transport may carry a number of television programs. The MPEG-2 term “program” corresponds to an individual digital TV channel or data service, where each program is composed of a number of MPEG-2 program elements (i.e., related video, audio, and data streams). The MPEG-2 Systems Standard support for multiple channels or services within a single, multiplexed bit stream enables the deployment of practical, bandwidth-efficient digital broadcasting systems. It also provides great flexibility to accommodate the initial needs of the service to multiplex video, audio, and data while providing a well-defined path to add additional services in the future in a fully backward-compatible manner. By basing the transport subsystem on MPEG-2, maximum interoperability with other media and standards is maintained.
Figure 4.1 illustrates the organization of a digital television transmitter-receiver pair and the location of the transport subsystem in the overall system. The transport subsystem resides between the application (e.g., audio or video) encoding and decoding functions and the transmission subsystem. At its lowest layer, the encoder transport subsystem is responsible for formatting the encoded bits and multiplexing the different components of the program for transmission. At the receiver, it is responsible for recovering the bit streams for the individual application decoders and for the corresponding error signaling. The transport subsystem also incorporates other higher-level functionality related to identification of applications and synchronization of the receiver.

7.2 MPEG-2 Basics

The MPEG-2 standards are built upon the foundations of the MPEG-1 standards [11]. While the MPEG-1 standards were developed primarily to address the then upcoming video CD marketplace’s need for an interoperable solution for compressed digital video storage and real-time playback at rates of about 1.5 Mbps, MPEG-2 was developed to primarily address the broadcast digital television and DVD markets and includes new features such as:
• Improved video and audio compression technologies
• Encoding support for both 4:2:0 and 4:2:2 video
• Support for the transmission of the coded bit streams in error-prone environments
• Support for multiple programs (“channels”) in a single, multiplexed stream.
  This includes improved synchronization with the capability for each program to have a unique time-base, and the ability to describe and identify a network consisting of multiple multiplexed streams, each containing multiple programs
• Conditional access support
• Stream buffer management including buffer initialization
• Private data transport support

In contrast to previously developed standards, the MPEG-2 standards were designed to support full ITU-R 601 standard-definition resolutions, high-definition resolutions, and interlaced sequences. The MPEG-2 standards were also designed to support multi-channel networks carried in error-prone environments (such as terrestrial broadcasting), and the basic constructs used to encapsulate private data and a multitude of data essence formats. MPEG-2 standards are the foundation of several digital television technologies including digital set top boxes (STB), high-definition television (HDTV), and data broadcasting.

The MPEG-2 Systems Standard [13] defines the bit stream syntax and the methods necessary for (de)multiplexing, transporting, and synchronizing coded video, coded audio, and other data (including data essence not defined by the MPEG standards, referred to as “private data”). The standard includes the definition of packet formats, the synchronization and timing model, the mechanism for identifying content carried in the bit stream, and the buffer models used to enable a receiving device to properly decode and reconstruct the video, audio, and/or data presentation. The MPEG-2 Systems Standard as constrained and extended by the ATSC is the basis for the remainder of this section.

7.2.1 Standards Layering

The MPEG-2 Systems Standard (ISO/IEC 13818-1) [13] provides a toolkit that can be used to create the DTV transport bit stream. This toolkit can be thought of as providing general purpose functionality.
Users of the MPEG-2 standards (such as the ATSC) choose tools from the toolkit and specify how they may be used (i.e., specify constraints on the syntax and semantics of the MPEG-2 standards). A/53 describes which portions of the MPEG-2 Systems Standard are to be used in creating the ATSC bit stream and also describes the constraints imposed. In addition to constraining the MPEG-2 Systems Standard, the ATSC has also created compatible extensions to the standard. Some syntactical fields in the MPEG-2 Systems Standard are user defined; other fields have user private ranges. The ATSC is considered a “user” of the MPEG-2 standards and has used the user private areas to create ATSC standardized extensions to the MPEG-2 standards.

7.3 MPEG-2 Transport Stream Packet

An MPEG-2 Transport Stream is a continuous series of MPEG-2 Transport Stream packets. An MPEG-2 Transport Stream packet is 188 bytes in length and always begins with the synchronization byte 0x47.

7.3.1 MPEG-2 TS Packet Structure

The first four bytes of the MPEG-2 Transport Stream packet are the Transport Stream packet header. The remaining 184 bytes of an MPEG-2 Transport Stream packet may contain an optional adaptation field and up to 184 bytes of Transport Stream packet payload. If the adaptation field is present, it immediately follows the last byte of the Transport Stream packet header. The adaptation field is part of neither the Transport Stream packet header nor the Transport Stream packet payload. When the adaptation field is present, the MPEG-2 Transport Stream packet payload’s size is 184 bytes minus the length of the adaptation field. The definition of the contents of an MPEG-2 Transport Stream packet payload may differ depending upon the MPEG-2 stream_type and the encapsulation method.
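
The packet structure described above can be illustrated with a minimal Python sketch. It is not taken from the standard: the function names `split_ts_packets` and `payload_size` are hypothetical, and the sketch assumes the input buffer is already aligned on a packet boundary. The argument to `payload_size` is assumed to be the total number of bytes the adaptation field occupies, including its own one-byte length field.

```python
from typing import List

TS_PACKET_SIZE = 188   # bytes per Transport Stream packet
TS_HEADER_SIZE = 4     # bytes in the Transport Stream packet header
SYNC_BYTE = 0x47       # first byte of every Transport Stream packet


def split_ts_packets(data: bytes) -> List[bytes]:
    """Split an aligned buffer into 188-byte packets, checking each sync byte."""
    if len(data) % TS_PACKET_SIZE != 0:
        raise ValueError("buffer length is not a multiple of 188 bytes")
    packets = []
    for offset in range(0, len(data), TS_PACKET_SIZE):
        packet = data[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"missing sync byte at offset {offset}")
        packets.append(packet)
    return packets


def payload_size(adaptation_field_bytes: int = 0) -> int:
    """Payload bytes left after the header and the adaptation field (if any)."""
    remaining = TS_PACKET_SIZE - TS_HEADER_SIZE  # 184 bytes follow the header
    if not 0 <= adaptation_field_bytes <= remaining:
        raise ValueError("adaptation field cannot exceed 184 bytes")
    return remaining - adaptation_field_bytes
```

For example, a packet with no adaptation field leaves 184 bytes for payload, while one whose adaptation field occupies 8 bytes leaves 176.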

7.3.2 MPEG-2 Transport Stream Packet Syntax

In the packet header, the Packet Identifier (PID) is a 13-bit value used to identify multiplexed packets within the MPEG-2 Transport Stream. Assigning a unique PID value to each bit stream allows Transport Stream packets from up to 8192 (2^13) separate bit streams to be simultaneously carried within the MPEG-2 Transport Stream. Note that not all bit streams are MPEG-2 Program Elements (e.g., PSI), but all Program Elements are bit streams. The PID provides a unique bit stream (and, therefore, Program Element) association for each Transport Stream packet.

The payload_unit_start_indicator is used to signal to the decoder (by being set to ‘1’) that the first byte of something “interesting” can be found within the payload of the current MPEG-2 Transport Stream packet: an MPEG-2 PES packet (see Section 7.4.4) or an MPEG-2 section (see Section 7.4.1). This form of signaling, combined with hardware filtering in the decoder, allows for considerable efficiencies in decoding the contents of the stream. A PES packet must always commence as the first byte of the Transport Stream packet payload and only a single PES packet may begin in a Transport Stream packet. Thus, two PES packets (or portions thereof) are not permissible in a single Transport Stream packet.

For MPEG-2 sections (PSI and private sections) carried as payload, when the payload_unit_start_indicator field is set to ‘1’, then the first byte of the MPEG-2 Transport Stream packet payload carries the pointer_field, which indicates the byte offset from the start of the Transport Stream packet payload to the beginning of the next PSI or private section. If the payload_unit_start_indicator field is set to ‘0’, then the first byte of the Transport Stream packet payload is not a pointer_field. Instead, the Transport Stream packet payload contains the continuation of a previously started PSI or private section along with any necessary stuffing bytes.
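
As a rough illustration of the PID, payload_unit_start_indicator, and pointer_field handling described above, the following Python sketch extracts the two header fields and locates the start of a new section. The function names are hypothetical. The bit positions follow the MPEG-2 Systems header layout, and the sketch assumes that the pointer_field counts the bytes between itself and the first byte of the new section, so the offset measured from the start of the payload is one greater than the pointer_field value.

```python
from typing import Optional, Tuple


def parse_pid_and_pusi(packet: bytes) -> Tuple[int, bool]:
    """Return (PID, payload_unit_start_indicator) from a 188-byte packet."""
    pusi = bool(packet[1] & 0x40)                # bit 6 of the second header byte
    pid = ((packet[1] & 0x1F) << 8) | packet[2]  # 13-bit PID, 0x0000 to 0x1FFF
    return pid, pusi


def first_section_offset(payload: bytes, pusi: bool) -> Optional[int]:
    """Offset within the payload where a new section begins, or None.

    Only meaningful for payloads that carry PSI or private sections.
    """
    if not pusi:
        # No pointer_field: the payload continues a previously started section.
        return None
    pointer_field = payload[0]
    return 1 + pointer_field  # skip the pointer_field byte itself
```

A receiver might call `parse_pid_and_pusi` on every packet and pass only the payloads of the PIDs it is interested in to `first_section_offset`. The sketch ignores the adaptation field, which would have to be skipped before the payload is examined.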
The transport_scrambling_control field indicates if the MPEG-2 Transport Stream packet payload has been scrambled. The MPEG-2 Transport Stream packet header, the optional adaptation field, and the payload of a Null MPEG-2 Transport Stream packet (see Section 7.3.2.1) are never scrambled.

The adaptation_field_control field signals the inclusion of the optional adaptation field. The most significant bit of the two-bit field always indicates the presence of the adaptation field. The least significant bit indicates the presence of payload.

The continuity_counter field is a 4-bit rolling counter associated with MPEG-2 Transport Stream packets carrying the same PID. The counter is incremented by one for each consecutive Transport Stream packet for a given PID except when the adaptation_field_control field is set to indicate that the Transport Stream packet contains an adaptation field only (no payload) or if it is set to the ‘reserved’ value, or if the Transport Stream packet is a duplicate (see footnote 7) (these exception cases are known as “non-incrementing conditions”). The continuity_counter is considered “continuous” if it has incremented by one from the continuity_counter value in the previous Transport Stream packet of the same PID or when any of the non-incrementing conditions have been met. The continuity_counter is considered “discontinuous” if it has not incremented by one from the continuity_counter value in the previous Transport Stream packet having the same PID and a non-incrementing condition has not been met.

Footnote 7: The MPEG-2 Systems Standard defines a duplicate Transport Stream packet to be the second of two (and only two) consecutive Transport Stream packets having the same PID that are carrying payload and contain identical byte-by-byte contents (except for the program clock reference, if present). Duplicate Transport Stream packets may be used for additional error resilience purposes.

Except when the discontinuity_indicator flag (see footnote 8) has been set to ‘1’ to signal a discontinuous continuity_counter, if a receiver encounters a situation where the continuity_counter is discontinuous, then it should assume that some number of MPEG-2 Transport Stream packets have been lost.

Two other fields, the transport_error_indicator and the transport_priority, which are not typically used in ATSC Transport Streams, are also carried in the packet header. The transport_error_indicator may be used to indicate that at least one uncorrectable bit error exists in the Transport Stream packet. The transport_priority field may be used to indicate that a Transport Stream packet with the field set to ‘1’ is of higher priority than other Transport Stream packets having the same PID which do not have the field set to ‘1’.

The payload field carries the data content. The data content can be one of many types; for example, an MPEG-2 PES packet (which itself may contain an elementary stream) or one or more PSI or private sections.

7.3.2.1 The MPEG-2 Transport Stream Null Packet

The MPEG-2 Transport Stream Null packet is a special Transport Stream packet designed to pad an MPEG-2 Transport Stream. While individual MPEG-2 Programs (services) within a multiplexed bit stream may have variable bit-rate characteristics, the overall MPEG-2 Transport Stream must have a constant bit rate. MPEG-2 Transport Stream Null packets are transmitted when there are no other packets ready to be transmitted. This is necessary, since the MPEG-2 equipment creating the Transport Stream must maintain a constant bit rate output. Note that null packets may be added and/or removed by any re-multiplexing process within the data path. MPEG-2 Transport Stream Null packets are always identified by a PID with value 0x1FFF. The Transport Stream Null packet payload may contain any data values.
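
The continuity_counter rules in Section 7.3.2, together with the special treatment of Null packets, can be sketched in Python as follows. This is a simplified illustration rather than a conformant checker: the names `make_packet` and `find_discontinuities` are hypothetical, a repeated counter value is accepted once as a duplicate without comparing the packet contents byte by byte, and the discontinuity_indicator flag in the adaptation field is not examined. Packets on PID 0x1FFF are skipped entirely.

```python
from typing import Dict, List, Tuple

NULL_PID = 0x1FFF  # PID reserved for Null packets


def make_packet(pid: int, cc: int, afc: int = 0b01) -> bytes:
    """Build a bare test packet (hypothetical helper, payload bytes all zero)."""
    header = bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, (afc << 4) | (cc & 0xF)])
    return header + bytes(184)


def find_discontinuities(packets: List[bytes]) -> List[Tuple[int, int]]:
    """Return (packet index, PID) for each discontinuous continuity_counter."""
    last: Dict[int, Tuple[int, bool]] = {}  # PID -> (last counter, duplicate seen)
    problems = []
    for index, pkt in enumerate(packets):
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        if pid == NULL_PID:
            continue                          # Null packets are not checked
        afc = (pkt[3] >> 4) & 0x3             # adaptation_field_control
        cc = pkt[3] & 0xF                     # continuity_counter
        has_payload = bool(afc & 0x1)
        duplicate_seen = False
        if pid in last:
            prev_cc, prev_dup = last[pid]
            if not has_payload:
                # Adaptation field only (or reserved): counter must not advance.
                ok, duplicate_seen = (cc == prev_cc), prev_dup
            elif cc == (prev_cc + 1) % 16:
                ok = True                     # normal increment, wraps at 16
            elif cc == prev_cc and not prev_dup:
                ok, duplicate_seen = True, True   # accept one duplicate
            else:
                ok = False
            if not ok:
                problems.append((index, pid))
        last[pid] = (cc, duplicate_seen)
    return problems
```

For instance, a sequence on one PID with counter values 0x8, 0xB, 0xC is reported as discontinuous at the second packet, regardless of how many Null packets are interleaved.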
The continuity_counter of a Null Transport Stream packet is undefined, carries no information, and should be ignored.

Footnote 8: The MPEG-2 Systems Standard defines the discontinuity_indicator as a flag in the adaptation field syntax. Among other uses, it may be set to indicate a discontinuous continuity_counter value. See 13818-1 subclause 2.4.3.5 for details.

7.4 MPEG-2 Transport Stream Data Structures

MPEG-2 Systems defines two fundamental bit stream data structures. The first, generically called a “section,” is used to encapsulate either descriptive information about the data essence streams (coded video, coded audio, or data) within the Transport Stream service multiplex (e.g., stream type, information needed to extract the streams, program guide information) or a “private data” essence stream itself. The second, called a “Packetized Elementary Stream (PES) packet,” is used to encapsulate elementary stream data essence (e.g., coded video, coded audio, or data).

7.4.1 Tables and Sections

The MPEG-2 Systems Standard defines tables that provide information necessary to act on or to further describe the data essence streams within the Transport Stream service multiplex. The logical tables are constructed by using one or more Sections. For example, the Program Map Table (PMT) contains information about what elementary streams are parts of which MPEG-2 programs. The PMT is composed of one or more TS_program_map_section sections. A Table is the aggregation of the Sections that comprise it. A Section is divided as necessary to be packetized into the payload of one or more MPEG-2 Transport Stream packets so that it may be incorporated into the Transport Stream service multiplex along with other bit streams. The MPEG-2 Systems Standard defines several different tables, collectively called Program Specific Information (PSI).
Using the private_section, which is the MPEG-2 Systems-defined generic section data structure, the ATSC standards define many other tables.

7.4.2 MPEG-2 Private Section

The term “section” is a generic term referring to any data structure that is based on the MPEG-2 private_section syntax. The MPEG-2 private_section defines a data encapsulation method used to place private data (that is, data that the MPEG-2 standards do not define, including ATSC-defined sections) into an MPEG-2 Transport Stream packet with a minimum amount of structure [13].

A section, or more specifically the MPEG-2 private_section, always begins with an 8-bit table_id, which uniquely identifies the table of which the section is part. Another field, the section_syntax_indicator, determines whether the “short” or “long” form of the private_section syntax is used. The short form section includes a minimal amount of header information and is limited to carrying a payload of at most 4093 bytes. The long form section incorporates additional header fields, which allow the segmentation of large data structures into multiple parts. A collection of long form sections may accommodate 256 * 4084 bytes of payload (maximum size of 1,045,504 bytes).

In practice, most receivers incorporate hardware section filtering, allowing the receiver to specify filtering criteria for the first eight bytes of a section. This length equates to the byte count necessary to filter the long form private_section header. Hardware-assisted filtering offloads the processing burden from the host processor and enables the receiver to specify exact section identification syntax for the section it is interested in acquiring.

The long form section header contains a version_number field, which identifies the revision of the contents of the section. Any time the section’s payload bytes are modified, the version_number must be incremented so that a receiver will be able to determine that the section’s contents have changed.
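
To make the header fields discussed above concrete, here is a minimal Python sketch that reads the table_id, the section_syntax_indicator, the section length, and (for the long form) the version_number from the start of a section. The names `SectionHeader` and `parse_section_header` are hypothetical, and the byte and bit positions are those of the MPEG-2 Systems private_section syntax as assumed by this sketch, not values quoted from this guide.

```python
from typing import NamedTuple, Optional


class SectionHeader(NamedTuple):
    table_id: int
    long_form: bool                  # section_syntax_indicator == 1
    section_length: int              # bytes following the length field
    version_number: Optional[int]    # only present in the long form


def parse_section_header(section: bytes) -> SectionHeader:
    """Parse the leading header bytes of an MPEG-2 section (sketch only)."""
    table_id = section[0]
    long_form = bool(section[1] & 0x80)
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    version_number = None
    if long_form:
        # The sixth header byte holds reserved(2), version_number(5),
        # current_next_indicator(1).
        version_number = (section[5] >> 1) & 0x1F
    return SectionHeader(table_id, long_form, section_length, version_number)


def contents_changed(previous_version: Optional[int], header: SectionHeader) -> bool:
    """True when the version_number differs from the one last processed."""
    return header.version_number != previous_version
```

A receiver could remember the last version_number it processed for each table and reparse the table only when `contents_changed` returns True.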
The long form section contains a CRC_32 field, beginning at the first byte following the last payload byte, which is used for error detection purposes. The receiver’s 32-bit CRC decoder (the CRC decoder model is described in MPEG-2 Systems, Annex A) calculates the CRC result over all the bytes that comprise a section, beginning with the table_id and continuing through the last byte of the CRC_32 field itself. A CRC accumulator result of zero indicates that the section was received without error.

One or more sections may be placed into an MPEG-2 Transport Stream packet depending on the section’s size. If the section length is smaller than a Transport Stream packet’s payload, then there may be multiple sections contained within the single MPEG-2 Transport Stream packet. Sections that are larger than a single MPEG-2 Transport Stream packet are segmented across multiple MPEG-2 Transport Stream packets. Once the process of packetizing a section commences, a new section will not be packetized into Transport Stream packets having the same PID until the previous section’s packetization has completed. When a section does not completely fill an MPEG-2 Transport Stream packet’s payload area and there is no new section ready to begin filling the remainder of the payload area, the remaining bytes of the MPEG-2 Transport Stream packet are stuffed, or filled, with the value 0xFF. To prevent stuffing byte emulation, the MPEG-2 Systems Standard forbids the use of 0xFF as a table_id value.

7.4.3 MPEG-2 PSI

MPEG-2 Program Specific Information (PSI) provides data necessary to identify an MPEG-2 Program (i.e., the desired service) and to demultiplex (i.e., separate and extract) the Program and its Program Elements from the MPEG-2 single or multi-program Transport Stream service multiplex.
The MPEG-2 Systems Standard currently defines five PSI tables: the Program Association Table (PAT), the Program Map Table (PMT), the Conditional Access Table (CAT), the Network Information Table (NIT), and the Transport Stream Description Table (TSDT).

The Program Association Table provides a complete list of all the MPEG-2 Programs (services) within the Transport Stream. The PAT establishes a relationship between each MPEG-2 Program, via the program_number, and its corresponding program map section (properly defined as TS_program_map_section), via the PID value assigned to the corresponding program map section. Transport Stream packets that contain the PAT are assigned to PID 0x0000.

Each program map section contains the mapping between an MPEG-2 Program and the Program Elements that define the Program (this mapping is called a program definition). Specifically, a program definition establishes the relationship between an MPEG-2 Program Number and the list of the PIDs that identify the individual Program Elements comprising the MPEG-2 Program. The PMT is defined as the complete collection of individual Program Definitions within the Transport Stream, with one TS_program_map_section per MPEG-2 Program. The PMT is unique among the PSI tables in that its contents may be carried as part of different bit streams (i.e., within Transport Stream packets that have different PIDs). This simplifies the addition, deletion, or modification of the PSI for individual MPEG-2 programs, as each can be altered independently. This also simplifies the demultiplexing process as only relevant portions of the Transport Stream need to be parsed by the receiver. In comparison, the other PSI tables are each required to be in its own unique bit stream (within Transport Stream packets of a single, unique PID).
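
The relationship the PAT establishes between program_number values and program map PIDs can be sketched in Python as shown below. The function name `parse_pat` is hypothetical, and the layout it assumes (an eight-byte long form header, four-byte entries, a trailing four-byte CRC_32, table_id 0x00, and program_number 0 reserved for the network PID) comes from the MPEG-2 Systems syntax rather than from this guide. The sketch handles a single complete section and does not verify the CRC_32.

```python
from typing import Dict


def parse_pat(section: bytes) -> Dict[int, int]:
    """Map each program_number to the PID of its TS_program_map_section."""
    if section[0] != 0x00:
        raise ValueError("not a program association section")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = 3 + section_length - 4          # stop before the CRC_32 field
    programs: Dict[int, int] = {}
    for pos in range(8, end, 4):          # entries follow the 8-byte header
        program_number = (section[pos] << 8) | section[pos + 1]
        pid = ((section[pos + 2] & 0x1F) << 8) | section[pos + 3]
        if program_number != 0:           # 0 refers to the network PID instead
            programs[program_number] = pid
    return programs
```

Given the resulting dictionary, a receiver would then filter Transport Stream packets on the PID listed for the desired program in order to acquire that program's TS_program_map_section.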
However, even though an MPEG-2 Program is announced in a TS_program_map_section, there is no requirement in MPEG-2 that the individual Program Elements are currently present in the Transport Stream. Furthermore, there is no MPEG-2 requirement that all PIDs currently in use are described by any PSI table.

Whenever an MPEG-2 Program’s bit stream is scrambled (i.e., the contents are only decodable with the use of a conditional access system process), a CAT must be present in the Transport Stream. The CAT associates aspects of the conditional access system (CA system or CAS), such as access rights sent in entitlement management messages (EMMs), with the scrambled streams. Transport Stream packets which contain the CAT are assigned to PID 0x0001. CA systems provide scrambling of MPEG-2 Programs or individual Program Elements along with end user authorization. While MPEG-2 Programs or individual Program Elements may be scrambled, the tables that comprise the PSI are never scrambled. The MPEG standards do not define the contents of the CAT payload. For details of how the ATSC defines CA, see ATSC standard A/70 [7].

The function of the Network Information Table (NIT) is to carry information that applies network-wide (i.e., to all Transport Stream service multiplexes in the delivery/emission network). ATSC standards do not specify the use of the NIT.

The function of the Transport Stream Description Table (TSDT) is to carry descriptors that apply to an entire MPEG-2 Transport Stream service multiplex. A/53 neither constrains nor specifies the use of the TSDT.

7.4.4 MPEG-2 Packetized Elementary Stream (PES) Packet

The MPEG-2 Systems Standard includes a mechanism for efficiently and reliably conveying continuous streams of data (bit streams of compressed audio, compressed video, and/or data) in real-time over a variety of network environments, including terrestrial broadcasting.
Each bit stream (Program Element) is segmented into variable-length packets, called Packetized Elementary Stream (PES) packets, which are conveyed in the MPEG-2 Transport Stream and then reassembled at the receiver. MPEG-2 PES packets are used to segment and encapsulate elementary streams such as coded video, coded audio, and private data streams, along with stream synchronization information. Elementary streams are each independently carried in separate PES packets; thus, a PES packet contains data from one and only one elementary stream. A PES packet is further segmented into fixed-length packets, called MPEG-2 Transport Stream packets (see Section 7.3.1). The set of TS packets so created all share a single, common packet identifier (PID).

The MPEG-2 PES packet consists of a PES packet header followed by the PES packet payload. Each PES packet may have a variable length. A length field allows explicitly signaling the size of the PES packet (up to 65,536 Bytes) or, in the case of video elementary streams, the size may be indicated as unbounded by setting the packet length field to zero. When encapsulating data into a PES packet, the elementary stream is first segmented into variable byte-sized segments and these segments are encapsulated using the MPEG-2 PES packet syntax. ATSC Standard A/53 has placed constraints on PES packets that encapsulate video elementary streams: an MPEG-2 PES packet may only contain one coded video frame and must be signaled as being unbounded in size by defining the length field as 0x0000.

MPEG-2 PES packets carry stream synchronization information in the PES packet header using Presentation Time Stamp (PTS) and Decoding Time Stamp (DTS) fields. The DTS indicates when an access unit is to be decoded, and the PTS indicates when it is to be presented. The PTS and the DTS are each 33 bits long with units in 90 kHz clock periods.
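
As a small worked example of the 33-bit, 90 kHz time stamps described above, the Python sketch below unpacks a time stamp from the five bytes it occupies in a PES packet header and converts it to seconds. The function names are hypothetical, and the five-byte packing (three groups of time stamp bits, each followed by a marker bit) is the MPEG-2 Systems encoding as assumed by this sketch; locating those five bytes within the PES header is not shown.

```python
PTS_CLOCK_HZ = 90_000      # PTS and DTS count periods of a 90 kHz clock
PTS_MODULUS = 1 << 33      # 33-bit counter, so values wrap at 2**33


def decode_timestamp(field: bytes) -> int:
    """Unpack a 33-bit PTS or DTS from its five-byte PES header encoding."""
    if len(field) != 5:
        raise ValueError("a PTS or DTS field occupies exactly five bytes")
    return (
        (((field[0] >> 1) & 0x07) << 30)   # bits 32..30
        | (field[1] << 22)                 # bits 29..22
        | ((field[2] >> 1) << 15)          # bits 21..15
        | (field[3] << 7)                  # bits 14..7
        | (field[4] >> 1)                  # bits 6..0
    )


def timestamp_to_seconds(ticks: int) -> float:
    """Convert 90 kHz ticks to seconds."""
    return ticks / PTS_CLOCK_HZ


def wrap_period_seconds() -> float:
    """Time before a 33-bit, 90 kHz counter wraps (roughly 26.5 hours)."""
    return PTS_MODULUS / PTS_CLOCK_HZ
```

Because the counter wraps, an implementation comparing two time stamps would normally do so modulo 2^33 rather than with a plain subtraction.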

7.4.4.1 MPEG-2 PES Packet Segmentation, Encapsulation, and Packetization

In order to transport an MPEG-2 PES packet, it is first segmented into the payload of one or more MPEG-2 Transport Stream packets (see Section 7.3.1). The first byte of a PES packet must always be the first byte of a Transport Stream packet payload field. When the first byte of a PES packet appears in an MPEG-2 Transport Stream packet, the MPEG-2 Transport Stream packet header’s payload_unit_start_indicator flag must be set to ‘1’. The payload_unit_start_indicator is set to ‘0’ in all subsequent MPEG-2 Transport Stream packets carrying the remaining portion of the PES packet. PES packets are typically much larger than an MPEG-2 Transport Stream packet; however, they can be smaller than an MPEG-2 Transport Stream packet. Only a single PES packet may be packetized into an MPEG-2 Transport Stream packet.

7.4.4.2 Stuffing and the MPEG-2 PES Packet

Since the MPEG-2 Transport Stream is composed of autonomous units of Transport Stream packets, “stuffing” is needed when there is insufficient PES packet data to completely fill a Transport Stream packet payload. “Stuffing” is the process of filling out the remainder of a Transport Stream packet with data bytes that carry no useful information, but only take up the remaining available Transport Stream packet payload bytes. For Transport Stream packets carrying PES packets, stuffing is accomplished by defining an adaptation field longer than the sum of the lengths of the data elements in the adaptation field, so that the payload bytes remaining after the adaptation field exactly accommodate the available PES packet data. This extra space in the adaptation field is filled with stuffing bytes.

7.5 Multiplex Concepts

The MPEG-2 term “program” corresponds to an individual digital TV channel or data service.
The MPEG-2 Systems Standard’s support for multiple channels or services within a single, multiplexed bit stream (known as a multi-program Transport Stream or service multiplex) enables the deployment of practical, bandwidth-efficient digital broadcasting systems. This approach enables the delivery of services at various bit rates in one defined construct. The packet identifier (PID), contained in each Transport Stream packet, is the key to sorting out the components or elements in the Transport Stream. The PID is used to reassemble higher level constructs that make up different bit stream elements within the multiplex and can change from Transport Stream packet to packet. This identification mechanism enables the time-based interleaving or multiplexing of services at differing bit rates. For example, video essence typically requires a much higher bit rate than audio essence. A series of Transport Stream packets identified by the same PID contain either a Program Element or descriptive information about one or more Program Elements (a series of Transport Stream packets having the same PID is often referred to as a bit stream). The MPEG-2 Systems standard has set aside a few special PIDs to directly identify Transport Stream packets that contain constructs that assist in locating the individual MPEG-2 Programs and their associated Program Elements. These constructs are collectively called Program Specific Information (PSI). A related set of one or more Program Elements is called an MPEG-2 Program. Figure 7.1 illustrates how two MPEG-2 Programs each consisting of a video and audio Program Element (in these cases each Program Element is also an Elementary Stream) might be multiplexed into an MPEG-2 Transport Stream. The Transport Stream packet payload contents are reassembled into a higher level construct (with different packet sizes and structure). 
For coded audio and video, this higher layer of packetization is called a Packetized Elementary Stream (PES) packet (see Section 7.4.4).

Figure 7.1 MPEG-2 transport stream program multiplex.

In Figure 7.1, Program P1’s video stream is illustrated to consist of three MPEG-2 Transport Stream packets identified by PID 0x1024. Each MPEG-2 Transport Stream packet has a continuity_counter associated with the specific PID that enables a receiver to determine if a loss has occurred. In this example, the continuity_counter values begin at 0x3 and end with 0x5 for Program P1’s video stream. The individual MPEG-2 Transport Stream packets that contain this PID are extracted from the multiplexed bit stream and reassembled, in this case making up part of an MPEG-2 PES packet carrying a video elementary stream. Program P1 also has an associated audio stream of packets identified by PID 0x1025. Two MPEG-2 Transport Stream packets from Program P1’s audio stream are shown with the continuity_counter values of 0x2 and 0x3 respectively.

Similarly, in Figure 7.1, Program P2’s packet composition is illustrated. In Program P2’s video stream identified by PID 0x0377, the second to last MPEG-2 Transport Stream packet’s continuity_counter is 0xB rather than the expected value of 0x9. This condition may indicate an error and the loss of possibly three MPEG-2 Transport Stream packets having this PID. The next expected and received continuity_counter value is 0xC as illustrated in the diagram.

As discussed later, the mechanism for recreating the original System Time Clock (STC) in the decoder uses the actual arrival time of the packets carrying the individual Program Clock References (PCR) as compared to the value carried in the PCR field. Because of this, MPEG-2 Transport Stream packets with a given PID value cannot casually be rearranged in the MPEG-2 Transport Stream.
This limitation exists because shifting the relative location of a Transport Stream packet carrying the PCR introduces jitter into the data stream, which may cause the decoder’s System Time Clock (STC) to vary. The temporal location of the individual MPEG-2 Transport Stream packet payload delivery conforms to the buffer model associated with the encapsulation type. Shifting or rearranging the MPEG-2 Transport Stream packets can cause buffer model violations by either overflowing or underflowing the buffer, so any rearrangement must be done in a way that respects these constraints.

Also notice the Null MPEG-2 Transport Stream packets that were interleaved. These MPEG-2 Transport Stream packets (identified by PID 0x1FFF) may appear anywhere in the stream and are often used to set the Transport Stream service multiplex at a known, fixed overall bit rate, regardless of the total bit rate of all the MPEG-2 programs it contains. For illustrative purposes, a value of 0x03 is shown in the figure for the continuity_counter for the Null packets. In practice any value may be used, as the continuity_counter for Null packets is ignored.

7.6 MPEG-2 Timing and Buffer Model

Key elements of the MPEG-2 Systems Standard include a model for system timing and another for buffering. The timing model allows the synchronization of the components making up MPEG-2 programs. The buffer model ensures interoperability between encoders and decoders for information delivery (i.e., ensuring that the necessary information is always available when needed for decoding).

7.6.1 MPEG-2 System Timing

One of the basic concepts of the MPEG-2 standards revolves around the system timing model. The timing model was developed to enable the synchronization of video and audio Program Elements that are delivered as separate streams, with differing delivery rates and different sized Presentation Units.
As will be discussed below, the elements that enable the synchronization are clock references, which allow the decoder to recreate a clock that very closely tracks that used in the encoder, and time stamps, which are used to temporally coordinate the presentation of video and audio Presentation Units. This basic timing model is applicable to other forms of Program Elements, including data.
7.6.1.1 Timing Model
The MPEG-2 timing model requires that the clock used to encode the content be regenerated (within specified tolerances) at the receiver and used to decode the content. Video and audio consist of discrete Presentation Units, which must be delivered from the decoder at the same rate as they entered the encoder in order to achieve correct reproduction. For video, the Presentation Unit is a picture (a frame or field of video). For audio, the Presentation Unit is a block of audio samples (also known as an audio frame). The Presentation Unit for data is dependent upon the form of the data, but the basic concept is similar. The output rate at the decoder must match the input rate at the encoder.
[Block diagram: Video In, Audio In, and Data In each feed an Encoder and a Buffer (variable delay), then a Multiplexer, Transmission, and a Demultiplexer, then a Buffer (variable delay) and a Decoder producing Video Out, Audio Out, and Data Out; the end-to-end delay is constant.]
Figure 7.2 MPEG-2 constant delay buffer model.
In developing the timing model, the MPEG-2 Systems Standard adopted two basic concepts: a constant end-to-end delay and an instantaneous decoding process (see Figure 7.2). The MPEG-2 Systems Standard does not specify how the encoders or decoders operate; rather, it specifies the format of the bit stream (the syntax and semantics) and a theoretical decoding buffer model.
With these concepts applied to the bit stream, it is possible to develop implementations of both encoders and decoders that consider real-world constraints and will interoperate. In real systems, the delay through the encoding and decoding buffers is variable [13] and the decoding process takes a finite, non-zero, and possibly variable amount of time.
The MPEG-2 Systems Standard’s timing and buffer models solve the issues of synchronization of individual elements by use of a common time reference shared by all individual Program Elements of an MPEG-2 Program. This common time clock is referred to as the System Time Clock (STC).
7.6.1.2 System Time Clock (STC)
The System Time Clock (STC) is the master clock reference for all encoding and decoding processes. Each encoder samples the STC as needed to create timestamps associated with the data’s Presentation Units. A timestamp associated with a Presentation Unit is referred to as the Presentation Time Stamp (PTS). A timestamp associated with the decoding start time, known as the Decoding Time Stamp (DTS), may also appear in the bit stream. The STC is not a normative element in the MPEG-2 Systems Standard; however, it is required for synchronized services (including video and audio), meaning that all practical implementations require its use.
The STC is represented by a 42-bit counter that counts periods of a 27 MHz clock (one clock period is approximately 37 ns). The STC must be recreated in the decoder in such a way that it very closely matches (within specified tolerances) the STC at the encoder for both buffer management and synchronization reasons. In order for a decoder to reconstruct this clock, the STC is periodically sampled and transmitted in the MPEG-2 Transport Stream packet’s adaptation_field, as clock references known as Program Clock References (PCRs). Figure 7.3 illustrates a general decoder circuit used to recreate the STC.
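The recovery loop of Figure 7.3 can be approximated in software. The Python sketch below is a deliberately simplified, assumed model (a first-order proportional loop, not any particular receiver design): the difference between each received PCR and the local counter is scaled and used to steer the local oscillator frequency. The function name, the 40 ms PCR spacing, and the gain value are illustrative assumptions and do not come from the MPEG-2 Systems Standard.

```python
NOMINAL_HZ = 27_000_000.0  # nominal system clock frequency


def recover_clock(encoder_hz, pcr_interval_s=0.04, gain=5.0, steps=200):
    """Toy first-order loop (hypothetical helper).

    Returns (decoder_hz, error_ticks) after `steps` PCR arrivals, where
    error_ticks is the received PCR minus the local STC count.
    """
    decoder_hz = NOMINAL_HZ   # oscillator starts at the nominal frequency
    error_ticks = 0.0         # zero once the first PCR has been loaded
    for _ in range(steps):
        # Between PCRs the two counters drift apart by the frequency difference.
        error_ticks += (encoder_hz - decoder_hz) * pcr_interval_s
        # Assumption: the low-pass filter is reduced to a proportional gain.
        decoder_hz = NOMINAL_HZ + gain * error_ticks
    return decoder_hz, error_ticks
```

With an encoder clock running 500 Hz fast, recover_clock(27_000_500.0) settles at the encoder frequency with a constant offset of about 100 ticks; a practical design would typically add an integrating term to remove that residual offset.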
[Block diagram: PCR, Difference, Low Pass Filter, Voltage Controlled Oscillator (27 MHz System Clock), and System Time Clock (STC) Counter, with Clock Control and Load signals.]
Figure 7.3 MPEG-2 system time clock.
Each MPEG-2 Program may have its own STC or multiple MPEG-2 Programs may share a common STC (by referring to the same Program Element that carries the PCR values). There may be situations where an MPEG-2 Program does not require any form of synchronization and will not need an STC. Also, Program Elements may or may not reference a Program’s STC.
The STC increases linearly in time and monotonically. The exception is when there are discontinuities, which are discussed below. Since the STC value is contained within a finite size field, it wraps back to zero when the maximum count is reached, approximately every 26.5 hours.
7.6.1.3 System Clock Frequency
The System Time Clock is derived from the system_clock_frequency, specified as 27,000,000 Hz ± 810 Hz. The STC period is 1/27 MHz, or approximately 37 ns per clock period.
7.6.1.4 Program Clock Reference
The Program Clock Reference (PCR) is a 42-bit value used to lock the decoder’s 27 MHz clock to the encoder’s 27 MHz clock, thereby matching the decoder’s STC to the encoder’s STC. The PCR is carried in the MPEG-2 Transport Stream packet’s adaptation_field using the program_clock_reference_base and the program_clock_reference_extension fields. The MPEG-2 Systems Standard mandates that the PCR be sent at least every 100 ms (that is, at least 10 times per second). The PCR may be sent more frequently if desired. In addition, the standard limits the amount of PCR jitter for a compliant stream to no more than ±500 ns.
The decoder uses the arrival time of the MPEG-2 Transport Stream packet carrying a PCR value, and the PCR value itself, in comparison to the current value of the STC to adjust the clock control component. Figure 7.3 illustrates an example of how the PCR is used to recreate the STC.
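As a rough check of the figures quoted in Sections 7.6.1.2 through 7.6.1.4, the following Python sketch derives the clock period, the frequency tolerance in parts per million, and the wrap interval. It also splits a 27 MHz count into the two PCR fields, where the 33-bit base is the count divided by 300 and the extension is the remainder. The constant and function names are illustrative assumptions, not identifiers from the standard.

```python
SYSTEM_CLOCK_HZ = 27_000_000   # nominal system_clock_frequency
TOLERANCE_HZ = 810             # permitted deviation, +/- 810 Hz
PCR_BASE_BITS = 33             # width of program_clock_reference_base
PCR_EXT_MODULUS = 300          # 27 MHz / 300 = 90 kHz

# The 42-bit value wraps when the 33-bit base overflows.
STC_MODULUS = (1 << PCR_BASE_BITS) * PCR_EXT_MODULUS


def split_pcr(count_27mhz):
    """Return (base, extension) for a 27 MHz count (hypothetical helper)."""
    count = count_27mhz % STC_MODULUS
    return count // PCR_EXT_MODULUS, count % PCR_EXT_MODULUS


def join_pcr(base, extension):
    """Rebuild the 27 MHz count from the two PCR fields."""
    return base * PCR_EXT_MODULUS + extension


period_ns = 1e9 / SYSTEM_CLOCK_HZ                       # about 37 ns
tolerance_ppm = TOLERANCE_HZ / SYSTEM_CLOCK_HZ * 1e6    # 30 ppm
wrap_hours = STC_MODULUS / SYSTEM_CLOCK_HZ / 3600       # about 26.5 hours
```

One second of encoder time, for example, corresponds to split_pcr(27_000_000), which yields a base of 90000 and an extension of 0.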
The program_clock_reference_base is constructed by dividing the value of the 27 MHz clock reference count by 300. This operation creates a 33-bit value in units of 90 kHz clock periods. The program_clock_reference_extension contains the remainder of the previous division (i.e., the 27 MHz clock modulo 300).
The location of the Program Element carrying the PCR for an MPEG-2 Program is signaled in the TS_program_map_section PCR_PID field. The PCR may be carried on the same PID as a video, audio, or data Program Element, as the PCR field is independent of the encapsulated data payload. Different MPEG-2 Programs may share the same STC by referring to the same PCR_PID. MPEG-2 Programs not requiring synchronized decoding and presentation to an STC set the PCR_PID field to the value 0x1FFF, indicating that there is no Program Element carrying a PCR.
7.6.1.5 Presentation Time Stamp (PTS)
The Presentation Time Stamp (PTS) is a 33-bit quantity measured in units of 90 kHz clock periods (ticks of approximately 11.1 microseconds) carried in the MPEG-2 PES packet header’s PTS field. The PTS, when compared against the System Time Clock (STC), indicates when the associated Presentation Unit should be “presented” to the viewer. In the case of video, a picture is displayed, and in the case of audio, the next audio frame is emitted by the receiver. The PTS must be contained in the MPEG-2 Transport Stream at intervals no longer than 700 ms, and the ATSC requires that the PTS be inserted at the beginning of every access unit (i.e., coded picture or audio frame).
The PTS, when included, is divided into two 15-bit quantities and a 3-bit quantity spread across 36 bits. There are also three “marker bits”, always set to ‘1’, interspersed among the three groups. This division into three parts, along with the inclusion of the marker_bits, avoids start_code emulation in the MPEG-2 PES packet header.
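The bit layout just described can be illustrated with a short Python sketch that packs a 33-bit PTS into the five-byte PES header field and unpacks it again. The 4-bit prefix that precedes the 36 bits described above (‘0010’ when only a PTS is present) comes from the PES packet syntax in ISO/IEC 13818-1 [13]; the function names are hypothetical and the sketch performs only minimal validation.

```python
def encode_pts(pts, prefix=0b0010):
    """Pack a 33-bit PTS into 5 bytes with three marker bits set to 1."""
    pts &= (1 << 33) - 1
    first = (prefix << 4) | (((pts >> 30) & 0x7) << 1) | 1   # bits 32..30 + marker
    middle = (((pts >> 15) & 0x7FFF) << 1) | 1               # bits 29..15 + marker
    low = ((pts & 0x7FFF) << 1) | 1                          # bits 14..0  + marker
    return bytes([first, middle >> 8, middle & 0xFF, low >> 8, low & 0xFF])


def decode_pts(field):
    """Recover the 33-bit PTS from the 5-byte field produced above."""
    if len(field) != 5:
        raise ValueError("PTS field must be exactly 5 bytes")
    if not (field[0] & 1 and field[2] & 1 and field[4] & 1):
        raise ValueError("marker bit not set")
    high = (field[0] >> 1) & 0x7
    middle = ((field[1] << 8) | field[2]) >> 1
    low = ((field[3] << 8) | field[4]) >> 1
    return (high << 30) | (middle << 15) | low
```

Because a marker bit set to 1 follows each group, the encoded field can never contain the long run of zero bits that a start code requires, whatever the PTS value.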
Avoiding the emulation of the start_code prevents decoders from incorrectly identifying the start of an elementary stream. Figure 7.4 illustrates the PTS and marker_bits.
[33-bit PTS, MSB first: bits 32...30, marker bit, bits 29...15, marker bit, bits 14...0, marker bit.]
Figure 7.4 The MPEG-2 PTS and marker_bits.
7.6.1.6 Decoding Time Stamp (DTS)
The Decoding Time Stamp (DTS) is a 33-bit quantity, measured in units of 90 kHz clock periods (approximately 11.1 microseconds), that may be carried in the MPEG-2 PES packet header’s DTS field. The MPEG-2 Systems Standard only defines a normative meaning for the DTS field for video. Generally speaking, a video stream is the only stream type that may need the DTS, due to picture re-ordering (bi-directionally interpolated pictures are decoded after the “future” frame they reference has been decoded). The DTS value, compared to the System Time Clock (STC), indicates when the access unit should be removed from the buffer and decoded. The DTS must always be accompanied by a PTS. If the DTS would contain the same value as the PTS, then the DTS is omitted and the decoder assumes that the DTS is equal to the PTS. The DTS, if present, must be contained in the MPEG-2 Transport Stream at intervals no longer than 700 ms. The ATSC mandates that the DTS be inserted at the beginning of every access unit (i.e., coded picture or audio frame), except when the DTS value matches the PTS value.
The DTS is encoded in the same manner as the PTS: the 33-bit quantity is split into three portions and the marker bits are incorporated.
7.6.1.7 Discontinuities
MPEG-2 Program and MPEG-2 Transport Stream discontinuities are a reality in digital television.
Planned discontinuities, where the interruption is not the result of an error, can occur in any number of situations. As an example, the splicing of a commercial into the video and audio streams is a typical planned discontinuity scenario. Other planned discontinuity scenarios include switching between content sources or a new MPEG-2 Program commencing. In each of these cases, the System Time Clock (STC) may be interrupted and set to some new random value from which the count then continues, thus creating a discontinuity in the timeline.
The decoder, in all of the above instances, should be notified of the upcoming interruption by the discontinuity_indicator carried in the adaptation_field of the MPEG-2 Transport Stream packet. The discontinuity_indicator is used to indicate a discontinuity in the STC or a disruption in the continuity_counter. The signaling of continuity_counter disruptions via the discontinuity_indicator is limited in its practical usefulness. The discontinuity_indicator can be used by the multiplexing process to indicate a known and expected discontinuity in the program time line. MPEG-2 Transport Stream packet loss is not signaled via this indicator bit.
An STC interruption occurs when a receiver receives a new PCR value associated with the MPEG-2 Program that is outside a reasonable variance from the expected value, regardless of whether or not the discontinuity_indicator had been set. Receivers receiving the next PCR after either an explicit discontinuity or a PCR out of range must adjust themselves accordingly. In cases where a discontinuity has been signaled explicitly, a receiver typically will simply use the next PCR value received to reset its internal clock phase circuitry without making any frequency adjustments. In cases where a discontinuity has not been signaled explicitly, a receiver typically will begin a clock-error recovery process.
This may include tracking PCR values during a predefined time window to make an “intelligent” determination of what adjustments need to be made to the STC, if any.
Besides the STC reference changing, another discontinuity that may be encountered as part of the stream changeover or Program interruption involves the MPEG-2 Transport Stream packet header’s continuity_counter value. The continuity_counter may skip to a new value when the newly encoded stream is inserted. Thus, the decoder upon seeing the discontinuity_indicator is made aware of an upcoming continuity_counter change, and this change should not be treated as an error or as indicative of lost packets.
7.6.2 Buffer Model
The MPEG-2 standards define the bit stream syntax itself and the meaning (or semantics) of the bit stream syntax. In order to ensure interoperability of equipment designed to the specifications, the MPEG-2 Systems Standard also precisely specifies the exact definitions of byte arrival and decoding events, and the times at which these occur, through the use of a hypothetical decoder called the Transport Stream System Target Decoder (T-STD). The T-STD is a conceptual model that is used solely for the purpose of defining terms precisely and to model the decoding process during construction or verification of Transport Streams; neither the architecture of the T-STD nor the timing described is meant to preclude the design of practical solutions that implement the buffer and timing model in other ways. The buffers are therefore “virtual” since they may or may not exist within real physical decoders.
The T-STD model is structured as follows:
• Defines several “virtual buffers”
• Rules for when bytes enter and leave each buffer
• Rules that constrain buffer fullness
• The number of “virtual buffers” in the model varies depending upon the number of streams in the Transport Stream
For video elementary streams, the T-STD consists of three buffers: the transport buffer, the multiplex buffer, and the elementary stream buffer. For audio elementary streams and system-related streams (e.g., PSI tables), the T-STD consists of two buffers, the transport buffer and the main buffer.
The rules define when bytes enter and leave the buffers in terms of where they occur within the bit stream and either the transport rate or the specified maximum bit rate for the type of elementary stream, depending on the buffer type and the elementary stream type. The transport rate computation is determined by a mathematical formula based on the program clock reference fields encoded in the bit stream, and the maximum bit rate is determined from the profile and level or similar inherent definition. Buffer sizes are defined by specific mathematical formulas based on the buffer type and the elementary stream type. The decoding time is specified in terms of embedded or inferred decoding or presentation time stamps (DTS or PTS, respectively) and may be delayed due to any re-ordering of pictures that is needed (in the case of video elementary streams only).
Buffer management is needed to ensure that none of the buffers overflow or, in some cases, underflow. Constraints on buffer fullness say whether a particular buffer is allowed to underflow, whether a buffer must empty occasionally, etc. These rules are all clearly defined in the T-STD model. A suitable analysis tool can verify whether or not a bit stream conforms to the T-STD.
It is more difficult to verify that a decoder conforms to the T-STD because true conformance may only be determined by demonstrating that the decoder is capable of decoding all conformant bit streams properly. However, several organizations have prepared special bit streams to stress decoder implementations. These known MPEG-conformant bit streams cause buffers to fill near their capacity, to operate near-empty, and/or to require high-speed transmission between buffers. Receiver implementers are advised to test as many combinations as possible.
7.7 Supplemental Information
The remainder of this section describes additional concepts that are important in the understanding of the ATSC transport subsystem.
7.7.1 MPEG-2 Descriptors
The MPEG-2 descriptor is a generic structure used to carry information within other MPEG-2 data structures, typically sections (PSI or private). The use of standardized descriptors is often optional. Descriptors can be viewed as a mechanism to extend the information conveyed within another MPEG-2 structure. A descriptor cannot stand alone in the MPEG-2 Transport Stream; rather, it must be contained within a larger syntactic structure, typically within a descriptor loop (an area set aside to carry an arbitrary number of descriptors).
The basic descriptor format is a tag byte, followed by a length byte, followed by data [13]. The tag byte uniquely identifies the descriptor and the length byte specifies the number of data bytes that immediately follow the length field. The form of the data varies for each specific descriptor.
In future versions of a given ATSC standard, additional information may be added to any defined descriptor by simply adding new semantic fields at the end.
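The tag, length, and data layout described above lends itself to a simple walking parser. The following Python sketch is one possible approach, not a prescribed implementation: it steps through a descriptor loop using only the length byte, so descriptors with unrecognized tags, or with extra bytes appended after the fields a receiver knows about, are carried along or skipped without disturbing the rest of the loop. The function names are hypothetical. The tag value 0x05 for the MPEG-2 registration_descriptor is taken from ISO/IEC 13818-1 [13]; the "GA94" identifier and the 0xEE tag in the usage note are example values only.

```python
REGISTRATION_DESCRIPTOR_TAG = 0x05  # registration_descriptor tag in ISO/IEC 13818-1


def parse_descriptor_loop(loop):
    """Return a list of (tag, data) pairs found in a descriptor loop."""
    descriptors = []
    pos = 0
    while pos < len(loop):
        if pos + 2 > len(loop):
            raise ValueError("truncated descriptor header")
        tag = loop[pos]
        length = loop[pos + 1]
        end = pos + 2 + length      # first byte beyond this descriptor
        if end > len(loop):
            raise ValueError("descriptor length runs past end of loop")
        descriptors.append((tag, bytes(loop[pos + 2:end])))
        pos = end                   # jump by the length field, known tag or not
    return descriptors


def format_identifier(data):
    """Read the 32-bit format_identifier, ignoring any bytes that follow it."""
    if len(data) < 4:
        raise ValueError("registration descriptor too short")
    return bytes(data[:4])
```

For example, a loop holding a registration descriptor with one trailing byte followed by an unrecognized descriptor, bytes([0x05, 0x05]) + b"GA94" + b"\x00" + bytes([0xEE, 0x02, 0xAA, 0xBB]), parses into two entries; a receiver that only understands tag 0x05 reads the first four data bytes of that entry and ignores everything else.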
Receiver designers should always process the length field of all descriptors to ensure that if a receiver finds any information beyond the known fields, it discards such information but continues parsing the stream at the first byte beyond that indicated by the length field.
Receivers may also encounter descriptors that they do not recognize (such as could be added to a new version of the standard created after the receiver was built). To ensure that newly defined descriptors do not cause operational problems in existing equipment, all descriptors defined will adhere to the existing structure. This provides an inherent escape mechanism to allow receivers that don’t understand a particular descriptor to easily skip over it. By jumping the number of bytes listed in the length field, the receiver can proceed to the next item in the loop. Because a receiver that does not recognize a descriptor of a certain type is expected to simply ignore it, new descriptor definitions are a powerful way to add features to the protocol while maintaining backward compatibility.
7.7.1.1 ATSC Descriptors
ATSC-defined descriptors follow the same behavior as described previously for MPEG-2 descriptors and may be used for similar purposes. ATSC standards have also described the usage of some MPEG descriptors. The following descriptors are defined by ATSC Standards A/52 [4] and A/53 [5].
7.7.1.1.1 AC-3 Audio Descriptor
An AC-3 Audio Descriptor [4] describes an audio service present in an ATSC Transport Stream. In addition to describing the audio service or services that a broadcaster might send, the descriptor provides the receiver with audio setup information, such as whether the program is in stereo or surround sound. This descriptor is optionally present in the program element loop of the TS_program_map_section that describes the AC-3 audio elementary stream.
See A/65B [6] for required placements.
7.7.1.1.2 ATSC Private Information Descriptor
The ATSC Private Information Descriptor [5] provides a method for carrying private information within this descriptor and for unambiguous identification of its registered owner. Since both the identification and the private information are self-contained within a single descriptor, more than one ATSC private information descriptor may appear within a single descriptor loop.
The format identifier field appears in both the ATSC Private Information Descriptor and the MPEG Registration Descriptor. Its purpose is the same in both: it identifies the company or organization that has supplied the associated private data. Only values of the format identifier field registered by the ISO-assigned Registration Authority, the Society of Motion Picture and Television Engineers (SMPTE), may be used.
7.7.1.2 MPEG Descriptors Constrained by ATSC
The following descriptors have been defined in MPEG-2 Systems (13818-1) [13], but their usage has been constrained by A/53 [5].
7.7.1.2.1 Data Stream Alignment Descriptor
The ATSC requires this descriptor to be present in the program element loop of the TS_program_map_section that describes the video elementary stream. In this context, the descriptor specifies the alignment of video stream syntax with respect to the start of the PES packet payload. The ATSC has constrained the alignment to be the first byte of the start code for a video access unit (alignment_type 0x02). Because a video access unit follows immediately after a GOP or Sequence header, this does not preclude alignment from the beginning of a GOP or Sequence. It does, however, prevent alignment from being at the start of a slice.
The use of this descriptor for other stream types is not defined.
7.7.1.2.2 ISO 639 Language Descriptor
In the ATSC digital television system, if the ISO_639_language_descriptor (defined in ISO/IEC 13818-1 Section 2.6.18 [13]) is present, then it is used to indicate the language of audio Elementary Stream components.
If present, the ISO_639_language_descriptor is included in the descriptor loop immediately following the ES_info_length field in the TS_program_map_section for each Elementary Stream of stream_type 0x81 (AC-3 audio). This descriptor will be present when the number of audio Elementary Streams in the TS_program_map_section having the same value of bit stream mode (bsmod in the AC-3 Audio Descriptor) is two or more.
As an example, consider an MPEG-2 program that includes two audio ES components: a Complete Main (CM) audio track (bsmod = 0) and a Visually Impaired (VI) audio track (bsmod = 2). Inclusion of the ISO_639_language_descriptor is optional for this program. If a second CM track were to be added, however, it would then be necessary to include ISO_639_language_descriptors in the TS_program_map_section.
The audio_type field in any ISO_639_language_descriptor used in ATSC standards is set to 0x00 (meaning “undefined”).
An ISO_639_language_descriptor may be present in the TS_program_map_section in other positions as well, for example to indicate the language or languages of a textual data service program element.
7.7.1.2.3 MPEG-2 Registration Descriptor
Under certain circumstances, the MPEG-2 Registration Descriptor (MRD) is used to provide unambiguous identification of privately defined fields or private data bytes in associated syntactical structures. The detailed rules for the use of the MRD are found in A/53 [5].
Note that no more than one MRD should appear in any given descriptor loop, since the semantics of this situation are unspecified. This usage restriction does not apply to the ATSC private information descriptor discussed previously.
The MRD does not contain the private data itself, while the ATSC private information descriptor is designed to carry the actual private data.
7.7.1.2.4 Smoothing Buffer Descriptor
A/53 requires the TS_program_map_section that describes each program to have a Smoothing Buffer Descriptor pertaining to that program [5]. This descriptor signals the required size and leak rate of the smoothing buffer (SBn) to avoid errors in decoding that could be caused by overflow or underflow. During the continuous existence of a program, the values of the elements of the Smoothing Buffer Descriptor are not allowed to change.
7.7.2 Code Point Conflict Avoidance
MPEG standards have numerous syntactical fields set aside for private use. When fields, tags, and table identifier fields are assigned a value by MPEG or by an MPEG user, such as ATSC, the values are then known as “code points.” In addition, many other fields have ranges defined as user private. The user (not a standards body) may define one or more of these private fields, tags, and table values.
Without some type of coordination mechanism, use of ATSC user private fields and ranges may lead to conflicts between privately defined services. Furthermore, without some form of scoping and registration, different organizations may inadvertently choose to use the same values for these fields, but with different meanings for the semantics of the information carried. The ATSC Digital Television Standard A/53 [5] has placed constraints on the use of private fields and ranges to avoid code point conflicts, through the use of the MPEG-2 Registration Descriptor mechanism. If an organization uses user private fields and/or ranges, then to comply with the ATSC standards, one or more MRDs are used as described in A/53.
7.7.2.1 The MPEG-2 Registration Descriptor
MPEG-2 Systems defines a registration descriptor: “The registration_descriptor provides a method to uniquely and unambiguously identify formats of private data.” The Society of Motion Picture and Television Engineers (SMPTE) is the ISO-designated registration authority for the 32-bit format_identifier field carried within this descriptor, guaranteeing that every assigned value will be unique.
The following sections discuss the use of the MRD to avoid collisions. There are some circumstances where an MRD cannot be used for scoping (for example, the MRD has no significance for other descriptors in the same descriptor loop). The ATSC Private Information Descriptor (described previously) has been defined to allow the carriage of private information in a descriptor.
7.7.2.1.1 Private Information in an MPEG-2 Program
The scoping of the use of private structures within an MPEG-2 Program may be done by placing an MRD in the program loop in the PMT (see Section 7.7.3.3 for an explanation of loops in MPEG-2 syntax). This loop is otherwise known as the “outer” loop: the descriptor loop following the program_info_length field. When used in this location, the scope of the MRD is the entire MPEG-2 Program, meaning all of the program elements defined in this instance of the Program Map Table. When the MRD is used to identify the owner of private data, then the identification applies to all program elements comprising the MPEG-2 Program.
7.7.2.1.2 Private Information in an MPEG-2 Program Element
MRDs may be placed in the program element loop in the PMT (otherwise known as the “inner” loop: the descriptor loop following the ES_info_length field). When used in this location, the scope of the MRD is the individual program element to which the MRD is bound. When the MRD is used to identify the owner of private data, then the identification applies to the single program element.
The scope of the MRD also covers the stream_type used for this program element, in the case that a privately defined stream_type is used.
7.7.2.1.3 Multiple MRDs
At most, one MRD for any entity at any level will appear; in other words, no more than one MRD will appear in the PMT program loop, and no more than one MRD will appear in the PMT program element loop for a particular program element. There is no guarantee of how remultiplexing equipment will behave in the presence of multiple MRDs in a single loop, especially with regard to retaining the original ordering of descriptors in the loop. Multiple MRDs at the same level would be ambiguous to a receiver.
MRDs used at different levels are intended to be complementary, with a deeper level MRD refining the meaning of a higher level MRD. However, certain combinations of MRDs at different levels may result in streams that may cause problems for standard receivers if the combinations are not expected. As an example, a combination of MRDs that identify the program as defined by company X and a particular program element as defined by company Y would lead to contradictions in interpreting semantic elements. The behavior of a receiver upon receiving a non-conformant stream of this type cannot be specified, and construction of streams of this type should be avoided.
7.7.3 Understanding MPEG Syntax Tables
ATSC and MPEG-2 standards use a common convention for specifying how to construct the data structures defined in the standards. This convention consists of a table specifying the syntax (the in-order concatenation of the fields), followed by a section specifying the semantics (the detailed definitions of the syntax fields). The syntax is specified using C-language “like” (“C-like”) constructs, meaning statements that take the form of the computer language, but would not necessarily be expected to produce reasonable results if run through a compiler.
The tables typically have three columns, as shown in the fragment in Table 7.1:
• Syntax: The name of the field or a “C-like” construct.
• No. of Bits: The size of the field in bits.
• Format: Either how to order the bits in the field, or, when the field has a pre-defined value, the value itself (typically in either binary or hex notation). The bit ordering is given as an acronym or mnemonic that is defined earlier in the standard; in this example, uimsbf means unsigned integer, most significant bit first.
Table 7.1 Table Format
Syntax                          No. of Bits   Format
typical_PSI_table() {
    table_id                    8             uimsbf
    section_syntax_indicator    1             ‘1’
    ….                          …             …
}
It should be noted that when the data structures are constructed, the fields are concatenated using big-endian byte-ordering. This means that for a multi-byte field, the most significant byte is encountered first. A common practice for implementation is to step through the syntax structure and copy the values for the fields to a memory buffer. The end result of this type of operation may vary for multi-byte fields, depending upon the computer architecture. Implementors are cautioned when working with little-endian machines (least significant byte encountered first). It is therefore recommended that implementations use byte-oriented instructions to construct the data (i.e., mask and shift operations).
7.7.3.1 Formatting
The curly-bracket characters (‘{’ and ‘}’) are used to group a series of fields together. In the sample shown in Table 7.1, the curly-bracket characters are used to indicate that all of the fields between the paired curly-brackets belong to the “typical_PSI_table()”. For the conditional and loop statements that follow below, curly-bracket pairs are used to indicate the fields affected by either the conditional or loop statements.
The syntax column uses indentation as an aid to the reader (in a similar fashion to a common convention when writing C-code). When a series of fields is grouped, then the convention is to indent them.
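Returning to the byte-ordering recommendation in Section 7.7.3, the following Python sketch shows one way to build a structure with shift and mask operations only, so that the output does not depend on the byte order of the host machine. The function name is hypothetical, and the 7-bit and 16-bit fields in the usage note are made-up fillers chosen to reach a byte boundary; they are not the fields elided from Table 7.1.

```python
def pack_fields(fields):
    """Concatenate (value, bit_width) pairs, most significant bit first."""
    accumulator = 0
    total_bits = 0
    for value, width in fields:
        if value < 0 or value >> width:
            raise ValueError("value does not fit in its field width")
        accumulator = (accumulator << width) | value   # append the field at the LSB end
        total_bits += width
    if total_bits % 8:
        raise ValueError("fields do not end on a byte boundary")
    # Emit bytes from the most significant end using shift and mask only.
    return bytes((accumulator >> shift) & 0xFF
                 for shift in range(total_bits - 8, -1, -8))
```

For instance, pack_fields([(0x02, 8), (1, 1), (0x7F, 7), (0x1234, 16)]) places an 8-bit table_id, a 1-bit section_syntax_indicator, a hypothetical 7-bit filler, and a hypothetical 16-bit field into four bytes, with the most significant byte of the 16-bit field (0x12) emitted before the least significant byte (0x34).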
7.7.3.2 Conditional Statements
In many of the constructs, a series of fields is included only if certain conditions are met. This situation is indicated using an “if (condition) { }” statement as shown in Table 7.2a. When this type of statement is encountered, the fields grouped by brackets are included only if the condition is true.
Table 7.2a IF Statement
Syntax                   No. of Bits   Format
typical_PSI_table() {
    field_1              8             uimsbf
    if (condition) {
        field_2          8             uimsbf
        field_3          8             uimsbf
    }
    ….                   …             …
}
As with C-code, an alternate path may be indicated by an “else” statement, as illustrated in Table 7.2b:
Table 7.2b IF Statement
Syntax                   No. of Bits   Format
typical_PSI_table() {
    field_1              8             uimsbf
    if (condition) {
        field_2          8             uimsbf
    } else {
        field_3          8             uimsbf
    }
    ….                   …             …
}
If the condition is true, then field_2 is used; otherwise, field_3 is used.
7.7.3.3 Loop Statements
For-loop statements are commonly used in the syntax tables and have the widest variation in style and interpretation 9. The for-loop takes the following form: for ( i=0; i