How does MPEG syntax facilitate parallelism ?

For MPEG-1, slices may consist of an arbitrary number of macroblocks. They can be independently decoded once the picture header side information is known. For parallelism below the slice level, the coded bitstream must first be mapped into fixed-length elements. Further, since macroblocks have coding dependencies on previous macroblocks within the same slice, the data hierarchy must be pre-processed down to the layer of DC DCT coefficients. After this, blocks may be independently inverse quantized and inverse transformed, temporally predicted, and reconstructed to buffer memory. Parallelism is usually more of a concern for encoders. In many encoders today, block matching (motion estimation) and some rate control stages (such as activity and/or complexity measures) are processed for macroblocks independently. Finally, with the exception that all macroblock rows in Main Profile MPEG-2 bitstreams must contain at least one slice, an encoder has the freedom to choose the slice structure.

What is the MPEG color space and sample precision?

MPEG strictly specifies the YCbCr color space, not YUV or YIQ or YPbPr or YDrDb or any of the many other fine varieties of color difference spaces. Regardless of any bitstream parameters, MPEG-1 and MPEG-2 Video Main Profile specify the 4:2:0 chroma_format, where the color difference channels (Cb, Cr) have half the "resolution" or sample grid density in both the horizontal and vertical direction with respect to luminance.

MPEG-2 High Profile includes an option for 4:2:2 chroma_format, as does the MPEG 4:2:2 Profile (a.k.a. "Studio Profile") naturally. Applications for the 4:2:2 format can be found in professional broadcasting, editing, and contribution-quality distribution environments. The drawback of the 4:2:2 format is simply that it increases the size of the macroblock from six 8x8 blocks (4:2:0) to eight, while increasing the frame buffer size and decoding bandwidth by the same amount (33%). This increase places the buffering memories well past the magic 16-Mbit limit for semiconductor DRAM devices, assuming the pictures are stored with a maximum of 414,720 pixels (720 pixels/line x 576 lines/frame). The maximum allowable pixel resolution could be reduced by 1/3 to compensate (e.g. 544 x 576). However, if hardware decoders operate on a macroblock basis in the pipeline, on-chip static memories (SRAM) will still increase by 1/3. The benefits offered by 1/3 more pixels generally outweigh those of full vertical chrominance resolution. Other arguments favoring 4:2:0 over 4:2:2 include:

Vertical decimation increases compression efficiency by reducing syntax overhead posed in an 8 block (4:2:2) macroblock structure.

You're compressing the hell out of the video signal, so what possible difference can the 0:0:2 chrominance high-pass make?
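The memory arithmetic in the 4:2:2 discussion above can be checked with a few lines of Python. This is only a rough sketch: the three-frame-store decoder model (two reference frames plus one B frame under reconstruction) is an assumption that the text does not state, and the names `frame_bits` and `DRAM_16_MBIT` are illustrative.

```python
# Frame-store size for 8-bit samples, counted in 8x8 blocks per macroblock.
# Assumption (not from the text): the decoder keeps three frame stores,
# i.e. two reference frames plus one B frame being reconstructed.

BLOCKS_PER_MACROBLOCK = {"4:2:0": 6, "4:2:2": 8}  # 8x8 blocks per 16x16 macroblock
DRAM_16_MBIT = 16 * 1024 * 1024                    # bits


def frame_bits(width, height, chroma_format):
    """Bits needed to store one frame at 8 bits per sample."""
    macroblocks = (width // 16) * (height // 16)
    samples = macroblocks * BLOCKS_PER_MACROBLOCK[chroma_format] * 64
    return samples * 8


for fmt in ("4:2:0", "4:2:2"):
    total = 3 * frame_bits(720, 576, fmt)
    verdict = "fits in" if total <= DRAM_16_MBIT else "exceeds"
    print(fmt, total, "bits for three frames,", verdict, "16 Mbit")
```

Under these assumptions the 4:2:0 frame stores come in just under 16 Mbit, while the 4:2:2 stores are one third larger and no longer fit.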

Is 4:2:0 the same as 4:1:1 ?

No, no, definitely no. The following table illustrates the "nuances" between the different chroma formats for a typical "CCIR 601" frame with pixel dimensions of 720 pixels/line x 480 lines/frame:

chroma_format	Y samples per line	Y lines per frame	C samples per line	C lines per frame	horizontal subsampling factor	vertical subsampling factor
4:4:4	720	480	720	480	none	none
4:2:2	720	480	360	480	2:1	none
4:2:0	720	480	360	240	2:1	2:1
4:1:1	720	480	180	480	4:1	none
4:1:0	720	480	180	120	4:1	4:1
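The subsampling factors in the table translate directly into chroma plane dimensions. A small Python sketch follows; the names `SUBSAMPLING` and `chroma_plane_size` are illustrative and do not come from any standard or library.

```python
# (horizontal, vertical) chroma subsampling factors, copied from the table above.
SUBSAMPLING = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
    "4:1:1": (4, 1),
    "4:1:0": (4, 4),
}


def chroma_plane_size(chroma_format, luma_width, luma_height):
    """Return (samples per line, lines per frame) for one chroma plane."""
    horizontal, vertical = SUBSAMPLING[chroma_format]
    return luma_width // horizontal, luma_height // vertical


for fmt in SUBSAMPLING:
    print(fmt, chroma_plane_size(fmt, 720, 480))
```

Note that 4:2:0 and 4:1:1 carry the same number of chroma samples per frame; they differ only in how those samples are distributed horizontally and vertically.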

3:2:2, 3:1:1, and 3:1:0 are less common variations, but have been documented. As shocking as it may seem, the 4:1:0 ratio was used by Intel's DVI for several years.

The 130 microsecond gap between successive 4:2:0 lines in progressive frames, and 260 microsecond gap in interlaced frames, can introduce some difficult vertical frequencies, but most can be alleviated through pre-processing.

What is the sample precision of MPEG ? How many colors can MPEG represent ?

By definition, MPEG samples have no more and no less than 8 bits of uniform sample precision (256 quantization levels). For luminance data (which is unsigned), black corresponds to level 0 and white to level 255. However, in CCIR Recommendation 601 chromaticity, luminance (Y) levels 0 through 15 and 236 through 255 are reserved for blanking signal excursions. MPEG currently has no such clipped excursion restrictions, although decoders might take care to ensure that active samples do not exceed these limits. With three color components per pixel, the total number of combinations is roughly 16.8 million colors (i.e. 24 bits).
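The color count is simple arithmetic, and the optional clamping step a decoder might apply is a one-liner. In this sketch the function name `clip_luma_601` is hypothetical, and the 16 to 235 nominal luminance range is the usual reading of CCIR 601 rather than anything MPEG requires.

```python
BITS_PER_SAMPLE = 8
COMPONENTS = 3  # Y, Cb, Cr

levels = 2 ** BITS_PER_SAMPLE   # quantization levels per component
colors = levels ** COMPONENTS   # total representable combinations


def clip_luma_601(y):
    """Clamp an 8-bit luminance sample to the CCIR 601 nominal range 16..235."""
    return max(16, min(235, y))


print(levels, colors)
```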

How are the subsampled chroma samples sited ?

A. It is moderately important to properly co-site chroma samples, otherwise a sort of chroma shifting effect (exhibited as a "halo") may result when the reconstructed video is displayed. In MPEG-1 video, the chroma samples are exactly centered between the 4 luminance samples (Fig. 1). To maintain compatibility with the CCIR 601 horizontal chroma locations and simplify implementation (eliminate need for phase shift), MPEG-2 chroma samples are arranged as per Fig. 2.

Y   Y   Y   Y        Y   Y   Y   Y        YC  Y   YC  Y
  C       C          C       C
Y   Y   Y   Y        Y   Y   Y   Y        YC  Y   YC  Y

Y   Y   Y   Y        Y   Y   Y   Y        YC  Y   YC  Y
  C       C          C       C
Y   Y   Y   Y        Y   Y   Y   Y        YC  Y   YC  Y

Fig.1 MPEG-1         Fig.2 MPEG-2         Fig.3 MPEG-2 and
4:2:0 organization   4:2:0 organization   CCIR Rec. 601
                                          4:2:2 organization

How do you tell an MPEG-1 bitstream from an MPEG-2 bitstream ?

A. All MPEG-2 bitstreams must contain specific extension headers that immediately follow MPEG-1 headers. At the highest layer, for example, the MPEG-1 style sequence_header() is followed by sequence_extension(). Some extension headers are specific to MPEG-2 profiles. For example, sequence_scalable_extension() is not allowed in Main Profile bitstreams.

A simple program need only scan the coded bitstream for byte-aligned start codes to determine whether the stream is MPEG-1 or MPEG-2.
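A minimal Python sketch of such a scanner is shown below. It assumes a video elementary stream held in memory and uses the start code values 0xB3 (sequence_header_code) and 0xB5 (extension_start_code); the function names are illustrative, and the test "next start code after the sequence header is an extension" is a heuristic rather than a full conformance check.

```python
SEQUENCE_HEADER_CODE = 0xB3   # 0x000001B3
EXTENSION_START_CODE = 0xB5   # 0x000001B5
START_CODE_PREFIX = b"\x00\x00\x01"


def iter_start_codes(data):
    """Yield (offset, code_byte) for every byte-aligned start code in data."""
    pos = data.find(START_CODE_PREFIX)
    while pos != -1 and pos + 3 < len(data):
        yield pos, data[pos + 3]
        pos = data.find(START_CODE_PREFIX, pos + 3)


def classify_video_stream(data):
    """Guess 'MPEG-1' or 'MPEG-2'; return None if no sequence header is found."""
    codes = iter_start_codes(data)
    for _, code in codes:
        if code == SEQUENCE_HEADER_CODE:
            following = next(codes, None)
            if following is not None and following[1] == EXTENSION_START_CODE:
                return "MPEG-2"
            return "MPEG-1"
    return None
```

No variable-length decoding is involved: the scan only looks for the three-byte prefix and reads the byte that follows it.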

What are start codes?

These 32-bit byte-aligned codes provide a mechanism for cheaply searching coded bitstreams for commencement of various layers of video without having to actually parse variable-length codes or perform any decoder arithmetic. Start codes also provide a mechanism for re-synchronizing in the presence of bit errors. A start code may be preceded by an arbitrary number of zero bytes. The zero bytes can be used to guarantee that a start code occurs within a certain location, or by rate control to increase the bitrate of a coded bitstream.

Coded block pattern:

(CBP, not to be confused with Constrained Parameters!) When the frame prediction is particularly good, the displaced frame difference (DFD, or temporal macroblock prediction error) tends to be small, often with entire block energy being reduced to zero after quantization. This usually happens only at low bit rates. Coded block patterns prevent the need for transmitting EOB symbols in those zero coded blocks. Coded block patterns are transmitted in the macroblock header only if the macroblock_type flag indicates so.
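To make the idea concrete, here is a small Python sketch that unpacks a 4:2:0 coded block pattern into the list of blocks that actually carry coefficients. It assumes the usual convention that the most significant of the six bits corresponds to the first luminance block and the least significant to Cr; the names `BLOCK_NAMES` and `coded_blocks` are illustrative.

```python
# Block order within a 4:2:0 macroblock: four luminance blocks, then Cb, Cr.
BLOCK_NAMES = ("Y0", "Y1", "Y2", "Y3", "Cb", "Cr")


def coded_blocks(cbp):
    """Return the names of the blocks flagged as coded in a 6-bit pattern."""
    if not 0 <= cbp <= 63:
        raise ValueError("a 4:2:0 coded block pattern must fit in 6 bits")
    # Bit 5 (value 32) belongs to Y0, bit 0 (value 1) belongs to Cr.
    return [name for i, name in enumerate(BLOCK_NAMES) if cbp & (1 << (5 - i))]


print(coded_blocks(0b100001))  # first luminance block and Cr only
```

Blocks missing from the returned list are all-zero after quantization and are skipped entirely, so no EOB symbol is spent on them.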

Why is the DC value always divided by 8 ?

Clarification point: The DC value of Intra coded blocks is quantized by a constant stepsize of 8 only in MPEG-1, rendering the 11-bit dynamic range of the IDCT DC coefficient to 8 bits of accuracy. MPEG-2 allows for DC precision of 8, 9, 10, or 11 bits. The quantization stepsize is fixed for the duration of the picture, set by the intra_dc_precision field in the picture_coding_extension() header.
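The relationship between the precision setting and the DC step size can be sketched in Python as follows. The mapping assumes that the 2-bit intra_dc_precision code values 0 to 3 select 8, 9, 10, and 11 bits respectively, with MPEG-1 behaving like code 0; the function names are illustrative.

```python
def dc_precision_bits(intra_dc_precision):
    """Bits of DC accuracy for a given 2-bit intra_dc_precision code."""
    if intra_dc_precision not in (0, 1, 2, 3):
        raise ValueError("intra_dc_precision is a 2-bit field")
    return 8 + intra_dc_precision


def intra_dc_step(intra_dc_precision):
    """DC quantizer step size: 8, 4, 2, or 1."""
    return 1 << (11 - dc_precision_bits(intra_dc_precision))


def reconstruct_intra_dc(quantized_dc, intra_dc_precision=0):
    """Scale a decoded DC value back to the 11-bit coefficient range."""
    return quantized_dc * intra_dc_step(intra_dc_precision)


for code in range(4):
    print(code, dc_precision_bits(code), intra_dc_step(code))
```

With the default code of 0 this reproduces the fixed divide-by-8 behavior of MPEG-1.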

Why is there a special VLC for dct_coef_first?

Since the coded_block_pattern in non-intra macroblocks already signals every possible combination of all-zero and non-zero blocks, a coded block can never begin with EOB. The dct_coef_first mechanism therefore reassigns the codeword that would otherwise represent EOB (10): when it appears as the first coefficient in the zig-zag ordered Run-Level token list, it means (run = 0, level = +/- 1) instead.

What's the deal with End of Block ?

End of Block (EOB) saves unnecessary run-length codes. At optimal bitrates, there tend to be only a few AC coefficients, concentrated in the early stages of the zig-zag vector. In MPEG-1, the 2-bit length of EOB implies that there is an average of only 3 or 4 non-zero AC coefficients per block. In MPEG-2 Intra (I) pictures, with a 4-bit EOB code in Table 1, this estimate is between 9 and 16 coefficients. Since EOB is required for all coded blocks, its absence can signal that a syntax error has occurred in the bitstream.

What's this "Macroblock stuffing," and why do people hate it?

A genuine pain for VLSI implementations, macroblock stuffing was included in MPEG-1 to maintain smoother, constant bitrate control for encoders. However, with normalized complexity/activity measures and buffer management performed a priori (before coding of the macroblock, for example), and with local monitoring of coded data buffer levels now a common operation in encoders (e.g. the MPEG-2 encoder Test Model), the need for such localized bitrate smoothing evaporated. Stuffing can be achieved through slice start code padding if required. A good rule of thumb is: if you often find yourself wishing for stuffing more than once per slice, you probably don't have a very good rate control algorithm. Nonetheless, to avoid any temptation, macroblock stuffing is now illegal in MPEG-2. (A general syntax restriction brought to you by the Implementation Studies Subgroup!)

What's the deal with slice_vertical_position and macroblock_address_increment?

The absolute position of the first macroblock within a slice is known by the combination of slice_vertical_position and the macroblock_address_increment. Therefore, the proper place of a lost slice found in a highly corrupt bitstream can be located exactly within the picture. These two syntax elements are also the only known means of detecting slice gaps, that is, areas of the picture which are not represented with any information (including skipped macroblocks). A slice gap occurs when the current macroblock address of the first macroblock in a slice is greater than the previous macroblock address by more than 1 macroblock unit. A slice overlap occurs when the current macroblock address is less than or equal to the previous macroblock's address. The previous macroblock in both instances is the last known macroblock within the previous slice. Because of the semantic interpretation of slice gaps and overlaps, and because of the syntactic restrictions for slice_vertical_position and macroblock_address_increment, it is not syntactically possible for a skipped macroblock to be represented in the first and last positions of a slice. In the past, some (bad) encoders would attempt to signal a run of skipped macroblocks to the end of the slice. These evil skipped macroblocks should be interpreted by a compliant decoder as a gap, not as a string of skipped macroblocks.
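The gap and overlap rules above can be sketched in Python. The address formula assumes MPEG-2 style semantics in which slice_vertical_position counts macroblock rows from 1 and the first macroblock_address_increment in a slice is measured from the end of the previous row; the function names and the 45-macroblock row width in the example are illustrative.

```python
def first_macroblock_address(slice_vertical_position,
                             macroblock_address_increment,
                             mb_width):
    """0-based raster address of the first macroblock in a slice."""
    row_start = (slice_vertical_position - 1) * mb_width
    return row_start + macroblock_address_increment - 1


def classify_slice_start(previous_address, current_address):
    """Compare a slice's first macroblock with the last one of the prior slice."""
    if current_address > previous_address + 1:
        return "gap"
    if current_address <= previous_address:
        return "overlap"
    return "contiguous"


# A 720-pixel-wide picture has 45 macroblocks per row.
start = first_macroblock_address(2, 3, 45)   # row 2, two macroblocks skipped over
print(start, classify_slice_start(44, start))
```

In the example the previous slice ended at address 44 and the new slice starts at 47, so addresses 45 and 46 are a gap rather than skipped macroblocks.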

What is meant by "modified Huffman VLC tables" ?

The VLC tables in MPEG are not Huffman tables in the true sense of Huffman coding, but are more like the tables used in Group 3 fax (where the term "modified Huffman tables" was unleashed). They are entropy constrained, that is, non-downloadable and optimized for a limited range of bit rates (sweet spots). A better way would be to say that the tables are optimized for a range of ratios of bit rate to sample rate (e.g. 0.25 bits/pixel to 1.0 bits/pixel). With the exception of a few codewords, the larger tables were carried over from the H.261 standard drafted in the year 1990. This includes the AC run-level symbols, coded_block_pattern, and macroblock_address_increment. MPEG-2 added an "Intra table," also called "Table 1". Note that the dct_coefficient tables assume that positive and negative AC coefficient run-levels are equally probable.

How does MPEG handle 3:2 pulldown?

MPEG-1 video decoders had to decide for themselves when to perform 3:2 pulldown if it was not indicated in the presentation time stamps (PTS) of the Systems layer bitstream. MPEG-2 provides two flags (repeat_first_field and top_field_first) which explicitly describe whether a frame or field is to be repeated. In progressive sequences, frames can be repeated 2 or 3 times. Simple and Main Profile are limited to repeated fields only. It is a general syntactic restriction that repeat_first_field can only be signaled (value == 1) in a frame structured picture. It makes little sense to repeat field pictures in an interlaced video signal, since the whole process of 3:2 pulldown conversion was meant to convert progressive film sequences to the display frame rate of interlaced television.

In the most common scenario, a film sequence will contain 24 frames every second. The frame_rate_code element in the sequence header will indicate 30 frames/sec, however. On average, every other coded frame will signal a repeat field (repeat_first_field==1) to pad the frame rate from 24 Hz to 30 Hz:

(24 coded frames/sec)*(2 fields/coded frame)*(5 display fields/4 coded fields) = 60 display fields/sec = 30 display frames/sec
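The same arithmetic can be simulated field by field. The Python sketch below assumes one common four-frame flag cadence for (top_field_first, repeat_first_field); the names `PULLDOWN_FLAGS`, `display_fields`, and `fields_per_second` are illustrative.

```python
from itertools import cycle, islice

# (top_field_first, repeat_first_field) for four consecutive film frames.
# This particular cadence is an assumption; it yields 3, 2, 3, 2 fields.
PULLDOWN_FLAGS = [(1, 1), (0, 0), (0, 1), (1, 0)]


def display_fields(top_field_first, repeat_first_field):
    """Fields displayed for one coded frame: 'T' = top, 'B' = bottom."""
    first, second = ("T", "B") if top_field_first else ("B", "T")
    fields = [first, second]
    if repeat_first_field:
        fields.append(first)  # the first field is shown a second time
    return fields


def fields_per_second(coded_frames=24):
    """Total display fields produced by one second of coded film frames."""
    flags = islice(cycle(PULLDOWN_FLAGS), coded_frames)
    return sum(len(display_fields(tff, rff)) for tff, rff in flags)


print(fields_per_second(24))  # display fields per second
```

Every four coded frames produce ten display fields, and the top/bottom parity keeps alternating across frame boundaries, which is what an interlaced display requires.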

After all this standardization, what's left for research?

Despite the fact that a comprehensive worldwide standard now exists for digital video, many areas remain wide open for research:

pre-processing (or "how to fit a square peg in a round hole?").
motion estimation (or "how to efficiently find a good prediction").
macroblock decision models (efficient, but does it also optimise the ripple effect on subsequent macroblocks?).
rate control and buffer management in editing environments (MPEG: video only exists within a sequence. Real world: decoders are displaying a picture from the previous sequence while reconstructing a picture from the new sequence).
implementation complexity reduction ("let's run it on a Pentium!").

Are some encoders better than others ?

A. Definitely. For example, the motion estimation search range of an encoder has great influence over final picture quality. At a certain point, a very large range can actually become detrimental (it may encourage large differential motion vectors, which consume bits). Practical ranges are usually between +/- 15 and +/- 32. As the range doubles, for instance, the search area quadruples (brain reminder: the classic relationship between an increase in linear dimension and the resulting increase in area).
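The cost side of the search range is easy to quantify. This short Python sketch counts integer-pel candidate positions for an exhaustive search; the function name is illustrative, and real encoders often use faster, non-exhaustive searches.

```python
def search_candidates(search_range):
    """Integer-pel candidate positions for a +/- search_range full search."""
    side = 2 * search_range + 1   # includes the zero displacement
    return side * side


for search_range in (15, 16, 32):
    print(search_range, search_candidates(search_range))
```

Going from +/- 16 to +/- 32 raises the count from 1089 to 4225 positions per macroblock, close to the factor of four the text describes.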

Rate control marks a second tell-tale area where some encoders perform significantly better than others.

And finally, the degree of "pre-processing" (now a popular buzzword in the business) signals that the encoder belongs to an elite marketing class.

Is the encoder standardized ?

The encoder rests just outside the normative scope of the standard, as long as the bitstreams it produces are compliant. The decoder, however, is almost deterministic: a given bitstream should reconstruct to a unique set of pictures. However, since the IDCT function is the ONLY non-normative stage in the decoder, an occasional error of a Least Significant Bit per prediction iteration is permitted.

The designer is free to choose among many DCT algorithms and implementations. The IEEE 1180 test referenced in Annex A of the MPEG-1 (ISO/IEC 11172-2) and MPEG-2 (ISO/IEC 13818-2) Video specifications spells out the statistical mismatch tolerance between the Reference IDCT, which is a separable 8x1 "Direct Matrix" DCT implemented with 64-bit floating point accuracy, and the IDCT you are testing for compliance.
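For orientation, a separable direct-matrix IDCT in double precision can be sketched in a few lines of Python (Python floats are 64-bit). This is an illustration of the idea, not the reference code from the standard or the IEEE 1180 test procedure itself; the names `BASIS`, `idct_1d`, and `idct_2d` are illustrative.

```python
import math

N = 8


def _scale(u):
    """Normalization factor: 1/sqrt(2) for the DC term, 1 otherwise."""
    return math.sqrt(0.5) if u == 0 else 1.0


# BASIS[x][u]: weight of coefficient u in output sample x of an 8-point IDCT.
BASIS = [[0.5 * _scale(u) * math.cos((2 * x + 1) * u * math.pi / (2 * N))
          for u in range(N)] for x in range(N)]


def idct_1d(coeffs):
    """8-point inverse DCT by direct matrix multiplication."""
    return [sum(BASIS[x][u] * coeffs[u] for u in range(N)) for x in range(N)]


def idct_2d(block):
    """Separable 8x8 IDCT: transform every row, then every column."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols[c][r] holds sample (r, c); transpose back to row-major order.
    return [[cols[c][r] for c in range(N)] for r in range(N)]
```

A compliance test would feed the same coefficient blocks to this kind of reference and to the IDCT under test, then compare the rounded outputs statistically.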

What is the TM rate control and adaptive quantization technique ?

A. The Test Model (MPEG-2) and Simulation Model (MPEG-1) were not, by any stretch of the imagination, meant to epitomize state-of-the-art encoding quality. They were, however, designed to exercise the syntax, verify proposals, and test the relative compression performance of proposals in a timely manner that could be duplicated by co-experimenters. Without simplicity, there would no doubt have been endless debates over model interpretation. Regardless of all else, more advanced techniques would probably trespass into proprietary territory.

The final test model for MPEG-2 is TM version 5b, a.k.a. TM version 6, produced in March 1993 (the time when the MPEG-2 video syntax was "frozen"). The final MPEG-1 simulation model is version 3 ("SM-3"). The MPEG-2 TM rate control method offers a dramatic improvement over the SM method. TM adds more accurate estimation of macroblock complexity through use of limited a priori information. Macroblock quantization adjustments are computed on a macroblock basis, instead of once-per-macroblock row (which in the SM-3 case consisted of an entire slice).

How does the TM work?

Rate control and adaptive quantization are divided into three steps:

Step One: Target Bit Allocation

In Complexity Estimation, the global complexity measures assign relative weights to each picture type (I, P, B). These weights (Xi, Xp, Xb) are reflected by the typical coded frame size of I, P, and B pictures (see typical frame size discussion). I pictures are usually assigned the largest weight since they have the greatest stability factor in an image sequence and contain the most "new information" in a sequence. B pictures are assigned the smallest weight since B picture energy does not propagate into other pictures, and B pictures are usually more highly correlated with neighboring P and I pictures than P pictures are.

The bit target for a frame is based on the frame type, the remaining number of bits left in the Group of Pictures (GOP) allocation, and the immediate statistical history of previously coded pictures (sort of a "moving average" global rate control, if you will).
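A complexity-weighted split of the remaining GOP bits might look like the Python sketch below. It follows the general shape of commonly circulated descriptions of the Test Model, but the constants `K_P` and `K_B`, the initial complexity values, and the lower bound on the target are all assumptions that the text above does not give, and the function names are illustrative.

```python
K_P, K_B = 1.0, 1.4   # assumed relative quantizer weights for P and B pictures


def initial_complexities(bit_rate):
    """Assumed starting values for Xi, Xp, Xb before any picture is coded."""
    return {"I": 160 * bit_rate / 115,
            "P": 60 * bit_rate / 115,
            "B": 42 * bit_rate / 115}


def target_bits(picture_type, remaining_gop_bits, n_p, n_b, x,
                bit_rate, picture_rate):
    """Bit target for the next picture.

    n_p and n_b are the P and B pictures still to be coded in the GOP,
    and x maps 'I', 'P', 'B' to the current global complexity measures.
    """
    r = remaining_gop_bits
    if picture_type == "I":
        t = r / (1 + n_p * x["P"] / (x["I"] * K_P) + n_b * x["B"] / (x["I"] * K_B))
    elif picture_type == "P":
        t = r / (n_p + n_b * K_P * x["B"] / (K_B * x["P"]))
    elif picture_type == "B":
        t = r / (n_b + n_p * K_B * x["P"] / (K_P * x["B"]))
    else:
        raise ValueError("picture_type must be 'I', 'P', or 'B'")
    # Assumed floor so that no picture is starved of bits.
    return max(t, bit_rate / (8 * picture_rate))
```

For a 4 Mbit/s, 30 Hz stream with a 15-picture GOP (4 P and 10 B pictures), this gives the I picture the largest target and each B picture the smallest, matching the weighting described above.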

Step Two: Rate Control via Buffer Monitoring

Rate control attempts to adjust bit allocation if there is significant difference between the target bits (anticipated bits) and actual coded bits for a block of data. If the virtual buffer begins to overflow, the macroblock quantization step size is increased, resulting in a smaller yield of coded bits in subsequent macroblocks. Likewise, if underflow begins, the step size is decreased. The Test Model approximates that the target picture has spatially uniform distribution of bits. This is a safe approximation since spatial activity and perceived quantization noise are almost inversely proportional. Of course, the user is free to design a custom distribution, perhaps targeting more bits in areas that contain more complex yet highly perceptible data such as text.
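The buffer feedback loop can be sketched in Python as follows. The uniform per-macroblock spread of the picture target comes from the text; the reaction parameter and the scaling to a 1 to 31 quantizer range are assumptions borrowed from commonly circulated Test Model descriptions, and all names are illustrative.

```python
def reaction_parameter(bit_rate, picture_rate):
    """Assumed scale that controls how strongly the buffer steers the quantizer."""
    return 2 * bit_rate / picture_rate


def virtual_buffer_fullness(initial_fullness, bits_so_far, picture_target,
                            mb_index, mb_count):
    """Fullness before coding macroblock mb_index (0-based).

    The picture target is assumed to be spread uniformly over mb_count
    macroblocks, so the expected spend so far is target * mb_index / mb_count.
    """
    expected = picture_target * mb_index / mb_count
    return initial_fullness + bits_so_far - expected


def reference_quantizer(fullness, reaction):
    """Map buffer fullness to a reference quantizer step (before modulation)."""
    return fullness * 31 / reaction


r = reaction_parameter(4_000_000, 30)
over = virtual_buffer_fullness(10_000, 60_000, 100_000, 50, 100)
under = virtual_buffer_fullness(10_000, 45_000, 100_000, 50, 100)
print(reference_quantizer(over, r), reference_quantizer(under, r))
```

When more bits than expected have been spent, the fullness rises and the quantizer step grows, which reduces the yield of the following macroblocks.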

Step Three: Adaptive Quantization

The final step modulates the macroblock quantization step size obtained in Step 2 by a local activity measure. The activity measure itself is normalized against the most recently coded picture of the same type (I, P, or B). The activity for a macroblock is chosen as the minimum among the four 8x8 block luminance variances. Choosing the minimum block is part of the concept that a macroblock is no better than the block of highest visible distortion (weakest link in the chain).
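A minimal Python sketch of this modulation step is given below. Taking the minimum of the four luminance block variances comes from the text; the "+1" offset, the normalization formula (which keeps the factor between 0.5 and 2), and the clipping to 1 through 31 are assumptions based on commonly circulated Test Model descriptions, and the function names are illustrative.

```python
def block_variance(block):
    """Variance of the samples in one 8x8 block (a list of 8 rows)."""
    samples = [s for row in block for s in row]
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


def macroblock_activity(luma_blocks):
    """Activity of a macroblock: 1 + the smallest of its block variances."""
    return 1 + min(block_variance(b) for b in luma_blocks)


def normalized_activity(activity, average_activity):
    """Assumed normalization against the average activity of the
    previously coded picture of the same type."""
    return (2 * activity + average_activity) / (activity + 2 * average_activity)


def modulated_quantizer(reference_q, activity, average_activity):
    """Final macroblock quantizer step, clipped to the range 1..31."""
    mquant = reference_q * normalized_activity(activity, average_activity)
    return max(1, min(31, round(mquant)))
```

A macroblock containing even one flat block gets a low activity and therefore a finer quantizer, since distortion is most visible in smooth areas.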

Decision:

[deferred to later date]

Can motion vectors be used to determine object velocity?

Motion vector information cannot be reliably used as a means of determining object velocity unless the encoder model specifically set out to do so. First, encoder models that optimize picture quality generate vectors that typically minimize prediction error and, consequently, the vectors often do not represent true object translation from picture-to-picture. Standards converters that resample one frame rate to another (as in NTSC to PAL) use different methods (motion vector field estimation, edge detection, et al) that are not concerned with Rate-Distortion theory. Second, motion vectors are not transmitted for all macroblocks anyway.
