Is it possible to code interlaced video with MPEG-1 syntax?

A. Two methods can be applied to interlaced video that maintain syntactic compatibility with MPEG-1 (which was originally designed for progressive frames only). In the field concatenation method, the encoder model can carefully construct predictions and prediction errors that realize good compression but maintain field integrity (distinction between adjacent fields of opposite parity). Some pre-processing techniques can also be applied to the interlaced source video that would, e.g., lessen sharp vertical frequencies.

This technique is not terribly efficient, of course. On the other hand, if the original source was progressive (e.g. film), then it is relatively trivial to convert the interlaced source to a progressive format before encoding. (MPEG-2 would then offer only slightly superior performance through such MPEG-2 enhancements as greater DC coefficient precision, non-linear mquant, intra VLC, etc.) Reconstructed frames are usually re-interlaced in the Display process following the decoding stages.

The second syntactically compatible method codes fields as separate pictures. Rumors have spread that this approach does not work nearly as well as the "pretend it's really a frame" method.

Can MPEG be used to code still frames?

Yes. MPEG Intra pictures are similar to baseline sequential JPEG pictures.

There are, of course, advantages and disadvantages to using MPEG over JPEG to represent still pictures.

Disadvantages:

1. MPEG has only one color space (YCbCr).

2. In MPEG-1 and MPEG-2 Main Profile (4:2:0 chroma_format), luma and chroma share quantization and VLC tables.

3. MPEG-1 is syntactically limited to 4k x 4k images, and MPEG-2 to 16k x 16k.

Advantages:

1. MPEG possesses adaptive quantization, which permits better rate control and spatial masking.

2. With its limited still image syntax, MPEG averts any temptation to use unnecessary, expensive, and academic encoding methods that have little impact on the overall picture quality (you know who you are).

3. Philips' CD-I spec. has a requirement for an MPEG still frame mode, with double SIF image resolution. This is technically feasible mostly thanks to the fact that only one picture buffer is needed to decode a still image instead of the 2.5 to 3 buffers needed for IPB sequences.

Why was the 8x8 DCT size chosen?

A. Experiments showed that little compaction gain could be achieved with larger transform sizes, especially in light of the increased implementation complexity. A fast DCT algorithm will require roughly double the number of arithmetic operations per sample when the linear transform point size is doubled. Naturally, the best compaction efficiency has been demonstrated using locally adaptive block sizes (e.g. 16x16, 16x8, 8x8, 8x4, and 4x4). [See Gary Sullivan and Rich Baker, "Efficient Quadtree Coding of Images and Video," ICASSP 91, pp. 2661-2664.]

Inevitably, adaptive block transformation sizes introduce additional side information overhead while forcing the decoder to implement programmable or hardwired recursive DCT algorithms. If the DCT size becomes too large, then more edges (local discontinuities) and the like become absorbed into the transform block, resulting in wider propagation of Gibbs (ringing) and other unpleasant phenomena. Finally, with larger transform sizes, the DC term is even more critically sensitive to quantization noise.
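The compaction property discussed above can be seen with a small experiment. The following is only a sketch: it uses a direct (slow) orthonormal DCT-II rather than any fast algorithm, the function names are made up for illustration, and the 8x8 ramp is invented sample data standing in for a smooth image region.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a sequence (direct O(N^2) form, not a fast algorithm)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # Transpose back so the result is indexed [vertical freq][horizontal freq].
    return [list(r) for r in zip(*cols)]

def energy_fraction(coeffs, k):
    """Fraction of the total energy held by the top-left k x k (low-frequency) corner."""
    total = sum(v * v for row in coeffs for v in row)
    low = sum(coeffs[u][v] ** 2 for u in range(k) for v in range(k))
    return low / total

# Invented sample data: a smooth 8x8 ramp, as found in flat image regions.
block = [[100 + 4 * x + 2 * y for x in range(8)] for y in range(8)]
coeffs = dct_2d(block)
low_share = energy_fraction(coeffs, 2)
```

For a smooth block like this one, nearly all of the energy lands in the DC term and its immediate neighbours, which is what makes coarse quantization of the remaining coefficients affordable.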

Why was the 16x16 prediction size chosen?

The 16x16 area corresponds to the Least Common Multiple (LCM) of 8x8 blocks, given the normative 4:2:0 chroma ratio. Starting with medium size images, the 16x16 area provides a good balance between side information overhead and complexity on one hand and motion compensated prediction accuracy on the other. In short, experiments showed that 16x16 was a good trade-off between complexity and coding efficiency.
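The LCM argument can be written out as a few lines of arithmetic. This is a sketch for the 4:2:0 case only (chroma subsampled by 2 in each direction); the function names are illustrative and not taken from any MPEG document.

```python
from math import gcd

BLOCK = 8  # edge of one DCT block, in samples

def lcm(a, b):
    return a * b // gcd(a, b)

def macroblock_luma_size(chroma_h_sub, chroma_v_sub):
    """Smallest luma area (width, height) covering whole 8x8 blocks in luma and in subsampled chroma."""
    # One 8x8 chroma block spans 8 * subsampling luma samples in each direction.
    width = lcm(BLOCK, BLOCK * chroma_h_sub)
    height = lcm(BLOCK, BLOCK * chroma_v_sub)
    return width, height

def blocks_per_macroblock(chroma_h_sub, chroma_v_sub):
    """Return (luma blocks, chroma blocks) contained in that area."""
    w, h = macroblock_luma_size(chroma_h_sub, chroma_v_sub)
    luma = (w // BLOCK) * (h // BLOCK)
    chroma_each = (w // chroma_h_sub // BLOCK) * (h // chroma_v_sub // BLOCK)
    return luma, 2 * chroma_each  # Cb and Cr

size_420 = macroblock_luma_size(2, 2)      # 4:2:0 subsampling factors
blocks_420 = blocks_per_macroblock(2, 2)
```

With 4:2:0 this gives a 16x16 luma area holding four luma blocks plus one Cb and one Cr block, i.e. the familiar six-block macroblock.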

What do B-pictures buy you?

A. Since bi-directional macroblock predictions are an average of two macroblock areas, noise is reduced at low bit rates (like a 3-D filter, if you will). At nominal MPEG-1 video rates (352 x 240 x 30, 1.15 Mbit/sec), it is said that B-frames improve SNR by as much as 2 dB. (A 0.5 dB gain is usually considered worthwhile in MPEG.) However, at higher bit rates, B-frames become less useful since they inherently do not contribute to the progressive refinement of an image sequence (i.e. they are not used as prediction by subsequent coded frames). Regardless, B-frames are still politically controversial.

B pictures are interpolative in two ways: 1. predictions in the bi-directional macroblocks are an average from block areas of two pictures; 2. B pictures "fill in", like a digital spackle, the immediate 3-D video signal without contributing to the overall signal quality beyond that immediate point in time. In other words, a B picture, regardless of its internal make-up of macroblock types, has a life limited only to itself. As mentioned before, B picture energy does not propagate into other frames. In a sense, bits spent on B pictures are wasted.
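The noise-reducing effect of averaging two predictions can be illustrated with a toy experiment. This is a sketch under stated assumptions: the "true" block is a flat grey area, both reference areas carry independent Gaussian noise (invented test data), and the average rounds halves upward for non-negative sample values. None of the names below come from the MPEG syntax.

```python
import random

def average_prediction(forward, backward):
    """Average two prediction blocks sample by sample, rounding halves up."""
    return [(f + b + 1) // 2 for f, b in zip(forward, backward)]

def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def noisy(signal, sigma=6.0):
    """Add Gaussian noise and clip to the 8-bit sample range."""
    return [min(255, max(0, round(s + random.gauss(0, sigma)))) for s in signal]

random.seed(1)                 # fixed seed so the run is repeatable
truth = [128] * 4096           # invented "ideal" samples
forward = noisy(truth)         # past reference area, with coding noise
backward = noisy(truth)        # future reference area, independent noise
bidirectional = average_prediction(forward, backward)

mse_forward = mean_squared_error(forward, truth)
mse_bidirectional = mean_squared_error(bidirectional, truth)
```

With independent noise in the two references, the averaged prediction has roughly half the noise power of either one alone, which is the "3-D filter" effect described above.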

Why do some people hate B-pictures?

A. Computational complexity, bandwidth, end-to-end delay, and picture buffer size are the four B-frame Pet Peeves. Computational complexity in the decoder is increased since some macroblock modes require averaging between two block predictions (macroblock_motion_forward==1 && macroblock_motion_backward==1).

Worst case, memory bandwidth is increased by an extra 15.2 MByte/s (assuming 4:2:0 chroma_format at Main Level, and not including any half pel or page-mode overhead) for this extra directional prediction. To really rub it in, an extra picture buffer is needed to store the future reference picture (backwards prediction frame). Finally, an extra picture delay is introduced in the decoder since the frame used for backwards prediction needs to be transmitted to the decoder and reconstructed before the intermediate B-pictures in display order can be decoded.
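The reordering behind that extra delay can be sketched as follows. The picture labels and the function name are made up for illustration; the only rule encoded is the one stated above, namely that the future reference picture must be sent before the B-pictures that depend on it.

```python
def display_to_coded_order(display):
    """Reorder picture labels from display order to coded (transmission) order.

    Each B-picture is held back until the next I- or P-picture, its future
    reference, has been emitted.
    """
    coded, pending_b = [], []
    for pic in display:
        if pic.startswith("B"):
            pending_b.append(pic)
        else:
            coded.append(pic)        # reference picture goes out first
            coded.extend(pending_b)  # then the B-pictures that needed it
            pending_b = []
    # Trailing B-pictures with no later reference are appended unchanged.
    coded.extend(pending_b)
    return coded

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
coded = display_to_coded_order(display)
```

In this example P3 is transmitted and reconstructed before B1 and B2, so the decoder has to hold it in a second reference buffer and delay its display until the B-pictures have been shown.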

Cable television operators have been particularly averse to B-frames since, for CCIR 601 rate video, the extra picture buffer pushes the decoder DRAM requirement past the magic 8-Mbit (1 Mbyte) threshold into the evil realm of 16 Mbits (2 Mbyte), although 8 Mbits is fine for a 352 x 480 B-picture sequence. However, cable often forgets that DRAM does not come in convenient high-volume (low cost) 8-Mbit packages as do the friendly 4-Mbit and 16-Mbit packages. In a few years, the cost difference between 16 Mbit and 8 Mbit will become insignificant compared to the bandwidth savings gained through higher compression. For the time being, some cable boxes will start with 8 Mbit and allow future drop-in upgrades to the full 16 Mbit.
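A rough budget shows where the 8-Mbit threshold bites. The sketch below assumes 720 x 480 as the CCIR 601 picture size, 8 bits per sample, 4:2:0 chroma, and three picture buffers for an IPB sequence; it counts picture buffers only and ignores the coded-data buffer and any other decoder memory, so the real requirement is somewhat higher.

```python
MBIT = 1024 * 1024  # DRAM capacities are powers of two

def picture_bits(width, height, bits_per_sample=8):
    """Bits for one 4:2:0 picture: full-size luma plus two quarter-size chroma planes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bits_per_sample

def picture_buffers_mbit(width, height, n_buffers):
    """Total picture-buffer memory in Mbit for n_buffers stored pictures."""
    return n_buffers * picture_bits(width, height) / MBIT

full_rate_ipb = picture_buffers_mbit(720, 480, 3)   # assumed CCIR 601 size, with B-pictures
full_rate_ip = picture_buffers_mbit(720, 480, 2)    # same size, no B-pictures
half_width_ipb = picture_buffers_mbit(352, 480, 3)  # the 352 x 480 case from the text
```

Under these assumptions, three full-size buffers overshoot 8 Mbit by a wide margin, two fit only barely, and three 352 x 480 buffers leave room to spare, which matches the situation described above.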

How are interlaced and progressive pictures indicated in MPEG?

The following tree may help illustrate the possible layers of progressive and interlaced coding modes. Progressive and interlaced coding show up at different layers of the MPEG bitstream, not just the picture layer.

                  MPEG-2 sequence
                   /           \
         progressive          interlaced sequence
          sequence                /           \
                       Field picture        Frame picture
                                              /         \
                      Frame or field prediction       Frame MB
                                                      /      \
                                              Field dct     Frame dct

What does it mean to be compliant with MPEG (other than paying your patent royalties)?

There are two areas of conformance/compliance in MPEG:

1. Compliant bitstreams

2. Compliant decoders

Technically speaking, video bitstreams consisting entirely of I-frames are syntactically compliant with the MPEG specification. The I-frame sequence simply utilizes a rather limited subset of the full syntax. Compliant bitstreams must obey the range limits (e.g. motion vector ranges, bit rates, frame rates, buffer sizes) and use only the permitted syntax elements (e.g. chroma_format, B-pictures, etc.).

Decoders, however, must be able to decode all combinations of legal bitstreams. For example, a decoder which is incapable of decoding P or B frames is definitely not a Main Profile or Constrained Parameters decoder! Likewise, full arithmetic precision must be obeyed before any decoder can be called "MPEG compliant." The IDCT, inverse quantizer, and motion compensated predictor must meet the accuracy requirements defined in the MPEG document. Real-time conformance is more complicated to measure than arithmetic precision, but it is reasonable to expect that decoders that skip frames on reasonable bitstreams are not likely to be considered compliant.

