How would you describe MPEG to the Data Compression expert?

A. MPEG video is a block-based coding scheme.

How does MPEG video really compare to TV, VHS, laserdisc ?

VHS picture quality can be achieved for film source video at about 1 million bits per second (with careful application of proprietary encoding methods). Objective comparison of MPEG to VHS is complex and political.

The luminance response curve of VHS places -3 dB (50% response, the common definition of bandlimit) at around analog 2 MHz (digital equivalent to 200 samples/line). VHS chroma is considerably less dense in the horizontal direction than MPEG's 4:2:0 signal (compare 80 samples/line equivalent to 176 !!). From a sampling density perspective, VHS is superior only in the vertical direction (480 luminance lines compared to 240). When other analog factors are taken into account, such as interfield crosstalk and the TV monitor Kell factor, the perceptual vertical advantage becomes much less than 2:1.

VHS is also prone to such inconveniences as timing errors (an annoyance addressed by time base correctors), whereas digital video is fully discretized. High-speed duplication of pre-recorded VHS tapes (5 to 15 times real-time playback speed) introduces additional handicaps. In short, MPEG-1 at its nominal parameters can match VHS's "sexy low-pass-filtered look," but for critical sequences it is probably inferior overall to a well mastered, well duplicated VHS tape.

With careful coding schemes, broadcast NTSC quality can be approximated at about 3 Mbit/sec, and PAL quality at about 4 Mbit/sec, for film source video. Of course, sports sequences with complex spatial-temporal activity should be treated with higher bit rates, in the neighborhood of 5 to 6 Mbit/sec. Laserdisc is perhaps the most difficult medium to make comparisons with.

Laserdisc:

First, the video encoded onto a laserdisc is composite, which exposes the signal to the familiar set of artifacts (reduced color accuracy of YIQ, moiré patterns, crosstalk, etc). The medium's bandlimited signal is often defined by laserdisc player manufacturers and mainstream publications as capable of rendering up to 425 TVL (or frequencies with Nyquist at 567 samples/line). An equivalent component digital representation would therefore have sampling dimensions of 567 x 480 x 30 Hz.
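The 567 samples/line figure is consistent with scaling TVL, which is counted per picture height, by a 4:3 picture aspect ratio. The sketch below shows that arithmetic; the function name is hypothetical, and the assumption that this is how the figure was derived is not stated in the text.

```python
from fractions import Fraction

def tvl_to_samples_per_line(tvl, aspect_ratio=Fraction(4, 3)):
    """Scale TV lines per picture height to samples across the full width.

    Assumes a 4:3 picture; exact rational arithmetic avoids float rounding.
    """
    return round(tvl * aspect_ratio)

# 425 TVL on a 4:3 picture: 425 * 4/3 = 566.67, rounded to the nearest sample
print(tvl_to_samples_per_line(425))
```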

The carrier-to-noise ratio of a laserdisc video signal is typically better than 48 dB. Timing accuracy is excellent, certainly better than VHS. Yet some of the clean characteristics of laserdisc can be simulated with MPEG-1 signals as low as 1.15 Mbit/sec (SIF rates), especially for those areas of medium detail (low spatial activity) in the presence of uniform motion ("affine" motion vector fields).

The appearance of laserdisc or Super VHS quality can therefore be obtained for many video sequences with low bit rates, but for the more general class of image sequences, a bit rate ranging from 3 to 6 Mbit/sec is necessary.

What are the typical coded sizes for the MPEG frames?

Typical sizes, in bits, for the three different picture types:

Level                        I        P        B       Average
30 Hz SIF @ 1.15 Mbit/sec    150,000  50,000   20,000  38,000
30 Hz CCIR 601 @ 4 Mbit/sec  400,000  200,000  80,000  130,000

Note: the above example is taken from a standard test sequence coded by the Test Model method, with an I frame distance of 15 (N = 15), and a P frame distance of 3 (M = 3).

Of course, these numbers can differ significantly with the source material, scene changes, and the use of advanced encoder models.
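As a rough cross-check of the Average column, the per-picture sizes can be weighted by how often each picture type occurs in one group of pictures. This is a minimal sketch, assuming a regular pattern in which every M-th picture is an I or P picture; the function name is hypothetical.

```python
def gop_average_bits(i_bits, p_bits, b_bits, n=15, m=3):
    """Average coded size per picture over one group of N pictures.

    Assumes one I picture per group, an I or P picture every M pictures,
    and B pictures everywhere else.
    """
    p_count = n // m - 1          # anchor pictures other than the I picture
    b_count = n - 1 - p_count     # everything that is neither I nor P
    total = i_bits + p_count * p_bits + b_count * b_bits
    return total / n

# N = 15, M = 3 gives 1 I, 4 P and 10 B pictures per group
print(gop_average_bits(150_000, 50_000, 20_000))    # SIF row
print(gop_average_bits(400_000, 200_000, 80_000))   # CCIR 601 row
```

The results land close to the table's rounded averages, and to the coded rate divided by 30 pictures per second, but not exactly on them, since the per-type sizes in the table are themselves rounded.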

At what bitrates is MPEG-2 video optimal?

The Test subgroup has defined a few example "Sweet spot" sampling dimensions and bit rates for MPEG-2:
 

Dimensions  Coded rate  Application
352x480x24 Hz (progressive)  2 Mbit/sec  Equivalent to VHS quality. Intended for film source video. Half horizontal 601 (HHR). Looks almost broadcast NTSC quality.
544x480x30 Hz (interlaced)  4 Mbit/sec  PAL broadcast quality (nearly full capture of 5.4 MHz luminance signal). 544 samples matches the width of a 4:3 picture windowed within a 720 sample/line 16:9 aspect ratio picture via pan & scan.
704x480x30 Hz (interlaced)  6 Mbit/sec  Full CCIR 601 sampling dimensions.

These numbers may be too ambitious. Bit rates of 3, 6, and 8 Mbit/sec respectively provide transparent quality for the above application examples when generated by a reasonably sophisticated encoder.
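One way to compare these operating points is to express each as coded bits per luminance pixel. The sketch below is illustrative only: the function name is hypothetical, and it counts luminance pixels alone, ignoring chroma samples.

```python
def bits_per_pixel(width, height, frame_rate_hz, bit_rate):
    """Coded bits available per luminance pixel at a given bit rate."""
    return bit_rate / (width * height * frame_rate_hz)

# (width, height, frame rate in Hz, coded rate in bit/sec) for each row
sweet_spots = [
    (352, 480, 24, 2_000_000),
    (544, 480, 30, 4_000_000),
    (704, 480, 30, 6_000_000),
]
for width, height, rate_hz, bit_rate in sweet_spots:
    bpp = bits_per_pixel(width, height, rate_hz, bit_rate)
    print(f"{width}x{height}x{rate_hz} Hz: {bpp:.3f} bits/pixel")
```

All three rows come out at roughly half a bit per luminance pixel, which suggests the sweet spots scale the bit rate roughly in proportion to the sample rate.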

Why does film perform so well with MPEG ?

1. The frame rate is 24 Hz (instead of 30 Hz) which is a savings of some 20%.

2. Film source video is inherently progressive. Hence no fussy interlaced spectral frequencies.

3. The pre-digital source was severely oversampled (compare 352 x 240 SIF to 35 millimeter film at, say, 3000 x 2000 samples). This can result in a very high quality signal, whereas most video cameras do not oversample, especially in the vertical direction.

4. Finally, the spatial and temporal modulation transfer function (MTF) characteristics (motion blur, etc) of film are more amenable to the transform and quantization methods of MPEG.

What is the best compression ratio for MPEG ?

The MPEG sweet spot is about 1.2 bits/pel for intra coding and 0.35 bits/pel for inter coding. Experimentation has shown that intra frame coding with the familiar DCT-Quantization-Huffman hybrid algorithm achieves optimal performance at an average of about 1.2 bits/sample, or about a 6:1 compression ratio. Below this point, artifacts become non-transparent.
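The ratio quoted above can be reproduced from the bits/sample figure if the source is assumed to carry 8 bits per sample. That assumption, and the function name, are not from the text; this is a sketch of the arithmetic only.

```python
def compression_ratio(coded_bits_per_sample, source_bits_per_sample=8):
    """Ratio of uncompressed to coded size, assuming 8-bit source samples."""
    return source_bits_per_sample / coded_bits_per_sample

print(f"intra: {compression_ratio(1.2):.1f}:1")   # the 'about 6:1' figure
print(f"inter: {compression_ratio(0.35):.1f}:1")
```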

Is there an MPEG file format?

The traditional descriptors that file formats provide in headers, such as image height, width, color space, etc., are already embedded within the MPEG bitstream in the sequence header. Directory file formats are described in the White Book and DVD specifications.

What is the Digital Video Disc (DVD) ?

In 1994, Toshiba united with Thomson Consumer Electronics, Pioneer, and a handful of Hollywood studios to define a new 12 cm diameter compact disc format for broadcast rate digital video. The new format basically increases the effective areal storage density over the 1982 Red Book format by some 6:1 (800 Mbytes vs 5 Gbytes). This is achieved through a combination of shorter laser wavelength, finer track pitch and inter-pit pitch, and better optics. The thickness of the disc is reduced from the Red Book's 1.2 millimeters to 0.6 millimeters. However, the new format can glue two 0.6 mm thick discs back-to-back, forming a double-sided disc 1.2 mm thick with a total capacity of 10 Gbytes. A two hour movie, encoded onto only one side, would contain a video bitstream averaging 5 Mbit/sec, or 10 Mbit/sec if distributed on both sides of a disc. Most of the 6:1 gain is achieved through more efficient encoding of bits onto the disc. Only a 2:1 factor comes purely from the reduction in wavelength.
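The average bit rates above follow from dividing disc capacity by running time. A minimal sketch of that calculation, assuming decimal gigabytes (1 Gbyte = 10^9 bytes) and that the whole capacity is available to the program; the function name is hypothetical.

```python
def average_bit_rate(capacity_bytes, duration_seconds):
    """Average bit/sec sustainable when a program fills the given capacity."""
    return capacity_bytes * 8 / duration_seconds

two_hours = 2 * 60 * 60  # seconds

single_side = average_bit_rate(5e9, two_hours)
both_sides = average_bit_rate(10e9, two_hours)
print(f"one side:   {single_side / 1e6:.2f} Mbit/sec")
print(f"both sides: {both_sides / 1e6:.2f} Mbit/sec")
```

The totals come out slightly above the quoted 5 and 10 Mbit/sec video rates; the difference would presumably be taken up by audio and multiplex overhead.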

By comparison, today's double-sided analog video laserdiscs have a diameter of 30 cm (571 cm^2 of usable area), and a thickness of 2.4 millimeters. Storage capacity is a maximum of 65 minutes per side.

A future potential format for HDTV may employ a blue wavelength laser (0.4 microns), offering another 2:1 increase in areal density, or 20 Gbytes total. Other alternatives include larger disc sizes. For example, if bit coding at DVD areal densities were applied to the familiar 30 cm disc, the average bitrate for the 65 minutes of video per side would be nearly 70 Mbit/sec !!
 
 

What is the MPEG committee ?

In fact, MPEG is a nickname. The official title is: ISO/IEC JTC1 SC29 WG11.

ISO: International Organization for Standardization

IEC: International Electrotechnical Commission

JTC1: Joint Technical Committee 1

SC29: Sub-committee 29

WG11: Working Group 11 (moving pictures with... uh, audio)

What ever happened to MPEG-3 ?

MPEG-3 was to have targeted HDTV applications with sampling dimensions up to 1920 x 1080 x 30 Hz and coded bitrates between 20 and 40 Mbit/sec. It was later discovered that with some (syntax compatible) fine tuning, MPEG-2 and MPEG-1 syntax worked very well for HDTV rate video. The key is to maintain an optimal balance between sample rate and coded bit rate.

Also, the standardization window for HDTV was rapidly closing. Europe and the United States were on the brink of committing to analog-digital sub-Nyquist hybrid algorithms (D-MAC, MUSE, et al.). By 1992, European all-digital projects such as HD-DIVINE and VADIS demonstrated better picture quality with respect to bandwidth using the MPEG syntax. In the United States, the Sarnoff/NBC/Philips/Thomson HDTV consortium had used MPEG-1 syntax from the beginning of its all-digital proposal and, with the exception of motion artifacts (due to limited search range in the encoder), was deemed to have the best picture quality of all three digital proponents in the early 1993 bake-off. HDTV is now part of the MPEG-2 High-1440 Level and High Level toolkit.
 

Why bother having an MPEG-2 ?

A. MPEG-1 was optimized for CD-ROM and similar applications at about 1.5 Mbit/sec. Video was strictly non-interlaced (i.e. progressive). The international cooperation worked well enough for MPEG-1 that the committee began to address applications at broadcast TV sample rates, using the CCIR 601 recommendation (720 samples/line by 480 lines per frame by 30 frames per second, or about 15.2 million samples/sec including chroma) as the reference.
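The sample rate quoted above can be approximated by multiplying out the picture dimensions and adding chroma. The sketch below assumes 4:2:0 chroma, which adds half as many samples again as luminance (a factor of 1.5); both that factor and the function name are assumptions, not something the text states.

```python
def samples_per_second(width, height, frame_rate_hz, chroma_factor=1.5):
    """Total samples/sec; chroma_factor=1.5 models 4:2:0 (luma plus two quarter-size chroma planes)."""
    return width * height * frame_rate_hz * chroma_factor

for width in (720, 704):
    rate = samples_per_second(width, 480, 30)
    print(f"{width} samples/line: {rate / 1e6:.2f} million samples/sec")
```

Under these assumptions, 720 samples/line gives about 15.6 million samples/sec, while 704 active samples/line gives about 15.2 million, so the quoted figure may have been computed from the 704-sample width.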

Unfortunately, today's TV scanning pattern is interlaced. This introduces a duality in block coding: do local redundancy areas (blocks) exist exclusively in a field or a frame (or a particle or wave)? The answer of course is that some blocks are one or the other at different times, depending on motion activity. The additional man years of experimentation and implementation between MPEG-1 and MPEG-2 improved the method of block-based transform coding.

It is often remarked that MPEG-2 spent several hundred man years and 10s of millions of dollars yet only gained 20% coding efficiency over MPEG-1 for interlaced video signals. However, the collaborative process brought companies together, and from that came a standard well agreed upon. In many ways, the political achievement dwarfs the technical one. Also, MPEG-2 was exploratory. Coding of interlaced video was unknown territory. It took some considerable convincing to demonstrate that a simple syntax, akin to MPEG-1, was as efficient as other proposals. Left by themselves, each company would probably have produced a diverse scope of syntax.

Is MPEG patented ?

Many of the companies which participated in the MPEG committee have indicated that they hold patents to fundamental elements of the MPEG syntax and semantics. Already, the group known as the "IRT consortium" (CCETT, IRT, et al.) has defined royalty fees and licensing agreements for OEMs of MPEG Layer I and II audio encoders and decoders. The fee is $1 USD per audio channel in small quantities, and $0.50 USD per channel in large quantities.

A royalty and licensing agreement has yet to be reached among holders of Video and Systems patents; however, the figure has already been agreed upon, ranging from $3 to $4 per implementation. Whether it applies retroactively to products already sold, or whether it is possible to avoid the patents via approximation techniques, is not known. The non-profit organization CableLabs (Boulder, Colorado) is responsible for leading the MPEG Intellectual Property Rights effort (known canonically as the "MPEG Patent Pool"). An agreement is expected by mid 1995.

In order to reach the IS (International Standard) document stage, all parties must have sent a letter to ISO stating that they agree to license their intellectual property on fair, reasonable, and non-discriminatory terms. For MPEG-1 and MPEG-2, this was accomplished in mid 1993.

Companies which hold patents often cross-license each other, so that the parties do not have to pay royalties to one another.

Information on the MPEG Intellectual Property Rights group can be found at:  http://www.cablelabs.com

What is the White Book ?

The White Book specifies the file structure and indexing of multiplexed MPEG video and audio streams. White Book also specifies the Karaoke application's reference table, which describes programs and their sector locations. At the lowest layer, White Book builds upon the CD-ROM XA spec. Extension data includes screen pointing devices, an address list of all Intra pictures within a program, CD version number, Closed Caption data, and indexing information for MPEG still pictures.

The specific MPEG parameter definitions of White Book are:

Audio coding method: MPEG-1 Layer II

Sampling rate: 44.1 kHz

Coded bit rate: 224 Kbits/sec

Mode: stereo, dual channel, or intensity stereo

Video coding method: MPEG-1

Permitted sample rates:

352 pixels/line x 240 lines/frame x 29.97 frames/sec (NTSC rate)

352 pixels/line x 240 lines/frame x 23.976 frames/sec (NTSC film rate)

352 pixels/line x 288 lines/frame x 25 frames/sec (PAL rate)

Maximum bitrate: 1.1519291 Mbit/sec

Recommendations include:

pixel aspect ratios: 1.0950 (352x240) or 0.9157 (352 x 288)

Intra pictures be placed at least once every 2 seconds.

Still pictures: ("Intra" picture_coding_type only)

Normal res: 352 x 240 or 352 x 288 (maximum 46 Kbytes coded size)

Double res: 704 x 480 or 704 x 576 (maximum 224 Kbytes coded size)
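The recommended pixel aspect ratios above can be checked against the intended 4:3 display shape. This sketch assumes the values express pixel height divided by pixel width, as the MPEG-1 sequence header's pel_aspect_ratio does; the function name is hypothetical.

```python
def display_aspect_ratio(width, height, pel_aspect_ratio):
    """Displayed width/height of the picture.

    pel_aspect_ratio is taken as pixel height / pixel width, so the
    displayed height grows by that factor relative to the width.
    """
    return width / (height * pel_aspect_ratio)

print(f"352x240: {display_aspect_ratio(352, 240, 1.0950):.4f}")
print(f"352x288: {display_aspect_ratio(352, 288, 0.9157):.4f}")
print(f"4:3    : {4 / 3:.4f}")
```

Both picture sizes come out within about half a percent of 4:3.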

The other books are:

Red Book: this is the original Compact Disc Audio specification (circa 1980). All other books (Yellow, Green, Orange, White) are identical at the low level, sharing a common base with Red Book. This grandfather specification defines sectors, tracks, and channel coding (8/14 EFM outer forward error correction (FEC), 8-bit polynomial interleaved Reed-Solomon inner forward error correction, etc), and physical parameters (disc diameter 12 cm, laser wavelength 0.8 microns, track pitch, land-to-pit spacing, digital modulation, etc.).

Yellow Book: first CD-ROM specification (circa 1986). Later appended by the CD-ROM XA spec.

Green Book: CD-I (Compact Disc Interactive).

Orange Book: recordable CD (CD-R), the basis for Kodak Photo CD

ISO 9660: (circa 1988) describes the file structure for CD-ROM XA. Similar to MS-DOS, filenames are case insensitive and limited to 8 characters plus 3 extension characters (8.3 format). Many CD-ROMs containing MPEG are nothing more than Yellow Book CDs which treat multiplexed video and audio bitstreams as an ordinary file.
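The 8.3 naming limit can be checked before mastering a disc. The sketch below tests only the shape described above (up to 8 name characters, an optional extension of up to 3, case ignored); the allowed character set of letters, digits, and underscore is an assumption, the function name is hypothetical, and this is not a full ISO 9660 validator.

```python
import re

# Up to 8 name characters, then an optional dot and up to 3 extension characters.
_NAME_8_3 = re.compile(r"[A-Z0-9_]{1,8}(\.[A-Z0-9_]{1,3})?")

def is_8_3_name(filename):
    """True if filename fits the 8.3 shape, ignoring case."""
    return _NAME_8_3.fullmatch(filename.upper()) is not None

for name in ("movie.mpg", "TRACK01.DAT", "longfilename.mpg", "clip.mpeg"):
    print(name, is_8_3_name(name))
```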

Further information can be retrieved from:

Philips Consumer Electronics B.V.

Coordination Office Optical & Magnetic Media Systems

Building SWA-1

P.O. Box 80002

5600 JB Eindhoven

The Netherlands

Tel: +31 40 736409

Fax: +31 40 732113

What are some typical picture sizes and their associated applications ?

352 x 240  SIF. CD White Book movies, video games.
352 x 480  HHR. VHS equivalent.
480 x 480  Bandlimited (4.2 MHz) broadcast NTSC.
544 x 480  Laserdisc, D-2, bandlimited PAL/SECAM.
640 x 480  Square pixel NTSC.
720 x 480  CCIR 601. Studio D-1. Upper limit of Main Level.

Future topics: