MPEG video is often quoted as achieving compression ratios over 100:1, when in reality the "sweet spot" rests between 8:1 and 30:1.
Here's how the fabled "greater than 100:1" reduction ratio is derived for the popular Compact Disc Video (White Book) bitrate of 1.15 Mbit/sec.
Step 1. Start with the oversampled rate!
Most MPEG video sources originate at a higher sample rate than the "target" sample rate encoded into the final MPEG bitstream. The most popular studio signal, known canonically as "D-1" or "CCIR 601" digital video, is coded at 270 Mbit/sec.
The constant, 270 Mbit/sec, can be derived as follows:
Luminance (Y): | 858 samples/line x 525 lines/frame x 30 frames/sec x 10 bits/sample ~= 135 Mbit/sec |
R-Y (Cr): | 429 samples/line x 525 lines/frame x 30 frames/sec x 10 bits/sample ~= 68 Mbit/sec |
B-Y (Cb): | 429 samples/line x 525 lines/frame x 30 frames/sec x 10 bits/sample ~= 68 Mbit/sec |
Total: | 27 million samples/sec x 10 bits/sample ~= 270 Mbit/sec |
So, we start with a compression ratio of: 270/1.15... an amazing 235:1 !!!!!
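For those who like to check the arithmetic, here is Step 1 as a short Python sketch; every figure in it is simply the one quoted above.

```python
# Step 1: the oversampled CCIR 601 ("D-1") source rate, blanking intervals included.
samples_per_line = {"Y": 858, "R-Y (Cr)": 429, "B-Y (Cb)": 429}
lines_per_frame, frames_per_sec, bits_per_sample = 525, 30, 10

source_bps = sum(samples_per_line.values()) * lines_per_frame * frames_per_sec * bits_per_sample
print(f"source rate  : {source_bps / 1e6:.1f} Mbit/sec")    # ~270.3 Mbit/sec

target_bps = 1.15e6                                          # White Book video rate
print(f"'compression': {source_bps / target_bps:.0f}:1")     # ~235:1
```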
Step 2. Throw in the blanking intervals!
Only 720 out of the 858 luminance samples per line contain active picture information.
In fact, the debate over the true number of active samples is the trigger for many
hair-pulling cat-fights at TV engineering seminars and conventions, so it is healthier to
say that the number lies somewhere between 704 and 720. Likewise, only 480 lines out of
the 525 lines contain active picture information. Again, the actual number is somewhere
between 480 and 496. For the purposes of MPEG-1's and MPEG-2's famous conformance points
(Constrained Parameters Bitstreams and Main Level, respectively), the number shall be 704
samples x 480 lines for luminance, and 352 samples x 480 lines for each of the two
chrominance pictures. Recomputing the source rate over the full 720 x 480 active area of the 601 frame (360 x 480 for each 4:2:2 chrominance component), we arrive at:
Y | 720 samples/line x 480 lines x 30 fps x 10 bits/sample ~= 104 Mbit/sec |
C | 2 components x 360 samples/line x 480 lines x 30 fps x 10 bits/sample ~= 104 Mbit/sec |
Total: | ~= 207 Mbit/sec |
The ratio (207/1.15) is now "only" 180:1.
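The same calculation for Step 2, keeping only the active area, looks like this (the numbers come straight from the text):

```python
# Step 2: 720 x 480 active luminance plus two 360 x 480 chrominance components,
# still at 10 bits/sample.
active_bps = (720 + 2 * 360) * 480 * 30 * 10
print(f"active source rate: {active_bps / 1e6:.1f} Mbit/sec")  # ~207.4 Mbit/sec
print(f"ratio             : {active_bps / 1.15e6:.0f}:1")      # ~180:1
```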
Step 3. Let's include higher bits/sample!
The MPEG sample precision is 8 bits. There has been some talk of a 10-bit extension, but that's on hold (as of April 2, 1996, 10:35 PM GMT). Studio equipment often quantizes samples with 10 bits of accuracy, because some engineers and artists feel the extra dynamic range is needed in the iterative content production loop.
Getting rid of this sneaky factor, the ratio deflates to only 180 * (8/10), or 144:1.
Step 4. Ok then, include higher chroma sampling ratio!
The famous CCIR-601 studio signal represents the chroma signals (Cb, Cr) with half the horizontal sample density of the luminance signal, but with full vertical "resolution." This particular ratio of subsampled components is known as 4:2:2. However, MPEG-1 and MPEG-2 Main Profile specify the exclusive use of the 4:2:0 format, deemed sufficient for consumer applications, where both chrominance signals have exactly half the horizontal and vertical resolution of luminance (the MPEG Studio Profile, however, centers around the 4:2:2 macroblock structure).
Seen from the perspective of pixels being composed of samples from multiple components, the 4:2:2 signal can be expressed as having an average of 2 samples per pixel (1 for Y, 0.5 for Cb, and 0.5 for Cr). Thanks to the additional reduction in the vertical direction (resulting in a 360 x 240 chrominance frame), the 4:2:0 signal has, in effect, an average of 1.5 samples per pixel (1 for Y, and 0.25 each for Cb and Cr). Our source video bit rate may now be recomputed as:
720 pixels x 480 lines x 30 fps x 8 bits/sample x 1.5 samples/pixel ~= 124 Mbit/sec
... and the ratio is now 108:1.
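Steps 3 and 4 can be folded into one small calculation (the 124 Mbit/sec figure above already assumes 8-bit samples):

```python
# Steps 3 and 4: 8 bits/sample instead of 10, and 4:2:0 chroma (an average of
# 1.5 samples per pixel) instead of 4:2:2 (2 samples per pixel).
pixels_per_sec = 720 * 480 * 30
rate_420_bps = pixels_per_sec * 1.5 * 8      # 1 Y + 0.25 Cb + 0.25 Cr per pixel
print(f"4:2:0, 8-bit source rate: {rate_420_bps / 1e6:.0f} Mbit/sec")  # ~124 Mbit/sec
print(f"ratio                   : {rate_420_bps / 1.15e6:.0f}:1")      # ~108:1
```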
Step 5. Include the pre-subsampled image size? Yeah, that's the ticket!
As a final act of pre-compression, the CCIR 601 frame is converted to the SIF frame by subsampling 2:1 in both the horizontal and vertical directions... or 4:1 overall. Quality horizontal subsampling can be achieved by applying a simple FIR filter (7 or 4 taps, for example), and vertical subsampling either by dropping every other field (in effect, dropping every other line) or again by an FIR filter (regulated by an interfield motion detection algorithm). Our ratio now becomes:
352 pixels x 240 lines x 30 fps x 8 bits/sample x 1.5 samples/pixel ~= 30 Mbit/sec !!
... and the ratio is now only 26:1.
Thus, the true A/B comparison should be between the source sequence at the 30 Mbit/sec stage just prior to encoding, whose dimensions and frame rate are what the MPEG bitstream actually signals (in sequence_header()), and the reconstructed sequence produced from the 1.15 Mbit/sec coded bitstream. If you can achieve compression through subsampling alone, it means you never really needed the extra samples in the first place.
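As an illustration of the kind of pre-filtering mentioned in Step 5, here is a minimal NumPy sketch of 2:1 horizontal decimation with a 7-tap FIR filter; the tap values are a plausible half-band design, not ones mandated by the standard, and the SIF arithmetic at the end simply restates the figures above.

```python
import numpy as np

# 2:1 horizontal decimation of one active 601 line with a symmetric low-pass FIR.
# Tap values are illustrative only; MPEG does not mandate a particular pre-filter.
taps = np.array([-29, 0, 88, 138, 88, 0, -29], dtype=float) / 256.0  # DC gain = 1

def decimate_line(line: np.ndarray) -> np.ndarray:
    """Low-pass filter a line of samples, then keep every other sample."""
    return np.convolve(line, taps, mode="same")[::2]

line_720 = np.random.randint(16, 236, size=720).astype(float)  # one luminance line
line_360 = decimate_line(line_720)            # 360 samples; SIF crops this to 352
assert line_360.shape == (360,)

# After 2:1 decimation both ways (plus 4:2:0 chroma and 8-bit samples):
sif_bps = 352 * 240 * 30 * 8 * 1.5
print(f"SIF source rate: {sif_bps / 1e6:.1f} Mbit/sec")        # ~30.4 Mbit/sec
print(f"ratio          : {sif_bps / 1.15e6:.0f}:1")            # ~26:1
```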
Step 6. Don't forget 3:2 pulldown!
A majority of high budget programs originate from film, not video. Most of the movies encoded onto Compact Disc Video were in fact captured and edited at 24 frames/sec. So, in such an image sequence, 6 out of the 30 frames displayed on a television monitor (30 frames/sec or 60 fields/sec is the standard NTSC rate in North America and Japan) are in fact redundant and need not be coded into the MPEG bitstream. This naturally leads us to the shocking discovery that the actual source bit rate has really been 24 Mbit/sec all along (24 fps/30 fps * 30 Mbit/sec), and the compression ratio a mere 21:1 !!! ("phone the police!").
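A quick sketch of the 3:2 pulldown cadence makes the "6 redundant frames" claim concrete:

```python
# 3:2 pulldown: 24 film frames/sec become 60 NTSC fields/sec by holding film
# frames for 3 and 2 fields alternately. One second of film fills 30 video
# frames with only 24 distinct pictures, so 6 video frames are redundant.
def pulldown_fields(film_frames: int):
    fields = []
    for i in range(film_frames):
        fields.extend([i] * (3 if i % 2 == 0 else 2))
    return fields

assert len(pulldown_fields(24)) == 60                     # 60 fields = 30 video frames
print(f"true source rate: {(24 / 30) * 30:.0f} Mbit/sec") # 24 Mbit/sec
print(f"ratio           : {24e6 / 1.15e6:.0f}:1")         # ~21:1
```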
Even at this seemingly modest ratio of roughly 20:1, "discrepancies" (in polite
conversational terms) will appear between the 24 Mbit/sec source sequence and the
reconstructed sequence. Only conservative ratios in the neighborhood of 8:1 to 12:1 have
demonstrated true transparency for sequences with complex spatio-temporal characteristics
(i.e. rapid, divergent motion, sharp edges, textures, etc.). However, if the video is
carefully encoded by means of pre-processing and intelligent distribution of bits (no,
really), higher ratios can be made to appear at least "artifact-free."
The MPEG-1 specification (official title: ISO/IEC 11172, "Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s," copyright 1993) consists of five parts; each document is a part of ISO/IEC standard number 11172. The first three parts reached International Standard status in early 1993 (no coincidence, surely, with the nuclear weapons reduction treaty signed back then). Part 4 reached IS in 1994, and Part 5 is expected to go IS in mid-1995.
Part 1---Systems: The first part of the MPEG standard has two primary purposes: (1) a syntax for transporting packets of audio and video bitstreams over digital channels and storage media (DSM), and (2) a syntax for synchronizing video and audio streams.
Part 2---Video: describes syntax (header and bitstream elements) and semantics (algorithms telling what to do with the bits). Video breaks the image sequence into a series of nested layers, each containing a finer granularity of sample clusters (sequence, picture, slice, macroblock, block, sample/coefficient). At each layer, algorithms are made available which can be used in combination to achieve efficient compression. The syntax also provides a number of different means for assisting decoders in synchronization, random access, buffer regulation, and error recovery. The highest layer, sequence, defines the frame rate and picture pixel dimensions for the encoded image sequence.
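As a rough illustration of that highest layer, here is a sketch of pulling the picture dimensions and frame rate out of a sequence_header(). The field widths follow ISO/IEC 11172-2 as I understand them; the optional quantizer matrices and all error handling are omitted.

```python
# Sketch: read the fixed-length fields at the front of an MPEG-1 video
# sequence_header(). Not a full parser; quantizer matrices are skipped.

PICTURE_RATES = {1: 23.976, 2: 24.0, 3: 25.0, 4: 29.97, 5: 30.0, 6: 50.0, 7: 59.94, 8: 60.0}

class BitReader:
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0
    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            bit = (self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

def parse_sequence_header(data: bytes) -> dict:
    r = BitReader(data)
    assert r.read(32) == 0x000001B3, "not a sequence_header start code"
    return {
        "horizontal_size":  r.read(12),
        "vertical_size":    r.read(12),
        "pel_aspect_ratio": r.read(4),
        "picture_rate":     PICTURE_RATES.get(r.read(4)),
        "bit_rate":         r.read(18) * 400,        # coded in units of 400 bit/sec
        "marker_bit":       r.read(1),
        "vbv_buffer_size":  r.read(10) * 16 * 1024,  # coded in units of 16 kbit
        "constrained_parameters_flag": r.read(1),
    }
```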
Part 3---Audio: describes the syntax and semantics for three classes of compression methods. Known as Layers I, II, and III, the classes trade increased syntax and coding complexity for improved coding efficiency at lower bitrates. Layer II is the industry favorite, applied almost exclusively in satellite broadcasting (Hughes DSS) and compact disc video (White Book). Layer I is similar in complexity, efficiency, and syntax to the coding used in the Sony MiniDisc and the Philips Digital Compact Cassette (DCC). Layer III has found a home in ISDN, satellite, and Internet audio applications. The sweet spots for the three layers are 384 kbit/sec (DCC), 224 kbit/sec (CD Video, DSS), and 128 kbit/sec (ISDN/Internet), respectively.
Part 4---Conformance: (circa 1992) defines the meaning of MPEG conformance for all three parts (Systems, Video, and Audio), and provides two sets of test guidelines for determining compliance in bitstreams and decoders. MPEG does not directly address encoder compliance.
Part 5---Software Simulation: Contains an example ANSI C language software encoder and compliant decoder for video and audio. An example systems codec is also provided which can multiplex and demultiplex separate video and audio elementary streams contained in computer data files.
As of March 1995, the MPEG-2 volume consists of a total of 9 parts under ISO/IEC 13818. Part 2 was jointly developed with the ITU-T, where it is known as Recommendation H.262. The full title is: "Information technology--Generic coding of moving pictures and associated audio," ISO/IEC 13818. The first five parts are organized in the same fashion as MPEG-1 (Systems, Video, Audio, Conformance, and Software). The four additional parts are listed below:
Part 6 Digital Storage Medium Command and Control (DSM-CC): provides a syntax for controlling VCR-style playback and random access of bitstreams encoded onto digital storage media such as compact disc. Playback commands include still frame, fast forward, advance, and goto.
Part 7 Non-Backwards Compatible Audio (NBC): addresses the need for a new syntax to efficiently de-correlate discrete multichannel surround sound audio. By contrast, MPEG-2 audio (13818-3) attempts to code the surround channels as ancillary data to the MPEG-1 backwards-compatible Left and Right channels. This allows existing MPEG-1 decoders to parse and decode only the two primary channels while ignoring the side channels (parse to /dev/null). This is analogous to the Base Layer concept in MPEG-2 scalable video ("decode the base layer, and hope the enhancement layer will be a fad that goes away."). NBC candidates included non-compatible syntaxes such as Dolby AC-3. The final NBC document is not expected until 1996.
Part 8 10-bit video extension: introduced in late 1994, this extension to the video part (13818-2) describes the syntax and semantics for the coded representation of video with 10 bits of sample precision. The primary application is studio video (distribution, editing, archiving). Methods have been investigated by Kodak and Tektronix which employ spatial scalability, where the 8-bit signal becomes the Base Layer and the 2-bit differential signal is coded as an Enhancement Layer. The final document is not expected until 1997 or 1998.
[Part 8 has been withdrawn due to lack of interest by industry]
Part 9 Real-time Interface (RTI): defines a syntax for video on demand control signals
between set-top boxes and head-end servers.
In chronological order:
Abbreviation | Official ISO notation | My notation |
- | Problem (unofficial first stage) | barroom witticism or dare |
NI | New work Item | Napkin Item |
NP | New Proposal | Need Permission |
WD | Working Draft | We're Drunk |
CD | Committee Draft | Calendar Deadlock |
DIS | Draft International Standard | Doesn't Include Substance |
IS | International Standard | Induced patent Statements |
Didier Le Gall, "MPEG: A Video Compression Standard for Multimedia Applications," Communications of the ACM, vol. 34, no. 4, April 1991, pp. 47-58.
The following journals and conferences have been known to contain information relating to MPEG:
Several MPEG books are under development.
An MPEG book will be produced by the same team behind the JPEG book, Joan Mitchell and Bill Pennebaker, along with Didier Le Gall. It is expected to be a tutorial on MPEG-1 video and some MPEG-2 video, from Van Nostrand Reinhold in 1995 or 1996 (or maybe 1997).
A book in the Japanese language has already been published (ISBN 4-7561-0247-6). The title is simply "MPEG," from ASCII Publishing.
Keith Jack's second edition of "Video Demystified," to be published in August 1995, will feature a large chapter on MPEG video. Information:
http://www.netstorage.com/kjack/
The DCT and Huffman algorithms receive the most press coverage (e.g. "MPEG is a DCT based scheme with Huffman coding"), but they are in fact less significant than the variety of coding modes signaled to the decoder as context-dependent side information. DCT and Huffman are merely an implementation headache to some. The MPEG-1 and MPEG-2 IDCT has the same definition as the one in H.261, H.263, and JPEG.
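For reference, the IDCT in question is the plain 8x8 separable inverse cosine transform; the brute-force sketch below is written straight from the defining equation (real decoders use fast integer approximations that must meet the IEEE 1180 accuracy test).

```python
import math

def idct_8x8(F):
    """F: 8x8 list of lists of DCT coefficients -> 8x8 reconstructed samples."""
    def c(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            s = 0.0
            for u in range(8):
                for v in range(8):
                    s += (c(u) * c(v) * F[v][u]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[y][x] = s / 4.0
    return out

# Sanity check: a block with only the DC coefficient set reconstructs to a flat block.
flat = idct_8x8([[8.0 if (u, v) == (0, 0) else 0.0 for u in range(8)] for v in range(8)])
assert all(abs(sample - 1.0) < 1e-9 for row in flat for sample in row)
```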
Digital Video Cassette (DVC) employs both an 8x4 and 8x8 DCT.
Constant bitrate streams are buffer regulated to allow continuous transfer of coded data across a constant rate channel without causing the buffer on the receiving end to overflow or underflow. It is the responsibility of the encoder's rate control stage to generate bitstreams which prevent buffer overflow and underflow. Constant bit rate encoding can be modeled as a reservoir: variable sized coded pictures flow into the bit reservoir, but the reservoir is drained at a constant rate into the communications channel.
The most challenging aspect of a constant rate encoder is, yes, to maintain a constant channel rate (without overflowing or underflowing a buffer of fixed depth) while also maintaining constant perceptual picture quality.
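A toy model of that reservoir, with made-up buffer depth and picture sizes purely for illustration, might look like this:

```python
# Encoder-side reservoir: coded pictures of varying size flow in, and bits drain
# out at the constant channel rate. All numbers below are hypothetical.
CHANNEL_RATE = 1.15e6            # bits/sec
PICTURE_RATE = 30.0              # pictures/sec
BUFFER_BITS  = 327_680           # hypothetical reservoir depth (40 KB)

def check_schedule(picture_sizes_bits):
    fullness = 0.0
    drain_per_picture = CHANNEL_RATE / PICTURE_RATE
    for i, size in enumerate(picture_sizes_bits):
        fullness += size                        # coded picture enters the reservoir
        if fullness > BUFFER_BITS:
            return f"overflow at picture {i}"   # rate control spent too many bits
        fullness -= drain_per_picture           # channel drains at a constant rate
        if fullness < 0:
            return f"underflow at picture {i}"  # channel would run dry (needs stuffing)
    return "ok"

print(check_schedule([90_000, 20_000, 25_000] * 10))  # -> "ok" for this schedule
```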
In their simplest form, variable rate bitstreams do not obey any buffer rules, but will maintain constant picture quality. Constant picture quality is easiest to achieve by holding the macroblock quantizer step size constant, e.g. a quantiser_scale_code of 8 (linear) or 12 (non-linear MPEG-2). In their most advanced form, variable bitrate streams may be more difficult to generate than constant bitrate streams: in "advanced" variable bitrate streams, the instantaneous (piece-wise) bit rate may be controlled by a number of external factors.
Summary of bitstream types
Bitrate type | Applications |
constant-rate | fixed-rate communications channels such as the original Compact Disc, digital video tape, single-channel-per-carrier broadcast signals, hard disk storage |
simple variable-rate | software decoders where the bitstream buffer (VBV) is the storage medium itself (very large); the macroblock quantization scale is typically held constant over large numbers of macroblocks |
complex variable-rate | statistical multiplexing (multiple-channel-per-carrier broadcast signals), compact discs and hard disks where the servo mechanisms can be controlled to increase or decrease the channel delivery rate, networked video where the overall channel rate is constant but demand is variably shared by multiple users, bitstreams which achieve target average rates only over very long time windows |
complex variable-rate | Statistical muliplexing (multiple-channel-per-carrier broadcast signals), compact discs and hard disks where the servo mechanisms can be controlled to increase or decrease the channel delivery rate, networked video where overall channel rate is constant but demand is variably share by multiple users, bitstreams which achieve average rates over very long time averages |