For MPEG-1, slices may consist of an arbitrary number of macroblocks. They can be
independently decoded once the picture header side information is known. For parallelism
below the slice level, the coded bitstream must first be mapped into fixed-length
elements. Further, since macroblocks have coding dependencies on previous macroblocks
within the same slice, the data hierarchy must be pre-processed down to the layer of DC
DCT coefficients. After this, blocks may be independently inverse quantized, inverse
transformed, temporally predicted, and reconstructed to buffer memory. Parallelism is
usually more of a concern for encoders. In many encoders today, block matching (motion
estimation) and some rate control stages (such as activity and/or complexity measures) are
processed for macroblocks independently. Finally, with the exception that all macroblock
rows in Main Profile MPEG-2 bitstreams must contain at least one slice, an encoder has the
freedom to choose the slice structure.
MPEG strictly specifies the YCbCr color space, not YUV, YIQ, YPbPr, YDbDr, or any of the many other fine varieties of color-difference spaces. Regardless of any bitstream parameters, MPEG-1 and MPEG-2 Video Main Profile specify the 4:2:0 chroma_format, where the color difference channels (Cb, Cr) have half the "resolution" (sample grid density) of luminance in both the horizontal and vertical directions.
MPEG-2 High Profile includes an option for 4:2:2 chroma_format, as does the MPEG 4:2:2 Profile (a.k.a. "Studio Profile"), naturally. Applications for the 4:2:2 format can be found in professional broadcasting, editing, and contribution-quality distribution environments. The drawback of the 4:2:2 format is simply that it increases the size of the macroblock from six 8x8 blocks (4:2:0) to eight, while increasing the frame buffer size and decoding bandwidth by the same amount (33 %). This increase places the buffering memories well past the magic 16-Mbit limit for semiconductor DRAM devices, assuming the pictures are stored with a maximum of 414,720 pixels (720 pixels/line x 576 lines/frame). The maximum allowable pixel resolution could be reduced by 1/3 to compensate (e.g. 544 x 576). However, if a hardware decoder operates on a macroblock basis in its pipeline, on-chip static memories (SRAM) will likewise increase by 1/3. The benefit offered by 1/3 more pixels generally outweighs full vertical chrominance resolution. There are other arguments favoring 4:2:0 over 4:2:2 as well.
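The memory arithmetic above can be checked with a quick sketch. The 720x576 frame size comes from the text; the three-frame decoder buffer (two references plus one reconstruction) is an assumption for illustration, not a normative requirement:

```python
# Sketch: decoder frame-store size for 4:2:0 vs 4:2:2 with 8-bit samples.
# Assumes 720x576 frames and a 3-frame buffer, a typical but not
# mandated decoder arrangement.

def frame_bytes(width, height, chroma_format):
    """Bytes per frame: one luma plane plus two subsampled chroma planes."""
    luma = width * height
    chroma_scale = {"4:2:0": 0.25, "4:2:2": 0.5, "4:4:4": 1.0}[chroma_format]
    return int(luma * (1 + 2 * chroma_scale))

for fmt in ("4:2:0", "4:2:2"):
    mbits = frame_bytes(720, 576, fmt) * 3 * 8 / 1e6
    print(f"{fmt}: {mbits:.1f} Mbit for three frame stores")
# Three 4:2:0 frames fit under a single 16-Mbit DRAM; 4:2:2 frames do not.
```

This reproduces the 33 % figure: 4:2:2 stores 2 bytes/pixel versus 1.5 bytes/pixel for 4:2:0.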
No, no, definitely no. The following table illustrates the "nuances" between
the different chroma formats for a typical "CCIR 601" frame with pixel
dimensions of 720 pixels/line x 480 lines/frame:
chroma_format | Y samples/line | Y lines/frame | C samples/line | C lines/frame | horiz. subsampling | vert. subsampling
4:4:4         | 720            | 480           | 720            | 480           | none               | none
4:2:2         | 720            | 480           | 360            | 480           | 2:1                | none
4:2:0         | 720            | 480           | 360            | 240           | 2:1                | 2:1
4:1:1         | 720            | 480           | 180            | 480           | 4:1                | none
4:1:0         | 720            | 480           | 180            | 120           | 4:1                | 4:1
3:2:2, 3:1:1, and 3:1:0 are less common variations, but have been documented. As shocking as it may seem, the 4:1:0 ratio was used by Intel's DVI for several years.
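The subsampling factors in the table above can be captured in a few lines. This is a minimal sketch; the function name and table are illustrative, not from any standard:

```python
# Sketch: chroma plane dimensions implied by each chroma_format,
# reproducing the table above for a 720x480 "CCIR 601" frame.

SUBSAMPLING = {  # chroma_format -> (horizontal factor, vertical factor)
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
    "4:1:1": (4, 1),
    "4:1:0": (4, 4),
}

def chroma_dimensions(y_width, y_height, chroma_format):
    h, v = SUBSAMPLING[chroma_format]
    return y_width // h, y_height // v

print(chroma_dimensions(720, 480, "4:2:0"))  # (360, 240)
print(chroma_dimensions(720, 480, "4:1:1"))  # (180, 480)
```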
The 130 microsecond gap between successive 4:2:0 lines in progressive frames, and 260
microsecond gap in interlaced frames, can introduce some difficult vertical frequencies,
but most can be alleviated through pre-processing.
By definition, MPEG samples have no more and no less than 8-bits uniform sample
precision (256 quantization levels). For luminance data (which is unsigned), black
corresponds to level 0 and white to level 255. However, in CCIR Recommendation 601
chromaticity, luminance (Y) levels 0 through 15 and 236 through 255 are reserved for
blanking signal excursions. MPEG currently has no such clipped-excursion restrictions,
although a decoder might take care to ensure that active samples do not exceed these limits. With
three color components per pixel, the total combination is roughly 16.8 million colors
(i.e. 24-bits).
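A display-side clamp along these lines is one way a decoder might honor the CCIR 601 nominal range. This is an optional precaution, as noted above, not something MPEG requires:

```python
# Sketch: clamping reconstructed 8-bit luminance to the CCIR Rec. 601
# nominal range (16..235).  MPEG itself only constrains samples to
# 0..255; this extra clamp is an optional display-side precaution.

def clamp_y601(sample):
    return max(16, min(235, sample))

print([clamp_y601(s) for s in (0, 16, 128, 235, 255)])  # [16, 16, 128, 235, 235]
```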
A. It is moderately important to properly co-site chroma samples, otherwise a sort of chroma shifting effect (exhibited as a "halo") may result when the reconstructed video is displayed. In MPEG-1 video, the chroma samples are exactly centered between the four nearest luminance samples (Fig. 1). To maintain compatibility with the CCIR 601 horizontal chroma locations and simplify implementation (eliminating the need for a phase shift), MPEG-2 chroma samples are arranged as shown in Fig. 2.
[Figures omitted: Fig. 1 shows the MPEG-1 4:2:0 organization (each chroma sample centered
between the four nearest luminance samples); Fig. 2 shows the MPEG-2 4:2:0 organization
(chroma co-sited horizontally with luminance); Fig. 3 shows the MPEG-2 and CCIR Rec. 601
organization (chroma co-sited on every line).]
A. All MPEG-2 bitstreams must contain specific extension headers that immediately follow MPEG-1 headers. At the highest layer, for example, the MPEG-1 style sequence_header() is followed by sequence_extension(). Some extension headers are specific to MPEG-2 profiles. For example, sequence_scalable_extension() is not allowed in Main Profile bitstreams.
A simple program need only scan the coded bitstream for byte-aligned start codes to
determine whether the stream is MPEG-1 or MPEG-2.
These 32-bit byte-aligned codes provide a mechanism for cheaply searching coded
bitstreams for commencement of various layers of video without having to actually parse
variable-length codes or perform any decoder arithmetic. Start codes also provide a
mechanism for re-synchronizing in the presence of bit errors. A start code may be preceded
by an arbitrary number of zero bytes. The zero bytes can be used to guarantee that a start
code occurs at a certain location, or by rate control to increase the bitrate of a
coded bitstream.
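The scanning approach described above can be sketched as follows. The start code values are from the video specifications; the MPEG-1/MPEG-2 test is the heuristic described earlier (sequence_header() immediately followed by the extension_start_code of sequence_extension()), and the function names are illustrative:

```python
# Sketch: locating byte-aligned start codes (00 00 01 xx) without
# parsing variable-length codes, and guessing MPEG-1 vs MPEG-2 from
# whether sequence_header() is immediately followed by
# extension_start_code (sequence_extension()).

SEQUENCE_HEADER_CODE = 0xB3
EXTENSION_START_CODE = 0xB5

def find_start_codes(data):
    """Yield (offset, start code value) for each byte-aligned start code."""
    i = 0
    while True:
        i = data.find(b"\x00\x00\x01", i)
        if i < 0 or i + 3 >= len(data):
            return
        yield i, data[i + 3]
        i += 3

def looks_like_mpeg2(data):
    codes = list(find_start_codes(data))
    for (_, code), (_, next_code) in zip(codes, codes[1:]):
        if code == SEQUENCE_HEADER_CODE:
            return next_code == EXTENSION_START_CODE
    return False
```

A stream whose sequence header is followed by any other start code (e.g. a group_start_code) would be classified as MPEG-1 by this test.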
Coded block pattern:
(CBP -- not to be confused with Constrained Parameters!) When the frame prediction is particularly good, the displaced frame difference (DFD, or temporal macroblock prediction error) tends to be small, often with entire block energy being reduced to zero after quantization. This usually happens only at low bit rates. Coded
block patterns prevent the need for transmitting EOB symbols in those zero coded
blocks. Coded block patterns are transmitted in the macroblock header only if the
macroblock_type flag indicates so.
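The 4:2:0 coded_block_pattern is a six-bit mask over the macroblock's blocks, which can be expanded as in this sketch:

```python
# Sketch: expanding a 4:2:0 coded_block_pattern into per-block "coded"
# flags.  The MSB of the 6-bit pattern corresponds to the first
# luminance block, the LSB to the last chrominance block.

def cbp_to_flags(cbp):
    """Return [Y0, Y1, Y2, Y3, Cb, Cr] coded flags for 4:2:0."""
    return [(cbp >> (5 - i)) & 1 for i in range(6)]

print(cbp_to_flags(0b100001))  # [1, 0, 0, 0, 0, 1] -> only Y0 and Cr coded
```

Blocks whose flag is zero carry no coefficient data at all, which is exactly how the EOB symbols are avoided.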
Clarification point: The DC value of intra-coded blocks is quantized by a constant
stepsize of 8 only in MPEG-1, reducing the 11-bit dynamic range of the IDCT DC
coefficient to 8 bits of accuracy. MPEG-2 allows a DC precision of 8, 9, 10, or 11 bits.
The quantization stepsize is fixed for the duration of the picture, set by the
intra_dc_precision field in the picture_coding_extension().
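The relationship between precision and stepsize follows directly from the 11-bit DC range:

```python
# Sketch: intra DC quantization step implied by intra_dc_precision.
# Reducing the 11-bit IDCT DC range to `precision_bits` bits means
# dividing by 2^(11 - precision_bits): 8 for MPEG-1's fixed 8-bit case.

def intra_dc_step(precision_bits):
    assert precision_bits in (8, 9, 10, 11)
    return 1 << (11 - precision_bits)

print([intra_dc_step(p) for p in (8, 9, 10, 11)])  # [8, 4, 2, 1]
```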
Since the coded_block_pattern in NON-INTRA macroblocks signals every possible combination of all-zero and non-zero blocks, the dct_coef_first mechanism assigns a different meaning to the VLC codeword (run = 0, level = +/-1) that would otherwise represent EOB ("10") as the first coefficient in the zig-zag ordered Run-Level token list.
Saves unnecessary run-length codes. At optimal bitrates, there tend to be few AC
coefficients, concentrated in the early stages of the zig-zag vector. In MPEG-1, the 2-bit
length of EOB implies that there is an average of only 3 or 4 non-zero AC coefficients per
block. In MPEG-2 Intra (I) pictures, with a 4-bit EOB code in Table 1, this estimate is
between 9 and 16 coefficients. Since EOB is required for all coded blocks, its absence can
signal that a syntax error has occurred in the bitstream.
A genuine pain for VLSI implementations, macroblock stuffing was included in MPEG-1 to
maintain smoother, constant bitrate control for encoders. However, with normalized
complexity/activity measures and buffer management performed a priori (before coding of
the macroblock, for example) and local monitoring of coded data buffer levels now a common
operation in encoders, (e.g. MPEG-2 encoder Test Model), the need for such localized
bitrate smoothing evaporated. Stuffing can be achieved through slice start code padding if
required. A good rule of thumb is: if you often find yourself wishing for stuffing more
than once per slice, you probably don't have a very good rate control algorithm.
Nonetheless, to avoid any temptation, macroblock stuffing is now illegal in MPEG-2 (A
general syntax restriction brought to you by the Implementation Studies Subgroup!)
The absolute position of the first macroblock within a slice is known by the
combination of slice_vertical_position and the macroblock_address_increment.
Therefore, the proper place of a lost slice found in a highly corrupt bitstream can be
located exactly within the picture. These two syntax elements are also the only known
means of detecting slice gaps: areas of the picture which are not represented with any
information (including skipped macroblocks). A slice gap occurs when the current
macroblock address of the first macroblock in a slice is greater than the previous
macroblock address by more than 1 macroblock unit. A slice overlap occurs when the
current macroblock address is less than or equal to the previous macroblock's address. The
previous macroblock in both instances is the last known macroblock within the previous
slice. Because of the semantic interpretation of slice gaps and overlaps, and because of
the syntactic restrictions for slice_vertical_position and macroblock_address_increment,
it is not syntactically possible for a skipped macroblock to occupy the first
or last position of a slice. In the past, some (bad) encoders would attempt to signal a
run of skipped macroblocks to the end of the slice. These evil skipped macroblocks should
be interpreted by a compliant decoder as a gap, not as a string of skipped macroblocks.
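The gap/overlap semantics above reduce to a simple address comparison. This sketch assumes macroblock addresses have already been derived from slice_vertical_position and macroblock_address_increment; the function name is illustrative:

```python
# Sketch: classifying a slice by comparing the address of its first
# macroblock against the address of the last macroblock of the
# previous slice, per the gap/overlap rules described above.

def classify_slice(prev_addr, first_addr):
    if first_addr > prev_addr + 1:
        return "gap"        # unrepresented area (not skipped macroblocks)
    if first_addr <= prev_addr:
        return "overlap"
    return "contiguous"

print(classify_slice(10, 11))  # contiguous
print(classify_slice(10, 15))  # gap
print(classify_slice(10, 9))   # overlap
```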
The VLC tables in MPEG are not Huffman tables in the true sense of Huffman coding, but
are more like the tables used in Group 3 fax (where the term "modified Huffman
tables" was unleashed). They are entropy constrained, that is, non-downloadable and
optimized for a limited range of bit rates (sweet spots). A better way would be to say
that the tables are optimized for a range of ratios of bit rate to sample rate (e.g. 0.25
bits/pixel to 1.0 bits/pixel). With the exception of a few codewords, the larger tables
were carried over from the H.261 standard drafted in the year 1990. This includes the AC
run-level symbols, coded_block_pattern, and macroblock_address_increment. MPEG-2 added an
"Intra table," also called "Table 1". Note that the dct_coefficient
tables assume that positive and negative AC coefficient run-levels are equally probable.
MPEG-1 video decoders had to decide for themselves when to perform 3:2 pulldown if it was not indicated in the presentation time stamps (PTS) of the Systems layer bitstream. MPEG-2 provides two flags (repeat_first_field and top_field_first) which explicitly describe whether a frame or field is to be repeated. In progressive sequences, frames can be repeated 2 or 3 times; Simple and Main Profile are limited to repeated fields only. It is a general syntactic restriction that repeat_first_field can only be signaled (value == 1) in a frame-structured picture. It makes little sense to repeat field pictures in an interlaced video signal, since the whole purpose of 3:2 pulldown is to convert progressive film sequences to the display rate of interlaced television.
In the most common scenario, a film sequence will contain 24 frames every second. The frame_rate element in the sequence header will nonetheless indicate 30 frames/sec. On average, every other coded frame will signal a repeat field (repeat_first_field == 1) to pad the frame rate from 24 Hz to 30 Hz:
(24 coded frames/sec) * (2 fields/coded frame) * (5 display fields / 4 coded fields)
= 60 display fields/sec = 30 display frames/sec
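The cadence arithmetic can be sketched directly. The alternating repeat_first_field pattern below is the "every other coded frame" average described above, used here purely for illustration:

```python
# Sketch: the classic 3:2 pulldown cadence.  Every other coded frame
# sets repeat_first_field, so four coded (film) frames yield ten
# display fields, and 24 coded frames/sec become 60 fields (30 frames)/sec.

def display_fields(num_coded_frames):
    fields = 0
    for n in range(num_coded_frames):
        repeat_first_field = (n % 2 == 0)   # illustrative alternating cadence
        fields += 3 if repeat_first_field else 2
    return fields

print(display_fields(4))    # 10 fields from 4 film frames
print(display_fields(24))   # 60 fields -> 30 display frames per second
```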
Despite the fact that a comprehensive worldwide standard now exists for digital video, many areas remain wide open for research:
A. Definitely. For example, the motion estimation search range of an encoder has great influence over final picture quality. At a certain point a very large range can actually become detrimental (it may encourage large differential motion vectors, which consume bits). Practical ranges are usually between +/- 15 and +/- 32. Note also that as the range doubles, the search area quadruples (the classic relationship between an increase in linear dimension and the corresponding increase in area).
Rate control marks a second tell-tale area where some encoders perform significantly better than others.
And finally, the degree of "pre-processing" (now a popular buzzword in the
business) signals that the encoder belongs to an elite marketing class.
The encoder rests just outside the normative scope of the standard, as long as the bitstreams it produces are compliant. The decoder, however, is almost deterministic: a given bitstream should reconstruct to a unique set of pictures. Since the IDCT function is the ONLY non-normative stage in the decoder, an occasional error of a Least Significant Bit per prediction iteration is permitted.
The designer is free to choose among many DCT algorithms and implementations. The IEEE
1180 test referenced in Annex A of the MPEG-1 (ISO/IEC 11172-2) and MPEG-2 (ISO/IEC
13818-2) Video specifications spells out the statistical mismatch tolerance between the
Reference IDCT, which is a separable 8x1 "Direct Matrix" DCT implemented with
64-bit floating point accuracy, and the IDCT you are testing for compliance.
A. The Test Model (MPEG-2) and Simulation Model (MPEG-1) were not, by any stretch of the imagination, meant to epitomize state-of-the-art encoding quality. They were, however, designed to exercise the syntax, verify proposals, and test the relative compression performance of proposals in a timely manner that could be duplicated by co-experimenters. Without simplicity, there would no doubt have been endless debates over model interpretation. Regardless of all else, more advanced techniques would probably trespass into proprietary territory.
The final test model for MPEG-2 is TM version 5b, a.k.a. TM version 6, produced in March 1993 (the time when the MPEG-2 video syntax was "frozen"). The final MPEG-1 simulation model is version 3 ("SM-3"). The MPEG-2 TM rate control method offers a dramatic improvement over the SM method. TM adds more accurate estimation of macroblock complexity through use of limited a priori information. Macroblock quantization adjustments are computed on a macroblock basis, instead of once-per-macroblock row (which in the SM-3 case consisted of an entire slice).
Rate control and adaptive quantization are divided into three steps:
Step One: Target Bit Allocation
In Complexity Estimation, the global complexity measures assign relative weights to each picture type (I, P, B). These weights (Xi, Xp, Xb) are reflected by the typical coded frame size of I, P, and B pictures (see typical frame size discussion). I pictures are usually assigned the largest weight since they have the greatest stability factor in an image sequence and contain the most "new information" in a sequence. B pictures are assigned the smallest weight since B-picture energy does not propagate into other pictures, and B pictures are usually more highly correlated with neighboring P and I pictures than P pictures are.
The bit target for a frame is based on the frame type, the remaining number of bits left in the Group of Pictures (GOP) allocation, and the immediate statistical history of previously coded pictures (sort of a "moving average" global rate control, if you will).
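A Test-Model-style target for an I picture can be sketched as below. The Kp and Kb constants (1.0 and 1.4) are the Test Model's suggested weights; the numeric values in the example are invented for illustration:

```python
# Sketch of TM5-style target bit allocation (Step 1) for an I picture.
# R: bits remaining in the GOP allocation; Np, Nb: P and B pictures
# still to be coded in the GOP; Xi, Xp, Xb: global complexity weights;
# Kp, Kb: constants weighting P and B quantization relative to I.

def target_bits_i(R, Np, Nb, Xi, Xp, Xb, Kp=1.0, Kb=1.4):
    return R / (1.0 + Np * Xp / (Xi * Kp) + Nb * Xb / (Xi * Kb))

# Illustrative GOP: 4 P and 10 B pictures left, I twice as complex as P.
print(round(target_bits_i(1_000_000, 4, 10, 400_000, 200_000, 100_000)))
```

Note the intuitive behavior: the more pictures left in the GOP (and the more complex they are relative to I), the smaller the I-picture target.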
Step Two: Rate Control via Buffer Monitoring
Rate control attempts to adjust bit allocation if there is significant difference between the target bits (anticipated bits) and actual coded bits for a block of data. If the virtual buffer begins to overflow, the macroblock quantization step size is increased, resulting in a smaller yield of coded bits in subsequent macroblocks. Likewise, if underflow begins, the step size is decreased. The Test Model approximates that the target picture has spatially uniform distribution of bits. This is a safe approximation since spatial activity and perceived quantization noise are almost inversely proportional. Of course, the user is free to design a custom distribution, perhaps targeting more bits in areas that contain more complex yet highly perceptible data such as text.
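The buffer feedback in Step 2 can be sketched as follows. The reaction parameter (2 * bit_rate / picture_rate in the Test Model) and all numbers in the example are illustrative:

```python
# Sketch of TM5-style buffer feedback (Step 2).  Virtual buffer
# fullness d grows when actual coded bits outpace the macroblock's
# share of the picture target T, and the quantizer scale tracks d.

def virtual_fullness(d0, bits_coded, target, mbs_done, mb_count):
    """Fullness after coding mbs_done of the picture's mb_count macroblocks."""
    return d0 + bits_coded - target * mbs_done / mb_count

def quant_scale(d, reaction):
    """Map virtual buffer fullness to the 1..31 quantiser_scale range."""
    return max(1, min(31, round(d * 31 / reaction)))

r = 2 * 4_000_000 / 30          # illustrative 4 Mbit/s at 30 pictures/sec
d = virtual_fullness(50_000, 60_000, 100_000, 675, 1350)
print(quant_scale(d, r))
```

As the text says: fullness rising above the target trajectory pushes the step size up, shrinking the yield of coded bits in subsequent macroblocks.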
Step Three: Adaptive Quantization
The final step modulates the macroblock quantization step size obtained in Step 2 by a local activity measure. The activity measure itself is normalized against the most recently coded picture of the same type (I, P, or B). The activity for a macroblock is chosen as the minimum among the four 8x8 block luminance variances. Choosing the minimum block is part of the concept that a macroblock is no better than the block of highest visible distortion (weakest link in the chain).
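The normalization in Step 3 is commonly written as a ratio that saturates between 0.5 and 2.0, as in this sketch (function names are illustrative):

```python
# Sketch of TM5-style adaptive quantization (Step 3).  The macroblock
# activity (minimum of the luminance block variances) is normalized
# against the average activity of the most recently coded picture of
# the same type, giving a modulation factor between 0.5 and 2.0.

def normalized_activity(act, avg_act):
    return (2.0 * act + avg_act) / (act + 2.0 * avg_act)

def modulated_quant(q, act, avg_act):
    """Modulate the Step 2 quantizer scale, clipped to 1..31."""
    return max(1, min(31, round(q * normalized_activity(act, avg_act))))

print(normalized_activity(100.0, 100.0))  # 1.0 (average activity: no change)
```

Flat (low-activity) macroblocks thus get a finer step size, and busy ones a coarser step size, matching the inverse relation between spatial activity and perceived quantization noise noted in Step 2.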
Decision:
[deferred to later date]
I. Can motion vectors be used to determine object velocity?
Motion vector information cannot be reliably used as a means of determining object velocity unless the encoder model specifically set out to do so. First, encoder models that optimize picture quality generate vectors that typically minimize prediction error and, consequently, the vectors often do not represent true object translation from picture-to-picture. Standards converters that resample one frame rate to another (as in NTSC to PAL) use different methods (motion vector field estimation, edge detection, et al) that are not concerned with Rate-Distortion theory. Second, motion vectors are not transmitted for all macroblocks anyway.