A. MPEG-2 Video Main Profile and Main Level is analogous to MPEG-1's CPB, with sampling limits at CCIR 601 parameters (720x480x30 Hz or 720x576x25 Hz). "Profiles" limit syntax (i.e. algorithms), whereas "Levels" limit coding parameters (sample rates, frame dimensions, coded bitrates, etc.). Together, Video Main Profile and Main Level (abbreviated as MP@ML) normalize complexity within feasible limits of 1994 VLSI technology (0.5 micron), yet still meet the needs of the majority of applications. MP@ML is the conformance point for most cable and satellite TV systems.
[insert a description of each Profiles and Levels here]
A. Yes. The MPEG-1 syntax permits sampling dimensions as high as 4095 x 4095 x 60
frames per second. The MPEG most people think of as "MPEG-1" is really a kind of
subset known as Constrained Parameters bitstream (CPB).
MPEG-1 CPB is a limited set of sampling and bitrate parameters designed to normalize
decoder computational complexity, buffer size, and memory bandwidth while still addressing
the widest possible range of applications. The parameter limits were intentionally
designed to permit decoder implementations integrated with 4 Megabits (512 Kbytes) of
DRAM.
Bitstream Parameter | Limit |
pixels/line | 704 |
lines/frame | 480 or 576 |
pixels/frame | 101,376 pixels |
pixels/second | 2,534,400 |
frames/sec | 30 Hz |
bit rate | 1.86 Mbit/sec |
buffer size | 40 Kbytes |
The sampling limits of CPB are bounded at the ever popular SIF rate: 396 macroblocks (101,376 pixels) per picture if the picture rate is less than or equal to 25 Hz, and 330 macroblocks (84,480 pixels) per picture if the picture rate is 30 Hz. The MPEG nomenclature loosely defines a pixel or "pel" as a unit vector containing a complete luminance sample and one fractional (0.25 in 4:2:0 format) sample from each of the two chrominance (Cb and Cr) channels. Thus, the corresponding bandwidth figure can be computed as:
352 samples/line x 240 lines/picture x 30 pictures/sec x 1.5 samples/pixel
or 3.8 Ms/s (million samples/sec) including chroma, but not including blanking
intervals. Since most decoders are capable of sustaining VLC decoding at a faster rate
than 1.86 Mbit/sec, the coded video bitrate has become the most often waived parameter of
CPB. An encoder which intelligently employs the syntax tools should achieve SIF quality
saturation at about 2 Mbit/sec, whereas an encoder producing streams containing only I
(Intra) pictures might require as much as 8 Mbit/sec to achieve the same video quality.
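The macroblock and bandwidth figures above can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and the 352 x 288 picture size for the 25 Hz case is an assumption (the text gives only the 396-macroblock count for that case).

```python
MB_SIZE = 16  # a macroblock covers 16 x 16 luminance samples


def macroblocks(width, height):
    """Number of macroblocks needed to cover a width x height picture."""
    cols = -(-width // MB_SIZE)   # ceiling division
    rows = -(-height // MB_SIZE)
    return cols * rows


def sample_rate(width, height, picture_rate, samples_per_pixel=1.5):
    """Samples per second including chroma (1.5 samples/pixel in 4:2:0)."""
    return width * height * picture_rate * samples_per_pixel


# Hypothetical labels for the two SIF variants; 352 x 288 is assumed here.
for name, (w, h, hz) in {"SIF 30 Hz": (352, 240, 30),
                         "SIF 25 Hz": (352, 288, 25)}.items():
    mbs = macroblocks(w, h)
    print(name, mbs, "macroblocks,", mbs * MB_SIZE * MB_SIZE, "pixels,",
          sample_rate(w, h, hz) / 1e6, "Ms/s")
```

Both variants come out at the same sample rate, which is why the macroblock limit drops from 396 to 330 when the picture rate rises from 25 Hz to 30 Hz.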
A. It is an optimum point that allows (just barely) cost effective VLSI implementations in 1992 technology (0.8 microns). It also implies a nominal guarantee of interoperability for decoders and a reasonable class of performance for encoders. Since CPB is the most popular canonical MPEG-1 conformance point, MPEG devices which are not capable of at least meeting SIF rates are usually not considered to be true MPEG by industry.
Picture buffers (i.e. "frame stores") and coded data buffering requirements
for MPEG-1 CPB fit just snugly into 4 Mbit of memory (DRAM).
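A rough budget shows how snug the fit is. The sketch below assumes three frame stores (two reference pictures plus one B picture being decoded) and 8-bit 4:2:0 samples. The frame-store count and the function names are assumptions for illustration, not figures taken from the text above.

```python
DRAM_BYTES = 4 * 1024 * 1024 // 8   # 4 Mbit = 524,288 bytes
VBV_BYTES = 40 * 1024               # CPB coded-data buffer limit (40 Kbytes)
FRAME_STORES = 3                    # assumption: 2 reference pictures + 1 B picture


def frame_bytes(width, height):
    """Bytes for one 4:2:0 picture at 8 bits per sample (1.5 samples/pixel)."""
    return width * height * 3 // 2


def memory_budget(width, height):
    """Return (bytes used, bytes left over) within the 4 Mbit DRAM."""
    used = FRAME_STORES * frame_bytes(width, height) + VBV_BYTES
    return used, DRAM_BYTES - used


for w, h in [(352, 240), (352, 288)]:
    used, spare = memory_budget(w, h)
    print(f"{w} x {h}: {used} bytes used, {spare} bytes spare")
```

Under these assumptions, the larger 352 x 288 case leaves only about 27 Kbytes of the 512 Kbytes unused.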
A. Principal CPB applications are Compact Disc video (White Book or CD-I) and desktop
video. Set-top TV decoders fall into a higher sampling rate category known as "CCIR
601" or "Broadcast rate," which, as a rule of thumb, has sampling dimensions
and bandwidth 4 times that of SIF (Constrained Parameter sample rate limit).
A. Yes, some. Remember that CPB limits pictures by macroblock count (or pixels/frame). A sampling rate of 416 x 240 x 24 Hz is still within these constraints. However, deviating from 352 samples/line could throw off many decoder implementations, which possess limited horizontal sample rate conversion abilities. Some decoders do in fact include a few rate conversion modes, with a filter usually implemented via binary taps (shifts and adds). Likewise, the target sample rates are usually limited to a few fixed ratios (e.g. 640, 540, 480 pixels/line, etc.). Future MPEG decoders will likely include on-chip arbitrary sample rate converters, perhaps capable of operating in the vertical direction (although there is little need of this in applications using standard TV monitors where line count is constant, with the possible exception of windowing in cable box graphical user interfaces).
Also, many CD videos are letterboxed at the 16:9 aspect ratio. The actual coded and
display sampling dimensions are 384 x 216 (note 384/216 = 16/9). These programs are
typically movies coded at the more manageable 24 frames/sec.
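The macroblock-count reasoning can be sketched as a small check. The function name is hypothetical, and the test covers only macroblocks per picture, macroblocks per second, and the 30 Hz picture rate ceiling; the other CPB limits (bitrate, buffer size, line dimensions) are deliberately left out.

```python
MAX_MACROBLOCKS = 396            # per picture
MAX_MACROBLOCK_RATE = 396 * 25   # per second; the same as 330 * 30
MAX_PICTURE_RATE = 30            # Hz


def within_cpb_sampling(width, height, picture_rate):
    """True if the picture size and rate fit the CPB sampling limits."""
    cols = -(-width // 16)    # macroblock columns, rounded up
    rows = -(-height // 16)   # macroblock rows, rounded up
    mbs = cols * rows
    return (mbs <= MAX_MACROBLOCKS
            and mbs * picture_rate <= MAX_MACROBLOCK_RATE
            and picture_rate <= MAX_PICTURE_RATE)


print(within_cpb_sampling(416, 240, 24))  # the wider picture discussed above
print(within_cpb_sampling(384, 216, 24))  # letterboxed 16:9 film
print(within_cpb_sampling(416, 240, 30))  # same width at 30 Hz
```

The 416-wide picture fits at 24 Hz but not at 30 Hz, because 390 macroblocks at 30 Hz exceeds the per-second limit.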
A. Undocumented ones, yes. A second generation of decoder chips emerged on the market
about 1 year after the first wave of SIF-class decoders. Both LSI Logic and SGS-Thomson
introduced CCIR 601 class MPEG-1 video decoders to fill in the gap between canonical
MPEG-1 (SIF) and the emergence of Main Profile at Main Level (CCIR 601) MPEG-2 decoders.
Under non-disclosure agreement, C-Cube had the CL-950, although since Q2'94, the CL-9100
is now the full MPEG-2 successor in production. MPEG-1 decoders in the "CCIR
601" class, or Main Level, were all too often called "MPEG-1.5" or
"MPEG-1++" decoders. For the first year of operation, the Direct Broadcasting
Satellite service in the United States (Hughes' Direct TV and Hubbard's USSB) called only
upon MPEG-1 syntax to represent interlaced video before switching to full MPEG-2 syntax.
A limited set is available for the choosing in MPEG-1 and the currently defined set of Profiles and Levels of MPEG-2, although "tricks" could be played with Systems-layer Time Stamps to convey non-standard picture rates. The set is: 23.976 Hz (3-2 pulldown NTSC), 24 Hz (Film), 25 Hz (PAL/SECAM or 625/50 video), 29.97 Hz (NTSC), 30 Hz (drop-frame NTSC or component 525/60), 50 Hz (double-rate PAL), 59.94 Hz (double-rate NTSC), and 60 Hz (double-rate, drop-frame NTSC/component 525/60 video).
Only 23.976, 24, 25, 29.97, and 30 Hz are within the conformance space of Constrained Parameter Bitstreams and Main Level.
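The set of picture rates can be written as a lookup table. The code numbers 1 through 8 follow the sequence header rate code of the MPEG video syntax, which the text above does not spell out, and the NTSC-related rates are expressed as exact fractions with a 1001 denominator. The helper names are hypothetical.

```python
from fractions import Fraction

# Rate code -> pictures per second (exact values).
PICTURE_RATES = {
    1: Fraction(24000, 1001),   # 23.976 Hz
    2: Fraction(24),
    3: Fraction(25),
    4: Fraction(30000, 1001),   # 29.97 Hz
    5: Fraction(30),
    6: Fraction(50),
    7: Fraction(60000, 1001),   # 59.94 Hz
    8: Fraction(60),
}

# Codes inside the CPB and Main Level conformance space.
CPB_AND_MAIN_LEVEL_CODES = {1, 2, 3, 4, 5}


def rate_hz(code):
    """Picture rate in Hz for a rate code, as a float."""
    return float(PICTURE_RATES[code])


def allowed_in_cpb_or_main_level(code):
    """True if the rate code falls within CPB and Main Level limits."""
    return code in CPB_AND_MAIN_LEVEL_CODES


for code in sorted(PICTURE_RATES):
    print(code, round(rate_hz(code), 3), allowed_in_cpb_or_main_level(code))
```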
Thanks to MPEG's top_field_first and repeat_first_field, it is technically possible to
have somewhat irregular coded frame rates and still have a constant display frame rate. But
watch out for VBV compliance!
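The best-known case is 3-2 pulldown, where 24 Hz film is coded as 24 frames per second but displayed as 60 fields per second. The sketch below only counts displayed fields. It assumes repeat_first_field is set on every other coded frame, and it models neither top_field_first nor the VBV. The function names are hypothetical.

```python
def display_fields(repeat_first_field_flags):
    """Each coded frame shows 2 fields, or 3 when repeat_first_field is set."""
    return sum(3 if rff else 2 for rff in repeat_first_field_flags)


def pulldown_flags(n_frames):
    """Alternate repeat_first_field off and on (an assumed 2-3 cadence)."""
    return [i % 2 == 1 for i in range(n_frames)]


flags = pulldown_flags(24)       # one second of 24 Hz film
fields = display_fields(flags)   # fields shown during that second
print(fields, "fields =", fields // 2, "display frames")
```

Half of the coded frames last two field periods and half last three, so the coded frame timing is irregular while the display runs at a constant field rate.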
As more number crunching cycles become available with improvements in semiconductors, several improvements can be made to the MPEG syntax while remaining within the framework of block based transform coding.
Intra coding:
For intra pictures, subband methods such as wavelets combined with improved quantization and entropy coders could gain as much as 2-4 dB over MPEG Intra pictures. The problem becomes more complex when considering the coding of Intra Macroblocks in mixed pictures, such as P or B, since the extent of a subband must, in the simplest of schemes, be limited to the dimensions of a macroblock.
Prediction error coding:
One of the strongest gripes against MPEG is the use of the DCT for decorrelating prediction error blocks. One explanation is: although the DCT is suited for the statistical correlation of intra signals, it is much less suited for the statistics of prediction error (Non-Intra) signals.
One common proposal is to replace the prediction error DCT with a Vector Quantizer. Prediction error (Non-intra) blocks typically contain far fewer bits than intra blocks. (The bits that comprise a Non-intra block can be thought of as having been previously distributed over previous blocks in previous pictures in the form of coefficients and side information...)
Finer coding unit granularities:
The size of the transform block could be made smaller, larger, or both (myriad of different sizes). Likewise, the size of the motion compensation block can be made larger or smaller. The cost is more complex semantics (more decoder complexity) and the overhead bits to select the block size. Instead of sharing the same side information, the blocks within the macroblock could be assigned their own motion vectors, macroblock quantization scale factors, etc.
Many advanced techniques were investigated by MPEG during the formative stages of
the specification, but were eventually eliminated for falling below a threshold controlled
by coding gain vs. implementation complexity. Often, proposals presented a significant
departure from the main stream algorithms under consideration. Each bit added to the
syntax, or rule added to the semantics, represents several gates to a silicon
implementation. From a software perspective, each means an extra table, if-then, or case
statement at multiple points in the decoding program.
During its formative stages, H.263 was known as "H.26P" or "H.26X". It is an ITU-T standard for low-bitrate video and audio teleconferencing. It is designed to be more efficient (by at least 2 dB) than H.261 for bit rates below 64 kbit/sec (ISDN B channel). The primary target bit rate, approximately 27,000 bits/sec, is the payload rate of the V.34 (a.k.a. "V.Fast" or "V.Last") modem standard. In a typical scenario, 20 kbit/sec would be allocated for the video portion, and 6.5 kbit/sec for the speech portion.
Since the H.261 syntax was defined in 1990, techniques and implementation power have naturally improved. H.263 collects many of the advanced methods proposed during MPEG's formative stages into a syntax which shares more of a common basis with MPEG-1 video than it does with H.261.
The detailed differences and similarities are summarized below:
Sample rate, precision, and color space:
H.263 pictures are transmitted with QCIF dimensions. MPEG and JPEG allow nearly any picture size to be described in the headers. A fixed picture size promotes interoperability by forcing all implementors to operate at a common rate, rather than by allowing implementors to get away with whatever lowest sample rate the consumer can be "convinced" is acceptable. Another reason for a fixed sample rate is that, unlike MPEG which is generic, H.263 is geared towards a specific application (teleconferencing). Other MPEG applications such as CD Video and Cable TV define their own fixed parameters. The color space is again YCbCr, with a 4:2:0 macroblock structure and 8 bits of uniform sample precision.
Tables, bits, and other little things:
H.263 refined the variable length code tables.
[more at a later date]