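What is MPEG?
A review of MPEG
What is the MPEG video syntax?
MPEG is DCT-based; is the DCT all there is to MPEG?
What is MPEG's sample precision, and how many colors can MPEG represent?
How do you tell an MPEG-1 bitstream from an MPEG-2 bitstream?
Why is there a special VLC for the first DCT coefficient?
What is macroblock stuffing, and why does nobody like it?
What about slice_vertical_position and macroblock_address_increment?
What are TM rate control and adaptive quantization? How does the TM work?
What does MPEG-compliant mean, other than having to pay patent royalties?
What are Profiles and Levels?
Can MPEG-1 encode pictures larger than 352 x 240 x 30 Hz?
Is there a Constrained Parameters Bitstream for SIF-class applications and decoders?
What could be improved to produce a better syntax than MPEG?
How does MPEG video compare with TV, VHS, and laserdisc?
What is the Digital Video Disc (DVD)?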
To the real world, MPEG is a generic means of compactly representing digital video and audio signals for consumer distribution. The basic idea is to transform a stream of discrete samples into a bitstream of tokens which takes less space, but is just as filling to the eye (or ear). This "transformation," or better, representation, exploits perceptual and even some actual statistical redundancies. The orthogonal dimensions of Video and Audio streams can be further linked with the Systems layer---MPEG's own means of keeping the data types synchronized and multiplexed in a common serial bitstream.
The essence of MPEG is its syntax: the little tokens that make up the bitstream. MPEG's semantics then tell you (if you happen to be a decoder, that is) how to inverse represent the compact tokens back into something resembling the original stream of samples. These semantics are merely a collection of rules (which people like to call algorithms, but that would imply there is a mathematical coherency to a scheme cooked up by trial and error...). These rules are highly reactive to combinations of bitstream elements set in headers and so forth.
MPEG is an institution unto itself as seen from within its own universe. When its inhabitants are (unadvisedly) placed in the same room, a blood-letting debate can spontaneously erupt among them, triggered by mere anxiety over the most subtle juxtaposition of words buried in the most obscure documents. Such stimulus comes readily from transparencies flashed on an overhead projector. Yet at the same time, this gestalt will appear to remain totally indifferent to critical issues set before it for many months. It should therefore be no surprise that MPEG's dualistic chemistry reflects the extreme contrasts of its two founding fathers: the fiery Leonardo Chiariglione (CSELT, Italy) and the peaceful Hiroshi Yasuda (JVC, Japan). The excellent byproduct of the successful MPEG Processes became an International Standards document safely administered to the public in three parts: Systems (Part 1), Video (Part 2), and Audio (Part 3).
A respected method developed by the old Sarnoff Princeton NJ research group was purchased in 1988 by our friend Intel. (The August 1988 issue of Stereo Review discusses the early days of compact disc digital video). It then became known as DVI, or Digital Video Interactive.
Seeing this threat (that is, the need for worldwide interoperability), the Fathers of MPEG sought the help of their colleagues to form a committee to standardize a common means of representing video and audio (a la DVI) onto compact discs... and maybe it would be useful for other things too.
MPEG borrowed significantly from JPEG and, more directly, H.261.
Seeing how this MPEG thing was such a good deal, and not wanting to be left behind in the industry, participants amassed, reaching a peak of more than 200 people by 1992.
By the end of the third year (1990), a syntax emerged, which when applied to represent SIF-rate video and compact disc-rate audio at a combined bitrate of 1.5 Mbit/sec, approximated the pleasure-filled viewing experience offered by the standard VHS format.
After demonstrations proved that the syntax was generic enough to be applied to bit rates and sample rates far higher than the original primary target application ("Hey, it actually works!"), a second phase (MPEG-2) was initiated within the committee to define a syntax for efficient representation of broadcast video, or SDTV as it is now known (Standard Definition Television), not to mention the side benefits: frequent flier miles, impress friends, job security, obnoxious party conversations.
Yet efficient representation of interlaced (broadcast) video signals was more challenging than the progressive (non-interlaced) signals thrown at MPEG-1. Similarly, MPEG-1 audio was capable of only directly representing two channels of sound (although Dolby Surround Sound can be mixed into the two channels like any other two channel system).
MPEG-2 would therefore introduce a scheme to decorrelate multichannel discrete surround sound audio signals, exploiting the moderately higher redundancy factor in such a scenario. Of course, proprietary schemes such as Dolby AC-3 have become more popular in practice.
The need for a third phase (MPEG-3) was anticipated way back in 1991 for High Definition Television, although it was discovered during late 1992 and 1993 that the MPEG-2 syntax simply scaled with the bit rate, obviating the third phase. MPEG-4 was launched in late 1992 to explore the requirements of a more diverse set of applications (although originally its goal seemed very much like that of the ITU-T SG15 group, which produced the new low-bitrate videophone standard---H.263).
Today, MPEG (video and systems) is the exclusive syntax of the United States
Grand Alliance HDTV specification, the European Digital Video Broadcasting
group, and the Digital Versatile Disc (DVD).
1. Compression Ratios over 100:1
As discussed elsewhere, articles in the press and marketing literature will often make the claim that MPEG can achieve high quality video with compression ratios over 100:1. These figures often include the oversampling factors in the source video. In reality, the coded sample rate specified in an MPEG image sequence is usually not much larger than 30 times the specified bit rate. Pre-compression through subsampling is chiefly responsible for 3 digit ratios for all video coding methods, including those of the non-MPEG variety ("yuck, blech!").
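The arithmetic behind that claim can be sketched in a few lines of Python. The source format (720x480, 4:2:2), the coded format (SIF, 4:2:0), the 8 bits per sample, and the 1.15 Mbit/sec video bit rate are all assumptions chosen for illustration, not figures taken from this text.

```python
def raw_bitrate(width, height, fps, samples_per_pixel, bits_per_sample=8):
    """Uncompressed bit rate in bits per second."""
    return width * height * fps * samples_per_pixel * bits_per_sample

# Assumed source: 720x480 at 30 Hz, 4:2:2 chroma (2 samples per pixel).
source_bps = raw_bitrate(720, 480, 30, 2.0)
# Assumed coded format: SIF 352x240 at 30 Hz, 4:2:0 chroma (1.5 samples per pixel).
coded_bps = raw_bitrate(352, 240, 30, 1.5)
# Assumed video bit rate, for illustration only.
video_bps = 1_150_000

ratio_vs_source = source_bps / video_bps   # what the marketing figure measures
ratio_vs_coded = coded_bps / video_bps     # what the coder actually achieves

print(f"ratio against the source video:   {ratio_vs_source:.0f}:1")
print(f"ratio against the coded pictures: {ratio_vs_coded:.0f}:1")
```

Under these assumptions the ratio against the source comes out near 144:1, while the ratio against the pictures the encoder actually sees is only about 26:1. The difference is entirely due to subsampling before coding.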
2. MPEG-1 is 352x240
Both MPEG-1 and MPEG-2 video syntax can be applied at a wide range of bitrates and sample rates. The MPEG-1 that most people are familiar with has parameters of 30 SIF pictures (352 pixels x 240 lines) per second and a coded bitrate less than 1.86 megabits/sec----a combination known as "Constrained Parameters Bitstreams". This popular interoperability point is promoted by Compact Disc Video (White Book).
In fact, it is syntactically possible to encode picture dimensions as high as 4095 x 4095 and bitrates up to 100 Mbit/sec. These limits would be orders of magnitude higher, maybe even infinite, if not for the need to conserve bits in the headers!
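These ceilings fall straight out of the header field widths. The sketch below assumes the MPEG-1 sequence header layout as commonly described (12-bit picture size fields, an 18-bit bit_rate field counted in units of 400 bits/sec) and ignores any reserved code values; the constant names are made up for this example.

```python
def max_unsigned(bits):
    """Largest value an unsigned field of the given width can hold."""
    return (1 << bits) - 1

# Assumed field widths (hypothetical constant names).
SIZE_FIELD_BITS = 12        # horizontal_size and vertical_size
BIT_RATE_FIELD_BITS = 18    # bit_rate
BIT_RATE_UNIT = 400         # bits per second per count

max_dimension = max_unsigned(SIZE_FIELD_BITS)
max_bit_rate = max_unsigned(BIT_RATE_FIELD_BITS) * BIT_RATE_UNIT

print(f"largest picture: {max_dimension} x {max_dimension}")
print(f"largest bit rate: {max_bit_rate / 1e6:.1f} Mbit/sec")
```

Adding a single bit to either field would double the corresponding limit, which is the point of the remark about conserving header bits.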
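With the MPEG-2 specification, the most commonly used combinations are grouped into "Levels", which are described later in this document. Two commonly used levels are: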
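3. Motion compensation displaces macroblocks from previous pictures
Macroblock predictions are formed from arbitrary 16x16 pixel areas (or 16x8 in MPEG-2) of previously reconstructed pictures.
Apart from the edges of the picture, there are no boundaries limiting where a macroblock prediction may be taken from.
The reference pictures from which predictions are formed are, conceptually, a grid of samples that bears no resemblance to its coded form. Once a picture has been reconstructed, it should no longer be viewed as a collection of coded macroblocks but as a flat plane of samples with no trace of its original coded structure.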
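A minimal sketch of this idea, assuming integer-sample motion vectors only (the real syntax also allows half-sample displacements, which are left out here). The function name predict_block and the tiny test picture are hypothetical; the point is that the prediction is read from a flat sample grid at any position, not from macroblock-aligned cells.

```python
def predict_block(reference, top, left, dy, dx, size=16):
    """Copy a size x size prediction from a flat reference sample plane.

    (top, left) is the position of the macroblock being predicted and
    (dy, dx) is its integer motion vector. The only restriction is that
    the displaced block stays inside the picture.
    """
    height, width = len(reference), len(reference[0])
    y0, x0 = top + dy, left + dx
    if y0 < 0 or x0 < 0 or y0 + size > height or x0 + size > width:
        raise ValueError("prediction crosses the picture edge")
    return [row[x0:x0 + size] for row in reference[y0:y0 + size]]

# A 32-line by 48-sample reference plane; each value encodes its position.
reference = [[(y * 48 + x) % 256 for x in range(48)] for y in range(32)]

# Predict the macroblock at (16, 16) from an area that straddles four
# macroblocks of the reference picture.
block = predict_block(reference, top=16, left=16, dy=-3, dx=5)
print(len(block), len(block[0]), block[0][0] == reference[13][21])
```

The prediction starts at row 13, column 21 of the reference, a position that is not a multiple of 16 in either direction.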
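The displayed picture size and frame rate may differ from the picture size (resolution) and frame rate coded in the bitstream. For example, the source sequence may be reduced before encoding by filtering and subsampling each picture. At reconstruction time, the pictures are interpolated and upsampled back to the original size and frame rate.
In fact, there are three fundamental stages (source rate, coded rate, display rate), and they may differ in several parameters. Through the sequence headers, the MPEG syntax can describe the coded and display rates separately, while the true source rate is known only to the encoder. This is why MPEG-2 introduced the display_horizontal_size and display_vertical_size header elements: they are the display-domain counterparts of the coded-domain horizontal_size and vertical_size elements of MPEG-1.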
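The three stages can be illustrated on a single row of samples. The filters below (pair averaging for the 2:1 reduction, linear interpolation for the 1:2 expansion) are assumptions chosen for brevity; MPEG does not mandate any particular pre- or post-processing filter, and the function names are hypothetical.

```python
def subsample_2to1(row):
    """Average adjacent pairs: a crude low-pass filter plus 2:1 decimation."""
    return [(row[i] + row[i + 1] + 1) // 2 for i in range(0, len(row) - 1, 2)]

def upsample_1to2(row):
    """Double the row length by linear interpolation between neighbours."""
    out = []
    for i, sample in enumerate(row):
        nxt = row[i + 1] if i + 1 < len(row) else sample  # repeat the last sample
        out.append(sample)
        out.append((sample + nxt + 1) // 2)
    return out

source_row = [10, 12, 20, 22, 30, 32, 40, 42]   # source rate (encoder side only)
coded_row = subsample_2to1(source_row)          # coded rate (in the bitstream)
display_row = upsample_1to2(coded_row)          # display rate (after decoding)

print(len(source_row), len(coded_row), len(display_row))
print(coded_row)
print(display_row)
```

The display row has the same length as the source row but not the same values, which is why the decoder can only be told the coded and display dimensions, never what the source really looked like.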
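5. Picture coding types (I, B, P) all consist of the same macroblock types
All (non-scalable) macroblocks within an I picture must be coded intra (like a JPEG picture). Macroblocks within a P picture, however, may be coded either as intra or as non-intra (predicted from a previously reconstructed picture). Finally, macroblocks within a B picture can be independently coded as intra, forward predicted, backward predicted, or both forward and backward predicted. The macroblock header contains an element called macroblock_type, which flips these modes on and off like switches.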
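macroblock_type is possibly the single most powerful element in the whole video syntax. motion_type, introduced in MPEG-2, is perhaps the second most powerful. Picture types (I, P, B) merely widen the semantic scope of the macroblock modes. The component switches are:
  1. Intra or non-intra
  2. Forward prediction (forward motion)
  3. Backward prediction (backward motion) (switches 2+3 together signal interpolated, that is, bidirectional, prediction)
  4. Conditional replenishment (macroblock_type), whose role is described as the "digital ...
  5. Adaptive quantization (macroblock_quantizer_code)
  6. Prediction without motion compensation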
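The first five switches are mostly orthogonal (the sixth is a special case in P pictures, signaled by the 1st and 2nd switches both being set to off: predicted, but without motion compensation).
... DCT data, so there is no need to signal macroblock_pattern or any of the prediction switches. Likewise, when a non-intra macroblock carries no coded prediction error, the macroblock_quantizer signal has no meaning. This shows that MPEG demands close interpretation from its reader.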
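The interplay of these switches can be modelled as a handful of booleans. The sketch below is only an illustration of the rules stated above: the class and function names are hypothetical, the actual variable-length codes for macroblock_type are not reproduced, and the assumption that intra macroblocks always carry DCT data is used to decide when the quantizer switch is meaningful.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MacroblockSwitches:
    intra: bool = False            # switch 1
    motion_forward: bool = False   # switch 2
    motion_backward: bool = False  # switch 3
    pattern: bool = False          # switch 4: coded prediction error present
    quant: bool = False            # switch 5: new quantizer value sent

def prediction_mode(mb):
    """Name the prediction mode implied by switches 1 to 3."""
    if mb.intra:
        return "intra"
    if mb.motion_forward and mb.motion_backward:
        return "interpolated"
    if mb.motion_forward:
        return "forward"
    if mb.motion_backward:
        return "backward"
    # Switch 6: the P picture special case, switches 1 and 2 both off.
    return "predicted without motion compensation"

def is_meaningful(mb):
    """Reject combinations in which one switch makes another pointless."""
    if mb.intra and (mb.motion_forward or mb.motion_backward or mb.pattern):
        return False   # intra needs neither prediction nor a pattern
    if mb.quant and not (mb.intra or mb.pattern):
        return False   # no coded coefficients, so nothing to quantize
    return True

for mb in (MacroblockSwitches(intra=True, quant=True),
           MacroblockSwitches(motion_forward=True, motion_backward=True, pattern=True),
           MacroblockSwitches(pattern=True),
           MacroblockSwitches(motion_forward=True, quant=True)):
    print(prediction_mode(mb), is_meaningful(mb))
```

Only the last combination is rejected: it asks for a new quantizer value on a macroblock that carries no coded prediction error.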
Skipped macroblocks in P pictures:
Skipped macroblocks in B pictures: