Mobile 3DTV content delivery
optimization over DVB-H system


Two large-scale user studies were conducted to evaluate the perceived quality of mobile 3D television and video along its production chain.

The first study evaluated different coding methods for mobile 3D television and video applications. Choosing the right coding method is a critical factor in the development of Mobile3DTV, so several coding approaches were compared. The goal of the study was to identify the optimum coding method for mobile 3DTV and to understand the underlying rationale of quality perception.

The second study then targeted the impact of different transmission parameters on the perceived quality of mobile 3DTV. Following the production chain, encoded sequences were evaluated for different error rates, error protection strategies, and coding approaches.

This page lists the test sequences used in the subjective tests. The sequences can be downloaded from an FTP server using the following details.

Server: ftp.hhi.de
Dir: HHIMPEG3DV
User: mpeg3dv
Password: Cah#K9xu
When accessing the server, please use an FTP client and not a web browser.

Coding study

To be able to evaluate the perceived quality provided by each of the contents, the parameters were chosen in accordance with the settings of the prospective Mobile3DTV system.

The sequences were encoded using two codec profiles. Current mobile devices have only limited computational power, but that power is increasing rapidly. The chosen profiles therefore cover both situations.

The simple baseline profile uses an IPPP structure and CAVLC, with the Group of Pictures (GOP) size set to 1. It represents low computational complexity. The complex high profile enables hierarchical B-frames and CABAC, with the GOP size set to 8. In both profiles, the search range was set to 8 and the intra period to 16.

The sequences were encoded with four coding methods: Simulcast, Mixed Resolution, Multiview Video (MVC) and Video+Depth (V+D) coding. For the Simulcast, Mixed Resolution and V+D approaches, the H.264/AVC Reference Software JM 14.2 was used. For MVC, the stimuli were encoded using the H.264/MVC Reference Software JMVC 5.0.5.

Because compressibility varies between sequences, it is not useful to tie the low and high quality levels to fixed bit rates. A rate sufficient for high quality in one sequence might produce low quality in another. To guarantee comparable low and high quality for all sequences, individual bit rate points had to be determined for each sequence. To define the two quality levels for all sequences of the coding test set, the quantization parameter (QP) of the codec for simulcast coding was set to 30 for the high quality and 37 for the low quality. This results in a low and a high bit rate for each sequence of the coding test set. The resulting bitrates are presented in Table 1.

Table 1 - Target bitrates of the final test sequences per profile and quality level

Profile   Quality  Bullinger  Butterfly  Car  Horse  Mountain  Soccer2
Baseline  Low      74         143        130  160    104       159
Baseline  High     160        318        378  450    367       452
High      Low      46         94         112  104    78        134
High      High     99         212        323  284    208       381
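
As a small worked example, the Table 1 values can be used to see how much lower the target bitrate of the high profile is compared with the baseline profile at the same quality level. The sketch below only restates the table; the names TABLE_1 and high_profile_saving are hypothetical and chosen for illustration.

```python
# Target bitrates copied from Table 1 (units as given in the table).
SEQUENCES = ("Bullinger", "Butterfly", "Car", "Horse", "Mountain", "Soccer2")

TABLE_1 = {
    ("Baseline", "Low"): (74, 143, 130, 160, 104, 159),
    ("Baseline", "High"): (160, 318, 378, 450, 367, 452),
    ("High", "Low"): (46, 94, 112, 104, 78, 134),
    ("High", "High"): (99, 212, 323, 284, 208, 381),
}


def high_profile_saving(quality):
    """Relative bitrate reduction of the high profile versus baseline.

    Returns a dict mapping sequence name to a fraction between 0 and 1.
    """
    baseline = TABLE_1[("Baseline", quality)]
    high = TABLE_1[("High", quality)]
    return {
        name: 1 - h / b
        for name, b, h in zip(SEQUENCES, baseline, high)
    }


if __name__ == "__main__":
    for quality in ("Low", "High"):
        for name, saving in high_profile_saving(quality).items():
            print(f"{quality:4s} {name:10s} {saving:.1%}")
```

For every sequence and both quality levels, the high-profile target is below the baseline target, which is what one would expect from enabling hierarchical B-frames and CABAC.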

For additional details, see the technical report.


Bullinger - Talking head/News

Front shot of a man talking to the audience. The sequence is without any camera movement. It is comparable to a videoconference or an anchorman shot.

No camera movement, low object movement, low structural complexity, low depth complexity.

Resolution: WQVGA, 432x240
Frame rate: 15 Hz
Length: 7.7 s
Spatial details: medium
Temporal details: low
Depth complexity: low

Butterfly - Animation

A short clip from the animation "Big Buck Bunny". Big Buck Bunny stands in front of a tree and watches the butterfly. Meanwhile, some squirrels climb up the tree.

No camera movement, high object movement, high structural complexity, medium depth complexity.

Resolution: WQVGA, 432x240
Frame rate: 15 Hz
Length: 12 s
Spatial details: high
Temporal details: medium
Depth complexity: medium

Car - Action/Movie

A car driving along an alley. The camera follows the car. During the selected sequence a lorry is approaching on the opposite lane.

High camera movement, low object movement, medium structural complexity, high depth complexity.

Resolution: WQVGA, 432x240
Frame rate: 15 Hz
Length: 7.8 s
Spatial details: high
Temporal details: high
Depth complexity: medium

Horse - Nature/Documentary

Sequence of a horse eating grass.

No camera movement, low object movement, high structural complexity, medium depth complexity.

Resolution: WQVGA, 432x240
Frame rate: 15 Hz
Length: 9.3 s
Spatial details: medium
Temporal details: low
Depth complexity: high

Mountain - Nature/Documentary

A pan over a mountain area. During the sequence, the camera is slowly moving downwards.

Medium camera movement, low object movement, medium structural complexity, low depth complexity.

Resolution: QVGA, 320x240
Frame rate: 15 Hz
Length: 8 s
Spatial details: high
Temporal details: low
Depth complexity: high

Soccer2 - Sports

A clip of a football match. In the sequence one team prepares for a corner kick. You can see the fans in the background.

High camera movement, high object movement, medium structural complexity, high depth complexity.

Resolution: QVGA, 320x240
Frame rate: 15 Hz
Length: 13.3 s
Spatial details: medium
Temporal details: high
Depth complexity: high

Transmission study

The transmission experiment studied the impact of transmission parameters on the perceived quality of encoded mobile 3D television and video sequences.

To study the subjective quality of mobile 3D videos subjected to transmission errors, we prepared test sequences that vary in content, coding method, protection scheme, error rate and slice mode. The transmission scenario was simulated with the following parameters and settings.

Table 2 - Test parameters of the transmission study

Contents               Heidelberg Alleys, Knights Quest, RollerBlade, RhineValleyMoving
Coding Methods         Simulcast, MVC, Video+Depth
Prediction Structures  IPPP
Slice Modes            OFF, ON (Fixed Slice Sizes of 1300 Bytes)
Protection Structures  EEP, UEP
Channel SNR Range      17-21

The simulations were carried out using the DVB-H physical layer parameters presented in Table 3. These settings result in a channel capacity of 13,270,588 bit/s. The simulated channel is the mobile channel model COST207 Typical Urban with 6 taps (TU6), with a receiver velocity of 38.9 km/h relative to the source, which corresponds to a maximum Doppler frequency of 24 Hz.

Table 3 - DVB-H physical layer transmission parameters

Modulation               16 QAM
Convolutional Code Rate  2/3
Guard Interval           1/4
Channel Bandwidth        8 MHz
Channel Model            TU6
Carrier Frequency        666 MHz
Doppler Shift            24 Hz
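
The channel capacity and the Doppler frequency quoted above can be reproduced from the Table 3 parameters. The following is a minimal sketch, not the simulation code used in the study. It assumes the standard DVB-T/H OFDM figures for an 8 MHz channel in 8K mode (6048 data carriers, 896 µs useful symbol duration) and the outer Reed-Solomon (204,188) code; the page does not state the transmission mode, but the 2K and 4K modes give the same bitrate. The function names are hypothetical.

```python
from fractions import Fraction

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def max_doppler_hz(speed_kmh, carrier_hz):
    """Maximum Doppler frequency f_d = v * f_c / c."""
    speed_ms = speed_kmh / 3.6
    return speed_ms * carrier_hz / SPEED_OF_LIGHT


def dvbt_useful_bitrate(bits_per_carrier, code_rate, guard_interval,
                        data_carriers=6048,
                        useful_symbol_s=Fraction(896, 1_000_000)):
    """Useful bitrate in bit/s of a DVB-T/H OFDM signal.

    data_carriers and useful_symbol_s default to the 8K mode in an
    8 MHz channel (an assumption, see the text above).
    """
    rs_rate = Fraction(188, 204)  # outer Reed-Solomon code
    symbol_s = useful_symbol_s * (1 + guard_interval)
    bits_per_symbol = data_carriers * bits_per_carrier * code_rate * rs_rate
    return bits_per_symbol / symbol_s


if __name__ == "__main__":
    # 16 QAM carries 4 bits per carrier; code rate 2/3; guard interval 1/4.
    capacity = dvbt_useful_bitrate(4, Fraction(2, 3), Fraction(1, 4))
    print(int(capacity))                        # bit/s
    print(round(max_doppler_hz(38.9, 666e6)))   # Hz
```

Exact fractions are used for the bitrate so that the result can be compared digit by digit with the capacity quoted in the text.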

For additional details, see the technical report.


Heidelberg - Documentary

Low camera movement, low object movement, high structural complexity, varying depth complexity.

Resolution: WQVGA, 432x240
Frame rate: 12.5 Hz
Length: 60 s
Spatial details: high
Temporal details: medium
Depth complexity: high

Knights - Animation

Varying camera movement, varying object movement, low structural complexity, low depth complexity.

Resolution: WQVGA, 432x240
Frame rate: 12.5 Hz
Length: 60 s
Spatial details: high
Temporal details: medium
Depth complexity: high

RhineValleyMoving - Nature/Documentary

High camera movement, high object movement, medium structural complexity, low depth complexity.

Resolution: WQVGA, 432x240
Frame rate: 12.5 Hz
Length: 60 s
Spatial details: high
Temporal details: medium
Depth complexity: medium

Roller - User-created

No camera movement, high object movement, medium structural complexity, medium depth complexity.

Resolution: QVGA, 320x240
Frame rate: 12.5 Hz
Length: 60 s
Spatial details: high
Temporal details: high
Depth complexity: high


Depth range vs. compression study

The study examined the influence of varying depth and compression artifacts on the subjective quality of experience for mobile 3D video and still-image content. The study was divided into two experiments, one with video content and one with still images.

Content – Four test contents with variable visual characteristics were used in the experiment. The selected contents contained neither scene cuts nor audio.

Parameters – Three depth levels and five quantization parameters were varied. The depth levels were monoscopic presentation and stereoscopic presentation with a short and with a wide baseline. The quantization parameter took the values 25, 30, 35, 40 and 45. Based on previous work, these parameters were selected to systematically examine the trade-off between the positive influence of depth and the negative influence of artifacts on experienced quality.
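
To make the size of this parameter space concrete, the sketch below enumerates the combinations of depth level and quantization parameter. It assumes that every depth level was crossed with every QP value, which the text suggests but does not state explicitly; the label strings are hypothetical.

```python
from itertools import product

# Depth levels and QP values as listed in the text; labels are illustrative.
DEPTH_LEVELS = ("mono", "short_baseline", "wide_baseline")
QP_VALUES = (25, 30, 35, 40, 45)

# Every (depth level, QP) pair, assuming a fully crossed design.
conditions = list(product(DEPTH_LEVELS, QP_VALUES))

if __name__ == "__main__":
    for depth, qp in conditions:
        print(f"{depth:15s} QP={qp}")
    print(len(conditions), "conditions per content")
```

Under that assumption there are 3 x 5 = 15 conditions for each of the four contents.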

Content preparation – The variable depth levels were prepared as follows. For each video clip, multiple stereoscopic versions were produced by selecting different camera pairs from the available multi-view video tracks. The left camera was kept the same for all sequences, and the 3D effect was controlled by the position of the right camera:

1) Monoscopic (2D) video – the left and right cameras correspond to the same camera number in the multiview sequence.
2) Short baseline – a camera baseline that produces a 3D scene with a limited disparity range and a less pronounced but visible 3D effect.
3) Wide baseline – a camera baseline selected to provide the optimal disparity range for the chosen stereoscopic display.

The short baseline represents the use scenario in which high-resolution video content is repurposed for mobile use by direct linear downscaling. In the general case, this would result in a ‘shallow’ depth. The wide baseline represents the scenario in which the content is specifically adapted to the viewing conditions of the portable display. In the ideal case, this would result in full utilization of the mobile display’s comfort zone.

Each stereoscopic sequence was converted from its original resolution to the resolution of the target display in a four-step procedure:

1) The disparity range of each stereo pair was analyzed.
2) The left and right channels were cropped at the sides to shift the disparity range so that the absolute positive and negative disparity values were equalized, and to avoid frame violations. The position of this first cropping window was kept fixed in the left channel and varied in the right channel according to the camera pair.
3) Both channels were downscaled with respect to the smaller target dimension while maintaining the source aspect ratio.
4) The extra pixels along the larger target dimension were cropped to achieve the display (target) aspect ratio. The position of this second cropping window was the same for all channels and all frames, and was selected manually based on the movie content.

Cubic spline interpolation was applied for the cropping operations and least-squares cubic projection for the resizing (downscaling). Following these steps, three depth levels were created for the selected contents. Figure 1 B-C illustrates the output disparity levels of the stereoscopic test videos. Zero disparity refers to a scene on the display plane, while negative values indicate scenes in front of the display plane and positive values indicate scenes behind it.
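
The disparity equalization in step 2 can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not the tool used to prepare the sequences: disparity is taken as d = x_right - x_left (so that negative values lie in front of the display plane, as above), the disparity range in the example is made up, and the function names are hypothetical.

```python
def centering_shift(d_min, d_max):
    """Shift to add to every disparity so the range is symmetric about zero."""
    return -(d_min + d_max) / 2.0


def crop_offsets(shift):
    """Start columns (left_view, right_view) of two equal-width crop windows.

    With d = x_right - x_left, cropping changes every disparity by
    (left_offset - right_offset), so the offsets are chosen to differ
    by exactly `shift`. Each view loses abs(shift) columns of width.
    """
    if shift >= 0:
        return (shift, 0.0)
    return (0.0, -shift)


if __name__ == "__main__":
    d_min, d_max = -4.0, 20.0                   # hypothetical range in pixels
    shift = centering_shift(d_min, d_max)
    left_off, right_off = crop_offsets(shift)
    new_range = (d_min + shift, d_max + shift)
    print(shift, (left_off, right_off), new_range)
```

Here the offsets are chosen so that one window always starts at column 0; the procedure above instead keeps the left window fixed and moves only the right one, which yields the same offset difference. Fractional shifts would require sub-pixel resampling, which is consistent with the interpolation mentioned for the cropping operations.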

Akko & Kayo

Indoor, studio type, medium complex object motion, no camera motion, medium detail, medium complex depth structure, studio light.

Resolution: VGA, 640x480
Frame rate: 30 Hz
Length: 300 frames, 10 s
Spatial details: medium
Temporal details: medium
Depth complexity: medium

Champagne_tower

Indoor, studio type, complex object motion (fluids, reflections, transparency), no camera motion, high detail, medium complex depth structure, studio light.

Resolution: 1280x960
Frame rate: 30 Hz
Length: 300 frames, 10 s
Spatial details: medium
Temporal details: low
Depth complexity: low

Pantomime

Indoor, studio type, medium complex object motion, no camera motion, medium detail, medium complex depth structure, studio light.

Resolution: 1280x960
Frame rate: 30 Hz
Length: 300 frames, 10 s
Spatial details: medium
Temporal details: medium
Depth complexity: low

Lovebird_1

Outdoor, quiet dialogue in movie type of scene, simple object motion, no camera motion, high detail, complex depth structure, natural light.

Resolution: XGA, 1024x768
Frame rate: 30 Hz
Length: 300 frames, 10 s
Spatial details: medium
Temporal details: low
Depth complexity: medium