QoE Aware VVC Based Omnidirectional and Screen Content Coding
The widespread adoption of immersive media and communication tools, including Virtual Reality (VR), screen sharing and video conferencing applications, demands better compression of non-conventional video content. Versatile Video Coding (VVC), the latest video coding standard, focusses on versatility and introduces new coding tools specifically for omnidirectional (360°) video and artificially generated (screen) content. The special characteristics of these non-conventional videos pose crucial challenges for the development of efficient encoding algorithms. Moreover, VVC and state-of-the-art compression architectures provide inadequate coding support for the spherical and perceptual characteristics of 360° videos, and do not fully exploit the distinct features of screen content videos, such as sharp edges and repeating patterns.
In response, the first contribution introduces spherical characteristics to VVC to support its rectilinear functionalities. To this end, a novel spherical objective metric called the Weighted Craster Parabolic Projection Peak Signal-to-Noise Ratio (WCPPPSNR) is developed and used with newly designed residual weighting and multiple Quantization Parameter (QP) optimization techniques to improve compression efficiency. This not only brings spherical characteristics to the video codec but also acts as a two-stage magnitude reduction of redundancy in both the spatial and frequency domains. The results show that the proposed algorithms improve the compression efficiency of VVC Test Model (VTM) 2.2 by 3.18% on average and by up to 6.07%.
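To illustrate what a spherically weighted objective metric does, the sketch below computes a latitude-weighted PSNR on an equirectangular frame, in the style of the widely used WS-PSNR. It is not the WCPPPSNR metric of the thesis, whose definition is not given here; the function names and the cosine-of-latitude weighting are assumptions chosen only to show how distortion near the poles can be discounted relative to distortion near the equator.

```python
import math

def latitude_weights(height):
    """Cosine-of-latitude weight for each row of an equirectangular frame
    (assumed weighting, as in WS-PSNR; not the thesis's WCPPPSNR)."""
    return [math.cos((row + 0.5 - height / 2) * math.pi / height)
            for row in range(height)]

def weighted_psnr(ref, dist, max_value=255):
    """Latitude-weighted PSNR between two frames given as lists of rows."""
    weights = latitude_weights(len(ref))
    num = 0.0  # weighted sum of squared errors
    den = 0.0  # sum of weights, used for normalisation
    for w, ref_row, dist_row in zip(weights, ref, dist):
        for r, d in zip(ref_row, dist_row):
            num += w * (r - d) ** 2
            den += w
    wmse = num / den
    if wmse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / wmse)
```

With this weighting, the same pixel error costs less when it occurs in a row near a pole than in a row near the equator, which is the behaviour a spherical metric is meant to capture for frames that oversample the polar regions.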
The second contribution of the thesis proposes a novel 360° encoding scheme that leverages user-observed viewport information. In this regard, bidirectional optical flow, a Gaussian filter and Spherical Convolutional Neural Networks (Spherical CNN) are deployed to extract perceptual features and predict the user-observed viewports. By appropriately fusing the predicted viewports on the 2-D projected 360° video frames, a novel Region of Interest (ROI) aware weight map is developed, which can be used to mask the source video and introduce adaptive changes to the VVC coding tools. Comprehensive experiments conducted with VTM 7.0 show that the proposed scheme can improve perceptual quality and reduce bitrates, achieving bitrate savings of 5.85% on average and up to 17.15% under perceptual quality measurements.
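As a rough illustration of how an ROI-aware weight map could steer an encoder, the sketch below builds a Gaussian weight map around a single predicted viewport centre and converts it into per-block QP offsets. Everything here is hypothetical: the function names, the block grid, the horizontal wrap-around (which assumes an equirectangular projection) and the linear weight-to-QP mapping are assumptions, not the fusion or adaptation method used in the thesis.

```python
import math

def roi_weight_map(cols, rows, center, sigma):
    """Gaussian weight in (0, 1] per block, peaking at the predicted
    viewport centre. Horizontal distance wraps around, assuming the
    frame is continuous in longitude (equirectangular projection)."""
    cx, cy = center
    weights = []
    for y in range(rows):
        row = []
        for x in range(cols):
            dx = abs(x - cx)
            dx = min(dx, cols - dx)  # wrap across the left/right frame edge
            dy = y - cy
            row.append(math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma)))
        weights.append(row)
    return weights

def qp_offsets(weights, max_offset=6):
    """Hypothetical mapping: weight 1 keeps the base QP (offset 0),
    weight near 0 raises the QP by up to max_offset."""
    return [[round((1 - w) * max_offset) for w in row] for row in weights]
```

Blocks inside the predicted viewport keep the base QP, while blocks far from it are quantized more coarsely, which is one simple way a weight map can trade bitrate against perceived quality.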
The final contribution introduces two affine prediction techniques that extend the functionality of Intra Block Copy (IBC) and exploit the geometrical transformations between objects and characters in screen content videos. The first technique applies a Control Point Vector (CPV) search mechanism that allows a search for more affine-transformed IBC blocks in a conventional manner, identical to motion estimation for inter blocks. In contrast, the second technique employs a parameter-based approach: it predefines suitable affine transformation parameters that are applied to the IBC reference samples, and compacts the information necessary to represent these transformations. In the context of the VVC standard, the proposed techniques outperform the reference implementations and other state-of-the-art schemes, achieving consistent coding gains of up to 5.41% for specific screen content sequences.
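For readers unfamiliar with control point vectors, the sketch below derives a per-sub-block vector field from two control point vectors using the 4-parameter affine model that VVC defines for inter prediction. How the thesis adapts this model to IBC block vectors is not described here, so the function name, the floating-point arithmetic and the sub-block sampling are illustrative assumptions only.

```python
def affine_vector_field(cpv0, cpv1, width, height, sub=4):
    """Derive one vector per sub x sub sub-block from two control point
    vectors: cpv0 at the top-left corner and cpv1 at the top-right corner.
    Uses the 4-parameter (rotation + zoom + translation) affine model."""
    a = (cpv1[0] - cpv0[0]) / width  # horizontal gradient of the x component
    b = (cpv1[1] - cpv0[1]) / width  # horizontal gradient of the y component
    field = {}
    for y in range(sub // 2, height, sub):      # sub-block centre rows
        for x in range(sub // 2, width, sub):   # sub-block centre columns
            vx = a * x - b * y + cpv0[0]
            vy = b * x + a * y + cpv0[1]
            field[(x, y)] = (vx, vy)
    return field
```

When both control point vectors are equal, the model collapses to a single translational vector, which corresponds to conventional block copying; differing control points describe rotated or scaled reference regions.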
Finally, the performance of the proposed contributions in error-prone networking environments is examined by developing a transmission-based Quality of Experience (QoE) model from VVC-dependent parameters. The results show superior QoE gains over the anchor implementations of the respective contributions.
Attend the event
This is a free online event open to everyone. You can attend via Zoom.