11am - 12 noon

Monday 19 June 2023

Machine learning for video coding and quality assessment

PhD Viva Open Presentation by Buddhi Erabadda.





An enormous volume of video is transferred over networks every second. To cope with this volume of data, efficient video compression algorithms are essential. To this end, the High Efficiency Video Coding (HEVC) standard introduced the concepts of a hierarchical quad-tree structure, coding tree units, and coding units (CUs), enabling higher compression than its predecessors.

These concepts have also been adopted in later standards such as Versatile Video Coding, with a greater number of partitioning structures and prediction modes. These video coding standards typically adopt brute-force approaches to determine the optimal coding parameters, which significantly increases their computational complexity. At the same time, although these standards achieve superior compression efficiency, advances in technology and its widespread accessibility push for further compression of video content.

One of the prominent sources of increased video data traffic is the growing use of social media platforms, where a vast amount of user-generated content (UGC) is added to the network. Assessing the quality of these videos, which varies widely (e.g. from 1080p/2K/4K to 360p and below), is critical for business activities such as efficient storage/bandwidth management, video promotion applications, and spam detection.

First, to address the high computational complexity of HEVC, two algorithms are proposed in this thesis, one focusing on intra-prediction and the other on inter-prediction. In both approaches, the brute-force coding parameter selection process is replaced by machine learning-based models that predict the CU split decisions, significantly reducing the computational complexity of the encoding process. Furthermore, the proposed methods introduce a computational complexity control parameter that allows computational complexity to be traded off against coding efficiency, depending on the application.
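As an illustration only, the idea of replacing exhaustive CU split search with a model prediction, governed by a complexity control parameter, can be sketched as follows. This is not the thesis's algorithm: the names `split_prob`, `rd_split_decision`, and `margin` are hypothetical, and the rule "trust the model only when it is confident, otherwise fall back to the exhaustive search" is one assumed way such a control parameter might work.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square coding unit


def partition_ctu(
    x: int,
    y: int,
    size: int,
    split_prob: Callable[[int, int, int], float],
    rd_split_decision: Callable[[int, int, int], bool],
    margin: float,
    min_size: int = 8,
) -> List[Block]:
    """Return the leaf CUs of a quad-tree partition of one coding tree unit.

    split_prob        -- hypothetical model: probability that a CU should split
    rd_split_decision -- hypothetical stand-in for the exhaustive (brute-force) check
    margin            -- assumed control parameter in [0, 0.5]; 0 always trusts the
                         model, larger values defer more often to the exhaustive check
    """
    if size <= min_size:
        return [(x, y, size)]  # smallest CU size: cannot split further

    p = split_prob(x, y, size)
    if p >= 0.5 + margin:
        split = True                          # model is confident: split
    elif p <= 0.5 - margin:
        split = False                         # model is confident: do not split
    else:
        split = rd_split_decision(x, y, size)  # uncertain: pay for the full search

    if not split:
        return [(x, y, size)]

    half = size // 2
    leaves: List[Block] = []
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(
                partition_ctu(x + dx, y + dy, half, split_prob,
                              rd_split_decision, margin, min_size)
            )
    return leaves
```

In this sketch, a small `margin` skips the exhaustive check almost everywhere (lowest complexity, highest risk of a suboptimal partition), while a `margin` near 0.5 defers to it for nearly every CU.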

Secondly, to address the challenge of quality assessment of UGC videos, a no-reference metric is proposed to predict the Mean Opinion Score of videos. The proposed work presents a hybrid metric that employs compression-related and metadata-related features, combined with pixel-related features carefully selected from the state of the art.

Finally, to exploit further opportunities for compression, several machine learning-based long-term reference picture selection algorithms are proposed for HEVC inter-prediction. This work focuses on videos with repeated shots, where the correlation among frames from the same shot is not otherwise exploited. The proposed methods address both static and dynamic (e.g. handheld) camera scenarios and report further compression gains, boosting the performance of the HEVC inter-prediction process.
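To make the repeated-shot idea concrete, the sketch below picks a long-term reference for the first frame of a new shot by comparing it with stored frames from earlier shots. It is a simplified stand-in, not the thesis's method: the thesis proposes machine learning-based selection, whereas this example substitutes plain luma-histogram intersection, and the function names, the flat-list frame representation, and the `min_similarity` threshold are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def luma_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalised histogram of 8-bit luma samples (frame given as a flat sequence)."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1]; 1 means identical normalised histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def select_long_term_reference(
    current: Sequence[int],
    candidates: Sequence[Sequence[int]],
    min_similarity: float = 0.8,
) -> Optional[int]:
    """Return the index of the most similar stored frame, or None if none is close enough.

    `candidates` holds one stored frame per earlier shot. Returning None means
    the encoder would fall back to its ordinary short-term references.
    """
    current_hist = luma_histogram(current)
    best_index: Optional[int] = None
    best_similarity = min_similarity
    for index, candidate in enumerate(candidates):
        similarity = histogram_intersection(current_hist, luma_histogram(candidate))
        if similarity >= best_similarity:
            best_index, best_similarity = index, similarity
    return best_index
```

A global histogram tolerates small camera motion but ignores spatial layout, so a handheld-camera scenario would likely need a more robust similarity measure than the one assumed here.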