Anil Fernando received the B.Sc. Engineering degree (First class) in Electronic and Telecommunications Engineering from the University of Moratuwa, Sri Lanka in 1995 and the MEng degree (Distinction) in Telecommunications from Asian Institute of Technology (AIT), Bangkok, Thailand in 1997. He completed his PhD in video coding at the Department of Electrical and Electronic Engineering, University of Bristol, UK in February 2001.
Currently, he is a Reader in signal processing at the University of Surrey, UK. Prior to that, he was a senior lecturer at Brunel University, UK, and an assistant professor at AIT. His current research interests include cloud communications, video coding, Quality of Experience (QoE), intelligent video encoding for wireless systems, and video communication in LTE. He has published more than 290 international publications in these areas. He is a Senior Member of the IEEE and a Fellow of the HEA, UK. He is also a member of the EPSRC College.
Distributed Video Coding (DVC) is an emerging video coding technology that utilizes distributed source coding principles to build very low-cost video encoders with remarkable error resilience. In the common DVC framework, the reconstruction function plays a vital role with a direct impact on the output video quality. In this paper, a novel algorithm is proposed for the reconstruction function, focusing particularly on the unidirectional DVC architecture. The proposed technique exploits the variations in the bit error rate of the Wyner-Ziv decoded bit stream and the side information stream. The simulation results show that the proposed algorithm yields a significant improvement in objective and subjective video quality at no additional bit rate cost.
An effective representation of 3D video in future 3D-TV systems consists of monoscopic video (colour component) and associated per-pixel depth information (depth component). As the depth component indicates the relative distance between objects within the scene and the camera, pixel values change not only when objects move in the vertical and horizontal directions but also when they move in the depth direction. Instead of predicting motion of objects in two directions, as in traditional video codecs, three-dimensional block matching (3D-BM) achieves more accurate motion estimation in depth video coding. However, the overall performance of 3D-BM exceeds that of traditional two-dimensional block matching (2D-BM) only at high bit rates. In this paper, an adaptive 2D/3D-BM selection algorithm is introduced to trade off the performance of 2D-BM and 3D-BM. A Lagrangian optimisation algorithm is applied to select the motion estimation mode at the block level. The experimental results reveal that the proposed adaptive motion-estimation-mode selection can improve the performance of 3D-BM at low bit rates while the advantages of 3D-BM are preserved at high bit rates.
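The block-level mode selection described above can be sketched with the standard Lagrangian cost J = D + λR. The distortion/rate pairs and the λ value below are illustrative assumptions, not figures from the paper:

```python
# Sketch of Lagrangian motion-estimation-mode selection at the block level.
# The candidate costs and lambda are illustrative assumptions.

def lagrangian_cost(distortion, rate_bits, lam):
    # Classic rate-distortion cost: J = D + lambda * R
    return distortion + lam * rate_bits

def select_bm_mode(cost_2d, cost_3d, lam=0.85):
    # cost_* are (distortion, rate_bits) pairs for each candidate mode;
    # the mode with the smaller RD cost J wins.
    j2d = lagrangian_cost(*cost_2d, lam)
    j3d = lagrangian_cost(*cost_3d, lam)
    return "2D-BM" if j2d <= j3d else "3D-BM"

# 3D-BM is more accurate (lower distortion) but costs more bits, so at this
# lambda the cheaper 2D-BM mode wins; a smaller lambda favours 3D-BM.
print(select_bm_mode((100.0, 40), (90.0, 60)))  # -> 2D-BM
```

This captures the behaviour reported in the abstract: a large λ (low bit rate) steers blocks towards the cheaper 2D-BM mode, while a small λ (high bit rate) lets the more accurate 3D-BM mode justify its extra rate.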
Determining the best partitioning structure of a Coding Tree Unit (CTU) is one of the most time-consuming operations in HEVC encoding. Specifically, it is the evaluation of the quadtree hierarchy using Rate-Distortion (RD) optimization that has the most significant impact on the encoding time, especially for High Definition (HD) and Ultra High Definition (UHD) videos. In order to expedite the encoding for low-delay applications, this paper proposes a Coding Unit (CU) size selection and encoding algorithm for inter-prediction in HEVC. To this end, it describes (i) two CU classification models based on Inter N×N mode motion features and RD cost thresholds to predict the CU split decision, (ii) an online training scheme for dynamic content adaptation, (iii) a motion vector reuse mechanism to expedite the motion estimation process, and finally introduces (iv) a computational complexity to coding efficiency trade-off process to enable flexible control of the algorithm. The experimental results reveal that the proposed algorithm achieves consistent average encoding time reductions ranging from 55%–58% and 57%–61%, with average Bjøntegaard Delta Bit Rate (BDBR) increases of 1.93%–2.26% and 2.14%–2.33%, compared to the HM 16.0 reference software for the low delay P and low delay B configurations, respectively, across a wide range of content types and bit rates.
Stereoscopic video coding research has received considerable interest over the past decade as many 3D displays have been developed. Unfortunately, the vast amount of multimedia content needed to transmit or store a stereo image pair or video sequence has hindered its use in commercial applications. As H.264 offers significantly enhanced compression and a "network-friendly" design, we have used an H.264 compliant stereoscopic video codec to compress stereo video. The data partitioning (DP) mode in the NAL unit of the H.264 codec is exploited for joint source and channel coding (JSCC), taking channel qualities and reliabilities into account. In this paper, we propose a framework applying an unequal error protection (UEP) based JSCC scheme to H.264 compliant stereoscopic video transmission over an additive white Gaussian noise (AWGN) channel. Different levels of error protection are assigned to different partitions based on their decoding importance. Performance comparisons are made against equal error protection (EEP) schemes. The simulation results show that with UEP schemes, the overall quality of the decoded main and auxiliary video sequences is clearly improved in comparison with the EEP scheme at good SNR, but EEP schemes outperform UEP schemes at low SNR values.
The proliferation of video consumption, especially over mobile devices, has created a demand for efficient interactive video applications and high-level video analysis. This is particularly significant in real-time applications and resource-limited scenarios. Pixel-domain video processing is often inefficient for many of these applications due to its complexity, whereas compressed-domain processing offers fast but unreliable results. In order to achieve fast and effective video processing, this paper proposes a novel video encoding architecture that facilitates efficient compressed-domain processing while maintaining compliance with mainstream coding standards. This is achieved by optimizing the accuracy of motion information embedded in the compressed video, in addition to compression efficiency. In a motion detection application, we demonstrate that the motion estimated by the proposed encoder can be directly used to extract object information, as opposed to conventionally coded video. The incurred rate-distortion overheads can be weighed against the reduced processing required for video analysis, targeting a wide spectrum of computer vision applications.
In this paper, we propose a new motion estimation algorithm based on evolutionary strategies (ES) for the H.264 video codec applied to monoscopic video. The proposed technique operates on a macroblock basis and performs a parallel local search for the motion vector associated with the minimum motion-compensated residue. For this purpose, a (μ+λ)-ES is used with a heuristically and randomly generated population of initial motion vectors. Experimental results show that the proposed scheme can reduce the computational complexity of the motion estimation algorithm used in the H.264 reference codec by up to 50% at the same picture quality. Therefore, the proposed algorithm provides a significant improvement in motion estimation for the H.264 video codec.
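The (μ+λ)-ES search above can be sketched as follows. The block-matching cost is a toy SAD over synthetic frames, and the values of μ, λ, the mutation step, and the initial search range are illustrative assumptions, not the paper's tuned parameters:

```python
import random

# Minimal (mu + lambda) evolutionary-strategy motion search sketch.
# mu, lam, mutation step and search range are illustrative assumptions.

def sad(cur, ref, bx, by, mv, bs=4):
    # Sum of absolute differences between the current block at (bx, by)
    # and the reference block displaced by motion vector mv = (dx, dy).
    dx, dy = mv
    total = 0
    for y in range(bs):
        for x in range(bs):
            ry, rx = by + y + dy, bx + x + dx
            if 0 <= ry < len(ref) and 0 <= rx < len(ref[0]):
                total += abs(cur[by + y][bx + x] - ref[ry][rx])
            else:
                total += 255  # penalise out-of-frame references
    return total

def es_motion_search(cur, ref, bx, by, mu=4, lam=8, gens=10, seed=0):
    rng = random.Random(seed)
    # Heuristic zero vector plus random candidates as the initial population.
    pop = [(0, 0)] + [(rng.randint(-4, 4), rng.randint(-4, 4))
                      for _ in range(mu - 1)]
    for _ in range(gens):
        # Generate lam offspring by mutating randomly chosen parents.
        offspring = [(px + rng.randint(-1, 1), py + rng.randint(-1, 1))
                     for px, py in (rng.choice(pop) for _ in range(lam))]
        # (mu + lambda) selection: best mu of parents and offspring survive.
        pop = sorted(pop + offspring,
                     key=lambda mv: sad(cur, ref, bx, by, mv))[:mu]
    return pop[0]
```

Because selection is elitist over parents plus offspring, the returned vector is never worse than the zero-motion candidate in the initial population; the parallel local search refines it towards the minimum-residue displacement.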
The emergence of three-dimensional (3D) video applications based on Depth Image Based Rendering (DIBR) has brought additional bandwidth requirements, due to the need for depth information. This additional bandwidth requirement needs to be tackled to enable the widespread adoption of 3D video applications based on DIBR. Exploiting the visual correlations between the colour image and the depth image in depth image coding will reduce the high bandwidth required to transmit the additional depth information. In this paper, an object-based depth image coding technique is presented that is suitable for low bit rate 3D-TV applications based on Depth Image Based Rendering. The proposed method achieves up to 50% bit rate reduction at low bit rates.
Due to the provision of a more natural representation of a scene in the form of left and right eye views, a stereoscopic imaging system provides a more effective method for image/video display. Unfortunately, the vast amount of information that needs to be transmitted or stored to represent a stereo image pair or video sequence has so far hindered its use in commercial applications. However, by properly exploiting the spatial, temporal and binocular redundancy, a stereo image pair or sequence can be compressed and transmitted within a single monocular channel's bandwidth without unduly sacrificing the perceived stereoscopic image quality. In this paper, we present a new technique for coding stereo video sequences based on the H.264 video codec. The proposed codec exploits disparity and worldline correlation in addition to the advanced compression techniques inherited from the H.264 standard to achieve higher video quality, especially at low bit rates. We compare the performance of the proposed codec with a DCT-based, modified MPEG-2 stereo video codec and a ZTE-based stereo video codec. We show that the proposed codec outperforms the benchmark codecs in coding both main and auxiliary streams by up to 9.0 dB PSNR gain.
One method of evaluating the quality of stereoscopic video is the use of conventional two-dimensional (2D) objective metrics. Metrics that better represent the Human Visual System (HVS) provide more accurate evaluation. In this paper, we propose a perceptually based objective metric for 2D videos, applied to 3D video quality evaluation. The proposed Perceptual Quality Metric (PQM) shows better results for 3D video quality evaluation and outperforms the Video Quality Metric (VQM), as it is sensitive to slight changes in image degradation and its error quantification starts at the pixel level and extends up to the sequence level. Verification is carried out through a series of subjective tests to show the level of correlation between PQM and user scores.
Along with the rapid increase in the availability of high quality video formats such as HD (High Definition), UHD (Ultra HD) and HDR (High Dynamic Range), a huge demand for data rates during their transmission has become inevitable. Consequently, the role of video compression techniques has become crucially important in mitigating the data rate requirements. Even though the latest video codec, HEVC (High Efficiency Video Coding), has succeeded in significantly reducing the data rate compared to its immediate predecessor H.264/AVC (Advanced Video Coding), HEVC coded videos have in the meantime become even more vulnerable to network impairments. Therefore, it is equally important to assess the consumers' perceived quality degradation prior to transmitting HEVC coded videos over an error prone network, and to include error resilience features so as to minimize the adverse effects of those impairments. To this end, this paper proposes a probabilistic model which accurately predicts the overall distortion of the decoded video at the encoder, followed by an accurate QP-λ relationship which can be used in the RDO (Rate Distortion Optimization) process. During the derivation of the probabilistic model, the impacts of the motion vectors, the pixels in the reference frames and the clipping operations are accounted for, and consequently the model is capable of reducing the prediction error to as low as 3.11%, whereas the state-of-the-art methods cannot reach below 20.08% under identical conditions. Furthermore, the enhanced RDO process has resulted in a 21.41%–43.59% improvement in the BD-rate compared to the state-of-the-art error resilient algorithms.
Distributed video coding (DVC) has now become one of the fastest evolving coding techniques in the signal processing world. Its remarkably low-complexity yet powerful encoder is essential for some applications in the consumer market. In this paper, we propose a novel Wyner-Ziv architecture that splits a frame into two sub-frames, leaving one as a key sub-frame while encoding the other as a WZ sub-frame. The key sub-frame is encoded using a conventional coding scheme, as in other DVC techniques. At the decoder, the key sub-frame is used to exploit the spatial and temporal correlations, using an intra-prediction technique and a temporal motion search, in order to generate the side information for the corresponding WZ sub-frame. Simulation results show that over 1 dB PSNR gain can be obtained with the proposed algorithm over the results of the algorithm presented in the literature at the same bit rate.
In recent years, with emerging applications such as wireless video surveillance, multimedia sensor networks, disposable video cameras, medical applications and mobile camera phones, the traditional video coding architecture is being challenged. For these emerging applications, Distributed Video Coding (DVC) seems to be able to offer efficient and low-complexity encoding video compression. In this paper, we present a novel transform domain distributed video coding algorithm based on Turbo Trellis Coded Modulation (TTCM). As in the conventional turbo based Wyner-Ziv encoder, transform quantized coefficients are applied to the TTCM encoder and parity bits are generated from both constituent encoders. However, TTCM symbols are not generated at the encoder since they are not sent to the decoder. Parity bits produced by the TTCM encoder are stored in a buffer and transmitted to the decoder upon request. TTCM symbols are generated at the decoder and these symbols are passed to the TTCM decoder for demodulation. Experimental results show that significant rate-distortion (RD) gains compared to the state-of-the-art results available in the literature can be obtained.
In this paper, we propose an improved motion estimation algorithm based on evolutionary strategies (ES) for the H.264 video codec. The proposed technique performs a parallel local search on a macroblock basis. For this purpose, a (μ+λ)-ES is used with an initial population of heuristically and randomly generated motion vectors. Experimental results show that the proposed scheme can reduce the computational complexity of the motion estimation algorithm used in the H.264 reference codec by up to 50% at the same picture quality. Therefore, the proposed algorithm provides a significant improvement in motion estimation in the H.264 video codec.
Distributed Video Coding (DVC) is known as an emerging video coding technique, which primarily has a modified complexity balance in the encoder and decoder, in contrast to its traditional competitors. In DVC, we have a very simple low cost encoder which is an ideal feature for applications involving a large number of video capturing points located remotely and a centralized shared decoder. In this paper, we introduce a novel approach for DVC, using Turbo Trellis Coded Modulation (TTCM) to generate the parity information at the encoder to be sent over the channel and then to decode the parity with the side information at the decoder. The side information is generated using key frames passed to the decoder by the use of a pixel interpolation technique. TCM symbols are formed using the side information and the parity bit stream which are fed to the TTCM decoder. The decoded output is used to reconstruct the final video sequence. The results are compared with a turbo coding based DVC and it is evident that the proposed method outperforms its turbo code based counterpart by a significant margin.
This paper presents a Region Of Interest (ROI) based video coding technique for H.264 with improved error resilience and error protection in the foreground. We use the Flexible Macroblock Ordering (FMO) tool in the H.264 video coding standard to encode the video frame into three separate slices. The first slice carries the background, and the other two slices contain alternating macroblocks of the foreground, forming a checkerboard pattern. Foreground packets are protected with a stronger error correction code than background packets. Experimental results show that the proposed technique improves the objective quality of the foreground by more than 1 dB.
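The slice-group assignment described above can be sketched as follows: background macroblocks go to slice 0, while foreground macroblocks alternate between slices 1 and 2 in a checkerboard pattern. The per-macroblock foreground mask here is a hypothetical input, not data from the paper:

```python
# Sketch of FMO-style slice-group assignment: background -> slice 0,
# foreground -> slices 1 and 2 in a checkerboard pattern.

def assign_slice_groups(fg_mask):
    # fg_mask[row][col] is True for foreground macroblocks.
    groups = []
    for r, row in enumerate(fg_mask):
        out = []
        for c, is_fg in enumerate(row):
            if not is_fg:
                out.append(0)                # background slice
            else:
                out.append(1 + (r + c) % 2)  # checkerboard: slices 1 and 2
        groups.append(out)
    return groups

# Hypothetical 3x4 macroblock grid with a 2x2 foreground region.
mask = [
    [False, False, False, False],
    [False, True,  True,  False],
    [False, True,  True,  False],
]
for row in assign_slice_groups(mask):
    print(row)
```

Because adjacent foreground macroblocks land in different slices, the loss of one foreground packet leaves every affected macroblock surrounded by correctly received neighbours, which aids error concealment.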
This paper proposes an improved Wyner-Ziv to H.264 video transcoder as part of a framework for mobile-to-mobile video communications. In this scheme, the user devices maintain their low complexity constraints by using the Wyner-Ziv encoding and H.264 decoding algorithms, shifting their complexity to the network where the proposed transcoder is located. The main goal of the transcoder is to convert the bitstream and reduce the delay efficiently. The results show that the proposed transcoder reduces the complexity by 95% with a negligible rate-distortion loss.
In this paper we propose a novel approach to use both motion and disparity information to compress 3D integral video sequences. The integral video sequence is decomposed into 8 viewpoint video sequences and a block search is performed to jointly exploit the motion and disparity redundancies to maximize the compression. An Evolutionary Strategy (ES) based search algorithm is used to reduce the complexity. Experimental results show that an ES based strategy can reduce the motion estimation complexity by 95%.
In this paper, we discuss a novel approach to Distributed Video Coding (DVC) using bitplane-based unequal error protection. DVC has recently attracted a vast amount of attention from the video coding community, since it is well suited to applications where low-complexity encoders are a must. The primary feature of DVC is its modified complexity balance between the encoder and decoder, in contrast to its traditional competitors. Considering each pixel of the frame, each of its 8 bit positions has a different significance, decreasing gradually from the MSB to the LSB. A scheme which better protects the initial bitplanes would therefore be desirable. Here we propose to protect each bitplane according to its significance, using a variable memory length in the encoder. Since increasing the memory length results in higher computational complexity, the optimum mix of memory lengths must be evaluated for the codec to operate within the conceptual restrictions of the low-complexity DVC encoder. In this paper, the proposed codec is compared for performance with another DVC codec of comparable complexity which uses a fixed memory length over the whole sequence. Simulation results show that the proposed method outperforms its fixed memory length counterpart by a significant margin.
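A small sketch of the bitplane decomposition above illustrates why the earlier bitplanes deserve stronger protection: a bit flip in plane 0 (the MSB) changes a pixel value by 128, while one in plane 7 (the LSB) changes it by only 1. The per-plane memory-length assignment itself is not shown; the paper evaluates that trade-off empirically:

```python
# Bitplane decomposition sketch, MSB plane first, with the pixel-error
# magnitude a single bit flip in each plane would cause.

def to_bitplanes(pixels, bits=8):
    # pixels: flat list of 0-255 values -> `bits` planes, MSB plane first.
    return [[(p >> (bits - 1 - plane)) & 1 for p in pixels]
            for plane in range(bits)]

def plane_weight(plane, bits=8):
    # Magnitude of the pixel error caused by a single bit flip in this plane.
    return 1 << (bits - 1 - plane)

planes = to_bitplanes([200, 5])
print(planes[0])                         # MSB plane of pixels 200 and 5
print(plane_weight(0), plane_weight(7))  # error weights of MSB vs. LSB
```

The exponential fall-off of `plane_weight` from MSB to LSB is exactly the significance gradient that motivates allocating longer (stronger but costlier) encoder memory to the earlier planes.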
Along with the rapid growth in consumer adoption of modern portable devices, video streaming is expected to dominate a large share of the global internet traffic in the near future. In the wireless communications domain, this trend creates considerable challenges to consumers’ quality of experience (QoE). From a consumer-focused vision, predicting perceptual video quality is extremely important for QoE-based service provisioning. However, available QoE measurement techniques that adopt a full reference model are impractical in real-time transmission since they require the original video sequence to be available at the receiver’s end. Therefore, the primary aim of this study is to present a cross-layer no-reference prediction model for the perceptual quality of 3D video in the wireless domain. The contributions of this study are twofold: first, the impact of selected quality of service (QoS) parameters from both encoding and network levels on QoE is investigated. Also, the obtained QoS/QoE correlation is backed by thorough statistical analysis. Second, a prediction model based on fuzzy logic inference systems (FIS) is developed by mapping chosen QoS parameters to the measured QoE. This model enables a non-intrusive prediction of 3D visual quality. Conclusive results show a significantly high correlation between the predicted video quality and the objectively measured mean opinion scores (MOS). Objective MOS is also validated through methodical subjective assessments. For consumer’s QoE, this study advances the development of reference-free video quality prediction models and QoE control methods for 3D video streaming.
A neural network based technique is proposed to estimate the subjective quality of stereoscopic videos. Moreover, to make this model usable in applications where the reference signal is not available at the receiver, it uses objective video quality measures with minimal dependency on the reference signal. This paper presents fast, accurate and consistent subjective quality estimation. The feasibility and accuracy of the proposed technique are thoroughly analyzed with extensive subjective experiments and simulations. The results illustrate that a performance measure of 92.3% in subjective quality estimation can be achieved with the proposed technique.
Interest in 3D video has surged in recent years. However, efforts to improve the quality of compression and transmission schemes are severely hampered by a lack of effective quality evaluation metrics. This is a particularly severe problem for researchers trying to improve the robustness of video transmission to packet loss. Subjective tests for evaluating error robustness present huge requirements in terms of time and resources. To solve this problem, this paper presents a quality metric for 3D video, and evaluates its effectiveness for the measurement of quality in the presence of packet loss. A key feature of the work is the use of depth planes to enable the metric to better model how the Human Visual System (HVS) perceives 3D video. The quality metric results are compared with subjective test results. The correlation between the proposed quality metric and the subjective test results is shown to be stronger than standard quality metrics, such as Video Quality Metric (VQM).
The hierarchical quadtree partitioning of Coding Tree Units (CTUs) is one of the striking features of HEVC that contributes to its superior coding performance over its predecessors. However, the brute force evaluation of the quadtree hierarchy using Rate-Distortion (RD) optimisation, to determine the best partitioning structure for a given content, makes it one of the most time-consuming operations in HEVC encoding. In this context, this paper proposes an intelligent fast Coding Unit (CU) size selection algorithm to expedite the encoding process of HEVC inter-prediction. The proposed algorithm introduces (i) two CU split likelihood modelling and classification approaches using Support Vector Machines (SVM) and Bayesian probabilistic models, and (ii) a fast CU selection algorithm that makes use of both offline-trained SVMs and online-trained Bayesian probabilistic models. Finally, (iii) a computational complexity to coding efficiency trade-off mechanism is introduced to flexibly control the algorithm to suit different encoding requirements. The experimental results of the proposed algorithm demonstrate average encoding time reductions of 53.46%, 61.15%, and 58.15% for the Low Delay B, Random Access, and Low Delay P configurations, respectively, with Bjøntegaard Delta-Bit Rate (BD-BR) losses of 2.35%, 2.9%, and 2.35%, respectively, when evaluated across a wide range of content types and quality levels.
The exorbitant increase in the computational complexity of modern video coding standards, such as High Efficiency Video Coding (HEVC), is a compelling challenge for resource-constrained consumer electronic devices. For instance, the brute force evaluation of all possible combinations of available coding modes and the quadtree-based coding structure in HEVC, to determine the optimum set of coding parameters for a given content, demands a substantial amount of computational and energy resources. Thus, the resource requirements for real-time operation of HEVC have become a contributing factor in the Quality of Experience (QoE) of the end users of emerging multimedia and future internet applications. In this context, this paper proposes a content-adaptive Coding Unit (CU) size selection algorithm for HEVC intra-prediction. The proposed algorithm builds content-specific weighted Support Vector Machine (SVM) models in real time during the encoding process to provide an early estimate of the CU size for a given content, avoiding the brute force evaluation of all possible coding mode combinations in HEVC. The experimental results demonstrate an average encoding time reduction of 52.38%, with an average Bjøntegaard Delta Bit Rate (BDBR) increase of 1.19%, compared to the HM 16.1 reference encoder. Furthermore, perceptual visual quality assessments conducted through the Video Quality Metric (VQM) show minimal visual quality impact on the reconstructed videos of the proposed algorithm compared to state-of-the-art approaches.
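The content-adaptive early CU decision above can be illustrated with a toy stand-in: a dependency-free weighted nearest-centroid rule replaces the weighted SVM, trained online from samples of already-encoded blocks. The features (normalised RD cost, motion magnitude), labels, and sample weights are illustrative assumptions, not the paper's design:

```python
# Toy content-adaptive CU split predictor: a weighted nearest-centroid
# classifier standing in for the paper's weighted SVM. All features,
# labels and weights below are illustrative assumptions.

def train_centroids(samples, dim=2):
    # samples: list of (features, label, sample_weight), label in {-1, +1}.
    acc = {-1: [0.0] * dim, 1: [0.0] * dim}
    tot = {-1: 0.0, 1: 0.0}
    for x, y, sw in samples:
        tot[y] += sw
        for i, xi in enumerate(x):
            acc[y][i] += sw * xi
    return {y: [v / tot[y] for v in acc[y]] for y in acc}

def predict_split(centroids, x):
    # +1 -> split the CU further; -1 -> encode at the current size.
    def dist2(c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return 1 if dist2(centroids[1]) <= dist2(centroids[-1]) else -1

# Hypothetical samples gathered online: (normalised RD cost, motion magnitude).
data = [([0.9, 0.8], 1, 1.0), ([0.8, 0.9], 1, 2.0),
        ([0.2, 0.1], -1, 1.0), ([0.1, 0.2], -1, 1.0)]
model = train_centroids(data)
print(predict_split(model, [0.85, 0.9]))  # high-cost, high-motion block
```

An early +1 prediction lets the encoder skip the exhaustive RD evaluation of the current CU size and descend directly, which is the source of the encoding time savings reported above.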
The brute force rate-distortion optimisation based approach used in High Efficiency Video Coding (HEVC) encoders to determine the best block partitioning structure for a given content demands an excessive amount of computational resources. In this context, this paper proposes a novel algorithm to reduce the computational complexity of HEVC inter-prediction using Support Vector Machines. The proposed algorithm predicts the Coding Unit (CU) split decision of a particular block, enabling the encoder to directly encode the selected block and avoid the unnecessary evaluation of the remaining CU size combinations. Experimental results demonstrate encoding time reductions of ~58% and ~50%, with 2.27% and 1.89% Bjøntegaard Delta Bit Rate (BDBR) losses, for the Random Access and Low-Delay B configurations, respectively.
As the limits of video compression and usable wireless radio resources are exhausted, providing increased protection to critical data is regarded as a way forward to increase the effective capacity for delivering video data. This paper explores the provision of selective protection to critical video data in the physical layer and evaluates its effectiveness when transmitted through a wireless multipath fading channel. The transmission of HEVC encoded video through an LTE-A wireless channel is considered. The HEVC encoded video data is ranked based on how often each area of the picture is referenced by subsequent frames within a GOP in the sequence. The critical video data is allotted to the most robust OFDM resource blocks (RBs), which are the radio resources in the time-frequency domain of the LTE-A physical layer, to provide superior protection. The RBs are ranked based on a prediction of their robustness against noise. Simulation results show that the proposed content-aware resource allocation scheme helps to improve the objective video quality by up to 37 dB at lower channel SNR levels when compared against the reference system, which treats video data uniformly. Alternatively, with the proposed technique, the transmitted signal power can be lowered by 30% without sacrificing video quality at the receiver.
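The content-aware mapping described above reduces to a rank-and-match rule: packets ranked by how often their picture area is referenced within the GOP, RBs ranked by predicted robustness, and the most critical packets assigned to the most robust RBs. The reference counts and robustness scores below are illustrative assumptions:

```python
# Sketch of content-aware RB allocation: most-referenced video data is
# mapped to the most robust OFDM resource blocks. All input values are
# illustrative assumptions.

def allocate_rbs(packet_ref_counts, rb_robustness):
    # Returns {packet_index: rb_index}; the most-referenced packet gets
    # the most robust RB, and so on down both rankings.
    pkts = sorted(range(len(packet_ref_counts)),
                  key=lambda i: packet_ref_counts[i], reverse=True)
    rbs = sorted(range(len(rb_robustness)),
                 key=lambda i: rb_robustness[i], reverse=True)
    return dict(zip(pkts, rbs))

# Packet 2 is referenced most often and RB 0 is predicted most robust,
# so packet 2 is carried on RB 0.
print(allocate_rbs([3, 1, 7, 5], [0.9, 0.4, 0.7, 0.6]))
```

The reference system the paper compares against corresponds to ignoring both rankings and assigning packets to RBs in arrival order.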
Determining the best partitioning structure for a given Coding Tree Unit (CTU) is one of the most time-consuming operations within the HEVC encoder. The brute force search through the quadtree hierarchy has a significant impact on the encoding time of high definition (HD) videos. This paper presents a fast coding unit size decision algorithm for intra-prediction in HEVC. The proposed algorithm utilizes a low-complexity texture analysis technique based on the local range property of a pixel in a given neighborhood. Simulation results show that the proposed algorithm achieves an average of 72.24% encoding time efficiency improvement with similar rate-distortion performance compared to the HEVC reference software HM 12.0 for HD videos.
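The local range property above is simply the spread (max minus min) of pixel values in a neighborhood: homogeneous blocks with a small range can be encoded as large CUs, while textured blocks with a large range are candidates for splitting. The threshold below is an illustrative assumption, not the paper's tuned value:

```python
# Sketch of local-range texture analysis for fast CU size decisions.
# The split threshold is an illustrative assumption.

def local_range(block):
    # Spread of pixel values in the block: max - min.
    flat = [p for row in block for p in row]
    return max(flat) - min(flat)

def suggest_cu_split(block, threshold=30):
    # True -> texture is high, so smaller CU sizes should be evaluated too.
    return local_range(block) > threshold

smooth = [[100, 102], [101, 103]]   # homogeneous area, keep the large CU
textured = [[10, 200], [180, 20]]   # high-activity area, consider splitting
print(suggest_cu_split(smooth), suggest_cu_split(textured))
```

The appeal of this measure is its low complexity: one pass over the block with two comparisons per pixel, far cheaper than an RD evaluation of every candidate partition.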
Determining the best partitioning structure for a CTU is a time consuming operation for the HEVC encoder. This paper presents a fast CU size selection algorithm for HEVC using a CU classification technique. The proposed algorithm achieves an average of 67.83% encoding time efficiency improvement with a negligible rate-distortion loss.
The performance of real-time video processing applications, such as surveillance systems and content-based search, is limited by the complexity of video content analysis in the pixel domain. A low-complexity alternative is to analyse the video in the compressed domain, where content features already available in the compressed video are directly used in the analysis. However, this is achieved at the expense of output precision and reliability, due to compression-efficiency driven feature selection at the encoder. Therefore, video applications could benefit from enhanced reliability of the data embedded in the compressed video. In this paper, we present a scalable optimization model that addresses the accuracy of content features in parallel with the conventional rate-distortion optimization criterion. We analyse and optimize the rate-distortion performance of the video encoder under a content description accuracy constraint, using a motion-calibrated synthetic data set containing a range of scene and motion complexity levels. Finally, using a natural video data set, we demonstrate that the proposed optimization framework can be used to enhance compressed feature accuracy without incurring a rate-distortion overhead.
There is no standard way of measuring the level of user satisfaction for HDR video content, due to the proven difficulty of building HDR quality assessment metrics. To overcome this limitation, Quality of Experience (QoE) modelling of HDR video is proposed to find a robust and accurate HDR video QoE metric. The proposed model is a first attempt towards assessing and devising a no-reference quality metric for HDR video. It is based on finding the correlation between HDR video features and subjective test results. The proposed model achieves a significant correlation score of 0.724 with the subjective results.
The energy consumption of Consumer Electronic (CE) devices during media playback is inexorably linked to the computational complexity of decoding compressed video. Reducing a CE device's energy consumption is therefore becoming ever more challenging with increasing video resolutions and the complexity of video coding algorithms. To this end, this paper proposes a framework that alters the video bit stream to reduce the decoding complexity while simultaneously limiting the impact on the coding efficiency. In this context, this paper (i) first performs an analysis to determine the trade-off between decoding complexity, video quality and bit rate with respect to a reference decoder implementation on a General Purpose Processor (GPP) architecture. Thereafter, (ii) a novel generic decoding complexity-aware video coding algorithm is proposed to generate decoding complexity-rate-distortion optimized High Efficiency Video Coding (HEVC) bit streams. The experimental results reveal that the bit streams generated by the proposed algorithm achieve 29.43% and 13.22% decoding complexity reductions for a similar video quality, with minimal coding efficiency impact, compared to the state-of-the-art approaches when applied to the HM 16.0 and openHEVC decoder implementations, respectively. In addition, analysis of the energy consumption behavior for the same scenarios reveals up to 20% energy consumption reductions while achieving a similar video quality to that of HM 16.0 encoded HEVC bit streams.
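A decoding complexity-rate-distortion optimized decision, as in the framework above, can be sketched by extending the usual RD cost to J = D + λr·R + λc·C, where C is an estimate of the decoding cost of a candidate coding choice. The candidate numbers and multipliers below are illustrative assumptions, not measurements from the paper:

```python
# Sketch of a decoding complexity-aware mode decision with the generalised
# cost J = D + lambda_r * R + lambda_c * C. All values are illustrative.

def cdrd_cost(distortion, rate_bits, decode_cycles, lam_r, lam_c):
    # Complexity-rate-distortion cost.
    return distortion + lam_r * rate_bits + lam_c * decode_cycles

def pick_mode(candidates, lam_r=0.5, lam_c=0.2):
    # candidates: {mode_name: (distortion, rate_bits, decode_cycles)}.
    return min(candidates,
               key=lambda m: cdrd_cost(*candidates[m], lam_r, lam_c))

# Hypothetical candidates for one block: the heavier option gives slightly
# better quality but is much more expensive to decode.
modes = {
    "heavy_mode": (10.0, 100, 500),
    "light_mode": (12.0, 105, 200),
}
print(pick_mode(modes))  # -> light_mode
```

Setting `lam_c = 0` recovers the conventional RD decision; raising it trades a small coding-efficiency loss for bit streams that are cheaper, and hence less energy-hungry, to decode.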
The use of machine learning techniques for encoding complexity reduction in recent video coding standards such as High Efficiency Video Coding (HEVC) has received prominent attention in the recent past. Yet, the dynamically changing nature of video content makes it ever more challenging to use rigid traditional inference models for predicting the encoding decisions for a given content. In this context, this paper investigates the resulting implications on coding efficiency and encoding complexity when using offline-trained and online-trained machine-learning models for coding unit (CU) size selection in HEVC intra-prediction. The experimental results demonstrate that the ground-truth encoding statistics of the content being encoded are crucial to efficient encoding-decision prediction when using machine-learning-based prediction models.
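The contrast drawn above between offline-trained and online-trained models can be sketched with a toy online learner: a logistic-regression predictor for the split/no-split decision that is updated from the ground truth of blocks already encoded in the current sequence. The features, thresholds and synthetic data here are purely illustrative, not the paper's.

```python
import math, random

class OnlineSplitPredictor:
    """Tiny online logistic-regression model predicting whether a CU should be
    split, updated from the encoder's own decisions on the content being coded."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_prob(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, split):
        # One SGD step on the log-loss, using the encoder's actual decision
        # as the ground-truth label.
        err = self.predict_prob(x) - (1.0 if split else 0.0)
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

random.seed(0)
model = OnlineSplitPredictor(n_features=2)
# Synthetic stream: feature = (block variance, gradient magnitude), both
# normalised to [0, 1]; high-activity blocks tend to be split.
for _ in range(2000):
    x = [random.random(), random.random()]
    split = (x[0] + x[1]) > 1.0
    model.update(x, split)

print(model.predict_prob([0.9, 0.9]) > 0.5, model.predict_prob([0.1, 0.1]) < 0.5)
```

An offline model would freeze `w` after training on other sequences; the online variant keeps calling `update` during encoding, which is how content-specific statistics enter the prediction.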
This paper presents an implementation of a Media Aware Network Element (MANE) for dynamic video content adaptation in Scalable HEVC (SHVC) video streaming. The experimental results illustrate how the quality-to-playback-time ratio and the decoding power consumption vary with the random access period in SHVC encoding under fluctuating and persistent network bandwidth conditions.
This paper proposes a content-adaptive fast CU size selection algorithm for HEVC intra-prediction using weighted support vector machines. The proposed algorithm demonstrates an average encoding time reduction of 52.38% with a 1.19% average BDBR increase compared to the HM16.1 reference encoder.
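As a minimal sketch of the weighted-classification idea behind the abstract above, the following trains a linear classifier on a per-sample-weighted hinge loss (the primal objective of a weighted soft-margin SVM) by stochastic gradient descent. The two features, the weighting scheme and the synthetic data are assumptions for illustration, not the paper's actual formulation.

```python
import random

def train_weighted_hinge(samples, epochs=50, lr=0.01, lam=0.01):
    """Linear classifier trained on a per-sample-weighted hinge loss.
    Each sample is (features, label in {-1,+1}, weight)."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        random.shuffle(samples)
        for x, y, c in samples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularisation on w; hinge gradient scaled by sample weight c.
            w = [wi * (1 - lr * lam) for wi in w]
            if margin < 1:
                w = [wi + lr * c * y * xi for wi, xi in zip(w, x)]
                b += lr * c * y
    return w, b

random.seed(1)
# Hypothetical CU features (variance, edge strength); label +1 = "split".
# Misclassifying a "split" block is assumed costlier, so those samples get weight 3.
data = []
for _ in range(400):
    x = [random.random(), random.random()]
    y = 1 if x[0] + x[1] > 1.0 else -1
    data.append((x, y, 3.0 if y == 1 else 1.0))
w, b = train_weighted_hinge(data)
score = lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b
print(score([0.9, 0.8]) > 0, score([0.1, 0.2]) < 0)
```

Weighting the hinge loss per sample biases the decision boundary so that the costlier class (here, wrongly skipping a split) is misclassified less often, which is the usual motivation for a weighted SVM in fast mode decision.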
Even though the latest video compression techniques such as High Efficiency Video Coding (HEVC) have succeeded in significantly alleviating the bandwidth consumption of high-resolution video transmission, they have become severely susceptible to transmission errors. Overcoming the resulting temporal impact of transmission errors on the decoded video requires efficient error resilient schemes that introduce robustness features into the coded video to mitigate the negative impact on the viewer. To this end, this paper proposes a rate-controlled error resilient bit allocation scheme, together with an encoding parameter selection process, to adaptively determine the most robust video coding parameters and the decoder error concealment operations during the encoding itself. The proposed method demonstrates a 0.48-0.62 dB PSNR gain over the state-of-the-art methods at the same bit rate.
Distributed Video Coding (DVC) has been proposed for a range of new application domains. This interest is motivated by its very attractive features: the flexibility to build very low-cost video encoders and the very high built-in error resilience when operating over noisy communication channels. These features can be exploited very effectively in several application domains, including wireless sensor networks for security surveillance and mobile video communications. So far, DVC has used PSNR to measure the quality of the decoded video. In this paper, we analyze the performance of DVC-encoded video with three different quality metrics: PSNR, VQM and SSIM. We used one of our side information refinement algorithms in transform-domain DVC as the basis for all comparisons. Simulation results show that the side information refinement technique based on motion analysis can achieve a significant coding gain with respect to all quality metrics considered. Moreover, subjective results show that VQM has a very high correlation with the MOS, which suggests that VQM is a better quality metric for DVC.
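Of the three metrics compared above, PSNR is the simplest to state concretely. A minimal sketch of its computation on raw 8-bit frames (the toy frames below are illustrative; VQM and SSIM are substantially more involved, modelling perceptual and structural effects respectively):

```python
import math

def psnr(ref, dist, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between a reference frame and a
    distorted frame, both given as flat lists of 8-bit pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)
    if mse == 0:
        return float('inf')  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)

# Toy 4-pixel "frames": each pixel off by 2 gives MSE = 4.
ref  = [100, 120, 140, 160]
dist = [102, 118, 142, 158]
print(round(psnr(ref, dist), 2))  # → 42.11
```

Because PSNR is a pure pixel-difference measure, two frames with equal MSE can look very different to a viewer; that gap is exactly why the paper checks VQM and SSIM against subjective MOS.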
Internet-based social and interactive video applications have become major constituents of the envisaged applications for next-generation multimedia networks. However, inherently dynamic network conditions, together with varying user expectations, pose many challenges for resource allocation mechanisms for such applications. Yet, in addition to addressing these challenges, service providers must also consider how to mitigate their operational costs (e.g., energy costs, equipment costs) while satisfying the end-user quality of service (QoS) expectations. This paper proposes a heuristic solution to the problem, where the energy incurred by the applications, and the monetary costs associated with the service infrastructure, are minimized while simultaneously maximizing the average end-user QoS. We evaluate the performance of the proposed solution in terms of serving probability, i.e., the likelihood of being able to allocate resources to groups of users, the computation time of the resource allocation process, and the adaptability and sensitivity to dynamic network conditions. The proposed method demonstrates improvements in serving probability of up to 27%, in comparison with greedy resource allocation schemes, and a several-orders-of-magnitude reduction in computation time, compared to the linear programming approach, which significantly reduces the service-interrupted user percentage when operating under variable network conditions.
Martucci's Zero-tree Entropy (ZTE) coding algorithm exploits the advantages of discrete wavelet coding to improve the video quality of inter-frame coding. In this algorithm, the wavelet tree is coded into three different tables: type, valz and valnz. Subsequently, these three tables are arithmetic-coded. However, this categorization does not consider the statistical variation of the distribution of the values and types. We propose to decompose the zerotree so that this variation is taken into account. In our representation, the values and types are reorganized into hierarchical structures according to subband and orientation. Our test results show that the proposed algorithm can reduce the bit rate per frame by up to 73% over Martucci's algorithm and performs better than EBCOT at higher bit rates.
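The gain claimed above rests on a standard information-theoretic fact: entropy-coding statistically dissimilar symbol streams separately costs no more, and usually less, than coding them mixed together. A small sketch with hypothetical subband streams (the symbols and their distributions are invented for illustration):

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Empirical Shannon entropy (bits/symbol) of a symbol stream."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical symbol streams from two subbands with different statistics:
# the low-frequency band is value-rich, the high-frequency band mostly zeros.
low_band  = list('ABCDABCDABCD')
high_band = list('ZZZZZZZZZZZY')

mixed = low_band + high_band
combined_cost = len(mixed) * entropy_bits(mixed)
separate_cost = (len(low_band) * entropy_bits(low_band)
                 + len(high_band) * entropy_bits(high_band))
print(separate_cost < combined_cost)  # → True
```

Splitting the zerotree tables by subband and orientation lets each arithmetic coder adapt to a sharper, lower-entropy distribution, which is the mechanism behind the reported bit-rate reduction.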
In order to jointly optimise the quality of video coding on the one hand and video analysis on the other, this paper proposes a novel approach to enhance the reusable information content in the compressed video domain. By introducing a hierarchical content-driven motion estimation mechanism at the encoder, complemented by a statistical prediction of the region of interest, this approach reduces the complexity and yet increases the robustness of compressed-domain vision analysis applications. Taking object tracking as an example, we demonstrate that the motion vectors generated by the proposed method can be directly used to extract object information, achieving tracking performance comparable with a pixel-domain approach. In addition, we show that the incurred rate-distortion (RD) overheads and the effect on encoder complexity are minimal, especially when compared to the reduction of processing required for video analysis targeting a wide spectrum of computer vision applications.
The recent development of three-dimensional (3D) display technologies has resulted in a proliferation of 3D video production and broadcasting, attracting a lot of research into the capture, compression and delivery of stereoscopic content. However, the predominant design practice of interactions with 3D video content has failed to address its differences and possibilities in comparison with existing 2D video interactions. This paper presents a study of user requirements related to interaction with stereoscopic 3D video. The study suggests that change of view, zoom in/out, dynamic video browsing and textual information are the most relevant interactions with stereoscopic 3D video. In addition, we identified a strong demand for object selection, which resulted in a follow-up study of user preferences in 3D selection using virtual-hand and ray-casting metaphors. The results indicate that the interaction modality affects users' object-selection decisions in terms of the chosen location in 3D, while user attitudes have no significant impact. Furthermore, ray-casting-based interaction using the Wiimote can outperform the volume-based interaction technique using mouse and keyboard in object-positioning accuracy. © 2012 IEEE.
Social interaction of groups of users, amongst themselves and with the media content itself, is increasingly becoming popular due to the advancements in the Internet access technologies. However, multimedia resource provisioning for dispersed user groups poses a challenge and demands innovative technologies. This paper proposes a novel approach based on Particle Swarm Optimization (PSO) to optimally allocate computational and networking resources to a group of interactive users, such that the group Quality-of-Service (QoS) is maximized. We evaluate the performance of the proposed improved PSO method with respect to the state-of-the-art greedy resource allocation mechanisms and related PSO approaches. The ability to find a feasible solution (i.e., the serving probability) and the accuracy of such solutions are compared for different network topologies. The proposed method demonstrates reduced computational complexity, an up to 40% increase in the serving probability compared to the greedy methods, and up to 60 times faster convergence compared to the basic PSO approach. Overall, the comparable QoS level to the optimal solution suggests that the proposed solution efficiently allocates the resources available in the network.
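The abstract above builds on particle swarm optimization; a minimal generic PSO can be sketched as follows. The cost function here is a stand-in (squared distance from an ideal allocation vector), not the paper's group-QoS objective, and the inertia/cognitive/social coefficients are conventional textbook values.

```python
import random

def pso_minimise(cost, dim, bounds, n_particles=20, iters=100, seed=0):
    """Minimal particle swarm optimiser: each particle tracks its personal
    best and the global best, blending inertia, cognitive and social pulls."""
    rnd = random.Random(seed)
    lo, hi = bounds
    pos = [[rnd.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    w, c1, c2 = 0.7, 1.5, 1.5   # inertia, cognitive and social coefficients
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rnd.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rnd.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost

# Hypothetical stand-in cost: squared distance from an ideal allocation.
target = [0.3, 0.7, 0.5]
cost = lambda x: sum((xi - ti) ** 2 for xi, ti in zip(x, target))
best, best_cost = pso_minimise(cost, dim=3, bounds=(0.0, 1.0))
print(best_cost < 1e-3)
```

The paper's improvements would come from problem-specific refinements on top of this basic loop (e.g., feasibility handling for network constraints), which are what separate its "improved PSO" from the basic variant benchmarked against.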
Personalized interactive television broadcasting requires real-time audiovisual processing at levels impractical in end-user equipment. However, guaranteeing Quality of Service (QoS) also remains a challenge for cloud-based solutions. This paper proposes a group QoS optimization approach to the problem that demonstrates significant improvements in the number of users being served.
Error concealment techniques such as motion copying require significant changes to the HEVC (High Efficiency Video Coding) motion estimation process when incorporated into error resilience frameworks. This paper demonstrates a novel motion estimation mechanism that incorporates the concealment impact of future coding frames to achieve an average 0.73 dB gain over the state-of-the-art.