
Dr Georgios Georgis

Research Fellow
B.Sc., M.Sc., Ph.D.

Academic and research departments

Faculty of Engineering and Physical Sciences.


Publications

Georgis G, Lentaris G, Reisis D (2016) Reduced Complexity Super-Resolution for Low-Bitrate Video Compression, IEEE Transactions on Circuits and Systems for Video Technology 26 (2) pp. 332-345
Evolving video applications impose requirements for high image quality, low bitrate, and/or small computational cost. This paper combines state-of-the-art coding and super-resolution (SR) techniques to improve video compression both in terms of coding efficiency and complexity. The proposed approach improves a generic decimation-quantization compression scheme by introducing low complexity single-image SR techniques for rescaling the data at the decoder side and by jointly exploring/optimizing the downsampling/upsampling processes. The enhanced scheme improves both the quality and the system's complexity compared with conventional codecs and can be easily modified to meet diverse requirements, such as effectively supporting any off-the-shelf video codec, for instance H.264/Advanced Video Coding or High Efficiency Video Coding. Our approach builds on studying the generic scheme's parameterization with common rescaling techniques to achieve 2.4-dB peak signal-to-noise ratio (PSNR) quality improvement at low bitrates compared with the conventional codecs and proposes a novel SR algorithm to advance the critical bitrate to the level of 10 Mb/s. The evaluation of the SR algorithm includes the comparison of its performance to other image rescaling solutions in the literature. The results show quality improvement by 5-dB PSNR over straightforward interpolation techniques and computational time reduction by three orders of magnitude when compared with the highly involved methods of the field. Therefore, our algorithm proves to be most suitable for use in reduced-complexity downsampled compression schemes.
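To make the compression pipeline above concrete, the following Python sketch outlines the generic decimation-quantization flow with decoder-side super-resolution and the PSNR metric used for the quality comparisons; the function arguments (codec_encode, codec_decode, downsample, super_resolve) are hypothetical placeholders for an off-the-shelf codec and the chosen rescaling operators, not the paper's implementation.

    # Illustrative decimation-quantization pipeline with decoder-side
    # super-resolution (placeholder callables, not the paper's code).
    import numpy as np

    def psnr(reference, reconstruction, peak=255.0):
        """Peak signal-to-noise ratio in dB between two 8-bit frames."""
        mse = np.mean((reference.astype(np.float64) -
                       reconstruction.astype(np.float64)) ** 2)
        return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

    def compress_with_decimation(frame, codec_encode, codec_decode,
                                 downsample, super_resolve, factor=2):
        """Downsample before encoding, super-resolve after decoding.

        codec_encode/codec_decode stand in for any off-the-shelf codec
        (e.g. H.264/AVC or HEVC); downsample/super_resolve are the jointly
        chosen rescaling operators applied at encoder and decoder side.
        """
        low_res = downsample(frame, factor)            # decimation at the encoder
        bitstream = codec_encode(low_res)              # conventional coding
        decoded_low_res = codec_decode(bitstream)      # conventional decoding
        return super_resolve(decoded_low_res, factor)  # SR reconstruction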
Menoutis G, Foteas A, Liakopoulos N, Georgis G, Reisis D, Synnefakis G (2015) A configurable transmitter architecture & organization for XG-PON OLT/ONU/ONT network elements, pp. 673-676 IEEE
The XG-PON standard for Passive Optical Networks (PONs) imposed high performance requirements for network equipment. In particular, the 10G transmitter designs of the office equipment (OLT), the terminals and the network units (ONTs and ONUs) become quite demanding because of the real-time requirements for preparing a frame. The current paper introduces a three-layer architecture, scalable with respect to the bandwidth and suitable to realize the transmitter of the XG-PON OLT/ONT/ONU elements. The architecture's upper layer decides which data packets will be transmitted. The second layer's microsequencer commands the lowest layer's modules, which produce and locally store all the data packets to be transmitted. The three-layer approach allows the architecture to be configured and organized as either an OLT transmitter or an ONU/ONT transmitter, and to be scalable and perform the functions of the OLT at 10 Gbps and those of an ONU/ONT at 2.5 Gbps. The implementation of an XG-PON ONU transmitter on a Xilinx Virtex-7 verifies the approach.
Manolopoulos K, Belias A, Georgis G, Reisis DI, Anasontzis EG (2013) Signal Processing for Deep-Sea Observatories with Reconfigurable Hardware, Proceedings of the 19th IEEE International Conference on Electronics, Circuits and Systems (ICECS) pp. 81-84 IEEE
The recent evolution of deep-sea observatories has provided the infrastructure for studying rare phenomena in astroparticle physics, extended phenomena in physical oceanography and environmental monitoring for climate modeling and civic alert systems. The observatories involve sets of sensors distributed in the deep sea, which transmit data through Gbit electro-optical lines to a shore station for real-time processing. Each set of sensors communicates data and control with the shore station through a readout system. Targeting the improvement of the observatory, the current paper proposes a readout system with enhanced functionality, which includes the ability to reconfigure the communication channels, provide statistical measurements of the communicated data and perform efficient data filtering. The design of the architecture is suited to FPGA implementation and the instantiation on the Xilinx ML605 board validates the results.
The current paper introduces a real-time architecture for the computation of the Generalized Partial Directed Coherence (GPDC) of multiple signals. The motivating application is the localization and control of epileptic seizures, where hitherto published results have shown the effectiveness of exploiting Generalized Partial Directed Coherence to quantify and analyse the connectivity and interaction of brain structures. To speed up GPDC computations we develop first, a parallelizing strategy leading to the high performance scalable architecture and second, a low-complexity fixed-point reciprocal square root module. We show that real-time computation is feasible at a speed of 0.027 ms for 16 channels and 1.637 ms for 128 channels. Furthermore, the implementation results on the Xilinx 7A35T, KC705, VC707 and KU115 show that the power requirements are quite modest and allow for the embedded application of the engine.
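A low-complexity fixed-point reciprocal square root, as mentioned in the abstract, is commonly built from a coarse seed refined by Newton-Raphson iterations, y_{n+1} = y_n (3 - x y_n^2) / 2. The Python sketch below emulates that general technique in Q16.16 arithmetic purely for illustration; it is not the module described in the paper.

    # Newton-Raphson reciprocal square root (1/sqrt(x)) in emulated
    # Q16.16 fixed point; a sketch of the general technique only.
    FRAC_BITS = 16
    ONE = 1 << FRAC_BITS

    def fixed_rsqrt(x_fixed, iterations=3):
        """Return 1/sqrt(x) in Q16.16 for a positive Q16.16 input."""
        # Coarse seed about 2^(-floor(log2(x))/2), derived from the MSB position.
        msb = x_fixed.bit_length() - 1
        y = 1 << (FRAC_BITS + (FRAC_BITS - msb) // 2)
        for _ in range(iterations):
            y_sq = (y * y) >> FRAC_BITS        # y^2 in Q16.16
            t = (x_fixed * y_sq) >> FRAC_BITS  # x * y^2 in Q16.16
            y = (y * ((3 * ONE - t) >> 1)) >> FRAC_BITS
        return y

    # Example: 1/sqrt(2) is roughly 0.7071
    approx = fixed_rsqrt(2 * ONE) / ONE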
Georgis G, Tzeranis C, Reisis DI, Synnefakis G (2015) XG-PON optical network unit downstream FEC design based on truncated Reed-Solomon code, Proceedings of the 21st IEEE International Conference on Electronics, Circuits and Systems (ICECS) pp. 782-785 IEEE
The XG-PON standard for Passive Optical Networks (PONs) requires the utilization of a Reed-Solomon block code at a 10 Gbps downstream rate, dictating low latency and high throughput processing, no word interleaving and no stall between codewords. The current paper presents in detail a parallel architecture which decodes the RS(248,216) shortened code in the XG-PON ONT/ONU receiver. Based on a modified implementation of the Degree-Computationless Modified Euclidean (DCME) algorithm, the designed Key Equation Solver (KES) and its control unit allow for both solving the key equation and computing the number of errors detected, in 31 clock cycles. Validating the proposed design on a Xilinx Kintex 7 FPGA and comparing it to a pipelined serial DCME implementation reveals a 48% reduction in occupied slices and a sixfold reduction in the latency induced. Our implementation achieves a throughput of 16 Gbps on the specified device, thus meeting the XG-PON downstream FEC requirements with relatively low effort. The results could be adapted for a multitude of optical communication standards based on RS codes due to the 64-bit pipelined architecture and the FPGA-transparent HDL design.
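The figures quoted above can be sanity-checked with some simple arithmetic on the RS(248,216) code parameters: 32 parity symbols correct up to 16 symbol errors per codeword, and one 248-byte codeword occupies exactly 31 transfers on a 64-bit bus, matching the 31-cycle key equation solver. The 156.25 MHz figure in the comment is simply 10 Gbps divided by the 64-bit bus width, not a number taken from the paper.

    # Worked check of the shortened Reed-Solomon code parameters.
    n, k = 248, 216              # codeword / message length in GF(2^8) symbols (bytes)
    parity = n - k               # 32 parity symbols per codeword
    t = parity // 2              # corrects up to 16 symbol errors per codeword
    shortening = 255 - n         # 7 symbols removed from the RS(255,223) parent code

    # At the 10 Gbps downstream rate one codeword spans 248 * 8 = 1984 bits,
    # i.e. about 198 ns, so a 64-bit datapath sees a new codeword every
    # 1984 / 64 = 31 clock cycles at 10e9 / 64 = 156.25 MHz.
    cycles_per_codeword = (n * 8) // 64   # 31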
Georgis G, Reisis DI, Skordilakis P, Tsakalis KS, Shafique AB, Chatzikonstantis G, Lentaris G (2014) Neuronal connectivity assessment for epileptic seizure prevention: Parallelizing the generalized partial directed coherence on many-core platforms, Proceedings of International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV) pp. 359-366 IEEE
Research on the prevention of epileptic seizures has led to approaches for future treatment techniques, which rely on the demanding computation of generalized partial directed coherence (GPDC) on electroencephalogram (EEG) data. A fast computation of such metrics is a key factor both for the off-line optimization of algorithmic parameters and for its real-time implementation. Aiming at speeding up the GPDC computations on EEG data, the current paper presents massively parallel computational strategies for implementing the GPDC on many-core architectures. We apply the proposed strategies on commercial and experimental many-core platforms and we compare the computation times for a set of EEG data against the Bulldozer and Ivy Bridge x86_64 serial processors. We test the GPUs of the nVidia GTX 550 Ti and GTX 670, which in the best case achieve significant speedups of 190x and 460x respectively. Moreover, we apply the proposed parallelization strategies on the Single-Chip Cloud Computer (SCC), an experimental processor created by Intel Labs.
Georgis G, Lentaris G, Reisis DI (2012) Study of interpolation filters for motion estimation with application in H.264/AVC encoders, Proceedings of the 18th IEEE International Conference on Electronics, Circuits, and Systems (ICECS 2011) pp. 9-12 IEEE
Image super-resolution plays an important role in a plethora of applications, including video compression and motion estimation. Detecting fractional displacements among frames facilitates the removal of temporal redundancy and improves the video quality by 2-4 dB PSNR [1] [2]. However, the increased complexity of the Fractional Motion Estimation (FME) process adds a significant computational load to the encoder and sets constraints to real-time designs. Timing analysis shows that FME accounts for almost half of the entire motion estimation period, which in turn accounts for 60-90% of the total encoding time depending on the design configuration.

FME relies on an interpolation procedure to increase the resolution of any frame region by generating sub-pixels between the original pixels. Modern compression standards specify the exact filter to use in the Motion Compensation module, allowing the encoder and the decoder to create and use identical reference frames. In particular, H.264/AVC specifies a 6-tap filter for computing the luma values of half-pixels and a low cost 2-tap filter for computing quarter-pixels. Even though it is common practice for encoder designers to integrate the standard 6-tap filter also in the Estimation module (before Compensation), the fact is that the interpolation technique used for detecting the displacements (not computing their residual) is an open choice following certain performance trade-offs.

Aiming at speeding up the Estimation, a process of considerably higher computational demand than the Compensation, this work builds on the potential to implement a lower complexity interpolation technique instead of the H.264 6-tap filter. We integrate in the Estimation module several distinct interpolation techniques not included in the H.264 standard, while keeping the standard H.264/AVC Compensation to measure their impact on the outcome of the prediction engine.

Related bibliography includes both ideas to avoid/replace the standard computations, as well as architectures targeting the efficient implementation of the H.264 6-tap filtering procedure and the support of its increased memory requirements. To this end, we note that H.264 specifies a kernel with coefficients (1, -5, 20, 20, -5, 1) to be multiplied with six consecutive pixels of the frame (either in column or row format). The resulting six products are accumulated and normalized for the generation of a single half-pixel (between the 3rd and 4th tap). The operation must be r
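As a concrete illustration of the interpolation just described, the one-dimensional Python sketch below applies the (1, -5, 20, 20, -5, 1) kernel to six consecutive integer-position luma samples and returns the half-pixel between the 3rd and 4th taps; the rounding and clipping follow the usual H.264 normalisation by 32 and are added here for completeness.

    # Half-pixel luma interpolation with the 6-tap kernel from the abstract.
    KERNEL = (1, -5, 20, 20, -5, 1)

    def half_pixel(pixels):
        """pixels: six consecutive integer-position luma samples (0..255)."""
        acc = sum(c * p for c, p in zip(KERNEL, pixels))
        value = (acc + 16) >> 5            # normalise by 32 with rounding
        return min(max(value, 0), 255)     # clip to the 8-bit range

    # A flat region interpolates to the same level.
    assert half_pixel([100, 100, 100, 100, 100, 100]) == 100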

Georgis G, Tzeranis C, Reisis D, Synnefakis G (2014) FPGA design of the decoding functions in the physical layer adaptation subsystem of the XG-PON optical network unit/terminal, Proceedings of the 10th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME 2014) pp. 1-4
The XG-PON standard for Passive Optical Networks (PONs) has imposed requirements for high performance processing in the architectures of network equipment. Especially, the designs of the 10Gbps receiver terminals and the network units (ONTs and ONUs) can become quite demanding. The current paper focuses on the XG-PON ONT/ONU receiver and presents an FPGA design realizing the decoding functions of the XG-PON physical adaptation layer: the scrambling, the RS(248,216) decoding and the Hybrid Error Correction (HEC) architectures, which are designed to communicate through a 64-bit bus. This work describes the components' features and validates the results by showing the design's performance on a Xilinx Kintex 7 FPGA.
Husmann C, Georgis G, Nikitopoulos K, Jamieson K (2017) FlexCore: Massively Parallel and Flexible Processing for Large MIMO Access Points, Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI '17) pp. 197-211
Large MIMO base stations remain among wireless network designers' best tools for increasing wireless throughput while serving many clients, but current system designs sacrifice throughput with simple linear MIMO detection algorithms. Higher-performance detection techniques are known, but remain off the table because these systems parallelize their computation at the level of a whole OFDM subcarrier, sufficing only for the less-demanding linear detection approaches they opt for. This paper presents FlexCore, the first computational architecture capable of parallelizing the detection of large numbers of mutually-interfering information streams at a granularity below individual OFDM subcarriers, in a nearly-embarrassingly parallel manner while utilizing any number of available processing elements. For 12 clients sending 64-QAM symbols to a 12-antenna base station, our WARP testbed evaluation shows similar network throughput to the state-of-the-art while using an order of magnitude fewer processing elements. For the same scenario, our combined WARP-GPU testbed evaluation demonstrates a 19× computational speedup, with 97% increased energy efficiency when compared with the state of the art. Finally, for the same scenario, an FPGA-based comparison between FlexCore and the state of the art shows that FlexCore can achieve up to 96% better energy efficiency, and can offer up to 32× the processing throughput.
Georgis G, Lentaris GL, Reisis DR (2013) Low Complexity Interpolation Filters for Motion Estimation and Application to the H.264 Encoders, In: Ruiz GR (eds.), Design and Architectures for Digital Signal Processing 6 pp. 137-154 InTech
Techniques for image super-resolution play an important role in a plethora of applications, which include video compression and motion estimation. The detection of the fractional displacements among frames facilitates the removal of temporal redundancy and improves the video quality by 2-4 dB PSNR. However, the increased complexity of the Fractional Motion Estimation (FME) process adds a significant computational load to the encoder and sets constraints to real-time designs. Researchers have performed timing analysis for the motion estimation process and they reported that FME accounts for almost half of the entire motion estimation period, which in turn accounts for 60-90% of the total encoding time depending on the design configuration.
Georgis G, Lentaris G, Reisis DI (2013) Single-image super-resolution using low complexity adaptive iterative back-projection, DSP pp. 1-6 IEEE
Georgis G, Lentaris G, Reisis D (2016) Acceleration Techniques and Evaluation on Multicore CPU, GPU and FPGA for Image Processing and Super-Resolution, Journal of Real-Time Image Processing Springer Berlin Heidelberg
Super-Resolution (SR) techniques constitute a key element in image applications which need high-resolution reconstruction while, in the worst case, only a single low-resolution observation is available. SR techniques involve computationally demanding processes and thus researchers are currently focusing on SR performance acceleration. Aiming at improving the SR performance, the current paper builds upon the characteristics of the L-SEABI Super-Resolution (SR) method to introduce parallelization techniques for GPUs and FPGAs. The proposed techniques accelerate GPU reconstruction of Ultra-High Definition content, achieving three times (3x) faster-than-real-time performance on mid-range and previous-generation devices and at least nine times (9x) faster-than-real-time performance on high-end GPUs. The FPGA design leads to a scalable architecture performing four times (4x) faster than real-time on low-end Xilinx Virtex 5 devices and sixty-nine times (69x) faster than real-time on the Virtex 2000T. Moreover, we confirm the benefits of the proposed acceleration techniques by employing them on a different category of image-processing algorithms: on window-based Disparity functions, for which the proposed GPU technique shows an improvement over the CPU performance ranging from 14 times (14x) to 64 times (64x), while the proposed FPGA architecture provides 29x acceleration.
Georgis Georgios, Nikitopoulos Konstantinos, Jamieson K (2017) Geosphere: an Exact Depth-First Sphere Decoder Architecture Scalable to Very Dense Constellations, IEEE Access 5 pp. 4233-4249 IEEE
This paper presents the algorithmic design, experimental evaluation, and VLSI implementation of Geosphere, a depth-first sphere decoder able to provide the exact maximum-likelihood solution in dense (e.g., 64) and very dense (e.g., 256, 1024) QAM constellations by means of a geometrically inspired enumeration. In general, linear detection methods can be highly effective when the MIMO channel is well-conditioned. However, this is not the case when the size of the MIMO system increases and the number of transmit antennas approaches the number of receive antennas. Via our WARP testbed implementation we gather indoor channel traces in order to evaluate the performance gains of sphere detection against zero-forcing and MMSE in an actual indoor environment. We show that Geosphere can nearly linearly scale performance with the number of user antennas; in 4 × 4 multi-user MIMO with 256-QAM modulation at 30 dB SNR there is a 1.7× gain over MMSE and 2.4× over zero-forcing, and a 14% and 22% respective gain in 2 × 2 systems. In addition, by using a new node-labeling-based enumeration technique, low-complexity integer arithmetic and fine-grained clock gating, we implement the Geosphere VLSI architecture for constellations up to 1024-QAM and compare it, in terms of area, delay and power characteristics, with the best-known best-scalable exact ML sphere decoder. Results show that Geosphere is twice as area-efficient and 70% more energy efficient in 1024-QAM. Even for 16-QAM, Geosphere is 13% more area efficient than the best-known implementation for 16-QAM and it is at least 80% more area efficient than state-of-the-art K-best detectors for 64-QAM.
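For readers unfamiliar with the underlying search, the Python sketch below is a textbook depth-first sphere decoder: after a QR decomposition of the channel it walks the symbol tree level by level and prunes any branch whose partial metric already exceeds the best complete solution, which yields the exact ML estimate. It illustrates the problem Geosphere solves, but it uses plain constellation enumeration rather than Geosphere's geometrically inspired enumeration and says nothing about the VLSI architecture.

    # Minimal depth-first sphere decoder (exact ML, textbook pruning).
    import numpy as np

    def sphere_decode(y, H, constellation):
        """Return the symbol vector s minimizing ||y - H s||^2."""
        m, n = H.shape
        Q, R = np.linalg.qr(H)        # H = Q R with R upper triangular
        z = Q.conj().T @ y            # rotated receive vector
        best = {"metric": np.inf, "s": None}
        s = np.zeros(n, dtype=complex)

        def search(level, partial_metric):
            if partial_metric >= best["metric"]:
                return                # prune: already outside the sphere
            if level < 0:
                best["metric"], best["s"] = partial_metric, s.copy()
                return
            interference = R[level, level + 1:] @ s[level + 1:]
            for symbol in constellation:   # Geosphere orders this enumeration
                s[level] = symbol
                branch = abs(z[level] - interference - R[level, level] * symbol) ** 2
                search(level - 1, partial_metric + branch)

        search(n - 1, 0.0)
        return best["s"]

    # Noise-free 2x2 example with QPSK symbols.
    qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
    H = (np.random.randn(2, 2) + 1j * np.random.randn(2, 2)) / np.sqrt(2)
    s_true = qpsk[[0, 3]]
    assert np.allclose(sphere_decode(H @ s_true, H, qpsk), s_true)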
Nikitopoulos Konstantinos, Georgis Georgios, Jayawardena Chathura, Chatzipanagiotis Daniil, Tafazolli Rahim (2018) Massively Parallel Tree Search for High-Dimensional Sphere Decoders, Transactions on Parallel and Distributed Systems IEEE
The recent paradigm shift towards the transmission of large numbers of mutually interfering information streams, as in the case of aggressive spatial multiplexing, combined with requirements for very low processing latency despite the frequency plateauing of traditional processors, initiates a need to revisit the fundamental maximum-likelihood (ML) and, consequently, the sphere-decoding (SD) detection problem. This work presents the design and VLSI architecture of MultiSphere, the first method to massively parallelize the tree search of large sphere decoders in a nearly-concurrent manner, without compromising their maximum-likelihood performance, and by keeping the overall processing complexity comparable to that of highly-optimized sequential sphere decoders. For a 10 × 10 MIMO spatially multiplexed system with 16-QAM modulation and 32 processing elements, our MultiSphere architecture can reduce latency by 29× against well-known sequential SDs, approaching the processing latency of linear detection methods, without compromising ML optimality. In MIMO multicarrier systems targeting exact ML decoding, MultiSphere achieves processing latency and hardware efficiency that are orders of magnitude improved compared to approaches employing one SD per subcarrier. In addition, for 16×16 both 'hard'- and 'soft'-output MIMO systems, approximate MultiSphere versions are shown to achieve similar error rate performance to state-of-the-art approximate SDs having akin parallelization properties, by using only one tenth of the processing elements, and to achieve up to approximately 9× increased energy efficiency.
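A naive way to obtain the kind of parallelism described above is to pin the top-level symbol, which splits the search tree into one independent subtree per constellation point, and to explore those subtrees on separate processing elements. The sketch below expresses only that partitioning; subtree_search is a hypothetical depth-first routine (for example, the decoder sketched for Geosphere above with the last symbol fixed), and MultiSphere's actual tree splitting and radius sharing across processing elements are considerably more sophisticated.

    # Splitting an ML tree search across processing elements by fixing
    # the top-level symbol; a simplistic stand-in for MultiSphere's scheme.
    from concurrent.futures import ProcessPoolExecutor

    def parallel_ml_detect(y, H, constellation, subtree_search, workers=8):
        """subtree_search(root_symbol, y, H, constellation) -> (metric, vector)
        must explore the subtree with the last transmitted symbol pinned
        to root_symbol and return its best metric and symbol vector."""
        candidates = list(constellation)
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = pool.map(subtree_search, candidates,
                               [y] * len(candidates),
                               [H] * len(candidates),
                               [constellation] * len(candidates))
        return min(results, key=lambda r: r[0])[1]   # overall ML solution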
Georgis Georgios, Thanos Alexios, Filo Marcin, Nikitopoulos Konstantinos (2020) A DSP Acceleration Framework for Software-Defined Radios on x86_64, ICASSP 2020
This paper presents a DSP acceleration and assessment framework targeting SDR platforms on x86_64 architectures. Driven by the potential of rapid prototyping and evaluation of breakthrough concepts that these platforms provide, our work builds upon the well-known OpenAirInterface codebase, extending it for advanced, previously unsupported modes towards large and massive MIMO such as non-codebook-based multi-user transmissions. We then develop an acceleration/profiling framework, through which we present fine-grained execution results for DSP operations. Incorporating the latest SIMD instructions, our acceleration framework achieves a unitary speedup of up to 10. Integrated into OpenAirInterface, it accelerates computationally expensive MIMO operations by up to 88% across tested modes. Besides resulting in a useful tool for the community, this work provides insight on runtime DSP complexity and the potential of modern x86_64 systems.
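The DSP operations being profiled are typically data-parallel kernels of the kind SIMD units accelerate. The toy Python comparison below contrasts a scalar complex multiply-accumulate loop with a vectorised equivalent, with numpy standing in for hand-written AVX2/AVX-512 intrinsics; it is purely illustrative and is not code from the paper's framework or from OpenAirInterface.

    # Scalar versus vectorised complex multiply-accumulate, the kind of
    # kernel used in channel compensation or matched filtering.
    import numpy as np

    def mac_scalar(received, channel_conj):
        acc = 0j
        for r, h in zip(received, channel_conj):   # one complex MAC per step
            acc += r * h
        return acc

    def mac_vectorized(received, channel_conj):
        return np.dot(received, channel_conj)      # many lanes per operation

    rx = np.random.randn(4096) + 1j * np.random.randn(4096)
    h = np.random.randn(4096) + 1j * np.random.randn(4096)
    assert np.isclose(mac_scalar(rx, h), mac_vectorized(rx, h))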
Georgis Georgios, Filo Marcin, Thanos Alexios, Husmann Christopher, De Luna Ducoing Juan Carlos, Tafazolli Rahim, Nikitopoulos Konstantinos (2019) SWORD: Towards a Soft and Open Radio Design for Rapid Development, Profiling, Validation and Testing, IEEE Access Institute of Electrical and Electronics Engineers
The vision, as we move to future wireless communication systems, embraces diverse qualities targeting significant enhancements from the spectrum to user experience. Newly-defined air-interface features, such as large numbers of base station antennas and computationally complex physical layer approaches, come with a non-trivial development effort, especially when scalability and flexibility need to be factored in. In addition, testing those features without commercial, off-the-shelf equipment has a high deployment, operational and maintenance cost. On one hand, industry-hardened solutions are inaccessible to the research community due to restrictive legal and financial licensing. On the other hand, research-grade real-time solutions are either lacking versatility, modularity and a complete protocol stack, or, for those that are full-stack and modular, only the most elementary transmission modes are on offer (e.g., very low number of base station antennas). Aiming to address these shortcomings towards an ideal research platform, this paper presents SWORD, a SoftWare Open Radio Design that is flexible, open for research, low-cost, scalable and software-driven, able to support advanced large and massive Multiple-Input Multiple-Output (MIMO) approaches. Starting with just a single-input single-output air-interface and commercial off-the-shelf equipment, we create a software-intensive baseband platform that, together with an acceleration/profiling framework, can serve as a research-grade base station for exploring advancements towards future wireless systems and beyond.

Additional publications