11am - 12 noon

Friday 27 June 2025

Generative Models for High Dynamic Range Panorama Outpainting from a Single-View Image

PhD Viva Open Presentation - Jack Hilliard

Online event (Teams) - All Welcome!

Free

Online

Speakers

Jack Hilliard

Generative Models for High Dynamic Range Panorama Outpainting from a Single-View Image

Abstract:

In the past few years, the field of illumination estimation from a limited field of view (LFOV) low dynamic range image (LDRI) has seen rapid progress, aided by innovations in the overarching field of image generation. Initial approaches, which could only produce lighting conditions accurate enough to render diffuse and specular objects, have given way to image-based lighting models that generate the features of a scene plausibly enough to render mirror-reflective objects, by utilising high dynamic range (HDR) 360° Equirectangular Panorama (ERP) environment maps. Although impressive advances have been made, the field has not yet reached its apex, where the estimated illumination conditions of a scene can be used to composite an object that is indistinguishable to the human eye. In this thesis, we aim to push illumination estimation towards this ultimate goal by investigating generative Artificial Intelligence (AI) approaches that improve the accuracy of the estimated light sources and the plausibility of the generated features, while reducing commonly introduced image artefacts.
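
For context, image-based lighting shades a composited object by looking up radiance in the HDR ERP environment map along, for example, a mirror-reflection direction. The minimal sketch below uses one common ERP convention (y up, -z forward); the function name and axis convention are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def erp_sample(env: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Nearest-neighbour HDR radiance lookup in an equirectangular
    environment map env of shape (H, W, 3) along unit direction
    d = (x, y, z). Assumes y up and -z forward; conventions vary."""
    h, w, _ = env.shape
    u = 0.5 + np.arctan2(d[0], -d[2]) / (2.0 * np.pi)  # longitude  -> [0, 1]
    v = np.arccos(np.clip(d[1], -1.0, 1.0)) / np.pi    # colatitude -> [0, 1]
    col = min(int(u * w), w - 1)
    row = min(int(v * h), h - 1)
    return env[row, col]  # linear HDR RGB radiance
```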

Our first contribution approaches the illumination estimation problem as an outpainting task, incorporating state-of-the-art techniques from the field of outpainting. We utilise a two-stage Generative Adversarial Network (GAN) model that first generates a diffuse irradiance map of the scene and then uses it, in combination with the initial LFOV input image, to generate the HDR environment map. We also incorporate techniques such as rotating the ERP by 180° and applying circular padding to the sides of the image to reduce the border seam created by the ERP format, as sketched below. Our approach performs competitively with state-of-the-art illumination estimation methods in terms of lighting accuracy for diffuse and specular surfaces, but it falls short in image plausibility and in removing artefacts from the generated panorama.
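
A minimal sketch of these two seam-reduction tricks in PyTorch, assuming (N, C, H, W) ERP tensors; the function names are illustrative, and the exact placement of the operations within the thesis pipeline may differ.

```python
import torch
import torch.nn.functional as F

def circular_pad_erp(x: torch.Tensor, pad: int) -> torch.Tensor:
    # Wrap the left/right edges before a convolution so the network
    # sees the panorama as horizontally continuous, suppressing the
    # border seam of the ERP format.
    return F.pad(x, (pad, pad, 0, 0), mode="circular")

def rotate_erp_180(x: torch.Tensor) -> torch.Tensor:
    # A 180° yaw rotation of an ERP is a half-width horizontal roll.
    # It moves the border seam to the image centre, where the losses
    # directly penalise any discontinuity.
    return torch.roll(x, shifts=x.shape[-1] // 2, dims=-1)
```

Both operations are lossless because an ERP is periodic in longitude: wrapping and rolling never discard pixels.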

The second contribution aims to remove the warping artefacts created by using the ERP format in standard convolutional neural networks. These appear as warped objects and textures at the image's poles, and as a border seam with non-continuous generation at the sides of the image. We achieve this with a U-Net style Vision Transformer (ViT) GAN that leverages the PanoSWIN attention block and circular convolutional padding. This network removes the seam, homogenises generation at the sides of the image, and reduces warping artefacts at the poles. The method performs at a state-of-the-art level in lighting accuracy for diffuse surfaces and produces plausible panoramas; however, it underperforms on the FID score compared to current state-of-the-art methods.
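
The sketch below illustrates the idea behind wrap-around shifted-window attention in the spirit of PanoSWIN: because an ERP is circular in longitude, rolling the feature map before window partitioning lets windows straddle the left/right border without introducing an artificial edge. This is a simplified rendering of the idea, not the PanoSWIN implementation; it assumes H and W are divisible by the window size and the channel count is divisible by the head count.

```python
import torch
import torch.nn as nn

class WrapAroundWindowAttention(nn.Module):
    # Simplified windowed self-attention with a horizontal wrap-around
    # shift (illustrative only, not the PanoSWIN implementation).
    def __init__(self, dim: int, window: int, heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape  # ERP feature map; h, w divisible by window
        win = self.window
        # Roll in longitude so attention windows straddle the border
        # seam; lossless because the panorama wraps horizontally.
        x = torch.roll(x, shifts=win // 2, dims=-1)
        # Partition into non-overlapping (win x win) tiles.
        t = x.unfold(2, win, win).unfold(3, win, win)
        t = t.permute(0, 2, 3, 4, 5, 1).reshape(-1, win * win, c)
        t, _ = self.attn(t, t, t)  # attend within each tile
        # Reassemble tiles back into the (N, C, H, W) layout.
        t = t.reshape(n, h // win, w // win, win, win, c)
        t = t.permute(0, 5, 1, 3, 2, 4).reshape(n, c, h, w)
        # Undo the roll so the output is aligned with the input.
        return torch.roll(t, shifts=-(win // 2), dims=-1)
```

The roll plays the role of Swin's cyclic shift but needs no attention mask at the horizontal border, since the wrap is physically meaningful for a panorama.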

In our third contribution, we incorporate the state-of-the-art image generation architecture, the Latent Diffusion Model (LDM), and modify it to reduce the aforementioned ERP artefacts. We propose an ERP convolutional padding for the latent autoencoder that removes the border seam and aids continuous generation across the panorama. We propose and compare four Diffusion Transformer (DiT) architectures designed for the ERP format, including one based on our previous contribution. Comparing these with the standard U-Net denoising network, we find that although an adapted architecture brings benefits, such as reduced warping at the poles and more accurate lighting estimation, the U-Net generates higher-quality and more plausible HDR environment maps. Evaluating our best model against current benchmarks, we demonstrate that it is competitive with state-of-the-art illumination estimation methods.
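
As a sketch of what an ERP-aware convolutional padding might look like inside the latent autoencoder, the hypothetical layer below wraps circularly in longitude and reflects in latitude (the poles do not wrap); the class name and the vertical padding choice are assumptions, not the thesis' exact scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ERPConv2d(nn.Module):
    # Hypothetical ERP-aware convolution: circular padding along the
    # horizontal (longitude) axis, reflect padding along the vertical
    # (latitude) axis, then an unpadded convolution.
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.pad = k // 2
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = self.pad
        x = F.pad(x, (p, p, 0, 0), mode="circular")  # wrap left/right
        x = F.pad(x, (0, 0, p, p), mode="reflect")   # pad top/bottom
        return self.conv(x)
```

Replacing every convolution in the autoencoder with such a layer is one plausible way to realise seam-free encoding and decoding, since the wrapped receptive fields never see an artificial image border in longitude.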