CVSSP at CVPR 2022
Sunday 19 June - Friday 24 June 2022, 12 noon - 11pm
The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) is the premier annual computer vision event comprising the main conference and several co-located workshops and short courses. With its high quality and low cost, it provides an exceptional value for students, academics and industry researchers. CVPR 2022 will be a hybrid conference, with both in-person and virtual attendance options.
Again this year, CVSSP is presenting a variety of papers. See details below.
Style-Based Global Appearance Flow for Virtual Try-On
Image-based virtual try-on aims to fit an in-shop garment onto a clothed person image. A key step is garment warping, which spatially aligns the target garment with the corresponding body parts in the person image. Prior methods typically adopt a local appearance flow estimation model, and are thus intrinsically susceptible to difficult body poses, occlusions and large misalignments between the person and garment images. To overcome this limitation, a novel global appearance flow estimation model is proposed in this work. For the first time, a StyleGAN-based architecture is adopted for appearance flow estimation. This enables us to take advantage of a global style vector to encode whole-image context and cope with the aforementioned challenges.
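To picture what an appearance flow is, the core warping idea can be sketched in a few lines: a dense field of per-pixel offsets tells each output pixel where to sample in the garment image. The function name and nearest-neighbour sampling below are our own simplifications for illustration, not the paper's implementation:

```python
import numpy as np

def warp_with_flow(garment, flow):
    """Warp a garment image with a dense appearance-flow field.
    garment: (H, W) array; flow: (H, W, 2) per-pixel (dy, dx) offsets
    telling each output pixel where to sample in the garment.
    Nearest-neighbour sampling keeps this illustrative sketch simple."""
    H, W = garment.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    return garment[src_y, src_x]
```

In the paper the flow field itself is predicted by the StyleGAN-based network conditioned on a global style vector; a zero flow leaves the garment unchanged, while learned offsets align it with the body.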
Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches
This paper focuses on Few-Shot Class-Incremental Learning (FSCIL), where sketches aid the model in learning novel classes. Gradient consensus, coupled with knowledge distillation and graph attention networks, ensures robust learning while not forgetting old knowledge.
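One simple way to picture gradient consensus (a toy sketch under our own assumptions, not necessarily the paper's exact formulation): keep a parameter update only where the gradients from the old and new tasks agree in sign, so updates that would overwrite old knowledge are suppressed:

```python
import numpy as np

def consensus_update(g_old, g_new):
    """Toy gradient-consensus illustration: retain the new-task gradient
    only in coordinates where it agrees in sign with the old-task
    gradient; conflicting coordinates are zeroed out."""
    agree = np.sign(g_old) == np.sign(g_new)
    return np.where(agree, g_new, 0.0)
```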
Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval
This paper shows that sketch-based image retrieval accuracy falls due to the presence of noisy strokes in sketch queries. Accordingly, it proposes a reinforcement-learning-based stroke-subset selector that filters out noisy strokes for successful retrieval, easing the common concern of "I can't sketch".
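A greedy stand-in conveys the idea (the paper trains this selection with reinforcement learning; the function and reward below are hypothetical simplifications): drop any stroke whose removal raises a retrieval reward, so only strokes that help retrieval survive.

```python
def filter_strokes(strokes, reward):
    """Greedy proxy for an RL stroke-subset selector: try removing each
    stroke in turn and keep the removal whenever it improves the
    (caller-supplied) retrieval reward."""
    kept = list(strokes)
    for s in strokes:
        trial = [t for t in kept if t is not s]
        if trial and reward(trial) > reward(kept):
            kept = trial
    return kept
```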
Sketch3T: Test-time Training for Zero-Shot SBIR
Models trained under the existing zero-shot sketch-based image retrieval setup struggle to understand sketches from the test-time distribution. This paper therefore introduces a test-time training paradigm, where a model adapts to the test distribution using just one sketch via a cost-free self-supervised task during inference, without forgetting old knowledge thanks to a novel meta-learning-based training framework.
Partially Does It: Towards Scene-Level FG-SBIR with Partial Input
This work highlights an important aspect of understanding scene sketches: a scene sketch may not contain all the objects in its corresponding photo, so retrieval accuracy collapses as sketches become relatively emptier, or 'partial'. To address this, a set-based approach using Optimal Transport is proposed to model cross-modal region associativity for better retrieval.
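Entropy-regularised Optimal Transport can be computed with a few Sinkhorn iterations. The sketch below is a generic Sinkhorn solver, not the paper's exact matching module: given a cost matrix between (say) sketch regions and photo regions, it returns a soft matching plan whose marginals respect the region weights.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularised optimal transport via Sinkhorn iterations.
    cost: (n, m) cost matrix; a: (n,) and b: (m,) marginal weights
    (each summing to 1). Returns the (n, m) transport plan."""
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # scale columns to match b
        u = a / (K @ v)              # scale rows to match a
    return u[:, None] * K * v[None, :]
```

Low-cost region pairs receive most of the transport mass, giving a differentiable cross-modal association.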
Finding Badly Drawn Bunnies
By Kaiyue Pang, Lan Yang, Yi-Zhe Song, Honggang Zhang.
Paper | @yizhe_song
Everyone can sketch, the debate is on how well. We, for the first time, teach computers to recognise this "how well" problem that tells you just how badly/well drawn your bunny (or any other sketch) is. Our key discovery lies in exploiting the magnitude of a sketch feature as a quantitative quality metric. This is reassuring for many as we no longer need to collect expensive quality annotations from humans to enable the said metric learning. We confirm consistent quality agreements between our proposed metric and human perception through a carefully designed human study. We also showcase the practical benefits in three sketch applications thanks to the successful modelling of sketch quality.
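The key idea, feature magnitude as a quality metric, reduces to something very compact. In this hypothetical sketch the raw vectors stand in for embeddings from a trained sketch encoder:

```python
import numpy as np

def quality_score(feature):
    """Sketch-quality proxy: the L2 magnitude of the sketch's feature
    embedding; larger magnitude ~ better-drawn sketch."""
    return float(np.linalg.norm(feature))

def rank_by_quality(features):
    # indices of sketches, best-drawn (largest magnitude) first
    return sorted(range(len(features)), key=lambda i: -quality_score(features[i]))
```

Because the score falls out of the embedding itself, no human quality annotations are needed at inference time.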
Talk: 10:00AM-12:30PM (CST), 22 June 2022. 162 - Poster Session 2.1, New Orleans Ernest N. Morial Convention Center.
"The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping
Estimating a semantically segmented bird's-eye-view (BEV) map from a single image has become a popular technique for autonomous control and navigation. However, existing methods show an increase in localization error with distance from the camera. While such an increase in error is entirely expected (localization is harder at distance), much of the drop in performance can be attributed to the cues used by current texture-based models: in particular, they make heavy use of object-ground intersections (such as shadows), which become increasingly sparse and uncertain for distant objects. In this work, we address these shortcomings in BEV mapping by learning the spatial relationships between objects in a scene. We propose a graph neural network which predicts BEV objects from a monocular image by spatially reasoning about an object within the context of other objects. Our approach sets a new state of the art in BEV estimation from monocular images across three large-scale datasets, including a 50% relative improvement for objects on nuScenes.
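The graph-reasoning step boils down to message passing over an object graph. This is a generic one-round mean-aggregation sketch, not the paper's architecture: each object augments its own feature with the average of its neighbours', so, for example, a distant pedestrian can borrow context from a nearby lamppost.

```python
import numpy as np

def message_pass(node_feats, adj):
    """One round of mean-aggregation message passing over an object graph.
    node_feats: (N, D) per-object features; adj: (N, N) 0/1 adjacency
    (no self-loops). Returns (N, 2D): each node's own feature
    concatenated with the mean of its neighbours' features."""
    deg = adj.sum(axis=1, keepdims=True)
    neigh = (adj @ node_feats) / np.maximum(deg, 1)   # mean over neighbours
    return np.concatenate([node_feats, neigh], axis=1)
```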
Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production
By Ben Saunders, Richard Bowden, Cihan Camgoz.
Paper | @BenMSaunders @ncihancamgoz @CogVis
Sign languages are visual languages, with vocabularies as rich as their spoken language counterparts. However, current deep-learning-based Sign Language Production (SLP) models produce under-articulated skeleton pose sequences from constrained vocabularies, and this limits their applicability. To be understandable and accepted by deaf communities, an automatic SLP system must be able to generate co-articulated photo-realistic signing sequences for large domains of discourse.
Prof Yi-Zhe Song, Professor of Computer Vision and Machine Learning; Programme Lead of MSc in Artificial Intelligence