1pm - 2pm

Thursday 1 February 2024

Learning Natural Vector Sketch Models

PhD Viva Open Presentation by Ayan Das

All Welcome!


CVSSP/AI Institute
University of Surrey




Sketching is an effective communication tool that comes naturally to humans and can express complex thoughts and concepts using relatively simple objects. Because sketches are produced primarily by humans, they may not (and should not) be treated as raster images, which are dense grid representations acquired by photographic sensors. In this thesis, we study and build vector sketch modelling frameworks that are fundamentally more "natural", i.e., they better mimic the human creative process and offer properties and abilities favorable to downstream digital applications. Although sketches are a good representative, we study a broader class of sparse structures that share a similar creation process. Termed "Chirographic Data", this class additionally includes handwritten digits, characters, diagrams, etc.

The motivation behind this thesis is twofold: (1) learning chirographic generative models that are more "natural", i.e. that imitate the human generative/creative process; (2) building vector sketch tools that support downstream digital use cases by offering "spatial scalability" and other semantic manipulation abilities. We propose and study three vector representations for sketches and their associated creative models (i.e. generative models over the creation process). These representations, and consequently the learned models, are more natural than those in the existing literature because their underlying design philosophies are motivated by the human creative process. The generative models also exhibit properties that help solve downstream digital manipulation tasks such as semantic abstraction, re-creation and healing/correction.

The dominant approach to modelling vector sketches uses a very generic poly-line sequence as its underlying representation, which effectively makes it a discrete "motor-program". In reality, however, humans produce chirographic structures in continuous time, but may learn by observing discretized paths at a specific resolution. Furthermore, humans tend to learn more "holistic" structures (either globally or at a stroke level) rather than the dynamics (movement) of the pen. All approaches proposed in this thesis follow a similar philosophy: learning the true underlying continuous-time holistic structure solely from discrete data. Building on our primary models, we also build re-creational models, i.e. generative models conditioned on "perceptive" inputs. Besides being aligned with our primary goal, our proposed solutions are also shown to be reasonably efficient with respect to quantitative metrics, evaluated on a range of relevant datasets.

The first technical chapter, titled "Sketches as parametric curves", introduces our first continuous-time sketch representation, which uses parametric curves (e.g. Bezier curves, B-Splines), a family of compact and finite curves used extensively in computer graphics. To this end, we propose a tool that infers the underlying parameters of an observed waypoint/poly-line sequence without an optimization process, closely aligned with how humans infer geometric entities. Equipped with this tool, we then learn generative models over parametric sketches. We also propose an improved representation and model that couples learning and generative modelling into one end-to-end framework, powered by a differentiable and complexity-agnostic parametric curve formulation.
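
As a rough illustration of why a parametric curve is a continuous-time, resolution-independent stroke representation, the sketch below evaluates a Bezier curve from its control points using the Bernstein basis. It is a minimal example and not the inference tool or model described in the thesis; the function names and the control points are hypothetical.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a 2D Bezier curve of any degree at parameter t in [0, 1]."""
    n = len(control_points) - 1
    x = y = 0.0
    for i, (px, py) in enumerate(control_points):
        # Bernstein basis polynomial B_{i,n}(t)
        b = comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
        x += b * px
        y += b * py
    return (x, y)


def sample_stroke(control_points, num_points):
    """Sample the same stroke at any chosen resolution (num_points >= 2)."""
    return [
        bezier_point(control_points, k / (num_points - 1))
        for k in range(num_points)
    ]


# Hypothetical control points for a single cubic stroke.
stroke = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
coarse = sample_stroke(stroke, 5)   # 5 waypoints
fine = sample_stroke(stroke, 50)    # same curve, 50 waypoints
```

The four control points describe the stroke fully, so the same curve can be rendered at 5 or 50 waypoints without storing a poly-line at a fixed resolution.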

The next chapter, titled "Sketches as dynamical systems", introduces a novel representation for sketches based on the classical concept of Dynamical Systems. Specifically, it attempts to capture the time-derivative of the "true" continuous-time process that underlies each data sample. We make use of "Neural Ordinary Differential Equations" (Neural ODE or NODE, in short), a framework for efficiently learning deep continuous dynamics from discrete sequential observations. This representation is interpretable and intuitive, as it captures an underlying "template" of the data being encoded. It is then used to learn higher-level deep latent codes and, consequently, generative models over the continuous chirographic data space. Such models also enjoy distinctive properties such as data efficiency and few-shot capabilities, and allow the derivation of "perceptually abstract" data distributions.
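
To make the "stroke as a dynamical system" idea concrete, the sketch below integrates a time-derivative function to obtain a pen trajectory. It is only an illustration under strong assumptions: the vector field is hand-written rather than a learned neural network, and a fixed-step Euler integrator stands in for the ODE solvers typically used with Neural ODEs. All names are hypothetical.

```python
import math


def vector_field(state, t):
    """Hypothetical hand-written time-derivative of the pen position.

    In a Neural ODE this function would be a learned network.
    """
    x, y = state
    return (-y, x)  # pure rotation: traces an (approximately) circular stroke


def integrate(f, initial_state, t_end, num_steps):
    """Fixed-step Euler integration; returns the trajectory as a list of points."""
    h = t_end / num_steps
    state = initial_state
    path = [state]
    for k in range(num_steps):
        dx, dy = f(state, k * h)
        state = (state[0] + h * dx, state[1] + h * dy)
        path.append(state)
    return path


# The same dynamics can be read out at different temporal resolutions.
coarse_path = integrate(vector_field, (1.0, 0.0), 2 * math.pi, 100)
fine_path = integrate(vector_field, (1.0, 0.0), 2 * math.pi, 1000)
```

Here the derivative function, not a list of waypoints, is what represents the stroke; the waypoints are obtained by solving the system from an initial pen position. Euler integration drifts slightly, so the finer path closes the loop more accurately than the coarse one.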

The final technical chapter, titled "Sketches as non-autoregressive sequences", introduces a generative modelling framework that implicitly captures the continuous function underlying chirographic data. This framework alleviates the "smoothness problem" caused by the explicit continuous-time representations discussed in previous chapters. This non-autoregressive framework captures a "holistic" view of the sketch data while retaining its discrete sequence nature, which adds flexibility to the model. The model uses a modern and prominent class of generative models called "Denoising Diffusion Probabilistic Models" (or DDPM, in short). Alongside better generation, we show that downstream tasks like mixing, implicit conditioning, healing and perceptual abstraction can be easily achieved with such a framework purely by means of inference.
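
For readers unfamiliar with DDPM, the sketch below shows only the standard forward (noising) process applied to a whole poly-line sequence at once, which is what makes the formulation non-autoregressive: every point is perturbed jointly rather than generated one after another. The learned reverse (denoising) network is omitted. The linear noise schedule and its values follow common DDPM defaults and are an assumption here, as are the function names and the example stroke.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_sequence(points, alpha_bar, rng):
    """Forward diffusion on all points jointly:
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, I).
    """
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [
        (scale * x + sigma * rng.gauss(0.0, 1.0),
         scale * y + sigma * rng.gauss(0.0, 1.0))
        for x, y in points
    ]


rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)
stroke = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]            # hypothetical poly-line
slightly_noisy = noise_sequence(stroke, alpha_bars[10], rng)
almost_noise = noise_sequence(stroke, alpha_bars[-1], rng)
```

At early timesteps the stroke is barely perturbed, while at the final timestep it is close to pure Gaussian noise; a trained model would learn to reverse this corruption step by step.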