Sergio Sánchez Santiesteban

Postgraduate Research Student

Publications

Sergio Sanchez Santiesteban, Sara Atito, Muhammad Awais, Yi-Zhe Song, Josef Kittler (2024) Improved Image Captioning Via Knowledge Graph-Augmented Models, In: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024), pp. 4290-4294, Institute of Electrical and Electronics Engineers (IEEE)

Multimodal foundation models, pre-trained on large-scale data, effectively capture vast amounts of factual and commonsense knowledge. However, these models store all their knowledge within their parameters, requiring increasingly larger models and training data to capture more knowledge. To address this limitation and achieve a more scalable and modular integration of knowledge, we propose a novel knowledge graph-augmented multimodal model. This approach enables a base multimodal model to access pertinent information from an external knowledge graph. Our methodology leverages existing general domain knowledge to facilitate vision-language pre-training using paired images and text descriptions. We conduct comprehensive evaluations demonstrating that our model outperforms state-of-the-art models and yields comparable results to much larger models trained on more extensive datasets. Notably, our model reached a CIDEr score of 145 on MS COCO Captions using only 2.9 million samples, outperforming a 1.4B-parameter model by 1.7% despite having 11 times fewer parameters.