Kar Balan

Postgraduate Research Student

k.balan@surrey.ac.uk

https://www.linkedin.com/in/kar-balan/

Academic and research departments

Centre for Vision, Speech and Signal Processing (CVSSP).

About

My research project

Decentralised Content Platforms for Equitable and Privacy-Preserving Media Use in Generative AI

My PhD research explores how decentralised technologies can improve trust, transparency, and fairness in generative AI and the creative ecosystem. I focus on mechanisms for content provenance, creator consent, attribution, and compensation, as well as privacy-preserving and verifiable AI systems.

More broadly, my work sits at the intersection of AI, cryptography, and distributed systems, aiming to support accountable AI development and more creator-centric digital economies.

Supervisors

John Collomosse

Andrew Gilbert

Publications

Kar Balan, Andrew Gilbert, John Collomosse (2025) Content ARCs: Decentralized Content Rights in the Age of Generative AI, In: International Conference on AI and the Digital Economy (CADE 2025) 2025 pp. 42-48 IEEE

DOI: 10.1049/icp.2025.2955

The rise of Generative AI (GenAI) has sparked significant debate over balancing the interests of creative rightsholders and AI developers. As GenAI models are trained on vast datasets that often include copyrighted material, questions around fair compensation and proper attribution have become increasingly urgent. To address these challenges, this paper proposes a framework called Content ARCs (Authenticity, Rights, Compensation). By combining open standards for provenance and dynamic licensing with data attribution and decentralized technologies, Content ARCs create a mechanism for managing rights and compensating creators for the use of their work in AI training. We characterize several nascent works in the AI data licensing space within Content ARCs and identify where challenges remain to fully implement the end-to-end framework.

Muhammad Junaid Awan, Kar Balan, John Philip Collomosse (2025) Decentralized Creative Copyright Exchange in the Age of Generative AI ACM

Copyright and AI are at the centre of intense debate. Generative AI (GenAI), trained on vast datasets that often include copyrighted works, has amplified concerns over fair compensation and proper attribution for creative rightsholders. Current approaches to data licensing for GenAI rely on ad hoc agreements with a small number of large dataset owners who have the resources to broker such deals. These models lack transparency, control, and inclusivity in a modern creative economy where content production and consumption are increasingly decentralized. To address these challenges, we present a decentralized platform built on the Content ARCs (Authenticity, Rights, Compensation) framework to enable creators to manage provenance, licensing, and monetisation of their media assets on distributed ledgers using open standards such as C2PA and Open Digital Rights Language (ODRL).

Kar Balan, Alexander Black, Simon Jenni, Andy Parsons, Andrew Gilbert, John Philip Collomosse (2023) DECORAIT - DECentralized Opt-in/out Registry for AI Training

Figure 1: DECORAIT enables creatives to register consent (or not) for Generative AI training using their content, as well as to receive recognition and reward for that use. Provenance is traced via visual matching, and consent and ownership are registered using a distributed ledger (blockchain). Here, a synthetic image is generated via the Dreambooth [32] method using the prompt "a photo of [Subject]" and concept images (left). The red cross indicates images whose creatives have opted out of AI training via DECORAIT, which, when taken into account, leads to a significant visual change (right). DECORAIT also determines credit apportionment across the opted-in images and pays a proportionate reward to creators via crypto-currency micropayment.

We present DECORAIT; a decentralized registry through which content creators may assert their right to opt in or out of AI training as well as receive reward for their contributions. Generative AI (GenAI) enables images to be synthesized using AI models trained on vast amounts of data scraped from public sources. Model and content creators who may wish to share their work openly without sanctioning its use for training are thus presented with a data governance challenge. Further, establishing the provenance of GenAI training data is important to creatives to ensure fair recognition and reward for such use. We report a prototype of DECORAIT, which explores hierarchical clustering and a combination of on/off-chain storage to create a scalable decentralized registry to trace the provenance of GenAI training data in order to determine training consent and reward creatives who contribute that data. DECORAIT combines distributed ledger technology (DLT) with visual fingerprinting, leveraging the emerging C2PA (Coalition for Content Provenance and Authenticity) standard to create a secure, open registry through which creatives may express consent and data ownership for GenAI.

Kar Balan, Robert Learney, Tim Wood (2025) A Framework for Cryptographic Verifiability of End-to-End AI Pipelines, In: Proceedings of the 10th ACM International Workshop on Security and Privacy Analytics pp. 49-59 ACM

DOI: 10.1145/3716815.3729011

The increasing integration of Artificial Intelligence across sectors necessitates robust mechanisms for ensuring transparency, trust, and auditability of its development and deployment. This is particularly important in light of recent calls in various jurisdictions to introduce regulation on AI safety. We propose a framework for complete verifiable AI pipelines, identifying key components and analysing existing cryptographic approaches that contribute to verifiability across different stages of the AI lifecycle, from data sourcing to training, inference, and unlearning. This framework could be used to combat misinformation by providing cryptographic proofs alongside AI-generated assets to allow downstream verification of their provenance and correctness. Our findings underscore the importance of ongoing research to develop cryptographic tools that are not only efficient for isolated AI processes, but that are efficiently 'linkable' across different processes within the AI pipeline, to support the development of end-to-end verifiable AI technologies.

Kar Balan, Andrew Gilbert, John Philip Collomosse (2024) PDFed: Privacy-Preserving and Decentralized Asynchronous Federated Learning for Diffusion Models, In: Proceedings of 21st ACM SIGGRAPH Conference on Visual Media Production 8 pp. 1-9 Association for Computing Machinery (ACM)

DOI: 10.1145/3697294.3697306

We present PDFed, a decentralized, aggregator-free, and asynchronous federated learning protocol for training image diffusion models using a public blockchain. In general, diffusion models are prone to memorization of training data, raising privacy and ethical concerns (e.g., regurgitation of private training data in generated images). Federated learning (FL) offers a partial solution via collaborative model training across distributed nodes that safeguard local data privacy. PDFed proposes a novel sample-based score that measures the novelty and quality of generated samples, incorporating these into a blockchain-based federated learning protocol that we show reduces private data memorization in the collaboratively trained model. In addition, PDFed enables asynchronous collaboration among participants with varying hardware capabilities, facilitating broader participation. The protocol records the provenance of AI models, improving transparency and auditability, while also considering automated incentive and reward mechanisms for participants. PDFed aims to empower artists and creators by protecting the privacy of creative works and enabling decentralized, peer-to-peer collaboration. The protocol positively impacts the creative economy by opening up novel revenue streams and fostering innovative ways for artists to benefit from their contributions to the AI space.

Frances Liddell, Ella Tallyn, Evan Morgan, Kar Balan, Martin Disley, Theodore Koterwas, Billy Dixon, Caterina Moruzzi, John Philip Collomosse, Chris Elsden (2024) ORAgen: Exploring the Design of Attribution through Media Tokenisation, In: Designing Interactive Systems Conference pp. 229-233 ACM

DOI: 10.1145/3656156.3663693

In this work-in-progress, we present ORAgen, as ‘unfinished software’, materialised through a demonstrative web application that enables participants to engage with a novel approach to media tokenisation – the ORA framework. By presenting ORAgen in ‘think-aloud’ interviews with 17 professionals working in the creative and cultural industries, we explore potential values of media tokenisation in relation to existing challenges they face related to ownership, rights, and attribution. From our initial findings, we reflect specifically on the challenges of attribution and ongoing control of creative media, and examine how media tokenisation, and underpinning distributed ledger technologies can enable new approaches to designing attribution.

Kar Balan, Alex Black, Simon Jenni, Andrew Gilbert, Andy Parsons, John Collomosse DECORAIT -- DECentralized Opt-in/out Registry for AI Training

DOI: 10.48550/arxiv.2309.14400

We present DECORAIT; a decentralized registry through which content creators may assert their right to opt in or out of AI training as well as receive reward for their contributions. Generative AI (GenAI) enables images to be synthesized using AI models trained on vast amounts of data scraped from public sources. Model and content creators who may wish to share their work openly without sanctioning its use for training are thus presented with a data governance challenge. Further, establishing the provenance of GenAI training data is important to creatives to ensure fair recognition and reward for such use. We report a prototype of DECORAIT, which explores hierarchical clustering and a combination of on/off-chain storage to create a scalable decentralized registry to trace the provenance of GenAI training data in order to determine training consent and reward creatives who contribute that data. DECORAIT combines distributed ledger technology (DLT) with visual fingerprinting, leveraging the emerging C2PA (Coalition for Content Provenance and Authenticity) standard to create a secure, open registry through which creatives may express consent and data ownership for GenAI.

Kar Balan, Shruti Agarwal, Simon Jenni, Andy Parsons, Andrew Gilbert, John Philip Collomosse (2023) EKILA: Synthetic Media Provenance and Attribution for Generative Art, In: Proceedings of the 2023 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2023) Institute of Electrical and Electronics Engineers (IEEE)

DOI: 10.1109/CVPRW59228.2023.00098

We present EKILA; a decentralized framework that enables creatives to receive recognition and reward for their contributions to generative AI (GenAI). EKILA proposes a robust visual attribution technique and combines this with an emerging content provenance standard (C2PA) to address the problem of synthetic image provenance – determining the generative model and training data responsible for an AI-generated image. Furthermore, EKILA extends the non-fungible token (NFT) ecosystem to introduce a tokenized representation for rights, enabling a triangular relationship between the asset’s Ownership, Rights, and Attribution (ORA). Leveraging the ORA relationship enables creators to express agency over training consent and, through our attribution model, to receive apportioned credit, including royalty payments for the use of their assets in GenAI.