11am - 12 noon

Friday 19 June 2026

Decentralised Content Platforms for Equitable and Privacy-Preserving Media Use in Generative AI

PhD Viva Open Presentation - Kar Balan

Hybrid Meeting (21BA02 & Teams) - All Welcome!

Free

21BA02 - Arthur C Clarke building
University of Surrey
Guildford
Surrey
GU2 7XH


Abstract:
The democratisation of digital content creation tools has transformed media production, enabling individuals to move from being only consumers to being active creators. Yet content marketplaces and AI ecosystems remain highly centralised, limiting transparency, control, and fair compensation. Generative AI (GenAI) systems, trained on massive web-scraped datasets, exacerbate these issues by reusing creative work without consent, attribution, or reward, raising legal and ethical concerns. This thesis explores how decentralisation can redistribute power in the creative economy by giving creators agency over the use of their media in GenAI.

First, we introduce a decentralised registry through which creators can assert opt-in/out preferences for AI training. Content is embedded with provenance metadata and registered with robust fingerprints, enabling provenance tracing even after editing or manipulation. This establishes machine-readable, traceable consent specification as the foundation for downstream attribution and reward.

Building on this, we propose methods for training data provenance, attribution, and compensation in GenAI. The Content ARCs (Authenticity, Rights, Compensation) framework defines a scalable protocol for managing rights and creator compensation. We instantiate this in a decentralised system that traces generative outputs back to the most influential training assets and executes royalty payments to contributors. Several practitioner-facing demonstrators, developed in collaboration with GLAM (galleries, libraries, archives, and museums) professionals, further illustrate how distributed ledgers could reshape licensing and reward in the creative economy.

Further, GenAI models are prone to memorising training data and reproducing it at generation time. This is particularly problematic for copyrighted creative works, where such regurgitation undermines both creator rights and data privacy. To address this challenge, we present a decentralised federated learning protocol for diffusion models that integrates a novel sample-based metric to detect and discourage memorisation of training data. Complementing this, we develop a framework for end-to-end cryptographically verifiable AI pipelines that uses zero-knowledge proofs to enable trustless, privacy-preserving audits.

Finally, we explore privacy-preserving natural language search across decentralised content repositories, using encrypted queries for similarity search at scale. In this way, decentralisation supports discovery of and access to creative content, completing a holistic body of work towards a fairer, more transparent GenAI ecosystem and creative economy.