11am - 12 noon
Friday 19 June 2026
Decentralised Content Platforms for Equitable and Privacy-Preserving Media Use in Generative AI
PhD Viva Open Presentation - Kar Balan
Hybrid Meeting (21BA02 & Teams) - All Welcome!
Free
University of Surrey
Guildford
Surrey
GU2 7XH
Abstract:
The democratisation of digital content creation tools has transformed media production, enabling individuals to become active creators rather than only consumers. Yet content marketplaces and AI ecosystems remain highly centralised, limiting transparency, control, and fair compensation. Generative AI (GenAI) systems, trained on massive web-scraped datasets, exacerbate these issues by reusing creative work without consent, attribution, or reward, raising legal and ethical concerns. This thesis explores how decentralisation can redistribute power in the creative economy by giving creators agency over the use of their media in GenAI.
First, we introduce a decentralised registry through which creators can assert opt-in/out preferences for AI training. Content is embedded with provenance metadata and registered with robust fingerprints, enabling provenance tracing even after editing or manipulation. This establishes machine-readable, traceable consent specification as the foundation for downstream attribution and reward.
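As a rough illustration of the idea of registering content under a fingerprint that survives editing, the sketch below uses a toy average hash and an in-memory list. Everything in it is an assumption made for illustration: the names ConsentRegistry, average_hash and max_distance are hypothetical, the average hash stands in for whatever robust fingerprint the thesis actually uses, and a real deployment would store records on a decentralised ledger rather than in a Python list.

```python
from dataclasses import dataclass


def average_hash(pixels):
    """Toy fingerprint (assumption): one bit per pixel, set when the pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


@dataclass
class Record:
    creator: str
    ai_training_allowed: bool  # the creator's opt-in/out preference
    fingerprint: int


class ConsentRegistry:
    """Hypothetical in-memory stand-in for a decentralised registry."""

    def __init__(self, max_distance=2):
        self.records = []
        self.max_distance = max_distance  # tolerated bit differences for a match

    def register(self, creator, pixels, ai_training_allowed):
        record = Record(creator, ai_training_allowed, average_hash(pixels))
        self.records.append(record)
        return record

    def lookup(self, pixels):
        """Return the closest registered record, or None if nothing is near enough."""
        if not self.records:
            return None
        fp = average_hash(pixels)
        best = min(self.records, key=lambda r: hamming(r.fingerprint, fp))
        return best if hamming(best.fingerprint, fp) <= self.max_distance else None


# A 4x4 greyscale "image", a brightened copy, and an unrelated image.
original = [[10, 200, 30, 220], [15, 210, 25, 230], [12, 205, 35, 225], [11, 190, 28, 215]]
edited = [[min(255, p + 20) for p in row] for row in original]
unrelated = [[200, 10, 220, 30], [210, 15, 230, 25], [205, 12, 225, 35], [190, 11, 215, 28]]

registry = ConsentRegistry()
registry.register("alice", original, ai_training_allowed=False)
match = registry.lookup(edited)        # still resolves to alice's record
no_match = registry.lookup(unrelated)  # too far from any registered fingerprint
```

The brightened copy keeps the same above/below-mean pattern, so it resolves to the original record and its opt-out preference, while the unrelated image does not match.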
Building on this, we propose methods for training data provenance, attribution, and compensation in GenAI training. The Content ARCs (Authenticity, Rights, Compensation) framework defines a scalable protocol for managing rights and creator compensation. We instantiate this in a decentralised system that traces generative outputs back to the most influential training assets and executes royalty payments to contributors. Several practitioner-facing demonstrators developed in collaboration with GLAM (galleries, libraries, archives, and museums) professionals further illustrate how distributed ledgers could reshape licensing and reward in the creative economy.
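The compensation step can be pictured with a small sketch that splits a payment among the most influential training assets in proportion to their attribution scores. This is a minimal illustration under stated assumptions, not the Content ARCs protocol itself: the function name split_royalty, the top_k cut-off, the example scores and the proportional rule are all hypothetical, and how influence is actually measured and how payments are executed on a ledger are outside its scope.

```python
import math


def split_royalty(total_pence, influence, top_k=3):
    """Split a payment among the top_k assets, proportionally to influence.

    Assumptions: scores are non-negative, amounts are whole pence, and any
    pence lost to rounding go to the assets with the largest fractional share.
    """
    top = sorted(influence.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    score_sum = sum(score for _, score in top)
    if score_sum <= 0:
        return {}

    exact = {asset: total_pence * score / score_sum for asset, score in top}
    payouts = {asset: math.floor(value) for asset, value in exact.items()}

    # Hand out the pence lost to flooring, largest fractional part first.
    leftover = total_pence - sum(payouts.values())
    by_fraction = sorted(exact, key=lambda a: exact[a] - payouts[a], reverse=True)
    for asset in by_fraction[:leftover]:
        payouts[asset] += 1
    return payouts


# Hypothetical attribution scores for one generated output.
scores = {"asset_a": 0.5, "asset_b": 0.25, "asset_c": 0.25, "asset_d": 0.01}
payouts = split_royalty(1000, scores)
even_split = split_royalty(100, {"x": 1, "y": 1, "z": 1})
```

Working in whole pence keeps the payouts summing exactly to the amount paid, which matters if the split is later settled as individual transfers.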
Further, GenAI models are prone to memorising training data and reproducing it at generation time. This is particularly problematic for copyrighted creative works, where such regurgitation undermines both creator rights and data privacy. To address this challenge, we present a decentralised federated learning protocol for diffusion models that integrates a novel sample-based metric to detect and discourage training data memorisation. Complementing this, we develop a framework for end-to-end cryptographically verifiable AI pipelines that uses zero-knowledge proofs to enable trustless, privacy-preserving audits.
Finally, we explore privacy-preserving natural language search across decentralised content repositories using encrypted queries for similarity search at scale. In this way, decentralisation supports discovery and access to creative content, completing a holistic body of work for a fairer, more transparent GenAI ecosystem and creative economy.
