On the Fundamental Limits of Privacy-Enhancing Technologies
On 1st November 2024, I defended my PhD thesis titled “On the fundamental limits of privacy-enhancing technologies”. The thesis is publicly available via the EPFL library, and a short summary of its contributions can be found in the abstract below.
Abstract
The advancement of data-driven decision making has become a key priority in political agendas and private sector strategies. However, the growing demand for data and the opacity of data-driven systems raise concerns about the threats these systems pose to individual privacy. Privacy-enhancing technologies (PETs) are frequently promoted as a technological solution to this tension between data use and privacy risks. Many suggest that PETs might mitigate privacy risks at little to no cost in data utility or system functionality.
This thesis critically examines such claims and explores the fundamental limits of PETs. We uncover inherent trade-offs of PETs across three distinct domains: synthetic data as a tool for privacy-preserving data publishing, least-privilege learning in machine learning as a service (MLaaS) settings, and privacy-preserving digital proximity tracing (DPT) systems. In each case, we demonstrate how a rigorous problem formalisation is crucial to understanding the fundamental limits of PETs. We show that some trade-offs between intended data use and unintended information leakage are unavoidable and that certain privacy risks are inherent to a system’s core functionality.
First, we investigate synthetic data as a PET. Synthetic data is frequently advertised as a silver-bullet solution for privacy-preserving microdata publishing that addresses the shortcomings of traditional data anonymisation. Contrary to these claims, our research reveals that high-quality synthetic data inevitably leaks sensitive information about individuals, while strong privacy guarantees result in significant utility loss.
Next, we formalise the concept of least-privilege representation learning in MLaaS settings and characterise its feasibility. We prove that there is a fundamental trade-off between a representation’s utility for a given task and its leakage beyond the intended task: it is not possible to learn representations that have high utility for the intended task but at the same time prevent inference of any attribute other than the task label itself. Our findings challenge the notion that least-privilege learning provides a promising avenue to mitigate the harms of potential data misuse in MLaaS settings.
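To make the flavour of this trade-off concrete, the sketch below states it in informal information-theoretic terms. The notation (data X, task label Y, representation R = r(X), arbitrary attribute S, and mutual information I) is my own illustrative choice and is not taken from the thesis, which develops its own formalisation.

```latex
% Illustrative sketch only: an informal statement of the tension between task
% utility and least-privilege-style leakage constraints. The symbols and the
% information-theoretic framing are assumptions for exposition, not the
% thesis's formal definitions.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Let $R = r(X)$ be a learned representation of the data $X$ for a task with
label $Y$. Utility asks that $R$ retain information about the label,
\[
  I(R; Y) \ \text{is large},
\]
while an informal least-privilege requirement asks that, beyond the label,
$R$ reveal nothing about any other attribute $S$,
\[
  I(R; S \mid Y) \approx 0 \quad \text{for every attribute } S .
\]
The trade-off described above says that these two requirements cannot, in
general, be satisfied simultaneously: any representation with high utility
for the intended task also permits inference of attributes other than $Y$.
\end{document}
```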
Finally, we present an in-depth privacy analysis of DPT systems. We show that certain risks are inherent to the core functionality of these systems and cannot be mitigated through design choices. Our analysis highlights the need to identify and assess privacy risks at the earliest stages of the system design process.
Across all three domains, our research consistently reveals how PETs operate within fixed boundaries defined by their fundamental trade-offs. Our findings challenge the often overly optimistic portrayal of PETs as a technological cure-all for privacy concerns in data-driven innovation. This thesis argues for a new approach to the evaluation and design of data-driven privacy technologies and systems. It emphasises the importance of rigorously formalising and analysing claims about the potential benefits of PETs and of carefully evaluating their inherent risks.
Overall, this thesis contributes to a more realistic vision of PETs, in particular of their capabilities and limitations in enabling the privacy-preserving use of data. By acknowledging the fundamental limits of PETs, our research aims to foster a more responsible approach to innovation in data-driven technologies.