February 18, 2022

Next Tuesday the Computational Ethics in NLP class will be hosting a guest lecture from Patricia Thaine. She is a leading expert on privacy in NLP and will provide an overview of issues in privacy-preserving NLP that should interest many in our community. Please join us for Patricia’s lecture!
Who: Patricia Thaine
When: Tuesday, Feb 22, 10 am–11:20 am PT
Where: Zoom
Abstract: Natural language, whether written or spoken, contains some of the most sensitive information we produce. Moreover, both its content (what is said or written) and its method of production (how something is said or written) can contain personally identifiable information. These vulnerabilities make natural language artifacts prime targets for malicious actors. Unfortunately, one of two scenarios occurs when training models to perform NLP tasks on personal data: either models end up being trained on sensitive user information (e.g., speech recognition), making them vulnerable to malicious actors, or their abilities are limited by the scope of available training data due to privacy concerns (e.g., speaker profiling). In fact, a lack of training data might even lead to models never being created in the first place. To prevent these scenarios from occurring, we have identified four pillars that are required for creating what we shall call perfectly privacy-preserving NLP architectures: training data privacy, input privacy, output privacy, and model privacy. There are a number of approaches to addressing these different privacy requirements within NLP algorithms, including differential privacy, secure multiparty computation, homomorphic encryption, federated learning, secure enclaves, and data de-identification. We will briefly explain each of these methods, discuss how they have been used in NLP model architectures so far, and deduce how they can be utilized most effectively. Finally, we provide a literature review of the privacy-preserving machine learning approaches that have been proposed thus far and tie them into the perfectly privacy-preserving formalism introduced here.
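As a small taste of the first technique on the abstract's list, here is a minimal sketch of differential privacy via the Laplace mechanism applied to a counting query over text data. This is an illustration, not material from the talk; the function name and the toy dataset are invented for the example.

```python
import random

def dp_count(values, predicate, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    A counting query has sensitivity 1 (adding or removing one person's
    record changes the count by at most 1), so Laplace noise with
    scale 1/epsilon is enough for epsilon-DP.
    """
    true_count = sum(1 for v in values if predicate(v))
    scale = 1.0 / epsilon
    # The difference of two independent Exp(1) variates, scaled by `scale`,
    # is exactly a Laplace(0, scale) sample.
    noise = scale * (random.expovariate(1.0) - random.expovariate(1.0))
    return true_count + noise

# Hypothetical usage: count documents mentioning "flu" without revealing
# whether any single document is in the collection.
docs = ["flu", "cold", "flu", "healthy"]
noisy = dp_count(docs, lambda d: d == "flu", epsilon=1.0)
```

The noisy answer is unbiased, so repeated independent releases average out to the true count; at the same time, any single release only weakly depends on any one individual's record, which is the privacy guarantee.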

Bio: Patricia Thaine is the Co-Founder & CEO of Private AI (a Microsoft-backed startup), a Computer Science Ph.D. Candidate at the University of Toronto (on leave), and a Vector Institute alumna. Her R&D work centers on privacy-preserving natural language processing, with an emphasis on applied cryptography and re-identification risk. She also does research on computational methods for lost language decipherment. Patricia is a recipient of the NSERC Postgraduate Scholarship, the RBC Graduate Fellowship, the Beatrice “Trixie” Worsley Graduate Scholarship in Computer Science, and the Ontario Graduate Scholarship. She has nine years of research and software development experience, including at the McGill Language Development Lab, the University of Toronto’s Computational Linguistics Lab, the University of Toronto’s Department of Linguistics, and the Public Health Agency of Canada.