Privacy Techniques for Data Science in Regulated Environments

Jim Klucar

Audience level: Novice
Topic area: Misc


Demand is increasing for technology companies to safeguard individual data. This presentation examines data privacy regulations currently in place, and teaches data privacy algorithms such as K-anonymization, Randomized Response, and Differential Privacy. Moreover, it will cover Differentially Private machine learning algorithms and the impact data privacy has on Machine Learning performance.


By 2018, the world will be producing 50TB of data per second. This explosion of data threatens the privacy of individuals. As a result, the global populace is demanding that technology companies be held responsible for safeguarding individual data. This has driven countries to implement laws governing individuals’ data rights, such as Europe’s General Data Protection Regulation (GDPR). Such laws make it even more important that data scientists ensure their techniques maintain privacy, both to protect individuals and to avoid heavy fines.

This presentation examines key points of the data privacy regulations currently in place, and how they impact the way data analysis is currently performed. It will also cover fine-grained details, benefits, pitfalls, and assumptions of techniques designed to maintain privacy, such as K-anonymization, Randomized Response and Differential Privacy.
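
As a rough illustration of one of the techniques named above, Randomized Response can be sketched in a few lines of Python. This is a minimal sketch of the classic two-coin-flip variant, not necessarily the formulation used in the talk; the function names, the 30% true rate, and the sample size are all assumptions chosen for the example.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Report a sensitive yes/no answer with plausible deniability.

    First coin flip: heads -> answer truthfully.
    Tails -> flip a second coin and report that instead.
    """
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(responses: list) -> float:
    """Recover the population rate from the noisy reports.

    P(report yes) = 0.5 * p + 0.25, so p = 2 * P(report yes) - 0.5.
    """
    observed = sum(responses) / len(responses)
    return 2 * observed - 0.5


# Hypothetical population: roughly 30% would truthfully answer "yes".
rng = random.Random(0)
true_answers = [rng.random() < 0.3 for _ in range(100_000)]
reported = [randomized_response(answer, rng) for answer in true_answers]
estimate = estimate_true_rate(reported)
print(f"estimated rate of 'yes': {estimate:.3f}")
```

No single reported answer reveals the respondent's true answer, yet the aggregate rate can still be estimated. With fair coins, a truthful "yes" is reported as "yes" with probability 0.75 and a truthful "no" with probability 0.25, which is why this variant is often cited as a simple example of a differentially private mechanism with epsilon = ln 3.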

Furthermore, some assume that the nonlinear nature of AI protects it from privacy violations. This talk will cover how ML algorithms are still vulnerable to privacy violations and the impact that privacy-preserving techniques have on Machine Learning performance. It will conclude with a case study showcasing how a Differentially Private neural network outperformed a traditionally trained network.