Privacy regulations around the world, such as Australia's Privacy Act, the EU's GDPR, and HIPAA and CCPA in the US, impose strict requirements on how organisations handle personal data. Among the hardest to implement in a data pipeline are protecting sensitive data from breaches and erasing an individual's personal data on request. Finding and deleting every copy of sensitive data is cumbersome and error-prone, and it can break referential integrity. Meanwhile, sensitive data sitting in plaintext is exposed the moment a breach occurs.
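This talk introduces a novel data pipeline architecture, enabled by an open-source Python library created by the speaker, that addresses both of these compliance requirements in a single action. The library implements the crypto-shredding pattern, in which sensitive data is encrypted with a dedicated key, so that destroying the key renders every copy of that data unreadable.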
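To make the crypto-shredding pattern concrete, here is a minimal sketch of the general idea, not the speaker's library or its API. All names (`KeyStore`, `encrypt`, `decrypt`, `shred`) are hypothetical, and the cipher is a toy SHAKE-256 keystream used only to keep the example dependency-free; a real pipeline would use an authenticated cipher such as AES-GCM and a durable, access-controlled key store.

```python
import hashlib
import secrets


class KeyStore:
    """Hypothetical in-memory key store holding one random key per data subject."""

    def __init__(self):
        self._keys = {}

    def key_for(self, subject_id):
        # Create the subject's key on first use.
        if subject_id not in self._keys:
            self._keys[subject_id] = secrets.token_bytes(32)
        return self._keys[subject_id]

    def get(self, subject_id):
        return self._keys.get(subject_id)

    def shred(self, subject_id):
        # Erasure step: destroying the key makes every ciphertext copy unreadable.
        self._keys.pop(subject_id, None)


def _keystream(key, nonce, length):
    # Toy keystream derived with SHAKE-256 (illustration only, unauthenticated).
    return hashlib.shake_256(key + nonce).digest(length)


def encrypt(store, subject_id, plaintext):
    key = store.key_for(subject_id)
    nonce = secrets.token_bytes(16)  # fresh nonce for every value
    data = plaintext.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def decrypt(store, subject_id, token):
    key = store.get(subject_id)
    if key is None:
        return None  # key was shredded, so the value is unrecoverable
    nonce, body = token[:16], token[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")


store = KeyStore()
row = {"customer_id": 42, "email": encrypt(store, 42, "ana@example.com")}
downstream_copy = dict(row)  # copies spread through the pipeline as ciphertext
store.shred(42)              # one action invalidates all of them
```

In this sketch the non-sensitive `customer_id` column is left untouched, so joins and foreign keys keep working after the key is destroyed, while the encrypted column can no longer be read from any copy.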
The talk covers the architecture, the cryptographic foundations of the Python library, its practical integration into Python pipelines, and a data governance framework for encryption key management and rotation. It also includes a live demo of the library.
XiaoHan Li is an Analytics Engineer and Consultant with expertise in ELT-driven Medallion Architecture & data modelling, and dbt architecture design & implementation.