Safe Data Project: A Case Study in Balancing Privacy and Innovation


Our client is a world leader in converged broadband, video and mobile communications. They are a NASDAQ-listed multinational telecommunications company with headquarters in London, Amsterdam and Denver. With 25,000 employees and 85m mobile and fixed-line subscribers, their brands include Virgin Media O2 in the UK, VodafoneZiggo in the Netherlands, Telenet in Belgium, Sunrise in Switzerland, Virgin Media in Ireland and UPC in Slovakia.

The Business Challenge

Our client is responsible for the development, maintenance and security of many business systems that hold Personally Identifiable Information (PII). They tried to obfuscate PII in their non-production environments to minimise the risk of GDPR regulatory breaches. However, several issues defeated this effort:

  • Data volumes were so large that obfuscation took too much time and too many resources to be viable, so PII-safe data for software testing and for training ML models was not available when required.
  • Data was shuffled inconsistently both within and between systems, which caused data integrity issues and made non-production databases unsafe to use.
  • PII was still found in non-production databases after processing, which posed an unacceptable risk.
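
The third issue, PII left behind after processing, is the kind of problem a post-obfuscation check can surface. The sketch below is illustrative only and is not the client's or Hoonartek's tooling: the function `find_residual_pii`, the two regular expressions and the sample rows are all assumptions made for this example, and a real scan would need a much broader rule set.

```python
import re

# Illustrative patterns only; a real scan would need a far broader rule set.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}

def find_residual_pii(rows):
    """Return (row_index, column, kind) for every value that matches a pattern."""
    hits = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            for kind, pattern in PII_PATTERNS.items():
                if pattern.search(str(value)):
                    hits.append((index, column, kind))
    return hits

# Hypothetical rows from a non-production table after obfuscation.
sample_rows = [
    {"id": "1", "contact": "XXXX", "note": "call back on +44 20 7946 0958"},
    {"id": "2", "contact": "jane.doe@example.com", "note": "n/a"},
]
residual_hits = find_residual_pii(sample_rows)
```

Free-text columns such as `note` are a common place for PII to survive column-level masking, which is why the sketch scans every value and not only the columns known to hold PII.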

Joint Solution Discovery & Implementation Plan

Hoonartek was approached by the client to implement an automated solution that consistently and efficiently shuffles PII “on demand” before data leaves a production environment, so that no individual can be identified.

The solution needed to offer a simple self-service interface that allows software development teams to place an order for PII-safe test data, delivered to fit their test schedules.

The automated process reliably preserves the topology of data to create PII-safe, accurate “walled gardens” that are suitable for training machine learning models without introducing Data Skewness. Hoonartek produced the re-usable automated solution using the Test Data Manager (TDM) and Semantic Discovery products from Ab Initio Software.
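
To make "consistent shuffling" concrete, here is a minimal sketch of one possible approach. It does not describe how Ab Initio TDM works internally. The functions `build_shuffle_map` and `shuffle_column`, the key and the two sample tables are hypothetical. The idea is to derive a single keyed permutation over the distinct values of a PII field and apply it in every system, so that the same source value always receives the same substitute.

```python
import hashlib
import hmac

def build_shuffle_map(values, key):
    """Build a deterministic one-to-one substitution over the distinct values."""
    # Order the distinct values by a keyed hash; without the key the order is
    # not predictable, and with the same key and values it is always the same.
    ordered = sorted(
        set(values),
        key=lambda v: hmac.new(key, v.encode("utf-8"), hashlib.sha256).digest(),
    )
    # Map each value to its successor in that cyclic order. This is a
    # permutation, so value frequencies keep their shape, and no value maps to
    # itself when there are at least two distinct values.
    return {v: ordered[(i + 1) % len(ordered)] for i, v in enumerate(ordered)}

def shuffle_column(rows, column, mapping):
    """Return copies of the rows with one column substituted via the mapping."""
    return [{**row, column: mapping[row[column]]} for row in rows]

# Hypothetical tables from two systems that share a surname column.
crm = [{"cust": 1, "surname": "Jansen"}, {"cust": 2, "surname": "Smith"},
       {"cust": 3, "surname": "Jansen"}]
billing = [{"cust": 1, "surname": "Jansen"}, {"cust": 2, "surname": "Smith"},
           {"cust": 4, "surname": "Peeters"}]

key = b"demo-key-not-for-production"
mapping = build_shuffle_map([r["surname"] for r in crm + billing], key)
safe_crm = shuffle_column(crm, "surname", mapping)
safe_billing = shuffle_column(billing, "surname", mapping)
```

Because both tables use the same mapping, records that matched on surname before shuffling still match afterwards, and a surname that appeared twice in `crm` still appears twice. That is the sense in which a consistent shuffle keeps integrity between systems while leaving the distribution of values intact.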


By embracing the proposed solution, the Hoonartek team successfully attained several key objectives:

Improved Data Integrity
  • Our client can now shuffle PII on demand before it goes to non-production environments, minimising the amount of PII that needs to be secured.
  • The consistent shuffling process ensures that the data used for testing and machine learning is reliable and accurate.
Automation & GDPR Compliance
  • The new automated system is faster and more efficient than the old masking process.
  • The new system helps the client to comply with GDPR by protecting PII.
Eliminate Data Skewness
  • By using high-quality data for testing and machine learning, our client can develop better software and machine learning models.

Industry Perspective

Data Skewness is one of the most common problems in machine learning: the statistical distribution of a training dataset does not accurately represent the topology of the production dataset. Skew influences the behaviour of ML models because they apply knowledge gained from a statistically inaccurate world when they operate in the real world. Data is the raw material for ML, and a model is only ever as good as the quality of its data. But since GDPR requires human consent to be given for each use case, most ML training datasets require PII to be obfuscated prior to use.
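
Skew of this kind can be quantified. As a rough illustration, the sketch below compares the distribution of one categorical field in production with the same field in a training extract, using total variation distance. The metric is our choice for the example and is not something the project is stated to have used. The function names and the sample product mix are hypothetical.

```python
from collections import Counter

def distribution(values):
    """Return the relative frequency of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

def total_variation_distance(production, training):
    """Total variation distance between two value distributions (0.0 = identical)."""
    p, q = distribution(production), distribution(training)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)

# Hypothetical product mix in production versus a skewed training extract.
production = ["mobile"] * 5 + ["broadband"] * 3 + ["tv"] * 2
training = ["mobile"] * 4 + ["broadband"] * 1

skew = total_variation_distance(production, training)
```

A value near zero suggests the training extract mirrors production for that field. Here the extract over-represents mobile customers and contains no TV customers at all, so the distance is well above zero.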
