Alternatives to flushing private data: how to use pseudonymization to achieve GDPR compliance

May 9, 2018

Introduction

EU broadcasters and MVPDs are bracing themselves for the new General Data Protection Regulation (GDPR), which takes effect on May 25th. The regulation is based on a new, fundamental right of personal data protection.

Broadcasters and MVPDs need a data strategy to protect personal data, to support forgetting personal data on request, and to manage consent to collect personal data.

Broadcasters and MVPDs have legitimate reasons to collect data. Tracking which programs and movies are watched is essential for programme planning and for movie/series licensing agreements. But is collecting private data (such as personal identity, IP addresses, and e-mail addresses) justified? Strictly speaking, without a direct transaction to rent a movie or PPV, GDPR does not allow gathering personal data without consent.

However, according to GDPR, personal data can be collected without consent if pseudonymization is used. Using pseudonymization provides the flexibility to manage personal data and the ability to perform historical analysis. In addition, end-user requests to have their data forgotten no longer require the removal of all end-user data from the data warehouse.

What is pseudonymization?

Pseudonymization uses an attribute to connect a personal identifier to an anonymized identifier; this enables de-anonymization at a later date (Figure 1). When consent is removed or end users request that their data be forgotten, it is enough to delete the attribute that creates the link, without deleting all the data. The attribute can be thought of as a table that matches a pseudonym (like an alias) with a personal identifier. The data is associated with the pseudonym rather than directly with the personal identifier.

Figure 1: Separation of personal data
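The separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the class name PseudonymTable, its methods, and the sample e-mail address are all hypothetical, and a real deployment would keep the table in a separately secured store.

```python
import secrets


class PseudonymTable:
    """Hypothetical lookup table linking pseudonyms to personal identifiers."""

    def __init__(self):
        # pseudonym -> personal identifier; kept apart from the viewing data
        self._by_pseudonym = {}

    def issue(self, personal_id: str) -> str:
        """Create a random pseudonym and record its link to the identifier."""
        pseudonym = secrets.token_hex(16)
        self._by_pseudonym[pseudonym] = personal_id
        return pseudonym

    def resolve(self, pseudonym: str):
        """De-anonymize a pseudonym; returns None if the link is gone."""
        return self._by_pseudonym.get(pseudonym)


table = PseudonymTable()
alias = table.issue("jane@example.com")

# The data warehouse only ever sees the pseudonym, never the identifier.
viewing_events = [{"pseudonym": alias, "title": "Example Movie"}]
```

Because the pseudonym is random rather than derived from the identifier, the viewing data cannot be traced back to the end user without the table.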

When a personal identifier is anonymized (has a pseudonym), the pseudonym must change over time; using the same pseudonym for the same end user over time is not a secure implementation of pseudonymization. So how do you make the pseudonym secure?

Figure 2 below shows the hierarchy of instantiation associated with an end user. The lowest level is a single session, which runs from starting an app or logging in on the web to closing the app or webpage. During the web/app session, a couple of movies are watched, and at some point the app is put temporarily in the background. The next level consists of all instances of the app on a specific platform (e.g. iPad). The highest level includes all instances across different platforms for the same end user. A pseudonym for the personal identifier should be assigned at the level of a single session. In this example, there would be three pseudonyms at the platform level and six at the cross-platform level.

Figure 2: Instantiation Hierarchy Overview
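A minimal sketch of session-level pseudonyms follows. It assumes, purely for illustration, two platforms with three sessions each, which reproduces the three-per-platform and six-overall counts mentioned above; the function start_session, the platform names, and the identifier "user-42" are hypothetical.

```python
import secrets
from collections import defaultdict


def start_session(links: dict, personal_id: str, platform: str) -> str:
    """Issue a fresh pseudonym for one session and record the link."""
    pseudonym = secrets.token_hex(16)
    links[pseudonym] = (personal_id, platform)
    return pseudonym


links = {}  # pseudonym -> (personal identifier, platform)

# Assumed scenario: the same end user opens three sessions on each platform.
for platform in ("ipad", "web"):
    for _ in range(3):
        start_session(links, "user-42", platform)

# Group the pseudonyms by platform to mirror the hierarchy.
per_platform = defaultdict(set)
for pseudonym, (_, platform) in links.items():
    per_platform[platform].add(pseudonym)
```

Each session gets its own pseudonym, so two sessions can only be tied to the same end user through the link table.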

When an end user either removes consent or requests to be forgotten, the entries in the table that associate the personal identifier with its pseudonyms are removed. The data stored under those pseudonyms remains, but it can no longer be used to identify the end user.
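Handling a forget request could look roughly like the sketch below. The function forget, the sample pseudonyms, and the sample identifiers are hypothetical; the point is only that the link entries are deleted while the viewing records are left in place.

```python
def forget(links: dict, personal_id: str) -> int:
    """Delete every pseudonym link for one end user; return how many were removed."""
    doomed = [p for p, pid in links.items() if pid == personal_id]
    for pseudonym in doomed:
        del links[pseudonym]
    return len(doomed)


# pseudonym -> personal identifier (illustrative values)
links = {
    "a1": "jane@example.com",
    "b2": "jane@example.com",
    "c3": "omar@example.com",
}

# Viewing data references pseudonyms only.
events = [
    {"pseudonym": "a1", "title": "Movie A"},
    {"pseudonym": "b2", "title": "Movie B"},
    {"pseudonym": "c3", "title": "Movie A"},
]

removed = forget(links, "jane@example.com")  # removes the links for "a1" and "b2"
```

After the call, all three viewing events are still available for historical analysis, but the two that belonged to the forgotten end user can no longer be resolved to a person.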

End-user data

A large amount of data can be generated when an end user consumes media (Table 1 below). Some of the data can be legitimately gathered without requiring consent. The personal identifier, as explained before, must either be anonymized or have a pseudonym generated. When data is gathered without a clear legitimate reason, consent from the end user is required. For example, collecting information using Automatic Content Recognition or sharing data that includes personal identifiers with external parties requires consent.

Table 1. End-user data

Service provider data storage strategy

The only reason for anonymizing all private data is that there is no personal data to associate it with, for example when end users watch free public broadcasters or use a service that does not require a login (Table 2). Anonymization is performed at the session level, and end users should not be tracked across sessions on the same platform, for example by using cookies. Note that anonymized data cannot be forgotten on request, because it can no longer be linked to the end user who makes the request.

Table 2. Service provider data storage strategy

If end users log in with the same ID, pseudonymization should be used. Using pseudonymization allows you to perform historical analysis of all data. At any point, an end user can request that their data be forgotten without affecting the historical analysis.
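The choice between the two strategies can be expressed as a small decision rule. The sketch below is an assumption-laden illustration: the names Strategy, choose_strategy, and session_identifier are hypothetical, and the only rule encoded is the one stated in the text (no login means anonymize, login means pseudonymize).

```python
import secrets
from enum import Enum


class Strategy(Enum):
    ANONYMIZE = "anonymize"
    PSEUDONYMIZE = "pseudonymize"


def choose_strategy(has_login: bool) -> Strategy:
    """No login: anonymize per session. Login: pseudonymize."""
    return Strategy.PSEUDONYMIZE if has_login else Strategy.ANONYMIZE


def session_identifier(personal_id, links: dict) -> str:
    """Return a random per-session identifier.

    A link to the personal identifier is stored only when pseudonymizing;
    an anonymized session leaves no link behind.
    """
    session_id = secrets.token_hex(16)
    if choose_strategy(personal_id is not None) is Strategy.PSEUDONYMIZE:
        links[session_id] = personal_id
    return session_id


links = {}
anonymous_id = session_identifier(None, links)            # no login
pseudonym = session_identifier("jane@example.com", links)  # logged in
```

In both cases the stored viewing data carries only a random identifier; the difference is whether a deletable link back to the end user exists.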

When to ask end users to opt in and consent to the use of their private data is not without controversy. While some GDPR authors intended to stop the use of private data without prohibiting the use of the service, this is not realistic. There is no free lunch when running a business. If a company depends on revenue from targeted advertising that uses private data, not using personal data would affect the business negatively. Another approach that has been taken is to prohibit use of the service if the end user does not agree with the conditions for use.

Consider your public image when deciding whether or not to use private data without direct consent. Be transparent about the purpose of using private data to enhance public trust that end users’ EU fundamental right of data protection is secured. Sharing private data as a commercial service would require opt-in consent by end users.

For details on setting up a pseudonymization framework, see “Data science under GDPR with pseudonymization in the data pipeline”.

Related Videonet opinion

Nordic countries not prepared for GDPR (also by Jan Lindquist)
