1. Introduction
The General Data Protection Regulation (GDPR) is a definitive and far-reaching data protection law applicable to countries in the European Union. It came into effect on May 25th, 2018.
[Figure: map with the EU member states shown in blue.]
1.1 Key points
- The law applies to the processing of personal data of individuals in the EU, whatever their citizenship.
- It applies to anything that can identify an individual, such as first names and surnames, email addresses and other types of “personally identifiable information”.
- Through its legal mechanisms, it encourages a strong sense of accountability in anyone who processes personal data.
- It applies to anyone processing the data of individuals in the EU, regardless of where the organisation doing the processing is based.
- It doesn’t just apply to traditional businesses but to anyone who processes personal data, such as bloggers.
- There are severe penalties for non-compliance - up to €20 million ($24m) or 4% of global revenue, whichever is higher.
As software developers, this has an important impact on the design of our applications. Given that the EU itself likes to think of the GDPR as “the toughest privacy and security law in the world”, we shouldn't underestimate its importance! [1]
I’ve decided to read through the law and break down all its key areas and articles that directly impact on our software development processes. I then offer some solutions for each one.
The document itself contains hundreds of pages’ worth of new requirements for organizations around the world.
Why on earth would I consciously choose to endure all of this?
Well, I took a module in EU law during my undergrad and am armed with a familiarity with its jargon. Furthermore, this article is opportune as I am building many of the same considerations into the code of this blog.
Disclaimer: this is not legal advice and should not be construed as such. I am not a lawyer; I am a developer presenting the research I have done on the subject, which might just save you a lot of worry and expense. Your circumstances may be different, and for that you need to seek out the professionals.
2. Definitions
We will quickly address some terms that regularly appear in the articles. Most of the terminology is defined in Article 4 of the regulation. Here's a rundown:
2.1 Data subject
The data subject is essentially the end user, identifiable by the personal data you are collecting directly or indirectly.
2.2 Personal data
Personal data is outlined in Article 4(1) and defined as 'any information relating to an identified or identifiable natural person (referred to as "data subject")'. [2]
The same article expands on this definition. Personal data is information that can identify an individual:
- directly from the information in question such as first and last names,
- indirectly from a combination of information such as date of birth, telephone number, license plate, membership of an association.
As you can see, it covers a broad scope. Here are some examples of personal data where they relate to natural persons (people!):
- Surname, first name, pseudonym, date of birth.
- Telephone number, home address, email.
- IP address, cookie identifier.
- Photos, sound recordings of voices.
- Fingerprint, retinal or palm scan, venous network of the hand.
- License plate number, social security number.
- Application usage data, blog comments, etc.
The GDPR considers some data particularly sensitive and requires the data subject to give informed and explicit consent before it is processed:
- Health status.
- Sexual orientation.
- Racial/ethnic origin.
- Political inclination, religious/philosophical beliefs, trade union membership.
- Genetic data.
2.3 Controller
The GDPR defines the controller as:
"The natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data". [Article 4(7)]
In other words, it is the entity, typically the organisation rather than an individual employee or department, that decides why and how personal data is processed and is responsible for complying with the GDPR.
2.4 Processor
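The processor is the natural or legal person, public authority, agency or other body that processes personal data on behalf of the controller. [Article 4(8)] In practice this is usually a third party, such as a hosting or email provider, rather than one of the controller's own employees.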
The GDPR differentiates between a controller and a processor in order to recognize that not all entities involved have the same degree of accountability.
This is so that clear roles and responsibilities are established, and both the organisation concerned and the authorities can determine where responsibility lies in the event of a breach.
2.5 Anonymisation vs Pseudonymisation
The GDPR draws an important distinction between anonymisation and pseudonymisation.
Anonymisation aims to make it irreversibly impossible to identify individuals from personal data sets.
The process must make it impossible to:
- single an individual out from a dataset,
- link two records on the same data subject or group of data subjects from the same dataset,
- infer an attribute of an individual from the values of other attributes in the dataset.
Once anonymised, the data is no longer considered 'personal data' and thus not subject to the GDPR.
Pseudonymisation aims to make it impossible to identify individuals without additional information.
The additional information must be kept separate and secure in order to avoid re-identification of data subjects. Contrary to anonymisation, pseudonymisation is a reversible process.
This can consist of replacing identifying information such as names and addresses with aliases, as well as encrypting IP addresses, user IDs and e-mail addresses to reduce their sensitivity.
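To make the idea of aliases plus separately stored "additional information" concrete, here is a minimal sketch. The `Pseudonymiser` class and its method names are my own invention for illustration, and the in-memory dictionaries stand in for what would need to be a separate, access-controlled store in a real system.

```python
import secrets


class Pseudonymiser:
    """Replaces identifiers with random aliases and keeps the mapping apart.

    The alias table is the "additional information": in a real system it
    would live in a separate, secured store, not next to the data itself.
    """

    def __init__(self):
        self._alias_by_value = {}
        self._value_by_alias = {}

    def pseudonymise(self, value):
        # Reuse an existing alias so the same person always maps to one alias.
        if value not in self._alias_by_value:
            alias = "user_" + secrets.token_hex(8)
            self._alias_by_value[value] = alias
            self._value_by_alias[alias] = value
        return self._alias_by_value[value]

    def reidentify(self, alias):
        # Only possible for someone with access to the alias table.
        return self._value_by_alias[alias]


pseudonymiser = Pseudonymiser()
record = {"email": "jane@example.com", "plan": "premium"}
safe_record = {**record, "email": pseudonymiser.pseudonymise(record["email"])}
```

Anyone holding only `safe_record` sees an alias. Re-identification needs the alias table, which is why the process counts as reversible and the data remains personal data.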
3. Key areas and their implementation
Now that we've defined some key terms, let’s get to work with the articles.
The GDPR document itself, at over 250 pages, with 99 main provisions (Articles) and 173 supplementary “recitals”, is quite the mouthful.
So, I’ve extracted the articles and areas that I consider the most meaningful. They cover these important areas:
- Consent
- Right to be forgotten
- Right to access and restrict data processing
- Minimisation
- Data portability
- Data retention
- Data security
Being mindful of these core principles and integrating them into your development should help to ensure compliance in most circumstances, although you should seek advice if you’re unsure. Large organizations will tend to have a department dealing with this stuff.
Note that you could handle many of these features with manual processes such as manual database queries. However, automating them is significantly better for scalability and efficiency.
3.1 CONSENT
Covered in:
- [Article 6] Lawfulness of processing
- [Article 7] Conditions for consent
- [Article 8] Conditions applicable to child's consent in relation to information society services
The lowdown:
- Users must be able to give consent to any sort of processing and use of their personal data.
- This applies in the following cases:
  - Direct data collection, such as from a form, an online purchase, a subscription, opening a bank account, or via devices or technologies that track the activity of individuals (geolocation, Wi-Fi analytics).
  - Indirect collection, such as data retrieved from data brokers, trading partners or publicly available sources.
- Consent must be obtained:
  - At the moment of data collection for direct collection.
  - As soon as possible, and no later than one month, for indirect collection. This also applies to the reuse of existing user data for new purposes or in the event of a data breach.
- Consent must be fully informed, concise and in plain language.
- Users have the right to easily withdraw their consent at any time.
- If the above is found to have been infringed, the perceived consent is no longer binding.
Implementation:
Obtaining consent with “I accept the terms and conditions” is no longer enough. There should be separate checkboxes or simple yes / no buttons on the registration or user profile pages for each thing you intend to use the data for.
Consent should be requested as soon as possible, hence the pop-up most sites show before they start tracking your navigation.
Consent is required to send users emails/messages/newsletters when you receive their contact details.
You should hold these separate consent values in different database columns for each user. You can offer users the possibility of withdrawing consent by unchecking checkboxes on their profile page.
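As a rough sketch of the one-column-per-purpose idea, using SQLite from the Python standard library: the table layout, the column names and the two purposes (`newsletter`, `analytics`) are assumptions made up for this example. Note the columns default to 0, so nothing is consented to until the user opts in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        consent_newsletter INTEGER NOT NULL DEFAULT 0,
        consent_analytics INTEGER NOT NULL DEFAULT 0
    )
    """
)

# Hypothetical purposes mapped to their columns. The whitelist also means
# user input is never interpolated into the SQL statement.
CONSENT_COLUMNS = {
    "newsletter": "consent_newsletter",
    "analytics": "consent_analytics",
}


def set_consent(conn, user_id, purpose, granted):
    """Grant or withdraw consent for a single purpose."""
    column = CONSENT_COLUMNS[purpose]
    conn.execute(
        f"UPDATE users SET {column} = ? WHERE id = ?", (int(granted), user_id)
    )


def has_consent(conn, user_id, purpose):
    """Check consent before running any processing tied to that purpose."""
    column = CONSENT_COLUMNS[purpose]
    row = conn.execute(
        f"SELECT {column} FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return bool(row and row[0])


conn.execute("INSERT INTO users (id, email) VALUES (1, 'jane@example.com')")
set_consent(conn, 1, "newsletter", True)
```

Unchecking a checkbox on the profile page would then simply call `set_consent(conn, user_id, purpose, False)`.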
The information you must provide when asking for consent is covered in [Articles 13 and 14]:
- Your organisation and contact details.
- The purposes and whether it is optional in light of the objectives pursued.
- The lawful basis (covered in Article 6).
- Recipients of the data – whom it will be shared with.
- The data retention period.
- The rights of the data subject (rights of access to the data, rectification, erasure and restriction, right to file a complaint).
- For indirect collection specify the categories of data collected and source of data.
Checkboxes should not be preselected. The GDPR does not count that as “consent”.
Not all processing activities need consent checkboxes. Article 6 provides other lawful bases, such as the performance of a contract or “legitimate interests”, which can cover things like collecting an address to ship an item.
Consent for features that trigger e-mails or other sorts of notification to users is required.
A request for consent must be presented in a dedicated manner, distinguishable and separate from anything else on the page, such as contractual clauses or general terms and conditions of use.
Keep a record of the exact language used when requesting consent from users, together with their responses. You need to be able to demonstrate that consent was given, and these records could be asked for if an audit is performed on your company in future.
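One way to keep such a record is an append-only log that stores the wording shown alongside each response. The sketch below assumes a hypothetical `consent_log` table and uses SQLite purely for illustration. Rows are only ever inserted, never updated, so a withdrawal shows up as a new entry rather than overwriting the original grant.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE consent_log (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        purpose TEXT NOT NULL,
        wording TEXT NOT NULL,      -- exact text the user saw
        granted INTEGER NOT NULL,   -- 1 = consent given, 0 = refused/withdrawn
        recorded_at TEXT NOT NULL   -- UTC timestamp, ISO 8601
    )
    """
)


def record_consent(conn, user_id, purpose, wording, granted):
    """Append one consent decision together with the wording presented."""
    conn.execute(
        "INSERT INTO consent_log (user_id, purpose, wording, granted, recorded_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (user_id, purpose, wording, int(granted),
         datetime.now(timezone.utc).isoformat()),
    )


def consent_history(conn, user_id):
    """Return every decision for a user, oldest first."""
    return conn.execute(
        "SELECT purpose, wording, granted FROM consent_log "
        "WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()


WORDING = "Send me the monthly newsletter by email."
record_consent(conn, 1, "newsletter", WORDING, True)
record_consent(conn, 1, "newsletter", WORDING, False)  # later withdrawal
```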
Service messages do not need consent. They include things such as account changes, update password links, billing alerts, and important security messages.
If data is repurposed, develop a mass-emailing feature that asks users to revisit their profile page and review the checkboxes for each type of processing.
If you offer an online service directly to children, you should code in an additional step/pop-up asking for the age of the user. If they are under 16 (the age differs between EU member states and can be as low as 13), develop a flow where the child provides the email address of a parent who can give consent on their behalf.
Smart kids will cheat, but you’ve done your job as far as your responsibilities are concerned. Parents also share a responsibility in safeguarding the privacy of their young ones.
3.2 RIGHT TO BE FORGOTTEN
Covered in:
- [Article 17] Right to erasure
The lowdown:
- Popularly referred to as ‘the right to be forgotten’, data subjects can request a deletion of their personal data.
- Data subjects can make a request either in writing or verbally.
- You must respond to a request within one month.
- This right does not apply in certain circumstances.
Implementation:
- Implement a method in your application which takes a userId and deletes all of that user's personal data. You may or may not want to expose this as a button on your user’s admin page, but either way, ensure customers know where to make the request.
- Where records in other tables rely on a user, you may need to delete all related records via cascades or allow for nullable foreign keys. For example, setting the userId for a purchase order to null.
- If the order is used for accounting purposes, tracking stock, etc., you may need to implement a way to remove the past event and generate an intermediate snapshot so that the totals stay consistent.
- Consider how you handle backups. You may want to have a separate table of “forgotten ids” in place and, each time you restore a backup, run a function to re-delete the data associated with those ids.
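The points above can be sketched roughly as follows. The schema (`users`, `orders`, `forgotten_ids`) and the function names are hypothetical, and SQLite is only used to keep the example runnable. In a real setup the list of forgotten ids has to be stored somewhere that is not overwritten when a backup is restored, otherwise erasures made after the backup was taken would be lost.

```python
import sqlite3


def create_schema(conn):
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES users(id),  -- nullable on purpose
            total_cents INTEGER NOT NULL
        );
        CREATE TABLE forgotten_ids (user_id INTEGER PRIMARY KEY);
        """
    )


def forget_user(conn, user_id):
    """Erase a user's personal data but keep order totals for accounting."""
    with conn:  # single transaction: everything is erased or nothing is
        conn.execute(
            "UPDATE orders SET user_id = NULL WHERE user_id = ?", (user_id,)
        )
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
        conn.execute(
            "INSERT OR IGNORE INTO forgotten_ids (user_id) VALUES (?)",
            (user_id,),
        )


def reapply_erasures(conn):
    """Run after restoring a backup: re-delete everyone already forgotten."""
    ids = [row[0] for row in conn.execute("SELECT user_id FROM forgotten_ids")]
    for user_id in ids:
        forget_user(conn, user_id)
```

Nulling the foreign key keeps the order row for stock and accounting figures while cutting its link to the person.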
3.3 RIGHT TO ACCESS AND RESTRICT DATA PROCESSING
Covered in:
- [Article 5] Principles relating to processing of personal data
- [Article 15] Right of access by the data subject
- [Article 16] Right to rectification
- [Article 18] Right to restriction of processing
- [Article 19] Notification obligation regarding rectification or erasure of personal data or restriction of processing
The lowdown:
- Individuals must have easy access to their personal data.
- Individuals have the right to request that their personal data be restricted or deleted.
- Data restriction means that an organization must stop using an individual’s personal data, although it can continue storing it.
- The controller must communicate any rectification, erasure or restriction of personal data to each third party to whom the data has been disclosed, unless this proves impossible or involves disproportionate effort. If the data subject requests it, the controller must also tell them who those recipients are. [Article 19]
- Data subjects can make requests either in writing or verbally.
- You must respond to a request within one month.
- Handling these requests must be free of charge for the individual.
Implementation:
- Data could be accessible in the user’s profile page.
- Users must be able to fix any inaccurate or incomplete data about them.
- Corrections can be made via a manual support process, but to avoid the effort and expense allow users a means to edit their data via UI form.
- You could implement a dropdown for users to restrict or consent to processing. This could be linked to an integer database column reflecting the selected option. Follow up with if-clauses as required throughout your application to restrict processing as necessary.
- A delete button could be implemented in a similar way. This comes down to how much direct control you want to offer users, provided they’re aware of how they can exercise their rights.
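As an illustration of the integer column plus if-clause approach, here is a small sketch. The `ProcessingStatus` values and the `processing_status` field are names I have made up. The point is only that restricted users stay in storage but are filtered out before any processing happens.

```python
from enum import IntEnum


class ProcessingStatus(IntEnum):
    """Values stored in a hypothetical integer column on the user record."""
    ALLOWED = 0
    RESTRICTED = 1  # keep storing the data, but stop using it


def users_to_email(users):
    """Return only the users whose data may currently be processed."""
    return [
        user for user in users
        if user["processing_status"] == ProcessingStatus.ALLOWED
    ]


users = [
    {"id": 1, "email": "jane@example.com",
     "processing_status": ProcessingStatus.ALLOWED},
    {"id": 2, "email": "sam@example.com",
     "processing_status": ProcessingStatus.RESTRICTED},
]
recipients = users_to_email(users)
```

Because `IntEnum` members compare equal to plain integers, the same check works on raw values read straight from the database.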
A good example of the right to access implemented in the real world is Google Maps, which shows your location history: all the places you've recently been to.
- If an erasure request is made, you must ensure that the data is also deleted on third-party APIs linked to your application, such as Twitter or Salesforce. [Article 19] They usually provide a means on their APIs to delete such personal data.
- Ensure you also remove public profile pages containing personal data from your website so that they do not appear on search engines like Google. Returning a 404 HTTP status on the page tells the Google crawler to remove it from their index.
3.4 MINIMISATION
Covered in:
- [Article 5] Principles relating to processing of personal data
At a glance:
- Data should be collected for specific, explicit and legitimate purposes.
Implementation:
- Figure out the type of data you need to store and only save those values for your use case. Collect the bare minimum required and document it. You probably do not need to know gender if you’re seeking consent to send users newsletters, so don’t collect it.
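One simple way to enforce this in code is to whitelist the fields each purpose is allowed to store and drop everything else before it reaches the database. The purposes and field names below are assumptions for the sake of the example.

```python
# Hypothetical purposes and the only fields each one is allowed to keep.
ALLOWED_FIELDS = {
    "newsletter_signup": {"email"},
    "shipping": {"name", "address", "postcode"},
}


def minimise(purpose, submitted):
    """Keep only the fields documented as necessary for this purpose."""
    allowed = ALLOWED_FIELDS[purpose]
    return {key: value for key, value in submitted.items() if key in allowed}


form_data = {"email": "jane@example.com", "gender": "f", "birthday": "1990-01-01"}
to_store = minimise("newsletter_signup", form_data)
```

The `ALLOWED_FIELDS` mapping doubles as documentation of what you collect and why.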
3.5 DATA PORTABILITY
Covered in:
- [Article 20] Right to data portability
At a glance:
- Users have the right to obtain and reuse their personal data for their own purposes.
- Their personal data should be easily accessible for moving, copying or transfer from one IT environment to another in a safe, secure and working manner.
- This allows individuals to take advantage of this data by pushing it to services that could, for example, help them understand their spending habits, or expand on DNA testing if you possess their genetic data.
- The right applies only to the information which the controller has received from a data subject.
Implementation:
- On a user’s admin page, code in an “export data” button. The user should receive all the data you hold about them when clicking on it.
- That data should include all the data you would delete in Article 17’s right to be forgotten but may include additional data such as a user’s order history.
- The structure of the dump is not defined in the law but most organisations would opt for either JSON or XML, conforming to the standards of the definitions found on schema.org as far as possible. If the data is not overly complex, an export of CSV / XLS would be fine too.
- Exporting data can take time, especially if there is a lot of it. The ‘export’ button could trigger a background process and notify the user via email when it’s ready.
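A bare-bones version of what the "export data" button might call is sketched below. The `users` and `orders` tables are hypothetical, and the export runs synchronously here. In a real application you would hand it to whatever background job system you already use and email the user when the file is ready.

```python
import json
import sqlite3


def rows_as_dicts(cursor):
    """Turn a cursor's rows into dictionaries keyed by column name."""
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def export_user_data(conn, user_id):
    """Collect everything held about one user into a JSON document."""
    profile = rows_as_dicts(
        conn.execute("SELECT id, name, email FROM users WHERE id = ?", (user_id,))
    )
    orders = rows_as_dicts(
        conn.execute(
            "SELECT id, total_cents FROM orders WHERE user_id = ? ORDER BY id",
            (user_id,),
        )
    )
    export = {"profile": profile[0] if profile else None, "orders": orders}
    return json.dumps(export, indent=2)


# Demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total_cents INTEGER);
    INSERT INTO users VALUES (1, 'Jane', 'jane@example.com');
    INSERT INTO orders VALUES (10, 1, 2500), (11, 1, 990);
    """
)
print(export_user_data(conn, 1))
```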
3.6 DATA RETENTION
Covered in:
- [Article 5] Principles relating to processing of personal data
At a glance:
- If you have collected data for a specific and temporary reason (e.g. shipping a product), it must be deleted / anonymised as soon as it is not needed.
Implementation:
- Schedule a task/cron to periodically go through and anonymise data, but only once certain conditions are met, e.g. the product is confirmed as delivered. You could store the deadline for deletion in a db field for the product and extend it in case there's a delivery problem.
- Think carefully and treat each category of data differently, for example auto-deleting old browsing history if you only use the previous month to drive your recommendations system.
- Where it is not possible to delete data, it must be anonymised or pseudonymised until it isn't possible to identify an individual even when combined with other data points.
- If you archive this data, don't keep it in the active databases with a flag simply marking it as 'archived'.
- Only a specific service responsible for accessing and removing data should have access to the archive (intermediate archive).
- Archived data should have specific access modes as the use of an archive must be on an ad hoc and exceptional basis.
- Don't solely rely on automatic purge systems but also introduce manual reviews of stored data into your internal processes.
- Introducing solid retention and data management policies is all-round good practice as it ensures that the data stored about users remains relevant and saves on storage costs.
- Log automatic deletion runs so that the logs can be kept as proof of the deletion action.
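Here is a minimal sketch of such a scheduled job, assuming a hypothetical `shipments` table with a `delivered` flag and a `delete_after` deadline stored as an ISO date. It blanks the personal fields once both conditions are met and logs how many rows were touched. How you schedule it (cron, a task queue, etc.) is left open.

```python
import logging
import sqlite3
from datetime import date

logger = logging.getLogger("retention")


def create_shipments_table(conn):
    conn.execute(
        """
        CREATE TABLE shipments (
            id INTEGER PRIMARY KEY,
            recipient_name TEXT,
            address TEXT,
            delivered INTEGER NOT NULL DEFAULT 0,
            delete_after TEXT NOT NULL  -- ISO date, extend it if delivery fails
        )
        """
    )


def anonymise_expired_shipments(conn, today):
    """Blank personal fields on delivered shipments whose deadline has passed."""
    with conn:
        cursor = conn.execute(
            """
            UPDATE shipments
               SET recipient_name = NULL, address = NULL
             WHERE delivered = 1
               AND delete_after <= ?
               AND (recipient_name IS NOT NULL OR address IS NOT NULL)
            """,
            (today.isoformat(),),
        )
    # The log line is what you keep as evidence that the purge ran.
    logger.info("anonymised %d shipment(s) on %s", cursor.rowcount, today.isoformat())
    return cursor.rowcount
```

Calling `anonymise_expired_shipments(conn, date.today())` from the scheduler is enough. Undelivered shipments are skipped until their status changes.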
3.7 DATA SECURITY
Covered in:
- [Article 30] Records of processing activities
- [Article 32] Security of processing
At a glance:
- It’s important that the organization processing personal data adopts good technical practices that protect the stored personal data in their care.
Implementation:
- Enforce authentication before any access to personal data.
- Use SSH keys and adopt state-of-the-art cryptography: strong algorithms and key lengths, private keys protected with a passphrase, and key rotation.
- Restrict communication ports to only those necessary for the proper functioning of installed applications and block other ports with your firewall.
- Encrypt data in transit between your application layer and database (or your message queue, etc.) over TLS.
  - Your certificates could be self-signed (and possibly pinned) or you could use an internal CA.
- Some databases require gossiping among the nodes which should also be set up to use encryption.
- Encrypt your backups.
- Encrypt your data at rest.
- Do not store passwords in clear text. Hash them using a proven library like bcrypt.
- If you want to use production data for test/staging servers, either build a dummy data set or change every sensitive field to a "pseudonym". In theory you could use a 'pseudonymising' method built on a salted hash, bcrypt or PBKDF2. Some databases have these features built in, e.g. Oracle.
- In order to conform with Article 30(1), consider logging all access attempts to your data, so you know who has accessed what and for what reason. This isn't specifically mentioned in the article but is one way to address its emphasis on accountability.
- Consider the risks associated with any development tools, including risks related to SaaS (Software as a Service) and collaborative cloud tools (such as Slack, Trello, GitHub, etc.). Make sure you keep secrets and passwords out of your folder with source code!
- If you use cookies for authentication: 1. force the use of HTTPS via HSTS, 2. use the Secure flag, 3. use the HttpOnly flag.
- Address security in your documentation to ensure consistent practices over time.
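To illustrate the password point from the list above, here is a sketch of salted password hashing. I've used PBKDF2 from Python's standard library instead of bcrypt so that the example runs without installing anything. The iteration count and the `iterations$salt$digest` storage format are assumptions for this example, so check current guidance before settling on values.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumption: tune to your hardware and current guidance


def hash_password(password):
    """Return a string holding the iteration count, a random salt and the hash."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)


stored_hash = hash_password("correct horse battery staple")
```

Only `stored_hash` goes into the database. The clear-text password is never saved, and two users with the same password end up with different hashes because each gets its own salt.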
4. Final notes
Although developers maintaining mature systems that predate the GDPR weren't as lucky, building GDPR compliance into new applications proves significantly easier than shoehorning it in retroactively.
Many of the considerations presented above are generally good practice anyway. Many developers and software designers have already upped their game and incorporated privacy by design in their work, in the same way they think about accessibility, performance and security.
It's not quite as daunting as it looks, and it shouldn't be. It's simply the way in which we must now develop.
As well as being mindful of these principles during the design and testing stages of your app, I also suggest drawing up a checklist and regularly auditing their implementation with your colleagues/dev teams.
Communicating openly and honestly about how you use data to improve user experience on your platform demonstrates that you value your users' data, which can only have a positive impact on your credibility and on trust in your organisation, especially in a world where most reputable organisations are already on board.
Robust and well-maintained data protection policies will improve confidence in your analytics, enhance customer experience and help to minimise cyber-security flaws. Provided you're honest and don't spam your users with disingenuous marketing material, they'll be happy to grant you consent.