The need for information governance and data classification to comply with GDPR

As the new General Data Protection Regulation (GDPR) approaches in May 2018, companies that are based in Europe or that hold the personal data of people living in Europe are struggling to locate the most valuable asset in their organization: their sensitive data.

The new regulation requires organizations to prevent data breaches involving personally identifiable information (PII) and to delete a person’s data when that person requests it. Once the PII has been deleted, the company will need to prove to the person and to the authorities that the data has been fully deleted.

Today, most companies understand that they have an obligation to demonstrate accountability and compliance, and have therefore begun preparing for the new regulation.

There is so much information available on how to protect sensitive data that people can become overwhelmed and start working in different directions, hoping that one of them hits the target. If you plan your data governance ahead of time, you can still meet the deadline and avoid penalties.

Some organizations, primarily banks, insurance companies, and manufacturers, hold vast amounts of data because they generate it at a fast rate by changing, saving, and sharing files, creating terabytes or even petabytes of data. The difficulty for such companies lies in finding their sensitive data among millions of files of structured and unstructured data, which is unfortunately an impossible task to do by hand in most cases.

The following personally identifiable data is classified as PII according to the definitions used by the National Institute of Standards and Technology (NIST):

o Full name

o Home address

o Email address

o National identification number

o Passport number

o IP address (when linked to other data; not considered PII by itself in the US)

o Vehicle license plate number

o Driver’s license number

o Human face, fingerprint or handwriting

o Credit card number

o Digital identity

o Date of birth

o Place of birth

o Genetic information

o Phone number

o Login name, screen name, nickname or handle

Most organizations that hold PII of European citizens will need to detect and prevent PII data breaches and to delete PII from company data on request (often referred to as the right to be forgotten). Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016, published in the Official Journal of the European Union, states:

“The supervisory authority should monitor the application of the provisions under this Regulation and promote its consistent application across the EU in order to protect natural persons in relation to the processing of their personal data and to promote the free flow of personal data within the internal market.”

For companies that hold PII of European citizens to facilitate the free flow of PII within the European market, they need to be able to identify their data and classify it according to the sensitivity levels defined in their organizational policies.

The Regulation describes the challenges of data flows and the market as follows:

“Rapid technological development and globalization have created new challenges for the protection of personal data. The scale at which personal data is collected and shared has increased significantly. Technology has enabled private companies and public institutions to use personal data on an unprecedented scale in order to conduct their activities. Natural persons are increasingly making personal information publicly available on a global scale. Technology has transformed economic and social life and should further facilitate the free flow of personal data within the Union and transfers to third countries and international organizations, while ensuring a high level of protection of personal data.”

Phase 1 – Data Detection

The first step is therefore to create data lineage, which shows where PII is located across the organization and helps decision makers detect specific types of data. The EU proposes automated technology that can process large amounts of data through automatic scanning. No matter how big your team is, this is not a project that can be handled manually when millions of files of different types are spread across different areas: in the cloud, in storage, and on local desktops.

The main concern for these types of organizations is that if they fail to prevent data breaches, they will not be compliant with the GDPR and could face heavy fines.

These organizations need to appoint specific staff to be responsible for the entire process, such as a Data Protection Officer (DPO), who primarily deals with technology solutions; a Chief Information Governance Officer (CIGO), usually a lawyer responsible for compliance; and/or a Compliance Risk Officer (CRO). This person needs to be able to control the entire process from start to finish and to provide management and the authorities with full transparency.

“The controller shall, in particular, take into account the nature of the personal data, the purpose and duration of the proposed processing operations, and the circumstances in the country of origin, third countries and the country of final destination, and shall provide appropriate safeguards to protect the fundamental rights and freedoms of natural persons in relation to the processing of their personal data.”

PII can be found in all types of files: not only PDF and text documents, but also image documents such as scanned checks, CAD/CAM files containing product IP, confidential sketches, code or binary files, and so on. Common techniques today can extract data from files, making data hidden in text easy to find. In some organizations (such as manufacturers), however, most of the sensitive data may sit in image files. These file types cannot be scanned accurately with text-based techniques, and without the right technology to detect PII in file formats other than text, one can easily miss this important information and cause significant damage to the organization.
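For the text-based part of detection, a minimal sketch in Python might look like the following. It is an illustration under stated assumptions, not a complete detector: the patterns, the `find_pii` and `luhn_valid` names, and the choice of three PII types are all hypothetical, and image or binary files would need other techniques (such as OCR) that are not shown here.

```python
import re

# Illustrative patterns only (assumption): real detectors need
# locale-aware rules and far more PII types than these three.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}


def luhn_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (pii_type, matched_value) pairs found in plain text."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group().strip(" -")
            # Long digit runs are only reported if they pass the checksum.
            if label == "credit_card" and not luhn_valid(value):
                continue
            findings.append((label, value))
    return findings


sample = "Contact jane@example.com from 192.168.0.1, card 4111 1111 1111 1111."
for pii_type, value in find_pii(sample):
    print(pii_type, value)
```

The checksum step is there to cut false positives: many long digit sequences in business files are order numbers or IDs rather than card numbers.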

Phase 2 – Data Classification

This phase consists of behind-the-scenes data mining operations performed by automated systems. The DPO/controller or information security decision maker needs to decide whether to track specific data, block it, or send a data breach alert. To perform these operations, they need to view the data in separate categories.

Classifying structured and unstructured data requires comprehensive identification of the data while maintaining scalability – efficiently scanning all databases without “boiling the ocean”.

DPOs also need to maintain data visibility across multiple sources and quickly present all documents related to a person based on a specific entity (e.g., name, date of birth, credit card number, social security number, phone number, email address).
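One way to support this kind of lookup is an inverted index from detected entities to the documents that contain them. The sketch below assumes a scanner has already produced a list of (entity type, value) findings per document; the function names and the sample data are hypothetical.

```python
from collections import defaultdict


def build_entity_index(doc_findings):
    """Map each (entity_type, value) pair to the set of documents containing it.

    doc_findings: dict of document id -> list of (entity_type, value) tuples,
    assumed to come from an earlier detection scan.
    """
    index = defaultdict(set)
    for doc_id, findings in doc_findings.items():
        for entity_type, value in findings:
            # Lower-case values so "Jane@Example.com" and "jane@example.com" match.
            index[(entity_type, value.lower())].add(doc_id)
    return index


def documents_for(index, entity_type, value):
    """Return a sorted list of documents that mention the given entity."""
    return sorted(index.get((entity_type, value.lower()), set()))


scan_results = {
    "contracts/2017-041.pdf": [("email", "jane@example.com"), ("name", "Jane Doe")],
    "hr/payroll.xlsx": [("email", "Jane@Example.com")],
    "sales/leads.csv": [("email", "bob@example.com")],
}
index = build_entity_index(scan_results)
print(documents_for(index, "email", "jane@example.com"))
```

With such an index in place, a request to present or erase one person's data becomes a lookup rather than a fresh scan of every source.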

In the event of a data breach, the DPO shall report directly to the top management of the controller or processor, or to the Information Security Officer responsible for reporting the breach to the relevant authorities.

Article 33 of the GDPR requires the breach to be reported to the supervisory authority without undue delay and, where feasible, within 72 hours of the controller becoming aware of it.
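The 72-hour window is simple to compute, and an incident tool can track it automatically. This is a small sketch; the function names are hypothetical, and it assumes the clock starts at the moment the controller becomes aware of the breach.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(became_aware_at: datetime) -> datetime:
    """Latest time to notify the authority, counted from awareness of the breach."""
    if became_aware_at.tzinfo is None:
        raise ValueError("use a timezone-aware timestamp")
    return became_aware_at + NOTIFICATION_WINDOW


def hours_remaining(became_aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline (negative once it has passed)."""
    return (notification_deadline(became_aware_at) - now).total_seconds() / 3600


aware = datetime(2018, 5, 25, 9, 0, tzinfo=timezone.utc)
print(notification_deadline(aware).isoformat())
print(hours_remaining(aware, aware + timedelta(hours=24)))
```

Timezone-aware timestamps are required here because an organization operating across several countries would otherwise risk miscounting the window.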

Once the DPO has identified the data, the next step should be to tag the files according to the sensitivity level defined by the organization.

As part of meeting regulatory compliance, organizational files need to be accurately tagged so that they can be tracked internally, even when shared outside the organization.
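Tagging by sensitivity level can be sketched as a mapping from detected entity types to levels, where a file takes the highest level of anything found in it. The level names and the mapping below are assumptions for illustration; each organization defines its own policy.

```python
# Hypothetical sensitivity levels, ordered from least to most sensitive.
LEVELS = ["public", "internal", "confidential", "restricted"]

# Hypothetical policy: sensitivity level assigned to each PII type.
ENTITY_LEVEL = {
    "email": "internal",
    "phone": "internal",
    "date_of_birth": "confidential",
    "passport_number": "restricted",
    "credit_card": "restricted",
}


def classify(entity_types):
    """Return the highest sensitivity level among the detected entity types."""
    rank = 0  # a file with no findings stays "public"
    for entity_type in entity_types:
        # Unknown PII types are treated as "internal" rather than ignored.
        level = ENTITY_LEVEL.get(entity_type, "internal")
        rank = max(rank, LEVELS.index(level))
    return LEVELS[rank]


print(classify([]))
print(classify(["email", "phone"]))
print(classify(["email", "credit_card"]))
```

Taking the maximum is the conservative choice: a single card number in an otherwise harmless file is enough to make the whole file restricted.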

Phase 3 – Knowledge

Once the data is tagged, you can map structured and unstructured personal information across networks and systems and track it easily. This enables organizations to protect their sensitive data and lets their end users consume and share files safely, which improves data loss prevention.

Another aspect to consider is protecting sensitive information from insider threats—employees who try to steal sensitive data (such as credit cards, contact lists, etc.) or manipulate data for some benefit. These types of behaviors are difficult to detect in a timely manner without automated tracking.

These time-consuming tasks affect most organizations, driving them to look for effective ways to gain insights from their enterprise data so that they can base their decisions on them.

The ability to analyze patterns within data helps organizations better understand their enterprise data and pinpoint specific threats.

Integrated encryption technology enables the controller to track and monitor data efficiently. By implementing an internal physical isolation system, the controller can create data geo-fences through individual data isolation definitions across geographies and domains, and report sharing violations as soon as a rule is broken. Using this combination of technologies, the controller can enable employees to send messages securely within the organization, between the right departments, and outside the organization without being overly blocked.
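A geo-fence rule of this kind can be reduced to a check of a file's sensitivity label against the regions it may be shared to. The sketch below is an assumption-laden illustration: the labels, regions, and the `check_share` function are hypothetical, and a real system would enforce the rule at the point of sharing rather than only report it.

```python
# Hypothetical policy: regions each sensitivity label may be shared to.
# None means the label carries no geographic restriction.
ALLOWED_REGIONS = {
    "public": None,
    "internal": {"EU", "US"},
    "restricted": {"EU"},
}


def check_share(label, destination_region):
    """Return None if the share is allowed, otherwise a violation message."""
    if label not in ALLOWED_REGIONS:
        # Fail closed: files with an unrecognized label are not shared.
        return f"unknown label '{label}': share blocked by default"
    allowed = ALLOWED_REGIONS[label]
    if allowed is None or destination_region in allowed:
        return None
    return f"'{label}' data may not be shared to {destination_region}"


print(check_share("internal", "US"))
print(check_share("restricted", "US"))
```

Returning a message instead of raising an error keeps legitimate sharing unblocked while still giving the controller a violation to report.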

Phase 4 – Artificial Intelligence (AI)

After data is scanned, flagged, and tracked, the added value to organizations is the ability to automatically screen for anomalous behavior around sensitive data and trigger protections that prevent these incidents from turning into data breaches. This advanced technology is known as “artificial intelligence” (AI). The AI capabilities here typically consist of powerful pattern recognition components and learning mechanisms that enable the machine to make these decisions, or at least to recommend a preferred course of action to the data protection officer. This intelligence is measured by its ability to get smarter with each scan and with each change in user input or data mapping. Ultimately, AI capabilities build an organization’s digital footprint as an essential layer between raw data and the business flows around data protection, compliance, and data governance.
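As a much simpler stand-in for the learning systems described above, anomalous access to sensitive files can be screened with a basic statistical test. The sketch below flags a user whose daily access count is far above their own history, using a z-score. It is not a machine learning model, and the function name, threshold, and sample numbers are assumptions.

```python
from statistics import mean, pstdev


def is_anomalous(history, today, threshold=3.0):
    """Flag today's count if it is more than `threshold` standard
    deviations above the mean of the user's past daily counts."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly constant history: any increase is unusual.
        return today > mu
    return (today - mu) / sigma > threshold


# Sensitive files opened per day by one user over the past week.
past_week = [10, 12, 9, 11, 10, 13, 11]
print(is_anomalous(past_week, 12))
print(is_anomalous(past_week, 60))
```

A flag from a check like this would feed the recommendation step: the system proposes an action, and the data protection officer decides whether to block, alert, or ignore.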
