Manual Personal Data Audits are not Scalable

Many companies build their data policies around manual audits and try to comply with GDPR the same way. A manual data audit usually consists of surveying staff to find out where they store files and what types of data those files contain. It relies on people to remember, and accurately report, where they have stored data. For structured databases, a manual audit often consists of a lucky staff member literally sitting in front of a terminal, browsing a database explorer. It is mind-numbing and ineffective, and it only provides a one-time snapshot while data changes by the second.

It is critical that certain classes of data, Personal Data in particular, are handled with greater care.

  • Your customers expect you to process their data in accordance with the policies that were explained to them when you received their data. 
  • GDPR (and other current and upcoming regulations) now requires you to keep this promise and build "Privacy by Design" into your processes to ensure that you are correctly processing Personal Data. 
  • You need to know where you have Personal Data in your organization (identification of Personal Data), why it is there (legal basis for processing), and, if it is not supposed to be there, remediate it (delete the data).
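The last point above (where Personal Data is, why it is there, and what to remediate) can be thought of as a simple inventory. The sketch below is a minimal illustration of that idea, not how the Data X-Ray stores its findings: the `Finding` record, its fields, and the sample paths are hypothetical. Any finding with no recorded legal basis is flagged for remediation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """One location where Personal Data was identified (hypothetical record shape)."""
    location: str               # where the data lives, e.g. a file path or table.column
    data_class: str             # what kind of Personal Data was identified
    legal_basis: Optional[str]  # why it is there; None if no basis has been recorded


def needs_remediation(findings: list[Finding]) -> list[Finding]:
    """Return findings with no recorded legal basis, i.e. candidates for deletion."""
    return [f for f in findings if f.legal_basis is None]


inventory = [
    Finding("hr/payroll.xlsx", "national_id", "employment contract"),
    Finding("marketing/export_copy.csv", "email_address", None),
]

for f in needs_remediation(inventory):
    print(f"Remediate: {f.location} ({f.data_class})")
```

Here only the stray marketing export is flagged, because nobody has recorded why it exists.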

The problem with how modern business processes use IT systems is that it is very easy to copy and paste files, dump data from one system into a spreadsheet, or add new fields and columns to structured databases. Data changes. All. The. Time. As a result, Personal Data builds up in places in your organization where it should not be.

Setting up the Data X-Ray to monitor and audit the Personal Data processed within your organization on an ongoing basis is easy. The immediate benefit is that you can audit data more easily and consistently; the long-term benefit is better data hygiene, which lets you find and leverage data in your organization that you were not using before. This article shows you how to get started in just a few steps.

Determine the Scope of Datasources Included in the Auditing Process

The first step in setting up a Personal Data audit is deciding what you are going to audit. We normally recommend starting with a small scope (such as a couple of Windows shared drives or Google Drives) to make sure your audit produces satisfactory results. You can then expand the audit to other datasources you want to include.
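Before committing to a scope, it can help to gauge how much data it covers. The following is a minimal sketch using only the Python standard library to count files and bytes under each root of a deliberately small scope; it is not part of the Data X-Ray, and the mount paths are hypothetical placeholders for wherever your shared drives are mounted.

```python
import os
from pathlib import Path


def summarize_scope(roots: list[str]) -> dict[str, tuple[int, int]]:
    """Count files and total bytes under each root directory in the audit scope."""
    summary = {}
    for root in roots:
        files = 0
        total_bytes = 0
        # os.walk silently skips roots it cannot read, so an unmounted drive reports (0, 0)
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                try:
                    total_bytes += (Path(dirpath) / name).stat().st_size
                    files += 1
                except OSError:
                    continue  # file vanished or is unreadable; skip it
        summary[root] = (files, total_bytes)
    return summary


# Hypothetical mount points for two shared drives in an initial, deliberately small scope
initial_scope = ["/mnt/shared/finance", "/mnt/shared/hr"]
for root, (files, size) in summarize_scope(initial_scope).items():
    print(f"{root}: {files} files, {size} bytes")
```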

Build a Classifier that Represents Personal Data in Your Organization

This step requires a background knowledge of how the Data X-Ray builds models for data classification. If you have not done so yet, we recommend checking out our introductory articles on Classifiers and Classes before continuing.

The default Personal Data Classifier that comes with the Data X-Ray out of the box gets many of our customers a fairly long way towards a basic Personal Data audit on the first try. However, when auditing large amounts of data over time, you should consider how Personal Data is represented in the specific datasources you audit in order to get better, higher-resolution results. This considerably reduces false positives and saves you and your staff even more time in the long run.

For instance, when auditing your organization's marketing division, a Classifier focused on the Classes likely to appear in a marketing datasource ecosystem, such as IP addresses, demographic cohorts, and postal addresses, would give you the highest-resolution results. Likewise, when auditing a human resources division, a Classifier focused on Classes such as national ID numbers, postal addresses, and employee ID numbers would make sense. You can, and should, build multiple slightly tweaked Classifiers depending on the datasource scope you choose.
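As a rough illustration of scope-specific Classifiers, the sketch below treats each Class as a regular expression and each Classifier as a named subset of Classes. This is not how the Data X-Ray defines Classifiers (you configure those in the product, and they can also use AI Classes and dictionaries). The Class names are hypothetical, the UK postcode and National Insurance patterns are simplified and will miss or over-match some real values, and the `EMP-` employee ID format is invented for the example.

```python
import re

# Each Class is a named pattern; these patterns are simplified illustrations.
CLASSES = {
    "ip_address": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}"
        r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\b"
    ),
    "uk_postcode": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b"),
    "uk_national_insurance": re.compile(
        r"\b[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z]\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b"
    ),
    "employee_id": re.compile(r"\bEMP-\d{6}\b"),  # hypothetical in-house format
}

# Scope-specific Classifiers: each selects only the Classes likely in that division's data.
CLASSIFIERS = {
    "marketing": ["ip_address", "uk_postcode"],
    "hr": ["uk_national_insurance", "uk_postcode", "employee_id"],
}


def classify(text: str, classifier: str) -> dict[str, list[str]]:
    """Return the matches for each Class in the chosen Classifier that hits on the text."""
    hits = {}
    for class_name in CLASSIFIERS[classifier]:
        found = CLASSES[class_name].findall(text)
        if found:
            hits[class_name] = found
    return hits


record = "Jane Doe, NI AB 12 34 56 C, staff no. EMP-004211, SW1A 1AA"
print(classify(record, "hr"))
```

Running the HR Classifier on a marketing log line would simply find nothing, which is the point: a narrower Classifier spends no effort, and raises no false positives, on Classes that do not belong in that scope.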

When building a Classifier for Personal Data auditing, we recommend first testing it on a couple of known datasets and checking whether the results are acceptable. If they are on the first try, great! If not, you can easily refine the Classifier by adding training data to an existing Class, building new AI Classes, adding dictionaries, or building new regular expressions.
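Testing on known datasets amounts to comparing the Classifier's output with data whose contents you already know. The sketch below shows that loop with one regex Class and one dictionary Class, reporting precision (how many flagged items really contain Personal Data) and recall (how much of the real Personal Data was flagged). It is a generic illustration, not the Data X-Ray's evaluation; the name list, email pattern, and labeled samples are all hypothetical.

```python
import re

# Regex Class: a deliberately simple email pattern (an illustration, not RFC 5322).
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

# Dictionary Class: a hypothetical curated list of first names, matched per word.
FIRST_NAMES = {"alice", "john", "mohammed", "priya"}


def contains_personal_data(text: str) -> bool:
    """Flag text if either the regex Class or the dictionary Class matches."""
    if EMAIL.search(text):
        return True
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in FIRST_NAMES for word in words)


# A small known dataset: (text, whether it really contains Personal Data).
labeled = [
    ("Contact alice@example.com for access", True),
    ("Meeting notes from Priya", True),
    ("Q3 revenue grew 4%", False),
    ("John Deere tractor parts list", False),  # a brand name, not a person
    ("Invoice for customer 8812", False),
]


def evaluate(samples: list[tuple[str, bool]]) -> tuple[float, float]:
    """Return (precision, recall) of the classifier on labeled samples."""
    tp = fp = fn = 0
    for text, truth in samples:
        predicted = contains_personal_data(text)
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


precision, recall = evaluate(labeled)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=1.00
```

The "John Deere" false positive is exactly the kind of result a known dataset surfaces; the fix would be to refine the dictionary or the Class rather than accept the noise across every scan.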

Schedule Periodic Personal Data Audits

The Data X-Ray makes it very easy to set up periodic audits so that you can make sure your employees manage Personal Data properly over time. If you open any datasource, you will see a "Scan Scheduler" box where you can choose how often to audit that datasource.
(Soon you will be able to manage scanning schedules for all of your datasources from the front page.)

For your most sensitive datasources, you might want to audit once a day; for less sensitive datasources, once a month may be sufficient. Tune the frequency not only to reduce your scanning costs but also to limit the follow-up workload when something is found. For some business processes, a monthly or quarterly roundup of Personal Data management issues may be acceptable; for others, it is critical to identify and correct such issues immediately.
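The Scan Scheduler handles this inside the product. The sketch below only makes the trade-off concrete by mapping sensitivity tiers to scan intervals and working out which datasources are due on a given day; the tier names, intervals, and datasource names are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical sensitivity tiers mapped to audit intervals.
SCAN_INTERVALS = {
    "critical": timedelta(days=1),   # daily, for the most sensitive datasources
    "standard": timedelta(days=30),  # roughly monthly
    "low": timedelta(days=90),       # roughly quarterly
}


def next_scan(last_scan: date, sensitivity: str) -> date:
    """Date on which a datasource with this sensitivity is next due."""
    return last_scan + SCAN_INTERVALS[sensitivity]


def due_for_scan(datasources: dict[str, tuple[str, date]], today: date) -> list[str]:
    """Names of datasources whose next scan date is today or earlier."""
    return [
        name
        for name, (sensitivity, last_scan) in datasources.items()
        if next_scan(last_scan, sensitivity) <= today
    ]


datasources = {
    "hr-shared-drive": ("critical", date(2024, 3, 1)),
    "marketing-drive": ("standard", date(2024, 2, 15)),
    "archive-drive": ("low", date(2024, 1, 10)),
}
print(due_for_scan(datasources, date(2024, 3, 2)))  # ['hr-shared-drive']
```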

Notifications and Viewing Audit Reports

Every time the Data X-Ray scans your datasources, you will receive an email report with the results and some high-level statistics about how each datasource has changed since the previous scan.
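Comparing a scan with the previous one comes down to a set difference over findings keyed by location. The sketch below illustrates that idea only; the Data X-Ray computes its own statistics, and the data shape (location mapped to the set of Classes found there) and file names are hypothetical.

```python
def diff_scans(previous: dict[str, set[str]], current: dict[str, set[str]]) -> dict[str, list[str]]:
    """Summarize how Personal Data findings changed between two scans."""
    new = sorted(current.keys() - previous.keys())         # locations flagged for the first time
    resolved = sorted(previous.keys() - current.keys())    # locations no longer flagged
    changed = sorted(                                      # same location, different Classes
        loc for loc in current.keys() & previous.keys() if current[loc] != previous[loc]
    )
    return {"new": new, "resolved": resolved, "changed": changed}


previous = {
    "finance/q1.xlsx": {"iban"},
    "hr/old_staff_list.csv": {"national_id", "postal_address"},
}
current = {
    "finance/q1.xlsx": {"iban", "email_address"},
    "marketing/leads_dump.csv": {"email_address", "ip_address"},
}
for key, locations in diff_scans(previous, current).items():
    print(f"{key}: {locations}")
```

The "new" list is usually where follow-up starts: it points to places where Personal Data has recently appeared.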

You can then drill down into that datasource to see what has changed and what you might need to do about it, using the Data X-Ray's native explorer features or by exporting the results into business intelligence and visualization tools such as Tableau.
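Business intelligence tools such as Tableau can connect to plain CSV files, and a flat table with one row per finding is a convenient shape for them. The sketch below writes such a file with Python's standard `csv` module; the column names and the `findings` records are hypothetical and do not describe the Data X-Ray's own export format.

```python
import csv

# Hypothetical column layout: one row per (datasource, location, Class) finding.
FIELDS = ["datasource", "location", "data_class", "match_count", "scanned_at"]


def export_findings(findings: list[dict], path: str) -> None:
    """Write findings to a CSV file that BI tools such as Tableau can load."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(findings)


findings = [
    {"datasource": "hr-shared-drive", "location": "hr/payroll.xlsx",
     "data_class": "national_id", "match_count": 412, "scanned_at": "2024-03-02"},
    {"datasource": "marketing-drive", "location": "marketing/leads_dump.csv",
     "data_class": "email_address", "match_count": 1290, "scanned_at": "2024-03-02"},
]
export_findings(findings, "findings.csv")
```

Keeping one row per finding, rather than one column per Class, lets the BI tool filter and pivot by datasource, Class, or scan date without reshaping the data.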

If you have any questions, please feel free to contact us at any time through the Intercom button at the bottom right, through email at [email protected], or give us a ring at +44 20 8133 7236 / +1 415 800 2913 (Monday-Friday, 8:00-17:30 London time).
