The Data X-Ray's machine learning output is meant to give you actionable results, so that people in your organization can make positive changes to the business sooner.

Scoping the Problem

Most employees do not intentionally misuse or misplace data. However, the reality of modern working practices means that data often ends up where it shouldn't be. This can be as obvious as forgetting to delete an XLS file that includes sensitive data after building a presentation graph, or as obscure as adding columns to a database that should have been pre-approved by compliance and security teams. Keeping track of all of it is hard.

The main issue is that employees struggle to keep track of where the data they use ends up. They have quarterly goals, client deadlines, and families to get home to. Additionally, systems may be old and are constantly changing. Every time someone creates another file or copies and pastes data, new compliance and commercial risks are created for the organization, and people cannot manage those risks manually at scale.

In this article we will talk about how to use the Data X-Ray to identify and remediate sensitive data so that your organization has the tools and data it needs to become more secure, get a handle on its data, and ultimately drive growth.

Remediation in Three Steps

Step 1: Connect your Datasources

The first step is to connect the datasources that you want to remediate. Start by scoping the areas of your organization that are likely to be high risk. We recommend that you begin with a limited number of datasources so that you can get a quick turnaround on results, iterate on what is working, and fix what is not.

In most organizations, unstructured datasources (Word documents, PDFs, Excel files, etc.) are a large problem. Think about where these are stored in your organization. Often the answer is Windows shared drives, Google Drive, and other file storage solutions that an enterprise might use. Connect one or more of these datasources for your first scans, then go to Step 2 to set up your Classifiers.
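If you are unsure which file store to connect first, a rough count of unstructured files per location can help with scoping. The following is a minimal Python sketch that works independently of the Data X-Ray; the function name `count_unstructured_files`, the list of extensions, and the demo file names are all assumptions made for illustration.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Extensions treated as "unstructured" here are an illustrative assumption.
UNSTRUCTURED_EXTENSIONS = {".doc", ".docx", ".pdf", ".xls", ".xlsx"}


def count_unstructured_files(root):
    """Count files under root by extension, keeping only the listed types."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in UNSTRUCTURED_EXTENSIONS:
            counts[path.suffix.lower()] += 1
    return counts


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as demo_root:
    for name in ("report.PDF", "budget.xlsx", "notes.txt"):
        (Path(demo_root) / name).touch()
    demo_counts = count_unstructured_files(demo_root)

print(dict(demo_counts))
```

In practice you would point the function at a mounted shared drive or a synced folder rather than a temporary directory, and compare the counts across locations to decide where to start.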

As you begin to get better and better results in your scans, you can start to connect more datasources and have conversations within your organization to drive the change that your project requires.

Step 2: Setting Up a Classifier

By now you should have read the articles on what rules are within the Data X-Ray and on how to build customized Classifiers and Classes yourself. If you haven't done so yet, we recommend that you read through those articles before continuing.

When going about setting up a Classifier for your remediation activities, you need to think about the kind of data you want to identify. If it is personal data that you are concerned about, build a targeted Classifier for personal data. If it is corporate secrets that worry you, build a Classifier that includes identifiers for corporate secrets. As you iterate on the initial datasources and start to obtain good results, you can then go back to Step 1 and start to connect other datasources and use the Classifier that you built on other datasources that you have.
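To make the idea of a targeted Classifier more concrete, here is a minimal Python sketch of a pattern-based check for personal data. It is purely illustrative and does not show how Classifiers are implemented or configured in the Data X-Ray; the class names (`email`, `us_ssn`), the patterns, and the `classify` function are assumptions made up for this example.

```python
import re

# Hypothetical "Classes" for a personal-data Classifier. The names and
# patterns are illustrative assumptions, not Data X-Ray configuration.
PERSONAL_DATA_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify(text):
    """Return the number of matches per class found in the text."""
    counts = {}
    for name, pattern in PERSONAL_DATA_PATTERNS.items():
        hits = pattern.findall(text)
        if hits:
            counts[name] = len(hits)
    return counts


sample = "Contact jane.doe@example.com, SSN 123-45-6789."
print(classify(sample))
```

A Classifier aimed at corporate secrets would follow the same shape with a different set of identifiers, such as project code names or document markings.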

Step 3: Drive Change with Proof Points

Once your scans are producing good results, you can look across the organization and see where the potential risks are, who owns that data, and how it is being handled. You can then start adding labels to that data for future remediation efforts.

In the image above, you can see a couple of items that have been flagged as sensitive with a red mark. In this case we were searching for personal data and found an item with possibly unencrypted client data, so we went through a labeling exercise and marked this item for encryption. We found another item with customer data that should not have been there, and we marked it for review with the team and deletion.

After you have finished your labeling, you can get an overview of all of your datasources within the organization in the Data Tracking tab. Here you can build a bird's eye view of which data items within your organization require what effort and start having conversations within your organization about how these data items should be remediated.
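The kind of overview described here amounts to grouping labeled items by datasource and counting each label. The sketch below shows that aggregation in Python under stated assumptions: the datasource names, file names, and label names are made up, and the code does not use or represent the Data Tracking tab or any Data X-Ray export format.

```python
from collections import Counter, defaultdict

# Hypothetical labeled items as (datasource, item, label) tuples.
labeled_items = [
    ("Google Drive", "clients.xlsx", "Encrypt"),
    ("Google Drive", "old_export.csv", "Review and delete"),
    ("Shared drive", "contracts.pdf", "Encrypt"),
]


def summarize(items):
    """Count how many items carry each label, grouped by datasource."""
    summary = defaultdict(Counter)
    for datasource, _item, label in items:
        summary[datasource][label] += 1
    return {ds: dict(labels) for ds, labels in summary.items()}


overview = summarize(labeled_items)
for datasource, labels in overview.items():
    print(datasource, labels)
```

A summary like this makes it easier to see which datasource needs which kind of effort before you start the conversation with the teams that own the data.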

You can also deep dive on particular action items to keep track of what has been done over time.

Finally, the "Add Label" button in the top right lets you customize workflows by creating other labels that better fit your internal business processes.


In this tutorial we discussed a high-level process for remediating datasources using the Data X-Ray. We showed how to connect, scan, and build workflows for data remediation. More importantly, we showed how to empower your workforce with the data they need to do their jobs better. The Data X-Ray automates scanning through data and taking internal audits of datasources so that your teams can have more productive conversations, stay compliant with regulations and policies, and make more effective use of their data.
