What Are Rules, Anyway?

Definitions

The Data X-Ray provides out-of-the-box training data and models for general types of Personal Data (names, addresses, emails, phone numbers, etc.) in both English and Japanese, as well as for simple data that might be found in contracts (company names, addresses, phone numbers, pricing data, etc.). The Data X-Ray calls these models "Classifiers".

Classifiers, in turn, are powered by rules known as "Classes", which can be:

  • Training Data: data used to power machine learning models within the Data X-Ray, which give a probabilistic analysis of whether a given text element is close to a known text element in a training data set,
  • Regular Expressions: true/false pattern rules that a given line of text either matches or does not match, much like Excel "IF" statements, and
  • Dictionaries: true/false matches against lists of words, text, or numbers that you might want to find (useful for finding individuals within large datasets).
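
To make the true/false nature of Regular Expression and Dictionary rules concrete, here is a minimal sketch in Python. It is not the Data X-Ray's own rule syntax or API: the pattern, the dictionary entries, and the function names are all hypothetical and only illustrate the idea.

```python
import re

# Hypothetical regular expression: a simple email-like pattern.
# This is an illustration only, not a rule shipped with the Data X-Ray.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical dictionary: a list of individuals we want to find.
NAME_DICTIONARY = {"alice tanaka", "bob smith"}


def matches_regex(text: str) -> bool:
    """Return True if the text contains something shaped like an email."""
    return EMAIL_PATTERN.search(text) is not None


def matches_dictionary(text: str) -> bool:
    """Return True if any dictionary entry appears in the text (case-insensitive)."""
    lowered = text.lower()
    return any(entry in lowered for entry in NAME_DICTIONARY)
```

Both functions answer a yes/no question about a line of text. A Training Data rule, by contrast, would return a likelihood rather than a strict yes or no.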

Thinking About How Your Data Should Be Scanned

By default, the Personal Data Classifier is used to scan datasources. However, each company (and even each team within a company) is different. A bank, for instance, might want to find and classify account numbers, while a supermarket might want to find loyalty point card numbers. Therefore, to get the most value out of the Data X-Ray, think about the types of data that are unique to your company or team and build rules that match them, so that you can identify the same kind of data wherever else it appears.
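
As a rough illustration of what company-specific rules might look like, the Python sketch below defines one pattern for a bank and one for a supermarket. The number formats (`123-4567890` and `LP` followed by ten digits), the rule names, and the `classify` helper are assumptions made up for this example; they do not describe real formats or the Data X-Ray's actual configuration.

```python
import re

# Hypothetical, made-up formats for illustration only.
CUSTOM_RULES = {
    # Assumed bank format: 3 digits, a hyphen, then 7 digits.
    "bank_account_number": re.compile(r"\b\d{3}-\d{7}\b"),
    # Assumed supermarket format: "LP" followed by 10 digits.
    "loyalty_card_number": re.compile(r"\bLP\d{10}\b"),
}


def classify(text: str) -> list[str]:
    """Return the names of all custom rules that match the text, sorted."""
    return sorted(
        name for name, pattern in CUSTOM_RULES.items() if pattern.search(text)
    )
```

Calling `classify` on a line of text returns the names of the rules it matched, or an empty list if none apply. The point is simply that each organization decides which patterns matter to it.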

The Data X-Ray makes it very easy to optimize classifiers based on your own data and needs. In the next tutorial, we'll take you through how the default Personal Data Classifier works and how you can build your own models that might better reflect your own data.


If you have any questions, do not hesitate to email us at [email protected] or simply click the Intercom button to start a chat with our support team.
