How to use Spotless Data?

Introduction

Our first video focuses on cleaning rogue data from an EPG file using Spotless Machine Learning filters.

Spotless is a web service that filters incoming feeds as they arrive in your data platforms. It is designed to be easy to incorporate into your applications, so that you always have clean data to work from. There are many different rules for validating whether data is correct, and a number of options for how invalid data is handled.

 

You should start by uploading a file here and then optimise the rules you want the filter to use. You can then use the API to integrate Spotless into your workflow.

 

There is an example file you can download here for your testing.

 

Key concepts

 

Spotless filters set up validation rules for each field and also have a number of multi-field validation rules.

 

We have four key ways in which we can validate fields

 

There are also two types of multi-column validation rules:

 

When you execute a filter, Spotless creates a job which then provides a report on the cleanliness of the file and actions that have been taken.
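The idea of per-field rules, multi-field rules and a resulting report can be illustrated with a small local sketch. This is not the Spotless implementation or its API: the column names, the rule set, the `validate` function and the report layout are all assumptions made up for this example.

```python
import csv
import io

# Hypothetical per-field rules: column name -> predicate on the raw string value.
FIELD_RULES = {
    "channel": lambda v: v.strip() != "",
    "duration_minutes": lambda v: v.isdigit() and int(v) > 0,
}


def end_after_start(row):
    """Hypothetical multi-field rule: 'end' must be later than 'start' (HH:MM strings)."""
    return row["end"] > row["start"]


def validate(text):
    """Check every row and return a simple cleanliness report (illustrative layout)."""
    report = {"rows": 0, "invalid_rows": 0, "issues": []}
    reader = csv.DictReader(io.StringIO(text))
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        report["rows"] += 1
        problems = [name for name, ok in FIELD_RULES.items() if not ok(row[name])]
        if not end_after_start(row):
            problems.append("end_not_after_start")
        if problems:
            report["invalid_rows"] += 1
            report["issues"].append((line_no, problems))
    return report


SAMPLE = (
    "channel,start,end,duration_minutes\n"
    "BBC One,18:00,18:30,30\n"
    ",19:00,18:00,abc\n"
)

report = validate(SAMPLE)
print(report)
```

In this sketch the second data row fails both per-field rules and the multi-field rule, so the report lists one invalid row out of two.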

 

To run Spotless, you can start by uploading a CSV file here, customising the rules, and then hitting “run”.

 

Use Cases

Spotless is typically used to cleanse multiple records in a file. You can process any kind of text file. Spotless supports any CSV-style data file, and the delimiter and encoding can be customised.
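To show what a customised delimiter and encoding mean in practice, here is a minimal sketch using only the Python standard library. It does not call Spotless; the `read_rows` helper and the sample data are hypothetical and only illustrate why both settings matter when parsing a CSV-style file.

```python
import csv
import io


def read_rows(raw_bytes, delimiter=",", encoding="utf-8"):
    """Decode raw file bytes with the given encoding, then split on the delimiter."""
    text = raw_bytes.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# A semicolon-delimited file saved as Latin-1 rather than UTF-8.
raw = "title;genre\nCafé Society;Drama\n".encode("latin-1")

rows = read_rows(raw, delimiter=";", encoding="latin-1")
print(rows)
```

Reading the same bytes with the default UTF-8 encoding would raise a decode error, and the default comma delimiter would leave each line as a single unsplit field.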

 

Typical use cases are:

 

By implementing Spotless you can reduce failures in your ingest processes and the time spent diagnosing data issues.