Our first video focuses on cleaning rogue data out of an EPG file using Spotless Machine Learning filters.
Spotless is a web service to filter incoming feeds as they come into your data platforms. It's designed to be easy to incorporate into your applications to ensure that you always have clean data to work from. There are many different rules to validate whether data is correct and a number of different options for how invalid data is handled.
There is an example file you can download here for your testing.
Spotless filters set up validation rules for each field, and also have a number of multi-field validation rules.
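To make the difference between per-field and multi-field rules concrete, here is a minimal Python sketch. It is not the Spotless API: the field names (`title`, `start`, `end`) and the functions `validate_row` and `end_after_start` are hypothetical, chosen only to resemble an EPG record.

```python
from datetime import datetime

def is_iso_datetime(value):
    """Per-field check: the value parses as an ISO 8601 date-time."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False

# Hypothetical per-field rules: field name -> predicate on that field's value.
FIELD_RULES = {
    "title": lambda v: bool(v.strip()),
    "start": is_iso_datetime,
    "end": is_iso_datetime,
}

def end_after_start(row):
    """Hypothetical multi-field rule: a programme must end after it starts."""
    return datetime.fromisoformat(row["end"]) > datetime.fromisoformat(row["start"])

def validate_row(row):
    """Return the names of the rules that one record fails."""
    failures = [name for name, rule in FIELD_RULES.items()
                if not rule(row.get(name, ""))]
    # The multi-field rule only makes sense once each field is individually valid.
    if not failures and not end_after_start(row):
        failures.append("end_after_start")
    return failures
```

A record with a blank title fails only the `title` rule, while a record whose end time precedes its start time passes every per-field rule and is caught only by the multi-field check.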
We have four key ways in which we can validate fields
There are also two types of multi-column validation rules:
When you execute a filter, Spotless creates a job which then provides a report on the cleanliness of the file and actions that have been taken.
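As a rough illustration of what such a report might summarise, the sketch below counts clean and rejected records and tallies failures per rule. The report layout, the `build_report` name, and the choice to treat any failing record as rejected are assumptions for illustration, not the format Spotless actually returns.

```python
from collections import Counter

def build_report(rows, validate):
    """Summarise how clean a batch of records is.

    `validate` is any function that returns a list of failed rule names
    for a single row (an empty list means the row is clean).
    """
    failures = Counter()
    rejected = 0
    for row in rows:
        failed = validate(row)
        if failed:
            rejected += 1
            failures.update(failed)
    total = len(rows)
    return {
        "total": total,
        "clean": total - rejected,
        "rejected": rejected,  # assumed action: failing rows are rejected
        "failures_by_rule": dict(failures),
    }
```

Passing the validator in as a function keeps the summary independent of which rules are configured.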
To run Spotless, start by uploading a CSV file here, customise the rules, and then hit “run”.
Spotless is typically used to cleanse multiple records in a file. You can process any kind of text file. Spotless supports any CSV-style data file, and the delimiter and encoding can be customised.
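If you want to inspect a file locally before uploading it, the delimiter and encoding settings can be mirrored with Python's standard `csv` module. This is a minimal sketch that assumes the file has a header row; `read_records` is a hypothetical helper, not part of Spotless.

```python
import csv
import io

def read_records(raw_bytes, delimiter=",", encoding="utf-8"):
    """Decode raw file bytes and parse them as delimited text with a header row."""
    text = raw_bytes.decode(encoding)
    # newline="" leaves line endings untouched so the csv module can handle them.
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    return list(reader)
```

For example, a semicolon-delimited Latin-1 file would be read with `read_records(data, delimiter=";", encoding="latin-1")`.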
Typical use cases are:
By implementing Spotless you can reduce the failures on your ingest processes and reduce the time spent diagnosing data issues.