How Machine Learning Filters Cleanse Your Data

Our Machine Learning Filters eliminate rogue data, ensuring that your data quality is spotless.

At its simplest, a filter removes information or data that are not wanted. More precisely, a filter takes one list of information or data, made up of one or several columns, and converts it into another, modified list. It does this by examining the content of the list and changing it through an algorithm, according to a set of criteria based on a particular goal, such as filtering out blanks or overlaps in an EPG file.
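As a rough illustration of the "one list in, modified list out" idea, here is a minimal Python sketch that drops blank titles and overlapping programmes from an EPG-style list. It is not the Spotless algorithm: the row layout (channel, title, start, end), the function name and the rule of keeping the earlier programme when two overlap are all assumptions made for this example.

```python
from datetime import datetime


def filter_blanks_and_overlaps(rows):
    """Return a new list without blank-title rows or overlapping programmes.

    Each row is assumed to be a (channel, title, start, end) tuple, where
    start and end are datetime objects. This layout is hypothetical.
    """
    cleaned = []
    last_end = {}  # channel -> end time of the last programme we kept

    # Process each channel's programmes in order of start time.
    for channel, title, start, end in sorted(rows, key=lambda r: (r[0], r[2])):
        if not title.strip():
            continue  # blank title: filter it out
        if channel in last_end and start < last_end[channel]:
            continue  # starts before the previous programme ends: overlap
        cleaned.append((channel, title, start, end))
        last_end[channel] = end
    return cleaned
```

Given a schedule where one row has an empty title and another starts before the previous programme has finished, the function returns only the remaining rows, leaving the input list untouched.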

For Spotless, the basic goal is to convert the rogue data lying in a dataset into spotlessly clean data that are ready to enter your data platform. We do this using our Machine Learning filters, which are based on Artificial Intelligence and our own patent-pending algorithm.

Using Spotless Machine Learning Filters

To get a basic idea of how to use Spotless filters to remove rogue data, you can watch our YouTube video demos on cleansing an EPG file and cleansing a genre column. After you upload your data file to our My filters page (you need to be signed up to our service to see this), you receive a report suggesting any modifications that our automated systems believe the data require. Except for the very largest of files, this takes less than a minute. One great thing about our filtering process is that it is then you, the owner of the data, who is in charge: you customise the filters by following some very simple instructions to set the specifications of the cleansing process, which control how the filters cleanse your data. The scrubbed-up file is then sent back to you, meeting your requirements for data quality you can trust.

Thus it is you who defines the data quality you want, so that your datasets fit together before they actually enter your data pipeline. The cleansing itself takes no more than a minute or two. You can use comma-separated value (CSV) as well as tab-separated value (TSV) files when you have lots of data that need to be cleansed quickly, and the whole process should normally take no longer than 5 minutes, as the YouTube videos demonstrate.
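If you want to check a comma-separated or tab-separated file yourself before uploading it, a short Python sketch such as the following may help. It uses only the standard library; the function name and the simple rule for guessing the delimiter (count tabs and commas in the header line) are assumptions for illustration and say nothing about how Spotless reads your files.

```python
import csv
import io


def read_delimited(text):
    """Parse CSV or TSV text into a list of rows.

    The delimiter is guessed from the first line: a tab if the header
    contains more tabs than commas, otherwise a comma.
    """
    header = text.splitlines()[0] if text else ""
    delimiter = "\t" if header.count("\t") > header.count(",") else ","
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))
```

Passing the contents of either kind of file returns the same row structure, so any checks you run afterwards do not need to know which format the file was in.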

Complex Cases

There are occasions when the data cleansing is complicated or unusual and we are unable to cleanse the data using our automated processes alone. In these cases we quarantine the data and quickly email you, using your contact details, so that we can discuss how to resolve the issue together. Data quality is our passion, and we love nothing more than resolving a new and tricky data cleansing issue that defies even our machine learning filters, so you can be sure of our very best attention to whatever issues may arise. Once we have resolved the issue to your satisfaction, our Machine Learning filters will have learnt something new to take forward into future tasks and won't have any problems dealing with that particular issue again.

The same is true if, for some reason, you are not satisfied with the results you are getting from our data cleansing and cannot tweak them to your satisfaction. We give 500 MB of free data cleansing to each new customer precisely so that you can tweak the specifications of the cleansing processes until any rogue data in your particular datasets are eliminated entirely to your satisfaction. If you aren't fully happy with the results and cannot see an easy way to resolve the issue, please refer to the contact details we email you when you first sign up to our service. These let you contact our data science team directly, and they will respond swiftly so that we can discuss how to move forward. We refuse to accept defeat when it comes to cleansing any particular rogue data issue.

Machine Learning: the Key to our Filters

Rogue data cost your business both money and reputation, and they skew your internal reporting. Skewed reporting can itself have a catastrophic effect on how your business operates, because it leads to poor decision-making. This is not due to any incompetence on the part of your decision makers but simply because they are working from faulty information, caused by rogue data lying at the heart of your datasets. Those datasets need a good filtering process to keep them clean.

You will find it incredibly easy to integrate our Python API with Airflow, Celery or any other tools you are using to build your data pipeline.

We guarantee that your data are secure and not accessible by any unauthorised parties while they are in our care, and we take this responsibility very seriously indeed.

For quick answers, see our FAQ. You can also check out our range of Subscription Packages and Pricing.

If your data quality is an issue, or you know that you have sources of dirty data but your files are too big and the problems too numerous to fix manually, please log in and try now.