How Spotless Data achieves seamless data cleaning

With Spotless data cleaning, you can get your data working on all your platforms.

When you clean all your big data with Spotless Data's machine learning filters API solution, only valid and integrated data remain in your data platforms.

We developed the Spotless Data API solution for all your dirty or rogue data issues after realising that the analogy of data as the new oil, and data cleaners as the new refineries, is only partly correct. While oil refineries are complex and require a lot of investment, the processes for refining oil are well known and follow easy-to-understand procedures that result in perfectly refined oil every time. Would that the refining of data were so simple!

At Spotless we have long known both the importance of ensuring that your data, which are often big data, are not dirty and how complicated cleaning or refining these data is. It is not for nothing that data scientists on high salaries spend 60-90% of their working lives simply cleaning data.

Searching for the best data cleaning solution

Before setting up Spotless, we thoroughly investigated the whole data cleaning market, which is dominated by downloadable software solutions. The two obvious disadvantages of a data cleaning software app are that the software has to be downloaded onto every device that is going to clean data and that price differentiation is very difficult. This means that the customer who submits only 500 MB of data a year pays the same as the customer who submits 500 GB. Generally, software economic models work by offering a basic service for free and charging for a premium service. However, we could not see how to make this work: offering half of a service would leave the data half dirty, which is totally unacceptable to us. We pride ourselves on spotlessly clean data, which can be the only finished product. We do offer 500 MB of free data cleaning so that you can test our service, but what you get is the complete service, so you can fully clean the data you submit.

Within the business rules that we have created for automating data cleaning, we do offer three levels of email validation: a structure review, a domain check and a subscription verification. However, Spotless does not focus on cleaning emails alone but on cleaning any and all data. This means that if there are no automated rules for your particular data cleaning needs, we will write ones that work for you.
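
As a rough illustration only, and not the actual Spotless implementation, three levels of email validation could be sketched in Python as follows. The function names, the deliberately simplified pattern, the returned labels and the idea of checking against a known subscriber set are all assumptions made for this example.

```python
import re
import socket

# Deliberately simple pattern: one "@", no spaces, a dot in the domain.
# Real address grammar is more permissive; this is an illustrative assumption.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def has_valid_structure(email):
    """Level 1: does the string look like an email address at all?"""
    return _EMAIL_RE.match(email) is not None


def dns_resolves(domain):
    """Default level 2 check: does the domain resolve to any address?"""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def validate_email(email, subscribers, domain_exists=dns_resolves):
    """Return the highest level passed.

    One of 'invalid', 'structure', 'domain' or 'subscribed'.
    `subscribers` is a hypothetical set of lower-cased known addresses.
    """
    if not has_valid_structure(email):
        return "invalid"
    domain = email.rsplit("@", 1)[1]
    if not domain_exists(domain):
        return "structure"
    if email.lower() not in subscribers:
        return "domain"
    return "subscribed"
```

Passing the domain check in as a parameter keeps the sketch testable without network access; a real service would likely look up mail (MX) records rather than any address record.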

Yet the real issue we had with these downloadable software apps is that the software sits on your device, where it cleans the data you have submitted to it, and there is no interaction with the company. If you aren't satisfied with the service, you can only conclude that it is not for you. After all, what else can you do? Write to them and wait months for a reply? We considered this insufficient. Cleaning data is very complex and involves deleting and modifying pieces of data, so we realised that some kind of interactivity between the owner of the data and ourselves was absolutely essential, even if it was simply automated activity on our part. Those whose data it is will always know it best, and will know exactly how it is intended to be used. This kind of interactivity is very difficult to achieve with a software app.

So we developed our API solution to clean all rogue data from any device that hosts a web browser, and we built this interactivity into the API. When you go to our My Filters page (you need to be logged in first), you upload your data. Instead of immediately cleaning it for you, we give you (in under a minute) a report identifying the areas that we believe need data modification. We leave the specifications open so that you, the owner of the data, can set them to get precisely the data you want. If there are any really tricky problems that cannot be fixed using our automated processes, the data are quarantined and you are informed as a matter of urgency, using the contact details you gave us when signing up for the service.
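
The report, specify and quarantine flow described above can be sketched in a few lines of Python. This is a minimal, in-memory illustration under our own assumptions, not the Spotless API itself: `build_report`, `apply_specifications`, the rule and fix dictionaries and the rule that "no fix chosen for a column means quarantine" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Report:
    """Row indices flagged per column; nothing has been changed yet."""
    issues: Dict[str, List[int]] = field(default_factory=dict)


def build_report(rows, rules):
    """Step 1: flag rows whose value fails the validity rule for a column."""
    report = Report()
    for column, is_valid in rules.items():
        bad = [i for i, row in enumerate(rows) if not is_valid(row.get(column))]
        if bad:
            report.issues[column] = bad
    return report


def apply_specifications(rows, report, fixes):
    """Step 2: apply the owner's chosen fix per column; quarantine the rest.

    Returns (cleaned_rows, quarantined_rows). A flagged row is quarantined,
    in its original form, when the owner supplied no fix for that column.
    """
    cleaned = [dict(row) for row in rows]  # never mutate the upload
    quarantined = set()
    for column, bad_rows in report.issues.items():
        fix = fixes.get(column)
        for i in bad_rows:
            if fix is None:
                quarantined.add(i)
            else:
                cleaned[i][column] = fix(cleaned[i].get(column))
    kept = [row for i, row in enumerate(cleaned) if i not in quarantined]
    held = [rows[i] for i in sorted(quarantined)]
    return kept, held


rows = [
    {"age": "34", "country": "uk"},
    {"age": "n/a", "country": "FR"},
    {"age": "-5", "country": "DE"},
]
rules = {
    "age": lambda v: str(v).isdigit(),
    "country": lambda v: isinstance(v, str) and v.isupper(),
}
report = build_report(rows, rules)
# The owner chooses to upper-case countries but sets no fix for bad ages.
kept, held = apply_specifications(rows, report, {"country": str.upper})
```

Separating the report from the fix is the point of the design: the owner sees what would change before anything is modified, and anything without an agreed fix is held back rather than guessed at.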

A cynic, or someone with no experience of data, might say that if we need to interact with our users and occasionally quarantine data, then our data refining service cannot be outstanding. We argue exactly the opposite. Those apps that offer to clean your data completely blind, with no input from the owner of the data, are in our opinion excessively naive about the difficulties involved in producing quality data. We know from 12+ years of experience as data scientists working with dirty data that it is very easy to underestimate the sheer complexity involved in validating and integrating rogue data so that they become spotlessly clean.

We don't know of a single commercial rival who quarantines dirty data as we do, and these quarantined data fascinate us. We are more than happy to get our top data scientists to put the hours in to fix a new and challenging data issue, generally by writing a new business rule so that the problem can be fixed automatically in the future. Our experience with Spotless, backed by customer feedback, is that getting the data owners (or their employees) to interact with the data in this way works really well, so that the data owners actually get the data they want and need.

Spotless Data's solution

Spotless Data can now offer offline processing of your data to ensure GDPR compliance. You can read our introduction to using our API to validate your data. You can take advantage of our offer of 500 MB of free data cleaning to see how much you like our service. If you haven't already done so, you can sign up using your email address or your Facebook, Google or GitHub account. You may also view our video on data cleaning an EPG file, which also explains how to use our API.

We use the HTTPS protocol, which encrypts your data in transit, and we ensure that your data are secure while they are in our care and are not accessible by any third party, a responsibility we take very seriously.

Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing. If you would like to contact us, you can speak to one of our team by clicking the white square icon with a smile inside a blue circle, which you can find in the bottom right-hand corner of any page on our site.


If your data quality is an issue, or you know that you have sources of dirty data but your files are just too big and the problems too numerous to fix manually, please do log in and try now.