How to trust your data again

The best way to trust your data again is to make sure they are quality data, using Spotless machine learning filters.

At the heart of the Spotless Data Quality API solution lie our newly developed machine learning filters for data cleaning, which filter out dirty data containing mismatches, duplications and other corruptions. Many of these dirty data issues are predictable. They are caused by a mixture of mistakes, often human error, and by having multiple incoming data feeds, none of which have been standardised.

The data in your company are often required in real time, so it is essential to be able to clean the data feeds quickly and thoroughly before they enter your data lake or data warehouse. These data are also increasingly likely to be big data: so huge that even a large team of humans would be unable to clean them quickly or thoroughly enough to be of any use to a modern company. In a data-driven world, it is the companies with the best quality data that are going to be the winners in the eternal struggle to stand out and offer the highest levels of customer satisfaction.

Simplifying data cleaning

Spotless have developed machine learning filters in order to make cleaning your data simpler than ever before, so that anyone can have the benefits of clean data. You simply upload the file that you believe contains dirty data. Spotless will immediately analyse the file, without actually changing anything, and recommend the cleaning filters that our patent-pending machine learning algorithm considers the data in the file require in order to be spotlessly clean. Our API is accessible anywhere there is an Internet connection, and its easy-to-use drop-down boxes then allow you to configure the cleaning processes that will be used to clean the file.
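As a rough illustration of the "analyse first, change nothing" idea, the sketch below profiles a small CSV sample and returns a list of suggested filters. It is not the Spotless algorithm or API: the function `recommend_filters`, the filter names and the three simple checks are all assumptions made up for this example, and the sample rows are invented.

```python
import csv
import io


def recommend_filters(csv_text):
    """Read CSV text and return suggested filter names (hypothetical names).

    The input is only read, never modified.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    suggestions = set()
    # Exact duplicate rows suggest a de-duplication step.
    if len(set(map(tuple, body))) < len(body):
        suggestions.add("remove_duplicate_rows")
    # Leading or trailing spaces in any cell suggest a trimming step.
    if any(cell != cell.strip() for row in body for cell in row):
        suggestions.add("trim_whitespace")
    # Rows with more or fewer cells than the header suggest a structural fix.
    if any(len(row) != len(header) for row in body):
        suggestions.add("fix_column_count")
    return sorted(suggestions)


# Invented sample: one repeated row and one cell with a stray leading space.
sample = "title,genre\nNews at Ten,news\nNews at Ten,news\nFilm Night, drama\n"
recommendations = recommend_filters(sample)
print(recommendations)
```

Keeping the analysis read-only means the recommendations can be reviewed and adjusted before any cleaning is applied to the file.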

In the past, when there were fewer data available, it was simple enough to trust your data. In 2017 it has become much harder, due in large part to technological improvements in the way data are collected and to the need to use big data analytics in order to remain competitive. This is true even, or perhaps especially, in non-IT businesses, which typically have been less prepared for the challenges of big data. And the situation can only get worse. Recognising this reality, our basic mission in life at Spotless is to allow you to trust your data again!

Three areas where data cleaning is critical

We have identified three areas where data cleaning is critical to the success of your organisation and where our machine learning filters will ensure that your data do what you want them to do, and more besides.

1. Data collection

Data collection typically means gathering data from many different sources, both within and often outside the organisation. With the Spotless API, the data which have been collected are validated by us in order to exclude errors. The data are also monitored and analysed to distinguish different types of data, different ranges and different incoming values. For ranges, if all the dates in a list fall within 2017 except for one dated 1917, that date would be flagged as a possible error. For values, if a list contains only numbers apart from one entry with unexpected text, perhaps a number written longhand (e.g. thirty-seven instead of 37), that entry would also be flagged.
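The two checks described above can be sketched in a few lines of Python. This is only a minimal illustration under simple assumptions, not how Spotless implements its filters: the helper names `flag_outlier_years` and `flag_non_numeric` are hypothetical, "out of range" is approximated as "not in the most common year", and the sample values are invented.

```python
from collections import Counter
from datetime import date


def flag_outlier_years(dates):
    """Return the dates whose year differs from the most common year."""
    if not dates:
        return []
    common_year, _ = Counter(d.year for d in dates).most_common(1)[0]
    return [d for d in dates if d.year != common_year]


def flag_non_numeric(values):
    """Return the entries that cannot be parsed as a number."""
    flagged = []
    for value in values:
        try:
            float(value)
        except ValueError:
            flagged.append(value)
    return flagged


# Invented sample data mirroring the two examples in the text.
dates = [date(2017, 3, 1), date(2017, 6, 15), date(1917, 6, 15), date(2017, 11, 30)]
values = ["12", "37", "thirty-seven", "41"]

suspect_dates = flag_outlier_years(dates)
suspect_values = flag_non_numeric(values)
print(suspect_dates)
print(suspect_values)
```

Both helpers only flag entries as possible errors and leave the data untouched, so a person or a later cleaning step can decide whether 1917 was really a typo for 2017.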

2. Data warehousing

A data warehouse allows a company to store all of its data in the same place, for easy access by all appropriate parties. The factors relevant to data collection are also relevant to data warehousing. In addition, given that a data warehouse will have a great deal of new data coming in on a regular basis (perhaps several times a day), Spotless have devised solutions to five known dirty data issues in order to check these incoming data. This is where incorporating Spotless into your data workflow makes a great deal of sense: incoming data are cleaned before they are incorporated into the warehouse, ensuring that dirty data do not enter your warehouse and mess it up, often in incalculable and profoundly negative ways. Having a pristine, clean data warehouse is the sign of a winner.
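The idea of checking each incoming batch before it reaches the warehouse can be pictured as a small gate in the workflow. The sketch below is an assumption-laden illustration rather than the Spotless workflow: it does not reproduce the five dirty data issues mentioned above, the function `split_batch` and the rule `price_is_numeric` are hypothetical, and the rows are invented. It simply holds back exact duplicates and rows that fail a validity rule.

```python
def split_batch(rows, is_valid):
    """Partition incoming rows into (clean, quarantined) without altering them."""
    clean, quarantined = [], []
    seen = set()
    for row in rows:
        # A hashable fingerprint of the row, used to spot exact duplicates.
        key = tuple(sorted(row.items()))
        if key in seen or not is_valid(row):
            quarantined.append(row)
        else:
            seen.add(key)
            clean.append(row)
    return clean, quarantined


def price_is_numeric(row):
    """Hypothetical rule: the 'price' field must parse as a number."""
    try:
        float(row["price"])
        return True
    except ValueError:
        return False


# Invented batch: one non-numeric price and one exact duplicate.
incoming = [
    {"id": "1", "price": "9.99"},
    {"id": "2", "price": "ten"},
    {"id": "1", "price": "9.99"},
    {"id": "3", "price": "4.50"},
]

clean, quarantined = split_batch(incoming, price_is_numeric)
print(clean)
print(quarantined)
```

Only the `clean` rows would go on to the warehouse. The quarantined rows are kept rather than discarded so that they can be reviewed or repaired later.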

3. Data analytics

One of the main reasons for having a data warehouse is to carry out successful data analytics: extracting the maximum relevant information, or business intelligence, from your data in order to understand how your business is progressing, make decisions for the future and guarantee maximum customer satisfaction. However, building analytic models is not an easy task even with the best data quality. It is therefore essential that no source of dirty data intrudes while you prepare for and carry out these analytics, as the end result would be analytics so skewed that it would probably have been better not to have attempted them in the first place. Business intelligence and the analysis of the various factors of interest within your company need to be accurate so that bad decisions are not made, not through any incompetence of the analysts but because dirty data have created a false impression, or even a series of false impressions.

Suppose dirty data make it appear that your new advertising campaign, which is actually successful, has not reached expectations, and you then cancel it. This could have a catastrophic knock-on effect on your business. It would also waste the valuable time and resources of your advertising team, whom you have now ordered to create a new and substantially different campaign because you wrongly believe that the one they developed over a number of months was a failure. As we have seen again and again in our work with clients at Spotless, dirty data can cost a business more than any other single negative factor as it strives to make money by giving customers brilliant products and services that they really do want.

The same can be said for any datasets that do not fit your requirements. By using the Spotless API you can ensure that only relevant datasets are used in your data analytics, giving your organisation the greatest possible chance of being successful, to the benefit of you, your employees and your customers!

Using Spotless Data to trust your data again

You can read our introduction to using our browser-based API. You can try out our service on your my filters page; however, you will need to be logged in to access this. You can also view our videos on data cleaning an EPG file and data cleaning a genre column, which explain how to use our API. You can sign up to Spotless Data using your email address or your Facebook, Google or GitHub account.

In order to show you how easily and how well our API filters work, we are giving away 500MB of free data cleaning to each new customer.

We guarantee that your data are secure and cannot be accessed by any third parties during the time they are in our care, a responsibility we take very seriously. If problems with the data cleansing process do appear, then an automated flag alerts our data scientists who will then manually review the problem and, if necessary, contact you via your log-in details so that you and they can resolve the issue together.

Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing. If you would like to contact us you can speak to one of our team by pressing on the white square icon with a smile within a blue circle, which you can find in the bottom right-hand corner of any of the web pages on our site.



If your data quality is an issue, or you know that you have sources of dirty data but your files are just too big and the problems too numerous to fix manually, please do log in and try now.