The Spotless data validation API


Effective data validation has never been more important, given the explosion of big data across most sectors of the modern economy.

We are a group of experienced data scientists who started Spotless Data because we have seen how the recent explosion of data has primarily affected business sectors with little experience of big data. Retail, manufacturing, healthcare, the media industry, transport and the financial sector, among others, are all new to dealing with the sheer volume of data coming into their platforms, data which they need in order to stay alive in the competitive world of 2018, where data is the new oil. All of these data need validating as they enter the platform. We know from long experience that the difference between clean data and dirty or rogue data is critical to the health of modern businesses. While in recent years much of the data was driven by the connectivity of the Internet, the Internet of Things is increasingly generating massive amounts of data as well, which companies and their IT systems need to store, analyse and understand, as well as use directly in their platforms.

How does Spotless Data work?

Our team has developed Spotless Data's machine learning filters, which validate the data entering your platform. We know that the business rules you want to apply will differ from those of companies operating in a completely different type of business. Our answer is to offer 500 Mb of free data cleaning, so you can try out our service on your own data and see how good it really is. Once you have logged in, go to your my filters page and upload the data you want validated, in CSV or TSV format. Within a few seconds you will receive a report that allows you to set a series of specifications for how your datasets should fit together. It is these specifications that actually validate the data, so that they can be integrated into your data pipeline and then enter your platforms and data repositories.

You may need to play around with the specifications until you get the results that are right for you. Meanwhile, our machine learning filters learn from your dirty data and from the specifications you set for your datasets. Once you are happy with the results, the functions can be automated so that your known types of dirty data are validated without the need for human intervention every time. As more datasets of the same type are validated, our machine learning filters, which use artificial intelligence, become better at the tasks set for them.
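To make the idea of validating a CSV or TSV file against a set of specifications more concrete, here is a minimal local sketch in plain Python. It is not Spotless Data's API or report format: the names `SPEC` and `validate_rows`, the column names and the rules themselves are all assumptions made up for illustration.

```python
import csv
import io


def is_non_negative_number(value):
    """Return True if the text parses as a number that is zero or greater."""
    try:
        return float(value) >= 0
    except ValueError:
        return False


# Hypothetical specification: column name -> rule every value must satisfy.
# These columns and rules are illustrative only.
SPEC = {
    "id": lambda v: v.isdigit(),
    "email": lambda v: "@" in v and "." in v.split("@")[-1],
    "price": is_non_negative_number,
}


def validate_rows(text, spec, delimiter=","):
    """Split rows into valid ones and a report of (line, column, value) failures."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    valid, report = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        failures = [
            (line_no, column, row.get(column))
            for column, rule in spec.items()
            if row.get(column) is None or not rule(row[column])
        ]
        if failures:
            report.extend(failures)
        else:
            valid.append(row)
    return valid, report


SAMPLE = (
    "id,email,price\n"
    "1,a@example.com,9.99\n"
    "x2,b@example.com,5\n"
    "3,not-an-email,-1\n"
)
valid_rows, report = validate_rows(SAMPLE, SPEC)
for line_no, column, value in report:
    print(f"line {line_no}: column {column!r} rejected value {value!r}")
```

For a TSV file, the same function can be called with `delimiter="\t"`. Adjusting the rules in `SPEC` and re-running is the local equivalent of playing around with the specifications until the results look right.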

Big data are by nature complex and, even with known sources of rogue data, there are times when unexpected errors arise. However, our machine learning filters are especially good at spotting when an error cannot be resolved. In these situations, the data are quarantined and you are alerted via the contact details you submitted when you signed up for the service.
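The quarantine behaviour described above can be pictured with a short Python sketch. This is only an assumption about the general pattern (repair what can be repaired, set aside what cannot, and raise an alert), not a description of how Spotless Data implements it. The repair rule `try_repair`, the `QuarantineResult` class and the contact address are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class QuarantineResult:
    accepted: list = field(default_factory=list)     # repaired, usable values
    quarantined: list = field(default_factory=list)  # (index, raw value) pairs
    alerts: list = field(default_factory=list)       # messages for the contact


def try_repair(value):
    """Hypothetical repair rule: trim spaces and thousands separators, parse an int.

    Returns None when the value cannot be repaired.
    """
    cleaned = value.strip().replace(",", "")
    try:
        return int(cleaned)
    except ValueError:
        return None


def process(values, contact):
    """Accept repairable values, quarantine the rest and record one alert."""
    result = QuarantineResult()
    for index, raw in enumerate(values):
        repaired = try_repair(raw)
        if repaired is None:
            result.quarantined.append((index, raw))
        else:
            result.accepted.append(repaired)
    if result.quarantined:
        count = len(result.quarantined)
        result.alerts.append(f"To {contact}: {count} value(s) quarantined")
    return result


outcome = process([" 42", "1,000", "n/a", "7"], contact="ops@example.com")
print(outcome.accepted)
print(outcome.quarantined)
print(outcome.alerts)
```

The point of the design is that one unresolvable value does not stop the rest of the batch: good values flow on, and the problem value waits for a person to look at it.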

Companies which have already incorporated Spotless Data's data validation solution into their workflow have consistently reported substantial savings in the cost of managing their data, not to mention the tremendous financial benefits of having good quality data. We do not think it is an exaggeration to say that the winners in 2018 and for the foreseeable future will be those whose data are validated well enough to be trusted.

The rapid, seamless and reliable data solution

Based on the working experience of our data science team, we cannot emphasise enough the importance of ensuring that only reliable, high-quality data are present in your data platforms, particularly given the volume of data now flowing across organisations. Properly configured, Spotless can sit inline with your data pipeline, ensuring that the only data which hit your systems are those with the data quality your business is crying out for. Once set up, our configurable business rules automatically and rapidly correct errors in your data. They can also compare multiple datasets for consistency, a particularly important issue when data come from multiple sources. Given both the inconsistencies between data from different departments of the same organisation and the increasing use of data from third-party sources, inconsistency is a rampant problem in data which have not been validated. To get past Spotless' machine learning filters, data need to meet every criterion you set up in the configuration process, so records that break any of your rules do not reach your systems.
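As a rough illustration of comparing two datasets for consistency and letting through only records that meet every criterion, consider the following Python sketch. The datasets, field names and criteria are invented for the example and are not taken from Spotless Data's configuration.

```python
# Two hypothetical datasets from different sources.
customers = [
    {"customer_id": "C1", "country": "GB"},
    {"customer_id": "C2", "country": "FR"},
]
orders = [
    {"order_id": "O1", "customer_id": "C1", "total": 20.0},
    {"order_id": "O2", "customer_id": "C9", "total": 15.0},  # unknown customer
    {"order_id": "O3", "customer_id": "C2", "total": -5.0},  # negative total
]

known_ids = {customer["customer_id"] for customer in customers}

# Hypothetical criteria: name -> rule. A record must satisfy all of them.
criteria = {
    "known_customer": lambda order: order["customer_id"] in known_ids,
    "non_negative_total": lambda order: order["total"] >= 0,
}


def failed_criteria(record, rules):
    """Return the names of every rule the record fails (empty list if none)."""
    return [name for name, rule in rules.items() if not rule(record)]


accepted, rejected = [], {}
for order in orders:
    failures = failed_criteria(order, criteria)
    if failures:
        rejected[order["order_id"]] = failures
    else:
        accepted.append(order)

print([order["order_id"] for order in accepted])
print(rejected)
```

The `known_customer` rule is the cross-dataset check: an order is only consistent if its customer also appears in the customer dataset.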

Why are people using the Spotless data validation solution?

The main advantage of using Spotless to validate your data is that problems are solved before they even occur and, even better, they are solved without the need for daily checks and reports on the state of your data. We take the strain, and you get guaranteed clean, validated data, every time. This allows you and your operating teams to focus on what is really important to your business, secure in the knowledge that your data will do what you want them to. It also opens up uses which had not originally occurred to you, such as exploiting the dark data you hold, or putting the data entering your pipeline to new purposes, perhaps suggested by an artificial intelligence programme you are developing.

Many firms which fail to address their rogue data issues find that their data pipelines go down every time the data entering them fail to meet the specifications the business requires. Unfortunately, this can mean data pipelines crashing several times a day, though even one crash is too many for any company in a competitive industry. When the data don't work correctly, the results can be catastrophic.

Another advantage of using Spotless is that it speeds up the data loading process, as all the syntax checking and data validation have already been done by us. Our customers invariably report that their data work more rapidly and seamlessly once Spotless has been integrated into their system.

One of the key areas where validated data show themselves to be superior is internal reporting. This reporting process is used both to give an accurate picture of how the business is doing at any given moment and to support critical decision-making, which is bound to end in disaster if decisions are based on what rogue data are telling the company's decision-makers.

Data scientists in the US expect to make $100,000 a year when they are starting out and are likely to be charging more very soon, as there are simply not enough to go round. Business sectors which did not need a data science team until very recently are suddenly chasing after at least one or two data scientists, recognising how necessary they have become in the last couple of years for keeping a business profitable and delivering the all-important customer satisfaction. However, data cleaning is far from the only task data scientists engage in, and one great advantage of incorporating Spotless at the entry point of your data platforms is that your data scientists' time is freed up to focus on solving business problems rather than just fixing data.

Spotless Data for all your data validation problems

Spotless Data can now offer offline processing of your data to ensure GDPR compliance. You can read our introduction to using our API to validate your data. You can take advantage of our offer of 500 Mb of free data cleaning to see how much you like our service. If you haven't already done so, you can sign up using your email address or your Facebook, Google or GitHub account. You may also view our video on cleaning the data in an EPG file, which also explains how to use our API.

We use the HTTPS protocol, which encrypts your data in transit to and from our service. Keeping your data secure while they are in our care, and inaccessible to any third party, is a responsibility we take very seriously.

Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing. If you would like to contact us, you can speak to one of our team by clicking the white square icon with a smile inside a blue circle, which you can find in the bottom right-hand corner of any page on our site.


If your data quality is an issue, or you have known sources of dirty data but your files are too big and the problems too numerous to fix manually, please do log in and try now.