Is your data lake just a mud pit?


The benefits of having a sparkling data lake which has undergone data cleaning should never be underestimated.

You’ve just spent the last 18 months consolidating all of your business data into a Hadoop-based platform, and your CEO is asking when the business is going to see benefits from your data lake. “What do you mean?” you say. “We’ve already got Tableau deployed, and managers across the business can query any of our datasets in real time.” The problem is that even though the managers have access to the data, the data lacks integrity, so they frequently cannot use it to make sound business decisions.

I’ve frequently seen maps with the comment “10,223 null” in the bottom right-hand corner, simply because the address data is not accurately tagged and so cannot be plotted geographically. How about when data comes from different systems? Your data lake thinks your most popular product is something you discontinued in 2013, just because all of the products since then have been sold through multiple channels and each channel partner provides the data in a slightly different format. None of the data has been validated to ensure its integrity.
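To make those two failure modes concrete, here is a minimal Python sketch of the kind of check an ingest step could run before records reach the data lake. It is an illustration only, not Spotless Data's API: the channel names, field names and sample records are all hypothetical, and real partner feeds will need their own mappings and rules.

```python
from collections import Counter

# Hypothetical mapping from each channel's field names to a common schema.
CHANNEL_FIELD_MAP = {
    "web": {"product": "sku", "postcode": "postcode"},
    "retail": {"product": "ProductCode", "postcode": "PostCode"},
}


def normalise_record(channel, raw):
    """Map a raw channel record onto the common schema, trimming and upper-casing values."""
    fields = CHANNEL_FIELD_MAP[channel]
    product = (raw.get(fields["product"]) or "").strip().upper()
    postcode = (raw.get(fields["postcode"]) or "").strip().upper()
    # Empty strings become None so that missing values are explicit.
    return {
        "channel": channel,
        "product": product or None,
        "postcode": postcode or None,
    }


def validate(records):
    """Split records into clean ones and a count of rejections by reason."""
    clean, rejected = [], Counter()
    for rec in records:
        if rec["product"] is None:
            rejected["missing product"] += 1
        elif rec["postcode"] is None:
            rejected["missing postcode"] += 1
        else:
            clean.append(rec)
    return clean, rejected


# Hypothetical sample feed: the same product arrives in three slightly different shapes.
raw_feed = [
    ("web", {"sku": " ab-100 ", "postcode": "sw1a 1aa"}),
    ("retail", {"ProductCode": "AB-100", "PostCode": ""}),
    ("retail", {"ProductCode": "ab-100", "PostCode": "EC1A 1BB"}),
]

records = [normalise_record(channel, raw) for channel, raw in raw_feed]
clean, rejected = validate(records)
sales_by_product = Counter(rec["product"] for rec in clean)
```

After normalisation, both channels' sales count towards the same product code instead of being split across spellings, and the record with no postcode is reported as a rejection at ingest rather than surfacing later as a null on a map.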

At Spotless, we find that the success of a data project doesn’t rely so much on the technologies you use as on the quality of the data. You can have the best Hadoop or Teradata cloud with the greatest Qlik or Tableau deployment since the dawn of history, but if the data is junk then the business isn’t going to be able to make any use of it. We’re focussed on the problem of cleaning that data, and by building our APIs into your ingest workflow you can make sure your data lake is as clear as drinking water.

Now that’s progress.

Take a look at our introduction to using our browseable API. You can also test our API on your My Filters page, though you will need to be logged in first to see it. You can sign up using your email address or your Facebook, Google or GitHub account. You can also watch our videos on cleaning an EPG file and cleaning a genre column, which explain how to use our API.

To help you see for yourself how smoothly our API works, we are giving away 500MB of free data cleansing to each new customer. Data cleaning has never been easier, thanks to Spotless Data's machine learning filters. Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing.

Spotless Data, the one-stop data quality solution API!

If data quality is an issue for you, or you have known sources of dirty data but your files are too big and the problems too numerous to fix manually, please do log in and try it now.