Under a magnifying glass, data which have data quality look seamlessly perfect.
In the past, data were relatively easy to understand, and so the concept of data quality had little importance. If data were of poor quality, somebody would have to spend a few days working through all the pieces of paper which contained them to bring order out of the chaos, and that was the end of the matter, perhaps until the following year when the same process would be repeated.
However, as data began to be generated on an ever-increasing number of interconnected personal computers, the sheer amount of data in the world and used by businesses, now known as big data, began to grow exponentially, creating huge problems in terms of data quality. According to Forbes, the amount of data worldwide doubled in the 2 years up to autumn 2015, with estimates that by 2020 there will be 1.7 MB of data produced about every human every second, with a total of 4.4 zettabytes of data worldwide. Other reports talk about human knowledge (data) doubling every 12 months, soon to be doubling every 12 hours. Almost all of these data will be crude data which fail the basic data quality test of being fit for purpose; that is, they will be rogue data, opaque and almost impossible to understand, unless they have specifically undergone data cleaning.
This is why we started Spotless Data, developing our machine learning filters solution to ensure that your company's data quality is exceptional by modifying faulty data and removing unwanted data. Our goal is that, once you have passed your data through our swift and easy-to-use data cleaning Python API, your data will all have data integrity and be fit not solely for the purposes you have in mind right now but for any new purposes which occur to either your own artificial intelligence programmes or your brightest, most innovative staff.
At Spotless Data we believe in setting the benchmark as high as possible when it comes to data quality, and then it will be for our competitors to reach our high standards. We recognise that big data are now so massive that even a team of specialists working all year round would struggle to manually ensure that all the data have the level of data quality which 2017 demands.
The only viable solution is to apply artificial intelligence to the task of getting the best data quality out of data which are just too big and complicated for humans to deal with. Given the huge complexity and variety of data and the need to validate them for any task, it made sense to us to use the latest machine learning techniques, which learn from their experience of cleaning data. This helps our clients who come to us time and again with the same data issues, which we can now fix in just the way they require. It also allows new clients with new data problems to experience the joys of data quality: our Machine Learning solution examines the new data, produces a report suggesting the changes required and then lets the client decide precisely which specifications to apply to get the data quality they so urgently need. The more clients who use our service, the better it becomes.
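The report-then-decide workflow described above can be sketched in a few lines of plain Python. This is only an illustrative sketch and not Spotless Data's actual API or machine learning filters: the function names `suggest_fixes` and `apply_fixes`, the rule names and the sample rows are all hypothetical, and a handful of hand-written rules stand in for the learned filters.

```python
# Hypothetical sketch: hand-written rules stand in for machine learning filters.

def suggest_fixes(rows):
    """Return a report of suggested changes; nothing is modified here."""
    report = []
    seen = set()
    for index, row in enumerate(rows):
        for field, value in row.items():
            # Suggest trimming stray whitespace around text values.
            if isinstance(value, str) and value != value.strip():
                report.append({"row": index, "field": field, "rule": "strip_whitespace"})
            # Flag empty values so the client can decide what to do with them.
            if value in ("", None):
                report.append({"row": index, "field": field, "rule": "flag_missing"})
        # Rows that match an earlier row after trimming are suggested for removal.
        key = tuple(sorted((k, str(v).strip()) for k, v in row.items()))
        if key in seen:
            report.append({"row": index, "field": None, "rule": "drop_duplicate"})
        seen.add(key)
    return report


def apply_fixes(rows, report, approved_rules):
    """Apply only the suggestions whose rule the client has approved."""
    cleaned = [dict(row) for row in rows]  # copy, so the input stays untouched
    to_drop = set()
    for item in report:
        if item["rule"] not in approved_rules:
            continue
        if item["rule"] == "strip_whitespace":
            row = cleaned[item["row"]]
            row[item["field"]] = row[item["field"]].strip()
        elif item["rule"] == "drop_duplicate":
            to_drop.add(item["row"])
    return [row for i, row in enumerate(cleaned) if i not in to_drop]


# Hypothetical sample data for a shopping website.
rows = [
    {"product": "Scarf", "price": "12.50"},
    {"product": " Scarf ", "price": "12.50"},
    {"product": "Gloves", "price": ""},
]
report = suggest_fixes(rows)
cleaned = apply_fixes(rows, report, approved_rules={"strip_whitespace", "drop_duplicate"})
```

Keeping the report separate from the step that applies it is what leaves the decision with the client: in this sketch the missing price is reported but left alone, because the `flag_missing` rule was not among those approved.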
So while variety in data cleaning problems can be a nightmare for data scientists, who mostly feel data cleaning distracts them from the "more important" work that they often rightly feel they should be doing, these new data challenges are a joy to us. They allow our Machine Learning filters to learn something new today which they can then apply in the future, again and again.
Data are either used in specific ways, such as to display and sell the products on a shopping website, or else they are analysed, using analytics software, and converted into business intelligence about the shopping website. Raw data which are of mediocre quality can perhaps successfully display and sell the products, but they are unlikely to give the kind of intelligence required to anticipate, say, what customers are likely to want to buy in the future or which goods might be worth putting on special offer. Intelligence about the products sold in the run-up to Christmas can be used to place orders with the manufacturers, helping them decide what quantities of which goods to produce for the run-up to next Christmas. These same data can also be used to ensure that the shopping website's warehouse is refilled with the right number of goods to replace those already despatched in the run-up to this Christmas. This means that what the customer chooses to buy is always available and that the site doesn't have to sell an excess of products in a half-price sale in January because it overestimated demand. What all this requires is swiftly available quality data which can be used in multiple ways.
When data quality is poor, it may be possible to extract a certain amount of useful information and intelligence out of the data. This may lead your company executives to think data which are just okay are acceptable even though they would fail any reasonable data quality tests, simply because they can do the very specific thing they were designed for, such as to ensure the warehouse remains stocked this Christmas, even when by December 20th the best lines have all run out and there is excess stock in other lines. And then these executives just hope for the best next Christmas while, if things go wrong, they blame the suppliers and try to get them to pay the cost of what they claim is the "surprising" lack of demand for certain products and lack of supply for others.
However, this may not be as easy as it seems when your main rival is using our data cleaning solution to maximise their data quality and can do things which seem beyond your company, such as successfully anticipating demand and even recognising the new trends occurring this Christmas which your company, relying on last year's Christmas sales, failed to see. Perhaps your rival was using dark data, which your company has in as much abundance as theirs does, but which your staff failed to recognise, let alone use, because the poor quality of your data prevented them from doing so.
The International Data Corporation have estimated that 90% of data are dark data. These data can only be mined when they are quality data. But when they are successfully used they can make the difference between success and failure, especially in complicated areas which require good judgment, such as the shopping website's marketing and advertising campaigns for this Christmas. If you are based in the UK but discover that a surprising 10% of those who look at your website come from the Republic of Ireland, it may well be worth paying for a TV advertising campaign in that country. If, on the other hand, only 0.1% of those who use your site are located in that country, you are best off spending your advertising money elsewhere, as you will save your company a fortune. Thus cleaning your data, so that they have the quality to give you insights such as who is using your website, where they are located and what their interests are, is going to be highly cost-effective. It will seriously help your executives make those difficult decisions, because the business intelligence they are receiving is reliable and is indeed quality intelligence.
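The arithmetic behind such an advertising decision is simple once the visitor records are clean. The following is a minimal sketch, assuming each visit has already been cleaned down to a single country label; the function name `visitor_share`, the sample visits and the 5% cut-off are hypothetical and chosen only for illustration.

```python
from collections import Counter


def visitor_share(countries):
    """Return each country's share of visits as a percentage."""
    counts = Counter(countries)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {country: 100 * n / total for country, n in counts.items()}


# Hypothetical sample: 1,000 cleaned visit records, one country label each.
visits = ["UK"] * 880 + ["Ireland"] * 100 + ["France"] * 20
shares = visitor_share(visits)

# Hypothetical cut-off: only consider a TV campaign abroad above 5% of visits.
THRESHOLD_PERCENT = 5.0
worth_advertising = sorted(
    country
    for country, share in shares.items()
    if country != "UK" and share >= THRESHOLD_PERCENT
)
```

With these sample figures Ireland accounts for 10% of visits and passes the cut-off, while France at 2% does not. Note that the count is only as good as the labels: unless variants such as "Ireland" and " ireland" have been normalised first, they are counted as separate countries and the shares come out wrong.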
You can read our introduction to using our API and then try out our service on your My Filters page, but for this you need to log in first. If you haven't already done so, you can sign up using your email address or your Facebook, Google or GitHub account. You may also view our video on data cleaning an EPG file, which also explains how to use our API.
We use the HTTPS protocol, which encrypts your data in transit, and we ensure that they are secure and not accessible by any third party while they are in our care, a responsibility we take very seriously.
Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing. If you would like to contact us you can speak to one of our team by pressing on the white square icon with a smile within a blue circle, which you can find in the bottom right-hand corner of any of the web pages on our site.
If your data quality is an issue, or you know that you have sources of dirty data but your files are just too big and the problems too numerous to fix manually, please do log in and try now.