Data quality prediction for 2017

Spotless Data's prediction for 2017 is that data quality is going to be a new buzzword!

There is no doubt that data has been one of the key themes of the last few years, nor that the other popular tech subjects of the moment, such as artificial intelligence, machine learning, robotics, the internet of things and big data, all depend on data, without which none of them can function. Yet to do what they are supposed to do, rather than merely function, the data that underpin these cutting-edge tools, and indeed any data used by any company or organisation, need to be quality data.

What are quality data?

Essentially they are data which are fit for the purpose for which they were intended. If you have a list of email addresses for your latest mailshot, only those which arrive in the inbox of the person they are intended for can be considered quality data, regardless of whether the owner of the inbox then deletes your email unread. If you are a charity asking for sponsorship for your latest project, it is not enough that the emails you send arrive in somebody's inbox. If the address is jblogs@ and you accidentally send the email to jjblogs@, which is also a real email address, this is poor-quality data, and it is an error you will not discover by checking through all the emails that have bounced from your mailshot. Jblogs will be annoyed when his friends tell him about the email they received from your charity, as he would have liked the opportunity to donate himself but the email never arrived in his inbox. As a result, he may decide to support another charity instead.

To give another example of the pitfalls of poor-quality data, suppose your company sells timeshare apartments through telephone sales in a country with strict rules about cold calling. Your sales team targets jblogs, who has already expressed an interest in timeshares, perhaps by leaving his phone number on your website. If your dirty telephone number data leads your staff to call the wrong number, and jjblogs receives the call from a salesperson trying to sell him a timeshare flat, he may report your company to the authorities for illegal cold calling. Or he may simply share what happened with his 4,000 Facebook friends and 20,000 Twitter followers, accusing your company of being a cold caller. Before you know it, your company's website has been blacklisted by anti-spam and anti-cold-calling security companies. No legitimate business wants to be accused of cold calling, as such an accusation could ruin your company's reputation, especially when the allegations are not true and the real problem lies in your dirty, poor-quality data.

These examples illustrate why it is so important that your data is quality data, fit for purpose. There is no question that the best way to ensure that your data are properly scrubbed up and error-free is through Spotless Data's unique web-based data quality API solution.

How does Spotless make sure your data are fit for purpose?

For every data cleaning job on your files, Spotless Data uses the HTTPS protocol for transfer and encrypted storage, so that your valuable data are protected from hacking and cyber-attacks from the moment you send them, throughout the time they pass through Spotless, until they are safely back in your repository. We guarantee that no third party will be able to access them. To sign in, you can either create a user account in Spotless itself using your email address, or sign in directly with your Facebook, Google or GitHub account.

The data cleaning process that makes sure your data are high quality is based on solutions, of which there are four fundamental types:

(i) Regex or regular expression solutions to check what is present in a specific field.

(ii) Duplication solutions to remove duplications from your data.

(iii) Reference solutions to check your data against a reference database, e.g. for addresses or for using Tableau.

(iv) Session solutions to cleanse gaps and overlaps in session data.
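To make the four solution types more concrete, here is a minimal Python sketch of what each kind of check might look like. It is an illustration only and not Spotless Data's implementation or API: the function names, the simplified email pattern and the representation of sessions as (start, end) pairs of minutes are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately simple email pattern (not a full RFC 5322 validator).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def regex_check(values, pattern=EMAIL_RE):
    """(i) Return the values that do NOT fully match the pattern."""
    return [v for v in values if not pattern.fullmatch(v)]


def remove_duplicates(rows):
    """(ii) Drop repeated rows, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for row in rows:  # rows must be hashable, e.g. tuples
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def reference_check(values, reference):
    """(iii) Return the values that are missing from the reference set."""
    return [v for v in values if v not in reference]


def session_issues(sessions):
    """(iv) Report gaps and overlaps between consecutive (start, end) sessions."""
    issues = []
    ordered = sorted(sessions)
    for (_, prev_end), (next_start, next_end) in zip(ordered, ordered[1:]):
        if next_start > prev_end:
            issues.append(("gap", prev_end, next_start))
        elif next_start < prev_end:
            issues.append(("overlap", next_start, min(prev_end, next_end)))
    return issues
```

For example, `session_issues([(0, 30), (30, 60), (70, 90), (85, 120)])` reports one gap between minutes 60 and 70 and one overlap between minutes 85 and 90. Note that a check like `regex_check` only confirms that an address is well formed; it cannot tell you that jjblogs@ was meant to be jblogs@, which is why reference checks against trusted data matter as well.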

You may need to use several solutions, assigned from the Spotless Data solutions database, on any one dirty file to make it spotless. The solutions are themselves designed to be self-learning through Spotless Machine Learning filters, which in practice means that the quality of the solutions improves each time they are used. As your particular data problems may be unique or unusual, you can also write your own solutions and then have your data cleaned according to them. On the other hand, if your data are already clean, the file will be sent back to you without any modifications.
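The idea of running several solutions over one file, including one you have written yourself, can be sketched in a few lines of Python. This is a hedged illustration rather than the Spotless API: `clean`, `strip_whitespace`, `drop_blank` and `lowercase_emails` are hypothetical names, and each "solution" is assumed to be a plain function that takes a list of rows and returns a cleaned list.

```python
def strip_whitespace(rows):
    """Example solution: trim stray spaces around every value."""
    return [row.strip() for row in rows]


def drop_blank(rows):
    """Example solution: remove rows that are empty."""
    return [row for row in rows if row]


def lowercase_emails(rows):
    """Example of a user-written solution: normalise addresses to lower case."""
    return [row.lower() for row in rows]


def clean(rows, solutions):
    """Apply each solution in order and report whether anything was modified."""
    original = list(rows)
    cleaned = original
    for solution in solutions:
        cleaned = solution(cleaned)
    return cleaned, cleaned != original
```

Calling `clean([" JBlogs@Example.com ", "", "a@b.org"], [strip_whitespace, drop_blank, lowercase_emails])` returns the two cleaned addresses together with `True`, while a file that is already clean comes back unchanged with `False`.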

Spotless is aware that you need to clean your data quickly, especially when the data change over short periods of time, so we spread your data cleaning across a wide range of server farms to ensure distributed job processing. Recognising that automated processes have their limitations when it comes to spotless data quality, Spotless has devised its own workflow system. If the automated systems flag a problem with a particular cleaning process, the flag is passed for manual review to Spotless Data's team of data scientists, where the problem is either fixed or, if it is not easily fixable or critical issues are involved, escalated.
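The flag, review and escalate workflow described above can be pictured as a small decision routine. The sketch below is an assumption-laden illustration, not Spotless Data's actual workflow system: the `Flag` fields, the `Status` values and the rule that anything critical or not easily fixable is escalated are inferred from the description in this article.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    FIXED = "fixed"
    ESCALATED = "escalated"


@dataclass
class Flag:
    """A problem raised by the automated cleaning step (hypothetical structure)."""
    description: str
    easily_fixable: bool
    critical: bool = False


def manual_review(flag):
    """Reviewer decision: fix simple problems, escalate hard or critical ones."""
    if flag.critical or not flag.easily_fixable:
        return Status.ESCALATED
    return Status.FIXED


def process_flags(flags):
    """Send every automated flag to manual review and collect the outcomes."""
    return [(flag.description, manual_review(flag)) for flag in flags]
```

A job that raises no flags produces an empty list of outcomes, which corresponds to a file that passed through the automated cleaning without needing human attention.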

So if you want your data quality to be fit for purpose, get started with Spotless Data's unique web-based API solution today. We guarantee you'll be impressed.

Data quality with the Spotless Data solution

You can read our introduction to using our API and then try out our service on your My Filters page, but for this you need to log in first. If you haven't already done so, you can sign up using your email address or your Facebook, Google or GitHub account. You may also view our video on data cleaning an EPG file, which also explains how to use our API.

We use the HTTPS protocol, guaranteeing that your data are secure while they are in our care and that they are not accessible by any third party, a responsibility we take very seriously. If you would like to contact us, you can speak to one of our team by clicking on the white square icon with a smile within a blue circle, which you can find in the bottom right-hand corner of any of the web pages on our site.

Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing.

If data quality is an issue for you, or you have known sources of dirty data but your files are just too big and the problems too numerous to fix manually, please do log in and try now.