The importance of data quality for data security

Padlocks for data security function best with clean data quality

The best first step towards ensuring data security is to use Spotless Data's machine learning filters.

Data security has never been as important as it is today. In the European Union, the General Data Protection Regulation (GDPR), adopted in 2016 and enforceable from May 2018, is particularly strict in terms of legal compliance, requiring any organisation to protect the personal data of its users. Any business with a website will also have to comply with the many other data and privacy laws throughout the world in order to operate legally.

Why data security requires data quality

The first step towards achieving data security is ensuring that the data entrusted to your organisation are trustworthy, quality data. This in itself is a great reason why all organisations would benefit from using Spotless machine learning filters, which root out rogue data and leave you with data that are spotlessly clean.

A journalist from the UK's Guardian newspaper recently asked the dating site Tinder for the data it held about her, as she had been a user there for a while. Tinder sent her 800 pages of data, all based on her interactions with this one site. Ever tighter regulations are making it easier than ever for individuals (and not merely journalists) to demand that they too promptly receive whatever data your organisation has about them, in a format they can easily access and understand. So all organisations need the data quality which will allow them to fulfil this obligation seamlessly and swiftly, with the minimum of effort on their part. Excuses that the data are just too difficult to retrieve or to present in an easy-to-read format are unlikely to be received kindly by any judge, should your organisation be unfortunate enough to find itself unable to meet what is now a basic demand of the modern world.
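To make the point concrete, here is a minimal sketch in Python of how such a request might be answered when every table identifies a person in the same way. The table names, the `user_id` field and the `export_user_data` function are all hypothetical and chosen purely for illustration; they are not part of the Spotless API or of any particular system.

```python
import json

# Hypothetical in-memory "tables"; a real system would query its databases.
TABLES = {
    "profiles": [
        {"user_id": 1, "name": "Ada"},
        {"user_id": 2, "name": "Bob"},
    ],
    "messages": [
        {"user_id": 1, "text": "hello"},
        {"user_id": 1, "text": "bye"},
        {"user_id": 2, "text": "hi"},
    ],
}


def export_user_data(tables, user_id):
    """Collect every record belonging to user_id, grouped by table name."""
    return {
        name: [row for row in rows if row.get("user_id") == user_id]
        for name, rows in tables.items()
    }


report = export_user_data(TABLES, 1)
# Serialise to an easy-to-read format for the person who asked.
print(json.dumps(report, indent=2, sort_keys=True))
```

The lookup is only this simple because each table uses the same clean identifier. If one source stored the user under an email address and another under a misspelt name, the export would silently miss records.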

One of the keys to achieving data security is to classify data according to importance. For instance, your latest algorithm, the details of a secret manufacturing process such as a food recipe, or indeed any intellectual property are likely to require a higher level of data security than statistics about the number of visitors to your website. Yet if your data are chaotic and badly organised because the data quality is not there, it is hard to know which data require greater security, and even harder to assign these more valuable data the higher security levels they require.
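As a rough illustration of classifying data by importance, the following Python sketch assigns a sensitivity level to a dataset based on its name. The levels, the keyword rules and the dataset names are assumptions made up for this example; a real classification scheme would be defined by your own organisation.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical security levels, lowest to highest."""
    LOW = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


# Hypothetical keyword rules, checked in order from most to least sensitive.
RULES = [
    (("recipe", "algorithm", "patent"), Sensitivity.CONFIDENTIAL),
    (("customer", "email", "address"), Sensitivity.INTERNAL),
]


def classify(dataset_name):
    """Return the first matching level, or LOW when no rule matches."""
    name = dataset_name.strip().lower()  # normalise before matching
    for keywords, level in RULES:
        if any(keyword in name for keyword in keywords):
            return level
    return Sensitivity.LOW


datasets = ["Secret_Recipe_v2", "website_visitor_counts", " customer_emails "]
levels = {name: classify(name) for name in datasets}
for name, level in levels.items():
    print(f"{name.strip()}: {level.name}")
```

Rules like these only work when names and labels are consistent. A confidential file saved under an uninformative or misspelt name would fall through to the lowest level, which is exactly the risk that poor data quality creates.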

Why data breaches require data quality

Another important area when it comes to data security is data breaches. This is something we at Spotless are very aware of and take very seriously indeed when it comes to the data you upload to our browseable API, as we know how important, valuable and private any data you send to us are.

We also know that the best way to avoid a data breach is to understand the data you have and, to do this, you need to have clean, quality data. There are a number of reasons for this. One is that you may simply have too much data. Many organisations have lots of dark data which they are failing to use; distinguishing between useful dark data and useless dark data, and getting rid of the latter, is one way to slim down your data. Another is to get rid of outdated legacy data which you do not need. The less data you have, the easier they are to protect, and with quality data it is much easier to make sure that all the data you store are data you actually need.
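One way to picture the slimming-down step is a simple retention check, sketched below in Python. The `last_used` field, the 365-day retention period and the function name are hypothetical choices for this example rather than a recommendation for any specific policy.

```python
from datetime import date, timedelta


def split_by_retention(records, today, max_age_days):
    """Sort records into those to keep, those to purge, and those of unknown age."""
    cutoff = today - timedelta(days=max_age_days)
    keep, purge, unknown = [], [], []
    for record in records:
        last_used = record.get("last_used")
        if last_used is None:
            unknown.append(record)  # missing date: cannot decide safely
        elif last_used < cutoff:
            purge.append(record)    # older than the retention period
        else:
            keep.append(record)
    return keep, purge, unknown


records = [
    {"id": "a", "last_used": date(2017, 6, 1)},
    {"id": "b", "last_used": date(2015, 3, 10)},
    {"id": "c", "last_used": None},
]
keep, purge, unknown = split_by_retention(records, date(2018, 1, 1), 365)
print([r["id"] for r in keep], [r["id"] for r in purge], [r["id"] for r in unknown])
```

The third bucket is the instructive one: records with missing or unparseable dates can be neither safely kept nor safely deleted, so they pile up until the underlying data are cleaned.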

Quality data which have undergone data validation are much easier to monitor. If your data lake is actually a mud pit and you put systems in place to set off alarms whenever a data breach appears to be taking place, you may be overwhelmed with false positives. These will not only waste your organisation's time and energy but may mean you end up taking the alarms less seriously than you should. This could prove disastrous when a real data breach occurs. With good data quality in place, it is easier to detect any real or potential data breaches, and false positives are much less likely to happen. If a successful data breach does happen, you haven't merely let criminals access and possibly even destroy your data; you have also badly damaged your brand reputation, possibly fatally so. Getting the data quality in place today to avoid this painful scenario may well be the most important thing your organisation can do.
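The effect of dirty data on alarms can be shown with a small Python sketch. It assumes a hypothetical access log in which each event has an `id`, a `user` and a `rows_read` count, and a hypothetical alarm that fires when a user reads more than a threshold number of rows in total. None of these names come from a real monitoring product.

```python
from collections import Counter


def clean(events):
    """Drop duplicate and malformed events, and normalise user names."""
    seen_ids = set()
    cleaned = []
    for event in events:
        event_id = event.get("id")
        user = event.get("user")
        rows = event.get("rows_read")
        if event_id is None or event_id in seen_ids:
            continue  # missing or duplicate event id
        if not isinstance(user, str) or not user.strip():
            continue  # missing or blank user
        if not isinstance(rows, int) or rows < 0:
            continue  # impossible row count
        seen_ids.add(event_id)
        cleaned.append({"id": event_id, "user": user.strip().lower(), "rows_read": rows})
    return cleaned


def users_over_threshold(events, threshold):
    """Return the users whose total rows read exceed the threshold."""
    totals = Counter()
    for event in events:
        totals[event["user"]] += event["rows_read"]
    return sorted(user for user, total in totals.items() if total > threshold)


raw_events = [
    {"id": 1, "user": "alice", "rows_read": 400},
    {"id": 1, "user": "alice", "rows_read": 400},  # duplicate of event 1
    {"id": 2, "user": "alice", "rows_read": 300},
    {"id": 3, "user": "bob", "rows_read": 5000},
    {"id": 4, "user": "", "rows_read": 9999},      # no user recorded
]

alarms_raw = users_over_threshold(raw_events, 1000)
alarms_clean = users_over_threshold(clean(raw_events), 1000)
print("raw:", alarms_raw)
print("clean:", alarms_clean)
```

On the raw log the alarm fires three times, twice because of a duplicated event and a record with no user. After cleaning, only the one genuinely unusual reader remains.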

On using Spotless Data's machine learning filters

Here is our introduction to using the Spotless API. You can try using our service on your My Filters page, but you need to log in first. You can sign up using your email address or your Facebook, Google or GitHub account. You may also view our videos on data cleaning an EPG file and data cleaning a genre column, both of which explain how to use our API.

We are giving away 500MB of free data cleaning to each newly signed-up customer so that you can see for yourself how seamlessly and swiftly our API filters work at data integration.

We guarantee your data are secure while they are in our care and that they are not accessible by any third party, a responsibility we take very seriously.

Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing. If you would like to contact us you can speak to one of our team by pressing on the white square icon with a smile within a blue circle, which you can find in the bottom right-hand corner of any of the web pages on our site.

If your data quality is an issue, or you have known sources of dirty data but your files are just too big and the problems too numerous to fix manually, please do log in and try now.