High-quality data is the first step towards actually protecting the private information of your customers.
One of the news stories of 2016 has been the ever more demanding regulations which authorities throughout the world impose on businesses that hold personally identifiable information (PII) about their users. As cases involving Apple, Amazon and WhatsApp in Brazil have demonstrated, even the largest companies can have serious legal problems when governments want to access the data of individuals using their platforms. Authorities can shut a platform down if it fails to meet the legal requirements in any one of the countries in which it operates. An example of a new and stricter regulation is the General Data Protection Regulation (GDPR), through which the European Union intends to strengthen and unify data protection for individuals within its borders.
If you are the owner of a communication-based platform, you need accurate, duplicate-free PII data which you can trust for all your users, in case legal authorities come knocking at your door. In practice this means all your data need data validation. For example, suppose that, entirely unknown to you and your employees, people are using your platform to launder money from the buying and selling of illegal drugs. When the police come along, rightly demanding the PII data you hold about the individuals they suspect of money laundering, it is no good saying to them, "well, we would love to help you out but unfortunately our data are corrupted, inaccurate, full of duplications and generally too dirty to be of any use to you, and so we are unable to help." The response of legal authorities is likely to be to shut your platform down for non-compliance until you can fix the problem, i.e. until your PII data are of sufficiently high quality, that is, clean and easily accessible.
Spotless Data, in its recent version 5 release, has developed a new kind of solution, known as a session solution. This helps the data cleaning of PII run more smoothly in your business, and thus gives your data the integrity that allows your company to comply seamlessly with its legal obligations. Many companies operate online, which means that their products and services reach people all over the world, but they do not have the resources to employ lawyers in every country they operate in. The best protection for any online business that deals in large quantities of data which authorities may want to access is therefore to make sure that said data are quality data, clean and easily retrievable and readable, for example by using a single customer view.
Imagine the alleged money launderer in the above example is known as HT123 in your database but, because your dirty data contain unplanned duplications, there are eleven different users in your database known as HT123. You cannot be sure which one the police are referring to, simply because your dirty data mean you cannot distinguish the one guilty HT123 from the ten innocent ones. The police do not want to investigate eleven people in order to catch one perpetrator, as this creates eleven times the work for them, but the problem you face is actually much worse: ten of your users, who are completely innocent of any wrongdoing, would be subject to an unnecessary police investigation.
If this investigation then became public knowledge, perhaps through a leak from a serving police officer disgruntled at your dirty data, no honest person would want to use your service ever again. Law-abiding users do not want to face unnecessary police investigations, which might affect their reputations, solely because your data are corrupted by unwanted duplications.
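A situation like the HT123 one can be checked for before anyone comes asking. The following is only a minimal sketch in plain Python and does not use the Spotless Data API: the record layout, the `user_id` field name and the `find_duplicate_ids` helper are all assumptions made for illustration. It groups records by a normalised identifier and reports every identifier shared by more than one record.

```python
from collections import defaultdict


def find_duplicate_ids(records, key="user_id"):
    """Return a dict mapping each identifier that occurs more than once
    to the list of records sharing it.

    `records` is assumed to be an iterable of dicts (a hypothetical layout).
    """
    groups = defaultdict(list)
    for record in records:
        # Normalise case and surrounding whitespace so that "ht123 " and
        # "HT123" are treated as the same identifier.
        normalised = record[key].strip().upper()
        groups[normalised].append(record)
    # Keep only the identifiers that appear on more than one record.
    return {uid: rows for uid, rows in groups.items() if len(rows) > 1}


# Hypothetical sample data, for illustration only.
users = [
    {"user_id": "HT123", "email": "a@example.com"},
    {"user_id": "ht123 ", "email": "b@example.com"},
    {"user_id": "KP456", "email": "c@example.com"},
]

duplicates = find_duplicate_ids(users)
for uid, rows in duplicates.items():
    print(f"{uid}: {len(rows)} records share this identifier")
```

A check of this kind only reveals that duplicates exist. Deciding which records genuinely belong to the same person, and merging them into a single customer view, still requires proper data cleaning.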
The only solution to this potential but all too serious problem is to make sure that the data in your company have already been properly cleansed to a high quality, and thus contain no unplanned duplications and are easy to analyse. Then you could say to the police, "this is the user you are looking for and these are the data that we have about her or him". From your point of view that will be the end of the story, and it can do your business no harm that the legal authorities in the country where this imaginary money laundering investigation is taking place are satisfied with your data and your response to their legal request. Your honest customers, if they were to hear about the case, would be reassured that your platform is one they can use safely. The best way to stay on the right side of the law and to have data which have been properly scrubbed is therefore to use Spotless Data's unique web-based data quality API solution, which helps your company stay ahead of the pack when it comes to legal compliance.
Please do sign up for our service using your email address or your Facebook, Google or GitHub account. Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing, and try it out with 500MB of free data cleaning. Our videos on data cleaning an EPG file and data cleaning a genre column explain how to use our API. If you would like to contact us, you can speak to one of our team by clicking the white square icon with a smile within a blue circle, which you can find in the bottom right-hand corner of any of the web pages on our site.
Spotless Data, the one-stop data quality API solution!
If your data quality is an issue, or you have known sources of dirty data but your files are too big and the problems too numerous to fix manually, please do log in and try now.