14 causes of data quality problems: Part 1

A businessman holds a huge variety of data in his hands, thanks to its data quality

Spotlessly clean, high-quality data is the goal of all data-driven businesses, which is why we have developed our Machine Learning filters to remove rogue data.

Part 2 of this blog.

Spotless Data has identified 14 different causes of poor data quality, meaning data that your company and your customers are unable to trust. This first part covers seven of them. Given that poor quality data can cost businesses 20-35% of their operating revenue, ensuring that your data quality is dependable has never been more important.

1. Skill shortages

Data scientists spend 60-90% of their time cleaning data, according to Forbes, and because they are in short supply they can command salaries of more than $100,000 a year in the USA. Also, many businesses, especially those manufacturing internet of things (IoT) products, now have to deal with large quantities of critical data outside the manufacturing process for the first time. Marketing and sales teams that lack data skills may also fail to launch and follow through successfully on a data-driven marketing campaign.

2. Integration of data sources

Many businesses have data from various sources, such as their own automated processes, User Generated Content (UGC) and multiple third parties, each of which may be stored in a different file format. Also, when updating one's platform or moving from a legacy platform to a new one, there may be a need to change the format in which the data are stored. Integrating data sources in different formats is a major cause of poor quality or dirty data, full of inaccuracies and corruptions. By using Spotless Data, you can guarantee data integrity.

Ensuring the smooth integration of these different formats can be a major challenge for businesses. Data integration from different sources is a good example of where your company needs our unique web-based data quality API solution to ensure your company truly can rely on the quality of its data.
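As a rough illustration of what integrating sources in different formats involves, here is a minimal Python sketch that maps a CSV source and a JSON source onto one common schema. The field names (name, fullName, signup_date, signupDate) and the sample records are hypothetical, and this is not how the Spotless Data API works internally; it only shows the kind of mapping every integration has to get right.

```python
import csv
import io
import json

# Hypothetical sample inputs: the same kind of record arriving in two formats.
CSV_SOURCE = "name,signup_date\n Ada Lovelace ,2017-03-01\n"
JSON_SOURCE = '[{"fullName": "Grace Hopper", "signupDate": "2017-03-02"}]'


def from_csv(text):
    """Map CSV rows onto the common schema (name, signup_date)."""
    return [
        {"name": row["name"].strip(), "signup_date": row["signup_date"].strip()}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text):
    """Map JSON objects, which use different field names, onto the same schema."""
    return [
        {"name": item["fullName"].strip(), "signup_date": item["signupDate"].strip()}
        for item in json.loads(text)
    ]


def integrate(csv_text, json_text):
    """Combine both sources into one list of records with identical keys."""
    return from_csv(csv_text) + from_json(json_text)


records = integrate(CSV_SOURCE, JSON_SOURCE)
```

Each source gets its own small mapping function, so a change in one supplier's format only touches one place.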

3. Technical faults

No technology is perfect, or better than the people who designed it. Whether you are developing your own data models or using other people's, faults produced by the technology itself are not uncommon, and some of these may be beyond the control of your company. The solution is to identify these faults and ensure that they are either fixed directly or worked around, so that all your different data are validated.

4. Using data for new purposes

While traditional definitions of data quality define it as being fit for the purpose for which it has been designed, Spotless Data recognizes that, in a rapidly changing world, the real test of data quality is whether the data are fit for new purposes for which they were not originally designed. Machine learning is where the software learns for itself as it is used, but if the data quality is poor and only suitable for purposes for which the data were designed, machine learning won't be able to do anything new with the data. Nor will your company.

5. Manual entry of data

It is estimated that 58% of errors in data are caused by simple typos or spelling mistakes. While you can teach your employees to enter data in a particular way, you cannot guarantee that they won't make mistakes in their data entry, at least once in a while. Recording a phone number as 1324 instead of 1234 is easily done, especially if one is manually entering phone numbers all day long, but it could have negative consequences if it results in unintentional cold calling, which is illegal in many countries, not to mention the resources wasted in calling the wrong person. The issue is exacerbated with manually entered user-generated content (UGC). Hurried or poorly educated users often make typos by mistake, but a smaller number do so deliberately, e.g. giving the same address formatted in two different ways, perhaps hoping to benefit fraudulently from some special deal your company is offering.
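The case of one address entered in two different formats can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abbreviation table is hypothetical and far too small for real use (and "st" can also mean "saint"), and it is not the method used by the Spotless Data filters.

```python
import re

# Hypothetical abbreviation table; a real one would be much larger.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalize_address(address):
    """Lower-case, drop punctuation and expand known abbreviations."""
    words = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def find_duplicates(addresses):
    """Return (first_seen, repeat) pairs whose normalized forms match."""
    seen = {}
    duplicates = []
    for address in addresses:
        key = normalize_address(address)
        if key in seen:
            duplicates.append((seen[key], address))
        else:
            seen[key] = address
    return duplicates


pairs = find_duplicates(["12 High St.", "12, high street", "3 Mill Rd"])
```

Here the first two entries normalize to the same string and are reported as a pair, while the third is left alone.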

6. Software updates

Also known as patches, software updates are carried out by most competitive businesses at least several times a year, both to fix vulnerabilities and to improve the functioning of the software. These updates can sometimes fail due to poor quality data or to unexpected factors which those updating the software had not taken into account. This is essentially an issue of poor data management.

7. Inadequate testing time

Whenever changes are made to one's platform, it is important to test that the data are still functioning to a level you can trust. Yet if your platform has large quantities of complex or big data and limited resources for testing them, e.g. only being able to test on a live site or only having the time to do automated but not manual testing, the result can be corruptions that make the new release worse or more vulnerable than what it has replaced.

So, paradoxically, inadequate testing time can result in needing more time to finish the project successfully. For instance, if you have just updated your website but only checked the new version in Mozilla Firefox, there may be formatting issues when the site is displayed in Google Chrome. With half a dozen major web browsers and a whole range of devices used to access your platform, unless you properly test all devices and all web browsers you may find that your new release works perfectly on smartphones and PCs but looks just awful on Apple Macs and tablets.
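When testing time is short, even a very small automated check on the data themselves can catch corruptions introduced by a release. The Python sketch below compares a snapshot of records taken before a change with one taken after it; the record layout is hypothetical and the approach is just one simple possibility, not a description of any particular product.

```python
import hashlib
import json


def fingerprint(rows):
    """Order-independent summary: row count plus a hash of the sorted rows."""
    canonical = sorted(json.dumps(row, sort_keys=True) for row in rows)
    digest = hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()
    return len(rows), digest


def data_unchanged(before, after):
    """True if both snapshots hold the same records, in any order."""
    return fingerprint(before) == fingerprint(after)


# Hypothetical snapshots taken before and after a platform update.
before = [{"id": 1, "title": "News"}, {"id": 2, "title": "Sport"}]
after_ok = [{"title": "Sport", "id": 2}, {"title": "News", "id": 1}]
after_bad = [{"id": 1, "title": "News"}, {"id": 2, "title": "Sp?rt"}]
```

Calling data_unchanged(before, after_ok) reports a match, because row order and key order are ignored, while data_unchanged(before, after_bad) flags the corrupted title.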

The API solution

While some of these issues clearly overlap with each other and require different solutions, ensuring that your data are quality data by using a Data Quality product, such as the Spotless Data Quality API, is at the heart of making sure that poor quality data do not negatively affect your business or impede the pursuit of its goals. The companies with data they can trust are likely to be the winners.

Take a look at our introduction to using our browseable API. You can also test our API on your my filters page, though you will need to be logged in first to see this. You can sign up using your email address or your Facebook, Google or GitHub account.

You can also view our videos on data cleaning an EPG file and data cleaning a genre column which explain how to use our API.

Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing. We are offering 500MB of free data cleaning to all new customers so you can try out our solution for yourself. If you would like to contact us, you can speak to one of our team by clicking on the white square icon with a smile within a blue circle, which you can find in the bottom right-hand corner of any of the web pages on our site.

Spotless Data, the One Stop Data Quality Solution API!

If data quality is an issue for you, or you have known sources of dirty data but your files are too big and the problems too numerous to fix manually, please do log in and try now.