14 causes of data quality problems: part 2

The huge variety of big data requires data cleaning for data quality

Fixing a wide range of data quality problems has never been easier, thanks to Spotless Data's Machine Learning Filters.

This post continues our blog on data quality issues, which covered numbers 1-7.

8. Operational issues

These are issues which arise in the normal day-to-day functioning of a platform such as a website, including security issues, data quality issues, wastage and delays, and problems with the overall management control of the platform. Poor data quality caused by a lack of data validation, which gets in the way of the smooth day-to-day functioning of your platform, is itself an operational issue.

9. Legacy technologies

As technology evolves, the technologies you are currently using become obsolete and need replacing with up-to-date ones. Transferring the data stored in these legacy technologies to your new platform, while ensuring its integrity, can create big issues with data quality. Even small mistakes in your data, which did not matter in the legacy platform, can render the data useless in the new platform. If a lot of your material is stored in Excel files in the .xls format and your new platform only reads the more up-to-date .xlsx format, then transferring the data from the legacy platform is going to be a difficult task and, unless each file is updated, may well degrade the quality of data that worked so well in the legacy platform. Yet failing to update your platform is the perfect recipe for allowing your competitors to perform better than you and attract the majority of potential customers to their updated platforms. On the other hand, if your data still works with the new technology, it is a good sign that you have quality data.
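As a first step in a migration like this, it can help to list which files are still in the legacy format. The following is a minimal Python sketch, assuming the files sit in a directory tree on disk. The function name `find_legacy_workbooks` is an illustrative assumption and not part of any Spotless Data product, and the check looks only at the file extension, not at the file contents.

```python
from pathlib import Path


def find_legacy_workbooks(root):
    """Return sorted paths of files under root that still use the .xls extension."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        # Compare the suffix case-insensitively so REPORT.XLS is also found.
        if path.is_file() and path.suffix.lower() == ".xls"
    )
```

Each path returned is a file that would need converting to .xlsx before a platform that only reads the newer format could use it.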

10. Poor change control

This is when the data change over time and those changes are poorly managed, such as in a TV listings service, where the entire data set changes week by week, or in online shopping, where prices may change on a daily basis; e.g. a European online supermarket having to raise the price of broccoli because a weather-inspired shortage of the vegetable has caused its real price to rise. If these changes cannot be incorporated smoothly, so that the updated data is trustworthy, it could mean losing TV viewers or losing money because the broccoli is being sold at a loss-making price.
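One way to keep such changes trustworthy is to validate each update before it goes live. Below is a minimal Python sketch of that idea, assuming each update is a dictionary with `product` and `price` keys and that cost prices are available in a separate dictionary. The function name `check_price_update`, the data layout and the figures are all hypothetical.

```python
def check_price_update(update, cost_prices):
    """Return a list of problems found in one price update; empty means it looks safe."""
    problems = []
    product = update.get("product")
    price = update.get("price")

    if not product:
        problems.append("missing product name")
    if not isinstance(price, (int, float)) or isinstance(price, bool):
        problems.append("price is missing or not a number")
        return problems
    if price <= 0:
        problems.append("price must be positive")

    # Flag prices below what the item costs us, i.e. a loss-making price.
    cost = cost_prices.get(product)
    if cost is not None and price < cost:
        problems.append(f"price {price:.2f} is below cost {cost:.2f}")
    return problems


# Hypothetical figures: broccoli now costs 1.80 to buy in, but the update still says 1.50.
print(check_price_update({"product": "broccoli", "price": 1.50}, {"broccoli": 1.80}))
```

An update that returns any problems would be held back for review instead of being published.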

11. Synchronization failures

These occur when poor quality data in one area of your system, because various records need to work together in a coordinated manner, affects all the other processes and indeed the system as a whole. What is required is to establish consistency among the various parts and to ensure that this consistency is not affected by updates, new releases or new uses of the system. A failure to update the varying prices of the vegetables for sale on your online retail website due to poor synchronization, where the new prices are available to your company but are not showing up on the website, can be an expensive synchronization failure.
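A simple consistency check can catch this kind of failure early. The Python sketch below compares the prices your company holds internally with the prices shown on the website, assuming both are available as dictionaries mapping product name to price. The function name `find_price_mismatches` and the sample data are hypothetical.

```python
def find_price_mismatches(internal_prices, website_prices):
    """Compare two {product: price} mappings and describe every difference."""
    mismatches = {}
    for product in sorted(set(internal_prices) | set(website_prices)):
        internal = internal_prices.get(product)
        shown = website_prices.get(product)
        if internal != shown:
            # None means the product is absent from that side altogether.
            mismatches[product] = {"internal": internal, "website": shown}
    return mismatches


# Hypothetical data: the broccoli price was raised internally but never reached the site.
internal = {"broccoli": 2.10, "carrots": 0.90}
website = {"broccoli": 1.50, "carrots": 0.90}
print(find_price_mismatches(internal, website))
```

Run on a schedule, a check like this would report the broccoli price as out of sync while leaving the matching carrot price alone.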

12. Incompatible data formats

A familiar case of incompatible data formats is trying to play audio or video files where the file format will not work with the particular hardware. This incompatibility is especially prevalent when businesses import a lot of third-party data, which may be in file formats that are incompatible with the hardware on your company's platform, or when transferring from a legacy platform.

13. Hardware failures

While most errors caused by the technology being used are software failures, even the hardware has been designed by somebody, and that person may not have foreseen the uses to which your company wishes to put it. This is particularly relevant in the internet of things (IoT). If you guarantee your customers that their fridges will not spoil their food, your detection systems should be able to prevent a broken fridge from causing problems, but only if there are no hardware issues in the detection systems themselves!

A different kind of hardware issue might be a driverless car which has a flat tyre. Unless your data is of sufficient quality to detect that the problem with the car is caused by the flat tyre, the system has failed to identify the hardware issue. It is thus unable to take the appropriate automated step, which in this case would mean stopping the car so the tyre can be fixed manually.
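The flat tyre case can be sketched in a few lines of Python. The example assumes the car reports one pressure reading per wheel, with `None` where a sensor returned nothing. The function name `diagnose_tyres`, the wheel labels and the 180 kPa threshold are illustrative assumptions, not values from any real vehicle.

```python
def diagnose_tyres(pressures_kpa, minimum_kpa=180.0):
    """Decide what to do from per-wheel pressure readings (None = no reading)."""
    missing = [wheel for wheel, value in pressures_kpa.items() if value is None]
    flat = [
        wheel
        for wheel, value in pressures_kpa.items()
        if value is not None and value < minimum_kpa
    ]
    if flat:
        # A confirmed low reading is enough to stop the car for a manual fix.
        return "stop", flat
    if missing:
        # Without data we cannot rule a flat tyre in or out.
        return "cannot_diagnose", missing
    return "ok", []
```

The middle branch is the data quality point: when a reading is missing, the system can neither confirm nor exclude the flat tyre, so it cannot choose the right automated step.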

14. Poorly trained personnel

Poorly trained personnel can have all kinds of negative effects on the quality of your company's data. For instance, staff who are not data scientists may make mistakes when implementing a new release of your platform. This can render the data in your platform completely useless, or leave it with enough errors that the platform does not do what it is meant to do. Marketers with no experience and a poor understanding of data may make a complete mess of your new data-driven marketing campaign, whereas a week spent on a training course with a data scientist, learning about the benefits of data quality, the drawbacks of poor data and how to manage the data, may well make all the difference between a successful and a failed marketing campaign.

The API solution

While some of these issues clearly overlap with each other, and others require different solutions, ensuring that your data are quality data by using a data quality product such as the Spotless Data Quality API is at the heart of ensuring that poor quality data do not negatively affect your business or impede the pursuit of its goals. The companies with data they can trust are likely to be the winners.

Take a look at our introduction to using our browseable API. You can also test our API on your my filters page, though you will need to be logged in first to see this. You can sign up using your email address or your Facebook, Google or GitHub account.

You can also view our videos on data cleaning an EPG file and data cleaning a genre column, which explain how to use our API.

Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing. We are offering 500Mb of free data cleaning to all new customers so you can try out our solution for yourself. If you would like to contact us you can speak to one of our team by pressing on the white square icon with a smile within a blue circle, which you can find in the bottom right-hand corner of any of the web pages on our site.

Spotless Data, the One Stop Data Quality Solution API!

If your data quality is an issue, or you know that you have sources of dirty data but your files are just too big and the problems too numerous to fix manually, please do log in and try now.