Are you using your dark data effectively?

A businessman with a blindfold, missing out on all the valuable dark data

When dark data are of high quality, they can greatly help your business succeed, which is why cleaning them with the Spotless API solution makes so much sense.

What are dark data?

Dark data are those data that a company stores but which do not appear directly useful. Using the analogy of an iceberg, those data that a company uses are the visible tip of the iceberg, and the dark data are the roughly 90% of data which make up the rest of the iceberg. The term was invented by Gartner. Many companies store all the data that they have, either for legal compliance or simply because they feel that deleting them would be a waste of these potential resources, in spite of not precisely knowing what they are going to do with them. Businesses frequently fail to monetise these data, which have been described as a billion-dollar opportunity and as data that are kept 'just in case' but haven't so far found a proper usage.

When dark data are dirty or corrupted, they are of course going to have limited value. They could simply be a security liability for your company without adding any value to it. But once the data have been properly cleaned to a level of data quality you can trust, using Spotless Data's unique web-based API solution to ensure data integrity, these dark data can be analysed and either used in business intelligence or sold on to others, for whom they will have a monetary value.

A dark data example: IP location

For instance, a manufacturer of soft drinks that runs a popular website might think that, of all the data they have, only those directly relevant to the marketing and sales of their soft drink products have any value for them. While they also store many other data, such as the IP location of their users, they fail to see how these "dark" data can also have value for their company. Yet suppose their data, properly cleaned to a high quality and then analysed, reveal that 7% of the users of their website are accessing it from outside the country where the company is located, even though the product is only sold directly to retailers within that country. These are in themselves valuable data, for instance to those who target ads at consumers of soft drinks. These dark data could also be seen as an opportunity to think about marketing the product elsewhere. For instance, if the IP location data show that 40% of the users from outside the company's home country access the site from India, while only 4% come from the European Union, this would strongly suggest that a marketing campaign within Europe would have considerably less chance of success than one aimed at the Indian subcontinent.
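As a rough illustration of this kind of analysis, the sketch below tallies website visits by country and reports what share comes from abroad, and how those foreign visits split by country. It assumes the IP addresses have already been cleaned and resolved to country codes; the function name, the `HOME_COUNTRY` value and the sample figures are all hypothetical and are not part of the Spotless API.

```python
from collections import Counter

HOME_COUNTRY = "GB"  # assumption: the only country where the product is sold


def foreign_traffic_summary(visit_countries, home_country=HOME_COUNTRY):
    """Return (share of visits from abroad, per-country share of foreign visits)."""
    # Skip missing values (None or empty strings) left over in the raw data.
    counts = Counter(c for c in visit_countries if c)
    total = sum(counts.values())
    if total == 0:
        return 0.0, {}
    foreign = {c: n for c, n in counts.items() if c != home_country}
    foreign_total = sum(foreign.values())
    foreign_share = foreign_total / total
    if foreign_total == 0:
        return foreign_share, {}
    breakdown = {c: n / foreign_total for c, n in foreign.items()}
    return foreign_share, breakdown


# Hypothetical cleaned data: 100 visits, 7 of them from abroad.
visits = ["GB"] * 93 + ["IN"] * 4 + ["FR"] * 2 + ["US"] * 1
share, by_country = foreign_traffic_summary(visits)
print(f"{share:.0%} of visits come from abroad")
for country, fraction in sorted(by_country.items(), key=lambda kv: -kv[1]):
    print(f"  {country}: {fraction:.0%} of foreign visits")
```

A tally like this is only as reliable as the country codes fed into it, which is why the cleaning step has to come first.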

Perhaps this particular soft drinks manufacturer is too small to be thinking of expanding into an international market. However, these same dark data could still be useful to a large multinational which is thinking of entering the soft drinks market with a very similar product and wondering which regions of the world would be the best to enter first. It would probably pay good money to receive reliable information indicating that India was a better market for it than the European Union. The soft drinks manufacturer could also sell its dark data on to third parties interested in targeting ads at soft drinks consumers, for whom IP location and a whole range of other factors that the manufacturer holds among its dark data are relevant. They might be willing to pay for these data, always assuming that they were well scrubbed up and easily retrievable and analysable.

Other dark data examples

Other typical examples of dark data, which most websites store but fail to exploit, include the type of device used to access the Internet, typically a smartphone, tablet or computer; the web browser used, e.g. Chrome, Firefox, Opera, Edge or IE, among others; and even more obscure or dark information such as the number of times users reset their password, which would be useful to a company that specializes in Internet and password security.

A well-known example of dark data going to waste is where a company has a feedback form which allows users to comment on its website or service, but does not have the data structures in place to allow easy analysis of these data. The result is a failure to take on board and act on its users' judgments and criticisms, whether positive or negative, both of which have value.
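To make the point about data structures concrete, here is a minimal sketch of what a structured feedback record might look like, and how easily it can then be summarised. The `Feedback` fields, the 1 to 5 rating scale and the sample entries are assumptions chosen for illustration; a real feedback form may capture quite different information.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Feedback:
    """One feedback form submission (hypothetical fields)."""
    page: str     # which page or service the comment refers to
    rating: int   # assumed scale: 1 (very negative) to 5 (very positive)
    comment: str  # free-text remarks from the user


def average_rating_by_page(entries):
    """Group submissions by page and return the mean rating for each page."""
    ratings = defaultdict(list)
    for entry in entries:
        ratings[entry.page].append(entry.rating)
    return {page: mean(values) for page, values in ratings.items()}


# Hypothetical submissions.
entries = [
    Feedback("checkout", 2, "Too many steps"),
    Feedback("checkout", 1, "Payment failed twice"),
    Feedback("home", 5, "Love the new design"),
]
# List the lowest-rated pages first, since they need attention soonest.
for page, score in sorted(average_rating_by_page(entries).items(), key=lambda kv: kv[1]):
    print(f"{page}: {score:.1f}")
```

Once feedback is stored in a consistent shape like this, both the negative and the positive comments can be grouped, ranked and acted upon rather than left unread.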

Having your dark data well structured and cleansed to a level of data quality you can trust will allow your automatic processes and employees to read and analyse them easily, and then to extract business intelligence of real value, whether for direct use by your company or to be sold on to third parties. This will allow you to stand out among your competitors and make you one of the winners in 2017. Sometimes this means updating your website to make these data more accessible. Validating all your data is the key.

Please do sign up for our service using your email address or your Facebook, Google or GitHub account. Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing, and try it out with 500Mb of free data cleansing. You can also view our videos on cleansing an EPG file and cleansing a genre column, which explain how to use our API. If you would like to contact us, you can speak to one of our team by pressing the white square icon with a smile within a blue circle, which you can find in the bottom right-hand corner of any of the web pages on our site.

Spotless Data, the One Stop Data Quality Solution API!

If your data quality is an issue, or you know that you have sources of dirty data but your files are just too big and the problems too numerous to fix manually, please do log in and try now.