Big data analytics part 1

Illustrating all the areas where big data analytics can help digital transformation

Big data has many different aspects, but all of them depend on quality data to be useful; without it, the data is worse than useless.

You can also read part 2 of this big data analytics blog.

With the explosion of data the world is seeing, big data has become a regular part of modern life; both SMEs and large companies need it to thrive and fully engage with their customers, though this can only happen by converting that big data into useful analysis. The data explosion is perhaps less of a challenge for IT companies than for the rest of the business world, which often lacks expertise and experience with big data. By definition, big data sets are too large to hand to a team of business analysts for manual analysis, yet businesses need the ongoing insights they contain not merely to stand out among their competitors but even to keep up with them.

The business analyst perspective

Looking through the eyes of business analysts: they need big data presented in a format they can easily access and understand, typically on their own screens, without needing the technical knowledge of a data scientist, who brings a completely different set of skills to the table. The data science team produces the business analysis; the analysts read it, understand it, and make decisions based on their reading of it. Analysts should not need to know how the big data was transformed into useful business intelligence, and data scientists should not be expected to interpret the results and extract the practical insights that are the end goal of the whole process. It is, however, for the analysts to define and communicate to the data science team what they require in terms of business analysis.

Analysts need data that are up to date, not already stale by the time they read them. It is simply not practical for data scientists to work around the clock to ensure the latest data can be analysed. Instead, automated processes, set up by the data scientists and accessible in real time by the analysts, are needed so that yesterday's sales figures and transactional data are waiting for the analysts when they arrive at work today, without the data scientists having had to do anything recently to help them.
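As a minimal sketch of what such an automated overnight process looks like, the snippet below computes yesterday's date and resolves the extract to process. The `sales/{d}.csv` path convention and the pipeline steps in the comments are illustrative assumptions, not any particular product's layout; in practice a scheduler such as cron would invoke this each night.

```python
from datetime import date, timedelta

def nightly_batch_path(today: date, template: str = "sales/{d}.csv") -> str:
    """Return the path of yesterday's transactional extract.

    The path template is a hypothetical naming convention used purely
    for illustration.
    """
    yesterday = today - timedelta(days=1)
    return template.format(d=yesterday.isoformat())

def run_nightly(today: date) -> str:
    # 1. Locate yesterday's raw extract.
    path = nightly_batch_path(today)
    # 2. Clean it and publish for the analysts (stubbed here):
    #    load -> validate -> deduplicate -> write to the BI store.
    return path

print(run_nightly(date(2018, 7, 2)))  # sales/2018-07-01.csv
```

The key point is that no human intervenes: the job derives everything it needs from the clock, so the analysts find fresh data each morning.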

Part of this automated process is ensuring the data are cleaned to the point where the company has data quality it can trust. The best way to ensure this is for the data science team to integrate the Spotless Data quality API as an essential automated step in turning big data into useful insights. Data free from corruptions and inaccuracies such as duplications and mismatches are a prerequisite for giving analysts valuable insights rather than ones based on dirty data, which make their job impossible.
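To make the idea of automated cleaning concrete, here is a small sketch of the two problems the paragraph names, duplications and incomplete rows. The field names (`customer_id`, `date`, `amount`) are illustrative assumptions, and this stands in for what a data quality API would do at scale; it is not the Spotless Data implementation.

```python
def clean_records(records, key_fields=("customer_id", "date"), required=("amount",)):
    """Drop exact-duplicate records and rows missing required fields.

    A minimal stand-in for automated data cleaning; the field names
    are hypothetical and chosen only for illustration.
    """
    seen = set()
    cleaned = []
    for rec in records:
        if any(rec.get(f) in (None, "") for f in required):
            continue  # incomplete row: unusable for analysis
        key = tuple(rec.get(f) for f in key_fields)
        if key in seen:
            continue  # duplicate of an earlier record
        seen.add(key)
        cleaned.append(rec)
    return cleaned

raw = [
    {"customer_id": 1, "date": "2018-07-01", "amount": 3.50},
    {"customer_id": 1, "date": "2018-07-01", "amount": 3.50},  # duplicate
    {"customer_id": 2, "date": "2018-07-01", "amount": None},  # missing value
]
print(clean_records(raw))  # only the first record survives
```

Running rules like this automatically, before the analysts ever see the data, is what turns "big data" into data that can be trusted.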

Structure required for business analysts

The goal of business analysis is to create what is known as "analytic-ready data", with commonly agreed definitions of basic terminology: for instance, how to define what a customer is. Are customers solely people who have already bought a product or service, or does the company count anyone who has expressed an interest in buying as a customer? And how is "expressing an interest" defined: someone who has given the company their email address, or simply someone who has browsed the website? The specific answers do not matter. What does matter is that everyone in the same company has the same answer, predefined by the analysts, who then work with the data scientists to set up the automated process of turning big data into material the analysts can review manually to extract the required insights.
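One way to make such a definition unambiguous is to encode it once, in code, so every report applies the same rule. The sketch below, with hypothetical `purchases` and `email` fields, shows the two candidate definitions discussed above; which rule is right is a business decision, but only one should exist company-wide.

```python
def is_customer(contact: dict, rule: str = "purchased") -> bool:
    """Apply the company's agreed definition of 'customer'.

    Two illustrative rules: 'purchased' counts only people who have
    bought something; 'expressed_interest' also counts anyone who has
    supplied an email address. Field names are assumptions for the
    example, not a real schema.
    """
    if rule == "purchased":
        return contact.get("purchases", 0) > 0
    if rule == "expressed_interest":
        return contact.get("purchases", 0) > 0 or bool(contact.get("email"))
    raise ValueError(f"unknown rule: {rule}")

browser = {"email": "a@example.com", "purchases": 0}
print(is_customer(browser))                             # False
print(is_customer(browser, rule="expressed_interest"))  # True
```

With the rule in one place, the analysts and the data scientists cannot drift into counting different populations as "customers".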

Wrong assumptions or faulty analysis can also produce inaccurate results. Suppose a company specialising in selling ice-cream runs a special offer during July and August, available only to its existing customers. The ability to measure accurately whether the offer has been a success, and is therefore worth repeating, is highly important. If the offer succeeded, but poor data quality and poorly defined assumptions lead the analysts to conclude that it failed, they are unlikely to repeat it. That would be a shame: the offer was a great success with the company's customers, but rogue data and a misreading of the data made it look like a failure. Many companies have faced this scenario, with the analysts often never realising that the offer actually worked and certainly should be repeated this summer.

One cause of a faulty reading of the data in this case might be that the analysts conclude the offer is too open to dishonest individuals taking advantage of it when they should not be able to. In fact, the "fraudulent individuals" may simply belong to the same household as the people being targeted and are not being dishonest at all. A husband on holiday, making the household purchases and taking up an offer extended to his wife who does the buying during the rest of the year, is not a dishonest purchaser. But the analysts need the right data at their fingertips to see this, such as data on the household demographics of their customers and an accurate address database.
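The household check above boils down to deciding whether two purchasers share an address. As a rough sketch, assuming raw address strings and nothing more, one can normalise the strings into a crude household key; a real system would match against a reference address database rather than rely on string munging like this.

```python
def household_key(address: str) -> str:
    """Normalise an address string into a crude household key.

    Illustration only: lower-cases, drops commas, and collapses
    whitespace. Real address matching needs a reference database.
    """
    return " ".join(address.lower().replace(",", " ").split())

def same_household(a: str, b: str) -> bool:
    """True if two raw address strings normalise to the same key."""
    return household_key(a) == household_key(b)

print(same_household("12 High St, Leeds", "12  HIGH ST  LEEDS"))  # True
```

Grouping two purchasers under one household key is what lets the analysts see a husband and wife as one customer household rather than as a customer plus a fraudster.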

Using Spotless Data

An introduction: you can also view our videos on cleaning an EPG file and cleaning a genre column, which explain how to use our API.

We are giving away 500MB of free data cleaning so you can test our service and see how well it works. We guarantee that your data remain secure and unavailable to any third parties while they are in our care. If there are issues with the data, an automated flag alerts our data science team, who will review the issue manually and, if necessary, contact you via your log-in details to discuss the problem.

Please do sign up for our service using your email address or your Facebook, Google or GitHub account. Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing. If you would like to contact us, you can speak to one of our team by clicking the white smile icon in a blue circle, which you can find in the bottom right-hand corner of any page on our site.

If data quality is an issue for you, or you have known sources of dirty data but your files are too big and the problems too numerous to fix manually, please do log in and try it now.