Data quality is now achievable for the financial sector at minimum effort with Spotless Data's data cleaning solution
The financial sector has always needed to ingest and analyse large quantities of both structured and unstructured data to make the often complex decisions required within highly competitive markets, where millions can be made or lost depending on the decisions taken. What is changing now is the explosion in the quantity of big data available and the use of artificial intelligence to analyse the data and then either submit the results as reports to human decision-makers or make the financial decisions itself, using sophisticated algorithms. Both of these developments rely on the quality of the underlying data.
And there is no question that this use of both big data analytics and artificial intelligence for making day-to-day financial decisions works. As a result, financial organisations recognise the need to collect as much relevant data as they can from as many different sources as possible. However, finding, capturing, cleaning, storing and analysing these data is by no means easy. This means that those companies which do best at processing their data are the ones emerging as the winners.
We started Spotless Data because we recognised the fundamental importance of having clean, quality data which have undergone data validation before they enter the repository where they can then be analysed and used as the basis of decision-making. We know from experience that all financial organisations need to address the data cleaning element of their own data processing. Without the data validation and data quality that only a really good cleaning of the data can provide, the data themselves will not do what is required of them, regardless of how well they are captured, stored and subsequently analysed.
Even the best financial algorithms and analytics software available will struggle to produce useful results if the data they are dealing with are rogue data, full of corruptions, inaccuracies and inconsistencies. This is regardless of whether these are in the data themselves or in the metadata which structure the data within the data repository (typically a data lake or perhaps a data warehouse).
The data which financial organisations hold can be divided into two types: structured data, which come from their databases, and unstructured data, which are essentially everything else. Some organisations are using increasingly vast amounts of historical data (such as historical stock and share prices) combined with complex mathematical models as a basis from which to make current decisions. They also use social media data to gauge customer needs and consumer confidence. Inevitably, given the business of financial organisations, a lot of these data will be numbers, mostly matched to words. A long list of gold prices on particular dates, for example, is two columns of numbers (the dates and the prices), both associated with the word gold. We specialise in the data validation of numbers as we know how important it is to get numbers right every time and to ensure that the right words are always matched to the right numbers. Thus we ensure that the number reflecting today's price of gold is not accidentally assigned to silver or IBM shares.
Pricing information contains a mixture of names and numbers, typically in different columns. This makes it highly vulnerable to mistakes, usually either where a number itself is wrong or where a number is assigned to the wrong name. A number can end up assigned to the wrong name when, for instance, a value is missing and, instead of the cell being left blank, the numbers below it in the column move up to fill the gap. From that point on the number and name columns are no longer paired correctly, and a single missing value produces many inaccuracies. Financial institutions cannot afford even one mistake in the data, as this can result in poor decision-making and a loss of profits in a situation where they would have been making money were it not for the dirty data. Where multiple entries are incorrect, the problem compounds, and the result is likely to be terrible decision-making and a massive loss of potential profits.
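The kind of check that catches these pairing mistakes can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not a description of how the Spotless service works internally: the function names pair_columns and find_pairing_problems are hypothetical, and the sample prices are placeholders rather than real quotes. The idea is to refuse to pair columns of different lengths and to flag blank or non-numeric prices rather than let later values slide up into the gap.

```python
from decimal import Decimal, InvalidOperation


def pair_columns(names, prices):
    """Pair two columns row by row, refusing columns of different lengths."""
    if len(names) != len(prices):
        raise ValueError(
            f"{len(names)} names but {len(prices)} prices: a value may be missing"
        )
    return list(zip(names, prices))


def find_pairing_problems(rows):
    """Return (row_number, name, reason) for each suspicious (name, price_text) row."""
    problems = []
    for row_number, (name, price_text) in enumerate(rows, start=1):
        name = name.strip()
        price_text = price_text.strip()
        if not name:
            problems.append((row_number, name, "missing name"))
        elif not price_text:
            # A blank stays a blank, so the rows below keep their alignment.
            problems.append((row_number, name, "missing price"))
        else:
            try:
                price = Decimal(price_text)
            except InvalidOperation:
                problems.append((row_number, name, "price is not a number"))
                continue
            if not price.is_finite() or price <= 0:
                problems.append((row_number, name, "price is not a positive number"))
    return problems


names = ["gold", "silver", "IBM"]
prices = ["1250.50", "", "n/a"]  # placeholder values, not real quotes
for problem in find_pairing_problems(pair_columns(names, prices)):
    print(problem)
```

Prices are parsed with Decimal rather than float so that a value such as 1250.50 is kept exactly as written. A real pricing file would also need rules for currencies, thousands separators and plausible ranges, which this sketch leaves out.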
Spotless has a particular focus on number data validation as we realise how important it is to get these data right.
There are many excellent web scrapers on the market which make web scraping itself a reasonably seamless process. However, because these are unstructured third-party data, there is no guarantee that the data will be accurate. Financial organisations are unlikely to be scraping from only one source. This inevitably means there will be a lack of consistency in the data structure, because even two scraped feeds will come from websites which use their own meta tags and use different words to describe the same thing. So while one website might talk about data sets, another may talk about datasets. Simple errors will also appear, often where one site appears to contradict another because one of them contains a mistake. However, if the scraper is scraping twenty websites and the error only appears in one or two of them, it is relatively easy for Spotless to point out the error. This is not because of any awareness that, say, the price of gold in February 2015 is mistaken in a particular source, but simply because 18 sources give the price as one figure and only two give a different, mistaken price (where perhaps one has made the error, and another website has unknowingly copied it).
While retail and sales data are fairly typical of the information financial institutions find useful, the sheer variety of data that they nowadays use in making their decisions would surprise many. For instance, while a credit history is used to decide whether to offer a loan to a business, nowadays social media may also be used to try to gauge the popularity of the business, alongside its sales record, expenditure on marketing, etc., in order to decide whether the loan should be made or not. Trying to understand what people are thinking can have great value to financial institutions. Banks which can successfully predict who is going to win the next election in a country can make better financial decisions than rivals who fail to predict such an event accurately. In the past banks and other similar businesses would rely on opinion polls, though nowadays not only are these often wrong, but an excellent analysis of social media data such as Facebook and Twitter can often give a more accurate picture, not merely of political trends but also of consumer confidence or lack thereof. What this means in practice is that the more data these groups can gather together the better, especially given the powerful ability of artificial intelligence to extract meaning from data so vast that no team of humans could hope to analyse them unaided.
While the structured data the banks hold will be relatively easy both to clean and then to analyse, the same is clearly not true of their unstructured data, which are equally important. Fortunately, with Spotless the solution to this problem has never been easier, thanks to our simple-to-use data cleaning API, which is available through a web browser and can be used on any device.
When you initially submit your data to us, you will immediately receive a report on the data and suggested specifications for cleaning them. Once you have been using our system for a while, and your sources of dirty data have been submitted to Spotless several times, the whole process becomes routine and can even be automated. However, the first time you receive a report for a file of data such as pricing information, it is essential to study the report, as it will indicate where the dirty data issues are.
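The majority-agreement idea can be illustrated with a short Python sketch. It is written under our own assumptions and is not the Spotless implementation: the function name consensus_value, the min_share threshold and the source names are all hypothetical, and the figures are placeholders rather than real gold prices. The function counts how many sources report each value, accepts the most common one only if enough sources agree on it, and lists the sources that disagree.

```python
from collections import Counter
from decimal import Decimal


def consensus_value(reports, min_share=0.6):
    """Return (agreed_value, dissenting_sources) for {source: value} reports.

    agreed_value is None when no single value reaches min_share of the sources.
    """
    if not reports:
        return None, []
    counts = Counter(reports.values())
    value, votes = counts.most_common(1)[0]
    if votes / len(reports) < min_share:
        return None, []
    dissenters = sorted(source for source, v in reports.items() if v != value)
    return value, dissenters


# Twenty hypothetical sources; the figures are placeholders, not real gold prices.
reports = {f"site{i:02d}": Decimal("1200.00") for i in range(1, 19)}
reports["site19"] = Decimal("1020.00")
reports["site20"] = Decimal("1020.00")

value, dissenters = consensus_value(reports)
print(value, dissenters)
```

A vote of this kind only works when most sources are independent and correct. If a mistake has been copied widely enough to become the majority view, counting alone will not reveal it, which is why the threshold is left adjustable.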
You can read our introduction to using our API and then try out our service on your My Filters page but, for this, you need to log in first. If you haven't already done so, you can sign up using your email address or your Facebook, Google or GitHub account. You may also view our video on data cleaning an EPG file, which also explains how to use our API.
We use the HTTPS protocol, which encrypts your data in transit to and from our service. Keeping your data secure while they are in our care, and inaccessible to any third party, is a responsibility we take very seriously.
Here is a quick link to our FAQ. You can also check out our range of Subscription Packages and Pricing. If you would like to contact us you can speak to one of our team by pressing on the white square icon with a smile within a blue circle, which you can find in the bottom right-hand corner of any of the web pages on our site.
If your data quality is an issue, or you have known sources of dirty data but your files are just too big and the problems too numerous to fix manually, please do log in and try now.