Exploring artificial intelligence

Building an artificial intelligence brain for data quality

Artificial intelligence can only do genuinely new things when the data that underpin it are of high quality.

Many commentators believe the great majority of businesses will adopt some form of artificial intelligence (AI) in the coming years as this once rare technology goes mainstream. Estimates forecast that 62% of enterprises will be using AI by 2018. This is driven at least in part by the need to analyse their big data successfully, and by the hope that their businesses will perform better as a result. Businesses that fail to take these new technologies on board are likely to fall behind and may even cease to exist, which further reinforces the trend towards AI adoption.

A fundamental prerequisite for successful artificial intelligence, and for the machine learning that accompanies it, is data whose quality you can trust, as the following examples illustrate.

Hedge fund use of AI: an example

For a fintech organisation such as a hedge fund, understanding trends in modern finance and economic performance is essential. It underpins the buying, selling and "hedging their bets" decisions that are so critical to the organisation's success or failure. Failing to see, and accurately predict, how events such as Brexit will affect the financial markets could lead to the wrong financial decisions and a major loss of profits.

Let us imagine that approximately 300 hours' worth of material (i.e. data) relevant to fintech organisations becomes available every working day. This includes reports from financial institutions and credit rating organisations, stock exchange price monitoring, available corporate accounts and media reports. A hedge fund manager spends much of their time making transactions, so it is unlikely that they can spend more than 4-5 hours a day reading and absorbing these reports. Nor can they simply delegate the reading to others. Having an employee convey what they have read to their boss is likely to be almost as time-consuming.

For an artificial intelligence programme, reading and understanding 300 hours' worth of financial data every day is not an insurmountable challenge. Nor is making decisions based on that understanding, because such software works so much faster than a human can. For this reason, many hedge funds already use artificial intelligence applications, expecting them to make better decisions than their human counterparts.

The importance of data quality

However, data quality is a genuine issue, because an AI programme will only be as good as the data it examines. Mismatching becomes a particular problem when the data come from multiple sources, and in the hedge fund case they certainly do. The 300 hours' worth of relevant material produced each working day comes from a very wide range of different sources.

For instance, suppose one report the AI is reading discusses the price of apples and another discusses the issues facing the well-known company of the same name. It would be very easy for the AI software, though almost impossible for a well-educated human being, to confuse the two. The software could conclude that the problems Apple Inc. faces this quarter are going to affect the price of apples! Consider a hedge fund that spreads its bets by investing in both the IT world and agricultural produce. For such a fund, this mismatch could have profoundly negative consequences. It could, for example, mean failing to invest in apple producers just as the price of apples is rising. The only reason would be an assumption about the technology manufacturer's weak growth prospects, which may well be correct but has been applied to the wrong market.

Simply getting the AI software to distinguish between "Apple" and "apple", or "Apple" and "apples", will not work either. A sentence that begins with the fruit also capitalises it. On reading "Apple production fell by 20%", the AI may conclude that the hedge fund needs to sell Apple stock. The correct conclusion would (presumably) be to buy apples. This is only one example. In reality there may be hundreds of different mismatches, any of which could cause the hedge fund problems if they were not corrected before important decisions were made.
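To make the mismatch problem concrete, here is a minimal Python sketch, not Spotless's method, contrasting a case-based rule with a simple context-scoring approach to telling Apple the company from apple the fruit. The cue-word sets `COMPANY_CUES` and `FRUIT_CUES` and both function names are hypothetical. A production system would more likely use a trained entity-linking model than hand-picked keywords.

```python
import re

# Hypothetical cue words, chosen by hand for illustration only.
COMPANY_CUES = {"iphone", "shares", "stock", "quarter", "earnings", "ceo", "revenue", "nasdaq"}
FRUIT_CUES = {"production", "harvest", "orchard", "crop", "growers", "tonnes", "farmers"}


def naive_label(sentence):
    """Case-based rule: a capitalised 'Apple' is taken to mean the company."""
    return "company" if re.search(r"\bApple\b", sentence) else "fruit"


def context_label(sentence):
    """Label the mention by counting surrounding words that match each cue set."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    company_score = len(words & COMPANY_CUES)
    fruit_score = len(words & FRUIT_CUES)
    if company_score == fruit_score:
        return "unknown"  # not enough context: route to review instead of guessing
    return "company" if company_score > fruit_score else "fruit"


if __name__ == "__main__":
    for s in ["Apple production fell by 20%", "Apple shares dropped after weak iPhone sales"]:
        print(f"{s!r}: naive={naive_label(s)}, context={context_label(s)}")
```

The case-based rule labels "Apple production fell by 20%" as the company, simply because the sentence starts with a capital letter. The context score picks up "production" and labels it as the fruit. Ties return "unknown" so that ambiguous mentions can be flagged rather than guessed.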

There are two possible solutions to this issue. One is to hire a team of data scientists at great cost, train them in your organisation's requirements and then set them to work creating automated processes for cleaning your data. The other is to use Spotless's cost-effective data quality API solution, which can be built into your automated processes to keep your data clean at all times. Resolving mismatches is an area in which Spotless already specialises, because so many of our customers have had problems with them.

Fake news

Fake news can also be problematic for AI, especially on controversial issues where people hold strong political opinions. On such issues, a small minority feel the need to create fake news to push their point of view more strongly. An individual or small group opposed to Brexit might, for example, create fake news stories giving dire economic warnings that are simply untrue. The same could happen with pro-Brexit fake news stories painting an unrealistically optimistic picture of Brexit's economic consequences. Neither is any use to a hedge fund, which needs real and accurate information about future economic indicators. It does not need fake news from people with an ideological axe to grind, who care less about real economic indicators than about pushing their agenda, particularly in the run-up to an election or a referendum.

Arguably the best ways to spot fake news are to examine rigorously the sources of any news story, to check the text for typos and grammatical inconsistencies, and to seek corroborating evidence. Generally speaking, genuine news stories will be picked up by multiple news outlets. Those outlets are themselves concerned with avoiding fake news, even ideologically driven newspapers.
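The corroboration check described above can be sketched in a few lines of Python. This is an illustrative assumption rather than a description of any particular product. It supposes that an upstream step has already grouped articles about the same event under a shared story ID. The functions `corroboration_counts` and `uncorroborated`, and the `min_outlets` threshold, are hypothetical names.

```python
from collections import defaultdict


def corroboration_counts(reports):
    """Map each story ID to the number of distinct outlets reporting it.

    `reports` is a list of (story_id, outlet) pairs; both fields are assumed
    to come from an upstream clustering step that is not shown here.
    """
    outlets = defaultdict(set)
    for story_id, outlet in reports:
        outlets[story_id].add(outlet)  # a set ignores repeat reports from one outlet
    return {story: len(sources) for story, sources in outlets.items()}


def uncorroborated(reports, min_outlets=2):
    """Return story IDs carried by fewer than `min_outlets` distinct outlets."""
    counts = corroboration_counts(reports)
    return sorted(story for story, n in counts.items() if n < min_outlets)


if __name__ == "__main__":
    reports = [
        ("gdp-warning", "Outlet A"),
        ("gdp-warning", "Outlet B"),
        ("gdp-warning", "Outlet A"),
        ("crash-rumour", "Blog X"),
    ]
    print(uncorroborated(reports))  # ['crash-rumour']
```

Counting distinct outlets rather than raw articles means a single site repeating its own story does not count as corroboration.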

So if 19 stories claim Britain's economy is going downhill (or uphill) and only one or two claim the opposite, the stories that contradict the majority are likely to be the fake ones. Humans are fooled by fake news just as surely as AI programmes are. This is especially so when it tells them what they want to hear or coincides with a belief they already hold. A hedge fund manager might believe that Brexit will make (or break) the UK economy. Or they might want to believe the economy is running into problems, because that would mean greater profits from the decisions the hedge fund has made in recent months.

Because AI works so much more rapidly than humans and does not get bored with repetition, it can read all 20 news stories on the short-term outlook for the British economy. It can then reject those that appear out of step with the majority. A human hedge fund manager attempting the same would probably have no time left for anything else.
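The majority check just described might look something like the following Python sketch. It assumes each story has already been given a stance label, for example "up" or "down", by some upstream classifier that is not shown. `flag_outliers` and its 75% `min_majority` threshold are illustrative choices, not a prescribed method.

```python
from collections import Counter


def flag_outliers(stances, min_majority=0.75):
    """Flag stories whose stance disagrees with a clear majority.

    `stances` maps a story ID to a stance label (assumed to come from an
    upstream classifier). If no single stance reaches `min_majority` of all
    stories, nothing is flagged, because there is no consensus to measure
    against.
    """
    if not stances:
        return []
    counts = Counter(stances.values())
    majority, n = counts.most_common(1)[0]
    if n / len(stances) < min_majority:
        return []
    return sorted(story for story, stance in stances.items() if stance != majority)


if __name__ == "__main__":
    # 19 stories say the economy is heading down; one says it is heading up.
    stances = {f"story{i:02d}": "down" for i in range(19)}
    stances["story19"] = "up"
    print(flag_outliers(stances))  # ['story19']
```

Flagged stories are best held back for review rather than deleted, since a lone dissenting report is occasionally the accurate one. Requiring a clear majority also stops the check from discarding half the evidence when opinion is genuinely split.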

On using Spotless Data's API solution

Read our introduction to using our browseable API. You can also watch our videos on data cleaning an EPG file and data cleaning a genre column, which explain how to use our API. You can sign up to Spotless Data using your email address or your Facebook, Google or GitHub account.

To let you see for yourself how well our API works, we are giving away 500MB of free data cleaning. We guarantee that your data remain secure and are not made available to any third parties while they are in our care. If issues arise during the data cleansing process, an automated flag alerts our data science team. They will review the issue manually and, if necessary, contact you via your log-in details to discuss the problem.

Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing. If you would like to contact us, you can speak to one of our team by clicking the white square icon with a smile inside a blue circle. You will find it in the bottom right-hand corner of every page on our site.

If data quality is an issue for you, or you know you have sources of dirty data but your files are too big and the problems too numerous to fix manually, please do log in and try it now.