Data cleaning of healthcare data

When doctors and healthcare IT specialists have clean, quality data to work with, the data can help save lives and cure illnesses.

Modern technologies that involve large quantities of data are increasingly used in medicine and healthcare to help us all live longer and healthier lives, while helping cure intractable and horrible illnesses like cancer and Parkinson's. However, getting this data to work in a seamless fashion can be very challenging. While Spotless cannot solve all your healthcare data issues, we can very much help you in the first and vital step towards doing so by cleaning your data at the speed of business so they are ready to do all those things which will help ensure top quality healthcare for all your patients.

Dealing with data in healthcare

Until recently doctors and others in the healthcare professions rarely if ever had to deal with data that they could not absorb, analyse and reach conclusions about for themselves, as individuals or in small groups. Yet nowadays there is artificial intelligence which can diagnose cancer from an X-ray image with better accuracy than the best doctors. Then there are Internet of Things (IoT) sensors within medical devices which can monitor the heartbeat of a heart attack survivor or the stomach of an individual with an ulcer to ensure their condition does not deteriorate. There is also the new trend of healthy people using IoT devices so that, if they do develop cancer or a narrowing of their arteries, they can go to the doctor's surgery before these illnesses develop into something much more severe, because the sensors have flagged the problem and even, in the case of the most sophisticated IoT, made them a doctor's appointment.

All this means that doctors, hospitals and other healthcare organisations are now entering what is for them a new world where they have to deal with large amounts of data on a daily basis. They cannot hope to understand these data without having them all in a single data repository and using some excellent analytics software.

Those in charge of IT within healthcare organisations need to gather the data together from different feeds on different platforms, put them all into an enterprise data warehouse and then apply analytics to extract the information they need from some of the data. They also need to set up alerts on other data so that, if it appears that someone is about to have a heart attack, there is sufficient warning for the patient to take preventative measures, perhaps chewing on an aspirin while being driven to the nearest hospital.

The importance of data cleaning of healthcare data

None of this is remotely possible if the data are rogue data, full of corruptions and inconsistencies. Even the best analytics in the world is going to fail when it comes to analysing rogue data. Endless false alarms about heart attacks that don't happen would be bad enough; even worse would be a heart attack that kills someone without warning because faulty data failed to alert anyone to an imminent problem.
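To illustrate why validation has to come before alerting, here is a minimal sketch in Python. It assumes a hypothetical feed of heart-rate readings in beats per minute that arrive as text. The plausibility range, the alert threshold and the function names are illustrative assumptions only; they are neither clinical guidance nor part of the Spotless API.

```python
from typing import Iterable, List, Optional

# Illustrative assumptions only, not clinical thresholds.
PLAUSIBLE_BPM = (20, 250)  # readings outside this range are treated as sensor errors
ALERT_BPM = 150            # hypothetical alert threshold


def clean_readings(raw: Iterable[Optional[str]]) -> List[float]:
    """Drop missing, non-numeric and physically implausible readings."""
    cleaned = []
    for value in raw:
        if value is None:
            continue  # missing reading
        try:
            bpm = float(value)
        except ValueError:
            continue  # corrupted text such as "ERR"
        if PLAUSIBLE_BPM[0] <= bpm <= PLAUSIBLE_BPM[1]:
            cleaned.append(bpm)
    return cleaned


def should_alert(readings: List[float], window: int = 3) -> bool:
    """Alert only when the last `window` valid readings all exceed the threshold."""
    recent = readings[-window:]
    return len(recent) == window and all(bpm > ALERT_BPM for bpm in recent)


raw_feed = ["72", "75", "ERR", "9999", None, "74", "73"]
print(should_alert(clean_readings(raw_feed)))  # prints False
```

Without the cleaning step, the corrupted reading of 9999 could trigger exactly the kind of false alarm described above. With it, only the four plausible readings reach the alert logic.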

Don't blame the data! Clean it! This is our motto at Spotless, specialising as we do in data cleaning. We are acutely aware that, in healthcare, spotlessly clean data, whose quality has been assured by a thorough data validation process, literally save lives. We set up Spotless because we are a team of data scientists dedicated to ensuring that your organisation enjoys the benefits of clean data, knowing that, unless you take this first step of cleaning your data, they are not going to be fit for purpose.

Part of the problem the healthcare sector faces is that doctors and others have to go through lengthy and rigorous training to grasp the intricacies of medicine, but this training contains minimal teaching about data, about the best ways to produce data so that inconsistencies do not plague them, or about IT generally. And as the adage goes, if you ask two doctors about treatment for Type 1 diabetes you will get three different answers! This certainly does not mean that doctors and healthcare managers should ignore data or just hope it will all turn out all right, but it does, in our opinion, mean that medicine and healthcare need clean data more than perhaps any other area of our society where data are vital. And they need to obtain these clean data easily.

Yet while most data scientists dislike data cleaning, which can take between 60% and 90% of their working time, our Spotless team loves nothing better than the challenge of tricky dirty data. For this reason, we have developed our API solution based on machine learning filters which filter out dirty data issues and replace inaccuracies with corrections. This solution is accessible through a web browser from any device and thus makes a lot of sense for medical professionals, their managers and the IT professionals working in healthcare.

Integration of different feeds from different platforms

Healthcare data tend to come from lots of different places, more so than most other types of data. They may come from different specialities such as radiology or ENT, or from pharmacy, or from inpatients or outpatients, or from a GP's surgery or from a hospital. Generally, the level of IT expertise in healthcare is low, partly because managers and doctors tend to prioritise actual patient care and to undervalue the importance of data, which exacerbates any issues of inconsistency. Data are typically stored in different ways and using different meta tags. The aim of a healthcare IT manager should be to gather all the data, both structured and unstructured, into a single enterprise data warehouse. Yet to do this with any chance of success, rather than as a recipe for chaos, requires data cleaning so that all the data have undergone data validation. They can thus be called quality data, fit not merely for the purpose initially thought of, such as the purposes of the doctor's surgery, which may be very different from the agenda of the hospital, but for new purposes as well, such as a new research programme trying to find a cure for some intractable and nasty disease, or simply trying to make patient care more cost-effective.
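As a rough illustration of what reconciling different meta tags can involve, the Python sketch below renames the fields of each incoming record to one shared schema before it is loaded into a warehouse. The field names, the alias table and the `harmonise` function are hypothetical examples and do not describe the Spotless API or any particular hospital system.

```python
from typing import Dict

# Hypothetical mapping from departmental field names to one shared schema.
FIELD_ALIASES: Dict[str, str] = {
    "patient_id": "patient_id",
    "patientid": "patient_id",
    "pid": "patient_id",
    "dob": "date_of_birth",
    "birth_date": "date_of_birth",
    "date_of_birth": "date_of_birth",
}


def harmonise(record: Dict[str, str]) -> Dict[str, str]:
    """Rename known fields to the shared schema and trim stray whitespace.

    Unknown fields are kept under their lower-cased, underscore-separated name
    so that nothing is silently dropped.
    """
    result = {}
    for key, value in record.items():
        normalised_key = key.strip().lower().replace(" ", "_").replace("-", "_")
        result[FIELD_ALIASES.get(normalised_key, normalised_key)] = value.strip()
    return result


radiology_record = {"PID": " 123 ", "Birth Date": "1970-01-01", "Modality": "X-ray"}
gp_record = {"patient-id": "123", "DOB": "1970-01-01"}

print(harmonise(radiology_record))
print(harmonise(gp_record))
```

Once both records use the same field names, they can be matched on `patient_id` and stored side by side, which is the precondition for the single repository described above.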

Healthcare data also tend to come in different formats, so that the radiology department's data will be full of X-rays, that is, images. And an image of a broken arm may require different data assigned to it, for instance when it comes to the physiotherapist helping the patient and when it comes to the patient making an insurance claim for the injury. Then there are all the different file formats which the various departments that make up a healthcare region tend to use. Most healthcare IT workers are already aware of how much more difficult it is to get everyone to use the same formats than in most other businesses.

Getting all these data cleaned is the first and essential step towards modernising the healthcare sector and allowing all the new and wonderful technologies for curing people, or at least preventing the spread of their illnesses, to actually work, and work together consistently. This is where Spotless Data comes in, making life much simpler for healthcare IT specialists and managers.

Healthcare data tend to be unstructured for the simple reason that providers tend to format their data in whatever way is most convenient for them with no thought of the need to structure the data so that the cardiology department's data and the mental health department's data are consistent. This seems to be a particular theme in healthcare because the priority is giving the patient good treatment, and until very recently this has not involved IT and big data. Old habits die hard. This can make life a bit of a nightmare for healthcare managers who are the ones who have to see the bigger picture, which includes consistent data which are clean, integrated, and ready for analysis or any other purposes.

Having clean healthcare data where all the disparate data are integrated into one seamless data warehouse repository won't solve all the problems. However, it is a start, and we would suggest that, without the cleaning of the data, any attempt to structure them or to hold them all in a useful and functioning data warehouse will be doomed to failure, and it will be an expensive failure, sucking in money that should be used to treat people. A classic example of this was the expensive IT project in the UK to digitalise hospital and GP appointments, which in spite of a multi-billion-pound investment failed to do what it was designed for, and took immense resources away from the ill people needing treatment.

Data cleaning of feeds from medical devices

Medical devices using IoT are set to explode in their usage over the coming years, with an estimated 15% a year growth rate and a business sector estimated to be worth $117 billion by 2020. As more people are fitted with sensors, often via their smartphone, so that either they or their doctors can monitor their health to anticipate and then flag any problems that occur, the need for clean healthcare data is also set to explode. And because these IoT healthcare devices are manufactured by a whole variety of different companies using different meta tags and different words to describe the same thing, their feeds are inconsistent with one another. This is a particular problem in healthcare because an illness might be known by its popular name, say a heart attack, or by its technical name, a myocardial infarction.
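The naming problem can be sketched in a few lines of Python. The example below maps popular and technical names onto one canonical label and returns nothing for terms it does not recognise, so that they can be reviewed by a person rather than guessed at. The synonym table and the `canonical_term` function are hypothetical illustrations; a real system would more likely rely on an established clinical terminology than on a hand-written dictionary.

```python
from typing import Dict, Optional

# Hypothetical synonym table: popular and technical names mapped to one label.
CANONICAL_TERMS: Dict[str, str] = {
    "heart attack": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "high blood pressure": "hypertension",
    "hypertension": "hypertension",
}


def canonical_term(raw: str) -> Optional[str]:
    """Return the canonical label for a term, or None if the term is unknown."""
    # Lower-case the term and collapse repeated whitespace before the lookup.
    key = " ".join(raw.lower().split())
    return CANONICAL_TERMS.get(key)


device_feed = ["Heart Attack", "myocardial  infarction", "High blood pressure", "flu"]
for term in device_feed:
    print(term, "->", canonical_term(term))
```

Returning `None` for an unknown term, instead of passing the raw text through, keeps unrecognised vocabulary visible so that it can be added to the table deliberately.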

Data cleaning healthcare data with the Spotless Data solution

You can read our introduction to using our API and then try out our service on your my filters page but, for this, you need to log in first. If you haven't already done so, you can sign up using your email address or your Facebook, Google or GitHub account. You may also view our video on data cleaning an EPG file, which also explains how to use our API.

We use the HTTPS protocol, which encrypts your data in transit to and from our service, and we take very seriously our responsibility to keep your data secure and inaccessible to any third party while they are in our care.

Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing. If you would like to contact us you can speak to one of our team by pressing on the white square icon with a smile within a blue circle, which you can find in the bottom right-hand corner of any of the web pages on our site.

If data quality is an issue for you, or you have known sources of dirty data but your files are too big and the problems too numerous to fix manually, please do log in and try now.