Getting quality data from IoT

IoT connecting a smart city requires data integration of the highest order

The amount of data that needs cleaning to a high standard is set to rocket with the explosion in IoT devices.

The internet of things (IoT) has been described as the infrastructure of the information society. All these "things" connect to the Internet, often via wifi, and then "talk" to each other. Gartner estimates that by 2020 over 26 billion of these devices will be connected to the Internet, with other estimates as high as 50 billion.

What does the rise of IoT mean for data quality?

It means that the amount of data available to companies is quite literally exploding, as is the number of companies using that data, and that most of the companies receiving it are not traditional users of big data. As a result, they are often unprepared for the challenges they face and the changes they need to make to their way of doing things to ensure that their IoT data are of a quality they can trust.

Traditionally, the ingredients necessary for IoT are sensors, wifi and a control panel, which in the simplest cases can be a smartphone or PC. The sensors pick up the information, the wifi transfers it to a repository, often a data lake, and the control panel responds when the data flag a problem, either fixing it automatically or alerting someone to fix it manually. For instance, a smart pacemaker can indicate heart murmurs, but a doctor will still need to intervene surgically in response to the flag.
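To make the flow above more concrete, here is a minimal Python sketch of the control-panel step. It is an illustration under our own assumptions, not a description of any real product: the `Reading` class, the `RULES` table with its thresholds and the `control_panel` function are hypothetical, and an in-memory list stands in for the data lake that the wifi delivers readings to.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    sensor_id: str
    metric: str
    value: float


# Hypothetical alert rules: metric -> (low, high, can_be_fixed_automatically)
RULES = {
    "room_temp_c": (16.0, 24.0, True),
    "heart_rate_bpm": (40.0, 140.0, False),
}


def control_panel(readings):
    """Route each out-of-range reading to an automatic fix or to a human."""
    auto_fixed, manual = [], []
    for r in readings:
        rule = RULES.get(r.metric)
        if rule is None:
            continue  # no rule for this metric, so nothing to flag
        low, high, auto_fixable = rule
        if low <= r.value <= high:
            continue  # within range: no flag raised
        (auto_fixed if auto_fixable else manual).append(r)
    return auto_fixed, manual


# Stand-in for the repository (data lake) the sensors report to
data_lake = [
    Reading("thermo-1", "room_temp_c", 12.5),
    Reading("pacer-7", "heart_rate_bpm", 180.0),
    Reading("thermo-2", "room_temp_c", 21.0),
]
auto, manual = control_panel(data_lake)
```

Here the cold room can be handled automatically (for example by switching the heating on), while the abnormal heart reading is escalated to a person, mirroring the pacemaker example where a doctor still has to act on the flag.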

Ensuring data quality

So if your data lake is actually a mud pit and your staff cannot cope when things go wrong, the lack of good-quality data you can trust means that things will go badly wrong. If, however, the data have all undergone validation so that you can trust them, your knowledge workers won't need to be data scientists, not merely to make head or tail of the data, but to use them effectively and respond adequately to a crisis. The best way to ensure your data are quality data you can trust is to use Spotless's unique web-based API to clean your data at the point of entry or, for instance, when you are putting a new software update in place.
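The idea of validating data at the point of entry can be illustrated with a small, generic Python sketch. This is not Spotless's API and makes no claim about how it works internally; the `SCHEMA`, the `VALID_RANGE` and the `validate` and `ingest` functions are hypothetical, chosen only to show records being checked before they reach the data lake, with failures quarantined alongside the reasons they failed.

```python
from datetime import datetime

# Hypothetical schema for one IoT reading (an assumption, not Spotless's API)
SCHEMA = {
    "device_id": (str,),
    "timestamp": (str,),
    "value": (int, float),
}
VALID_RANGE = (-50.0, 150.0)  # assumed plausible range for this sensor type


def validate(record):
    """Return a list of problems; an empty list means the record is clean."""
    problems = []
    for field, types in SCHEMA.items():
        if field not in record:
            problems.append(f"missing {field}")
        elif not isinstance(record[field], types):
            problems.append(f"wrong type for {field}")
    if problems:
        return problems  # structural problems: skip the content checks
    try:
        datetime.fromisoformat(record["timestamp"])
    except ValueError:
        problems.append("timestamp not parseable")
    low, high = VALID_RANGE
    if not low <= record["value"] <= high:
        problems.append("value out of range")
    return problems


def ingest(records):
    """Split incoming records into clean rows and quarantined rows."""
    clean, quarantine = [], []
    for rec in records:
        issues = validate(rec)
        if issues:
            quarantine.append((rec, issues))  # keep the reasons for review
        else:
            clean.append(rec)
    return clean, quarantine
```

Quarantining bad records, rather than silently dropping them, means someone can later see what went wrong, for example after a faulty software update starts sending malformed timestamps.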

Examples of IoT

1. Smart homes

These include security devices, temperature and lighting control, nanny and doorbell cams, automated cooking appliances (so your coffee is ready when you enter the kitchen in the morning and your roast dinner when you arrive home from work) and systems such as Amazon Echo that control your music, television etc. While the systems in a block of flats may be under the control of the building's management, these devices are generally under the control of the person living in the home. A typical example is a smart fridge that notices you are about to run out of milk and either informs you by SMS or, if you have authorised it, orders milk from your supermarket, which is then delivered. Ensuring that your data are quality data you can trust means you don't arrive home to find extra milk you don't need or, as happened recently with Amazon Echo, you don't end up paying for a doll's house you don't want, ordered because your 5-year-old daughter said out loud "I want a doll's house" or because you were watching a news report about that very incident. Making sure the roast dinner is actually cooking, and that the gas cooker hasn't failed to ignite and caused a dangerous gas leak, is a reminder of how important it is to have quality data that work every time.
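The milk example boils down to a small decision rule, and most of its failure modes are data-quality failures. The Python sketch below is purely illustrative: the `should_order_milk` function, its parameters and the 0.5-litre threshold are hypothetical, not taken from any real fridge. It refuses to act on missing or impossible sensor readings and will not place a second order while one is already on its way.

```python
def should_order_milk(litres_left, pending_order, threshold=0.5):
    """Decide whether a hypothetical smart fridge should reorder milk.

    litres_left: latest reading from the fridge's milk sensor (None = no data).
    pending_order: True if an order has already been placed and not delivered.
    threshold: assumed level below which milk is reordered.
    """
    if litres_left is None or litres_left < 0:
        return False  # missing or impossible reading: don't act on bad data
    if pending_order:
        return False  # an order is already on its way: avoid duplicate orders
    return litres_left < threshold
```

A real system would also need rules such as asking for confirmation before any voice-triggered purchase, which is the kind of safeguard the doll's house incident shows is needed.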

If your fridge is connected to your smartphone, the easiest way for criminals to get at all the valuable data on your phone may be to hack your fridge. The same can be said of the sensors your electricity supplier uses in your property, and everyone else's, to monitor and control the electricity supply. If the data are of poor quality, or if cybercriminals can access the electricity company's control panel through the IoT sensors in your property (perhaps simply because your wifi has not been properly password protected), then the company's IoT, and indeed its entire system, is running on untrustworthy data. While terrorists do represent a threat, a much more common one is likely to be criminals extorting a ransom from your company, for example by threatening a denial-of-service or other attack that could create chaos and huge financial losses, not to mention damage to your company's reputation.

2. Smart cities

These include schools, libraries, hospitals, smart streetlights, power and water supplies, emergency services and waste disposal. They are generally under the control of either the local authorities or, say, the power and water providers, and failing to get the data quality right could result in power cuts which, if they take out the wifi, will hugely exacerbate the problem. Good data management is particularly important here. Given the importance of water, hospitals and street lighting, failures here could have huge knock-on effects affecting millions of people, especially during unusual conditions such as a severe storm or heatwave.

3. Intelligent transport

These include driverless cars and autonomous aeroplanes, smart traffic lights (so you don't have to wait at a red light when nobody is crossing the junction on green), GPS, road surveillance (so that when a crash does happen police can be dispatched more quickly) and other security measures, as well as air pollution monitors, so that vehicles can be temporarily banned from an area if the pollution level is unacceptable, and in-car monitors to ensure your delivery drivers are working and have not slipped off for a quick but unauthorised coffee or nap. Given the sheer danger of large objects travelling at speed, having quality data will always be a life-and-death issue. The autonomous plane may carry no passengers, only goods, but if it crashes into a housing estate the results would be tragic. And while driverless cars and trucks are widely expected to result in fewer deaths on the road, who gets charged with a crime if a driverless vehicle, due to poor-quality data, behaves in a way that is legally defined as causing death by dangerous driving? Pollution is a less dramatic killer but mostly targets the most vulnerable members of society, ie babies and the elderly, so a failure to accurately record high levels of contaminants in the air can equally have tragic consequences.
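One way to stop a single faulty pollution monitor from either triggering or blocking a traffic ban is to decide on the median of several monitors rather than on any one reading. The Python sketch below illustrates that idea under stated assumptions: `restrict_traffic` is a hypothetical function, the limit is whatever the local authority sets, and requiring at least two valid readings is our own choice.

```python
from statistics import median


def restrict_traffic(readings, limit):
    """Decide whether to temporarily ban vehicles from an area.

    readings: latest values from several monitors in the area (None = no data).
    limit: pollution threshold (hypothetical; set by the local authority).
    The median of the valid readings is used so that one faulty monitor
    reporting an absurd spike or drop cannot decide the outcome on its own.
    """
    valid = [r for r in readings if r is not None and r >= 0]
    if len(valid) < 2:
        # Too little trustworthy data: escalate rather than guess
        raise ValueError("too few valid readings to decide; check the monitors")
    return median(valid) > limit
```

Raising an error when too few monitors report is deliberate: silently returning "no ban" on missing data is exactly the failure to record high contaminant levels that the paragraph above warns about.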

4. Smart grids

These include analytic prediction, so that you can switch your kettle on without problems straight after the big game when half the population is doing the same and causing a surge in demand, and devices to ensure energy efficiency. If you can see how much your electricity is costing and the price varies throughout the day, you are more likely to put your washing machine on when the price is lower. This both saves you money and allows the electricity company to spread demand more evenly over a 24-hour or weekly period rather than having everyone demand lots of electricity at the same time. Poor-quality data will result in inefficiency and possibly power cuts just when everyone is demanding electricity, and if this happens repeatedly it may, in a competitive market, lead your customers to desert your company for your rivals.
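The washing-machine example is a simple optimisation: given an hourly tariff, find the cheapest run of consecutive hours for the cycle. The Python sketch below shows one way to do it; `cheapest_start` is a hypothetical helper, and it assumes a flat power draw during the cycle and a single day's prices with no wrap-around past midnight. It also shows why the price data must be right: one wrong hourly price can shift the whole decision.

```python
def cheapest_start(prices, duration):
    """Return the start hour of the cheapest run of `duration` consecutive hours.

    prices: price per kWh for each hour (index 0 = first hour of the period).
    Assumes the appliance draws the same power every hour it runs.
    """
    if not 0 < duration <= len(prices):
        raise ValueError("duration must be between 1 and len(prices)")
    best_start, best_cost = 0, float("inf")
    for start in range(len(prices) - duration + 1):
        cost = sum(prices[start:start + duration])
        if cost < best_cost:  # strict '<' keeps the earliest of equal windows
            best_start, best_cost = start, cost
    return best_start
```

For example, with hourly prices `[12, 9, 8, 8, 9, 15, 22, 30]`, a two-hour cycle is cheapest starting at hour 2 (8 + 8), while a three-hour cycle is cheapest starting at hour 1.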

For instance, Google's new temperature control system, known as The Nest, which initially got great reviews as a model example of the IoT at its best, suddenly chose a cold snap in the north-east of the USA to stop working, leaving its users without any heating at all, a far worse situation for them than if they had never started using The Nest. Google eventually traced the problem to a glitch in a software update, which it had, riskily, rolled out in chilly December. Users had to wait for Google to come up with a solution, and even then the fix required each customer to carry out a manual procedure, which made it of limited value to those away from home who had hoped that The Nest would allow them to control the temperature in their homes remotely.

5. Smart medicine

These include pacemakers that monitor your heart condition and flag suspicious activity which might be a sign of a heart attack, devices that monitor blood sugar levels in diabetics and, equally, sensors that monitor the health of cows to maximise milk production and reduce costs for farmers, a group that typically has limited experience of dealing with complex data. Poor-quality data here can result in death, eg from a diabetic coma caused by a failure to flag high glucose levels in a patient's blood, or in increased costs for the farmer rather than the reduced costs you promised them.
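The diabetic-coma example turns on a single principle: a missing or implausible reading must raise a flag of its own rather than quietly disappear. The Python sketch below illustrates this. The thresholds are illustrative assumptions for the sketch, not clinical advice, and the `classify` function and its labels are hypothetical.

```python
# Illustrative thresholds in mmol/L (assumptions for this sketch, not clinical advice)
LOW, HIGH = 3.9, 13.9
# Assumed plausible measuring range of the device
SENSOR_MIN, SENSOR_MAX = 0.0, 33.3


def classify(reading):
    """Label one glucose reading; never silently drop a reading."""
    if reading is None:
        return "CHECK_SENSOR"  # missing data must itself raise a flag
    if not SENSOR_MIN <= reading <= SENSOR_MAX:
        return "CHECK_SENSOR"  # physically implausible: likely a faulty sensor
    if reading < LOW:
        return "ALERT_LOW"
    if reading > HIGH:
        return "ALERT_HIGH"
    return "OK"
```

Returning `"CHECK_SENSOR"` instead of `"OK"` for bad data is the key design choice: a system that treats a dead sensor as "no news is good news" is precisely the one that fails to flag dangerously high glucose levels.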

Take a look at our introduction to using our browseable API. You can also test our API on your my filters page, though you will need to be logged in first to see it. You can sign up using your email address or your Facebook, Google or GitHub account.

You can also view our videos on data cleaning an EPG file and data cleaning a genre column which explain how to use our API.

Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing. We are offering 500MB of free data cleaning to all new customers so you can try out our solution for yourself. If you would like to contact us, you can speak to one of our team by clicking the white square icon with a smile inside a blue circle, which you can find in the bottom right-hand corner of every page on our site.

Spotless Data, the One Stop Data Quality Solution API!

If data quality is an issue for you, or you know you have sources of dirty data but your files are too big and the problems too numerous to fix manually, please log in and try it now.