Tackling IoT issues with Spotless Data

Smart Internet-of-things traffic lights send data to a central repository, where Spotless validates the data before they are sent back to the lights to do their intelligent work

As the Internet-of-things takes off in a big way, Spotless Data have written a session rule which makes it much easier to validate your session data and clean out the gaps and overlaps which so often plague the data from IoT devices.

The Internet-of-things (IoT) represents a new gold rush for the many companies getting involved in this exciting new technology, where sensors use the Internet to convey the data they are picking up back to a central repository. Once there, the data are processed and used in a myriad of ways, some of them in real time, such as a healthcare monitor informing a diabetic patient that they need some glucose to avoid going into a coma. For this to happen seamlessly, the data on the patient's latest blood sugar levels need to enter the data repository at regular intervals, say every 90 seconds. When a reading which is either too high or too low is received, the patient will get a text message or automated phone call warning him or her of the problem and the need to take urgent action.

Spotless Data and IoT

At Spotless we have devised and developed our data validation, data cleaning and data integration rules, which are so useful when it comes to ensuring that IoT data are free from the mistakes and inconsistencies of rogue data. In our latest version 19 release, we have also given a JSON config to our Spotless Data machine learning filters API, which means that your data can now be cleaned ten times faster. This extra speed is vital for many time-sensitive operations, such as the healthcare example above, or smart traffic lights which respond to the lack of traffic coming in the other direction by changing colour. In the traffic lights case, the whole automated light-changing process takes place in the central repository and is based on incoming data from several IoT traffic light devices.

Fortunately, Spotless can be built into the entry point of your data platforms to swiftly and automatically handle the new IoT data entering the system. The JSON config has also made the Spotless API easier to use for software developers and everyone else who uses our service, which is vital for ensuring that you, and all our other customers, get the clean IoT data you require.

Using session data in IoT

We have developed a unique session rule with IoT data particularly in mind, as errors caused by gaps and overlaps are particularly common with IoT data. In the case of our diabetic patient, a great deal of data is entering the system, though only a small portion will trigger an alert. For example, a new session with its own unique session id may begin each hour, while an alert may only occur every six months.

While there are many causes of rogue data, perhaps, in this case, the problem has been triggered by the elevated blood sugar levels the monitor is looking for occurring right at the moment when one session is ending and another beginning. This may have had no bearing on the alert which went out and saved the patient's life, but it may have corrupted the session data through either an overlap, where one session id begins before the previous one has finished, or a gap, where one session id ends, say, two minutes before the next one starts. Depending on how the whole system has been designed by whoever built it, these gaps and overlaps may be catastrophic. In this case, failures in the session data could literally result in an unnecessary death, even if the IoT system fails to work for as short a period as an hour before righting itself when yet another new session begins.

By automatically fixing any overlaps or gaps at the point of entry to the data repository, Spotless can resolve these problems before they negatively affect the client, in this case, the diabetic patient.
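To make the two kinds of error concrete, here is a minimal Python sketch of how gaps and overlaps between consecutive sessions might be detected. It is an illustration only and not the Spotless implementation: the Session record, its field names and the find_issues function are all assumptions invented for this example.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Session(NamedTuple):
    # Hypothetical record layout, assumed for this sketch only.
    session_id: str
    start: datetime
    end: datetime


def find_issues(sessions):
    """Return (kind, earlier_id, later_id, duration) for each gap or overlap."""
    ordered = sorted(sessions, key=lambda s: s.start)
    issues = []
    # Compare each session with the one that starts immediately after it.
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.start > prev.end:
            # The next session starts after the previous one ended: a gap.
            issues.append(("gap", prev.session_id, nxt.session_id,
                           nxt.start - prev.end))
        elif nxt.start < prev.end:
            # The next session starts before the previous one ended: an overlap.
            issues.append(("overlap", prev.session_id, nxt.session_id,
                           prev.end - nxt.start))
    return issues


t0 = datetime(2024, 1, 1, 9, 0)
sessions = [
    Session("a", t0, t0 + timedelta(hours=1)),
    # Starts two minutes after "a" ends: a gap.
    Session("b", t0 + timedelta(hours=1, minutes=2), t0 + timedelta(hours=2)),
    # Starts five minutes before "b" ends: an overlap.
    Session("c", t0 + timedelta(hours=1, minutes=55), t0 + timedelta(hours=3)),
]
issues = find_issues(sessions)
for issue in issues:
    print(issue)
```

Sessions which meet exactly, with one ending at the very moment the next begins, are treated as clean in this sketch.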

A smart traffic lights example

Let us imagine a busy road junction where the passing of both vehicles and pedestrians is controlled by a series of traffic lights, all connected to each other, with the changes in the lights being driven by a fixed time interval. You are delighted when your bosses decide to buy some new, smart traffic lights with built-in sensors which can detect when the flow of vehicles approaching a green light dries up. The lights can then be switched to red at that point rather than at a fixed moment, so that traffic waiting on the other road can move along more quickly.

However, when these new smart traffic lights are implemented the initial result is chaos as these new lights appear to be acting incredibly "stupidly". When you try to fix the problem, you discover rogue data issues which are affecting the session data. It may be that with so many sensors that need to work together, the chaos in the session data is due to either overlaps or gaps from inconsistent data messing up the session start and finish times.

It doesn't matter to us at Spotless why the data are rogue, and, in the short term, it won't matter to you either. If your patient is running an immediate risk of dying, or the smart traffic lights are about to cause an accident, you need that rogue data validated right now. You can then figure out what the cause of the problem is at your leisure, a process made much easier by all your data having been validated.

We offer various options for your data cleaning, which can be automated once you are comfortable with the results (and please chat to us on our site to get more info about this automation of our API for your IoT data). You can ignore gaps and overlaps, or you can cut short an overlap or extend a session to close a gap, at either the start or the finish of the session. We use fields to specify where the rules are to be applied and have a special key field, which is usually the id of the IoT device which contains the monitor, e.g. a smart traffic light or a device which measures blood sugar levels. Thus you have all the tools you need to validate your IoT session data at your fingertips!
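The options described above can be pictured with a short Python sketch. This is a hedged illustration under our own assumptions, not the Spotless API or its config format: the Session class, the repair function and its fix_gaps, fix_overlaps and adjust parameters are hypothetical names chosen for the example, with device_id standing in for the key field and times given as plain seconds.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Session:
    # Hypothetical layout: device_id plays the role of the key field.
    device_id: str
    start: int  # seconds since some reference time
    end: int


def repair(sessions, fix_gaps=True, fix_overlaps=True, adjust="finish"):
    """Close gaps and trim overlaps between consecutive sessions per device.

    adjust="finish" moves the finish of the earlier session;
    adjust="start" moves the start of the later session.
    Setting fix_gaps or fix_overlaps to False ignores that kind of issue.
    Sessions nested entirely inside another session are not handled here.
    """
    by_device = defaultdict(list)
    for s in sessions:
        by_device[s.device_id].append(s)

    repaired = []
    for device_id in sorted(by_device):
        ordered = sorted(by_device[device_id], key=lambda s: s.start)
        for i in range(1, len(ordered)):
            prev, cur = ordered[i - 1], ordered[i]
            is_gap = cur.start > prev.end
            is_overlap = cur.start < prev.end
            if (is_gap and fix_gaps) or (is_overlap and fix_overlaps):
                if adjust == "finish":
                    # Extend or cut short the earlier session.
                    ordered[i - 1] = replace(prev, end=cur.start)
                else:
                    # Extend or cut short the later session.
                    ordered[i] = replace(cur, start=prev.end)
        repaired.extend(ordered)
    return repaired


raw = [
    Session("light-1", 0, 60),
    Session("light-1", 50, 120),   # overlaps the previous session by 10s
    Session("light-1", 130, 200),  # follows a 10s gap
    Session("light-2", 0, 100),    # a different device, left untouched
]
fixed_finish = repair(raw, adjust="finish")
fixed_start = repair(raw, adjust="start")
for s in fixed_finish:
    print(s)
```

Grouping by the key field first matters because sessions from different devices are expected to run at the same time, so only sessions from the same device are compared with one another.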

Validating IoT data with Spotless Data

You can read our introduction to using our API to validate your data. You can take advantage of our offer of 500Mb of free data cleaning to see how much you like our service. If you haven't already done so, you can sign up using your email address or your Facebook, Google or GitHub account. You may also view our video on data cleaning an EPG file, which also explains how to use our API.

We use the HTTPS protocol, so that your data are secure while they are in our care and are not accessible by any third party, a responsibility we take very seriously.

Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing. If you would like to contact us, you can speak to one of our team by pressing on the white square icon with a smile within a blue circle, which you can find in the bottom right-hand corner of any of the web pages on our site.

If your data quality is an issue, or you know that you have sources of dirty data but your files are just too big and the problems too numerous to fix manually, please do log in and try now.