Data Quality

Spotless measures the quality of datasets against a set of business rules derived from a specification. The business rules are generated automatically using machine learning algorithms and can be customised to meet specific business requirements.
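Spotless's actual API is not shown here, but the idea of checking rows against a set of per-field business rules can be sketched as follows. This is a minimal illustrative example only; the rule names, formats, and `validate` helper are all hypothetical.

```python
# Hypothetical sketch of rule-based validation (not the actual Spotless API):
# each rule checks one field, and rows failing any rule are flagged.
import re

rules = {
    "customer_id": lambda v: bool(re.fullmatch(r"C\d{6}", v)),   # e.g. "C001234"
    "email":       lambda v: "@" in v and "." in v.split("@")[-1],
    "age":         lambda v: v.isdigit() and 0 < int(v) < 120,
}

def validate(row: dict) -> list:
    """Return the names of fields that violate a rule."""
    return [field for field, check in rules.items()
            if field in row and not check(str(row[field]))]

good = {"customer_id": "C001234", "email": "a@b.com", "age": "37"}
bad  = {"customer_id": "12345", "email": "no-at-sign", "age": "200"}
print(validate(good))  # []
print(validate(bad))   # ['customer_id', 'email', 'age']
```

In a real deployment such rules would be generated by the machine learning step and then customised by hand, rather than written from scratch as above.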

Typical applications include:

  • Integrating datasets from different providers - Spotless validates that each feed is well formed and consistent with the other feeds
  • Loading data into a data lake - although full transformation is not required on data ingested into a data lake, simple validation using Spotless to ensure the data is well formed and well referenced saves significant time downstream
  • Cleansing data scraped from the internet - web-scraped data is notoriously irregular, and Spotless ensures that only information conforming to the expected specifications is loaded
  • Integrating internal data from different platforms - you can define specific business rules to provide consistent and valid data whenever it is loaded
  • Truncating or removing overlapping sessions, and extending or filling sessions with gaps between them
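The last bullet, fixing overlapping sessions and gaps, can be sketched as a simple interval pass. This is an assumed behaviour for illustration, not the Spotless implementation: overlaps are resolved by truncating the earlier session, and gaps are closed by extending it.

```python
# Minimal sketch (assumed behaviour, not the Spotless implementation) of
# truncating overlapping sessions and filling gaps between them.
def normalise(sessions):
    """Sessions are (start, end) pairs. Returns them sorted and contiguous:
    an overlap truncates the earlier session; a gap extends it."""
    fixed = []
    for start, end in sorted(sessions):
        if fixed:
            prev_start, prev_end = fixed[-1]
            if start != prev_end:            # overlap or gap: align boundary
                fixed[-1] = (prev_start, start)
        fixed.append((start, end))
    return fixed

print(normalise([(0, 10), (8, 20), (25, 30)]))
# [(0, 8), (8, 25), (25, 30)]
```

Whether to truncate the earlier or the later session, or to drop overlapping sessions entirely, is exactly the kind of choice a customised business rule would express.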

    Blog posts about Data Quality

    March 16, 2018, 6:48 a.m.
    A brief history of data explores the causes of the huge explosion in big data in a process that reaches back to the 1970s, examines the fundamental problems associated with data and offers some solutions to rogue data issues in 2018. The quantity of data in the world, and available to the individual businesses which store it, has been growing in recent years. Most companies will likely be ...
    Feb. 16, 2018, 12:05 p.m.
    Achieving data quality you can trust, where everything works seamlessly, requires data refining just as surely as crude oil does. Don't let rogue data destroy your platforms: use the Spotless API solution. Data is increasingly being seen as the new oil. It is unfortunate that the concept of refining these data, which are increasingly big data, does not have in the publ...
    Nov. 16, 2017, 11:47 a.m.
    When you apply a magnifying glass to data with good data quality, everything is seamlessly perfect. In the past, data was relatively easy to understand, and so the concept of data quality had little importance. If data had poor quality then somebody would have to spend a few days working through all the pieces of paper which contained it to bring order out of the chaos, and that was the end of...
    Feb. 17, 2017, 2:16 p.m.
    Data quality is essential when getting big data to work properly. What is big data? Big data are data which are so large and complex that it can be challenging to produce meaningful information from them. However, large quantities of raw data captured and stored in your data lake, or your data warehouse, have limited value in themselves. They are not considered big data until they are ana...
    Jan. 27, 2017, 7:57 a.m.
    Spotless Data's prediction for 2017 is that data quality is going to be a new buzzword! There is no doubt both that data itself has been one of the key themes of the last few years, and that the other popular tech subjects of the moment, such as artificial intelligence, machine learning, robotics, the internet of things, and big data, are all related to data, without which none of these ...
    Nov. 24, 2017, 6:45 a.m.
    Smart data is the fundamental pre-requisite for ensuring smart cities and smart homes in the new Internet-of-things world. The Internet of things (IoT) is going to bring massive changes to the way we live in the next few years, bringing the benefits of smart homes & smart healthcare to individuals, smart cities to societies and smart factories to manufacturers, among others, all of which...
    Oct. 13, 2017, 5:45 a.m.
    Without veracity, expressed through data quality, your big data are worse than useless, a positive menace. The three traditional Vs to describe big data are volume, variety, and velocity, and, between them, they portray basic characteristics of data which are so vast and complex that they have been given this memorable name, one of the buzzwords of the present era. For us at Spotless Data, h...
    Oct. 6, 2017, 5:59 a.m.
    The best first step towards ensuring data security is to use Spotless Data's machine learning filters. Data security has never been so important as it is today. In the European Union, the soon to be passed General Data Protection Regulation (GDPR) is particularly strict in terms of legal compliance, demanding that any organisation will need to protect the data of its users under this law...
    Aug. 25, 2017, 8:25 a.m.
    A question we get asked a lot is why you should use Spotless Data's API solution to address data quality issues in your organisation. This question has two features to it. There is the question of why you would need to use any data-cleaning service to be able to trust your data. And then there is the question of why...
    Aug. 4, 2017, 9:04 a.m.
    The best way to trust your data again is to ensure they are quality data by using Spotless machine learning Filters. At the heart of the Spotless Data Quality API solution lie our newly developed machine learning filters for data cleaning, which filter out dirty data containing mismatches, duplications, and other corruptions. Many of these dirty data issues are predictable and are caused by a m...