Data Quality

Spotless measures the quality of datasets against a specification expressed as a set of business rules. The business rules are generated automatically using machine learning algorithms and can be customised to meet your business requirements.
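As a rough illustration of what measuring a dataset against business rules can mean, the sketch below scores a few records against hand-written rules and reports the fraction of rows that pass each one. It is not the Spotless API: the function quality_report, the Rule type, the sample rows, and the rules themselves are all hypothetical and chosen only for this example.

```python
from typing import Callable

# Hypothetical rule type: a label plus a predicate applied to one record.
Rule = tuple[str, Callable[[dict], bool]]


def quality_report(rows: list[dict], rules: list[Rule]) -> dict[str, float]:
    """Return, for each rule, the fraction of rows that satisfy it."""
    if not rows:
        # An empty dataset violates nothing.
        return {name: 1.0 for name, _ in rules}
    return {
        name: sum(1 for row in rows if check(row)) / len(rows)
        for name, check in rules
    }


def is_non_negative_number(text: str) -> bool:
    """True when the text parses as a number that is zero or greater."""
    try:
        return float(text) >= 0
    except ValueError:
        return False


# Made-up sample data, as it might arrive from a raw feed.
rows = [
    {"id": "1", "country": "GB", "price": "9.99"},
    {"id": "2", "country": "XX", "price": "4.50"},
    {"id": "", "country": "FR", "price": "-1"},
    {"id": "4", "country": "FR", "price": "abc"},
]

# Made-up business rules; a real rule set would be generated and then customised.
rules = [
    ("id is present", lambda r: bool(r["id"].strip())),
    ("country is a known code", lambda r: r["country"] in {"GB", "FR", "DE"}),
    ("price is a non-negative number", lambda r: is_non_negative_number(r["price"])),
]

report = quality_report(rows, rules)
for name, score in report.items():
    print(f"{name}: {score:.0%}")
```

Reporting a pass rate per rule, rather than a single pass or fail for the whole file, makes it easier to see which rule a feed is breaking.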

Typical applications include:

  • Integrating datasets from different providers - Spotless will validate that each feed is well formed and consistent with other feeds
  • Loading data into a data lake - although full transformation is not required on data ingested into a data lake, simple validation using Spotless to ensure the data is well formed and well referenced saves significant time downstream
  • Cleansing data scraped from the internet - web scraped data is notoriously irregular, and Spotless will ensure that only information conforming to the expected specifications is loaded
  • Integrating internal data from different platforms - you can define specific business rules to provide consistent and valid data whenever it is loaded
  • Truncating or removing overlapping sessions; extending or filling sessions with gaps between them
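The session clean-up in the last item can be sketched as follows. This is a minimal illustration and not how Spotless implements it: the function repair_sessions, the max_gap parameter, and the use of plain integer timestamps (for example, minutes) are assumptions made for the example. An overlapping session is truncated at the start of the one that follows it, and a gap no longer than max_gap is filled by extending the earlier session.

```python
def repair_sessions(sessions, max_gap):
    """Truncate overlaps and fill small gaps in a list of (start, end) sessions.

    Hypothetical helper: timestamps are plain numbers and max_gap is the
    largest gap that is filled by extending the earlier session.
    """
    repaired = []
    for start, end in sorted(sessions):
        if end <= start:
            continue  # drop empty or inverted sessions
        if repaired:
            prev_start, prev_end = repaired[-1]
            if start < prev_end:
                if start <= prev_start:
                    # Same start time: remove the earlier, shorter session.
                    repaired.pop()
                else:
                    # Overlap: truncate the earlier session at this start.
                    repaired[-1] = (prev_start, start)
            elif start - prev_end <= max_gap:
                # Small gap: extend the earlier session to close it.
                repaired[-1] = (prev_start, start)
        repaired.append((start, end))
    return repaired


# Made-up sessions: one overlap (25 < 30), one small gap (60 to 62),
# and one large gap (90 to 120) that is left alone.
raw = [(0, 30), (25, 60), (62, 90), (120, 150)]
clean = repair_sessions(raw, max_gap=5)
print(clean)
```

Note that this sketch always favours the later session when two overlap, so a long session that fully contains a shorter one is cut at the shorter one's start. Other policies, such as removing the contained session instead, are equally valid business rules.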
 

    Blog posts about Data Quality

    Nov. 16, 2017, 11:47 a.m.
    When applying a magnifying glass to data which has data quality, all is seamlessly perfect. In the past, data was relatively easy to understand, and so the concept of data quality had little importance. If data had poor quality then somebody would have to spend a few days working through all the pieces of paper which contained them to bring order out of the chaos and that was the end of...
    Feb. 17, 2017, 2:16 p.m.
    Data Quality is essential when getting Big Data to work properly. What is Big Data? Big data are data which are so large and complex that it can be challenging to produce meaningful information from them. However, large quantities of raw data captured and stored in your data lake, or your data warehouse, have limited value in themselves. They are not considered big data until they are ana...
    Jan. 27, 2017, 7:57 a.m.
    Spotless Data's prediction for 2017 is that data quality is going to be a new buzzword! There is no doubt both that data itself has been one of the key themes of the last few years, and that the other popular tech subjects of the moment, such as Artificial Intelligence, Machine Learning, Robotics, the Internet of Things, and Big Data, are all related to data, without which none of these ...
    Oct. 13, 2017, 5:45 a.m.
    Without veracity, expressed through data quality, your big data are worse than useless, a positive menace. The three traditional Vs to describe big data are Volume, Variety, and Velocity, and, between them, they portray basic characteristics of data which are so vast and complex that they have been given this memorable name, one of the buzzwords of the present era. For us at Spotless Data, h...
    Oct. 6, 2017, 5:59 a.m.
    The best first step towards ensuring Data Security is to use Spotless Data's Machine Learning Filters. Data security has never been as important as it is today. In the European Union, the soon to be passed General Data Protection Regulation (GDPR) is particularly strict in terms of legal compliance, requiring any organisation to protect the data of its users under this law...
    Aug. 25, 2017, 8:25 a.m.
    Using a tablet to organise a data-driven factory manufacturing goods. A question we get asked a lot is why you should use Spotless Data's API solution to address data quality issues in your organisation. This question has two parts to it. There is the question of why you would need to use any data cleaning service to be able to trust your data. And then there is the question of why...
    Aug. 4, 2017, 9:04 a.m.
    The best way to trust your data again is to ensure they are quality data by using Spotless Machine Learning Filters. At the heart of the Spotless Data Quality API solution lie our newly developed Machine Learning filters for data cleaning, filtering out the dirty data containing mismatches, duplications, and other corruptions. Many of these dirty data issues are predictable and are caused by a m...
    July 28, 2017, 8:08 a.m.
    Swimming in a lake of data requires primarily that the data are transformed into quality data, for instance using Spotless Machine Learning Filters. It is widely recognised that the simplest way to ensure data quality is to minimise the variety of sources which your data come from. As Berkeley Machine Learning professor Michael Jordan states: "data variety leads to a decline in data qua...
    May 22, 2017, 11:49 a.m.
    Without spotlessly clean data quality, having business intelligence is simply not possible. Business Intelligence turns data into insights or intelligence. It is used by enterprises of all types to understand their companies and to be able to plot strategies. These will allow said companies to be competitive, disrupt industries and drive marketing and sales campaigns to maximise the lik...
    April 21, 2017, 6:47 a.m.
    Spotless Data version 9 includes data validation, substitution and lookalike improvements for better data quality. We have just launched Version 9 of our unique Data Quality web-based API solution, which includes a new rule type, known as a Data Validation Rule, as well as significant enhancements to our rules engine, which have been driven by Machine Learning. Here are the five fundament...