Data Quality

Spotless measures the quality of datasets against a specification expressed as a set of business rules. The business rules are generated automatically using machine learning algorithms and can be customised to meet your business requirements.
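
As a rough illustration of what checking a dataset against business rules can look like, here is a minimal Python sketch. It is not the Spotless API: the Rule class, the validate function, the column names and the example rules are all hypothetical, chosen only to show the idea of a per-column rule that either passes or reports a violation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A hypothetical business rule: a column name plus a predicate on its value."""
    column: str
    description: str
    check: Callable[[str], bool]


def validate(rows, rules):
    """Return (row_index, column, description) for every rule a row violates."""
    violations = []
    for index, row in enumerate(rows):
        for rule in rules:
            # A missing column is treated as an empty string.
            value = row.get(rule.column, "")
            if not rule.check(value):
                violations.append((index, rule.column, rule.description))
    return violations


# Example rules (assumptions for illustration only).
rules = [
    Rule("country", "must be a known country code",
         lambda v: v in {"GB", "FR", "DE"}),
    Rule("price", "must be a non-negative number",
         lambda v: v.replace(".", "", 1).isdigit()),
]

rows = [
    {"country": "GB", "price": "9.99"},
    {"country": "XX", "price": "-1"},
]

violations = validate(rows, rules)
for index, column, description in violations:
    print(f"row {index}: {column} {description}")
```

In this sketch the first row passes both rules and the second row fails both, so two violations are reported. Customising the rules amounts to editing the list of Rule objects.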

Typical applications include:

  • Integrating datasets from different providers - Spotless will validate that each feed is well formed and consistent with other feeds
  • Loading data into a data lake - although full transformation is not required on data ingested into a data lake, simple validation using Spotless to ensure the data is well formed and well referenced saves significant time downstream
  • Cleansing data scraped from the internet - web scraped data is notoriously irregular, and Spotless will ensure that only information conforming to the expected specifications is loaded
  • Integrating internal data from different platforms - you can define specific business rules to provide consistent and valid data whenever it is loaded
  • Truncating or removing overlapping sessions; extending or filling sessions with gaps between them
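
The session clean-up in the last item can be sketched in a few lines of Python. This is an illustrative assumption about one possible policy, not how Spotless itself implements it: the fix_sessions function, the max_gap parameter and the use of plain numeric timestamps are all hypothetical. An overlapping session is truncated so that it ends where the next one starts, a session left empty by truncation is removed, and a gap no larger than max_gap is closed by extending the earlier session.

```python
def fix_sessions(sessions, max_gap=0):
    """Clean a list of (start, end) sessions.

    Sessions are sorted by start time. An earlier session that overlaps the
    next one is truncated to end at the next start; if that leaves it empty
    it is removed. A gap of at most max_gap is filled by extending the
    earlier session up to the next start.
    """
    result = []
    for start, end in sorted(sessions):
        if result:
            prev_start, prev_end = result[-1]
            if start < prev_end:
                # Overlap: truncate the earlier session.
                prev_end = start
            elif 0 < start - prev_end <= max_gap:
                # Small gap: extend the earlier session to fill it.
                prev_end = start
            if prev_end > prev_start:
                result[-1] = (prev_start, prev_end)
            else:
                # Truncation left nothing, so remove the earlier session.
                result.pop()
        result.append((start, end))
    return result


cleaned = fix_sessions([(0, 10), (8, 20), (22, 30), (40, 50)], max_gap=5)
print(cleaned)  # [(0, 8), (8, 22), (22, 30), (40, 50)]
```

Here the first session is truncated from 10 to 8, the two-unit gap before 22 is filled, and the ten-unit gap before 40 is left alone because it exceeds max_gap. Other policies, such as truncating the later session instead, are equally possible.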

    Blog posts about Data Quality

    July 28, 2017, 8:08 a.m.
    Swimming in a lake of data requires, above all, that the data are turned into quality data, for instance using Spotless Machine Learning Filters. It is widely recognised that the simplest way to ensure data quality is to minimise the variety of sources from which your data come. As Berkeley machine learning professor Michael Jordan states: "data variety leads to a decline in data qua...
    July 7, 2017, 7:58 a.m.
    Big data analytics are complicated, but much less so when the data have been properly cleaned. Continuing the blog on big data and analytics. Part 1 can be found here. Different varieties of data: While data come from various sources and have differing file formats, an issue which is generally fixed by cleaning the data and placing them all into a single data warehouse, there is a s...
    June 15, 2017, 7:53 a.m.
    Advertising strategies have changed a great deal in the data-driven modern world, demanding high-quality data. Advertising has long been used to attract new customers and ensure that the advertising campaigns of one's rivals don't take away the customers one already has. So if your company's campaign stresses how your product makes one's clothes or teeth "white than whit...
    May 22, 2017, 11:49 a.m.
    Without spotlessly clean data, business intelligence is simply not possible. Business Intelligence turns data into insights or intelligence. It is used by enterprises of all types to understand their companies and to be able to plot strategies. These will allow said companies to be competitive, disrupt industries and drive marketing and sales campaigns to maximise the lik...
    April 21, 2017, 6:47 a.m.
    Spotless Data version 9 includes data validation, substitution and lookalike improvements for better data quality. We have just launched Version 9 of our unique data quality web-based API solution, which includes a new rule type, known as a data validation Rule, as well as significant enhancements to our rules engine, which have been driven by machine learning. Here are the five fundament...
    May 5, 2017, 6:50 a.m.
    Financial institutions are very aware of the perils and pitfalls of rogue data affecting the quality of their data in every area of their organisations. Fintech describes those companies which are part of the financial world and use the latest data-driven technologies in order to achieve the best results. There is considerable variation in the way different organisations use fintech to advan...
    April 28, 2017, 6:56 a.m.
    Data warehouses will only work properly when they contain quality data. A data warehouse is a repository or storage area where all the data in one's company is kept in a single place. This includes data from different sources as well as both current and historical data, perhaps from a legacy platform. It can consist of data from the company itself, which if the company is a large one mig...
    April 14, 2017, 6:52 a.m.
    Data quality is the key to successful machine learning, and our machine learning filters are the key to quality data. Machine learning is where computers learn things that they were not specifically designed to do. Traditional definitions of data quality define it as data which can do what they were designed to do. Spotless Data has recognised that this is an out-of-date definition which fails wh...
    March 27, 2017, 11:44 a.m.
    Blockchain has uses that go far beyond alternative currencies and demands high-quality data. The blockchain is a distributed database where the different parts of the database are not connected by a computer processor, meaning that once a piece of data is recorded it cannot be modified. This makes it ideal for maintaining records that concern one or more organisations whose interests are not...
    Feb. 24, 2017, 7:40 a.m.
    Fixing a wide range of data quality problems has never been easier, thanks to Spotless Data's Machine Learning Filters. Continuing our blog on data quality issues, numbers 1-7. 8. Operational issues: These are issues which arise in the normal day-to-day functioning of a platform such as a website, including security issues, data quality issues, issues with wastage and delays, an...