Data Quality

Spotless measures the quality of datasets against a specification expressed as a set of business rules. The business rules are generated automatically using machine learning algorithms and can be customised to meet your business requirements.
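
As a rough illustration of measuring quality against business rules, the sketch below treats each rule as a simple check on a row and reports the share of rows that pass it. This is not Spotless's API: the names `RULES` and `quality_report`, the two example rules, and the sample rows are all hypothetical and exist only to show the idea.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Rule = Callable[[Row], bool]

# Hypothetical business rules for illustration; not taken from Spotless itself.
RULES: Dict[str, Rule] = {
    # The title field must contain something other than whitespace.
    "title_present": lambda row: bool(row.get("title", "").strip()),
    # The duration field must be a whole number greater than zero.
    "duration_is_positive_integer": lambda row: (
        row.get("duration", "").isdigit() and int(row["duration"]) > 0
    ),
}


def quality_report(rows: List[Row], rules: Dict[str, Rule]) -> Dict[str, float]:
    """Return, for each rule, the fraction of rows that satisfy it."""
    if not rows:
        # An empty dataset breaks no rules.
        return {name: 1.0 for name in rules}
    return {
        name: sum(1 for row in rows if rule(row)) / len(rows)
        for name, rule in rules.items()
    }


# Made-up sample data: one row has no title, one has a negative duration.
rows = [
    {"title": "News at Ten", "duration": "30"},
    {"title": "", "duration": "45"},
    {"title": "Film", "duration": "-5"},
    {"title": "Quiz", "duration": "60"},
]
report = quality_report(rows, RULES)
```

Keeping each rule as a named, independent check means a rule can be switched off or replaced without touching the others, which is one simple way to model rules that are generated first and customised afterwards.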

Typical applications include:

  • Integrating datasets from different providers - Spotless will validate that each feed is well formed and consistent with other feeds
  • Loading data into a data lake - although full transformation is not required on data ingested into a data lake, simple validation using Spotless to ensure the data is well formed and well referenced saves significant time downstream
  • Cleansing data scraped from the internet - web scraped data is notoriously irregular, and Spotless will ensure that only information conforming to the expected specifications is loaded
  • Integrating internal data from different platforms - you can define specific business rules to provide consistent and valid data whenever it is loaded
  • Truncating or removing overlapping sessions; extending or filling sessions with gaps between them
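
The session clean-up in the last item can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of how Spotless implements it: sessions are assumed to be `(start, end)` pairs in minutes, and the function names `truncate_overlaps` and `extend_over_gaps` are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: (start, end) in minutes, with end exclusive.
Session = Tuple[int, int]


def truncate_overlaps(sessions: List[Session]) -> List[Session]:
    """Cut each session short where the next one starts.

    Sessions are sorted by start time; a session left with no
    duration after truncation is removed.
    """
    ordered = sorted(sessions)
    result = []
    for i, (start, end) in enumerate(ordered):
        if i + 1 < len(ordered):
            end = min(end, ordered[i + 1][0])
        if end > start:
            result.append((start, end))
    return result


def extend_over_gaps(sessions: List[Session]) -> List[Session]:
    """Extend each session's end to the next start when a gap separates them."""
    ordered = sorted(sessions)
    result = []
    for i, (start, end) in enumerate(ordered):
        if i + 1 < len(ordered):
            end = max(end, ordered[i + 1][0])
        result.append((start, end))
    return result


# Made-up schedule: the first two sessions overlap, then a gap follows.
raw = [(60, 90), (0, 30), (20, 50)]
no_overlaps = truncate_overlaps(raw)
continuous = extend_over_gaps(no_overlaps)
```

Truncating the earlier session rather than delaying the later one is a design choice made here for simplicity; a real rule set might prefer the opposite, or might fill a gap with a placeholder session instead of extending the previous one.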

    Blog posts about Data Quality

    Nov. 29, 2016, 9:51 a.m.
    Our unique browseable API is the swiftest and surest route to data quality. Spotless Data's unique web-based data quality API solution takes care of your big data throughout their life cycles in order to ensure that they remain clean from contamination or corruption from the moment you receive or generate your data until they are no longer required. We have identified six stages within t...
    Dec. 12, 2016, 7:52 a.m.
    Examining data quality within the context of TV show titles. TV title importance: TV titles can be a source of great frustration for websites that base their business model on having easily identifiable TV show names. A title can make or break a TV show, but a successful name from the perspective of a TV production company is not the same thing as a good name from the point of view of websi...
    Oct. 14, 2016, 10:23 a.m.
    At the heart of any successful data management is ensuring the quality of the data. Recent research from SAS suggests that a large number of companies are falling behind in data management, and many of those that are "laggards" believe their data management is just as effective as those who are leading. SAS describe the difference as people who have "a clear approach to dat...
    Sept. 2, 2016, 11:45 a.m.
    Spotless Data's machine learning filters will help your business win the race to the top. We've just released version 3 of Spotless with two key new features: Firstly, deduplication solutions let you identify data that's been duplicated and delete it from the platform. You can check against individual columns, all columns, or all but one column and get the row deleted automati...
    July 29, 2016, 12:59 p.m.
    Forging the new Spotless Data machine learning filters API solution for data quality. We're delighted to have released version 2 of Spotless Data. With version 2, full pricing details are available. New features include: create and edit solutions online without using the RESTful API; now accept any file encoding - not restricted to UTF-8 any more; automagically create...