Data Integration

Integrating multiple datasets can be fraught with difficulty: fields may be inconsistent, datasets may be missing, and sources may contain conflicting information. The Spotless API has rules to ensure referential integrity between datasets:

  • Ensure that primary keys in a dataset are unique, including compound primary keys
  • Ensure that foreign keys in one file match the primary keys in another file
  • Ensure that all other fields are well formed, and cleaned as required
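
As a rough illustration of the first two rules, the sketch below checks compound primary keys for duplicates and foreign keys for missing parents in plain Python. It is not the Spotless API: the function names, field names and sample rows are all hypothetical, and the data is assumed to be already loaded as lists of dictionaries.

```python
from collections import Counter

def duplicate_keys(rows, key_fields):
    """Return the (possibly compound) key tuples that occur more than once."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)

def orphan_foreign_keys(child_rows, fk_fields, parent_rows, pk_fields):
    """Return foreign-key tuples in child_rows that match no parent primary key."""
    parent_keys = {tuple(row[f] for f in pk_fields) for row in parent_rows}
    child_keys = {tuple(row[f] for f in fk_fields) for row in child_rows}
    return sorted(child_keys - parent_keys)

# Hypothetical sample data: customers are keyed by (country, customer_id).
customers = [
    {"country": "UK", "customer_id": 1},
    {"country": "UK", "customer_id": 2},
    {"country": "FR", "customer_id": 1},
    {"country": "UK", "customer_id": 2},  # repeats the compound key ("UK", 2)
]
orders = [
    {"order_id": 10, "country": "UK", "customer_id": 1},
    {"order_id": 11, "country": "DE", "customer_id": 7},  # no matching customer
]

key = ["country", "customer_id"]
dupes = duplicate_keys(customers, key)
orphans = orphan_foreign_keys(orders, key, customers, key)
print(dupes)    # [('UK', 2)]
print(orphans)  # [('DE', 7)]
```

Comparing whole key tuples rather than individual columns is what makes the check work for compound keys: ("FR", 1) and ("UK", 1) share a customer_id but are distinct keys.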

In a data integration environment, it is also important that data issues can be acted upon quickly. Spotless provides the following options:

  • Automatic quarantining of data to ensure that invalid data is not ingested. Even if only part of a file is invalid, the invalid data is removed and quarantined so that it is not ingested with the remainder
  • Email alerts when data issues are identified so they can quickly be escalated using PagerDuty or similar tools
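
The two options above can be pictured with a small sketch. It assumes that quarantining means holding invalid rows aside while valid rows continue, and it only builds an alert message rather than sending one. None of this is the Spotless API: the function names, the validation rule, the file name and the recipient address are hypothetical.

```python
from email.message import EmailMessage

def split_rows(rows, is_valid):
    """Separate rows into those safe to ingest and those to quarantine."""
    accepted, quarantined = [], []
    for row in rows:
        (accepted if is_valid(row) else quarantined).append(row)
    return accepted, quarantined

def build_alert(filename, quarantined, recipient):
    """Build (but do not send) an email describing the quarantined rows."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"{len(quarantined)} row(s) quarantined from {filename}"
    msg.set_content("\n".join(repr(row) for row in quarantined))
    return msg

def has_positive_amount(row):
    """Hypothetical validation rule: amount must be a positive number."""
    amount = row.get("amount")
    return isinstance(amount, (int, float)) and amount > 0

rows = [
    {"order_id": 10, "amount": 25.0},
    {"order_id": 11, "amount": -3.0},   # invalid: negative amount
    {"order_id": 12, "amount": None},   # invalid: missing amount
]
accepted, quarantined = split_rows(rows, has_positive_amount)
alert = build_alert("orders.csv", quarantined, "data-team@example.com")
print(alert["Subject"])  # 2 row(s) quarantined from orders.csv
```

In practice the message would be handed to a mail server or an escalation tool; that step is left out here because it depends on the environment.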

    Blog posts about Data Integration

    Feb. 16, 2018, 12:05 p.m.
    Achieving data quality you can trust, where everything works seamlessly, requires data refining just as surely as crude oil does. Don't let rogue data destroy your platforms: use the Spotless API solution. Data is increasingly being seen as the new oil. It is unfortunate that the concept of refining these data, which are increasingly big data, does not have in the publ...
    Feb. 2, 2018, 7:26 a.m.
    Spotless Data's machine learning filters, accessible as an API through a web browser, are the ideal tool for integrating data from multiple sources. Data integration describes the process of merging or combining two different datasets, or data from two different sources. Whether this is a retailer gathering together disparate data about their customers into a single customer view...
    Nov. 6, 2017, 11:57 a.m.
    When your company has valid data then all those things the data does for it will work seamlessly, like clockwork. Data validation is the process of ensuring that the data one has are valid so that they are fit for purpose rather than being contaminated with rogue data. If you fail to validate your company's data you are taking a big risk which may backfire in a whole range of catastrophi...
    Nov. 2, 2017, 6:10 a.m.
    The first step towards achieving data integrity for all the different data entering your repository is using Spotless Data's machine learning filters to ensure trustworthy data quality every time. Data integrity means having data quality through data validation so that you can trust in your data throughout their life cycle, starting from their ingestion into a data repository. At Spotless ...
    Jan. 4, 2018, 10:59 a.m.
    Data is at the heart of the modern retail industry. Ensuring all the data a retailer has are integrated and spotlessly clean is no easy task but nowadays goes much more smoothly with Spotless Data's API solution. Data have been described as having taken over the retail industry due to the profound impact they have on every area of retailing. They are changing the way we buy and sell prod...
    Dec. 11, 2017, 7:27 a.m.
    By using big data in manufacturing processes, such as a tablet monitoring what is happening in a factory, manufacturers ensure greater efficiency and success. Manufacturing has always needed large quantities of data to produce the complicated goods typical of a modern industrial society while using data for the time management of workers and general efficiency within a factory stretches back...
    Feb. 10, 2017, 7:38 a.m.
    High-quality, spotlessly clean data is the goal of all data-driven businesses, which is why we have developed our Machine Learning filters to remove rogue data. Part 2 of this blog. Spotless Data has identified 14 different causes of poor data quality, that is, data which your company and your customers are unable to trust. Given that poor quality data can cost businesses 20-35% of their...
    March 20, 2017, 12:45 p.m.
    Anyone who wants to be in charge of their data should consider using Spotless Data's Machine Learning Filters data cleaning solution. Who would benefit from using Spotless data quality API? The simple answer is that any business or organisation that has to deal with any data would benefit, whether it is a new start-up or a multi-billion dollar business. However, we have identified thr...
    Nov. 29, 2016, 9:51 a.m.
    Our unique browseable API is the swiftest and surest route to data quality. Spotless Data's unique web-based data quality API solution takes care of your big data throughout their life cycles in order to ensure that they remain clean from contamination or corruption from the moment you receive or generate your data until they are no longer required. We have identified six stages within t...
    Nov. 22, 2016, 7:51 a.m.
    Five different ways in which poor data quality is a barrier to success for any organisation. We have identified five different barriers to success for those working with big data, including businesses which integrate systems between platforms or from a legacy platform, businesses who have to deal with a lot of user generated content (UGC) or who know that their data are dirty, and businesses...