Data Integration

Integrating multiple datasets can be fraught with difficulty, including inconsistent fields, missing datasets, and conflicting information. The Spotless API has rules to ensure referential integrity between datasets:

  • Ensure that primary keys in each dataset are unique, including compound primary keys
  • Ensure that foreign keys in one file match the primary keys in another file
  • Validate that all other fields are well formed and cleaned as required
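
The three checks above can be sketched in plain Python. This is not the Spotless API, whose interface isn't described here; it is a minimal illustration assuming each dataset is a list of dicts (for example, rows read with `csv.DictReader`), with hypothetical field names `order_id`, `line`, `customer_id` and `order_date`.

```python
from collections import Counter
from datetime import date

# Each dataset is assumed to be a list of dicts, e.g. rows read with csv.DictReader.

def key_of(row, fields):
    """Build a (possibly compound) key as a tuple of field values."""
    return tuple(row[f] for f in fields)

def duplicate_keys(rows, pk_fields):
    """Return every primary key, simple or compound, that occurs more than once."""
    counts = Counter(key_of(row, pk_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]

def orphan_rows(child_rows, fk_fields, parent_rows, pk_fields):
    """Return child rows whose foreign key has no matching primary key in the parent."""
    parent_keys = {key_of(row, pk_fields) for row in parent_rows}
    return [row for row in child_rows if key_of(row, fk_fields) not in parent_keys]

def malformed_dates(rows, field):
    """Return rows whose date field is not a valid ISO date after trimming whitespace."""
    bad = []
    for row in rows:
        try:
            date.fromisoformat(row[field].strip())
        except ValueError:
            bad.append(row)
    return bad

# Hypothetical sample data: orders keyed on (order_id, line), referencing customers.
customers = [{"customer_id": "C1"}, {"customer_id": "C2"}]
orders = [
    {"order_id": "1", "line": "1", "customer_id": "C1", "order_date": "2017-02-10"},
    {"order_id": "1", "line": "2", "customer_id": "C9", "order_date": "2017-13-01"},
    {"order_id": "1", "line": "1", "customer_id": "C2", "order_date": " 2017-03-20 "},
]

print(duplicate_keys(orders, ["order_id", "line"]))                      # [('1', '1')]
print(orphan_rows(orders, ["customer_id"], customers, ["customer_id"]))  # the C9 row
print(malformed_dates(orders, "order_date"))                            # the 2017-13-01 row
```

Representing every key as a tuple lets the same helpers handle single-field and compound keys without special cases.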

In a data integration environment, it is also important to be able to act on data issues quickly. Spotless provides the following options:

  • Automatic quarantining of data to ensure that invalid data is not ingested. Even when only part of a file is invalid, that part is removed and quarantined rather than ingested
  • Email alerts when data issues are identified, so that they can be escalated quickly using PagerDuty or similar tools
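
Row-level quarantining and alerting can be sketched as follows. This is a hypothetical illustration, not how Spotless implements it: it assumes rows that fail a validation rule are set aside while the rest continue to ingestion, and it only builds the alert email (with placeholder addresses) rather than sending it.

```python
from datetime import date
from email.message import EmailMessage

def partition_rows(rows, is_valid):
    """Split rows into (clean, quarantined); only the clean rows go on to ingestion."""
    clean, quarantined = [], []
    for row in rows:
        (clean if is_valid(row) else quarantined).append(row)
    return clean, quarantined

def has_valid_date(row, field="order_date"):
    """Example validation rule: the (hypothetical) date field must be an ISO date."""
    try:
        date.fromisoformat(row[field])
        return True
    except (KeyError, TypeError, ValueError):
        return False

def build_alert(source, quarantined, sender, recipient):
    """Build (but do not send) an email summarising the quarantined rows."""
    msg = EmailMessage()
    msg["Subject"] = f"Data issue: {len(quarantined)} row(s) quarantined from {source}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(str(row) for row in quarantined))
    return msg

rows = [
    {"order_id": "1", "order_date": "2017-02-10"},
    {"order_id": "2", "order_date": "2017-13-01"},
]
clean, quarantined = partition_rows(rows, has_valid_date)
alert = build_alert("orders.csv", quarantined, "etl@example.com", "data-team@example.com")
print(alert["Subject"])  # Data issue: 1 row(s) quarantined from orders.csv
```

To deliver the alert, the message could be passed to `smtplib.SMTP(host).send_message(alert)`, with the recipient being a mailbox monitored by PagerDuty or a similar tool; the host and both addresses are placeholders.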
 

    Blog posts about Data Integration

    Nov. 6, 2017, 11:57 a.m.
    When your company has valid data, everything that data does for it will work seamlessly, like clockwork. Data validation is the process of ensuring that your data are valid and therefore fit for purpose, rather than contaminated with rogue data. If you fail to validate your company's data, you are taking a big risk that may backfire in a whole range of catastrophi...
    Nov. 2, 2017, 6:10 a.m.
    The first step towards achieving data integrity for all the different data entering your repository is using Spotless Data's Machine Learning filters to ensure trustworthy data quality every time. Data integrity means maintaining data quality through data validation, so that you can trust your data throughout their life cycle, starting from their ingestion into a data repository. At Spotless ...
    Dec. 11, 2017, 7:27 a.m.
    By using big data in manufacturing processes, such as a tablet monitoring what is happening in a factory, manufacturers ensure greater efficiency and success. Manufacturing has always needed large quantities of data to produce the complicated goods typical of a modern industrial society, while using data for the time management of workers and general efficiency within a factory stretches back...
    Feb. 10, 2017, 7:38 a.m.
    Spotlessly clean, high-quality data is the goal of every data-driven business, which is why we have developed our Machine Learning filters to remove rogue data. This is Part 2 of this blog. Spotless Data has identified 14 different causes of poor data quality, meaning data that your company and your customers are unable to trust. Given that poor quality data can cost businesses 20-35% of their...
    March 20, 2017, 12:45 p.m.
    Anyone who wants to be in charge of their data should consider using Spotless Data's Machine Learning Filters data cleaning solution. Who would benefit from using the Spotless Data Quality API? The simple answer is that any business or organisation that has to deal with data would benefit, whether it is a new start-up or a multi-billion-dollar business. However, we have identified thr...
    Nov. 29, 2016, 9:51 a.m.
    Our unique browseable API is the swiftest and surest route to data quality. Spotless Data's unique web-based Data Quality API solution takes care of your big data throughout their life cycle, ensuring that they remain free from contamination or corruption from the moment you receive or generate your data until they are no longer required. We have identified six stages within t...
    Nov. 22, 2016, 7:51 a.m.
    Five ways in which poor data quality is a barrier to success for any organisation. We have identified five different barriers to success for those working with Big Data, including businesses that integrate systems between platforms or from a legacy platform, businesses that have to deal with a lot of User Generated Content (UGC) or know that their data are dirty, and businesses...
    Dec. 5, 2016, 6:52 a.m.
    High-quality data can transform Tableau data visualisation. Tableau, the data visualisation software, is a way of illustrating data that is particularly useful when dealing with data that change over time. It has great mapping functionality, with a number of geographic identifiers built into the software, such as country, region and sometimes postcode. However, when using Tableau in ord...
    July 11, 2016, 7:18 a.m.
    The benefits of having a sparkling data lake that has undergone data cleaning should never be underestimated. You’ve just spent the last 18 months consolidating all of your business data into a Hadoop-based platform, and your CEO is asking you when we’re going to see benefits from your Data Lake. What do you mean, you say? We’ve already got Tableau deployed and we have...