Data Integration

Integrating multiple datasets can be fraught with difficulty: inconsistent fields, missing datasets, and conflicting sets of information. The Spotless API has rules to ensure referential integrity between datasets:

  • Ensure that primary keys in a dataset are unique, including compound primary keys
  • Ensure that foreign keys in one file match the primary keys in another file
  • Validate that all other fields are well formed, and clean them as required
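
The first two rules can be illustrated with a minimal sketch in plain Python. This is not the Spotless API: the function names, field names and sample rows below are all hypothetical, chosen only to show what a compound primary key check and a foreign key check look for.

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return the set of key tuples that occur more than once in rows."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key for key, n in counts.items() if n > 1}


def orphaned_rows(child_rows, fk_fields, parent_rows, pk_fields):
    """Return child rows whose foreign key matches no parent primary key."""
    parent_keys = {tuple(row[f] for f in pk_fields) for row in parent_rows}
    return [
        row for row in child_rows
        if tuple(row[f] for f in fk_fields) not in parent_keys
    ]


# Hypothetical sample data: (region, customer_id) is a compound primary key.
customers = [
    {"region": "EU", "customer_id": 1, "name": "Ada"},
    {"region": "EU", "customer_id": 2, "name": "Grace"},
    {"region": "US", "customer_id": 1, "name": "Alan"},
    {"region": "EU", "customer_id": 2, "name": "Grace H."},  # duplicate key
]
orders = [
    {"order_id": 10, "region": "EU", "customer_id": 1},
    {"order_id": 11, "region": "US", "customer_id": 7},  # no such customer
]

key = ["region", "customer_id"]
dupes = duplicate_keys(customers, key)
orphans = orphaned_rows(orders, key, customers, key)
```

Treating each key as a tuple of its fields means a single-column key and a compound key go through the same code path.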

In a data integration environment, it's also important that data issues can be acted upon quickly. Spotless provides the following options:

  • Automatic quarantining of data to ensure that invalid data is not ingested. Even if only part of a file is invalid, the invalid data is removed and the remainder quarantined
  • Email alerts when data issues are identified so they can quickly be escalated using PagerDuty or similar tools
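
As a rough sketch of these two options, the plain Python below separates the invalid rows of a file from the remainder and builds an alert email describing them. It is not the Spotless API. The validation rule, file name and addresses are hypothetical, what happens to each group afterwards is left to the pipeline, and the message is only constructed, not sent.

```python
from email.message import EmailMessage


def partition_rows(rows, is_valid):
    """Split rows into (invalid, remainder) without dropping any of them."""
    invalid, remainder = [], []
    for row in rows:
        (remainder if is_valid(row) else invalid).append(row)
    return invalid, remainder


def build_alert(filename, invalid_rows, to_addr):
    """Build (but do not send) an email describing the invalid rows."""
    msg = EmailMessage()
    msg["Subject"] = f"Data issue: {len(invalid_rows)} invalid row(s) in {filename}"
    msg["From"] = "data-alerts@example.com"  # hypothetical sender
    msg["To"] = to_addr
    msg.set_content("\n".join(repr(row) for row in invalid_rows))
    return msg


def has_valid_amount(row):
    """Hypothetical rule: the 'amount' field must parse as a number."""
    try:
        float(row["amount"])
        return True
    except (KeyError, TypeError, ValueError):
        return False


rows = [
    {"order_id": 10, "amount": "19.99"},
    {"order_id": 11, "amount": "n/a"},  # fails the rule
    {"order_id": 12, "amount": "5"},
]
invalid, remainder = partition_rows(rows, has_valid_amount)
alert = build_alert("orders.csv", invalid, "oncall@example.com") if invalid else None
```

The resulting message could be handed to any mail relay, or to an email integration address of an on-call tool, which is one common way to route such alerts.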
 

    Blog posts about Data Integration

    Dec. 5, 2016, 6:52 a.m.
    High quality data can transform Tableau data visualisation. Tableau, the data visualization software, is a way of illustrating data that is particularly useful when dealing with data that changes over time. It has great mapping functionality, with a number of geographic identifiers built into the software, such as country, region and sometimes postcode. However, when using Tableau in ord...
    July 11, 2016, 7:18 a.m.
    The benefits of having a sparkling data lake which has undergone data cleaning should never be underestimated. You’ve just spent the last 18 months consolidating all of your business data into a Hadoop based platform and your CEO is asking you when we’re going to see benefits from your data lake. What do you mean, you say? We’ve already got Tableau deployed and we have...